[jira] [Resolved] (SPARK-37032) Remove unusable link in spark-3.2.0's doc
[ https://issues.apache.org/jira/browse/SPARK-37032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-37032. - Fix Version/s: 3.2.1 3.3.0 Resolution: Fixed Issue resolved by pull request 34307 [https://github.com/apache/spark/pull/34307] > Remove unusable link in spark-3.2.0's doc > -- > > Key: SPARK-37032 > URL: https://issues.apache.org/jira/browse/SPARK-37032 > Project: Spark > Issue Type: Improvement > Components: docs >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.3.0, 3.2.1 > > > Four links are empty > !image-2021-10-18-10-48-21-437.png!
[jira] [Assigned] (SPARK-37032) Remove unusable link in spark-3.2.0's doc
[ https://issues.apache.org/jira/browse/SPARK-37032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-37032: --- Assignee: angerszhu > Remove unusable link in spark-3.2.0's doc > -- > > Key: SPARK-37032 > URL: https://issues.apache.org/jira/browse/SPARK-37032 > Project: Spark > Issue Type: Improvement > Components: docs >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > > Four links are empty > !image-2021-10-18-10-48-21-437.png!
[jira] [Updated] (SPARK-37037) Improve byte array sort by unifying the compareTo functions of UTF8String and ByteArray
[ https://issues.apache.org/jira/browse/SPARK-37037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You updated SPARK-37037: -- Description: BinaryType uses `TypeUtils.compareBinary` to compare two byte arrays; it is slow because it compares them byte by byte with unsigned int comparison. We can compare them using `Platform.getLong` with unsigned long comparison when they have more than 8 bytes. Some history on this: https://github.com/apache/spark/pull/6755/files#r32197461 was: BinaryType use `TypeUtils.compareBinary` to compare two byte array, however it's slow since it compares byte array byte by bye. We can compare them using `Platform.getLong` if they have more than 8 bytes. And here is some histroy about this [https://github.com/apache/spark/pull/6755/files#r32197461 .|https://github.com/apache/spark/pull/6755/files#r32197461] > Improve byte array sort by unifying the compareTo functions of UTF8String and ByteArray > > > Key: SPARK-37037 > URL: https://issues.apache.org/jira/browse/SPARK-37037 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: XiDuo You >Priority: Major > > BinaryType uses `TypeUtils.compareBinary` to compare two byte arrays; it is slow because it compares them byte by byte with unsigned int comparison. > We can compare them using `Platform.getLong` with unsigned long comparison when they have more than 8 bytes. Some history on this: > https://github.com/apache/spark/pull/6755/files#r32197461
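For context, the trick being proposed is to compare eight bytes at a time and fall back to byte-wise comparison only for the tail. Below is a minimal, self-contained Scala sketch of that idea; it reads words big-endian via `java.nio.ByteBuffer` instead of Spark's `Platform.getLong` (which reads native-endian at possibly unaligned offsets), so treat it as an illustration of the technique, not the actual patch:

{code}
import java.lang.{Long => JLong}
import java.nio.{ByteBuffer, ByteOrder}

// Lexicographic (unsigned) comparison of two byte arrays, 8 bytes at a time.
def compareBinary(a: Array[Byte], b: Array[Byte]): Int = {
  val len = math.min(a.length, b.length)
  var i = 0
  // Compare whole 8-byte words first. Reading big-endian makes unsigned
  // 64-bit order agree with byte-wise lexicographic order.
  while (i + 8 <= len) {
    val wa = ByteBuffer.wrap(a, i, 8).order(ByteOrder.BIG_ENDIAN).getLong
    val wb = ByteBuffer.wrap(b, i, 8).order(ByteOrder.BIG_ENDIAN).getLong
    if (wa != wb) return JLong.compareUnsigned(wa, wb)
    i += 8
  }
  // Unsigned byte-at-a-time comparison for the remaining 0-7 bytes.
  while (i < len) {
    val cmp = (a(i) & 0xff) - (b(i) & 0xff)
    if (cmp != 0) return cmp
    i += 1
  }
  // Equal shared prefix: the shorter array sorts first.
  a.length - b.length
}
{code}

The big-endian read is the detail that matters: an implementation built on `Platform.getLong` gets native-endian words, so on little-endian hardware it must byte-reverse each word before the unsigned comparison.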
[jira] [Assigned] (SPARK-35925) Support DayTimeIntervalType in width-bucket function
[ https://issues.apache.org/jira/browse/SPARK-35925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35925: Assignee: (was: Apache Spark) > Support DayTimeIntervalType in width-bucket function > > > Key: SPARK-35925 > URL: https://issues.apache.org/jira/browse/SPARK-35925 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: PengLei >Priority: Major > > Currently, width-bucket supports the types [DoubleType, DoubleType, DoubleType, LongType]; we hope to also support [DayTimeIntervalType, DayTimeIntervalType, DayTimeIntervalType, LongType]
[jira] [Assigned] (SPARK-35925) Support DayTimeIntervalType in width-bucket function
[ https://issues.apache.org/jira/browse/SPARK-35925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35925: Assignee: Apache Spark > Support DayTimeIntervalType in width-bucket function > > > Key: SPARK-35925 > URL: https://issues.apache.org/jira/browse/SPARK-35925 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: PengLei >Assignee: Apache Spark >Priority: Major > > Currently, width-bucket supports the types [DoubleType, DoubleType, DoubleType, LongType]; we hope to also support [DayTimeIntervalType, DayTimeIntervalType, DayTimeIntervalType, LongType]
[jira] [Commented] (SPARK-35925) Support DayTimeIntervalType in width-bucket function
[ https://issues.apache.org/jira/browse/SPARK-35925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429818#comment-17429818 ] Apache Spark commented on SPARK-35925: -- User 'Peng-Lei' has created a pull request for this issue: https://github.com/apache/spark/pull/34309 > Support DayTimeIntervalType in width-bucket function > > > Key: SPARK-35925 > URL: https://issues.apache.org/jira/browse/SPARK-35925 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: PengLei >Priority: Major > > Currently, width-bucket supports the types [DoubleType, DoubleType, DoubleType, LongType]; we hope to also support [DayTimeIntervalType, DayTimeIntervalType, DayTimeIntervalType, LongType]
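To make the requested signature concrete, here is a hedged Scala/Spark SQL sketch (the literals are illustrative and not taken from the PR): the numeric form already works in 3.2.0, while the INTERVAL form is what this sub-task proposes.

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Existing signature: bucket 5.5 into 10 equal-width buckets over [0.0, 10.0).
spark.sql("SELECT width_bucket(5.5, 0.0, 10.0, 10)").show()

// Proposed signature: bucket a day-time interval over [0 days, 10 days)
// into 10 buckets, analogous to the numeric case.
spark.sql(
  "SELECT width_bucket(INTERVAL '3' DAY, INTERVAL '0' DAY, INTERVAL '10' DAY, 10)"
).show()
{code}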
[jira] [Created] (SPARK-37037) Improve byte array sort by unifying the compareTo functions of UTF8String and ByteArray
XiDuo You created SPARK-37037: - Summary: Improve byte array sort by unifying the compareTo functions of UTF8String and ByteArray Key: SPARK-37037 URL: https://issues.apache.org/jira/browse/SPARK-37037 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: XiDuo You BinaryType uses `TypeUtils.compareBinary` to compare two byte arrays; it is slow because it compares them byte by byte. We can compare them using `Platform.getLong` when they have more than 8 bytes. Some history on this: https://github.com/apache/spark/pull/6755/files#r32197461
[jira] [Commented] (SPARK-37035) Improve error message when using the vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-37035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429812#comment-17429812 ] Apache Spark commented on SPARK-37035: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/34308 > Improve error message when use vectorize reader > --- > > Key: SPARK-37035 > URL: https://issues.apache.org/jira/browse/SPARK-37035 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2, 3.2.0 >Reporter: angerszhu >Priority: Major > > Vectorized reader won't show which file read failed. > > None-vectorize parquet reader > {code} > cutionException: Encounter error while reading parquet files. One possible > cause: Parquet column cannot be converted in the corresponding files. Details: > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:193) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) > at org.apache.spark.scheduler.Task.run(Task.scala:123) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value > at 1 in block 0 in file hdfs://path/to/failed/file > at > org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251) > at > org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207) > at > org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181) > ... 15 more > {code} > Vectorize parquet reader > {code} > 21/10/15 18:01:54 WARN TaskSetManager: Lost task 1881.0 in stage 16.0 (TID > 10380, ip-10-130-169-140.idata-server.shopee.io, executor 168): TaskKilled > (Stage cancelled) > : An error occurred while calling o362.showString. 
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 963 > in stage 17.0 failed 4 times, most recent failure: Lost task 963.3 in stage > 17.0 (TID 10351, ip-10-130-75-201.idata-server.shopee.io, executor 99): > java.lang.UnsupportedOperationException: > org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary > at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:36) > at > org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364) > at > org.apache.spark.sql.execution.vectorized.MutableColumnarRow.getLong(MutableColumnarRow.java:120) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > at > org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:351) > at > org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:349) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at
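The contrast in the two traces above is the crux of this ticket: the non-vectorized path wraps the failure in a message naming the file being read, while the vectorized path surfaces a bare UnsupportedOperationException with no file path. Below is a hedged Scala sketch of one way to attach that context; the names `FileTaggingIterator`, `filePath`, and `rows` are hypothetical stand-ins, not the ones used in the actual PR:

{code}
import org.apache.spark.SparkException

// Wrap a per-file row iterator so that failures raised lazily from
// hasNext/next are rethrown with the file path attached, mirroring what
// the non-vectorized parquet path already reports.
class FileTaggingIterator[T](filePath: String, rows: Iterator[T]) extends Iterator[T] {
  private def tag[A](body: => A): A =
    try body
    catch {
      case e: Exception =>
        throw new SparkException(s"Encountered error while reading file $filePath", e)
    }

  override def hasNext: Boolean = tag(rows.hasNext)
  override def next(): T = tag(rows.next())
}
{code}

With something like this in the scan path, a vectorized decoding failure would carry the same "which file" detail that the ParquetDecodingException in the first trace provides.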
[jira] [Assigned] (SPARK-37035) Improve error message when using the vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-37035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37035: Assignee: (was: Apache Spark) > Improve error message when use vectorize reader > --- > > Key: SPARK-37035 > URL: https://issues.apache.org/jira/browse/SPARK-37035 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2, 3.2.0 >Reporter: angerszhu >Priority: Major > > Vectorized reader won't show which file read failed. > > None-vectorize parquet reader > {code} > cutionException: Encounter error while reading parquet files. One possible > cause: Parquet column cannot be converted in the corresponding files. Details: > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:193) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) > at org.apache.spark.scheduler.Task.run(Task.scala:123) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value > at 1 in block 0 in file hdfs://path/to/failed/file > at > org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251) > at > org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207) > at > org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181) > ... 15 more > {code} > Vectorize parquet reader > {code} > 21/10/15 18:01:54 WARN TaskSetManager: Lost task 1881.0 in stage 16.0 (TID > 10380, ip-10-130-169-140.idata-server.shopee.io, executor 168): TaskKilled > (Stage cancelled) > : An error occurred while calling o362.showString. 
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 963 > in stage 17.0 failed 4 times, most recent failure: Lost task 963.3 in stage > 17.0 (TID 10351, ip-10-130-75-201.idata-server.shopee.io, executor 99): > java.lang.UnsupportedOperationException: > org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary > at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:36) > at > org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364) > at > org.apache.spark.sql.execution.vectorized.MutableColumnarRow.getLong(MutableColumnarRow.java:120) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > at > org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:351) > at > org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:349) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at >
[jira] [Assigned] (SPARK-37035) Improve error message when using the vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-37035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37035: Assignee: Apache Spark > Improve error message when use vectorize reader > --- > > Key: SPARK-37035 > URL: https://issues.apache.org/jira/browse/SPARK-37035 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2, 3.2.0 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Major > > Vectorized reader won't show which file read failed. > > None-vectorize parquet reader > {code} > cutionException: Encounter error while reading parquet files. One possible > cause: Parquet column cannot be converted in the corresponding files. Details: > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:193) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) > at org.apache.spark.scheduler.Task.run(Task.scala:123) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value > at 1 in block 0 in file hdfs://path/to/failed/file > at > org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251) > at > org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207) > at > org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181) > ... 15 more > {code} > Vectorize parquet reader > {code} > 21/10/15 18:01:54 WARN TaskSetManager: Lost task 1881.0 in stage 16.0 (TID > 10380, ip-10-130-169-140.idata-server.shopee.io, executor 168): TaskKilled > (Stage cancelled) > : An error occurred while calling o362.showString. 
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 963 > in stage 17.0 failed 4 times, most recent failure: Lost task 963.3 in stage > 17.0 (TID 10351, ip-10-130-75-201.idata-server.shopee.io, executor 99): > java.lang.UnsupportedOperationException: > org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary > at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:36) > at > org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364) > at > org.apache.spark.sql.execution.vectorized.MutableColumnarRow.getLong(MutableColumnarRow.java:120) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > at > org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:351) > at > org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:349) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at >
[jira] [Commented] (SPARK-37036) Add util function to raise advice warning for pandas API on Spark.
[ https://issues.apache.org/jira/browse/SPARK-37036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429811#comment-17429811 ] Haejoon Lee commented on SPARK-37036: - I'm working on this > Add util function to raise advice warning for pandas API on Spark. > -- > > Key: SPARK-37036 > URL: https://issues.apache.org/jira/browse/SPARK-37036 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Priority: Major > > Pandas API on Spark has some features that can potentially cause performance degradation or unexpected behavior, e.g. `sort_index`, `index_col`, `to_pandas`, etc. > > We should raise a proper advice warning for those functions so that users can make their pandas-on-Spark code base more robust.
[jira] [Created] (SPARK-37036) Add util function to raise advice warning for pandas API on Spark.
Haejoon Lee created SPARK-37036: --- Summary: Add util function to raise advice warning for pandas API on Spark. Key: SPARK-37036 URL: https://issues.apache.org/jira/browse/SPARK-37036 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.2.0 Reporter: Haejoon Lee Pandas API on Spark has some features that can potentially cause performance degradation or unexpected behavior, e.g. `sort_index`, `index_col`, `to_pandas`, etc. We should raise a proper advice warning for those functions so that users can make their pandas-on-Spark code base more robust.
[jira] [Commented] (SPARK-37035) Improve error message when using the vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-37035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429809#comment-17429809 ] angerszhu commented on SPARK-37035: --- raise a pr soon > Improve error message when use vectorize reader > --- > > Key: SPARK-37035 > URL: https://issues.apache.org/jira/browse/SPARK-37035 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2, 3.2.0 >Reporter: angerszhu >Priority: Major > > Vectorized reader won't show which file read failed. > > None-vectorize parquet reader > {code} > cutionException: Encounter error while reading parquet files. One possible > cause: Parquet column cannot be converted in the corresponding files. Details: > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:193) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) > at org.apache.spark.scheduler.Task.run(Task.scala:123) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value > at 1 in block 0 in file hdfs://path/to/failed/file > at > org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251) > at > org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207) > at > org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181) > ... 15 more > {code} > Vectorize parquet reader > {code} > 21/10/15 18:01:54 WARN TaskSetManager: Lost task 1881.0 in stage 16.0 (TID > 10380, ip-10-130-169-140.idata-server.shopee.io, executor 168): TaskKilled > (Stage cancelled) > : An error occurred while calling o362.showString. 
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 963 > in stage 17.0 failed 4 times, most recent failure: Lost task 963.3 in stage > 17.0 (TID 10351, ip-10-130-75-201.idata-server.shopee.io, executor 99): > java.lang.UnsupportedOperationException: > org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary > at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:36) > at > org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364) > at > org.apache.spark.sql.execution.vectorized.MutableColumnarRow.getLong(MutableColumnarRow.java:120) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > at > org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:351) > at > org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:349) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at >
[jira] [Updated] (SPARK-37035) Improve error message when using the vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-37035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-37035: -- Description: Vectorized reader won't show which file read failed. None-vectorize parquet reader {code} cutionException: Encounter error while reading parquet files. One possible cause: Parquet column cannot be converted in the corresponding files. Details: at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:193) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) at org.apache.spark.scheduler.Task.run(Task.scala:123) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://path/to/failed/file at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251) at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207) at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181) ... 15 more {code} Vectorize parquet reader {code} 21/10/15 18:01:54 WARN TaskSetManager: Lost task 1881.0 in stage 16.0 (TID 10380, ip-10-130-169-140.idata-server.shopee.io, executor 168): TaskKilled (Stage cancelled) : An error occurred while calling o362.showString. 
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 963 in stage 17.0 failed 4 times, most recent failure: Lost task 963.3 in stage 17.0 (TID 10351, ip-10-130-75-201.idata-server.shopee.io, executor 99): java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49) at org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:36) at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364) at org.apache.spark.sql.execution.vectorized.MutableColumnarRow.getLong(MutableColumnarRow.java:120) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) at org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:351) at org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:349) at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) at org.apache.spark.scheduler.Task.run(Task.scala:123) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at
[jira] [Updated] (SPARK-37035) Improve error message when using the vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-37035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-37035: -- Description: Vectorized reader won't show which file read failed. Ono-vectorize parquet reader ``` cutionException: Encounter error while reading parquet files. One possible cause: Parquet column cannot be converted in the corresponding files. Details: at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:193) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) at org.apache.spark.scheduler.Task.run(Task.scala:123) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://R2/projects/data_notificationmart/dwd_traceid_sent_civ_first_di/tz_type=local/grass_region=TW/grass_date=2021-10-13/noti_type=AR/part-00013-22bdd509-4469-47f7-a37e-50fddd4266a7-c000.zstd.parquet at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251) at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207) at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181) ... 15 more ``` Vectorize parquet reader {code 21/10/15 18:01:54 WARN TaskSetManager: Lost task 1881.0 in stage 16.0 (TID 10380, ip-10-130-169-140.idata-server.shopee.io, executor 168): TaskKilled (Stage cancelled) : An error occurred while calling o362.showString. 
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 963 in stage 17.0 failed 4 times, most recent failure: Lost task 963.3 in stage 17.0 (TID 10351, ip-10-130-75-201.idata-server.shopee.io, executor 99): java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49) at org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:36) at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364) at org.apache.spark.sql.execution.vectorized.MutableColumnarRow.getLong(MutableColumnarRow.java:120) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) at org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:351) at org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:349) at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) at org.apache.spark.scheduler.Task.run(Task.scala:123) at
[jira] [Updated] (SPARK-37035) Improve error message when using the vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-37035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-37035: -- Description: Vectorized reader won't show which file read failed. was: Vectorized reader won't show which file read failed. No-vectorize parquet reader Ono-vectorize parquet reader ```cutionException: Encounter error while reading parquet files. One possible cause: Parquet column cannot be converted in the corresponding files. Details: at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:193) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) at org.apache.spark.scheduler.Task.run(Task.scala:123) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://R2/projects/data_notificationmart/dwd_traceid_sent_civ_first_di/tz_type=local/grass_region=TW/grass_date=2021-10-13/noti_type=AR/part-00013-22bdd509-4469-47f7-a37e-50fddd4266a7-c000.zstd.parquet at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251) at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207) at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181) ... 
15 more``` Vectorize parquet reader```21/10/15 18:01:54 WARN TaskSetManager: Lost task 1881.0 in stage 16.0 (TID 10380, ip-10-130-169-140.idata-server.shopee.io, executor 168): TaskKilled (Stage cancelled): An error occurred while calling o362.showString.: org.apache.spark.SparkException: Job aborted due to stage failure: Task 963 in stage 17.0 failed 4 times, most recent failure: Lost task 963.3 in stage 17.0 (TID 10351, ip-10-130-75-201.idata-server.shopee.io, executor 99): java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49) at org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:36) at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364) at org.apache.spark.sql.execution.vectorized.MutableColumnarRow.getLong(MutableColumnarRow.java:120) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) at org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:351) at org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:349) at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) at org.apache.spark.scheduler.Task.run(Task.scala:123) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at
[jira] [Created] (SPARK-37035) Improve error message when using the vectorized reader
angerszhu created SPARK-37035: - Summary: Improve error message when use vectorize reader Key: SPARK-37035 URL: https://issues.apache.org/jira/browse/SPARK-37035 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0, 3.1.2 Reporter: angerszhu Vectorized reader won't show which file read failed. No-vectorize parquet reader Ono-vectorize parquet reader ```cutionException: Encounter error while reading parquet files. One possible cause: Parquet column cannot be converted in the corresponding files. Details: at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:193) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) at org.apache.spark.scheduler.Task.run(Task.scala:123) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://R2/projects/data_notificationmart/dwd_traceid_sent_civ_first_di/tz_type=local/grass_region=TW/grass_date=2021-10-13/noti_type=AR/part-00013-22bdd509-4469-47f7-a37e-50fddd4266a7-c000.zstd.parquet at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251) at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207) at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181) ... 
15 more``` Vectorize parquet reader```21/10/15 18:01:54 WARN TaskSetManager: Lost task 1881.0 in stage 16.0 (TID 10380, ip-10-130-169-140.idata-server.shopee.io, executor 168): TaskKilled (Stage cancelled): An error occurred while calling o362.showString.: org.apache.spark.SparkException: Job aborted due to stage failure: Task 963 in stage 17.0 failed 4 times, most recent failure: Lost task 963.3 in stage 17.0 (TID 10351, ip-10-130-75-201.idata-server.shopee.io, executor 99): java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49) at org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:36) at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364) at org.apache.spark.sql.execution.vectorized.MutableColumnarRow.getLong(MutableColumnarRow.java:120) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) at org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:351) at org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:349) at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) at org.apache.spark.scheduler.Task.run(Task.scala:123) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) at
[jira] [Updated] (SPARK-37034) What's the progress of vectorized execution for Spark?
[ https://issues.apache.org/jira/browse/SPARK-37034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaoli updated SPARK-37034: --- Description: Spark already supports vectorized reads for ORC and Parquet. What's the progress on other vectorized execution, e.g. vectorized writes, joins, aggregations, and simple operators (string functions, math functions)? Hive has supported vectorized execution since an early version (https://cwiki.apache.org/confluence/display/hive/vectorized+query+execution). As we know, Spark is a replacement for Hive. I guess the reason Spark does not yet support general vectorized execution may be that the design or implementation difficulty in Spark is greater. What's the main issue blocking vectorized execution in Spark? was: Spark has support vectorized read for ORC and parquet. What's the progress of other vectorized execution, e.g. vectorized write, join, aggr, simple operator (string function, math function)? Hive support vectorized execution in [early version|[https://cwiki.apache.org/confluence/display/hive/vectorized+query+execution]|https://cwiki.apache.org/confluence/display/hive/vectorized+query+execution].] As we know, Spark is replacement of Hive. I guess the reason why Spark does not support vectorized execution maybe the difficulty of design or implementation in Spark is larger. What's the main issue for Spark to support vectorized execution? > What's the progress of vectorized execution for Spark? > -- > > Key: SPARK-37034 > URL: https://issues.apache.org/jira/browse/SPARK-37034 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: xiaoli >Priority: Major > > Spark already supports vectorized reads for ORC and Parquet. What's the progress on other vectorized execution, e.g. vectorized writes, joins, aggregations, and simple operators (string functions, math functions)? > Hive has supported vectorized execution since an early version (https://cwiki.apache.org/confluence/display/hive/vectorized+query+execution) > As we know, Spark is a replacement for Hive. I guess the reason Spark does not yet support general vectorized execution may be that the design or implementation difficulty in Spark is greater. What's the main issue blocking vectorized execution in Spark?
[jira] [Created] (SPARK-37034) What's the progress of vectorized execution for Spark?
xiaoli created SPARK-37034: -- Summary: What's the progress of vectorized execution for Spark? Key: SPARK-37034 URL: https://issues.apache.org/jira/browse/SPARK-37034 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.2.0 Reporter: xiaoli Spark already supports vectorized reads for ORC and Parquet. What's the progress on other vectorized execution, e.g. vectorized writes, joins, aggregations, and simple operators (string functions, math functions)? Hive has supported vectorized execution since an early version (https://cwiki.apache.org/confluence/display/hive/vectorized+query+execution). As we know, Spark is a replacement for Hive. I guess the reason Spark does not yet support general vectorized execution may be that the design or implementation difficulty in Spark is greater. What's the main issue blocking vectorized execution in Spark?
[jira] [Updated] (SPARK-37002) Introduce the 'compute.eager_check' option
[ https://issues.apache.org/jira/browse/SPARK-37002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dch nguyen updated SPARK-37002: --- Summary: Introduce the 'compute.eager_check' option (was: Introduce the 'compute.check_identical_indices' option) > Introduce the 'compute.eager_check' option > -- > > Key: SPARK-37002 > URL: https://issues.apache.org/jira/browse/SPARK-37002 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Priority: Major > > https://issues.apache.org/jira/browse/SPARK-36968 > [https://github.com/apache/spark/pull/34235]
[jira] [Created] (SPARK-37033) Inline type hints for python/pyspark/resource/requests.py
dch nguyen created SPARK-37033: -- Summary: Inline type hints for python/pyspark/resource/requests.py Key: SPARK-37033 URL: https://issues.apache.org/jira/browse/SPARK-37033 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.3.0 Reporter: dch nguyen
[jira] [Commented] (SPARK-37033) Inline type hints for python/pyspark/resource/requests.py
[ https://issues.apache.org/jira/browse/SPARK-37033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429802#comment-17429802 ] dch nguyen commented on SPARK-37033: working on this! > Inline type hints for python/pyspark/resource/requests.py > - > > Key: SPARK-37033 > URL: https://issues.apache.org/jira/browse/SPARK-37033 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Priority: Major >
[jira] [Assigned] (SPARK-37032) Remove unusable link in spark-3.2.0's doc
[ https://issues.apache.org/jira/browse/SPARK-37032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37032: Assignee: (was: Apache Spark) > Remove unusable link in spark-3.2.0's doc > -- > > Key: SPARK-37032 > URL: https://issues.apache.org/jira/browse/SPARK-37032 > Project: Spark > Issue Type: Improvement > Components: docs >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > Four links are empty > !image-2021-10-18-10-48-21-437.png!
[jira] [Assigned] (SPARK-37032) Remove unusable link in spark-3.2.0's doc
[ https://issues.apache.org/jira/browse/SPARK-37032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37032: Assignee: Apache Spark > Remove unusable link in spark-3.2.0's doc > -- > > Key: SPARK-37032 > URL: https://issues.apache.org/jira/browse/SPARK-37032 > Project: Spark > Issue Type: Improvement > Components: docs >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Major > > Four links are empty > !image-2021-10-18-10-48-21-437.png!
[jira] [Commented] (SPARK-37032) Remove unusable link in spark-3.2.0's doc
[ https://issues.apache.org/jira/browse/SPARK-37032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429800#comment-17429800 ] Apache Spark commented on SPARK-37032: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/34307 > Remove unusable link in spark-3.2.0's doc > -- > > Key: SPARK-37032 > URL: https://issues.apache.org/jira/browse/SPARK-37032 > Project: Spark > Issue Type: Improvement > Components: docs >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > Four links are empty > !image-2021-10-18-10-48-21-437.png!
[jira] [Created] (SPARK-37032) Remove unusable link in spark-3.2.0's doc
angerszhu created SPARK-37032: - Summary: Remove unusable link in spark-3.2.0's doc Key: SPARK-37032 URL: https://issues.apache.org/jira/browse/SPARK-37032 Project: Spark Issue Type: Improvement Components: docs Affects Versions: 3.2.0 Reporter: angerszhu Four links are empty !image-2021-10-18-10-48-21-437.png!
[jira] [Assigned] (SPARK-36151) Enable MiMa for Scala 2.13 artifacts after Spark 3.2.0 release
[ https://issues.apache.org/jira/browse/SPARK-36151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36151: Assignee: Josh Rosen > Enable MiMa for Scala 2.13 artifacts after Spark 3.2.0 release > -- > > Key: SPARK-36151 > URL: https://issues.apache.org/jira/browse/SPARK-36151 > Project: Spark > Issue Type: Test > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Josh Rosen >Priority: Major > Fix For: 3.3.0
[jira] [Resolved] (SPARK-36151) Enable MiMa for Scala 2.13 artifacts after Spark 3.2.0 release
[ https://issues.apache.org/jira/browse/SPARK-36151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36151. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34306 [https://github.com/apache/spark/pull/34306] > Enable MiMa for Scala 2.13 artifacts after Spark 3.2.0 release > -- > > Key: SPARK-36151 > URL: https://issues.apache.org/jira/browse/SPARK-36151 > Project: Spark > Issue Type: Test > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0
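For readers who haven't used it: MiMa (the Migration Manager) diffs the current build's class files against a previously released artifact and reports binary incompatibilities, so a Scala 2.13 check only became possible once 3.2.0 published 2.13 artifacts to compare against. A minimal sbt-mima-plugin sketch of the idea (Spark's real setup lives in its custom MimaBuild/MimaExcludes machinery, so this is illustrative only):

{code}
// build.sbt sketch, assuming sbt-mima-plugin is on the plugin classpath.
scalaVersion := "2.13.5"

// Diff the Scala 2.13 build of spark-core against the released 3.2.0 artifact.
mimaPreviousArtifacts := Set("org.apache.spark" %% "spark-core" % "3.2.0")
{code}

Running `sbt mimaReportBinaryIssues` then fails the build on any binary-incompatible change relative to 3.2.0.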
[jira] [Updated] (SPARK-36964) Reuse CachedDNSToSwitchMapping for yarn container requests
[ https://issues.apache.org/jira/browse/SPARK-36964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-36964: --- Affects Version/s: 3.3.0 3.2.0 > Reuse CachedDNSToSwitchMapping for yarn container requests > --- > > Key: SPARK-36964 > URL: https://issues.apache.org/jira/browse/SPARK-36964 > Project: Spark > Issue Type: Improvement > Components: Spark Core, YARN >Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0 >Reporter: gaoyajun02 >Priority: Major > > Similar to SPARK-13704, in some cases adding container requests with locality preferences in YarnAllocator can be expensive, because it may call the topology script for rack awareness. > When submitting a very large job in a very large YARN cluster, the topology script may take significant time to run. This blocks receiving YarnSchedulerBackend's RequestExecutors RPC calls; that request comes from the Spark dynamic executor allocation thread, so it may block the ExecutorAllocationListener and then result in a backlog on the executorManagement queue. > > Some logs: > {code:java} > 21/09/29 12:04:35 INFO spark-dynamic-executor-allocation > ExecutorAllocationManager: Error reaching cluster manager.21/09/29 12:04:35 > INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error > reaching cluster manager.org.apache.spark.rpc.RpcTimeoutException: Futures > timed out after [120 seconds]. This timeout is controlled by > spark.rpc.askTimeout at > org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:839) > at > org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:411) > at > org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:361) > at > org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:316) > at > org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:227) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745)Caused by: > java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259) at > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263) at > org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:294) at > org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) ... 
12 > more21/09/29 12:04:35 WARN spark-dynamic-executor-allocation > ExecutorAllocationManager: Unable to reach the cluster manager to request > 1922 total executors! > 21/09/29 12:04:35 INFO spark-dynamic-executor-allocation > ExecutorAllocationManager: Error reaching cluster manager.21/09/29 12:04:35 > INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error > reaching cluster manager.org.apache.spark.rpc.RpcTimeoutException: Futures > timed out after [120 seconds]. This timeout is controlled by > spark.rpc.askTimeout at > org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:839) > at >
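The remedy the title points at is caching rack resolutions: Hadoop's CachedDNSToSwitchMapping wraps a resolver with exactly such a cache. As a minimal illustration of the idea, not the actual Spark patch, a memoizing resolver might look like the sketch below, where resolveViaScript stands in for the expensive topology-script invocation:

{code:scala}
import scala.collection.concurrent.TrieMap

// Hypothetical sketch: memoize host -> rack lookups so repeated container
// requests do not shell out to the topology script for hosts seen before.
object CachedRackResolver {
  private val cache = TrieMap.empty[String, String]

  // resolveViaScript is an assumed stand-in for the slow script call.
  def resolve(host: String)(resolveViaScript: String => String): String =
    cache.getOrElseUpdate(host, resolveViaScript(host))
}
{code}

With this shape, only the first request for a given host pays the script cost, so a burst of container requests no longer holds up the allocator's RPC handling.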
[jira] [Commented] (SPARK-36928) Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray
[ https://issues.apache.org/jira/browse/SPARK-36928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429791#comment-17429791 ] PengLei commented on SPARK-36928: - working on this later > Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray > > > Key: SPARK-36928 > URL: https://issues.apache.org/jira/browse/SPARK-36928 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > Handle ANSI interval types - YearMonthIntervalType and DayTimeIntervalType in > Columnar* classes, and write tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
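As background for the sub-task: ANSI year-month intervals are physically stored as an int (months) and day-time intervals as a long (microseconds), so a columnar accessor can reuse the existing int/long paths. A hedged sketch of that dispatch, illustrative rather than the actual patch:

{code:scala}
import org.apache.spark.sql.types.{DataType, DayTimeIntervalType, YearMonthIntervalType}
import org.apache.spark.sql.vectorized.ColumnVector

// Sketch only: read an ANSI interval value out of a column vector using the
// underlying physical type (int months / long microseconds).
def getIntervalValue(dt: DataType, col: ColumnVector, rowId: Int): Any = dt match {
  case _: YearMonthIntervalType => col.getInt(rowId)   // months
  case _: DayTimeIntervalType   => col.getLong(rowId)  // microseconds
  case other => throw new UnsupportedOperationException(s"Not an ANSI interval: $other")
}
{code}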
[jira] [Assigned] (SPARK-36151) Enable MiMa for Scala 2.13 artifacts after Spark 3.2.0 release
[ https://issues.apache.org/jira/browse/SPARK-36151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36151: Assignee: Apache Spark > Enable MiMa for Scala 2.13 artifacts after Spark 3.2.0 release > -- > > Key: SPARK-36151 > URL: https://issues.apache.org/jira/browse/SPARK-36151 > Project: Spark > Issue Type: Test > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36151) Enable MiMa for Scala 2.13 artifacts after Spark 3.2.0 release
[ https://issues.apache.org/jira/browse/SPARK-36151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36151: Assignee: (was: Apache Spark) > Enable MiMa for Scala 2.13 artifacts after Spark 3.2.0 release > -- > > Key: SPARK-36151 > URL: https://issues.apache.org/jira/browse/SPARK-36151 > Project: Spark > Issue Type: Test > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36151) Enable MiMa for Scala 2.13 artifacts after Spark 3.2.0 release
[ https://issues.apache.org/jira/browse/SPARK-36151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429785#comment-17429785 ] Apache Spark commented on SPARK-36151: -- User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/34306 > Enable MiMa for Scala 2.13 artifacts after Spark 3.2.0 release > -- > > Key: SPARK-36151 > URL: https://issues.apache.org/jira/browse/SPARK-36151 > Project: Spark > Issue Type: Test > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37026) Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-37026. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34301 [https://github.com/apache/spark/pull/34301] > Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13 > - > > Key: SPARK-37026 > URL: https://issues.apache.org/jira/browse/SPARK-37026 > Project: Spark > Issue Type: Bug > Components: Build, ML >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.3.0 > > > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because the type of ResolvedRFormula.terms is scala.Seq[scala.Seq[String]] but a scala.Seq[scala.collection.mutable.ArraySeq$ofRef] is passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
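The Scala 2.13 pitfall here is that collections wrapping arrays surface as ArraySeq rather than the expected element type, so values typed Seq[Seq[String]] can carry the wrong runtime class. A minimal sketch of the kind of normalization that avoids the mismatch, an assumption about the shape of the fix rather than the actual patch:

{code:scala}
// Under Scala 2.13, inner collections coming from arrays must be converted
// explicitly, or the elements end up as scala.collection.mutable.ArraySeq$ofRef.
val raw: Seq[Array[String]] = Seq(Array("a", "b"), Array("c"))
val terms: Seq[Seq[String]] = raw.map(_.toSeq)  // force proper scala.Seq elements
{code}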
[jira] [Commented] (SPARK-37031) Unify v1 and v2 DESCRIBE NAMESPACE tests
[ https://issues.apache.org/jira/browse/SPARK-37031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429751#comment-17429751 ] Apache Spark commented on SPARK-37031: -- User 'imback82' has created a pull request for this issue: https://github.com/apache/spark/pull/34305 > Unify v1 and v2 DESCRIBE NAMESPACE tests > > > Key: SPARK-37031 > URL: https://issues.apache.org/jira/browse/SPARK-37031 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Terry Kim >Priority: Major > > Extract DESCRIBE NAMESPACE tests to a common place to run them for V1 and V2 data sources. Some tests can be placed in V1- and V2-specific test suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37031) Unify v1 and v2 DESCRIBE NAMESPACE tests
[ https://issues.apache.org/jira/browse/SPARK-37031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429752#comment-17429752 ] Apache Spark commented on SPARK-37031: -- User 'imback82' has created a pull request for this issue: https://github.com/apache/spark/pull/34305 > Unify v1 and v2 DESCRIBE NAMESPACE tests > > > Key: SPARK-37031 > URL: https://issues.apache.org/jira/browse/SPARK-37031 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Terry Kim >Priority: Major > > Extract DESCRIBE NAMESPACE tests to a common place to run them for V1 and V2 data sources. Some tests can be placed in V1- and V2-specific test suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37031) Unify v1 and v2 DESCRIBE NAMESPACE tests
[ https://issues.apache.org/jira/browse/SPARK-37031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37031: Assignee: (was: Apache Spark) > Unify v1 and v2 DESCRIBE NAMESPACE tests > > > Key: SPARK-37031 > URL: https://issues.apache.org/jira/browse/SPARK-37031 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Terry Kim >Priority: Major > > Extract DESCRIBE NAMESPACE tests to a common place to run them for V1 and V2 data sources. Some tests can be placed in V1- and V2-specific test suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37031) Unify v1 and v2 DESCRIBE NAMESPACE tests
[ https://issues.apache.org/jira/browse/SPARK-37031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37031: Assignee: Apache Spark > Unify v1 and v2 DESCRIBE NAMESPACE tests > > > Key: SPARK-37031 > URL: https://issues.apache.org/jira/browse/SPARK-37031 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Terry Kim >Assignee: Apache Spark >Priority: Major > > Extract DESCRIBE NAMESPACE tests to a common place to run them for V1 and V2 data sources. Some tests can be placed in V1- and V2-specific test suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37031) Unify v1 and v2 DESCRIBE NAMESPACE tests
Terry Kim created SPARK-37031: - Summary: Unify v1 and v2 DESCRIBE NAMESPACE tests Key: SPARK-37031 URL: https://issues.apache.org/jira/browse/SPARK-37031 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Terry Kim Extract DESCRIBE NAMESPACE tests to a common place to run them for V1 and V2 data sources. Some tests can be placed in V1- and V2-specific test suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
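A rough sketch of the "common place" idea: a shared trait holds the checks, and the v1/v2 suites supply only their catalog specifics. All names below are illustrative, not the actual test classes in the Spark repository:

{code:scala}
import org.scalatest.funsuite.AnyFunSuite
import org.apache.spark.sql.SparkSession

// Hypothetical shared suite; concrete v1 and v2 suites mix this in and
// provide their own SparkSession and catalog name.
trait DescribeNamespaceSuiteBase { self: AnyFunSuite =>
  def spark: SparkSession
  def catalogName: String  // e.g. "spark_catalog" for v1, a test catalog for v2

  test("DESCRIBE NAMESPACE returns rows for an existing namespace") {
    spark.sql(s"CREATE NAMESPACE IF NOT EXISTS $catalogName.ns")
    assert(spark.sql(s"DESCRIBE NAMESPACE $catalogName.ns").collect().nonEmpty)
  }
}
{code}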
[jira] [Updated] (SPARK-36853) Code failing on checkstyle
[ https://issues.apache.org/jira/browse/SPARK-36853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shockang updated SPARK-36853: - Attachment: image-2021-10-18-01-57-00-714.png > Code failing on checkstyle > -- > > Key: SPARK-36853 > URL: https://issues.apache.org/jira/browse/SPARK-36853 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Abhinav Kumar >Priority: Trivial > Attachments: image-2021-10-18-01-57-00-714.png, > spark_mvn_clean_install_skip_tests_in_windows.log > > > There are more - just pasting sample > > [INFO] There are 32 errors reported by Checkstyle 8.43 with > dev/checkstyle.xml ruleset. > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF11.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 107). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF12.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 116). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF13.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 104). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF13.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 125). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF14.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 109). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF14.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 134). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF15.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 114). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF15.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 143). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF16.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 119). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF16.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 152). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF17.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 124). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF17.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 161). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF18.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 129). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF18.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 170). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF19.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 134). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF19.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 179). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF20.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 139). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF20.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 188). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF21.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 144). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF21.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 197). 
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF22.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 149). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF22.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 206). > [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[44,25] > (naming) MethodName: Method name 'ProcessingTime' must match pattern > '^[a-z][a-z0-9][a-zA-Z0-9_]*$'. > [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[60,25] > (naming) MethodName: Method name 'ProcessingTime' must match pattern > '^[a-z][a-z0-9][a-zA-Z0-9_]*$'. > [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[75,25] > (naming) MethodName: Method name 'ProcessingTime' must match pattern > '^[a-z][a-z0-9][a-zA-Z0-9_]*$'. > [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[88,25] > (naming) MethodName: Method name 'ProcessingTime' must match pattern > '^[a-z][a-z0-9][a-zA-Z0-9_]*$'. > [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[100,25] > (naming) MethodName: Method name 'Once' must match pattern > '^[a-z][a-z0-9][a-zA-Z0-9_]*$'. > [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[110,25] > (naming) MethodName: Method name 'AvailableNow' must match pattern
[jira] [Commented] (SPARK-36853) Code failing on checkstyle
[ https://issues.apache.org/jira/browse/SPARK-36853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429743#comment-17429743 ] Shockang commented on SPARK-36853: -- Due to the following issue: [SPARK-37030|https://issues.apache.org/jira/browse/SPARK-37030], the Maven build fails on Windows! I commented out the suspect bash-related code and re-executed the command: {code:java} mvn -DskipTests clean install {code} !image-2021-10-18-01-57-00-714.png! For your reference, I have attached the build log. [~hyukjin.kwon] Can this issue be split into multiple subtasks, since there are 131 errors? > Code failing on checkstyle > -- > > Key: SPARK-36853 > URL: https://issues.apache.org/jira/browse/SPARK-36853 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Abhinav Kumar >Priority: Trivial > Attachments: image-2021-10-18-01-57-00-714.png, > spark_mvn_clean_install_skip_tests_in_windows.log > > > There are more - just pasting sample > > [INFO] There are 32 errors reported by Checkstyle 8.43 with > dev/checkstyle.xml ruleset. > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF11.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 107). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF12.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 116). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF13.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 104). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF13.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 125). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF14.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 109). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF14.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 134). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF15.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 114). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF15.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 143). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF16.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 119). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF16.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 152). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF17.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 124). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF17.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 161). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF18.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 129). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF18.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 170). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF19.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 134). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF19.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 179). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF20.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 139). 
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF20.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 188). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF21.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 144). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF21.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 197). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF22.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 149). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF22.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 206). > [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[44,25] > (naming) MethodName: Method name 'ProcessingTime' must match pattern > '^[a-z][a-z0-9][a-zA-Z0-9_]*$'. > [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[60,25] > (naming) MethodName: Method name 'ProcessingTime' must match pattern > '^[a-z][a-z0-9][a-zA-Z0-9_]*$'. > [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[75,25] > (naming) MethodName: Method name 'ProcessingTime' must match pattern > '^[a-z][a-z0-9][a-zA-Z0-9_]*$'. > [ERROR]
[jira] [Updated] (SPARK-36853) Code failing on checkstyle
[ https://issues.apache.org/jira/browse/SPARK-36853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shockang updated SPARK-36853: - Attachment: spark_mvn_clean_install_skip_tests_in_windows.log > Code failing on checkstyle > -- > > Key: SPARK-36853 > URL: https://issues.apache.org/jira/browse/SPARK-36853 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Abhinav Kumar >Priority: Trivial > Attachments: spark_mvn_clean_install_skip_tests_in_windows.log > > > There are more - just pasting sample > > [INFO] There are 32 errors reported by Checkstyle 8.43 with > dev/checkstyle.xml ruleset. > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF11.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 107). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF12.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 116). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF13.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 104). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF13.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 125). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF14.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 109). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF14.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 134). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF15.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 114). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF15.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 143). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF16.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 119). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF16.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 152). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF17.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 124). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF17.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 161). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF18.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 129). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF18.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 170). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF19.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 134). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF19.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 179). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF20.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 139). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF20.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 188). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF21.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 144). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF21.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 197). 
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF22.java:[28] (sizes) > LineLength: Line is longer than 100 characters (found 149). > [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF22.java:[29] (sizes) > LineLength: Line is longer than 100 characters (found 206). > [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[44,25] > (naming) MethodName: Method name 'ProcessingTime' must match pattern > '^[a-z][a-z0-9][a-zA-Z0-9_]*$'. > [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[60,25] > (naming) MethodName: Method name 'ProcessingTime' must match pattern > '^[a-z][a-z0-9][a-zA-Z0-9_]*$'. > [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[75,25] > (naming) MethodName: Method name 'ProcessingTime' must match pattern > '^[a-z][a-z0-9][a-zA-Z0-9_]*$'. > [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[88,25] > (naming) MethodName: Method name 'ProcessingTime' must match pattern > '^[a-z][a-z0-9][a-zA-Z0-9_]*$'. > [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[100,25] > (naming) MethodName: Method name 'Once' must match pattern > '^[a-z][a-z0-9][a-zA-Z0-9_]*$'. > [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[110,25] > (naming) MethodName: Method name 'AvailableNow' must match pattern >
[jira] [Updated] (SPARK-37030) Maven build failed in windows!
[ https://issues.apache.org/jira/browse/SPARK-37030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shockang updated SPARK-37030: - Description: I pulled the latest Spark master code on my local Windows 10 computer and executed the following command: {code:java} mvn -DskipTests clean install{code} Build failed! !image-2021-10-17-22-18-16-616.png! {code:java} Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run (default) on project spark-core_2.12: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "bash" (in directory "C:\bigdata\spark\core"): CreateProcess error=2{code} It seems that the maven-antrun-plugin cannot run because there is no bash on Windows. The following code comes from the pom.xml of the spark-core module.
{code:java}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <phase>generate-resources</phase>
      <!-- Ant target configuration (the bash invocation) omitted -->
      <goals>
        <goal>run</goal>
      </goals>
    </execution>
  </executions>
</plugin>
{code}
was: I pulled the latest Spark master code on my local Windows 10 computer and executed the following command: {code:java} mvn -DskipTests clean install{code} Build failed! !image-2021-10-17-21-55-33-844.png! {code:java} Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run (default) on project spark-core_2.12: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "bash" (in directory "C:\bigdata\spark\core"): CreateProcess error=2{code} It seems that the maven-antrun-plugin cannot run because there is no bash on Windows. The following code comes from the pom.xml of the spark-core module.
{code:java}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <phase>generate-resources</phase>
      <!-- Ant target configuration (the bash invocation) omitted -->
      <goals>
        <goal>run</goal>
      </goals>
    </execution>
  </executions>
</plugin>
{code}
> Maven build failed in windows! > -- > > Key: SPARK-37030 > URL: https://issues.apache.org/jira/browse/SPARK-37030 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.0 > Environment: OS: Windows 10 Professional > OS Version: 21H1 > Maven Version: 3.6.3 > >Reporter: Shockang >Priority: Minor > Fix For: 3.2.0 > > Attachments: image-2021-10-17-22-18-16-616.png > > > I pulled the latest Spark master code on my local Windows 10 computer and > executed the following command: > {code:java} > mvn -DskipTests clean install{code} > Build failed! > !image-2021-10-17-22-18-16-616.png! > {code:java} > Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run > (default) on project spark-core_2.12: An Ant BuildException has occured: > Execute failed: java.io.IOException: Cannot run program "bash" (in directory > "C:\bigdata\spark\core"): CreateProcess error=2{code} > It seems that the maven-antrun-plugin cannot run because there is no bash on Windows. > The following code comes from the pom.xml of the spark-core module.
> {code:java}
> <plugin>
>   <groupId>org.apache.maven.plugins</groupId>
>   <artifactId>maven-antrun-plugin</artifactId>
>   <executions>
>     <execution>
>       <phase>generate-resources</phase>
>       <!-- Ant target configuration (the bash invocation) omitted -->
>       <goals>
>         <goal>run</goal>
>       </goals>
>     </execution>
>   </executions>
> </plugin>
> {code}
> -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37030) Maven build failed in windows!
[ https://issues.apache.org/jira/browse/SPARK-37030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shockang updated SPARK-37030: - Attachment: image-2021-10-17-22-18-16-616.png > Maven build failed in windows! > -- > > Key: SPARK-37030 > URL: https://issues.apache.org/jira/browse/SPARK-37030 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.0 > Environment: OS: Windows 10 Professional > OS Version: 21H1 > Maven Version: 3.6.3 > >Reporter: Shockang >Priority: Minor > Fix For: 3.2.0 > > Attachments: image-2021-10-17-22-18-16-616.png > > > I pulled the latest Spark master code on my local Windows 10 computer and > executed the following command: > {code:java} > mvn -DskipTests clean install{code} > Build failed! > !image-2021-10-17-21-55-33-844.png! > {code:java} > Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run > (default) on project spark-core_2.12: An Ant BuildException has occured: > Execute failed: java.io.IOException: Cannot run program "bash" (in directory > "C:\bigdata\spark\core"): CreateProcess error=2{code} > It seems that the maven-antrun-plugin cannot run because there is no bash on Windows. > The following code comes from the pom.xml of the spark-core module.
> {code:java}
> <plugin>
>   <groupId>org.apache.maven.plugins</groupId>
>   <artifactId>maven-antrun-plugin</artifactId>
>   <executions>
>     <execution>
>       <phase>generate-resources</phase>
>       <!-- Ant target configuration (the bash invocation) omitted -->
>       <goals>
>         <goal>run</goal>
>       </goals>
>     </execution>
>   </executions>
> </plugin>
> {code}
> -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37030) Maven build failed in windows!
Shockang created SPARK-37030: Summary: Maven build failed in windows! Key: SPARK-37030 URL: https://issues.apache.org/jira/browse/SPARK-37030 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.2.0 Environment: OS: Windows 10 Professional OS Version: 21H1 Maven Version: 3.6.3 Reporter: Shockang Fix For: 3.2.0 I pulled the latest Spark master code on my local Windows 10 computer and executed the following command: {code:java} mvn -DskipTests clean install{code} Build failed! !image-2021-10-17-21-55-33-844.png! {code:java} Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run (default) on project spark-core_2.12: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "bash" (in directory "C:\bigdata\spark\core"): CreateProcess error=2{code} It seems that the maven-antrun-plugin cannot run because there is no bash on Windows. The following code comes from the pom.xml of the spark-core module.
{code:java}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <phase>generate-resources</phase>
      <!-- Ant target configuration (the bash invocation) omitted -->
      <goals>
        <goal>run</goal>
      </goals>
    </execution>
  </executions>
</plugin>
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27312) PropertyGraph <-> GraphX conversions
[ https://issues.apache.org/jira/browse/SPARK-27312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu reassigned SPARK-27312: -- Assignee: (was: Weichen Xu) > PropertyGraph <-> GraphX conversions > > > Key: SPARK-27312 > URL: https://issues.apache.org/jira/browse/SPARK-27312 > Project: Spark > Issue Type: Story > Components: Graph, GraphX >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Priority: Major > > As a user, I can convert a GraphX graph into a PropertyGraph and a > PropertyGraph into a GraphX graph if they are compatible. > * Scala only > * Whether this is an internal API is pending design discussion. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37029) Modify the assignment logic of dirFetchRequests variables
[ https://issues.apache.org/jira/browse/SPARK-37029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37029: Assignee: Apache Spark > Modify the assignment logic of dirFetchRequests variables > - > > Key: SPARK-37029 > URL: https://issues.apache.org/jira/browse/SPARK-37029 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.1.2, 3.2.0 >Reporter: jinhai >Assignee: Apache Spark >Priority: Major > > In the ShuffleBlockFetcherIterator.fetchHostLocalBlocks method, we generate dirFetchRequests based on externalShuffleServiceEnabled. But in fact, the BlockManagerId in the MapStatus generated during the shuffle write phase was already built according to externalShuffleServiceEnabled in the BlockManager.initialize method. > So we don't need to check it again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37029) Modify the assignment logic of dirFetchRequests variables
[ https://issues.apache.org/jira/browse/SPARK-37029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37029: Assignee: (was: Apache Spark) > Modify the assignment logic of dirFetchRequests variables > - > > Key: SPARK-37029 > URL: https://issues.apache.org/jira/browse/SPARK-37029 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.1.2, 3.2.0 >Reporter: jinhai >Priority: Major > > In the ShuffleBlockFetcherIterator.fetchHostLocalBlocks method, we generate dirFetchRequests based on externalShuffleServiceEnabled. But in fact, the BlockManagerId in the MapStatus generated during the shuffle write phase was already built according to externalShuffleServiceEnabled in the BlockManager.initialize method. > So we don't need to check it again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37029) Modify the assignment logic of dirFetchRequests variables
[ https://issues.apache.org/jira/browse/SPARK-37029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429681#comment-17429681 ] Apache Spark commented on SPARK-37029: -- User 'manbuyun' has created a pull request for this issue: https://github.com/apache/spark/pull/34304 > Modify the assignment logic of dirFetchRequests variables > - > > Key: SPARK-37029 > URL: https://issues.apache.org/jira/browse/SPARK-37029 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.1.2, 3.2.0 >Reporter: jinhai >Priority: Major > > In the ShuffleBlockFetcherIterator.fetchHostLocalBlocks method, we generate dirFetchRequests based on externalShuffleServiceEnabled. But in fact, the BlockManagerId in the MapStatus generated during the shuffle write phase was already built according to externalShuffleServiceEnabled in the BlockManager.initialize method. > So we don't need to check it again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36904) The specified datastore driver ("org.postgresql.Driver") was not found in the CLASSPATH
[ https://issues.apache.org/jira/browse/SPARK-36904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacek Laskowski resolved SPARK-36904. - Resolution: Invalid I finally managed to find the root cause of the issue, which is {{conf/hive-site.xml}} in {{HIVE_HOME}} with the driver configured (!) Sorry for the false alarm. > The specified datastore driver ("org.postgresql.Driver") was not found in the CLASSPATH > --- > > Key: SPARK-36904 > URL: https://issues.apache.org/jira/browse/SPARK-36904 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 > Environment: Spark 3.2.0 (RC6)
> {code:java}
> $ ./bin/spark-shell --version
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.2.0
>       /_/
>
> Using Scala version 2.12.15, OpenJDK 64-Bit Server VM, 11.0.12
> Branch heads/v3.2.0-rc6
> Compiled by user jacek on 2021-09-30T10:44:35Z
> Revision dde73e2e1c7e55c8e740cb159872e081ddfa7ed6
> Url https://github.com/apache/spark.git
> Type --help for more information.
> {code}
> Built from [https://github.com/apache/spark/commits/v3.2.0-rc6] using the following command:
> {code:java}
> $ ./build/mvn \
>   -Pyarn,kubernetes,hadoop-cloud,hive,hive-thriftserver \
>   -DskipTests \
>   clean install
> {code}
> {code:java}
> $ java -version
> openjdk version "11.0.12" 2021-07-20
> OpenJDK Runtime Environment Temurin-11.0.12+7 (build 11.0.12+7)
> OpenJDK 64-Bit Server VM Temurin-11.0.12+7 (build 11.0.12+7, mixed mode)
> {code}
>Reporter: Jacek Laskowski >Priority: Critical > Attachments: exception.txt > > > It looks similar to [hivethriftserver built into spark3.0.0. is throwing error "org.postgresql.Driver" was not found in the CLASSPATH|https://stackoverflow.com/q/62534653/1305344], but reporting here for future reference. > After I built the 3.2.0 (RC6) I ran `spark-shell` to execute `sql("describe table covid_19")`. That gave me the exception (a full version is attached): > {code} > Caused by: java.lang.reflect.InvocationTargetException: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("org.postgresql.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver. 
> at jdk.internal.reflect.GeneratedConstructorAccessor64.newInstance(Unknown > Source) > at > java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490) > at > org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:606) > at > org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:330) > at > org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:203) > at > org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:162) > at > org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:285) > at jdk.internal.reflect.GeneratedConstructorAccessor63.newInstance(Unknown > Source) > at > java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490) > at > org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:606) > at > org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301) > at > org.datanucleus.NucleusContextHelper.createStoreManagerForProperties(NucleusContextHelper.java:133) > at > org.datanucleus.PersistenceNucleusContextImpl.initialise(PersistenceNucleusContextImpl.java:422) > at > org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:817) > ... 171 more > Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the > "BONECP" plugin to create a ConnectionPool gave an error : The specified > datastore driver ("org.postgresql.Driver") was not found in the CLASSPATH. > Please check your CLASSPATH specification, and the name of the driver. > at > org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:232) > at > org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:117) > at >
[jira] [Created] (SPARK-37029) Modify the assignment logic of dirFetchRequests variables
jinhai created SPARK-37029: -- Summary: Modify the assignment logic of dirFetchRequests variables Key: SPARK-37029 URL: https://issues.apache.org/jira/browse/SPARK-37029 Project: Spark Issue Type: Improvement Components: Shuffle Affects Versions: 3.2.0, 3.1.2 Reporter: jinhai In the ShuffleBlockFetcherIterator.fetchHostLocalBlocks method, we generate dirFetchRequests based on externalShuffleServiceEnabled. But in fact, the BlockManagerId in the MapStatus generated during the shuffle write phase was already built according to externalShuffleServiceEnabled in the BlockManager.initialize method. So we don't need to check it again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
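In other words, because the BlockManagerId stored in each MapStatus was already constructed with the external-shuffle-service port when that service is enabled, the host-local fetch path can group blocks by the stored BlockManagerId directly instead of re-branching on externalShuffleServiceEnabled. A simplified, self-contained sketch of that grouping; the types are stand-ins, not the real Spark classes:

{code:scala}
// Stand-in types for illustration only.
case class BlockManagerId(executorId: String, host: String, port: Int)
case class BlockId(name: String)

// No second externalShuffleServiceEnabled check: the port inside each
// BlockManagerId already reflects it (set in BlockManager.initialize).
def groupDirFetchRequests(
    hostLocalBlocks: Seq[(BlockManagerId, BlockId)]): Map[BlockManagerId, Seq[BlockId]] =
  hostLocalBlocks.groupBy(_._1).map { case (bmId, pairs) => bmId -> pairs.map(_._2) }
{code}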
[jira] [Issue Comment Deleted] (SPARK-37018) Spark SQL should support create function with Aggregator
[ https://issues.apache.org/jira/browse/SPARK-37018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37018: --- Comment: was deleted (was: I'm working.) > Spark SQL should support create function with Aggregator > > > Key: SPARK-37018 > URL: https://issues.apache.org/jira/browse/SPARK-37018 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > > Spark SQL does not support CREATE FUNCTION with an Aggregator, and UserDefinedAggregateFunction is deprecated. > If we remove UserDefinedAggregateFunction, Spark SQL should provide a new option. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37018) Spark SQL should support create function with Aggregator
[ https://issues.apache.org/jira/browse/SPARK-37018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37018: Assignee: (was: Apache Spark) > Spark SQL should support create function with Aggregator > > > Key: SPARK-37018 > URL: https://issues.apache.org/jira/browse/SPARK-37018 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > > Spark SQL does not support CREATE FUNCTION with an Aggregator, and UserDefinedAggregateFunction is deprecated. > If we remove UserDefinedAggregateFunction, Spark SQL should provide a new option. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37018) Spark SQL should support create function with Aggregator
[ https://issues.apache.org/jira/browse/SPARK-37018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429657#comment-17429657 ] Apache Spark commented on SPARK-37018: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/34303 > Spark SQL should support create function with Aggregator > > > Key: SPARK-37018 > URL: https://issues.apache.org/jira/browse/SPARK-37018 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > > Spark SQL does not support CREATE FUNCTION with an Aggregator, and UserDefinedAggregateFunction is deprecated. > If we remove UserDefinedAggregateFunction, Spark SQL should provide a new option. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37018) Spark SQL should support create function with Aggregator
[ https://issues.apache.org/jira/browse/SPARK-37018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429658#comment-17429658 ] Apache Spark commented on SPARK-37018: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/34303 > Spark SQL should support create function with Aggregator > > > Key: SPARK-37018 > URL: https://issues.apache.org/jira/browse/SPARK-37018 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > > Spark SQL does not support CREATE FUNCTION with an Aggregator, and UserDefinedAggregateFunction is deprecated. > If we remove UserDefinedAggregateFunction, Spark SQL should provide a new option. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37018) Spark SQL should support create function with Aggregator
[ https://issues.apache.org/jira/browse/SPARK-37018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37018: Assignee: Apache Spark > Spark SQL should support create function with Aggregator > > > Key: SPARK-37018 > URL: https://issues.apache.org/jira/browse/SPARK-37018 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > > Spark SQL does not support CREATE FUNCTION with an Aggregator, and UserDefinedAggregateFunction is deprecated. > If we remove UserDefinedAggregateFunction, Spark SQL should provide a new option. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
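For reference, a programmatic route already exists: wrap an Aggregator with functions.udaf and register it on the session. What the ticket asks for is a SQL CREATE FUNCTION counterpart, which is not shown here. A small runnable sketch of the existing route:

{code:scala}
import org.apache.spark.sql.{Encoder, Encoders, SparkSession, functions}
import org.apache.spark.sql.expressions.Aggregator

// A trivial Aggregator summing longs.
object LongSum extends Aggregator[Long, Long, Long] {
  def zero: Long = 0L
  def reduce(buf: Long, a: Long): Long = buf + a
  def merge(b1: Long, b2: Long): Long = b1 + b2
  def finish(reduction: Long): Long = reduction
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

val spark = SparkSession.builder().master("local[2]").getOrCreate()
// Register the wrapped Aggregator so SQL can call it as long_sum(...).
spark.udf.register("long_sum", functions.udaf(LongSum))
spark.sql("SELECT long_sum(id) FROM range(10)").show()
{code}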
[jira] [Assigned] (SPARK-37028) Add a 'kill' executor link in the Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37028: Assignee: Apache Spark > Add a 'kill' executor link in the Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Assignee: Apache Spark >Priority: Major > > An executor running on a bad node (e.g. the system is overloaded or the disks are busy) or with heavy GC overhead may hurt the efficiency of job execution. Although there are speculative mechanisms to mitigate this problem, the speculated task may itself run on a bad executor. > We should have a 'kill' link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37028) Add a 'kill' executor link in the Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37028: Assignee: (was: Apache Spark) > Add a 'kill' executor link in the Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > An executor running on a bad node (e.g. the system is overloaded or the disks are busy) or with heavy GC overhead may hurt the efficiency of job execution. Although there are speculative mechanisms to mitigate this problem, the speculated task may itself run on a bad executor. > We should have a 'kill' link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37028) Add a 'kill' executor link in the Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429652#comment-17429652 ] Apache Spark commented on SPARK-37028: -- User 'weixiuli' has created a pull request for this issue: https://github.com/apache/spark/pull/34302 > Add a 'kill' executor link in the Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > An executor running on a bad node (e.g. the system is overloaded or the disks are busy) or with heavy GC overhead may hurt the efficiency of job execution. Although there are speculative mechanisms to mitigate this problem, the speculated task may itself run on a bad executor. > We should have a 'kill' link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in the Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37028: - Description: The executor which is running in a bad node(eg. The system is overloaded or disks are busy) or has big GC overheads may affect the efficiency of job execution, although there are speculative mechanisms to resolve this problem, but sometimes the speculated task may also run in a bad executor. We should have a 'kill' link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. was: The executor which is running in a bad node(eg. The system is overloaded or disks are busy) or has big GC overheads may affect the efficiency of job execution, although there are speculative mechanisms to resolve this problem,but sometimes the speculated task may also run in a bad executor. We should have a 'kill' link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. > Add a 'kill' executor link in the Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > The executor which is running in a bad node(eg. The system is overloaded or > disks are busy) or has big GC overheads may affect the efficiency of job > execution, although there are speculative mechanisms to resolve this problem, > but sometimes the speculated task may also run in a bad executor. > We should have a 'kill' link for each executor, similar to what we have for > each stage, so it's easier for users to kill executors in the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in the Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37028: - Description: The executor which is running in a bad node(eg. The system is overloaded or disks are busy) or has big GC overheads may affect the efficiency of job execution, although there are speculative mechanisms to resolve this problem,but sometimes the speculated task may also run in a bad executor. We should have a 'kill' link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. was: The executor which is running in a bad node(eg. The system is overloaded or disks are busy) or has big GC overheads may affect the efficiency of job execution, although there are speculative mechanisms to resolve this problem,but sometimes the speculated task may also run in a bad executor. We should have a "kill" link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. > Add a 'kill' executor link in the Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > The executor which is running in a bad node(eg. The system is overloaded or > disks are busy) or has big GC overheads may affect the efficiency of job > execution, although there are speculative mechanisms to resolve this > problem,but sometimes the speculated task may also run in a bad executor. > We should have a 'kill' link for each executor, similar to what we have for > each stage, so it's easier for users to kill executors in the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in the Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37028: - Summary: Add a 'kill' executor link in the Web UI. (was: Add a 'kill' executor link in Web UI.) > Add a 'kill' executor link in the Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > The executor which is running in a bad node(eg. The system is overloaded or > disks are busy) or has big GC overheads may affect the efficiency of job > execution, although there are speculative mechanisms to resolve this > problem,but sometimes the speculated task may also run in a bad executor. > We should have a "kill" link for each executor, similar to what we have for > each stage, so it's easier for users to kill executors in the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37028: - Description: The executor which is running in a bad node(eg. The system is overloaded or disks are busy) or has big GC overheads may affect the efficiency of job execution, although there are speculative mechanisms to resolve this problem,but sometimes the speculated task may also run in a bad executor. We should have a "kill" link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. was: The executor which is running in a bad node(eg. The system is overloaded or disks are busy) or it has big GC overheads may affect the efficiency of job execution, although there are speculative mechanisms to resolve this problem,but sometimes the speculated task may also run in a bad executor. We should have a "kill" link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. > Add a 'kill' executor link in Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > The executor which is running in a bad node(eg. The system is overloaded or > disks are busy) or has big GC overheads may affect the efficiency of job > execution, although there are speculative mechanisms to resolve this > problem,but sometimes the speculated task may also run in a bad executor. > We should have a "kill" link for each executor, similar to what we have for > each stage, so it's easier for users to kill executors in the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
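For context, the programmatic primitive such a UI link would presumably sit on top of already exists: SparkContext.killExecutors. A minimal usage sketch, where the executor id is made up:

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("demo"))
// Request that a specific executor be killed; with dynamic allocation on,
// the cluster manager may later replace it. "3" is a hypothetical executor id.
val requested: Boolean = sc.killExecutors(Seq("3"))
{code}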