[jira] [Resolved] (SPARK-37032) Remove unusable link in spark-3.2.0's doc

2021-10-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-37032.
-
Fix Version/s: 3.2.1
   3.3.0
   Resolution: Fixed

Issue resolved by pull request 34307
[https://github.com/apache/spark/pull/34307]

> Remove unusable link in spark-3.2.0's doc
> --
>
> Key: SPARK-37032
> URL: https://issues.apache.org/jira/browse/SPARK-37032
> Project: Spark
>  Issue Type: Improvement
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
>
> Four links are empty
> !image-2021-10-18-10-48-21-437.png!






[jira] [Assigned] (SPARK-37032) Remove unusable link in spark-3.2.0's doc

2021-10-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-37032:
---

Assignee: angerszhu

> Remove unusable link in spark-3.2.0's doc
> --
>
> Key: SPARK-37032
> URL: https://issues.apache.org/jira/browse/SPARK-37032
> Project: Spark
>  Issue Type: Improvement
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>
> Four links are empty
> !image-2021-10-18-10-48-21-437.png!






[jira] [Updated] (SPARK-37037) Improve byte array sort by unifying the compareTo function of UTF8String and ByteArray

2021-10-17 Thread XiDuo You (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You updated SPARK-37037:
--
Description: 
BinaryType uses `TypeUtils.compareBinary` to compare two byte arrays; however,
it is slow since it compares the arrays byte by byte using unsigned int
comparisons.

We can compare them using `Platform.getLong` with unsigned long comparisons
while at least 8 bytes remain. Some history on this:
https://github.com/apache/spark/pull/6755/files#r32197461

  was:
BinaryType uses `TypeUtils.compareBinary` to compare two byte arrays; however,
it is slow since it compares the arrays byte by byte.

We can compare them using `Platform.getLong` while at least 8 bytes remain.
Some history on this:
https://github.com/apache/spark/pull/6755/files#r32197461
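
For illustration, a minimal self-contained sketch of the word-at-a-time idea
(a sketch only, not Spark's actual implementation, which reads longs via
`Platform.getLong` on unsafe offsets rather than through a ByteBuffer):

{code}
import java.nio.{ByteBuffer, ByteOrder}

// Sketch only: lexicographic unsigned comparison, 8 bytes at a time.
// Big-endian longs preserve lexicographic byte order under unsigned compare.
def compareBinary(x: Array[Byte], y: Array[Byte]): Int = {
  val len = math.min(x.length, y.length)
  val xb = ByteBuffer.wrap(x).order(ByteOrder.BIG_ENDIAN)
  val yb = ByteBuffer.wrap(y).order(ByteOrder.BIG_ENDIAN)
  var i = 0
  while (i + 8 <= len) {                    // one compare per 8 bytes
    val c = java.lang.Long.compareUnsigned(xb.getLong(i), yb.getLong(i))
    if (c != 0) return c
    i += 8
  }
  while (i < len) {                         // tail: unsigned byte compare
    val c = (x(i) & 0xff) - (y(i) & 0xff)
    if (c != 0) return c
    i += 1
  }
  x.length - y.length                       // equal prefix: shorter sorts first
}
{code}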


> Improve byte array sort by unifying the compareTo function of UTF8String
> and ByteArray
> 
>
> Key: SPARK-37037
> URL: https://issues.apache.org/jira/browse/SPARK-37037
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: XiDuo You
>Priority: Major
>
> BinaryType uses `TypeUtils.compareBinary` to compare two byte arrays;
> however, it is slow since it compares the arrays byte by byte using unsigned
> int comparisons.
> We can compare them using `Platform.getLong` with unsigned long comparisons
> while at least 8 bytes remain. Some history on this:
> https://github.com/apache/spark/pull/6755/files#r32197461






[jira] [Assigned] (SPARK-35925) Support DayTimeIntervalType in width-bucket function

2021-10-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35925:


Assignee: (was: Apache Spark)

> Support DayTimeIntervalType in width-bucket function
> 
>
> Key: SPARK-35925
> URL: https://issues.apache.org/jira/browse/SPARK-35925
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: PengLei
>Priority: Major
>
> Currently, width_bucket supports the argument types [DoubleType, DoubleType,
> DoubleType, LongType]; we hope to also support [DayTimeIntervalType,
> DayTimeIntervalType, DayTimeIntervalType, LongType].
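
If this lands, usage might look like the following (a hypothetical sketch
until the sub-task is implemented; it assumes a running SparkSession named
`spark`, since width_bucket today accepts only numeric arguments):

{code}
// Hypothetical: bucket a 3-day interval into 10 equal buckets over
// [0 days, 10 days). Follows standard width_bucket semantics.
val df = spark.sql(
  """SELECT width_bucket(
    |  INTERVAL '3' DAY,   -- value to bucket
    |  INTERVAL '0' DAY,   -- lower bound
    |  INTERVAL '10' DAY,  -- upper bound
    |  10                  -- number of buckets
    |) AS bucket""".stripMargin)
df.show()  // expected: 4, i.e. the value falls in the 4th of 10 buckets
{code}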






[jira] [Assigned] (SPARK-35925) Support DayTimeIntervalType in width-bucket function

2021-10-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35925:


Assignee: Apache Spark

> Support DayTimeIntervalType in width-bucket function
> 
>
> Key: SPARK-35925
> URL: https://issues.apache.org/jira/browse/SPARK-35925
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: PengLei
>Assignee: Apache Spark
>Priority: Major
>
> Currently, width_bucket supports the argument types [DoubleType, DoubleType,
> DoubleType, LongType]; we hope to also support [DayTimeIntervalType,
> DayTimeIntervalType, DayTimeIntervalType, LongType].






[jira] [Commented] (SPARK-35925) Support DayTimeIntervalType in width-bucket function

2021-10-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429818#comment-17429818
 ] 

Apache Spark commented on SPARK-35925:
--

User 'Peng-Lei' has created a pull request for this issue:
https://github.com/apache/spark/pull/34309

> Support DayTimeIntervalType in width-bucket function
> 
>
> Key: SPARK-35925
> URL: https://issues.apache.org/jira/browse/SPARK-35925
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: PengLei
>Priority: Major
>
> Currently, width_bucket supports the argument types [DoubleType, DoubleType,
> DoubleType, LongType]; we hope to also support [DayTimeIntervalType,
> DayTimeIntervalType, DayTimeIntervalType, LongType].






[jira] [Created] (SPARK-37037) Improve byte array sort by unifying the compareTo function of UTF8String and ByteArray

2021-10-17 Thread XiDuo You (Jira)
XiDuo You created SPARK-37037:
-

 Summary: Improve byte array sort by unifying the compareTo function of
UTF8String and ByteArray
 Key: SPARK-37037
 URL: https://issues.apache.org/jira/browse/SPARK-37037
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: XiDuo You


BinaryType uses `TypeUtils.compareBinary` to compare two byte arrays; however,
it is slow since it compares the arrays byte by byte.

We can compare them using `Platform.getLong` while at least 8 bytes remain.
Some history on this:
https://github.com/apache/spark/pull/6755/files#r32197461






[jira] [Commented] (SPARK-37035) Improve error message when using vectorized reader

2021-10-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429812#comment-17429812
 ] 

Apache Spark commented on SPARK-37035:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/34308

> Improve error message when using vectorized reader
> ---
>
> Key: SPARK-37035
> URL: https://issues.apache.org/jira/browse/SPARK-37035
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> The vectorized reader's error won't show which file failed to read.
>  
> Non-vectorized parquet reader
> {code}
> cutionException: Encounter error while reading parquet files. One possible 
> cause: Parquet column cannot be converted in the corresponding files. Details:
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:193)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
> at 1 in block 0 in file hdfs://path/to/failed/file
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
>   at 
> org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
>   at 
> org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181)
>   ... 15 more
> {code}
> Vectorized parquet reader
> {code}
> 21/10/15 18:01:54 WARN TaskSetManager: Lost task 1881.0 in stage 16.0 (TID 
> 10380, ip-10-130-169-140.idata-server.shopee.io, executor 168): TaskKilled 
> (Stage cancelled)
> : An error occurred while calling o362.showString.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 963 
> in stage 17.0 failed 4 times, most recent failure: Lost task 963.3 in stage 
> 17.0 (TID 10351, ip-10-130-75-201.idata-server.shopee.io, executor 99): 
> java.lang.UnsupportedOperationException: 
> org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
>   at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:36)
>   at 
> org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364)
>   at 
> org.apache.spark.sql.execution.vectorized.MutableColumnarRow.getLong(MutableColumnarRow.java:120)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:351)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:349)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at 

[jira] [Assigned] (SPARK-37035) Improve error message when using vectorized reader

2021-10-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37035:


Assignee: (was: Apache Spark)

> Improve error message when using vectorized reader
> ---
>
> Key: SPARK-37035
> URL: https://issues.apache.org/jira/browse/SPARK-37035
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> The vectorized reader's error won't show which file failed to read.
>  
> Non-vectorized parquet reader
> {code}
> cutionException: Encounter error while reading parquet files. One possible 
> cause: Parquet column cannot be converted in the corresponding files. Details:
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:193)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
> at 1 in block 0 in file hdfs://path/to/failed/file
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
>   at 
> org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
>   at 
> org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181)
>   ... 15 more
> {code}
> Vectorized parquet reader
> {code}
> 21/10/15 18:01:54 WARN TaskSetManager: Lost task 1881.0 in stage 16.0 (TID 
> 10380, ip-10-130-169-140.idata-server.shopee.io, executor 168): TaskKilled 
> (Stage cancelled)
> : An error occurred while calling o362.showString.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 963 
> in stage 17.0 failed 4 times, most recent failure: Lost task 963.3 in stage 
> 17.0 (TID 10351, ip-10-130-75-201.idata-server.shopee.io, executor 99): 
> java.lang.UnsupportedOperationException: 
> org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
>   at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:36)
>   at 
> org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364)
>   at 
> org.apache.spark.sql.execution.vectorized.MutableColumnarRow.getLong(MutableColumnarRow.java:120)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:351)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:349)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at 
> 

[jira] [Assigned] (SPARK-37035) Improve error message when using vectorized reader

2021-10-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37035:


Assignee: Apache Spark

> Improve error message when using vectorized reader
> ---
>
> Key: SPARK-37035
> URL: https://issues.apache.org/jira/browse/SPARK-37035
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>
> The vectorized reader's error won't show which file failed to read.
>  
> Non-vectorized parquet reader
> {code}
> cutionException: Encounter error while reading parquet files. One possible 
> cause: Parquet column cannot be converted in the corresponding files. Details:
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:193)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
> at 1 in block 0 in file hdfs://path/to/failed/file
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
>   at 
> org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
>   at 
> org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181)
>   ... 15 more
> {code}
> Vectorized parquet reader
> {code}
> 21/10/15 18:01:54 WARN TaskSetManager: Lost task 1881.0 in stage 16.0 (TID 
> 10380, ip-10-130-169-140.idata-server.shopee.io, executor 168): TaskKilled 
> (Stage cancelled)
> : An error occurred while calling o362.showString.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 963 
> in stage 17.0 failed 4 times, most recent failure: Lost task 963.3 in stage 
> 17.0 (TID 10351, ip-10-130-75-201.idata-server.shopee.io, executor 99): 
> java.lang.UnsupportedOperationException: 
> org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
>   at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:36)
>   at 
> org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364)
>   at 
> org.apache.spark.sql.execution.vectorized.MutableColumnarRow.getLong(MutableColumnarRow.java:120)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:351)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:349)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at 
> 

[jira] [Commented] (SPARK-37036) Add util function to raise advice warning for pandas API on Spark.

2021-10-17 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429811#comment-17429811
 ] 

Haejoon Lee commented on SPARK-37036:
-

I'm working on this

> Add util function to raise advice warning for pandas API on Spark.
> --
>
> Key: SPARK-37036
> URL: https://issues.apache.org/jira/browse/SPARK-37036
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Pandas API on Spark has some features that can potentially cause performance
> degradation or unexpected behavior, e.g. `sort_index`, `index_col`,
> `to_pandas`, etc.
>  
> We should raise proper advice warnings for those functions so that users can
> make their pandas-on-Spark code base more robust.






[jira] [Created] (SPARK-37036) Add util function to raise advice warning for pandas API on Spark.

2021-10-17 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-37036:
---

 Summary: Add util function to raise advice warning for pandas API 
on Spark.
 Key: SPARK-37036
 URL: https://issues.apache.org/jira/browse/SPARK-37036
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Haejoon Lee


Pandas API on Spark has some features that can potentially cause performance
degradation or unexpected behavior, e.g. `sort_index`, `index_col`,
`to_pandas`, etc.

We should raise proper advice warnings for those functions so that users can
make their pandas-on-Spark code base more robust.






[jira] [Commented] (SPARK-37035) Improve error message when using vectorized reader

2021-10-17 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429809#comment-17429809
 ] 

angerszhu commented on SPARK-37035:
---

Will raise a PR soon.


> Improve error message when using vectorized reader
> ---
>
> Key: SPARK-37035
> URL: https://issues.apache.org/jira/browse/SPARK-37035
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> The vectorized reader's error won't show which file failed to read.
>  
> Non-vectorized parquet reader
> {code}
> cutionException: Encounter error while reading parquet files. One possible 
> cause: Parquet column cannot be converted in the corresponding files. Details:
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:193)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
> at 1 in block 0 in file hdfs://path/to/failed/file
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
>   at 
> org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
>   at 
> org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181)
>   ... 15 more
> {code}
> Vectorized parquet reader
> {code}
> 21/10/15 18:01:54 WARN TaskSetManager: Lost task 1881.0 in stage 16.0 (TID 
> 10380, ip-10-130-169-140.idata-server.shopee.io, executor 168): TaskKilled 
> (Stage cancelled)
> : An error occurred while calling o362.showString.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 963 
> in stage 17.0 failed 4 times, most recent failure: Lost task 963.3 in stage 
> 17.0 (TID 10351, ip-10-130-75-201.idata-server.shopee.io, executor 99): 
> java.lang.UnsupportedOperationException: 
> org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
>   at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:36)
>   at 
> org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364)
>   at 
> org.apache.spark.sql.execution.vectorized.MutableColumnarRow.getLong(MutableColumnarRow.java:120)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:351)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:349)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at 
> 

[jira] [Updated] (SPARK-37035) Improve error message when using vectorized reader

2021-10-17 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-37035:
--
Description: 
The vectorized reader's error won't show which file failed to read.

Non-vectorized parquet reader
{code}
cutionException: Encounter error while reading parquet files. One possible 
cause: Parquet column cannot be converted in the corresponding files. Details:
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:193)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
at 1 in block 0 in file hdfs://path/to/failed/file
at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
at 
org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
at 
org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181)
... 15 more
{code}
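
Note the contrast: the non-vectorized trace above names the failing file,
while the vectorized trace below does not. A minimal sketch of the kind of
wrapping the fix calls for (`withFileContext`, `readBatch`, and `filePath`
are illustrative names here, not Spark APIs):

{code}
// Sketch only: attach the current file's path when a per-file read fails,
// rethrowing with the failing file named and the original cause preserved.
def withFileContext[T](filePath: String)(readBatch: => T): T = {
  try {
    readBatch
  } catch {
    case e: Exception =>
      throw new RuntimeException(
        s"Encountered error while reading file $filePath", e)
  }
}
{code}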


Vectorized parquet reader
{code}
21/10/15 18:01:54 WARN TaskSetManager: Lost task 1881.0 in stage 16.0 (TID 
10380, ip-10-130-169-140.idata-server.shopee.io, executor 168): TaskKilled 
(Stage cancelled)
: An error occurred while calling o362.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 963 
in stage 17.0 failed 4 times, most recent failure: Lost task 963.3 in stage 
17.0 (TID 10351, ip-10-130-75-201.idata-server.shopee.io, executor 99): 
java.lang.UnsupportedOperationException: 
org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:36)
at 
org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364)
at 
org.apache.spark.sql.execution.vectorized.MutableColumnarRow.getLong(MutableColumnarRow.java:120)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
 Source)
at 
org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:351)
at 
org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:349)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at 

[jira] [Updated] (SPARK-37035) Improve error message when using vectorized reader

2021-10-17 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-37035:
--
Description: 
The vectorized reader's error won't show which file failed to read.

Non-vectorized parquet reader
{code}
cutionException: Encounter error while reading parquet files. One possible 
cause: Parquet column cannot be converted in the corresponding files. Details:
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:193)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
at 1 in block 0 in file 
hdfs://R2/projects/data_notificationmart/dwd_traceid_sent_civ_first_di/tz_type=local/grass_region=TW/grass_date=2021-10-13/noti_type=AR/part-00013-22bdd509-4469-47f7-a37e-50fddd4266a7-c000.zstd.parquet
at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
at 
org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
at 
org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181)
... 15 more
{code}


Vectorized parquet reader
{code}
21/10/15 18:01:54 WARN TaskSetManager: Lost task 1881.0 in stage 16.0 (TID 
10380, ip-10-130-169-140.idata-server.shopee.io, executor 168): TaskKilled 
(Stage cancelled)
: An error occurred while calling o362.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 963 
in stage 17.0 failed 4 times, most recent failure: Lost task 963.3 in stage 
17.0 (TID 10351, ip-10-130-75-201.idata-server.shopee.io, executor 99): 
java.lang.UnsupportedOperationException: 
org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:36)
at 
org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364)
at 
org.apache.spark.sql.execution.vectorized.MutableColumnarRow.getLong(MutableColumnarRow.java:120)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
 Source)
at 
org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:351)
at 
org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:349)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at 

[jira] [Updated] (SPARK-37035) Improve error message when using vectorized reader

2021-10-17 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-37035:
--
Description: 
The vectorized reader's error won't show which file failed to read.

 

 

  was:
The vectorized reader's error won't show which file failed to read.

 

Non-vectorized parquet reader
{code}
cutionException: Encounter error while reading parquet files. One possible cause: Parquet column cannot be converted in the corresponding files. Details:
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:193)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
  at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
  at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
  at org.apache.spark.scheduler.Task.run(Task.scala:123)
  at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://R2/projects/data_notificationmart/dwd_traceid_sent_civ_first_di/tz_type=local/grass_region=TW/grass_date=2021-10-13/noti_type=AR/part-00013-22bdd509-4469-47f7-a37e-50fddd4266a7-c000.zstd.parquet
  at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
  at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
  at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181)
  ... 15 more
{code}

Vectorized parquet reader
{code}
21/10/15 18:01:54 WARN TaskSetManager: Lost task 1881.0 in stage 16.0 (TID 10380, ip-10-130-169-140.idata-server.shopee.io, executor 168): TaskKilled (Stage cancelled)
: An error occurred while calling o362.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 963 in stage 17.0 failed 4 times, most recent failure: Lost task 963.3 in stage 17.0 (TID 10351, ip-10-130-75-201.idata-server.shopee.io, executor 99): java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
  at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:36)
  at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364)
  at org.apache.spark.sql.execution.vectorized.MutableColumnarRow.getLong(MutableColumnarRow.java:120)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown Source)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
  at org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:351)
  at org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:349)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
  at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
  at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
  at org.apache.spark.scheduler.Task.run(Task.scala:123)
  at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
  at

[jira] [Created] (SPARK-37035) Improve error message when using vectorized reader

2021-10-17 Thread angerszhu (Jira)
angerszhu created SPARK-37035:
-

 Summary: Improve error message when using vectorized reader
 Key: SPARK-37035
 URL: https://issues.apache.org/jira/browse/SPARK-37035
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0, 3.1.2
Reporter: angerszhu


The vectorized reader's error won't show which file failed to read.

 

Non-vectorized parquet reader
{code}
cutionException: Encounter error while reading parquet files. One possible cause: Parquet column cannot be converted in the corresponding files. Details:
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:193)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
  at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
  at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
  at org.apache.spark.scheduler.Task.run(Task.scala:123)
  at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://R2/projects/data_notificationmart/dwd_traceid_sent_civ_first_di/tz_type=local/grass_region=TW/grass_date=2021-10-13/noti_type=AR/part-00013-22bdd509-4469-47f7-a37e-50fddd4266a7-c000.zstd.parquet
  at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
  at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
  at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181)
  ... 15 more
{code}

Vectorized parquet reader
{code}
21/10/15 18:01:54 WARN TaskSetManager: Lost task 1881.0 in stage 16.0 (TID 10380, ip-10-130-169-140.idata-server.shopee.io, executor 168): TaskKilled (Stage cancelled)
: An error occurred while calling o362.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 963 in stage 17.0 failed 4 times, most recent failure: Lost task 963.3 in stage 17.0 (TID 10351, ip-10-130-75-201.idata-server.shopee.io, executor 99): java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
  at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:36)
  at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364)
  at org.apache.spark.sql.execution.vectorized.MutableColumnarRow.getLong(MutableColumnarRow.java:120)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown Source)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
  at org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:351)
  at org.apache.spark.sql.execution.FileSourceScanExec$$anonfun$doExecute$2$$anonfun$apply$2.apply(DataSourceScanExec.scala:349)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
  at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
  at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
  at org.apache.spark.scheduler.Task.run(Task.scala:123)
  at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
  at

[jira] [Updated] (SPARK-37034) What's the progress of vectorized execution for Spark?

2021-10-17 Thread xiaoli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaoli updated SPARK-37034:
---
Description: 
Spark has support vectorized read for ORC and parquet. What's the progress of 
other vectorized execution, e.g. vectorized write,  join, aggr, simple operator 
(string function, math function)? 

Hive support vectorized execution in early version 
(https://cwiki.apache.org/confluence/display/hive/vectorized+query+execution) 
As we know, Spark is replacement of Hive. I guess the reason why Spark does not 
support vectorized execution maybe the difficulty of design or implementation 
in Spark is larger. What's the main issue for Spark to support vectorized 
execution?

  was:
Spark already supports vectorized reads for ORC and Parquet. What is the
progress of other vectorized execution, e.g. vectorized writes, joins,
aggregations, and simple operators (string functions, math functions)?

Hive has supported vectorized execution since an early version
(https://cwiki.apache.org/confluence/display/hive/vectorized+query+execution).
As we know, Spark is a replacement for Hive. I guess the reason Spark does not
support vectorized execution may be that the design or implementation
difficulty is larger in Spark. What is the main issue for Spark to support
vectorized execution?


> What's the progress of vectorized execution for Spark?
> --
>
> Key: SPARK-37034
> URL: https://issues.apache.org/jira/browse/SPARK-37034
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: xiaoli
>Priority: Major
>
> Spark already supports vectorized reads for ORC and Parquet. What is the
> progress of other vectorized execution, e.g. vectorized writes, joins,
> aggregations, and simple operators (string functions, math functions)?
> Hive has supported vectorized execution since an early version
> (https://cwiki.apache.org/confluence/display/hive/vectorized+query+execution).
> As we know, Spark is a replacement for Hive. I guess the reason Spark does
> not support vectorized execution may be that the design or implementation
> difficulty is larger in Spark. What is the main issue for Spark to support
> vectorized execution?






[jira] [Created] (SPARK-37034) What's the progress of vectorized execution for Spark?

2021-10-17 Thread xiaoli (Jira)
xiaoli created SPARK-37034:
--

 Summary: What's the progress of vectorized execution for Spark?
 Key: SPARK-37034
 URL: https://issues.apache.org/jira/browse/SPARK-37034
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 3.2.0
Reporter: xiaoli


Spark already supports vectorized reads for ORC and Parquet. What is the
progress of other vectorized execution, e.g. vectorized writes, joins,
aggregations, and simple operators (string functions, math functions)?

Hive has supported vectorized execution since an early version
(https://cwiki.apache.org/confluence/display/hive/vectorized+query+execution).
As we know, Spark is a replacement for Hive. I guess the reason Spark does not
support vectorized execution may be that the design or implementation
difficulty is larger in Spark. What is the main issue for Spark to support
vectorized execution?






[jira] [Updated] (SPARK-37002) Introduce the 'compute.eager_check' option

2021-10-17 Thread dch nguyen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dch nguyen updated SPARK-37002:
---
Summary: Introduce the 'compute.eager_check' option  (was: Introduce the 
'compute.check_identical_indices' option)

> Introduce the 'compute.eager_check' option
> --
>
> Key: SPARK-37002
> URL: https://issues.apache.org/jira/browse/SPARK-37002
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-36968
> [https://github.com/apache/spark/pull/34235]
>  






[jira] [Created] (SPARK-37033) Inline type hints for python/pyspark/resource/requests.py

2021-10-17 Thread dch nguyen (Jira)
dch nguyen created SPARK-37033:
--

 Summary: Inline type hints for python/pyspark/resource/requests.py
 Key: SPARK-37033
 URL: https://issues.apache.org/jira/browse/SPARK-37033
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: dch nguyen









[jira] [Commented] (SPARK-37033) Inline type hints for python/pyspark/resource/requests.py

2021-10-17 Thread dch nguyen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429802#comment-17429802
 ] 

dch nguyen commented on SPARK-37033:


working on this!

> Inline type hints for python/pyspark/resource/requests.py
> -
>
> Key: SPARK-37033
> URL: https://issues.apache.org/jira/browse/SPARK-37033
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>







[jira] [Assigned] (SPARK-37032) Remove unusable link in spark-3.2.0's doc

2021-10-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37032:


Assignee: (was: Apache Spark)

> Remove unusable link in spark-3.2.0's doc
> --
>
> Key: SPARK-37032
> URL: https://issues.apache.org/jira/browse/SPARK-37032
> Project: Spark
>  Issue Type: Improvement
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Four links are empty
> !image-2021-10-18-10-48-21-437.png!






[jira] [Assigned] (SPARK-37032) Remove unusable link in spark-3.2.0's doc

2021-10-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37032:


Assignee: Apache Spark

> Remove unusable link in spark-3.2.0's doc
> --
>
> Key: SPARK-37032
> URL: https://issues.apache.org/jira/browse/SPARK-37032
> Project: Spark
>  Issue Type: Improvement
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>
> Four links are empty
> !image-2021-10-18-10-48-21-437.png!






[jira] [Commented] (SPARK-37032) Remove unusable link in spark-3.2.0's doc

2021-10-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429800#comment-17429800
 ] 

Apache Spark commented on SPARK-37032:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/34307

> Remove unusable link in spark-3.2.0's doc
> --
>
> Key: SPARK-37032
> URL: https://issues.apache.org/jira/browse/SPARK-37032
> Project: Spark
>  Issue Type: Improvement
>  Components: docs
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Four links are empty
> !image-2021-10-18-10-48-21-437.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37032) Remove unuseable link in spark-3.2.0's doc

2021-10-17 Thread angerszhu (Jira)
angerszhu created SPARK-37032:
-

 Summary: Remove unuseable link in spark-3.2.0's doc
 Key: SPARK-37032
 URL: https://issues.apache.org/jira/browse/SPARK-37032
 Project: Spark
  Issue Type: Improvement
  Components: docs
Affects Versions: 3.2.0
Reporter: angerszhu


Four links are empty

!image-2021-10-18-10-48-21-437.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36151) Enable MiMa for Scala 2.13 artifacts after Spark 3.2.0 release

2021-10-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-36151:


Assignee: Josh Rosen

> Enable MiMa for Scala 2.13 artifacts after Spark 3.2.0 release
> --
>
> Key: SPARK-36151
> URL: https://issues.apache.org/jira/browse/SPARK-36151
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Josh Rosen
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36151) Enable MiMa for Scala 2.13 artifacts after Spark 3.2.0 release

2021-10-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36151.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34306
[https://github.com/apache/spark/pull/34306]

> Enable MiMa for Scala 2.13 artifacts after Spark 3.2.0 release
> --
>
> Key: SPARK-36151
> URL: https://issues.apache.org/jira/browse/SPARK-36151
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36964) Reuse CachedDNSToSwitchMapping for yarn container requests

2021-10-17 Thread gaoyajun02 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaoyajun02 updated SPARK-36964:
---
Affects Version/s: 3.3.0
   3.2.0

> Reuse CachedDNSToSwitchMapping for yarn  container requests
> ---
>
> Key: SPARK-36964
> URL: https://issues.apache.org/jira/browse/SPARK-36964
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: gaoyajun02
>Priority: Major
>
> Similar to SPARK-13704, in some cases YarnAllocator's adding of container 
> requests with locality preferences can be expensive, since it may call the 
> topology script for rack awareness.
> When submitting a very large job on a very large YARN cluster, the topology 
> script may take significant time to run. This blocks the handling of 
> YarnSchedulerBackend's RequestExecutors RPC calls. These requests come from 
> the Spark dynamic executor allocation thread, so the 
> ExecutorAllocationListener may be blocked as well, resulting in a backlog in 
> the executorManagement queue. A sketch of the proposed reuse follows.
>  
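> A minimal sketch of the idea, assuming Hadoop's ScriptBasedMapping (which
> already extends CachedDNSToSwitchMapping); the holder object is illustrative,
> not the actual patch:
> {code:scala}
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.net.{DNSToSwitchMapping, ScriptBasedMapping}
>
> // Build the scripted resolver once and reuse it across container requests,
> // so rack lookups for already-seen hosts are served from its cache instead
> // of re-running the topology script every time.
> object RackResolverCache {
>   private val mapping: DNSToSwitchMapping = {
>     val m = new ScriptBasedMapping()
>     m.setConf(new Configuration())
>     m
>   }
>   def resolve(hosts: java.util.List[String]): java.util.List[String] =
>     mapping.resolve(hosts)
> }
> {code}
>  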
> Some logs:
> {code:java}
> 21/09/29 12:04:35 INFO spark-dynamic-executor-allocation 
> ExecutorAllocationManager: Error reaching cluster manager.21/09/29 12:04:35 
> INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error 
> reaching cluster manager.org.apache.spark.rpc.RpcTimeoutException: Futures 
> timed out after [120 seconds]. This timeout is controlled by 
> spark.rpc.askTimeout at 
> org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
>  at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
>  at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
>  at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) 
> at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:839)
>  at 
> org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:411)
>  at 
> org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:361)
>  at 
> org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:316)
>  at 
> org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:227)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)Caused by: 
> java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] 
> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259) at 
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263) at 
> org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:294) at 
> org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) ... 12 
> more21/09/29 12:04:35 WARN spark-dynamic-executor-allocation 
> ExecutorAllocationManager: Unable to reach the cluster manager to request 
> 1922 total executors!
> 21/09/29 12:04:35 INFO spark-dynamic-executor-allocation 
> ExecutorAllocationManager: Error reaching cluster manager.21/09/29 12:04:35 
> INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error 
> reaching cluster manager.org.apache.spark.rpc.RpcTimeoutException: Futures 
> timed out after [120 seconds]. This timeout is controlled by 
> spark.rpc.askTimeout at 
> org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
>  at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
>  at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
>  at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) 
> at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:839)
>  at 
> 

[jira] [Commented] (SPARK-36928) Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray

2021-10-17 Thread PengLei (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429791#comment-17429791
 ] 

PengLei commented on SPARK-36928:
-

working on this later

> Handle ANSI intervals in ColumnarRow, ColumnarBatchRow and ColumnarArray
> 
>
> Key: SPARK-36928
> URL: https://issues.apache.org/jira/browse/SPARK-36928
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Handle ANSI interval types - YearMonthIntervalType and DayTimeIntervalType in 
> Columnar* classes, and write tests.
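> A hedged sketch of what handling these types likely amounts to, given that
> Spark stores YearMonthIntervalType as an int (months) and
> DayTimeIntervalType as a long (microseconds); the trait below is a stand-in,
> not the actual Columnar* code:
> {code:scala}
> import org.apache.spark.sql.types.{DataType, DayTimeIntervalType, YearMonthIntervalType}
>
> trait IntervalAwareAccessor {
>   def getInt(ordinal: Int): Int    // provided by the real columnar row/array
>   def getLong(ordinal: Int): Long
>
>   def getInterval(ordinal: Int, dataType: DataType): Any = dataType match {
>     case _: YearMonthIntervalType => getInt(ordinal)   // months as int
>     case _: DayTimeIntervalType   => getLong(ordinal)  // microseconds as long
>     case other => throw new UnsupportedOperationException(s"Unsupported type: $other")
>   }
> }
> {code}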



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36151) Enable MiMa for Scala 2.13 artifacts after Spark 3.2.0 release

2021-10-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36151:


Assignee: Apache Spark

> Enable MiMa for Scala 2.13 artifacts after Spark 3.2.0 release
> --
>
> Key: SPARK-36151
> URL: https://issues.apache.org/jira/browse/SPARK-36151
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36151) Enable MiMa for Scala 2.13 artifacts after Spark 3.2.0 release

2021-10-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36151:


Assignee: (was: Apache Spark)

> Enable MiMa for Scala 2.13 artifacts after Spark 3.2.0 release
> --
>
> Key: SPARK-36151
> URL: https://issues.apache.org/jira/browse/SPARK-36151
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36151) Enable MiMa for Scala 2.13 artifacts after Spark 3.2.0 release

2021-10-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429785#comment-17429785
 ] 

Apache Spark commented on SPARK-36151:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/34306

> Enable MiMa for Scala 2.13 artifacts after Spark 3.2.0 release
> --
>
> Key: SPARK-36151
> URL: https://issues.apache.org/jira/browse/SPARK-36151
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37026) Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13

2021-10-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37026.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34301
[https://github.com/apache/spark/pull/34301]

> Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13
> -
>
> Key: SPARK-37026
> URL: https://issues.apache.org/jira/browse/SPARK-37026
> Project: Spark
>  Issue Type: Bug
>  Components: Build, ML
>Affects Versions: 3.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
> Fix For: 3.3.0
>
>
> ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because 
> the declared type of ResolvedRFormula.terms is scala.Seq[scala.Seq[String]] 
> but a scala.Seq[scala.collection.mutable.ArraySeq$ofRef] is passed.
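> A minimal illustration of the 2.13 pitfall and the usual fix (assumed here,
> not the exact patch): force each inner collection to an immutable scala.Seq
> before storing it.
> {code:scala}
> // Under Scala 2.13 an array-backed mutable sequence is not a scala.Seq:
> val raw: Seq[scala.collection.Seq[String]] =
>   Seq(scala.collection.mutable.ArraySeq("a", "b"))
> // Converting each element makes the runtime type match Seq[Seq[String]]:
> val terms: Seq[Seq[String]] = raw.map(_.toSeq)
> {code}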



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37031) Unify v1 and v2 DESCRIBE NAMESPACE tests

2021-10-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429751#comment-17429751
 ] 

Apache Spark commented on SPARK-37031:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/34305

> Unify v1 and v2 DESCRIBE NAMESPACE tests
> 
>
> Key: SPARK-37031
> URL: https://issues.apache.org/jira/browse/SPARK-37031
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Terry Kim
>Priority: Major
>
> Extract the DESCRIBE NAMESPACE tests to a common place so they run against 
> both V1 and V2 datasources. Some tests can be placed in V1- and V2-specific 
> test suites.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37031) Unify v1 and v2 DESCRIBE NAMESPACE tests

2021-10-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429752#comment-17429752
 ] 

Apache Spark commented on SPARK-37031:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/34305

> Unify v1 and v2 DESCRIBE NAMESPACE tests
> 
>
> Key: SPARK-37031
> URL: https://issues.apache.org/jira/browse/SPARK-37031
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Terry Kim
>Priority: Major
>
> Extract the DESCRIBE NAMESPACE tests to a common place so they run against 
> both V1 and V2 datasources. Some tests can be placed in V1- and V2-specific 
> test suites.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37031) Unify v1 and v2 DESCRIBE NAMESPACE tests

2021-10-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37031:


Assignee: (was: Apache Spark)

> Unify v1 and v2 DESCRIBE NAMESPACE tests
> 
>
> Key: SPARK-37031
> URL: https://issues.apache.org/jira/browse/SPARK-37031
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Terry Kim
>Priority: Major
>
> Extract the DESCRIBE NAMESPACE tests to a common place so they run against 
> both V1 and V2 datasources. Some tests can be placed in V1- and V2-specific 
> test suites.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37031) Unify v1 and v2 DESCRIBE NAMESPACE tests

2021-10-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37031:


Assignee: Apache Spark

> Unify v1 and v2 DESCRIBE NAMESPACE tests
> 
>
> Key: SPARK-37031
> URL: https://issues.apache.org/jira/browse/SPARK-37031
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Terry Kim
>Assignee: Apache Spark
>Priority: Major
>
> Extract the DESCRIBE NAMESPACE tests to a common place so they run against 
> both V1 and V2 datasources. Some tests can be placed in V1- and V2-specific 
> test suites.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37031) Unify v1 and v2 DESCRIBE NAMESPACE tests

2021-10-17 Thread Terry Kim (Jira)
Terry Kim created SPARK-37031:
-

 Summary: Unify v1 and v2 DESCRIBE NAMESPACE tests
 Key: SPARK-37031
 URL: https://issues.apache.org/jira/browse/SPARK-37031
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Terry Kim


Extract the DESCRIBE NAMESPACE tests to a common place so they run against 
both V1 and V2 datasources. Some tests can be placed in V1- and V2-specific 
test suites. A sketch of the pattern follows.
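
A hedged sketch of the usual unification pattern in Spark's SQL tests (the
trait and suite names are illustrative, not the actual patch): shared test
logic lives in a base trait, and thin V1/V2 suites supply the catalog.
{code:scala}
import org.scalatest.funsuite.AnyFunSuite

trait DescribeNamespaceSuiteBase extends AnyFunSuite {
  // Supplied by the concrete suite, e.g. "spark_catalog" (V1) or a V2 catalog.
  protected def catalog: String

  test(s"DESCRIBE NAMESPACE runs against $catalog") {
    // Placeholder assertion; the real suites would issue SQL via a
    // SparkSession and compare rows with checkAnswer.
    assert(catalog.nonEmpty)
  }
}

class DescribeNamespaceV1Suite extends DescribeNamespaceSuiteBase {
  override protected def catalog: String = "spark_catalog"
}

class DescribeNamespaceV2Suite extends DescribeNamespaceSuiteBase {
  override protected def catalog: String = "test_catalog"  // hypothetical name
}
{code}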



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36853) Code failing on checkstyle

2021-10-17 Thread Shockang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shockang updated SPARK-36853:
-
Attachment: image-2021-10-18-01-57-00-714.png

> Code failing on checkstyle
> --
>
> Key: SPARK-36853
> URL: https://issues.apache.org/jira/browse/SPARK-36853
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Abhinav Kumar
>Priority: Trivial
> Attachments: image-2021-10-18-01-57-00-714.png, 
> spark_mvn_clean_install_skip_tests_in_windows.log
>
>
> There are more; just pasting a sample.
>  
> [INFO] There are 32 errors reported by Checkstyle 8.43 with 
> dev/checkstyle.xml ruleset.
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF11.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 107).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF12.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 116).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF13.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 104).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF13.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 125).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF14.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 109).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF14.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 134).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF15.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 114).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF15.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 143).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF16.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 119).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF16.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 152).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF17.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 124).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF17.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 161).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF18.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 129).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF18.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 170).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF19.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 134).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF19.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 179).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF20.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 139).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF20.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 188).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF21.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 144).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF21.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 197).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF22.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 149).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF22.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 206).
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[44,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[60,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[75,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[88,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[100,25] 
> (naming) MethodName: Method name 'Once' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[110,25] 
> (naming) MethodName: Method name 'AvailableNow' must match pattern 

[jira] [Commented] (SPARK-36853) Code failing on checkstyle

2021-10-17 Thread Shockang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429743#comment-17429743
 ] 

Shockang commented on SPARK-36853:
--

Because of the following issue, 
[SPARK-37030|https://issues.apache.org/jira/browse/SPARK-37030], the Maven 
build fails on Windows!

I commented out the suspect bash-related code and re-executed the command:
{code:java}
mvn -DskipTests clean install
{code}
!image-2021-10-18-01-57-00-714.png!

For your reference, I have attached the build log.

[~hyukjin.kwon] Can this issue be split into multiple subtasks, given that 
there are 131 errors?

> Code failing on checkstyle
> --
>
> Key: SPARK-36853
> URL: https://issues.apache.org/jira/browse/SPARK-36853
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Abhinav Kumar
>Priority: Trivial
> Attachments: image-2021-10-18-01-57-00-714.png, 
> spark_mvn_clean_install_skip_tests_in_windows.log
>
>
> There are more; just pasting a sample.
>  
> [INFO] There are 32 errors reported by Checkstyle 8.43 with 
> dev/checkstyle.xml ruleset.
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF11.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 107).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF12.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 116).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF13.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 104).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF13.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 125).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF14.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 109).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF14.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 134).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF15.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 114).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF15.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 143).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF16.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 119).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF16.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 152).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF17.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 124).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF17.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 161).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF18.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 129).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF18.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 170).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF19.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 134).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF19.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 179).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF20.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 139).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF20.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 188).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF21.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 144).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF21.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 197).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF22.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 149).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF22.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 206).
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[44,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[60,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[75,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] 

[jira] [Updated] (SPARK-36853) Code failing on checkstyle

2021-10-17 Thread Shockang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shockang updated SPARK-36853:
-
Attachment: spark_mvn_clean_install_skip_tests_in_windows.log

> Code failing on checkstyle
> --
>
> Key: SPARK-36853
> URL: https://issues.apache.org/jira/browse/SPARK-36853
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Abhinav Kumar
>Priority: Trivial
> Attachments: spark_mvn_clean_install_skip_tests_in_windows.log
>
>
> There are more; just pasting a sample.
>  
> [INFO] There are 32 errors reported by Checkstyle 8.43 with 
> dev/checkstyle.xml ruleset.
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF11.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 107).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF12.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 116).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF13.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 104).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF13.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 125).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF14.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 109).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF14.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 134).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF15.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 114).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF15.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 143).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF16.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 119).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF16.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 152).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF17.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 124).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF17.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 161).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF18.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 129).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF18.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 170).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF19.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 134).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF19.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 179).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF20.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 139).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF20.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 188).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF21.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 144).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF21.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 197).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF22.java:[28] (sizes) 
> LineLength: Line is longer than 100 characters (found 149).
> [ERROR] src\main\java\org\apache\spark\sql\api\java\UDF22.java:[29] (sizes) 
> LineLength: Line is longer than 100 characters (found 206).
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[44,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[60,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[75,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[88,25] 
> (naming) MethodName: Method name 'ProcessingTime' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[100,25] 
> (naming) MethodName: Method name 'Once' must match pattern 
> '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
> [ERROR] src\main\java\org\apache\spark\sql\streaming\Trigger.java:[110,25] 
> (naming) MethodName: Method name 'AvailableNow' must match pattern 
> 

[jira] [Updated] (SPARK-37030) Maven build failed in windows!

2021-10-17 Thread Shockang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shockang updated SPARK-37030:
-
Description: 
I pulled the latest Spark master code on my local Windows 10 computer and 
executed the following command:
{code:java}
mvn -DskipTests clean install{code}
Build failed!

!image-2021-10-17-22-18-16-616.png!
{code:java}
Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run 
(default) on project spark-core_2.12: An Ant BuildException has occured: 
Execute failed: java.io.IOException: Cannot run program "bash" (in directory 
"C:\bigdata\spark\core"): CreateProcess error=2{code}
It seems that the maven-antrun-plugin cannot run because there is no bash on 
Windows. 

The following snippet comes from the pom.xml of the spark-core module; the 
exec task below is what shells out to bash.
{code:java}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <phase>generate-resources</phase>
      <configuration>
        <!-- Execute the shell script to generate the spark build information. -->
        <target>
          <exec executable="bash">
            <arg value="${project.basedir}/../build/spark-build-info"/>
            <arg value="${project.build.directory}/extra-resources"/>
            <arg value="${project.version}"/>
          </exec>
        </target>
      </configuration>
      <goals>
        <goal>run</goal>
      </goals>
    </execution>
  </executions>
</plugin>
{code}
 

  was:
I pulled the latest Spark master code on my local Windows 10 computer and 
executed the following command:
{code:java}
mvn -DskipTests clean install{code}
Build failed!

!image-2021-10-17-21-55-33-844.png!
{code:java}
Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run 
(default) on project spark-core_2.12: An Ant BuildException has occured: 
Execute failed: java.io.IOException: Cannot run program "bash" (in directory 
"C:\bigdata\spark\core"): CreateProcess error=2{code}
It seems that the maven-antrun-plugin cannot run because there is no bash on 
Windows. 

The following snippet comes from the pom.xml of the spark-core module; the 
exec task below is what shells out to bash.
{code:java}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <phase>generate-resources</phase>
      <configuration>
        <!-- Execute the shell script to generate the spark build information. -->
        <target>
          <exec executable="bash">
            <arg value="${project.basedir}/../build/spark-build-info"/>
            <arg value="${project.build.directory}/extra-resources"/>
            <arg value="${project.version}"/>
          </exec>
        </target>
      </configuration>
      <goals>
        <goal>run</goal>
      </goals>
    </execution>
  </executions>
</plugin>
{code}
 


> Maven build failed in windows!
> --
>
> Key: SPARK-37030
> URL: https://issues.apache.org/jira/browse/SPARK-37030
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
> Environment: OS: Windows 10 Professional
> OS Version: 21H1
> Maven Version: 3.6.3
>  
>Reporter: Shockang
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: image-2021-10-17-22-18-16-616.png
>
>
> I pulled the latest Spark master code on my local Windows 10 computer and 
> executed the following command:
> {code:java}
> mvn -DskipTests clean install{code}
> Build failed!
> !image-2021-10-17-22-18-16-616.png!
> {code:java}
> Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run 
> (default) on project spark-core_2.12: An Ant BuildException has occured: 
> Execute failed: java.io.IOException: Cannot run program "bash" (in directory 
> "C:\bigdata\spark\core"): CreateProcess error=2{code}
> It seems that the maven-antrun-plugin cannot run because there is no bash 
> on Windows. 
> The following snippet comes from the pom.xml of the spark-core module; the 
> exec task below is what shells out to bash.
> {code:java}
> <plugin>
>   <groupId>org.apache.maven.plugins</groupId>
>   <artifactId>maven-antrun-plugin</artifactId>
>   <executions>
>     <execution>
>       <phase>generate-resources</phase>
>       <configuration>
>         <!-- Execute the shell script to generate the spark build information. -->
>         <target>
>           <exec executable="bash">
>             <arg value="${project.basedir}/../build/spark-build-info"/>
>             <arg value="${project.build.directory}/extra-resources"/>
>             <arg value="${project.version}"/>
>           </exec>
>         </target>
>       </configuration>
>       <goals>
>         <goal>run</goal>
>       </goals>
>     </execution>
>   </executions>
> </plugin>
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37030) Maven build failed in windows!

2021-10-17 Thread Shockang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shockang updated SPARK-37030:
-
Attachment: image-2021-10-17-22-18-16-616.png

> Maven build failed in windows!
> --
>
> Key: SPARK-37030
> URL: https://issues.apache.org/jira/browse/SPARK-37030
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
> Environment: OS: Windows 10 Professional
> OS Version: 21H1
> Maven Version: 3.6.3
>  
>Reporter: Shockang
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: image-2021-10-17-22-18-16-616.png
>
>
> I pulled the latest Spark master code on my local Windows 10 computer and 
> executed the following command:
> {code:java}
> mvn -DskipTests clean install{code}
> Build failed!
> !image-2021-10-17-21-55-33-844.png!
> {code:java}
> Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run 
> (default) on project spark-core_2.12: An Ant BuildException has occured: 
> Execute failed: java.io.IOException: Cannot run program "bash" (in directory 
> "C:\bigdata\spark\core"): CreateProcess error=2{code}
> It seems that the maven-antrun-plugin cannot run because there is no bash 
> on Windows. 
> The following snippet comes from the pom.xml of the spark-core module; the 
> exec task below is what shells out to bash.
> {code:java}
> <plugin>
>   <groupId>org.apache.maven.plugins</groupId>
>   <artifactId>maven-antrun-plugin</artifactId>
>   <executions>
>     <execution>
>       <phase>generate-resources</phase>
>       <configuration>
>         <!-- Execute the shell script to generate the spark build information. -->
>         <target>
>           <exec executable="bash">
>             <arg value="${project.basedir}/../build/spark-build-info"/>
>             <arg value="${project.build.directory}/extra-resources"/>
>             <arg value="${project.version}"/>
>           </exec>
>         </target>
>       </configuration>
>       <goals>
>         <goal>run</goal>
>       </goals>
>     </execution>
>   </executions>
> </plugin>
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37030) Maven build failed in windows!

2021-10-17 Thread Shockang (Jira)
Shockang created SPARK-37030:


 Summary: Maven build failed in windows!
 Key: SPARK-37030
 URL: https://issues.apache.org/jira/browse/SPARK-37030
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.2.0
 Environment: OS: Windows 10 Professional

OS Version: 21H1

Maven Version: 3.6.3

 
Reporter: Shockang
 Fix For: 3.2.0


I pulled the latest Spark master code on my local Windows 10 computer and 
executed the following command:
{code:java}
mvn -DskipTests clean install{code}
Build failed!

!image-2021-10-17-21-55-33-844.png!
{code:java}
Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run 
(default) on project spark-core_2.12: An Ant BuildException has occured: 
Execute failed: java.io.IOException: Cannot run program "bash" (in directory 
"C:\bigdata\spark\core"): CreateProcess error=2{code}
It seems that the maven-antrun-plugin cannot run because there is no bash on 
Windows. 

The following snippet comes from the pom.xml of the spark-core module; the 
exec task below is what shells out to bash.
{code:java}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <phase>generate-resources</phase>
      <configuration>
        <!-- Execute the shell script to generate the spark build information. -->
        <target>
          <exec executable="bash">
            <arg value="${project.basedir}/../build/spark-build-info"/>
            <arg value="${project.build.directory}/extra-resources"/>
            <arg value="${project.version}"/>
          </exec>
        </target>
      </configuration>
      <goals>
        <goal>run</goal>
      </goals>
    </execution>
  </executions>
</plugin>
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27312) PropertyGraph <-> GraphX conversions

2021-10-17 Thread Weichen Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weichen Xu reassigned SPARK-27312:
--

Assignee: (was: Weichen Xu)

> PropertyGraph <-> GraphX conversions
> 
>
> Key: SPARK-27312
> URL: https://issues.apache.org/jira/browse/SPARK-27312
> Project: Spark
>  Issue Type: Story
>  Components: Graph, GraphX
>Affects Versions: 3.0.0
>Reporter: Xiangrui Meng
>Priority: Major
>
> As a user, I can convert a GraphX graph into a PropertyGraph and a 
> PropertyGraph into a GraphX graph if they are compatible.
> * Scala only
> * Whether this is an internal API is pending design discussion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37029) Modify the assignment logic of dirFetchRequests variables

2021-10-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37029:


Assignee: Apache Spark

> Modify the assignment logic of dirFetchRequests variables
> -
>
> Key: SPARK-37029
> URL: https://issues.apache.org/jira/browse/SPARK-37029
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 3.1.2, 3.2.0
>Reporter: jinhai
>Assignee: Apache Spark
>Priority: Major
>
> In the ShuffleBlockFetcherIterator.fetchHostLocalBlocks method, we generate 
> dirFetchRequests based on externalShuffleServiceEnabled. But in fact, the 
> BlockManagerId carried by the MapStatus generated in the shuffle write phase 
> was already built according to externalShuffleServiceEnabled in the 
> BlockManager.initialize method.
> So we don't need to check the flag again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37029) Modify the assignment logic of dirFetchRequests variables

2021-10-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37029:


Assignee: (was: Apache Spark)

> Modify the assignment logic of dirFetchRequests variables
> -
>
> Key: SPARK-37029
> URL: https://issues.apache.org/jira/browse/SPARK-37029
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 3.1.2, 3.2.0
>Reporter: jinhai
>Priority: Major
>
> In the ShuffleBlockFetcherIterator.fetchHostLocalBlocks method, we generate 
> dirFetchRequests based on externalShuffleServiceEnabled. But in fact, the 
> BlockManagerId carried by the MapStatus generated in the shuffle write phase 
> was already built according to externalShuffleServiceEnabled in the 
> BlockManager.initialize method.
> So we don't need to check the flag again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37029) Modify the assignment logic of dirFetchRequests variables

2021-10-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429681#comment-17429681
 ] 

Apache Spark commented on SPARK-37029:
--

User 'manbuyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/34304

> Modify the assignment logic of dirFetchRequests variables
> -
>
> Key: SPARK-37029
> URL: https://issues.apache.org/jira/browse/SPARK-37029
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 3.1.2, 3.2.0
>Reporter: jinhai
>Priority: Major
>
> In the ShuffleBlockFetcherIterator.fetchHostLocalBlocks method, we generate 
> dirFetchRequests based on externalShuffleServiceEnabled. But in fact, the 
> BlockManagerId carried by the MapStatus generated in the shuffle write phase 
> was already built according to externalShuffleServiceEnabled in the 
> BlockManager.initialize method.
> So we don't need to check the flag again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36904) The specified datastore driver ("org.postgresql.Driver") was not found in the CLASSPATH

2021-10-17 Thread Jacek Laskowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Laskowski resolved SPARK-36904.
-
Resolution: Invalid

I finally managed to find the root cause of the issue, which is 
{{conf/hive-site.xml}} in {{HIVE_HOME}} with the driver configured(!). Sorry 
for the false alarm.

> The specified datastore driver ("org.postgresql.Driver") was not found in the 
> CLASSPATH
> ---
>
> Key: SPARK-36904
> URL: https://issues.apache.org/jira/browse/SPARK-36904
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
> Environment: Spark 3.2.0 (RC6)
> {code:java}
> $ ./bin/spark-shell --version 
>   
>
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.2.0
>       /_/
> Using Scala version 2.12.15, OpenJDK 64-Bit Server VM, 11.0.12
> Branch heads/v3.2.0-rc6
> Compiled by user jacek on 2021-09-30T10:44:35Z
> Revision dde73e2e1c7e55c8e740cb159872e081ddfa7ed6
> Url https://github.com/apache/spark.git
> Type --help for more information.
> {code}
> Built from [https://github.com/apache/spark/commits/v3.2.0-rc6] using the 
> following command:
> {code:java}
> $ ./build/mvn \
> -Pyarn,kubernetes,hadoop-cloud,hive,hive-thriftserver \
> -DskipTests \
> clean install
> {code}
> {code:java}
> $ java -version
> openjdk version "11.0.12" 2021-07-20
> OpenJDK Runtime Environment Temurin-11.0.12+7 (build 11.0.12+7)
> OpenJDK 64-Bit Server VM Temurin-11.0.12+7 (build 11.0.12+7, mixed mode) 
> {code}
>Reporter: Jacek Laskowski
>Priority: Critical
> Attachments: exception.txt
>
>
> It looks similar to [hivethriftserver built into spark3.0.0. is throwing 
> error "org.postgresql.Driver" was not found in the 
> CLASSPATH|https://stackoverflow.com/q/62534653/1305344], but reporting here 
> for future reference.
> After I built the 3.2.0 (RC6) I ran `spark-shell` to execute `sql("describe 
> table covid_19")`. That gave me the exception (a full version is attached):
> {code}
> Caused by: java.lang.reflect.InvocationTargetException: 
> org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" 
> plugin to create a ConnectionPool gave an error : The specified datastore 
> driver ("org.postgresql.Driver") was not found in the CLASSPATH. Please check 
> your CLASSPATH specification, and the name of the driver.
>   at jdk.internal.reflect.GeneratedConstructorAccessor64.newInstance(Unknown 
> Source)
>   at 
> java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:606)
>   at 
> org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:330)
>   at 
> org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:203)
>   at 
> org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:162)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:285)
>   at jdk.internal.reflect.GeneratedConstructorAccessor63.newInstance(Unknown 
> Source)
>   at 
> java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:606)
>   at 
> org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
>   at 
> org.datanucleus.NucleusContextHelper.createStoreManagerForProperties(NucleusContextHelper.java:133)
>   at 
> org.datanucleus.PersistenceNucleusContextImpl.initialise(PersistenceNucleusContextImpl.java:422)
>   at 
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:817)
>   ... 171 more
> Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the 
> "BONECP" plugin to create a ConnectionPool gave an error : The specified 
> datastore driver ("org.postgresql.Driver") was not found in the CLASSPATH. 
> Please check your CLASSPATH specification, and the name of the driver.
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:232)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:117)
>   at 
> 

[jira] [Created] (SPARK-37029) Modify the assignment logic of dirFetchRequests variables

2021-10-17 Thread jinhai (Jira)
jinhai created SPARK-37029:
--

 Summary: Modify the assignment logic of dirFetchRequests variables
 Key: SPARK-37029
 URL: https://issues.apache.org/jira/browse/SPARK-37029
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle
Affects Versions: 3.2.0, 3.1.2
Reporter: jinhai


In the ShuffleBlockFetcherIterator.fetchHostLocalBlocks method, we generate 
dirFetchRequests based on externalShuffleServiceEnabled. But in fact, the 
BlockManagerId carried by the MapStatus generated in the shuffle write phase 
was already built according to externalShuffleServiceEnabled in the 
BlockManager.initialize method.

So we don't need to check the flag again; a sketch of the idea follows.
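
A minimal sketch of the redundancy being removed (the shapes below follow the
description but are hypothetical, not the actual patch):
{code:scala}
// Hypothetical stand-in for the real org.apache.spark.storage.BlockManagerId.
case class BlockManagerId(host: String, port: Int)

// Current shape: the branch re-derives what the id already encodes.
def fetchTarget(bmId: BlockManagerId,
                externalShuffleServiceEnabled: Boolean,
                externalShuffleServicePort: Int): (String, Int) =
  if (externalShuffleServiceEnabled) (bmId.host, externalShuffleServicePort)
  else (bmId.host, bmId.port)

// Proposed shape: BlockManager.initialize already built the BlockManagerId
// with the shuffle-service port when the service is enabled, so trust it.
def fetchTargetSimplified(bmId: BlockManagerId): (String, Int) =
  (bmId.host, bmId.port)
{code}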



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-37018) Spark SQL should support create function with Aggregator

2021-10-17 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-37018:
---
Comment: was deleted

(was: I'm working.)

> Spark SQL should support create function with Aggregator
> 
>
> Key: SPARK-37018
> URL: https://issues.apache.org/jira/browse/SPARK-37018
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Priority: Major
>
> Spark SQL does not support creating a function from an Aggregator, and 
> UserDefinedAggregateFunction is deprecated.
> If we remove UserDefinedAggregateFunction, Spark SQL should provide a new 
> option.
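> For context, a minimal sketch of how an Aggregator is registered today via
> the Scala API (the missing piece the issue asks for is the SQL-level CREATE
> FUNCTION route); names like "myAverage" are illustrative:
> {code:scala}
> import org.apache.spark.sql.{Encoder, Encoders, SparkSession, functions}
> import org.apache.spark.sql.expressions.Aggregator
>
> object MyAverage extends Aggregator[Long, (Long, Long), Double] {
>   def zero: (Long, Long) = (0L, 0L)
>   def reduce(b: (Long, Long), a: Long): (Long, Long) = (b._1 + a, b._2 + 1)
>   def merge(b1: (Long, Long), b2: (Long, Long)): (Long, Long) =
>     (b1._1 + b2._1, b1._2 + b2._2)
>   def finish(r: (Long, Long)): Double = r._1.toDouble / r._2
>   def bufferEncoder: Encoder[(Long, Long)] =
>     Encoders.tuple(Encoders.scalaLong, Encoders.scalaLong)
>   def outputEncoder: Encoder[Double] = Encoders.scalaDouble
> }
>
> val spark = SparkSession.builder().master("local[2]").appName("udaf").getOrCreate()
> // Programmatic registration works today; a SQL CREATE FUNCTION route does not.
> spark.udf.register("myAverage", functions.udaf(MyAverage, Encoders.scalaLong))
> spark.sql("SELECT myAverage(id) FROM range(10)").show()
> {code}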



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37018) Spark SQL should support create function with Aggregator

2021-10-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37018:


Assignee: (was: Apache Spark)

> Spark SQL should support create function with Aggregator
> 
>
> Key: SPARK-37018
> URL: https://issues.apache.org/jira/browse/SPARK-37018
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Priority: Major
>
> Spark SQL does not support creating a function from an Aggregator, and 
> UserDefinedAggregateFunction is deprecated.
> If we remove UserDefinedAggregateFunction, Spark SQL should provide a new 
> option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37018) Spark SQL should support create function with Aggregator

2021-10-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429657#comment-17429657
 ] 

Apache Spark commented on SPARK-37018:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/34303

> Spark SQL should support create function with Aggregator
> 
>
> Key: SPARK-37018
> URL: https://issues.apache.org/jira/browse/SPARK-37018
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Priority: Major
>
> Spark SQL does not support creating a function from an Aggregator, and 
> UserDefinedAggregateFunction is deprecated.
> If we remove UserDefinedAggregateFunction, Spark SQL should provide a new 
> option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37018) Spark SQL should support create function with Aggregator

2021-10-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429658#comment-17429658
 ] 

Apache Spark commented on SPARK-37018:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/34303

> Spark SQL should support create function with Aggregator
> 
>
> Key: SPARK-37018
> URL: https://issues.apache.org/jira/browse/SPARK-37018
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Priority: Major
>
> Spark SQL does not support creating a function from an Aggregator, and 
> UserDefinedAggregateFunction is deprecated.
> If we remove UserDefinedAggregateFunction, Spark SQL should provide a new 
> option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37018) Spark SQL should support create function with Aggregator

2021-10-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37018:


Assignee: Apache Spark

> Spark SQL should support create function with Aggregator
> 
>
> Key: SPARK-37018
> URL: https://issues.apache.org/jira/browse/SPARK-37018
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> Spark SQL does not support creating a function from an Aggregator, and 
> UserDefinedAggregateFunction is deprecated.
> If we remove UserDefinedAggregateFunction, Spark SQL should provide a new 
> option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37028) Add a 'kill' executor link in the Web UI.

2021-10-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37028:


Assignee: Apache Spark

>  Add a 'kill' executor link in the Web UI.
> --
>
> Key: SPARK-37028
> URL: https://issues.apache.org/jira/browse/SPARK-37028
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: weixiuli
>Assignee: Apache Spark
>Priority: Major
>
> An executor that is running on a bad node (e.g. the system is overloaded or 
> the disks are busy) or that has high GC overhead may hurt the efficiency of 
> job execution. Speculative execution can mitigate this problem, but 
> sometimes the speculated task may also run on a bad executor.
>  We should have a 'kill' link for each executor, similar to what we have for 
> each stage, so it's easier for users to kill executors in the UI.
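> As an aside, a programmatic kill already exists and is what such a link
> would surface; a minimal sketch (developer API; a no-op with a warning under
> local mode):
> {code:scala}
> import org.apache.spark.sql.SparkSession
>
> val spark = SparkSession.builder().master("local[2]").appName("kill-demo").getOrCreate()
> // Requests the cluster manager to kill the given executors; returns whether
> // the request was acknowledged (false on schedulers that don't support it).
> val acknowledged = spark.sparkContext.killExecutors(Seq("1"))
> println(s"kill acknowledged: $acknowledged")
> {code}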



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37028) Add a 'kill' executor link in the Web UI.

2021-10-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37028:


Assignee: (was: Apache Spark)

>  Add a 'kill' executor link in the Web UI.
> --
>
> Key: SPARK-37028
> URL: https://issues.apache.org/jira/browse/SPARK-37028
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: weixiuli
>Priority: Major
>
> An executor that is running on a bad node (e.g. the system is overloaded or 
> the disks are busy) or that has high GC overhead may hurt the efficiency of 
> job execution. Speculative execution can mitigate this problem, but 
> sometimes the speculated task may also run on a bad executor.
>  We should have a 'kill' link for each executor, similar to what we have for 
> each stage, so it's easier for users to kill executors in the UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37028) Add a 'kill' executor link in the Web UI.

2021-10-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429652#comment-17429652
 ] 

Apache Spark commented on SPARK-37028:
--

User 'weixiuli' has created a pull request for this issue:
https://github.com/apache/spark/pull/34302

>  Add a 'kill' executor link in the Web UI.
> --
>
> Key: SPARK-37028
> URL: https://issues.apache.org/jira/browse/SPARK-37028
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: weixiuli
>Priority: Major
>
> An executor that runs on a bad node (e.g. the system is overloaded or the 
> disks are busy) or that has high GC overhead can hurt the efficiency of job 
> execution. Speculative execution can mitigate this, but the speculated task 
> may itself be scheduled on a bad executor.
>  We should have a 'kill' link for each executor, similar to what we have for 
> each stage, so it's easier for users to kill executors in the UI.






[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in the Web UI.

2021-10-17 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37028:
-
Description: 
The executor which is running in a bad node(eg. The system is overloaded or 
disks are busy) or has big GC overheads may affect the efficiency of job 
execution, although there are speculative mechanisms to resolve this problem, 
but sometimes the speculated task may also run in a bad executor.
 We should have a 'kill' link for each executor, similar to what we have for 
each stage, so it's easier for users to kill executors in the UI.

  was:
The executor which is running in a bad node(eg. The system is overloaded or 
disks are busy) or has big GC overheads may affect the efficiency of job 
execution, although there are speculative mechanisms to resolve this 
problem,but sometimes the speculated task may also run in a bad executor.
 We should have a 'kill' link for each executor, similar to what we have for 
each stage, so it's easier for users to kill executors in the UI.


>  Add a 'kill' executor link in the Web UI.
> --
>
> Key: SPARK-37028
> URL: https://issues.apache.org/jira/browse/SPARK-37028
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: weixiuli
>Priority: Major
>
> An executor that runs on a bad node (e.g. the system is overloaded or the 
> disks are busy) or that has high GC overhead can hurt the efficiency of job 
> execution. Speculative execution can mitigate this, but the speculated task 
> may itself be scheduled on a bad executor.
>  We should have a 'kill' link for each executor, similar to what we have for 
> each stage, so it's easier for users to kill executors in the UI.






[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in the Web UI.

2021-10-17 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37028:
-
Description: 
The executor which is running in a bad node(eg. The system is overloaded or 
disks are busy) or has big GC overheads may affect the efficiency of job 
execution, although there are speculative mechanisms to resolve this 
problem,but sometimes the speculated task may also run in a bad executor.
 We should have a 'kill' link for each executor, similar to what we have for 
each stage, so it's easier for users to kill executors in the UI.

  was:
The executor which is running in a bad node(eg. The system is overloaded or 
disks are busy) or has big GC overheads may affect the efficiency of job 
execution, although there are speculative mechanisms to resolve this 
problem,but sometimes the speculated task may also run in a bad executor.
We should have a "kill" link for each executor, similar to what we have for 
each stage, so it's easier for users to kill executors in the UI.


>  Add a 'kill' executor link in the Web UI.
> --
>
> Key: SPARK-37028
> URL: https://issues.apache.org/jira/browse/SPARK-37028
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: weixiuli
>Priority: Major
>
> An executor that runs on a bad node (e.g. the system is overloaded or the 
> disks are busy) or that has high GC overhead can hurt the efficiency of job 
> execution. Speculative execution can mitigate this, but the speculated task 
> may itself be scheduled on a bad executor.
>  We should have a 'kill' link for each executor, similar to what we have for 
> each stage, so it's easier for users to kill executors in the UI.






[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in the Web UI.

2021-10-17 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37028:
-
Summary:  Add a 'kill' executor link in the Web UI.  (was:  Add a 'kill' 
executor link in Web UI.)

>  Add a 'kill' executor link in the Web UI.
> --
>
> Key: SPARK-37028
> URL: https://issues.apache.org/jira/browse/SPARK-37028
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: weixiuli
>Priority: Major
>
> An executor that runs on a bad node (e.g. the system is overloaded or the 
> disks are busy) or that has high GC overhead can hurt the efficiency of job 
> execution. Speculative execution can mitigate this, but the speculated task 
> may itself be scheduled on a bad executor.
>  We should have a 'kill' link for each executor, similar to what we have for 
> each stage, so it's easier for users to kill executors in the UI.






[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in Web UI.

2021-10-17 Thread weixiuli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-37028:
-
Description: 
The executor which is running in a bad node(eg. The system is overloaded or 
disks are busy) or has big GC overheads may affect the efficiency of job 
execution, although there are speculative mechanisms to resolve this 
problem,but sometimes the speculated task may also run in a bad executor.
We should have a "kill" link for each executor, similar to what we have for 
each stage, so it's easier for users to kill executors in the UI.

  was:
The executor which is running in a bad node(eg. The system is overloaded or 
disks are busy) or it has big GC overheads may affect the efficiency of job 
execution, although there are speculative mechanisms to resolve this 
problem,but sometimes the speculated task may also run in a bad executor.
We should have a "kill" link for each executor, similar to what we have for 
each stage, so it's easier for users to kill executors in the UI.


>  Add a 'kill' executor link in Web UI.
> --
>
> Key: SPARK-37028
> URL: https://issues.apache.org/jira/browse/SPARK-37028
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: weixiuli
>Priority: Major
>
> An executor that runs on a bad node (e.g. the system is overloaded or the 
> disks are busy) or that has high GC overhead can hurt the efficiency of job 
> execution. Speculative execution can mitigate this, but the speculated task 
> may itself be scheduled on a bad executor.
> We should have a 'kill' link for each executor, similar to what we have for 
> each stage, so it's easier for users to kill executors in the UI.


