[jira] [Resolved] (SPARK-38689) Use error classes in the compilation errors of not allowed DESC PARTITION

2022-04-18 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-38689.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36163
[https://github.com/apache/spark/pull/36163]

> Use error classes in the compilation errors of not allowed DESC PARTITION
> -
>
> Key: SPARK-38689
> URL: https://issues.apache.org/jira/browse/SPARK-38689
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: huangtengfei
>Priority: Major
> Fix For: 3.4.0
>
>
> Migrate the following errors in QueryCompilationErrors:
> * descPartitionNotAllowedOnTempView
> * descPartitionNotAllowedOnView
> * descPartitionNotAllowedOnViewError
> to use error classes. Throw an implementation of SparkThrowable. Also write 
> a test for each error in QueryCompilationErrorsSuite.
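
Below is a minimal, self-contained sketch of the pattern this migration asks for. The error-class name, message text, and the trait are illustrative stand-ins only, not the actual Spark definitions or the names chosen in the pull request:

{code:scala}
object ErrorClassSketch {

  // Stand-in for Spark's SparkThrowable interface.
  trait ThrowableWithErrorClass {
    def getErrorClass: String
  }

  // Tiny registry playing the role of the error-class -> message-template mapping.
  private val errorClasses: Map[String, String] = Map(
    "FORBIDDEN_DESC_PARTITION" -> "DESC PARTITION is not allowed on %s: %s"
  )

  // Exception carrying an error class, analogous to "an implementation of SparkThrowable".
  class CompilationError(val errorClass: String, params: Seq[String])
      extends Exception(errorClasses(errorClass).format(params: _*))
      with ThrowableWithErrorClass {
    override def getErrorClass: String = errorClass
  }

  // Analogue of a QueryCompilationErrors factory method.
  def descPartitionNotAllowedOnTempView(viewName: String): CompilationError =
    new CompilationError("FORBIDDEN_DESC_PARTITION", Seq("a temp view", viewName))

  def main(args: Array[String]): Unit = {
    val e = descPartitionNotAllowedOnTempView("v1")
    // A test in QueryCompilationErrorsSuite would assert on the error class and message.
    assert(e.getErrorClass == "FORBIDDEN_DESC_PARTITION")
    println(s"${e.getErrorClass}: ${e.getMessage}")
  }
}
{code}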






[jira] [Assigned] (SPARK-38689) Use error classes in the compilation errors of not allowed DESC PARTITION

2022-04-18 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-38689:


Assignee: huangtengfei

> Use error classes in the compilation errors of not allowed DESC PARTITION
> -
>
> Key: SPARK-38689
> URL: https://issues.apache.org/jira/browse/SPARK-38689
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: huangtengfei
>Priority: Major
>
> Migrate the following errors in QueryCompilationErrors:
> * descPartitionNotAllowedOnTempView
> * descPartitionNotAllowedOnView
> * descPartitionNotAllowedOnViewError
> to use error classes. Throw an implementation of SparkThrowable. Also write 
> a test for each error in QueryCompilationErrorsSuite.






[jira] [Commented] (SPARK-38900) DS V2 supports push down collection functions

2022-04-18 Thread Zhixiong Chen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523563#comment-17523563
 ] 

Zhixiong Chen commented on SPARK-38900:
---

Yes. I can do this job.

> DS V2 supports push down collection functions
> -
>
> Key: SPARK-38900
> URL: https://issues.apache.org/jira/browse/SPARK-38900
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Zhixiong Chen
>Priority: Major
>







[jira] [Commented] (SPARK-38901) DS V2 supports push down misc functions

2022-04-18 Thread Zhixiong Chen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523562#comment-17523562
 ] 

Zhixiong Chen commented on SPARK-38901:
---

Yes. I can do this job.

> DS V2 supports push down misc functions
> ---
>
> Key: SPARK-38901
> URL: https://issues.apache.org/jira/browse/SPARK-38901
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Zhixiong Chen
>Priority: Major
>







[jira] [Commented] (SPARK-38899) DS V2 supports push down datetime functions

2022-04-18 Thread Zhixiong Chen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523564#comment-17523564
 ] 

Zhixiong Chen commented on SPARK-38899:
---

Yes. I can do this job.

> DS V2 supports push down datetime functions
> ---
>
> Key: SPARK-38899
> URL: https://issues.apache.org/jira/browse/SPARK-38899
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Zhixiong Chen
>Priority: Major
>







[jira] [Commented] (SPARK-38897) DS V2 supports push down string functions

2022-04-18 Thread Zhixiong Chen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523566#comment-17523566
 ] 

Zhixiong Chen commented on SPARK-38897:
---

Yes. I can do this job.

> DS V2 supports push down string functions
> -
>
> Key: SPARK-38897
> URL: https://issues.apache.org/jira/browse/SPARK-38897
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Zhixiong Chen
>Priority: Major
>







[jira] [Updated] (SPARK-38912) Clean "classproperty" workaround in pyspark.sql.session once support for Python 3.8 is dropped

2022-04-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-38912:
-
Priority: Major  (was: Minor)

> Clean "classproperty" workaround in pyspark.sql.session once support for 
> Python 3.8 is dropped
> --
>
> Key: SPARK-38912
> URL: https://issues.apache.org/jira/browse/SPARK-38912
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.2.1
>Reporter: Furcy Pin
>Priority: Major
>
> The proper bugfix of SPARK-38870 uses a feature of Python 3.9 
> and had to fall back to a workaround for earlier versions (cf. the TODOs in the 
> [pyspark.sql.session module|https://github.com/apache/spark/pull/36161/files]).
> Once support for Python 3.8 is officially dropped, we should remove this 
> workaround.
> _Python 3.8 reaches end-of-life in 2024/10._






[jira] [Updated] (SPARK-38912) Clean "classproperty" workaround in pyspark.sql.session once support for Python 3.8 is dropped

2022-04-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-38912:
-
Priority: Minor  (was: Major)

> Clean "classproperty" workaround in pyspark.sql.session once support for 
> Python 3.8 is dropped
> --
>
> Key: SPARK-38912
> URL: https://issues.apache.org/jira/browse/SPARK-38912
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.2.1
>Reporter: Furcy Pin
>Priority: Minor
>
> The proper bugfix of SPARK-38870 uses a feature of Python 3.9 
> and had to fall back to a workaround for earlier versions (cf. the TODOs in the 
> [pyspark.sql.session module|https://github.com/apache/spark/pull/36161/files]).
> Once support for Python 3.8 is officially dropped, we should remove this 
> workaround.
> _Python 3.8 reaches end-of-life in 2024/10._






[jira] [Comment Edited] (SPARK-38812) When I clean data, I want one RDD split into two RDDs according to a data-cleaning rule

2022-04-18 Thread gaokui (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523476#comment-17523476
 ] 

gaokui edited comment on SPARK-38812 at 4/18/22 8:05 AM:
-

I have seen SPARK-2373 and SPARK-6664.

Actually I can do this with a single DAG job instead of two.

For example:

val intRDD = sc.makeRDD(Array(1, 2, 3, 4, 5, 6))
intRDD.foreachPartition { iter =>
  val (it1, it2) = iter.partition(x => x <= 3)
  saveQualityError(it1)  // cannot simply use rdd.saveAsTextFile here; a custom write
                         // policy (flush interval and write size) is needed.
  saveQualityGood(it2)   // same limitation. A more serious problem is the "short bucket"
                         // effect: in one partition the good data may be small and the bad
                         // data large, so one write method waits on the other.
}

But this approach cannot reuse much of the RDD API; it has to copy code the RDD 
already provides, such as flushing the writer to HDFS.


was (Author: sungaok):
I have seen SPARK-2373 and SPARK-6664.

Actually I can get a better method than those two: compute with one job, not twice.

For example:

val intRDD = sc.makeRDD(Array(1, 2, 3, 4, 5, 6))
intRDD.foreachPartition { iter =>
  val (it1, it2) = iter.partition(x => x <= 3)
  saveQualityError(it1)  // cannot simply use rdd.saveAsTextFile here; a custom write
                         // policy (flush interval and write size) is needed.
  saveQualityGood(it2)   // same limitation.

  // A more serious problem is the "short bucket" effect: in one partition the good data
  // may be small and the bad data large, so one write method waits on the other.
}

> When I clean data, I want one RDD split into two RDDs according to a data-cleaning rule
> -
>
> Key: SPARK-38812
> URL: https://issues.apache.org/jira/browse/SPARK-38812
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: gaokui
>Priority: Major
>
> When cleaning data, one RDD is filtered by a value (> or <) and should produce two 
> different sets: one file with the error data and another with the error-free 
> data.
> Now I use filter, but that requires two Spark DAG jobs, which costs too 
> much.
> What I want is something like iterator.span(predicate) that returns a 
> tuple (iter1, iter2),
> so that one dataset is split into two datasets in a single data-cleaning pass.
> I want to compute once, not twice.
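
A minimal local-mode sketch of the single-pass split described in the comment above, assuming the split rule is a simple predicate and that two local files per partition are an acceptable stand-in for the saveQualityGood/saveQualityError sinks:

{code:scala}
import java.io.{File, PrintWriter}

import org.apache.spark.TaskContext
import org.apache.spark.sql.SparkSession

object SplitInOnePass {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("split-once").getOrCreate()
    val sc = spark.sparkContext

    val data = sc.makeRDD(Array(1, 2, 3, 4, 5, 6), numSlices = 2)

    // One action and one traversal per partition: each record goes to exactly one sink.
    data.foreachPartition { iter =>
      val id = TaskContext.getPartitionId()
      val bad  = new PrintWriter(new File(s"/tmp/bad-part-$id.txt"))   // records failing the rule
      val good = new PrintWriter(new File(s"/tmp/good-part-$id.txt"))  // records passing the rule
      try {
        iter.foreach(x => (if (x <= 3) bad else good).println(x))
      } finally {
        bad.close()
        good.close()
      }
    }

    spark.stop()
  }
}
{code}

This avoids running the filter twice, at the cost of managing the two output writers 
(flushing, sizing, skew between the two streams) by hand, which is the trade-off the 
comment points out.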






[jira] [Reopened] (SPARK-38812) When I clean data, I want one RDD split into two RDDs according to a data-cleaning rule

2022-04-18 Thread gaokui (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaokui reopened SPARK-38812:


You can see my attachment.

> When I clean data, I want one RDD split into two RDDs according to a data-cleaning rule
> -
>
> Key: SPARK-38812
> URL: https://issues.apache.org/jira/browse/SPARK-38812
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: gaokui
>Priority: Major
>
> When cleaning data, one RDD is filtered by a value (> or <) and should produce two 
> different sets: one file with the error data and another with the error-free 
> data.
> Now I use filter, but that requires two Spark DAG jobs, which costs too 
> much.
> What I want is something like iterator.span(predicate) that returns a 
> tuple (iter1, iter2),
> so that one dataset is split into two datasets in a single data-cleaning pass.
> I want to compute once, not twice.






[jira] [Resolved] (SPARK-38746) Move the tests for `PARSE_EMPTY_STATEMENT` to QueryParsingErrorsSuite

2022-04-18 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-38746.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36221
[https://github.com/apache/spark/pull/36221]

> Move the tests for `PARSE_EMPTY_STATEMENT` to QueryParsingErrorsSuite
> -
>
> Key: SPARK-38746
> URL: https://issues.apache.org/jira/browse/SPARK-38746
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: panbingkun
>Priority: Major
> Fix For: 3.4.0
>
>
> Move tests for the error class *PARSE_EMPTY_STATEMENT* from ErrorParserSuite 
> to QueryParsingErrorsSuite.






[jira] [Assigned] (SPARK-38746) Move the tests for `PARSE_EMPTY_STATEMENT` to QueryParsingErrorsSuite

2022-04-18 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-38746:


Assignee: panbingkun

> Move the tests for `PARSE_EMPTY_STATEMENT` to QueryParsingErrorsSuite
> -
>
> Key: SPARK-38746
> URL: https://issues.apache.org/jira/browse/SPARK-38746
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: panbingkun
>Priority: Major
>
> Move tests for the error class *PARSE_EMPTY_STATEMENT* from ErrorParserSuite 
> to QueryParsingErrorsSuite.






[jira] [Updated] (SPARK-38930) spark Executors status always is KILLED

2022-04-18 Thread sichun zhai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sichun zhai updated SPARK-38930:

Attachment: spark-ui.png

> spark  Executors  status always is KILLED
> -
>
> Key: SPARK-38930
> URL: https://issues.apache.org/jira/browse/SPARK-38930
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2, 3.1.3
>Reporter: sichun zhai
>Priority: Major
> Attachments: spark-default.conf, spark-env.sh, spark-ui.png
>
>
> In standalone deploy mode, when running org.apache.spark.examples.SparkPi or 
> any other Spark program, the UI always shows the executor status as KILLED.
> Related patch: [https://github.com/apache/spark/pull/12012]
>  






[jira] [Updated] (SPARK-38930) spark Executors status always is KILLED

2022-04-18 Thread sichun zhai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sichun zhai updated SPARK-38930:

Attachment: spark-env.sh

> spark  Executors  status always is KILLED
> -
>
> Key: SPARK-38930
> URL: https://issues.apache.org/jira/browse/SPARK-38930
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2, 3.1.3
>Reporter: sichun zhai
>Priority: Major
> Attachments: spark-default.conf, spark-env.sh, spark-ui.png
>
>
> In standalone deploy mode, when running org.apache.spark.examples.SparkPi or 
> any other Spark program, the UI always shows the executor status as KILLED.
> Related patch: [https://github.com/apache/spark/pull/12012]
>  






[jira] [Updated] (SPARK-38930) spark Executors status always is KILLED

2022-04-18 Thread sichun zhai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sichun zhai updated SPARK-38930:

Attachment: spark-default.conf

> spark  Executors  status always is KILLED
> -
>
> Key: SPARK-38930
> URL: https://issues.apache.org/jira/browse/SPARK-38930
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2, 3.1.3
>Reporter: sichun zhai
>Priority: Major
> Attachments: spark-default.conf, spark-env.sh, spark-ui.png
>
>
> In standalone deploy mode, when running org.apache.spark.examples.SparkPi or 
> any other Spark program, the UI always shows the executor status as KILLED.
> Related patch: [https://github.com/apache/spark/pull/12012]
>  






[jira] [Created] (SPARK-38930) spark Executors status always is KILLED

2022-04-18 Thread sichun zhai (Jira)
sichun zhai created SPARK-38930:
---

 Summary: spark  Executors  status always is KILLED
 Key: SPARK-38930
 URL: https://issues.apache.org/jira/browse/SPARK-38930
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.1.2, 3.1.3
Reporter: sichun zhai
 Attachments: spark-default.conf, spark-env.sh, spark-ui.png

In standalone deploy mode, when running org.apache.spark.examples.SparkPi or 
any other Spark program, the UI always shows the executor status as KILLED.

Related patch: [https://github.com/apache/spark/pull/12012]

 






[jira] [Updated] (SPARK-38930) spark Executors status always is KILLED

2022-04-18 Thread sichun zhai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sichun zhai updated SPARK-38930:

Attachment: stderr

> spark  Executors  status always is KILLED
> -
>
> Key: SPARK-38930
> URL: https://issues.apache.org/jira/browse/SPARK-38930
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2, 3.1.3
>Reporter: sichun zhai
>Priority: Major
> Attachments: spark-default.conf, spark-env.sh, spark-ui.png, stderr
>
>
> In standalone deploy mode, when running org.apache.spark.examples.SparkPi or 
> any other Spark program, the UI always shows the executor status as KILLED.
> Related patch: [https://github.com/apache/spark/pull/12012]
>  






[jira] [Created] (SPARK-38931) RocksDB File manager would not create initial dfs directory with unknown number of keys on 1st empty checkpoint

2022-04-18 Thread Yun Tang (Jira)
Yun Tang created SPARK-38931:


 Summary: RocksDB File manager would not create initial dfs 
directory with unknown number of keys on 1st empty checkpoint
 Key: SPARK-38931
 URL: https://issues.apache.org/jira/browse/SPARK-38931
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 3.2.1
Reporter: Yun Tang
 Fix For: 3.2.2


Currently, we can disable tracking the number of keys for performance when 
using the RocksDB state store. However, if the 1st checkpoint is empty, it will not 
create the root DFS directory, which leads to the exception below:


{code:java}
File 
/private/var/folders/rk/wyr101_562ngn8lp7tbqt7_0gp/T/spark-ce4a0607-b1d8-43b8-becd-638c6b030019/state/1/1
 does not exist
java.io.FileNotFoundException: File 
/private/var/folders/rk/wyr101_562ngn8lp7tbqt7_0gp/T/spark-ce4a0607-b1d8-43b8-becd-638c6b030019/state/1/1
 does not exist
at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:779)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1100)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:769)
at 
org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128)
at 
org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:93)
at 
org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.(ChecksumFs.java:353)
at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:400)
at 
org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:626)
at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:701)
at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:697)
at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
at org.apache.hadoop.fs.FileContext.create(FileContext.java:703)
at 
org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createTempFile(CheckpointFileManager.scala:327)
at 
org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.(CheckpointFileManager.scala:140)
at 
org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.(CheckpointFileManager.scala:143)
at 
org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createAtomic(CheckpointFileManager.scala:333)
at 
org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.zipToDfsFile(RocksDBFileManager.scala:438)
at 
org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.saveCheckpointToDfs(RocksDBFileManager.scala:174)
at 
org.apache.spark.sql.execution.streaming.state.RocksDBSuite.saveCheckpointFiles(RocksDBSuite.scala:566)
at 
org.apache.spark.sql.execution.streaming.state.RocksDBSuite.$anonfun$new$35(RocksDBSuite.scala:179)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at 
org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190)
at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:203)
at 
org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188)
at 
org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
at 
org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200)
at 
org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182)
at 
org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:64)
at 
org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
at 
org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:64)
at 
org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233)
at 
org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
at 
org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:233)
at 
org.scalatest.funsuite.AnyFunSuite
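
A sketch of the defensive-creation idea only (an assumed approach for illustration, not necessarily how the actual fix is implemented): make sure the checkpoint root exists on DFS before the first zip file is uploaded.

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object EnsureCheckpointRoot {
  // Create the root checkpoint directory if it is missing; safe to call repeatedly.
  def ensureRoot(rootCheckpointDir: String): Unit = {
    val path = new Path(rootCheckpointDir)
    val fs: FileSystem = path.getFileSystem(new Configuration())
    if (!fs.exists(path)) {
      fs.mkdirs(path)
    }
  }

  def main(args: Array[String]): Unit = {
    // With an empty first checkpoint, nothing else has created this directory yet
    // (the path below is just an example).
    ensureRoot("/tmp/spark-rocksdb-state/1/1")
  }
}
{code}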

[jira] [Updated] (SPARK-38930) spark Executors status always is KILLED

2022-04-18 Thread sichun zhai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sichun zhai updated SPARK-38930:

Attachment: (was: stderr)

> spark  Executors  status always is KILLED
> -
>
> Key: SPARK-38930
> URL: https://issues.apache.org/jira/browse/SPARK-38930
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2, 3.1.3
>Reporter: sichun zhai
>Priority: Major
> Attachments: spark-default.conf, spark-env.sh, spark-ui.png, stderr
>
>
> In standalone deploy mode, when running org.apache.spark.examples.SparkPi or 
> any other Spark program, the UI always shows the executor status as KILLED.
> Related patch: [https://github.com/apache/spark/pull/12012]
>  






[jira] [Updated] (SPARK-38930) spark Executors status always is KILLED

2022-04-18 Thread sichun zhai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sichun zhai updated SPARK-38930:

Attachment: stderr

> spark  Executors  status always is KILLED
> -
>
> Key: SPARK-38930
> URL: https://issues.apache.org/jira/browse/SPARK-38930
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2, 3.1.3
>Reporter: sichun zhai
>Priority: Major
> Attachments: spark-default.conf, spark-env.sh, spark-ui.png, stderr
>
>
> In standalone deploy mode, when running org.apache.spark.examples.SparkPi or 
> any other Spark program, the UI always shows the executor status as KILLED.
> Related patch: [https://github.com/apache/spark/pull/12012]
>  






[jira] [Updated] (SPARK-38930) spark Executors status always is KILLED

2022-04-18 Thread sichun zhai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sichun zhai updated SPARK-38930:

Description: 
In standalone deploy mode, when running org.apache.spark.examples.SparkPi or 
any other Spark program, the UI always shows the executor status as KILLED.

Related patch: [https://github.com/apache/spark/pull/12012]

Run SparkPi command:

/opt/app/applications/bd-spark/bin/run-example  --class 
org.apache.spark.examples.SparkPi  --master 
spark://10.205.90.120:7077,10.205.90.131:7077 --deploy-mode cluster 
--driver-java-options 
"-Dlog4j.configuration=file:/opt/app/applications/bd-spark/conf/log4j.properties"
 --conf 
spark.executor.extraJavaOptions="-Dlog4j.configuration=file:/opt/app/applications/bd-spark/conf/log4j.properties"

 

  was:
In standalone deploy mode, when running org.apache.spark.examples.SparkPi or 
any other Spark program, the UI always shows the executor status as KILLED.

Related patch: [https://github.com/apache/spark/pull/12012]

 


> spark  Executors  status always is KILLED
> -
>
> Key: SPARK-38930
> URL: https://issues.apache.org/jira/browse/SPARK-38930
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2, 3.1.3
>Reporter: sichun zhai
>Priority: Major
> Attachments: spark-default.conf, spark-env.sh, spark-ui.png, stderr
>
>
> In standalone deploy mode, when running org.apache.spark.examples.SparkPi or 
> any other Spark program, the UI always shows the executor status as KILLED.
> Related patch: [https://github.com/apache/spark/pull/12012]
> Run SparkPi command:
> /opt/app/applications/bd-spark/bin/run-example  --class 
> org.apache.spark.examples.SparkPi  --master 
> spark://10.205.90.120:7077,10.205.90.131:7077 --deploy-mode cluster 
> --driver-java-options 
> "-Dlog4j.configuration=file:/opt/app/applications/bd-spark/conf/log4j.properties"
>  --conf 
> spark.executor.extraJavaOptions="-Dlog4j.configuration=file:/opt/app/applications/bd-spark/conf/log4j.properties"
>  






[jira] [Updated] (SPARK-38930) Spark Executors status always is KILLED

2022-04-18 Thread sichun zhai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sichun zhai updated SPARK-38930:

Summary: Spark  Executors  status always is KILLED  (was: spark  Executors  
status always is KILLED)

> Spark  Executors  status always is KILLED
> -
>
> Key: SPARK-38930
> URL: https://issues.apache.org/jira/browse/SPARK-38930
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2, 3.1.3
>Reporter: sichun zhai
>Priority: Major
> Attachments: spark-default.conf, spark-env.sh, spark-ui.png, stderr
>
>
> In standalone deploy mode, when running org.apache.spark.examples.SparkPi or 
> any other Spark program, the UI always shows the executor status as KILLED.
> Related patch: [https://github.com/apache/spark/pull/12012]
> Run SparkPi command:
> /opt/app/applications/bd-spark/bin/run-example  --class 
> org.apache.spark.examples.SparkPi  --master 
> spark://10.205.90.120:7077,10.205.90.131:7077 --deploy-mode cluster 
> --driver-java-options 
> "-Dlog4j.configuration=file:/opt/app/applications/bd-spark/conf/log4j.properties"
>  --conf 
> spark.executor.extraJavaOptions="-Dlog4j.configuration=file:/opt/app/applications/bd-spark/conf/log4j.properties"
>  






[jira] [Assigned] (SPARK-38931) RocksDB File manager would not create initial dfs directory with unknown number of keys on 1st empty checkpoint

2022-04-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38931:


Assignee: (was: Apache Spark)

> RocksDB File manager would not create initial dfs directory with unknown 
> number of keys on 1st empty checkpoint
> ---
>
> Key: SPARK-38931
> URL: https://issues.apache.org/jira/browse/SPARK-38931
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.1
>Reporter: Yun Tang
>Priority: Critical
> Fix For: 3.2.2
>
>
> Currently, we can disable tracking the number of keys for performance when 
> using the RocksDB state store. However, if the 1st checkpoint is empty, it will 
> not create the root DFS directory, which leads to the exception below:
> {code:java}
> File 
> /private/var/folders/rk/wyr101_562ngn8lp7tbqt7_0gp/T/spark-ce4a0607-b1d8-43b8-becd-638c6b030019/state/1/1
>  does not exist
> java.io.FileNotFoundException: File 
> /private/var/folders/rk/wyr101_562ngn8lp7tbqt7_0gp/T/spark-ce4a0607-b1d8-43b8-becd-638c6b030019/state/1/1
>  does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:779)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1100)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:769)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:93)
>   at 
> org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.(ChecksumFs.java:353)
>   at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:400)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:626)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:701)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:697)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.create(FileContext.java:703)
>   at 
> org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createTempFile(CheckpointFileManager.scala:327)
>   at 
> org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.(CheckpointFileManager.scala:140)
>   at 
> org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.(CheckpointFileManager.scala:143)
>   at 
> org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createAtomic(CheckpointFileManager.scala:333)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.zipToDfsFile(RocksDBFileManager.scala:438)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.saveCheckpointToDfs(RocksDBFileManager.scala:174)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBSuite.saveCheckpointFiles(RocksDBSuite.scala:566)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBSuite.$anonfun$new$35(RocksDBSuite.scala:179)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:203)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:64)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
>   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:64)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233)
>   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
>   at scala.collectio

[jira] [Commented] (SPARK-38931) RocksDB File manager would not create initial dfs directory with unknown number of keys on 1st empty checkpoint

2022-04-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523611#comment-17523611
 ] 

Apache Spark commented on SPARK-38931:
--

User 'Myasuka' has created a pull request for this issue:
https://github.com/apache/spark/pull/36242

> RocksDB File manager would not create initial dfs directory with unknown 
> number of keys on 1st empty checkpoint
> ---
>
> Key: SPARK-38931
> URL: https://issues.apache.org/jira/browse/SPARK-38931
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.1
>Reporter: Yun Tang
>Priority: Critical
> Fix For: 3.2.2
>
>
> Currently, we can disable tracking the number of keys for performance when 
> using the RocksDB state store. However, if the 1st checkpoint is empty, it will 
> not create the root DFS directory, which leads to the exception below:
> {code:java}
> File 
> /private/var/folders/rk/wyr101_562ngn8lp7tbqt7_0gp/T/spark-ce4a0607-b1d8-43b8-becd-638c6b030019/state/1/1
>  does not exist
> java.io.FileNotFoundException: File 
> /private/var/folders/rk/wyr101_562ngn8lp7tbqt7_0gp/T/spark-ce4a0607-b1d8-43b8-becd-638c6b030019/state/1/1
>  does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:779)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1100)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:769)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:93)
>   at 
> org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.(ChecksumFs.java:353)
>   at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:400)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:626)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:701)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:697)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.create(FileContext.java:703)
>   at 
> org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createTempFile(CheckpointFileManager.scala:327)
>   at 
> org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.(CheckpointFileManager.scala:140)
>   at 
> org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.(CheckpointFileManager.scala:143)
>   at 
> org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createAtomic(CheckpointFileManager.scala:333)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.zipToDfsFile(RocksDBFileManager.scala:438)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.saveCheckpointToDfs(RocksDBFileManager.scala:174)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBSuite.saveCheckpointFiles(RocksDBSuite.scala:566)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBSuite.$anonfun$new$35(RocksDBSuite.scala:179)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:203)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:64)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
>   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:64)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233

[jira] [Assigned] (SPARK-38931) RocksDB File manager would not create initial dfs directory with unknown number of keys on 1st empty checkpoint

2022-04-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38931:


Assignee: Apache Spark

> RocksDB File manager would not create initial dfs directory with unknown 
> number of keys on 1st empty checkpoint
> ---
>
> Key: SPARK-38931
> URL: https://issues.apache.org/jira/browse/SPARK-38931
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.1
>Reporter: Yun Tang
>Assignee: Apache Spark
>Priority: Critical
> Fix For: 3.2.2
>
>
> Currently, we can disable tracking the number of keys for performance when 
> using the RocksDB state store. However, if the 1st checkpoint is empty, it will 
> not create the root DFS directory, which leads to the exception below:
> {code:java}
> File 
> /private/var/folders/rk/wyr101_562ngn8lp7tbqt7_0gp/T/spark-ce4a0607-b1d8-43b8-becd-638c6b030019/state/1/1
>  does not exist
> java.io.FileNotFoundException: File 
> /private/var/folders/rk/wyr101_562ngn8lp7tbqt7_0gp/T/spark-ce4a0607-b1d8-43b8-becd-638c6b030019/state/1/1
>  does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:779)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1100)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:769)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:93)
>   at 
> org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.(ChecksumFs.java:353)
>   at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:400)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:626)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:701)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:697)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.create(FileContext.java:703)
>   at 
> org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createTempFile(CheckpointFileManager.scala:327)
>   at 
> org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.(CheckpointFileManager.scala:140)
>   at 
> org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.(CheckpointFileManager.scala:143)
>   at 
> org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createAtomic(CheckpointFileManager.scala:333)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.zipToDfsFile(RocksDBFileManager.scala:438)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.saveCheckpointToDfs(RocksDBFileManager.scala:174)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBSuite.saveCheckpointFiles(RocksDBSuite.scala:566)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBSuite.$anonfun$new$35(RocksDBSuite.scala:179)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:203)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:64)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
>   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:64)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233)
>   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
>

[jira] [Commented] (SPARK-38931) RocksDB File manager would not create initial dfs directory with unknown number of keys on 1st empty checkpoint

2022-04-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523612#comment-17523612
 ] 

Apache Spark commented on SPARK-38931:
--

User 'Myasuka' has created a pull request for this issue:
https://github.com/apache/spark/pull/36242

> RocksDB File manager would not create initial dfs directory with unknown 
> number of keys on 1st empty checkpoint
> ---
>
> Key: SPARK-38931
> URL: https://issues.apache.org/jira/browse/SPARK-38931
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.1
>Reporter: Yun Tang
>Priority: Critical
> Fix For: 3.2.2
>
>
> Currently, we can disable tracking the number of keys for performance when 
> using the RocksDB state store. However, if the 1st checkpoint is empty, it will 
> not create the root DFS directory, which leads to the exception below:
> {code:java}
> File 
> /private/var/folders/rk/wyr101_562ngn8lp7tbqt7_0gp/T/spark-ce4a0607-b1d8-43b8-becd-638c6b030019/state/1/1
>  does not exist
> java.io.FileNotFoundException: File 
> /private/var/folders/rk/wyr101_562ngn8lp7tbqt7_0gp/T/spark-ce4a0607-b1d8-43b8-becd-638c6b030019/state/1/1
>  does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:779)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1100)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:769)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:93)
>   at 
> org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.(ChecksumFs.java:353)
>   at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:400)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:626)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:701)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:697)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.create(FileContext.java:703)
>   at 
> org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createTempFile(CheckpointFileManager.scala:327)
>   at 
> org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.(CheckpointFileManager.scala:140)
>   at 
> org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.(CheckpointFileManager.scala:143)
>   at 
> org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createAtomic(CheckpointFileManager.scala:333)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.zipToDfsFile(RocksDBFileManager.scala:438)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.saveCheckpointToDfs(RocksDBFileManager.scala:174)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBSuite.saveCheckpointFiles(RocksDBSuite.scala:566)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBSuite.$anonfun$new$35(RocksDBSuite.scala:179)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:203)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:64)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
>   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:64)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233

[jira] [Updated] (SPARK-38930) Spark Executors status always is KILLED

2022-04-18 Thread sichun zhai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sichun zhai updated SPARK-38930:

Description: 
In standalone deploy mode, when running org.apache.spark.examples.SparkPi or 
any other Spark program, the UI always shows the executor status as KILLED.

Spark worker error log:

22/04/18 17:08:27 INFO Worker: Asked to kill executor app-20220418170822-0039/0
22/04/18 17:08:27 INFO ExecutorRunner: Runner thread for executor 
app-20220418170822-0039/0 interrupted
22/04/18 17:08:27 INFO ExecutorRunner: Killing process!
22/04/18 17:08:27 DEBUG SizeBasedRollingPolicy: 55 + 18896 > 1073741824
22/04/18 17:08:27 DEBUG RollingFileAppender: Closed file 
/opt/spark/work/app-20220418170822-0039/0/stderr
22/04/18 17:08:27 DEBUG RollingFileAppender: Closed file 
/opt/spark/work/app-20220418170822-0039/0/stdout
22/04/18 17:08:27 INFO ExecutorRunner: exitCode:Some(143)
22/04/18 17:08:27 INFO Worker: Executor app-20220418170822-0039/0 finished with 
state KILLED exitStatus 143

 

Related patch: [https://github.com/apache/spark/pull/12012]

Run SparkPi command:

/opt/app/applications/bd-spark/bin/run-example  --class 
org.apache.spark.examples.SparkPi  --master 
spark://10.205.90.120:7077,10.205.90.131:7077 --deploy-mode cluster 
--driver-java-options 
"-Dlog4j.configuration=file:/opt/app/applications/bd-spark/conf/log4j.properties"
 --conf 
spark.executor.extraJavaOptions="-Dlog4j.configuration=file:/opt/app/applications/bd-spark/conf/log4j.properties"

 

  was:
In standalone deploy mode, when running org.apache.spark.examples.SparkPi or 
any other Spark program, the UI always shows the executor status as KILLED.

Related patch: [https://github.com/apache/spark/pull/12012]

Run SparkPi command:

/opt/app/applications/bd-spark/bin/run-example  --class 
org.apache.spark.examples.SparkPi  --master 
spark://10.205.90.120:7077,10.205.90.131:7077 --deploy-mode cluster 
--driver-java-options 
"-Dlog4j.configuration=file:/opt/app/applications/bd-spark/conf/log4j.properties"
 --conf 
spark.executor.extraJavaOptions="-Dlog4j.configuration=file:/opt/app/applications/bd-spark/conf/log4j.properties"

 


> Spark  Executors  status always is KILLED
> -
>
> Key: SPARK-38930
> URL: https://issues.apache.org/jira/browse/SPARK-38930
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2, 3.1.3
>Reporter: sichun zhai
>Priority: Major
> Attachments: spark-default.conf, spark-env.sh, spark-ui.png, stderr
>
>
> In standalone deploy mode, when running org.apache.spark.examples.SparkPi or 
> any other Spark program, the UI always shows the executor status as KILLED.
> Spark worker error log:
> 22/04/18 17:08:27 INFO Worker: Asked to kill executor 
> app-20220418170822-0039/0
> 22/04/18 17:08:27 INFO ExecutorRunner: Runner thread for executor 
> app-20220418170822-0039/0 interrupted
> 22/04/18 17:08:27 INFO ExecutorRunner: Killing process!
> 22/04/18 17:08:27 DEBUG SizeBasedRollingPolicy: 55 + 18896 > 1073741824
> 22/04/18 17:08:27 DEBUG RollingFileAppender: Closed file 
> /opt/spark/work/app-20220418170822-0039/0/stderr
> 22/04/18 17:08:27 DEBUG RollingFileAppender: Closed file 
> /opt/spark/work/app-20220418170822-0039/0/stdout
> 22/04/18 17:08:27 INFO ExecutorRunner: exitCode:Some(143)
> 22/04/18 17:08:27 INFO Worker: Executor app-20220418170822-0039/0 finished 
> with state KILLED exitStatus 143
>  
> Related patch: [https://github.com/apache/spark/pull/12012]
> Run SparkPi command:
> /opt/app/applications/bd-spark/bin/run-example  --class 
> org.apache.spark.examples.SparkPi  --master 
> spark://10.205.90.120:7077,10.205.90.131:7077 --deploy-mode cluster 
> --driver-java-options 
> "-Dlog4j.configuration=file:/opt/app/applications/bd-spark/conf/log4j.properties"
>  --conf 
> spark.executor.extraJavaOptions="-Dlog4j.configuration=file:/opt/app/applications/bd-spark/conf/log4j.properties"
>  






[jira] [Updated] (SPARK-38930) Spark Executors status always is KILLED

2022-04-18 Thread sichun zhai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sichun zhai updated SPARK-38930:

Attachment: driver.log

> Spark  Executors  status always is KILLED
> -
>
> Key: SPARK-38930
> URL: https://issues.apache.org/jira/browse/SPARK-38930
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2, 3.1.3
>Reporter: sichun zhai
>Priority: Major
> Attachments: driver.log, spark-default.conf, spark-env.sh, 
> spark-ui.png, stderr
>
>
> In standalone deploy mode, when running org.apache.spark.examples.SparkPi or 
> any other Spark program, the UI always shows the executor status as KILLED.
> Spark worker error log:
> 22/04/18 17:08:27 INFO Worker: Asked to kill executor 
> app-20220418170822-0039/0
> 22/04/18 17:08:27 INFO ExecutorRunner: Runner thread for executor 
> app-20220418170822-0039/0 interrupted
> 22/04/18 17:08:27 INFO ExecutorRunner: Killing process!
> 22/04/18 17:08:27 DEBUG SizeBasedRollingPolicy: 55 + 18896 > 1073741824
> 22/04/18 17:08:27 DEBUG RollingFileAppender: Closed file 
> /opt/spark/work/app-20220418170822-0039/0/stderr
> 22/04/18 17:08:27 DEBUG RollingFileAppender: Closed file 
> /opt/spark/work/app-20220418170822-0039/0/stdout
> 22/04/18 17:08:27 INFO ExecutorRunner: exitCode:Some(143)
> 22/04/18 17:08:27 INFO Worker: Executor app-20220418170822-0039/0 finished 
> with state KILLED exitStatus 143
>  
> Related patch: [https://github.com/apache/spark/pull/12012]
> Run SparkPi command:
> /opt/app/applications/bd-spark/bin/run-example  --class 
> org.apache.spark.examples.SparkPi  --master 
> spark://10.205.90.120:7077,10.205.90.131:7077 --deploy-mode cluster 
> --driver-java-options 
> "-Dlog4j.configuration=file:/opt/app/applications/bd-spark/conf/log4j.properties"
>  --conf 
> spark.executor.extraJavaOptions="-Dlog4j.configuration=file:/opt/app/applications/bd-spark/conf/log4j.properties"
>  






[jira] [Created] (SPARK-38932) Datasource v2 support report unique keys

2022-04-18 Thread XiDuo You (Jira)
XiDuo You created SPARK-38932:
-

 Summary: Datasource v2 support report unique keys
 Key: SPARK-38932
 URL: https://issues.apache.org/jira/browse/SPARK-38932
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: XiDuo You


Datasource v2 can be used to connect to databases that support a [*unique 
key*|https://en.wikipedia.org/wiki/Unique_key].

The Spark Catalyst optimizer can do further optimizations using unique keys, 
so performance can improve if the Scan reports its unique keys to Spark.
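
A purely hypothetical sketch of what such a reporting hook could look like; this interface does not exist in Spark, and the trait name and method below are invented here only to illustrate the proposal:

{code:scala}
object UniqueKeyReportingSketch {

  // Hypothetical mix-in a DataSource V2 Scan could implement.
  trait SupportsReportUniqueKeys {
    // Each inner sequence is one unique key, given as column names.
    def uniqueKeys(): Seq[Seq[String]]
  }

  // Toy scan for a table with a single-column key and a composite unique key.
  class OrdersScan extends SupportsReportUniqueKeys {
    override def uniqueKeys(): Seq[Seq[String]] =
      Seq(Seq("order_id"), Seq("customer_id", "order_date"))
  }

  def main(args: Array[String]): Unit = {
    // With this information, the optimizer could, for example, remove a distinct
    // or an aggregation that groups by a reported unique key.
    new OrdersScan().uniqueKeys().foreach(k => println(k.mkString("(", ", ", ")")))
  }
}
{code}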






[jira] [Created] (SPARK-38933) Add examples of window functions into SQL docs

2022-04-18 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-38933:
--

 Summary: Add examples of window functions into SQL docs
 Key: SPARK-38933
 URL: https://issues.apache.org/jira/browse/SPARK-38933
 Project: Spark
  Issue Type: Documentation
  Components: SQL
Affects Versions: 3.3.0
Reporter: jiaan.geng


Currently, Spark SQL docs only display the aggregate functions.
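
A small runnable example of the kind of window-function snippet the SQL docs could include (the table name, columns, and data below are made up for illustration):

{code:scala}
import org.apache.spark.sql.SparkSession

object WindowFunctionDocExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("window-docs").getOrCreate()
    import spark.implicits._

    // Toy table used by the SQL example below.
    Seq(("Alice", "Sales", 5000), ("Bob", "Sales", 4000), ("Carol", "HR", 4500))
      .toDF("name", "dept", "salary")
      .createOrReplaceTempView("employees")

    // RANK() over a per-department window ordered by salary.
    spark.sql(
      """SELECT name, dept, salary,
        |       RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS salary_rank
        |FROM employees""".stripMargin).show()

    spark.stop()
  }
}
{code}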






[jira] [Updated] (SPARK-38933) Add examples of window functions into SQL docs

2022-04-18 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-38933:
---
Description: Currently, Spark SQL docs only display the window functions.  
(was: Currently, Spark SQL docs only display the aggregate functions.)

> Add examples of window functions into SQL docs
> --
>
> Key: SPARK-38933
> URL: https://issues.apache.org/jira/browse/SPARK-38933
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, Spark SQL docs only display the window functions.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38933) Add examples of window functions into SQL docs

2022-04-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38933:


Assignee: (was: Apache Spark)

> Add examples of window functions into SQL docs
> --
>
> Key: SPARK-38933
> URL: https://issues.apache.org/jira/browse/SPARK-38933
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, Spark SQL docs only display the window functions.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38933) Add examples of window functions into SQL docs

2022-04-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523628#comment-17523628
 ] 

Apache Spark commented on SPARK-38933:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/36243

> Add examples of window functions into SQL docs
> --
>
> Key: SPARK-38933
> URL: https://issues.apache.org/jira/browse/SPARK-38933
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, Spark SQL docs only display the window functions.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38933) Add examples of window functions into SQL docs

2022-04-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38933:


Assignee: Apache Spark

> Add examples of window functions into SQL docs
> --
>
> Key: SPARK-38933
> URL: https://issues.apache.org/jira/browse/SPARK-38933
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> Currently, Spark SQL docs only display the window functions.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38934) Provider TemporaryAWSCredentialsProvider has no credentials

2022-04-18 Thread Lily (Jira)
Lily created SPARK-38934:


 Summary: Provider TemporaryAWSCredentialsProvider has no 
credentials
 Key: SPARK-38934
 URL: https://issues.apache.org/jira/browse/SPARK-38934
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes, Spark Core
Affects Versions: 3.2.1
Reporter: Lily


 

We are using Jupyter Hub on K8s as a notebook-based development environment and 
Spark on K8s as its backend cluster, with Spark 3.2.1 and Hadoop 3.3.1.

When we run code like the one below in Jupyter Hub on K8s,

 
{code:java}
val perm = ... // get AWS temporary credential by AWS STS from AWS assumed role

// set AWS temporary credential
spark.sparkContext.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", 
"org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", 
perm.credential.accessKeyID)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", 
perm.credential.secretAccessKey)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.session.token", 
perm.credential.sessionToken)

// execute simple Spark action
spark.read.format("parquet").load("s3a:///*").show(1) {code}
 

 

we got a warning like the one below during the first code execution, but we were 
able to get the proper result thanks to the Spark task retry function. 
{code:java}
22/04/18 09:13:50 WARN TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2) 
(10.197.5.15 executor 1): java.nio.file.AccessDeniedException: 
s3a:///.parquet: 
org.apache.hadoop.fs.s3a.CredentialInitializationException: Provider 
TemporaryAWSCredentialsProvider has no credentials
at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:206)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:117)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:2810)
at 
org.apache.spark.util.HadoopFSUtils$.listLeafFiles(HadoopFSUtils.scala:225)
at 
org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$6(HadoopFSUtils.scala:136)
at scala.collection.immutable.Stream.map(Stream.scala:418)
at 
org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$4(HadoopFSUtils.scala:126)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:863)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.hadoop.fs.s3a.CredentialInitializationException: Provider 
TemporaryAWSCredentialsProvider has no credentials
at 
org.apache.hadoop.fs.s3a.auth.AbstractSessionCredentialsProvider.getCredentials(AbstractSessionCredentialsProvider.java:130)
at 
org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:177)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1266)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:842)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:792)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695)
at 
com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559)
at 
com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539)
at 
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:544

[jira] [Updated] (SPARK-38934) Provider TemporaryAWSCredentialsProvider has no credentials

2022-04-18 Thread Lily (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lily updated SPARK-38934:
-
Description: 
 

We are using Jupyter Hub on K8s as a notebook-based development environment and 
Spark on K8s as its backend cluster, with Spark 3.2.1 and Hadoop 3.3.1.

When we run code like the one below in Jupyter Hub on K8s,

 
{code:java}
val perm = ... // get AWS temporary credential by AWS STS from AWS assumed role

// set AWS temporary credential
spark.sparkContext.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", 
"org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", 
perm.credential.accessKeyID)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", 
perm.credential.secretAccessKey)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.session.token", 
perm.credential.sessionToken)

// execute simple Spark action
spark.read.format("parquet").load("s3a:///*").show(1) {code}
 

 

the first few executors left a warning like the one below during the first code 
execution, but we were able to get the proper result thanks to the Spark task 
retry function. 
{code:java}
22/04/18 09:13:50 WARN TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2) 
(10.197.5.15 executor 1): java.nio.file.AccessDeniedException: 
s3a:///.parquet: 
org.apache.hadoop.fs.s3a.CredentialInitializationException: Provider 
TemporaryAWSCredentialsProvider has no credentials
at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:206)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:117)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:2810)
at 
org.apache.spark.util.HadoopFSUtils$.listLeafFiles(HadoopFSUtils.scala:225)
at 
org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$6(HadoopFSUtils.scala:136)
at scala.collection.immutable.Stream.map(Stream.scala:418)
at 
org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$4(HadoopFSUtils.scala:126)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:863)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.hadoop.fs.s3a.CredentialInitializationException: Provider 
TemporaryAWSCredentialsProvider has no credentials
at 
org.apache.hadoop.fs.s3a.auth.AbstractSessionCredentialsProvider.getCredentials(AbstractSessionCredentialsProvider.java:130)
at 
org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:177)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1266)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:842)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:792)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695)
at 
com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559)
at 
com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539)
at 
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5445)
at 
com.amazonaws.services.s3.AmazonS3Client.getBucketRegionViaHeadRequest(AmazonS3Client.java:6420)
at 
com.amazonaws.services.s3.AmazonS3Client.fetchRegion

[jira] [Updated] (SPARK-38934) Provider TemporaryAWSCredentialsProvider has no credentials

2022-04-18 Thread Lily (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lily updated SPARK-38934:
-
Description: 
 

We are using Jupyter Hub on K8s as a notebook-based development environment and 
Spark on K8s as its backend cluster, with Spark 3.2.1 and Hadoop 3.3.1.

When we run code like the one below in Jupyter Hub on K8s,

 
{code:java}
val perm = ... // get AWS temporary credential by AWS STS from AWS assumed role

// set AWS temporary credential
spark.sparkContext.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", 
"org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", 
perm.credential.accessKeyID)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", 
perm.credential.secretAccessKey)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.session.token", 
perm.credential.sessionToken)

// execute simple Spark action
spark.read.format("parquet").load("s3a:///*").show(1) {code}
 

 

The first few executors left a warning like the one below during the first code 
execution, but we were able to get the proper result thanks to the Spark task 
retry function. 
{code:java}
22/04/18 09:13:50 WARN TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2) 
(10.197.5.15 executor 1): java.nio.file.AccessDeniedException: 
s3a:///.parquet: 
org.apache.hadoop.fs.s3a.CredentialInitializationException: Provider 
TemporaryAWSCredentialsProvider has no credentials
at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:206)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:117)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:2810)
at 
org.apache.spark.util.HadoopFSUtils$.listLeafFiles(HadoopFSUtils.scala:225)
at 
org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$6(HadoopFSUtils.scala:136)
at scala.collection.immutable.Stream.map(Stream.scala:418)
at 
org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$4(HadoopFSUtils.scala:126)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:863)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.hadoop.fs.s3a.CredentialInitializationException: Provider 
TemporaryAWSCredentialsProvider has no credentials
at 
org.apache.hadoop.fs.s3a.auth.AbstractSessionCredentialsProvider.getCredentials(AbstractSessionCredentialsProvider.java:130)
at 
org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:177)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1266)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:842)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:792)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695)
at 
com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559)
at 
com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539)
at 
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5445)
at 
com.amazonaws.services.s3.AmazonS3Client.getBucketRegionViaHeadRequest(AmazonS3Client.java:6420)
at 
com.amazonaws.services.s3.AmazonS3Client.fetchRegionFrom

[jira] [Commented] (SPARK-38904) Low cost DataFrame schema swap util

2022-04-18 Thread Rafal Wojdyla (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523636#comment-17523636
 ] 

Rafal Wojdyla commented on SPARK-38904:
---

[~hyukjin.kwon] OK, I will give it a shot and ping you if I get stuck. Also, if 
you have any immediate tips, I would appreciate them.

> Low cost DataFrame schema swap util
> ---
>
> Key: SPARK-38904
> URL: https://issues.apache.org/jira/browse/SPARK-38904
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Rafal Wojdyla
>Priority: Major
>
> This question is related to [https://stackoverflow.com/a/37090151/1661491]. 
> Let's assume I have a pyspark DataFrame with a certain schema, and I would like 
> to overwrite that schema with a new schema that I *know* is compatible. I 
> could do:
> {code:python}
> df: DataFrame
> new_schema = ...
> df.rdd.toDF(schema=new_schema)
> {code}
> Unfortunately this triggers computation as described in the link above. Is 
> there a way to do that at the metadata level (or lazy), without eagerly 
> triggering computation or conversions?
> Edit, note:
>  * the schema can be arbitrarily complicated (nested etc)
>  * new schema includes updates to description, nullability and additional 
> metadata (bonus points for updates to the type)
>  * I would like to avoid writing a custom query expression generator, 
> *unless* there's one already built into Spark that can generate query based 
> on the schema/{{{}StructType{}}}
> Copied from: 
> [https://stackoverflow.com/questions/71610435/how-to-overwrite-pyspark-dataframe-schema-without-data-scan]
> See POC of workaround/util in 
> [https://github.com/ravwojdyla/spark-schema-utils]
> Also posted in 
> [https://lists.apache.org/thread/5ds0f7chzp1s3h10tvjm3r96g769rvpj]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38816) Wrong comment in random matrix generator in spark-als algorithm

2022-04-18 Thread Nickolay (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523657#comment-17523657
 ] 

Nickolay commented on SPARK-38816:
--

Dear [~srowen], we call this method before choosing a solver. Thus, if we want 
to use non-negative matrix factorization (the NNLS solver), its initial matrices 
userFactors and itemFactors can contain negative values. Could this be a 
problem?
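
For what it's worth, a minimal standalone sketch (not the actual Spark code, 
which uses XORShiftRandom and BLAS) of an initialization that would match the 
code comment, i.e. take the absolute value before normalizing so every entry of 
the initial factor is nonnegative:

{code:scala}
import scala.util.Random

// Sketch only: abs() keeps every Gaussian sample nonnegative, then the vector
// is scaled to unit length, matching what the code comment describes.
def nonnegativeUnitFactor(rank: Int, random: Random): Array[Float] = {
  val factor = Array.fill(rank)(math.abs(random.nextGaussian()).toFloat)
  val norm = math.sqrt(factor.map(x => x.toDouble * x).sum).toFloat
  factor.map(_ / norm)
}
{code}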

> Wrong comment in random matrix generator in spark-als algorithm 
> 
>
> Key: SPARK-38816
> URL: https://issues.apache.org/jira/browse/SPARK-38816
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.1.1, 3.1.2, 3.2.1
>Reporter: Nickolay
>Assignee: Sean R. Owen
>Priority: Minor
> Fix For: 3.1.3, 3.3.0, 3.2.2
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In the Spark ALS algorithm we need to initialize nonnegative factor matrices for 
> users and items. 
> In ALS:
>  
> {code:java}
> private def initialize[ID](
> inBlocks: RDD[(Int, InBlock[ID])],
> rank: Int,
> seed: Long): RDD[(Int, FactorBlock)] = {
>   // Choose a unit vector uniformly at random from the unit sphere, but from 
> the
>   // "first quadrant" where all elements are nonnegative. This can be done by 
> choosing
>   // elements distributed as Normal(0,1) and taking the absolute value, and 
> then normalizing.
>   // This appears to create factorizations that have a slightly better 
> reconstruction
>   // (<1%) compared picking elements uniformly at random in [0,1].
>   inBlocks.mapPartitions({ iter =>
> iter.map {
>   case (srcBlockId, inBlock) =>
> val random: XORShiftRandom = new XORShiftRandom(byteswap64(seed ^ 
> srcBlockId))
> val factors: Array[Array[Float]] = Array.fill(inBlock.srcIds.length) {
>   val factor = Array.fill(rank)(random.nextGaussian().toFloat)
>   val nrm: Float = blas.snrm2(rank, factor, 1)
>   blas.sscal(rank, 1.0f / nrm, factor, 1)
>   factor
> }
> (srcBlockId, factors)
> }
>   }, preservesPartitioning = true)
> } {code}
> In the comments, the author writes that we are generating a matrix filled 
> with positive numbers. In the code we use random.nextGaussian().toFloat. But 
> if we look at the documentation of the nextGaussian method, we can see that 
> it also returns negative numbers: 
>  
> {code:java}
> /** 
> * @return the next pseudorandom, Gaussian ("normally") distributed
>  * {@code double} value with mean {@code 0.0} and
>  * standard deviation {@code 1.0} from this random number
>  * generator's sequence
>  */
> synchronized public double nextGaussian() {
> // See Knuth, ACP, Section 3.4.1 Algorithm C.
> if (haveNextNextGaussian) {
> haveNextNextGaussian = false;
> return nextNextGaussian;
> } else {
> double v1, v2, s;
> do {
> v1 = 2 * nextDouble() - 1; // between -1 and 1
> v2 = 2 * nextDouble() - 1; // between -1 and 1
> s = v1 * v1 + v2 * v2;
> } while (s >= 1 || s == 0);
> double multiplier = StrictMath.sqrt(-2 * StrictMath.log(s)/s);
> nextNextGaussian = v2 * multiplier;
> haveNextNextGaussian = true;
> return v1 * multiplier;
> }
> }
>  {code}
>  
> The result is a matrix with negative values



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-38816) Wrong comment in random matrix generator in spark-als algorithm

2022-04-18 Thread Nickolay (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523657#comment-17523657
 ] 

Nickolay edited comment on SPARK-38816 at 4/18/22 11:13 AM:


Dear [~srowen], we call this method before choosing a solver. Thus, if we want 
to use non-negative matrix factorization (the NNLS solver), its initial matrices 
(userFactors and itemFactors) can contain negative values. Could this be a 
problem?


was (Author: JIRAUSER287733):
Dear, [~srowen]. We call this method before choosing a solver. Thus, if we want 
to use a non-negative matrix factorization(NNLS solver), its initial matrices

userFactors and itemFactors can contians negative values. Could this be a 
problem?

> Wrong comment in random matrix generator in spark-als algorithm 
> 
>
> Key: SPARK-38816
> URL: https://issues.apache.org/jira/browse/SPARK-38816
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.1.1, 3.1.2, 3.2.1
>Reporter: Nickolay
>Assignee: Sean R. Owen
>Priority: Minor
> Fix For: 3.1.3, 3.3.0, 3.2.2
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In the Spark ALS algorithm we need to initialize nonnegative factor matrices for 
> users and items. 
> In ALS:
>  
> {code:java}
> private def initialize[ID](
> inBlocks: RDD[(Int, InBlock[ID])],
> rank: Int,
> seed: Long): RDD[(Int, FactorBlock)] = {
>   // Choose a unit vector uniformly at random from the unit sphere, but from 
> the
>   // "first quadrant" where all elements are nonnegative. This can be done by 
> choosing
>   // elements distributed as Normal(0,1) and taking the absolute value, and 
> then normalizing.
>   // This appears to create factorizations that have a slightly better 
> reconstruction
>   // (<1%) compared picking elements uniformly at random in [0,1].
>   inBlocks.mapPartitions({ iter =>
> iter.map {
>   case (srcBlockId, inBlock) =>
> val random: XORShiftRandom = new XORShiftRandom(byteswap64(seed ^ 
> srcBlockId))
> val factors: Array[Array[Float]] = Array.fill(inBlock.srcIds.length) {
>   val factor = Array.fill(rank)(random.nextGaussian().toFloat)
>   val nrm: Float = blas.snrm2(rank, factor, 1)
>   blas.sscal(rank, 1.0f / nrm, factor, 1)
>   factor
> }
> (srcBlockId, factors)
> }
>   }, preservesPartitioning = true)
> } {code}
> In the comments, the author writes that we are generating a matrix filled 
> with positive numbers. In the code we use random.nextGaussian().toFloat. But 
> if we look at the documentation of the nextGaussian method, we can see that 
> it also returns negative numbers: 
>  
> {code:java}
> /** 
> * @return the next pseudorandom, Gaussian ("normally") distributed
>  * {@code double} value with mean {@code 0.0} and
>  * standard deviation {@code 1.0} from this random number
>  * generator's sequence
>  */
> synchronized public double nextGaussian() {
> // See Knuth, ACP, Section 3.4.1 Algorithm C.
> if (haveNextNextGaussian) {
> haveNextNextGaussian = false;
> return nextNextGaussian;
> } else {
> double v1, v2, s;
> do {
> v1 = 2 * nextDouble() - 1; // between -1 and 1
> v2 = 2 * nextDouble() - 1; // between -1 and 1
> s = v1 * v1 + v2 * v2;
> } while (s >= 1 || s == 0);
> double multiplier = StrictMath.sqrt(-2 * StrictMath.log(s)/s);
> nextNextGaussian = v2 * multiplier;
> haveNextNextGaussian = true;
> return v1 * multiplier;
> }
> }
>  {code}
>  
> The result is a matrix with negative values



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38934) Provider TemporaryAWSCredentialsProvider has no credentials

2022-04-18 Thread Lily (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lily updated SPARK-38934:
-
Priority: Major  (was: Critical)

> Provider TemporaryAWSCredentialsProvider has no credentials
> ---
>
> Key: SPARK-38934
> URL: https://issues.apache.org/jira/browse/SPARK-38934
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.2.1
>Reporter: Lily
>Priority: Major
>
>  
> We are using Jupyter Hub on K8s as a notebook-based development environment 
> and Spark on K8s as its backend cluster, with Spark 3.2.1 and Hadoop 3.3.1.
> When we run code like the one below in Jupyter Hub on K8s,
>  
> {code:java}
> val perm = ... // get AWS temporary credential by AWS STS from AWS assumed 
> role
> // set AWS temporary credential
> spark.sparkContext.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", 
> "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
> spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", 
> perm.credential.accessKeyID)
> spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", 
> perm.credential.secretAccessKey)
> spark.sparkContext.hadoopConfiguration.set("fs.s3a.session.token", 
> perm.credential.sessionToken)
> // execute simple Spark action
> spark.read.format("parquet").load("s3a:///*").show(1) {code}
>  
>  
> the first few executors left a warning like the one below during the first code 
> execution, but we were able to get the proper result thanks to the Spark task 
> retry function. 
> {code:java}
> 22/04/18 09:13:50 WARN TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2) 
> (10.197.5.15 executor 1): java.nio.file.AccessDeniedException: 
> s3a:///.parquet: 
> org.apache.hadoop.fs.s3a.CredentialInitializationException: Provider 
> TemporaryAWSCredentialsProvider has no credentials
>   at 
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:206)
>   at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:117)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:2810)
>   at 
> org.apache.spark.util.HadoopFSUtils$.listLeafFiles(HadoopFSUtils.scala:225)
>   at 
> org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$6(HadoopFSUtils.scala:136)
>   at scala.collection.immutable.Stream.map(Stream.scala:418)
>   at 
> org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$4(HadoopFSUtils.scala:126)
>   at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)
>   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:863)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:131)
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: org.apache.hadoop.fs.s3a.CredentialInitializationException: 
> Provider TemporaryAWSCredentialsProvider has no credentials
>   at 
> org.apache.hadoop.fs.s3a.auth.AbstractSessionCredentialsProvider.getCredentials(AbstractSessionCredentialsProvider.java:130)
>   at 
> org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:177)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1266)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:842)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:792)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java

[jira] [Updated] (SPARK-38934) Provider TemporaryAWSCredentialsProvider has no credentials

2022-04-18 Thread Lily (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lily updated SPARK-38934:
-
Priority: Critical  (was: Major)

> Provider TemporaryAWSCredentialsProvider has no credentials
> ---
>
> Key: SPARK-38934
> URL: https://issues.apache.org/jira/browse/SPARK-38934
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.2.1
>Reporter: Lily
>Priority: Critical
>
>  
> We are using Jupyter Hub on K8s as a notebook-based development environment 
> and Spark on K8s as its backend cluster, with Spark 3.2.1 and Hadoop 3.3.1.
> When we run code like the one below in Jupyter Hub on K8s,
>  
> {code:java}
> val perm = ... // get AWS temporary credential by AWS STS from AWS assumed 
> role
> // set AWS temporary credential
> spark.sparkContext.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", 
> "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
> spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", 
> perm.credential.accessKeyID)
> spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", 
> perm.credential.secretAccessKey)
> spark.sparkContext.hadoopConfiguration.set("fs.s3a.session.token", 
> perm.credential.sessionToken)
> // execute simple Spark action
> spark.read.format("parquet").load("s3a:///*").show(1) {code}
>  
>  
> the first few executors left a warning like the one below during the first code 
> execution, but we were able to get the proper result thanks to the Spark task 
> retry function. 
> {code:java}
> 22/04/18 09:13:50 WARN TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2) 
> (10.197.5.15 executor 1): java.nio.file.AccessDeniedException: 
> s3a:///.parquet: 
> org.apache.hadoop.fs.s3a.CredentialInitializationException: Provider 
> TemporaryAWSCredentialsProvider has no credentials
>   at 
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:206)
>   at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:117)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:2810)
>   at 
> org.apache.spark.util.HadoopFSUtils$.listLeafFiles(HadoopFSUtils.scala:225)
>   at 
> org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$6(HadoopFSUtils.scala:136)
>   at scala.collection.immutable.Stream.map(Stream.scala:418)
>   at 
> org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$4(HadoopFSUtils.scala:126)
>   at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)
>   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:863)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:131)
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: org.apache.hadoop.fs.s3a.CredentialInitializationException: 
> Provider TemporaryAWSCredentialsProvider has no credentials
>   at 
> org.apache.hadoop.fs.s3a.auth.AbstractSessionCredentialsProvider.getCredentials(AbstractSessionCredentialsProvider.java:130)
>   at 
> org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:177)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1266)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:842)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:792)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.j

[jira] [Created] (SPARK-38935) Improve the exception type and message of casting string to numbers

2022-04-18 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-38935:
--

 Summary: Improve the exception type and message of casting string 
to numbers
 Key: SPARK-38935
 URL: https://issues.apache.org/jira/browse/SPARK-38935
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


# change the exception type from "java.lang.NumberFormatException" to 
SparkNumberFormatException

 # Show the exact target data type in the error message
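
A minimal sketch of the kind of query this affects (assuming ANSI mode, where an 
invalid string-to-number cast fails instead of returning null; the literal is 
made up):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.conf.set("spark.sql.ansi.enabled", "true")

// Today this surfaces as java.lang.NumberFormatException; after the change it
// should surface as SparkNumberFormatException and name the target type (INT).
spark.sql("SELECT CAST('not_a_number' AS INT)").show()
{code}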



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38935) Improve the exception type and message of casting string to numbers

2022-04-18 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-38935:
---
Description: 
# change the exception type from "java.lang.NumberFormatException" to 
SparkNumberFormatException

       2. Show the exact target data type in the error message

  was:
# change the exception type from "java.lang.NumberFormatException" to 
SparkNumberFormatException

 # Show the exact target data type in the error message


> Improve the exception type and message of casting string to numbers
> ---
>
> Key: SPARK-38935
> URL: https://issues.apache.org/jira/browse/SPARK-38935
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> # change the exception type from "java.lang.NumberFormatException" to 
> SparkNumberFormatException
>        2. Show the exact target data type in the error message



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38935) Improve the exception type and message of casting string to numbers

2022-04-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523668#comment-17523668
 ] 

Apache Spark commented on SPARK-38935:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/36244

> Improve the exception type and message of casting string to numbers
> ---
>
> Key: SPARK-38935
> URL: https://issues.apache.org/jira/browse/SPARK-38935
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> # change the exception type from "java.lang.NumberFormatException" to 
> SparkNumberFormatException
>        2. Show the exact target data type in the error message



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38935) Improve the exception type and message of casting string to numbers

2022-04-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38935:


Assignee: Gengliang Wang  (was: Apache Spark)

> Improve the exception type and message of casting string to numbers
> ---
>
> Key: SPARK-38935
> URL: https://issues.apache.org/jira/browse/SPARK-38935
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> # change the exception type from "java.lang.NumberFormatException" to 
> SparkNumberFormatException
>        2. Show the exact target data type in the error message



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38935) Improve the exception type and message of casting string to numbers

2022-04-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38935:


Assignee: Apache Spark  (was: Gengliang Wang)

> Improve the exception type and message of casting string to numbers
> ---
>
> Key: SPARK-38935
> URL: https://issues.apache.org/jira/browse/SPARK-38935
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> # change the exception type from "java.lang.NumberFormatException" to 
> SparkNumberFormatException
>        2. Show the exact target data type in the error message



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38935) Improve the exception type and message of casting string to numbers

2022-04-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38935:


Assignee: Apache Spark  (was: Gengliang Wang)

> Improve the exception type and message of casting string to numbers
> ---
>
> Key: SPARK-38935
> URL: https://issues.apache.org/jira/browse/SPARK-38935
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> # change the exception type from "java.lang.NumberFormatException" to 
> SparkNumberFormatException
>        2. Show the exact target data type in the error message



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38816) Wrong comment in random matrix generator in spark-als algorithm

2022-04-18 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523669#comment-17523669
 ] 

Sean R. Owen commented on SPARK-38816:
--

I don't think so - ALS doesn't need nonnegative weights. It uses a QR 
decomposition (I think?) but that's fine.

> Wrong comment in random matrix generator in spark-als algorithm 
> 
>
> Key: SPARK-38816
> URL: https://issues.apache.org/jira/browse/SPARK-38816
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.1.1, 3.1.2, 3.2.1
>Reporter: Nikolay
>Assignee: Sean R. Owen
>Priority: Minor
> Fix For: 3.1.3, 3.3.0, 3.2.2
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In the Spark ALS algorithm we need to initialize nonnegative factor matrices for 
> users and items. 
> In ALS:
>  
> {code:java}
> private def initialize[ID](
> inBlocks: RDD[(Int, InBlock[ID])],
> rank: Int,
> seed: Long): RDD[(Int, FactorBlock)] = {
>   // Choose a unit vector uniformly at random from the unit sphere, but from 
> the
>   // "first quadrant" where all elements are nonnegative. This can be done by 
> choosing
>   // elements distributed as Normal(0,1) and taking the absolute value, and 
> then normalizing.
>   // This appears to create factorizations that have a slightly better 
> reconstruction
>   // (<1%) compared picking elements uniformly at random in [0,1].
>   inBlocks.mapPartitions({ iter =>
> iter.map {
>   case (srcBlockId, inBlock) =>
> val random: XORShiftRandom = new XORShiftRandom(byteswap64(seed ^ 
> srcBlockId))
> val factors: Array[Array[Float]] = Array.fill(inBlock.srcIds.length) {
>   val factor = Array.fill(rank)(random.nextGaussian().toFloat)
>   val nrm: Float = blas.snrm2(rank, factor, 1)
>   blas.sscal(rank, 1.0f / nrm, factor, 1)
>   factor
> }
> (srcBlockId, factors)
> }
>   }, preservesPartitioning = true)
> } {code}
> In the comments, the author writes that we are generating a matrix filled 
> with positive numbers. In the code we use random.nextGaussian().toFloat. But 
> if we look at the documentation of the nextGaussian method, we can see that 
> it also returns negative numbers: 
>  
> {code:java}
> /** 
> * @return the next pseudorandom, Gaussian ("normally") distributed
>  * {@code double} value with mean {@code 0.0} and
>  * standard deviation {@code 1.0} from this random number
>  * generator's sequence
>  */
> synchronized public double nextGaussian() {
> // See Knuth, ACP, Section 3.4.1 Algorithm C.
> if (haveNextNextGaussian) {
> haveNextNextGaussian = false;
> return nextNextGaussian;
> } else {
> double v1, v2, s;
> do {
> v1 = 2 * nextDouble() - 1; // between -1 and 1
> v2 = 2 * nextDouble() - 1; // between -1 and 1
> s = v1 * v1 + v2 * v2;
> } while (s >= 1 || s == 0);
> double multiplier = StrictMath.sqrt(-2 * StrictMath.log(s)/s);
> nextNextGaussian = v2 * multiplier;
> haveNextNextGaussian = true;
> return v1 * multiplier;
> }
> }
>  {code}
>  
> The result is a matrix with negative values



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38693) Spark does not use SessionManager

2022-04-18 Thread Brad Solomon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523674#comment-17523674
 ] 

Brad Solomon commented on SPARK-38693:
--

Keycloak is an open-source IAM solution that enables single sign-on (SSO) for 
Spark. In my view this is not about supporting Keycloak specifically (I do not 
have any affiliation with Keycloak, FYI) but rather about supporting a popular 
authentication provider that in turn supports multiple protocols such as SAML 
2.0 and OpenID Connect. I suspect that multiple authentication providers 
leverage SessionManager, rather than Keycloak being an outlier.
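
For illustration, a hypothetical sketch (against the plain Jetty servlet API, 
not Spark's actual UI code) of the kind of change being asked for, i.e. giving 
the context handler session support so filters that call request.getSession() 
do not hit "No SessionManager":

{code:scala}
import org.eclipse.jetty.servlet.ServletContextHandler

// Creating the handler with the SESSIONS option makes Jetty attach a
// SessionHandler, so request.getSession() works inside servlet filters
// such as KeycloakOIDCFilter instead of throwing IllegalStateException.
val handler = new ServletContextHandler(ServletContextHandler.SESSIONS)
{code}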

> Spark does not use SessionManager
> -
>
> Key: SPARK-38693
> URL: https://issues.apache.org/jira/browse/SPARK-38693
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.1
>Reporter: Brad Solomon
>Priority: Major
>
> Spark's failure to use a `SessionManager` causes a 
> `java.lang.IllegalStateException: No SessionManager` that prevents the Spark UI 
> from being used with 
> [org.keycloak.adapters.servlet.KeycloakOIDCFilter|#_servlet_filter_adapter] 
> as the `spark.ui.filters` class.
>  
> Sample logs:
>  
> {code:java}
> spark_1 | 22/03/29 18:43:24 INFO KeycloakDeployment: Loaded URLs from 
> http://REDACTED/auth/realms/master/.well-known/openid-configuration
> spark_1 | 22/03/29 18:43:24 WARN HttpChannel: /
> spark_1 | java.lang.IllegalStateException: No SessionManager{code}
>  
> Configuration:
>  
>  
> {code:java}
> spark.ui.filters=org.keycloak.adapters.servlet.KeycloakOIDCFilter
> spark.acls.enable=true
> spark.admin.acls=*
> spark.ui.view.acls=*
> spark.org.keycloak.adapters.servlet.KeycloakOIDCFilter.param.keycloak.config.file=/opt/bitnami/spark/conf/spark-keycloak.json
>  
> {code}
>  
> The `spark-keycloak.json` file above contains configuration generated in the 
> Keycloak admin console. We can see that Spark gets as far as allowing the 
> KeycloakOIDCFilter class to read this file and initiate communication with 
> Keycloak.
>  
> This IllegalStateException exception emanates from Jetty:
>  
> [https://github.com/eclipse/jetty.project/blob/ae5c8e34e7dd4f5cce5f649e48469ba3bbc51d91/jetty-server/src/main/java/org/eclipse/jetty/server/Request.java#L1524]
>  
> It appears that Spark's `ServletContextHandler` has the ability to use a 
> `SessionManager` but doesn't. This seems to be a blocker that prevents 
> integration with Keycloak entirely.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38816) Wrong comment in random matrix generator in spark-als algorithm

2022-04-18 Thread Nikolay (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523684#comment-17523684
 ] 

Nikolay commented on SPARK-38816:
-

It does not use a QR decomposition. From the description we have: "The method used is 
similar to one described by Polyak (B. T. Polyak, The conjugate gradient method 
in extremal problems, Zh. Vychisl. Mat. Mat. Fiz. 9(4)(1969), pp. 94-112)". It 
just seemed strange to me that we want non-negative matrices at each 
step but take initial matrices with negative weights. Perhaps this 
affects the speed of convergence. In any case, thanks for your comments.

> Wrong comment in random matrix generator in spark-als algorithm 
> 
>
> Key: SPARK-38816
> URL: https://issues.apache.org/jira/browse/SPARK-38816
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.1.1, 3.1.2, 3.2.1
>Reporter: Nikolay
>Assignee: Sean R. Owen
>Priority: Minor
> Fix For: 3.1.3, 3.3.0, 3.2.2
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In the Spark ALS algorithm we need to initialize nonnegative factor matrices for 
> users and items. 
> In ALS:
>  
> {code:java}
> private def initialize[ID](
> inBlocks: RDD[(Int, InBlock[ID])],
> rank: Int,
> seed: Long): RDD[(Int, FactorBlock)] = {
>   // Choose a unit vector uniformly at random from the unit sphere, but from 
> the
>   // "first quadrant" where all elements are nonnegative. This can be done by 
> choosing
>   // elements distributed as Normal(0,1) and taking the absolute value, and 
> then normalizing.
>   // This appears to create factorizations that have a slightly better 
> reconstruction
>   // (<1%) compared picking elements uniformly at random in [0,1].
>   inBlocks.mapPartitions({ iter =>
> iter.map {
>   case (srcBlockId, inBlock) =>
> val random: XORShiftRandom = new XORShiftRandom(byteswap64(seed ^ 
> srcBlockId))
> val factors: Array[Array[Float]] = Array.fill(inBlock.srcIds.length) {
>   val factor = Array.fill(rank)(random.nextGaussian().toFloat)
>   val nrm: Float = blas.snrm2(rank, factor, 1)
>   blas.sscal(rank, 1.0f / nrm, factor, 1)
>   factor
> }
> (srcBlockId, factors)
> }
>   }, preservesPartitioning = true)
> } {code}
> In the comments, the author writes that we are generating a matrix filled 
> with positive numbers. In the code we use random.nextGaussian().toFloat. But 
> if we look at the documentation of the nextGaussian method, we can see that 
> it also returns negative numbers: 
>  
> {code:java}
> /** 
> * @return the next pseudorandom, Gaussian ("normally") distributed
>  * {@code double} value with mean {@code 0.0} and
>  * standard deviation {@code 1.0} from this random number
>  * generator's sequence
>  */
> synchronized public double nextGaussian() {
> // See Knuth, ACP, Section 3.4.1 Algorithm C.
> if (haveNextNextGaussian) {
> haveNextNextGaussian = false;
> return nextNextGaussian;
> } else {
> double v1, v2, s;
> do {
> v1 = 2 * nextDouble() - 1; // between -1 and 1
> v2 = 2 * nextDouble() - 1; // between -1 and 1
> s = v1 * v1 + v2 * v2;
> } while (s >= 1 || s == 0);
> double multiplier = StrictMath.sqrt(-2 * StrictMath.log(s)/s);
> nextNextGaussian = v2 * multiplier;
> haveNextNextGaussian = true;
> return v1 * multiplier;
> }
> }
>  {code}
>  
> The result is a matrix with negative values



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38816) Wrong comment in random matrix generator in spark-als algorithm

2022-04-18 Thread Nikolay (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolay updated SPARK-38816:

Attachment: image-2022-04-18-15-54-15-474.png

> Wrong comment in random matrix generator in spark-als algorithm 
> 
>
> Key: SPARK-38816
> URL: https://issues.apache.org/jira/browse/SPARK-38816
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.1.1, 3.1.2, 3.2.1
>Reporter: Nikolay
>Assignee: Sean R. Owen
>Priority: Minor
> Fix For: 3.1.3, 3.3.0, 3.2.2
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In the Spark ALS algorithm we need to initialize nonnegative factor matrices for 
> users and items. 
> In ALS:
>  
> {code:java}
> private def initialize[ID](
> inBlocks: RDD[(Int, InBlock[ID])],
> rank: Int,
> seed: Long): RDD[(Int, FactorBlock)] = {
>   // Choose a unit vector uniformly at random from the unit sphere, but from 
> the
>   // "first quadrant" where all elements are nonnegative. This can be done by 
> choosing
>   // elements distributed as Normal(0,1) and taking the absolute value, and 
> then normalizing.
>   // This appears to create factorizations that have a slightly better 
> reconstruction
>   // (<1%) compared picking elements uniformly at random in [0,1].
>   inBlocks.mapPartitions({ iter =>
> iter.map {
>   case (srcBlockId, inBlock) =>
> val random: XORShiftRandom = new XORShiftRandom(byteswap64(seed ^ 
> srcBlockId))
> val factors: Array[Array[Float]] = Array.fill(inBlock.srcIds.length) {
>   val factor = Array.fill(rank)(random.nextGaussian().toFloat)
>   val nrm: Float = blas.snrm2(rank, factor, 1)
>   blas.sscal(rank, 1.0f / nrm, factor, 1)
>   factor
> }
> (srcBlockId, factors)
> }
>   }, preservesPartitioning = true)
> } {code}
> In the comments, the author writes that we are generating a matrix filled 
> with positive numbers. In the code we use random.nextGaussian().toFloat. But 
> if we look at the documentation of the nextGaussian method, we can see that 
> it also returns negative numbers: 
>  
> {code:java}
> /** 
> * @return the next pseudorandom, Gaussian ("normally") distributed
>  * {@code double} value with mean {@code 0.0} and
>  * standard deviation {@code 1.0} from this random number
>  * generator's sequence
>  */
> synchronized public double nextGaussian() {
> // See Knuth, ACP, Section 3.4.1 Algorithm C.
> if (haveNextNextGaussian) {
> haveNextNextGaussian = false;
> return nextNextGaussian;
> } else {
> double v1, v2, s;
> do {
> v1 = 2 * nextDouble() - 1; // between -1 and 1
> v2 = 2 * nextDouble() - 1; // between -1 and 1
> s = v1 * v1 + v2 * v2;
> } while (s >= 1 || s == 0);
> double multiplier = StrictMath.sqrt(-2 * StrictMath.log(s)/s);
> nextNextGaussian = v2 * multiplier;
> haveNextNextGaussian = true;
> return v1 * multiplier;
> }
> }
>  {code}
>  
> The result is a matrix with negative values



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38816) Wrong comment in random matrix generator in spark-als algorithm

2022-04-18 Thread Nikolay (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolay updated SPARK-38816:

Attachment: image-2022-04-18-15-54-06-679.png

> Wrong comment in random matrix generator in spark-als algorithm 
> 
>
> Key: SPARK-38816
> URL: https://issues.apache.org/jira/browse/SPARK-38816
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.1.1, 3.1.2, 3.2.1
>Reporter: Nikolay
>Assignee: Sean R. Owen
>Priority: Minor
> Fix For: 3.1.3, 3.3.0, 3.2.2
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In the Spark ALS algorithm we need to initialize nonnegative factor matrices for 
> users and items. 
> In ALS:
>  
> {code:java}
> private def initialize[ID](
> inBlocks: RDD[(Int, InBlock[ID])],
> rank: Int,
> seed: Long): RDD[(Int, FactorBlock)] = {
>   // Choose a unit vector uniformly at random from the unit sphere, but from 
> the
>   // "first quadrant" where all elements are nonnegative. This can be done by 
> choosing
>   // elements distributed as Normal(0,1) and taking the absolute value, and 
> then normalizing.
>   // This appears to create factorizations that have a slightly better 
> reconstruction
>   // (<1%) compared picking elements uniformly at random in [0,1].
>   inBlocks.mapPartitions({ iter =>
> iter.map {
>   case (srcBlockId, inBlock) =>
> val random: XORShiftRandom = new XORShiftRandom(byteswap64(seed ^ 
> srcBlockId))
> val factors: Array[Array[Float]] = Array.fill(inBlock.srcIds.length) {
>   val factor = Array.fill(rank)(random.nextGaussian().toFloat)
>   val nrm: Float = blas.snrm2(rank, factor, 1)
>   blas.sscal(rank, 1.0f / nrm, factor, 1)
>   factor
> }
> (srcBlockId, factors)
> }
>   }, preservesPartitioning = true)
> } {code}
> In the comments, the author writes that we are generating a matrix filled 
> with positive numbers. In the code we use random.nextGaussian().toFloat. But 
> if we look at the documentation of the nextGaussian method, we can see that 
> it also returns negative numbers: 
>  
> {code:java}
> /** 
> * @return the next pseudorandom, Gaussian ("normally") distributed
>  * {@code double} value with mean {@code 0.0} and
>  * standard deviation {@code 1.0} from this random number
>  * generator's sequence
>  */
> synchronized public double nextGaussian() {
> // See Knuth, ACP, Section 3.4.1 Algorithm C.
> if (haveNextNextGaussian) {
> haveNextNextGaussian = false;
> return nextNextGaussian;
> } else {
> double v1, v2, s;
> do {
> v1 = 2 * nextDouble() - 1; // between -1 and 1
> v2 = 2 * nextDouble() - 1; // between -1 and 1
> s = v1 * v1 + v2 * v2;
> } while (s >= 1 || s == 0);
> double multiplier = StrictMath.sqrt(-2 * StrictMath.log(s)/s);
> nextNextGaussian = v2 * multiplier;
> haveNextNextGaussian = true;
> return v1 * multiplier;
> }
> }
>  {code}
>  
> The result is a matrix that can contain negative values.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38816) Wrong comment in random matrix generator in spark-als algorithm

2022-04-18 Thread Nikolay (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolay updated SPARK-38816:

Attachment: (was: image-2022-04-18-15-54-15-474.png)

> Wrong comment in random matrix generator in spark-als algorithm 
> 
>
> Key: SPARK-38816
> URL: https://issues.apache.org/jira/browse/SPARK-38816
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.1.1, 3.1.2, 3.2.1
>Reporter: Nikolay
>Assignee: Sean R. Owen
>Priority: Minor
> Fix For: 3.1.3, 3.3.0, 3.2.2
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In the Spark ALS algorithm we need to initialize nonnegative factor matrices for 
> users and items. 
> In ALS:
>  
> {code:java}
> private def initialize[ID](
> inBlocks: RDD[(Int, InBlock[ID])],
> rank: Int,
> seed: Long): RDD[(Int, FactorBlock)] = {
>   // Choose a unit vector uniformly at random from the unit sphere, but from 
> the
>   // "first quadrant" where all elements are nonnegative. This can be done by 
> choosing
>   // elements distributed as Normal(0,1) and taking the absolute value, and 
> then normalizing.
>   // This appears to create factorizations that have a slightly better 
> reconstruction
>   // (<1%) compared picking elements uniformly at random in [0,1].
>   inBlocks.mapPartitions({ iter =>
> iter.map {
>   case (srcBlockId, inBlock) =>
> val random: XORShiftRandom = new XORShiftRandom(byteswap64(seed ^ 
> srcBlockId))
> val factors: Array[Array[Float]] = Array.fill(inBlock.srcIds.length) {
>   val factor = Array.fill(rank)(random.nextGaussian().toFloat)
>   val nrm: Float = blas.snrm2(rank, factor, 1)
>   blas.sscal(rank, 1.0f / nrm, factor, 1)
>   factor
> }
> (srcBlockId, factors)
> }
>   }, preservesPartitioning = true)
> } {code}
> In the comments, the author writes that we are generating a matrix filled 
> with positive numbers. In the code we use random.nextGaussian().toFloat. But 
> if we look at the documentation of the nextGaussian method, we can see that 
> it also returns negative numbers: 
>  
> {code:java}
> /** 
> * @return the next pseudorandom, Gaussian ("normally") distributed
>  * {@code double} value with mean {@code 0.0} and
>  * standard deviation {@code 1.0} from this random number
>  * generator's sequence
>  */
> synchronized public double nextGaussian() {
> // See Knuth, ACP, Section 3.4.1 Algorithm C.
> if (haveNextNextGaussian) {
> haveNextNextGaussian = false;
> return nextNextGaussian;
> } else {
> double v1, v2, s;
> do {
> v1 = 2 * nextDouble() - 1; // between -1 and 1
> v2 = 2 * nextDouble() - 1; // between -1 and 1
> s = v1 * v1 + v2 * v2;
> } while (s >= 1 || s == 0);
> double multiplier = StrictMath.sqrt(-2 * StrictMath.log(s)/s);
> nextNextGaussian = v2 * multiplier;
> haveNextNextGaussian = true;
> return v1 * multiplier;
> }
> }
>  {code}
>  
> The result is a matrix that can contain negative values.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38816) Wrong comment in random matrix generator in spark-als algorithm

2022-04-18 Thread Nikolay (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolay updated SPARK-38816:

Attachment: (was: image-2022-04-18-15-54-06-679.png)

> Wrong comment in random matrix generator in spark-als algorithm 
> 
>
> Key: SPARK-38816
> URL: https://issues.apache.org/jira/browse/SPARK-38816
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.1.1, 3.1.2, 3.2.1
>Reporter: Nikolay
>Assignee: Sean R. Owen
>Priority: Minor
> Fix For: 3.1.3, 3.3.0, 3.2.2
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In the Spark ALS algorithm we need to initialize nonnegative factor matrices for 
> users and items. 
> In ALS:
>  
> {code:java}
> private def initialize[ID](
> inBlocks: RDD[(Int, InBlock[ID])],
> rank: Int,
> seed: Long): RDD[(Int, FactorBlock)] = {
>   // Choose a unit vector uniformly at random from the unit sphere, but from 
> the
>   // "first quadrant" where all elements are nonnegative. This can be done by 
> choosing
>   // elements distributed as Normal(0,1) and taking the absolute value, and 
> then normalizing.
>   // This appears to create factorizations that have a slightly better 
> reconstruction
>   // (<1%) compared picking elements uniformly at random in [0,1].
>   inBlocks.mapPartitions({ iter =>
> iter.map {
>   case (srcBlockId, inBlock) =>
> val random: XORShiftRandom = new XORShiftRandom(byteswap64(seed ^ 
> srcBlockId))
> val factors: Array[Array[Float]] = Array.fill(inBlock.srcIds.length) {
>   val factor = Array.fill(rank)(random.nextGaussian().toFloat)
>   val nrm: Float = blas.snrm2(rank, factor, 1)
>   blas.sscal(rank, 1.0f / nrm, factor, 1)
>   factor
> }
> (srcBlockId, factors)
> }
>   }, preservesPartitioning = true)
> } {code}
> In the comments, the author writes that we are generating a matrix filled 
> with positive numbers. In the code we use random.nextGaussian().toFloat. But 
> if we look at the documentation of the nextGaussian method, we can see that 
> it also returns negative numbers: 
>  
> {code:java}
> /** 
> * @return the next pseudorandom, Gaussian ("normally") distributed
>  * {@code double} value with mean {@code 0.0} and
>  * standard deviation {@code 1.0} from this random number
>  * generator's sequence
>  */
> synchronized public double nextGaussian() {
> // See Knuth, ACP, Section 3.4.1 Algorithm C.
> if (haveNextNextGaussian) {
> haveNextNextGaussian = false;
> return nextNextGaussian;
> } else {
> double v1, v2, s;
> do {
> v1 = 2 * nextDouble() - 1; // between -1 and 1
> v2 = 2 * nextDouble() - 1; // between -1 and 1
> s = v1 * v1 + v2 * v2;
> } while (s >= 1 || s == 0);
> double multiplier = StrictMath.sqrt(-2 * StrictMath.log(s)/s);
> nextNextGaussian = v2 * multiplier;
> haveNextNextGaussian = true;
> return v1 * multiplier;
> }
> }
>  {code}
>  
> The result is a matrix that can contain negative values.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38936) Script transform feed thread should have name

2022-04-18 Thread dzcxzl (Jira)
dzcxzl created SPARK-38936:
--

 Summary: Script transform feed thread should have name
 Key: SPARK-38936
 URL: https://issues.apache.org/jira/browse/SPARK-38936
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.1, 3.1.1
Reporter: dzcxzl


The feed thread name was lost in the SPARK-32105 refactoring.
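
For context, naming the thread is a one-liner; a minimal sketch (the thread name and body here are illustrative, not necessarily what the eventual PR uses):

{code:java}
// Illustrative only: give the stdin-feeding thread a descriptive name so it is
// easy to spot in thread dumps.
val feedThread = new Thread(new Runnable {
  override def run(): Unit = {
    // write input rows to the external script's stdin here
  }
}, "ScriptTransformation-Feed")
feedThread.setDaemon(true)
feedThread.start()
{code}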



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-38693) Spark does not use SessionManager

2022-04-18 Thread Brad Solomon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523685#comment-17523685
 ] 

Brad Solomon edited comment on SPARK-38693 at 4/18/22 1:00 PM:
---

Another argument in favor of supporting this is that Spark claims:

> Enabling authentication for the Web UIs is done using [javax servlet 
> filters|https://docs.oracle.com/javaee/6/api/javax/servlet/Filter.html]. You 
> will need a filter that implements the authentication method you want to 
> deploy.

This claim of support makes no caveats about the lack of a SessionManager. It 
simply states that authentication can be enabled through javax servlet filters. 
Keycloak offers this through 
[KeycloakOIDCFilter|https://www.keycloak.org/docs/latest/securing_apps/#_servlet_filter_adapter], 
yet it is not supported. 


was (Author: JIRAUSER287274):
Another argument in favor of supporting is that Spark claims

> Enabling authentication for the Web UIs is done using [javax servlet 
> filters|https://docs.oracle.com/javaee/6/api/javax/servlet/Filter.html]. You 
> will need a filter that implements the authentication method you want to 
> deploy.

This claim of support doesn't make caveats about the lack of use of 
SessionManager. It simply states that authentication can be enabled through 
javax servlet filters.Keycloak offers this through 
[KeycloakOIDCFilter|[https://www.keycloak.org/docs/latest/securing_apps/#_servlet_filter_adapter],]
 but yet is not supported. 

> Spark does not use SessionManager
> -
>
> Key: SPARK-38693
> URL: https://issues.apache.org/jira/browse/SPARK-38693
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.1
>Reporter: Brad Solomon
>Priority: Major
>
> Spark's failure to use a `SessionManager` causes a 
> `java.lang.IllegalStateException: No SessionManager` that prevents the Spark UI 
> from being used with 
> [org.keycloak.adapters.servlet.KeycloakOIDCFilter|#_servlet_filter_adapter] 
> as the `spark.ui.filters` class.
>  
> Sample logs:
>  
> {code:java}
> spark_1 | 22/03/29 18:43:24 INFO KeycloakDeployment: Loaded URLs from 
> http://REDACTED/auth/realms/master/.well-known/openid-configuration
> spark_1 | 22/03/29 18:43:24 WARN HttpChannel: /
> spark_1 | java.lang.IllegalStateException: No SessionManager{code}
>  
> Configuration:
>  
>  
> {code:java}
> spark.ui.filters=org.keycloak.adapters.servlet.KeycloakOIDCFilter
> spark.acls.enable=true
> spark.admin.acls=*
> spark.ui.view.acls=*
> spark.org.keycloak.adapters.servlet.KeycloakOIDCFilter.param.keycloak.config.file=/opt/bitnami/spark/conf/spark-keycloak.json
>  
> {code}
>  
> Above `spark-keycloak.json` contains configuration generated in the Keycloak 
> admin console. We can see that Spark gets as far as allowing the 
> KeycloakOIDCFilter class to read this file and initiate communication with 
> keycloak.
>  
> This IllegalStateException emanates from Jetty:
>  
> [https://github.com/eclipse/jetty.project/blob/ae5c8e34e7dd4f5cce5f649e48469ba3bbc51d91/jetty-server/src/main/java/org/eclipse/jetty/server/Request.java#L1524]
>  
> It appears that Spark's `ServletContextHandler` has the ability to use a 
> `SessionManager` but doesn't. This seems to be a blocker that prevents 
> integration with Keycloak entirely.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38936) Script transform feed thread should have name

2022-04-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523686#comment-17523686
 ] 

Apache Spark commented on SPARK-38936:
--

User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/36245

> Script transform feed thread should have name
> -
>
> Key: SPARK-38936
> URL: https://issues.apache.org/jira/browse/SPARK-38936
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1, 3.2.1
>Reporter: dzcxzl
>Priority: Trivial
>
> The feed thread name was lost in the SPARK-32105 refactoring.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38693) Spark does not use SessionManager

2022-04-18 Thread Brad Solomon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523685#comment-17523685
 ] 

Brad Solomon commented on SPARK-38693:
--

Another argument in favor of supporting this is that Spark claims:

> Enabling authentication for the Web UIs is done using [javax servlet 
> filters|https://docs.oracle.com/javaee/6/api/javax/servlet/Filter.html]. You 
> will need a filter that implements the authentication method you want to 
> deploy.

This claim of support makes no caveats about the lack of a SessionManager. It 
simply states that authentication can be enabled through javax servlet filters. 
Keycloak offers this through 
[KeycloakOIDCFilter|https://www.keycloak.org/docs/latest/securing_apps/#_servlet_filter_adapter], 
yet it is not supported. 
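
For reference, the kind of filter the docs describe is just a standard javax.servlet.Filter. A minimal sketch (hypothetical header check, not Keycloak's adapter):

{code:java}
import javax.servlet.{Filter, FilterChain, FilterConfig, ServletRequest, ServletResponse}
import javax.servlet.http.{HttpServletRequest, HttpServletResponse}

// Hypothetical filter: reject any request that lacks an Authorization header.
class HeaderCheckFilter extends Filter {
  override def init(config: FilterConfig): Unit = {}

  override def doFilter(req: ServletRequest, res: ServletResponse, chain: FilterChain): Unit = {
    val request = req.asInstanceOf[HttpServletRequest]
    if (request.getHeader("Authorization") != null) {
      chain.doFilter(req, res)
    } else {
      res.asInstanceOf[HttpServletResponse].sendError(HttpServletResponse.SC_UNAUTHORIZED)
    }
  }

  override def destroy(): Unit = {}
}
{code}

Whether such a simple filter suffices is exactly the question here: Keycloak's KeycloakOIDCFilter additionally relies on the servlet session, which is where the missing SessionManager bites.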

> Spark does not use SessionManager
> -
>
> Key: SPARK-38693
> URL: https://issues.apache.org/jira/browse/SPARK-38693
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.1
>Reporter: Brad Solomon
>Priority: Major
>
> Spark's failure to use a `SessionManager` causes a 
> `java.lang.IllegalStateException: No SessionManager` that prevents the Spark UI 
> from being used with 
> [org.keycloak.adapters.servlet.KeycloakOIDCFilter|#_servlet_filter_adapter] 
> as the `spark.ui.filters` class.
>  
> Sample logs:
>  
> {code:java}
> spark_1 | 22/03/29 18:43:24 INFO KeycloakDeployment: Loaded URLs from 
> http://REDACTED/auth/realms/master/.well-known/openid-configuration
> spark_1 | 22/03/29 18:43:24 WARN HttpChannel: /
> spark_1 | java.lang.IllegalStateException: No SessionManager{code}
>  
> Configuration:
>  
>  
> {code:java}
> spark.ui.filters=org.keycloak.adapters.servlet.KeycloakOIDCFilter
> spark.acls.enable=true
> spark.admin.acls=*
> spark.ui.view.acls=*
> spark.org.keycloak.adapters.servlet.KeycloakOIDCFilter.param.keycloak.config.file=/opt/bitnami/spark/conf/spark-keycloak.json
>  
> {code}
>  
> Above `spark-keycloak.json` contains configuration generated in the Keycloak 
> admin console. We can see that Spark gets as far as allowing the 
> KeycloakOIDCFilter class to read this file and initiate communication with 
> keycloak.
>  
> This IllegalStateException emanates from Jetty:
>  
> [https://github.com/eclipse/jetty.project/blob/ae5c8e34e7dd4f5cce5f649e48469ba3bbc51d91/jetty-server/src/main/java/org/eclipse/jetty/server/Request.java#L1524]
>  
> It appears that Spark's `ServletContextHandler` has the ability to use a 
> `SessionManager` but doesn't. This seems to be a blocker that prevents 
> integration with Keycloak entirely.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38936) Script transform feed thread should have name

2022-04-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38936:


Assignee: (was: Apache Spark)

> Script transform feed thread should have name
> -
>
> Key: SPARK-38936
> URL: https://issues.apache.org/jira/browse/SPARK-38936
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1, 3.2.1
>Reporter: dzcxzl
>Priority: Trivial
>
> The feed thread name was lost in the SPARK-32105 refactoring.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38936) Script transform feed thread should have name

2022-04-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38936:


Assignee: Apache Spark

> Script transform feed thread should have name
> -
>
> Key: SPARK-38936
> URL: https://issues.apache.org/jira/browse/SPARK-38936
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1, 3.2.1
>Reporter: dzcxzl
>Assignee: Apache Spark
>Priority: Trivial
>
> The feed thread name was lost in the SPARK-32105 refactoring.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38937) interpolate support param `limit_direction`

2022-04-18 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-38937:


 Summary: interpolate support param `limit_direction`
 Key: SPARK-38937
 URL: https://issues.apache.org/jira/browse/SPARK-38937
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.4.0
Reporter: zhengruifeng






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38937) interpolate support param `limit_direction`

2022-04-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523696#comment-17523696
 ] 

Apache Spark commented on SPARK-38937:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/36246

> interpolate support param `limit_direction`
> ---
>
> Key: SPARK-38937
> URL: https://issues.apache.org/jira/browse/SPARK-38937
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: zhengruifeng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38937) interpolate support param `limit_direction`

2022-04-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38937:


Assignee: Apache Spark

> interpolate support param `limit_direction`
> ---
>
> Key: SPARK-38937
> URL: https://issues.apache.org/jira/browse/SPARK-38937
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: zhengruifeng
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38937) interpolate support param `limit_direction`

2022-04-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38937:


Assignee: (was: Apache Spark)

> interpolate support param `limit_direction`
> ---
>
> Key: SPARK-38937
> URL: https://issues.apache.org/jira/browse/SPARK-38937
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: zhengruifeng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38816) Wrong comment in random matrix generator in spark-als algorithm

2022-04-18 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523711#comment-17523711
 ] 

Sean R. Owen commented on SPARK-38816:
--

Hm, you're right, I didn't see that before. The blocks are solved with NNLS.
Would you care to try changing the init to use the absolute value and see what it 
does? I imagine some results change, but if the difference is trivial, it could be 
fine to 'fix'.
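
A sketch of that experiment against the initialize block quoted below (it reuses the same rank, random and blas values; not a committed change):

{code:java}
// Take the absolute value so the code matches the "first quadrant" comment,
// then normalize to unit length as before.
val factor = Array.fill(rank)(math.abs(random.nextGaussian()).toFloat)
val nrm: Float = blas.snrm2(rank, factor, 1)
blas.sscal(rank, 1.0f / nrm, factor, 1)
{code}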

> Wrong comment in random matrix generator in spark-als algorithm 
> 
>
> Key: SPARK-38816
> URL: https://issues.apache.org/jira/browse/SPARK-38816
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.1.1, 3.1.2, 3.2.1
>Reporter: Nikolay
>Assignee: Sean R. Owen
>Priority: Minor
> Fix For: 3.1.3, 3.3.0, 3.2.2
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In the Spark ALS algorithm we need to initialize nonnegative factor matrices for 
> users and items. 
> In ALS:
>  
> {code:java}
> private def initialize[ID](
> inBlocks: RDD[(Int, InBlock[ID])],
> rank: Int,
> seed: Long): RDD[(Int, FactorBlock)] = {
>   // Choose a unit vector uniformly at random from the unit sphere, but from 
> the
>   // "first quadrant" where all elements are nonnegative. This can be done by 
> choosing
>   // elements distributed as Normal(0,1) and taking the absolute value, and 
> then normalizing.
>   // This appears to create factorizations that have a slightly better 
> reconstruction
>   // (<1%) compared picking elements uniformly at random in [0,1].
>   inBlocks.mapPartitions({ iter =>
> iter.map {
>   case (srcBlockId, inBlock) =>
> val random: XORShiftRandom = new XORShiftRandom(byteswap64(seed ^ 
> srcBlockId))
> val factors: Array[Array[Float]] = Array.fill(inBlock.srcIds.length) {
>   val factor = Array.fill(rank)(random.nextGaussian().toFloat)
>   val nrm: Float = blas.snrm2(rank, factor, 1)
>   blas.sscal(rank, 1.0f / nrm, factor, 1)
>   factor
> }
> (srcBlockId, factors)
> }
>   }, preservesPartitioning = true)
> } {code}
> In the comments, the author writes that we are generating a matrix filled 
> with positive numbers. In the code we use random.nextGaussian().toFloat. But 
> if we look at the documentation of the nextGaussian method, we can see that 
> it also returns negative numbers: 
>  
> {code:java}
> /** 
> * @return the next pseudorandom, Gaussian ("normally") distributed
>  * {@code double} value with mean {@code 0.0} and
>  * standard deviation {@code 1.0} from this random number
>  * generator's sequence
>  */
> synchronized public double nextGaussian() {
> // See Knuth, ACP, Section 3.4.1 Algorithm C.
> if (haveNextNextGaussian) {
> haveNextNextGaussian = false;
> return nextNextGaussian;
> } else {
> double v1, v2, s;
> do {
> v1 = 2 * nextDouble() - 1; // between -1 and 1
> v2 = 2 * nextDouble() - 1; // between -1 and 1
> s = v1 * v1 + v2 * v2;
> } while (s >= 1 || s == 0);
> double multiplier = StrictMath.sqrt(-2 * StrictMath.log(s)/s);
> nextNextGaussian = v2 * multiplier;
> haveNextNextGaussian = true;
> return v1 * multiplier;
> }
> }
>  {code}
>  
> The result is a matrix that can contain negative values.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38935) Improve the exception type and message of casting string to numbers

2022-04-18 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-38935:
---
Priority: Minor  (was: Major)

> Improve the exception type and message of casting string to numbers
> ---
>
> Key: SPARK-38935
> URL: https://issues.apache.org/jira/browse/SPARK-38935
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Minor
>
> 1. Change the exception type from "java.lang.NumberFormatException" to 
> SparkNumberFormatException.
> 2. Show the exact target data type in the error message.
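
A minimal repro of the current behaviour this targets (a sketch for a spark-shell session; assumes ANSI mode is enabled):

{code:java}
// Today this surfaces a plain java.lang.NumberFormatException; the ticket
// proposes a SparkNumberFormatException that also names the target type (INT).
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT CAST('1.5x' AS INT)").show()
{code}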



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37643) when charVarcharAsString is true, char datatype partition table query incorrect

2022-04-18 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-37643:
---

Assignee: YuanGuanhu

> when charVarcharAsString is true, char datatype partition table query 
> incorrect
> ---
>
> Key: SPARK-37643
> URL: https://issues.apache.org/jira/browse/SPARK-37643
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.0
> Environment: spark 3.2.0
>Reporter: YuanGuanhu
>Assignee: YuanGuanhu
>Priority: Major
>
> This ticket aims at fixing the bug where right-padding is not applied to char 
> type columns when charVarcharAsString is true and the partition data length is 
> less than the defined length.
> For example, the query below returns nothing in master, but the correct result is 
> `abc`.
> {code:java}
> scala> sql("set spark.sql.legacy.charVarcharAsString=true")
> scala> sql("CREATE TABLE tb01(i string, c char(5)) USING parquet partitioned 
> by (c)")
> scala> sql("INSERT INTO tb01 values(1, 'abc')")
> scala> sql("select c from tb01 where c = 'abc'").show
> +---+
> |  c|
> +---+
> +---+{code}
> This is because `ApplyCharTypePadding` rpads the expr to charLength. We should 
> handle this taking the spark.sql.legacy.charVarcharAsString conf value into account.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37643) when charVarcharAsString is true, char datatype partition table query incorrect

2022-04-18 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-37643.
-
Fix Version/s: 3.3.0
   3.2.2
   3.1.3
   Resolution: Fixed

Issue resolved by pull request 36187
[https://github.com/apache/spark/pull/36187]

> when charVarcharAsString is true, char datatype partition table query 
> incorrect
> ---
>
> Key: SPARK-37643
> URL: https://issues.apache.org/jira/browse/SPARK-37643
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.0
> Environment: spark 3.2.0
>Reporter: YuanGuanhu
>Assignee: YuanGuanhu
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.3
>
>
> This ticket aims at fixing the bug where right-padding is not applied to char 
> type columns when charVarcharAsString is true and the partition data length is 
> less than the defined length.
> For example, the query below returns nothing in master, but the correct result is 
> `abc`.
> {code:java}
> scala> sql("set spark.sql.legacy.charVarcharAsString=true")
> scala> sql("CREATE TABLE tb01(i string, c char(5)) USING parquet partitioned 
> by (c)")
> scala> sql("INSERT INTO tb01 values(1, 'abc')")
> scala> sql("select c from tb01 where c = 'abc'").show
> +---+
> |  c|
> +---+
> +---+{code}
> This is because `ApplyCharTypePadding` rpads the expr to charLength. We should 
> handle this taking the spark.sql.legacy.charVarcharAsString conf value into account.
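
For comparison, the expected result after the fix, as stated in the description, would be:

{code:java}
scala> sql("select c from tb01 where c = 'abc'").show
+---+
|  c|
+---+
|abc|
+---+
{code}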



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38886) Remove outer join if aggregate functions are duplicate agnostic on streamed side

2022-04-18 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-38886.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36177
[https://github.com/apache/spark/pull/36177]

> Remove outer join if aggregate functions are duplicate agnostic on streamed 
> side
> 
>
> Key: SPARK-38886
> URL: https://issues.apache.org/jira/browse/SPARK-38886
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Major
> Fix For: 3.4.0
>
>
> If the aggregate's child is an outer join, the aggregate references all come 
> from the streamed side, and the aggregate functions are all duplicate 
> agnostic, we can remove the outer join.
> For example:
> {code:java}
> SELECT t1.c1, min(t1.c2) FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1 GROUP BY t1.c1
> ==>
> SELECT t1.c1, min(t1.c2) FROM t1 GROUP BY t1.c1
> {code}
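
A quick way to check whether the rewrite applied is to inspect the optimized plan (a usage sketch, assuming tables t1 and t2 from the example exist):

{code:java}
// If the rule fires, the optimized plan contains only an Aggregate over t1,
// with no Join node left.
spark.sql(
  "SELECT t1.c1, min(t1.c2) FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1 GROUP BY t1.c1"
).queryExecution.optimizedPlan
{code}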



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38886) Remove outer join if aggregate functions are duplicate agnostic on streamed side

2022-04-18 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-38886:
---

Assignee: XiDuo You

> Remove outer join if aggregate functions are duplicate agnostic on streamed 
> side
> 
>
> Key: SPARK-38886
> URL: https://issues.apache.org/jira/browse/SPARK-38886
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.4.0
>
>
> If the aggregate's child is an outer join, the aggregate references all come 
> from the streamed side, and the aggregate functions are all duplicate 
> agnostic, we can remove the outer join.
> For example:
> {code:java}
> SELECT t1.c1, min(t1.c2) FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1 GROUP BY t1.c1
> ==>
> SELECT t1.c1, min(t1.c2) FROM t1 GROUP BY t1.c1
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38926) Output types in error messages in SQL style

2022-04-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523760#comment-17523760
 ] 

Apache Spark commented on SPARK-38926:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/36247

> Output types in error messages in SQL style
> ---
>
> Key: SPARK-38926
> URL: https://issues.apache.org/jira/browse/SPARK-38926
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0, 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> All types should be printed in SQL style in error messages. For example, the 
> type DateType should be highlighted as DATE to make it more visible in error 
> messages.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37015) Inline type hints for python/pyspark/streaming/dstream.py

2022-04-18 Thread Maciej Szymkiewicz (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523761#comment-17523761
 ] 

Maciej Szymkiewicz commented on SPARK-37015:


Issue resolved by pull request 34324.

https://github.com/apache/spark/pull/34324

> Inline type hints for python/pyspark/streaming/dstream.py
> -
>
> Key: SPARK-37015
> URL: https://issues.apache.org/jira/browse/SPARK-37015
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37093) Inline type hints python/pyspark/streaming

2022-04-18 Thread Maciej Szymkiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Szymkiewicz resolved SPARK-37093.

Fix Version/s: 3.3.0
   Resolution: Fixed

> Inline type hints python/pyspark/streaming
> --
>
> Key: SPARK-37093
> URL: https://issues.apache.org/jira/browse/SPARK-37093
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: dch nguyen
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37015) Inline type hints for python/pyspark/streaming/dstream.py

2022-04-18 Thread Maciej Szymkiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Szymkiewicz resolved SPARK-37015.

Fix Version/s: 3.3.0
   Resolution: Fixed

> Inline type hints for python/pyspark/streaming/dstream.py
> -
>
> Key: SPARK-37015
> URL: https://issues.apache.org/jira/browse/SPARK-37015
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37015) Inline type hints for python/pyspark/streaming/dstream.py

2022-04-18 Thread Maciej Szymkiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Szymkiewicz reassigned SPARK-37015:
--

Assignee: dch nguyen

> Inline type hints for python/pyspark/streaming/dstream.py
> -
>
> Key: SPARK-37015
> URL: https://issues.apache.org/jira/browse/SPARK-37015
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: dch nguyen
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38885) Upgrade netty to 4.1.76

2022-04-18 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-38885.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36173
[https://github.com/apache/spark/pull/36173]

> Upgrade netty to 4.1.76
> ---
>
> Key: SPARK-38885
> URL: https://issues.apache.org/jira/browse/SPARK-38885
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>
> https://netty.io/news/2022/04/12/4-1-76-Final.html



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38885) Upgrade netty to 4.1.76

2022-04-18 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-38885:


Assignee: Yang Jie

> Upgrade netty to 4.1.76
> ---
>
> Key: SPARK-38885
> URL: https://issues.apache.org/jira/browse/SPARK-38885
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>
> https://netty.io/news/2022/04/12/4-1-76-Final.html



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38910) Clean sparkStaging dir should before unregister()

2022-04-18 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-38910:


Assignee: angerszhu

> Clean sparkStaging dir should before unregister()
> -
>
> Key: SPARK-38910
> URL: https://issues.apache.org/jira/browse/SPARK-38910
> Project: Spark
>  Issue Type: Task
>  Components: YARN
>Affects Versions: 3.2.1, 3.3.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>
> {code:java}
>   ShutdownHookManager.addShutdownHook(priority) { () =>
> try {
>   val maxAppAttempts = client.getMaxRegAttempts(sparkConf, yarnConf)
>   val isLastAttempt = appAttemptId.getAttemptId() >= maxAppAttempts
>   if (!finished) {
> // The default state of ApplicationMaster is failed if it is 
> invoked by shut down hook.
> // This behavior is different compared to 1.x version.
> // If user application is exited ahead of time by calling 
> System.exit(N), here mark
> // this application as failed with EXIT_EARLY. For a good 
> shutdown, user shouldn't call
> // System.exit(0) to terminate the application.
> finish(finalStatus,
>   ApplicationMaster.EXIT_EARLY,
>   "Shutdown hook called before final status was reported.")
>   }
>   if (!unregistered) {
> // we only want to unregister if we don't want the RM to retry
> if (finalStatus == FinalApplicationStatus.SUCCEEDED || 
> isLastAttempt) {
>   unregister(finalStatus, finalMsg)
>   cleanupStagingDir(new 
> Path(System.getenv("SPARK_YARN_STAGING_DIR")))
> }
>   }
> } catch {
>   case e: Throwable =>
> logWarning("Ignoring Exception while stopping ApplicationMaster 
> from shutdown hook", e)
> }
>   }{code}
> unregister() may throw an exception, so the staging dir should be cleaned up before unregister() is called.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38910) Clean sparkStaging dir should before unregister()

2022-04-18 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-38910:
-
Priority: Minor  (was: Major)

> Clean sparkStaging dir should before unregister()
> -
>
> Key: SPARK-38910
> URL: https://issues.apache.org/jira/browse/SPARK-38910
> Project: Spark
>  Issue Type: Task
>  Components: YARN
>Affects Versions: 3.2.1, 3.3.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Minor
>
> {code:java}
>   ShutdownHookManager.addShutdownHook(priority) { () =>
> try {
>   val maxAppAttempts = client.getMaxRegAttempts(sparkConf, yarnConf)
>   val isLastAttempt = appAttemptId.getAttemptId() >= maxAppAttempts
>   if (!finished) {
> // The default state of ApplicationMaster is failed if it is 
> invoked by shut down hook.
> // This behavior is different compared to 1.x version.
> // If user application is exited ahead of time by calling 
> System.exit(N), here mark
> // this application as failed with EXIT_EARLY. For a good 
> shutdown, user shouldn't call
> // System.exit(0) to terminate the application.
> finish(finalStatus,
>   ApplicationMaster.EXIT_EARLY,
>   "Shutdown hook called before final status was reported.")
>   }
>   if (!unregistered) {
> // we only want to unregister if we don't want the RM to retry
> if (finalStatus == FinalApplicationStatus.SUCCEEDED || 
> isLastAttempt) {
>   unregister(finalStatus, finalMsg)
>   cleanupStagingDir(new 
> Path(System.getenv("SPARK_YARN_STAGING_DIR")))
> }
>   }
> } catch {
>   case e: Throwable =>
> logWarning("Ignoring Exception while stopping ApplicationMaster 
> from shutdown hook", e)
> }
>   }{code}
> unregister() may throw an exception, so the staging dir should be cleaned up before unregister() is called.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38910) Clean sparkStaging dir should before unregister()

2022-04-18 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-38910.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36207
[https://github.com/apache/spark/pull/36207]

> Clean sparkStaging dir should before unregister()
> -
>
> Key: SPARK-38910
> URL: https://issues.apache.org/jira/browse/SPARK-38910
> Project: Spark
>  Issue Type: Task
>  Components: YARN
>Affects Versions: 3.2.1, 3.3.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Minor
> Fix For: 3.4.0
>
>
> {code:java}
>   ShutdownHookManager.addShutdownHook(priority) { () =>
> try {
>   val maxAppAttempts = client.getMaxRegAttempts(sparkConf, yarnConf)
>   val isLastAttempt = appAttemptId.getAttemptId() >= maxAppAttempts
>   if (!finished) {
> // The default state of ApplicationMaster is failed if it is 
> invoked by shut down hook.
> // This behavior is different compared to 1.x version.
> // If user application is exited ahead of time by calling 
> System.exit(N), here mark
> // this application as failed with EXIT_EARLY. For a good 
> shutdown, user shouldn't call
> // System.exit(0) to terminate the application.
> finish(finalStatus,
>   ApplicationMaster.EXIT_EARLY,
>   "Shutdown hook called before final status was reported.")
>   }
>   if (!unregistered) {
> // we only want to unregister if we don't want the RM to retry
> if (finalStatus == FinalApplicationStatus.SUCCEEDED || 
> isLastAttempt) {
>   unregister(finalStatus, finalMsg)
>   cleanupStagingDir(new 
> Path(System.getenv("SPARK_YARN_STAGING_DIR")))
> }
>   }
> } catch {
>   case e: Throwable =>
> logWarning("Ignoring Exception while stopping ApplicationMaster 
> from shutdown hook", e)
> }
>   }{code}
> unregister() may throw an exception, so the staging dir should be cleaned up before unregister() is called.
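
A sketch of the ordering the ticket asks for, based on the snippet above (not necessarily the merged change):

{code:java}
if (!unregistered) {
  // we only want to unregister if we don't want the RM to retry
  if (finalStatus == FinalApplicationStatus.SUCCEEDED || isLastAttempt) {
    // clean the staging dir first, so a failure in unregister() cannot skip it
    cleanupStagingDir(new Path(System.getenv("SPARK_YARN_STAGING_DIR")))
    unregister(finalStatus, finalMsg)
  }
}
{code}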



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38938) Implement `inplace` and `columns` parameters of `Series.drop`

2022-04-18 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-38938:


 Summary: Implement `inplace` and `columns` parameters of 
`Series.drop`
 Key: SPARK-38938
 URL: https://issues.apache.org/jira/browse/SPARK-38938
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Xinrong Meng


Implement `inplace` and `columns` parameters of `Series.drop`



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38938) Implement `inplace` and `columns` parameters of `Series.drop`

2022-04-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523796#comment-17523796
 ] 

Apache Spark commented on SPARK-38938:
--

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/36215

> Implement `inplace` and `columns` parameters of `Series.drop`
> -
>
> Key: SPARK-38938
> URL: https://issues.apache.org/jira/browse/SPARK-38938
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement `inplace` and `columns` parameters of `Series.drop`



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38938) Implement `inplace` and `columns` parameters of `Series.drop`

2022-04-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38938:


Assignee: (was: Apache Spark)

> Implement `inplace` and `columns` parameters of `Series.drop`
> -
>
> Key: SPARK-38938
> URL: https://issues.apache.org/jira/browse/SPARK-38938
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement `inplace` and `columns` parameters of `Series.drop`



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38938) Implement `inplace` and `columns` parameters of `Series.drop`

2022-04-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38938:


Assignee: Apache Spark

> Implement `inplace` and `columns` parameters of `Series.drop`
> -
>
> Key: SPARK-38938
> URL: https://issues.apache.org/jira/browse/SPARK-38938
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Implement `inplace` and `columns` parameters of `Series.drop`



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38868) `assert_true` fails unconditionnaly after `left_outer` joins

2022-04-18 Thread Bruce Robbins (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruce Robbins updated SPARK-38868:
--
Component/s: SQL

> `assert_true` fails unconditionnaly after `left_outer` joins
> 
>
> Key: SPARK-38868
> URL: https://issues.apache.org/jira/browse/SPARK-38868
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.1.1, 3.1.2, 3.2.0, 3.2.1, 3.3.0, 3.4.0
>Reporter: Fabien Dubosson
>Priority: Major
>
> When `assert_true` is used after a `left_outer` join, the assert exception is 
> raised even though all the rows meet the condition. Using an `inner` join 
> does not expose this issue.
>  
> {code:java}
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as sf
> session = SparkSession.builder.getOrCreate()
> entries = session.createDataFrame(
>     [
>         ("a", 1),
>         ("b", 2),
>         ("c", 3),
>     ],
>     ["id", "outcome_id"],
> )
> outcomes = session.createDataFrame(
>     [
>         (1, 12),
>         (2, 34),
>         (3, 32),
>     ],
>     ["outcome_id", "outcome_value"],
> )
> # Inner join works as expected
> (
>     entries.join(outcomes, on="outcome_id", how="inner")
>     .withColumn("valid", sf.assert_true(sf.col("outcome_value") > 10))
>     .filter(sf.col("valid").isNull())
>     .show()
> )
> # Left join fails with «'('outcome_value > 10)' is not true!» even though it 
> is the case
> (
>     entries.join(outcomes, on="outcome_id", how="left_outer")
>     .withColumn("valid", sf.assert_true(sf.col("outcome_value") > 10))
>     .filter(sf.col("valid").isNull())
>     .show()
> ){code}
> Reproduced on `pyspark` versions `3.2.1`, `3.2.0`, `3.1.2` and `3.1.1`. I am 
> not sure whether "native" Spark exposes this issue as well; I don't have 
> the knowledge/setup to test that.
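
A rough Scala equivalent of the repro, for anyone who wants to check whether non-Python Spark hits the same issue (a sketch for spark-shell, untested):

{code:java}
import org.apache.spark.sql.functions._
import spark.implicits._

val entries = Seq(("a", 1), ("b", 2), ("c", 3)).toDF("id", "outcome_id")
val outcomes = Seq((1, 12), (2, 34), (3, 32)).toDF("outcome_id", "outcome_value")

// Mirrors the failing left_outer case from the Python snippet above.
entries.join(outcomes, Seq("outcome_id"), "left_outer")
  .withColumn("valid", assert_true(col("outcome_value") > 10))
  .filter(col("valid").isNull)
  .show()
{code}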



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38810) Upgrade Python black in /dev/reformat-python

2022-04-18 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-38810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523847#comment-17523847
 ] 

Bjørn Jørgensen commented on SPARK-38810:
-

[Black 22.3.0|https://black.readthedocs.io/en/latest/change_log.html] has a 
"Fix Black to work with Click 8.1.0 (#2966)" entry.

One thing to look at here is [how stable Black's style 
is|https://black.readthedocs.io/en/stable/faq.html#how-stable-is-black-s-style].

I will push a PR when I wake up. 

> Upgrade Python black in /dev/reformat-python
> 
>
> Key: SPARK-38810
> URL: https://issues.apache.org/jira/browse/SPARK-38810
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> There are some problems with Python Black: 
> [https://github.com/psf/black/issues/2964|https://github.com/psf/black/issues/2964]
> and we have BLACK_VERSION="21.12b0" hard-coded in `/dev/reformat-python`:
> python -m black
> Traceback (most recent call last):
>   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
> return _run_code(code, main_globals, None,
>   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
> exec(code, run_globals)
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__main__.py", 
> line 3, in 
> patched_main()
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__init__.py", 
> line 1372, in patched_main
> patch_click()
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__init__.py", 
> line 1358, in patch_click
> from click import _unicodefun
> ImportError: cannot import name '_unicodefun' from 'click' 
> (/usr/lib/python3.10/site-packages/click/__init__.py)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38939) Support ALTER TABLE ... DROP COLUMN [IF EXISTS] .. syntax

2022-04-18 Thread Jackie Zhang (Jira)
Jackie Zhang created SPARK-38939:


 Summary: Support ALTER TABLE ... DROP COLUMN [IF EXISTS] .. syntax
 Key: SPARK-38939
 URL: https://issues.apache.org/jira/browse/SPARK-38939
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.1, 3.2.0, 3.3.0
Reporter: Jackie Zhang


Current 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38939) Support ALTER TABLE ... DROP COLUMN [IF EXISTS] .. syntax

2022-04-18 Thread Jackie Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jackie Zhang updated SPARK-38939:
-
Description: Currently the `ALTER TABLE ... DROP COLUMN(s) ...` syntax will 
always throw an error if the column doesn't exist. We would like to provide an (IF 
EXISTS) syntax for a better user experience, and to make it consistent with 
other commands such as `DROP TABLE (IF EXISTS)`, etc.  (was: Current )

> Support ALTER TABLE ... DROP COLUMN [IF EXISTS] .. syntax
> -
>
> Key: SPARK-38939
> URL: https://issues.apache.org/jira/browse/SPARK-38939
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0, 3.2.1, 3.3.0
>Reporter: Jackie Zhang
>Priority: Major
>
> Currently the `ALTER TABLE ... DROP COLUMN(s) ...` syntax will always throw an error 
> if the column doesn't exist. We would like to provide an (IF EXISTS) syntax 
> for a better user experience, and to make it consistent with other commands 
> such as `DROP TABLE (IF EXISTS)`, etc.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38851) Refactor `HistoryServerSuite` to add UTs for RocksDB

2022-04-18 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-38851.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36138
[https://github.com/apache/spark/pull/36138]

> Refactor `HistoryServerSuite` to add UTs for RocksDB
> 
>
> Key: SPARK-38851
> URL: https://issues.apache.org/jira/browse/SPARK-38851
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>
> HistoryServerSuite currently tests only the LevelDB backend.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38851) Refactor `HistoryServerSuite` to add UTs for RocksDB

2022-04-18 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-38851:
-

Assignee: Yang Jie

> Refactor `HistoryServerSuite` to add UTs for RocksDB
> 
>
> Key: SPARK-38851
> URL: https://issues.apache.org/jira/browse/SPARK-38851
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>
> HistoryServerSuite currently tests only the LevelDB backend.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38939) Support ALTER TABLE ... DROP COLUMN [IF EXISTS] .. syntax

2022-04-18 Thread Jackie Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jackie Zhang updated SPARK-38939:
-
Description: Currently the `ALTER TABLE ... DROP COLUMN(s) ...` syntax will 
always throw an error if the column doesn't exist. We would like to provide an (IF 
EXISTS) syntax to give a better user experience for downstream handlers (such 
as Delta) that support it, and to make it consistent with other commands such as 
`DROP TABLE (IF EXISTS)`  (was: Currently the `ALTER TABLE ... DROP COLUMN(s) ...` 
syntax will always throw an error if the column doesn't exist. We would like to 
provide an (IF EXISTS) syntax for a better user experience, and to make it 
consistent with other commands such as `DROP TABLE (IF EXISTS)`, etc.)

> Support ALTER TABLE ... DROP COLUMN [IF EXISTS] .. syntax
> -
>
> Key: SPARK-38939
> URL: https://issues.apache.org/jira/browse/SPARK-38939
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0, 3.2.1, 3.3.0
>Reporter: Jackie Zhang
>Priority: Major
>
> Currently the `ALTER TABLE ... DROP COLUMN(s) ...` syntax always throws an 
> error if the column doesn't exist. We would like to add an (IF EXISTS) form 
> for a better user experience with downstream handlers (such as Delta) that 
> support it, consistent with other DDL statements such as `DROP TABLE (IF 
> EXISTS)`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38856) Fix a rejectedExecutionException error when push-based shuffle is enabled

2022-04-18 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-38856.
-
Target Version/s: 3.4.0
Assignee: weixiuli
  Resolution: Fixed

> Fix a rejectedExecutionException error when push-based shuffle is enabled
> -
>
> Key: SPARK-38856
> URL: https://issues.apache.org/jira/browse/SPARK-38856
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0, 3.2.1
>Reporter: weixiuli
>Assignee: weixiuli
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38330) Certificate doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]

2022-04-18 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523891#comment-17523891
 ] 

Steve Loughran commented on SPARK-38330:


FWIW, I'm not 100% sure this is fixed, as we've had local Impala test runs fail 
if hadoop-cos was on the classpath.

Hadoop 3.3.3 will have the unshaded wildfly references though, so switching to 
OpenSSL will be an option. Please test it when I get the RC out this week.
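
For anyone who wants to try the OpenSSL route once that RC is out, a hedged 
sketch is below; it assumes the S3A connector on the classpath supports the 
fs.s3a.ssl.channel.mode option from the wildfly-openssl integration, so treat 
the setting as something to verify against your Hadoop build rather than a 
guaranteed fix.

{code:scala}
// Hedged sketch: ask S3A to use OpenSSL for TLS instead of the default JSSE
// provider. Verify that fs.s3a.ssl.channel.mode exists in your Hadoop build
// before relying on it; the read below is a placeholder.
import org.apache.spark.sql.SparkSession

object S3AOpenSslSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3a-openssl-sketch")
      .master("local[*]")
      // spark.hadoop.* options are forwarded into the Hadoop configuration.
      .config("spark.hadoop.fs.s3a.ssl.channel.mode", "openssl")
      .getOrCreate()

    // Any S3A access exercises the TLS stack; replace the path before running.
    // spark.read.parquet("s3a://your-bucket/some/path").show()

    spark.stop()
  }
}
{code}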

> Certificate doesn't match any of the subject alternative names: 
> [*.s3.amazonaws.com, s3.amazonaws.com]
> --
>
> Key: SPARK-38330
> URL: https://issues.apache.org/jira/browse/SPARK-38330
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Affects Versions: 3.2.1
> Environment: Spark 3.2.1 built with `hadoop-cloud` flag.
> Direct access to s3 using default file committer.
> JDK8.
>  
>Reporter: André F.
>Priority: Major
>
> Trying to run any job after bumping our Spark version from 3.1.2 to 3.2.1, 
> lead us to the current exception while reading files on s3:
> {code:java}
> org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on 
> s3a:///.parquet: com.amazonaws.SdkClientException: Unable to 
> execute HTTP request: Certificate for  doesn't match 
> any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]: 
> Unable to execute HTTP request: Certificate for  doesn't match any of 
> the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com] at 
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:208) at 
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:170) at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3351)
>  at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3185)
>  at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.isDirectory(S3AFileSystem.java:4277) 
> at 
> org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:54)
>  at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
>  at 
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:274) 
> at 
> org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:245)
>  at scala.Option.getOrElse(Option.scala:189) at 
> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:245) at 
> org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:596) {code}
>  
> {code:java}
> Caused by: javax.net.ssl.SSLPeerUnverifiedException: Certificate for 
>  doesn't match any of the subject alternative names: 
> [*.s3.amazonaws.com, s3.amazonaws.com]
>   at 
> com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.verifyHostname(SSLConnectionSocketFactory.java:507)
>   at 
> com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:437)
>   at 
> com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:384)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
>   at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
>   at com.amazonaws.http.conn.$Proxy16.connect(Unknown Source)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
>   at 
> com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java

[jira] [Commented] (SPARK-38825) Add a test to cover parquet notIn filter

2022-04-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523907#comment-17523907
 ] 

Apache Spark commented on SPARK-38825:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/36248

> Add a test to cover parquet notIn filter
> 
>
> Key: SPARK-38825
> URL: https://issues.apache.org/jira/browse/SPARK-38825
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Minor
> Fix For: 3.3.0
>
>
> Add a test to cover parquet filter notIn
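
For readers following along, the scenario such a test covers looks roughly like 
the sketch below: write a small parquet file and read it back with a negated IN 
predicate so a not-in filter can be pushed to the parquet source. This is only 
the shape of the scenario, not the suite code added by the PR.

{code:scala}
// Rough sketch of a "not in" predicate over parquet data; the actual test
// lives in Spark's parquet filter suites.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.not

object ParquetNotInSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-not-in-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val path = java.nio.file.Files.createTempDirectory("parquet-notin").toString + "/t"
    Seq(1, 2, 3, 4, 5).toDF("id").write.parquet(path)

    // The negated isin is what a notIn parquet filter would be built from.
    val df = spark.read.parquet(path).where(not($"id".isin(1, 2)))
    assert(df.collect().map(_.getInt(0)).sorted.sameElements(Array(3, 4, 5)))

    spark.stop()
  }
}
{code}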



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38825) Add a test to cover parquet notIn filter

2022-04-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523909#comment-17523909
 ] 

Apache Spark commented on SPARK-38825:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/36248

> Add a test to cover parquet notIn filter
> 
>
> Key: SPARK-38825
> URL: https://issues.apache.org/jira/browse/SPARK-38825
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Minor
> Fix For: 3.3.0
>
>
> Add a test to cover parquet filter notIn



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38939) Support ALTER TABLE ... DROP [IF EXISTS] COLUMN .. syntax

2022-04-18 Thread Jackie Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jackie Zhang updated SPARK-38939:
-
Summary: Support ALTER TABLE ... DROP [IF EXISTS] COLUMN .. syntax  (was: 
Support ALTER TABLE ... DROP COLUMN [IF EXISTS] .. syntax)

> Support ALTER TABLE ... DROP [IF EXISTS] COLUMN .. syntax
> -
>
> Key: SPARK-38939
> URL: https://issues.apache.org/jira/browse/SPARK-38939
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0, 3.2.1, 3.3.0
>Reporter: Jackie Zhang
>Priority: Major
>
> Currently the `ALTER TABLE ... DROP COLUMN(s) ...` syntax always throws an 
> error if the column doesn't exist. We would like to add an (IF EXISTS) form 
> for a better user experience with downstream handlers (such as Delta) that 
> support it, consistent with other DDL statements such as `DROP TABLE (IF 
> EXISTS)`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38939) Support ALTER TABLE ... DROP [IF EXISTS] COLUMN .. syntax

2022-04-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38939:


Assignee: Apache Spark

> Support ALTER TABLE ... DROP [IF EXISTS] COLUMN .. syntax
> -
>
> Key: SPARK-38939
> URL: https://issues.apache.org/jira/browse/SPARK-38939
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0, 3.2.1, 3.3.0
>Reporter: Jackie Zhang
>Assignee: Apache Spark
>Priority: Major
>
> Currently the `ALTER TABLE ... DROP COLUMN(s) ...` syntax always throws an 
> error if the column doesn't exist. We would like to add an (IF EXISTS) form 
> for a better user experience with downstream handlers (such as Delta) that 
> support it, consistent with other DDL statements such as `DROP TABLE (IF 
> EXISTS)`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38939) Support ALTER TABLE ... DROP [IF EXISTS] COLUMN .. syntax

2022-04-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523946#comment-17523946
 ] 

Apache Spark commented on SPARK-38939:
--

User 'jackierwzhang' has created a pull request for this issue:
https://github.com/apache/spark/pull/36249

> Support ALTER TABLE ... DROP [IF EXISTS] COLUMN .. syntax
> -
>
> Key: SPARK-38939
> URL: https://issues.apache.org/jira/browse/SPARK-38939
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0, 3.2.1, 3.3.0
>Reporter: Jackie Zhang
>Priority: Major
>
> Currently the `ALTER TABLE ... DROP COLUMN(s) ...` syntax always throws an 
> error if the column doesn't exist. We would like to add an (IF EXISTS) form 
> for a better user experience with downstream handlers (such as Delta) that 
> support it, consistent with other DDL statements such as `DROP TABLE (IF 
> EXISTS)`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38939) Support ALTER TABLE ... DROP [IF EXISTS] COLUMN .. syntax

2022-04-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38939:


Assignee: (was: Apache Spark)

> Support ALTER TABLE ... DROP [IF EXISTS] COLUMN .. syntax
> -
>
> Key: SPARK-38939
> URL: https://issues.apache.org/jira/browse/SPARK-38939
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0, 3.2.1, 3.3.0
>Reporter: Jackie Zhang
>Priority: Major
>
> Currently the `ALTER TABLE ... DROP COLUMN(s) ...` syntax always throws an 
> error if the column doesn't exist. We would like to add an (IF EXISTS) form 
> for a better user experience with downstream handlers (such as Delta) that 
> support it, consistent with other DDL statements such as `DROP TABLE (IF 
> EXISTS)`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


