[GitHub] [spark-website] zhengruifeng commented on pull request #478: [SPARK-45195][FOLLOWUP] Update python example

2023-09-18 Thread via GitHub


zhengruifeng commented on PR #478:
URL: https://github.com/apache/spark-website/pull/478#issuecomment-1724871330

   thank you guys





[spark-website] branch asf-site updated: [SPARK-45195][FOLLOWUP] Update python example (#478)

2023-09-18 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new c578724e88 [SPARK-45195][FOLLOWUP] Update python example (#478)
c578724e88 is described below

commit c578724e8818a2ebd5f49e0a9e300bb86c8000cb
Author: Ruifeng Zheng 
AuthorDate: Tue Sep 19 13:53:42 2023 +0800

[SPARK-45195][FOLLOWUP] Update python example (#478)

* init

* address comments

address comments

address comments
---
 index.md| 7 +--
 site/index.html | 7 +--
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/index.md b/index.md
index ada6242742..188ced0d6c 100644
--- a/index.md
+++ b/index.md
@@ -88,12 +88,15 @@ navigation:
 
 
 Run now
-Install with 'pip' or try 
offical image
+Install with 'pip'
 
 
 $ pip install pyspark
 $ pyspark
-$ 
+
+Use the official Docker image
+
+
 $ docker run -it --rm spark:python3 
/opt/spark/bin/pyspark
 
 
diff --git a/site/index.html b/site/index.html
index 3ccc7104ce..e3da4b2a5b 100644
--- a/site/index.html
+++ b/site/index.html
@@ -213,12 +213,15 @@
 
 
 Run now
-Install with 'pip' or try 
offical image
+Install with 'pip'
 
 
 $ pip install pyspark
 $ pyspark
-$ 
+
+Use the official Docker image
+
+
 $ docker run -it --rm spark:python3 
/opt/spark/bin/pyspark
 
 





[GitHub] [spark-website] zhengruifeng merged pull request #478: [SPARK-45195][FOLLOWUP] Update python example

2023-09-18 Thread via GitHub


zhengruifeng merged PR #478:
URL: https://github.com/apache/spark-website/pull/478





[spark] branch branch-3.1 updated: [SPARK-45210][DOCS][3.4] Switch languages consistently across docs for all code snippets (Spark 3.4 and below)

2023-09-18 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 4a418a448ea [SPARK-45210][DOCS][3.4] Switch languages consistently 
across docs for all code snippets (Spark 3.4 and below)
4a418a448ea is described below

commit 4a418a448eab6e1007927db92ecacad6594397c8
Author: Hyukjin Kwon 
AuthorDate: Tue Sep 19 14:51:27 2023 +0900

[SPARK-45210][DOCS][3.4] Switch languages consistently across docs for all 
code snippets (Spark 3.4 and below)

### What changes were proposed in this pull request?

This PR proposes to restore the ability to switch languages consistently across docs for all code snippets in Spark 3.4 and below, by using a proper class selector in the jQuery code. Previously the selector was the string `.nav-link tab_python`, which does not comply with multiple-class selection (https://www.w3.org/TR/CSS21/selector.html#class-html); I assume it worked somewhere as legacy behaviour. Now it uses the standard form `.nav-link.tab_python`.

Note that https://github.com/apache/spark/pull/42657 works because only a single class is assigned there (after we refactored the site in https://github.com/apache/spark/pull/40269).

### Why are the changes needed?

This is a regression in our documentation site.

### Does this PR introduce _any_ user-facing change?

Yes, once you click the language tab, it will apply to the examples in the 
whole page.

### How was this patch tested?

Manually tested after building the site.

![Screenshot 2023-09-19 at 12 08 17 
PM](https://github.com/apache/spark/assets/6477701/09d0c117-9774-4404-8e2e-d454b7f700a3)

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42989 from HyukjinKwon/SPARK-45210.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 796d8785c61e09d1098350657fd44707763687db)
Signed-off-by: Hyukjin Kwon 
---
 docs/js/main.js | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/js/main.js b/docs/js/main.js
index 968097c8041..07df2541f74 100755
--- a/docs/js/main.js
+++ b/docs/js/main.js
@@ -63,7 +63,7 @@ function codeTabs() {
 // while retaining the scroll position
 e.preventDefault();
 var scrollOffset = $(this).offset().top - $(document).scrollTop();
-$("." + $(this).attr('class')).tab('show');
+$("." + $(this).attr('class').split(" ").join(".")).tab('show');
 $(document).scrollTop($(this).offset().top - scrollOffset);
   });
 }
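
As an aside for readers of the diff above: the fix builds a compound class selector from the element's space-separated `class` attribute. A minimal Python sketch of the same string transformation, illustrative only and not part of the patch:

```python
def compound_class_selector(class_attr: str) -> str:
    # Mirrors `"." + $(this).attr('class').split(" ").join(".")` from the fix:
    # a space-separated class attribute becomes a compound class selector.
    return "." + ".".join(class_attr.split(" "))

# The old code produced ".nav-link tab_python" (a descendant selector);
# the compound form ".nav-link.tab_python" matches elements with both classes.
assert compound_class_selector("nav-link tab_python") == ".nav-link.tab_python"
```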





[spark] branch branch-3.2 updated: [SPARK-45210][DOCS][3.4] Switch languages consistently across docs for all code snippets (Spark 3.4 and below)

2023-09-18 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new e428fe902bb [SPARK-45210][DOCS][3.4] Switch languages consistently 
across docs for all code snippets (Spark 3.4 and below)
e428fe902bb is described below

commit e428fe902bb1f12cea973de7fe4b885ae69fd6ca
Author: Hyukjin Kwon 
AuthorDate: Tue Sep 19 14:51:27 2023 +0900

[SPARK-45210][DOCS][3.4] Switch languages consistently across docs for all 
code snippets (Spark 3.4 and below)

### What changes were proposed in this pull request?

This PR proposes to restore the ability to switch languages consistently across docs for all code snippets in Spark 3.4 and below, by using a proper class selector in the jQuery code. Previously the selector was the string `.nav-link tab_python`, which does not comply with multiple-class selection (https://www.w3.org/TR/CSS21/selector.html#class-html); I assume it worked somewhere as legacy behaviour. Now it uses the standard form `.nav-link.tab_python`.

Note that https://github.com/apache/spark/pull/42657 works because only a single class is assigned there (after we refactored the site in https://github.com/apache/spark/pull/40269).

### Why are the changes needed?

This is a regression in our documentation site.

### Does this PR introduce _any_ user-facing change?

Yes, once you click the language tab, it will apply to the examples in the 
whole page.

### How was this patch tested?

Manually tested after building the site.

![Screenshot 2023-09-19 at 12 08 17 
PM](https://github.com/apache/spark/assets/6477701/09d0c117-9774-4404-8e2e-d454b7f700a3)

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42989 from HyukjinKwon/SPARK-45210.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 796d8785c61e09d1098350657fd44707763687db)
Signed-off-by: Hyukjin Kwon 
---
 docs/js/main.js | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/js/main.js b/docs/js/main.js
index 968097c8041..07df2541f74 100755
--- a/docs/js/main.js
+++ b/docs/js/main.js
@@ -63,7 +63,7 @@ function codeTabs() {
 // while retaining the scroll position
 e.preventDefault();
 var scrollOffset = $(this).offset().top - $(document).scrollTop();
-$("." + $(this).attr('class')).tab('show');
+$("." + $(this).attr('class').split(" ").join(".")).tab('show');
 $(document).scrollTop($(this).offset().top - scrollOffset);
   });
 }





[spark] branch branch-3.3 updated: [SPARK-45210][DOCS][3.4] Switch languages consistently across docs for all code snippets (Spark 3.4 and below)

2023-09-18 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 578233e8e0a [SPARK-45210][DOCS][3.4] Switch languages consistently 
across docs for all code snippets (Spark 3.4 and below)
578233e8e0a is described below

commit 578233e8e0a05213ce2c3f9f49525075c2da801f
Author: Hyukjin Kwon 
AuthorDate: Tue Sep 19 14:51:27 2023 +0900

[SPARK-45210][DOCS][3.4] Switch languages consistently across docs for all 
code snippets (Spark 3.4 and below)

### What changes were proposed in this pull request?

This PR proposes to restore the ability to switch languages consistently across docs for all code snippets in Spark 3.4 and below, by using a proper class selector in the jQuery code. Previously the selector was the string `.nav-link tab_python`, which does not comply with multiple-class selection (https://www.w3.org/TR/CSS21/selector.html#class-html); I assume it worked somewhere as legacy behaviour. Now it uses the standard form `.nav-link.tab_python`.

Note that https://github.com/apache/spark/pull/42657 works because only a single class is assigned there (after we refactored the site in https://github.com/apache/spark/pull/40269).

### Why are the changes needed?

This is a regression in our documentation site.

### Does this PR introduce _any_ user-facing change?

Yes, once you click the language tab, it will apply to the examples in the 
whole page.

### How was this patch tested?

Manually tested after building the site.

![Screenshot 2023-09-19 at 12 08 17 
PM](https://github.com/apache/spark/assets/6477701/09d0c117-9774-4404-8e2e-d454b7f700a3)

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42989 from HyukjinKwon/SPARK-45210.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 796d8785c61e09d1098350657fd44707763687db)
Signed-off-by: Hyukjin Kwon 
---
 docs/js/main.js | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/js/main.js b/docs/js/main.js
index 968097c8041..07df2541f74 100755
--- a/docs/js/main.js
+++ b/docs/js/main.js
@@ -63,7 +63,7 @@ function codeTabs() {
 // while retaining the scroll position
 e.preventDefault();
 var scrollOffset = $(this).offset().top - $(document).scrollTop();
-$("." + $(this).attr('class')).tab('show');
+$("." + $(this).attr('class').split(" ").join(".")).tab('show');
 $(document).scrollTop($(this).offset().top - scrollOffset);
   });
 }





[spark] branch branch-3.4 updated: [SPARK-45210][DOCS][3.4] Switch languages consistently across docs for all code snippets (Spark 3.4 and below)

2023-09-18 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 796d8785c61 [SPARK-45210][DOCS][3.4] Switch languages consistently 
across docs for all code snippets (Spark 3.4 and below)
796d8785c61 is described below

commit 796d8785c61e09d1098350657fd44707763687db
Author: Hyukjin Kwon 
AuthorDate: Tue Sep 19 14:51:27 2023 +0900

[SPARK-45210][DOCS][3.4] Switch languages consistently across docs for all 
code snippets (Spark 3.4 and below)

### What changes were proposed in this pull request?

This PR proposes to restore the ability to switch languages consistently across docs for all code snippets in Spark 3.4 and below, by using a proper class selector in the jQuery code. Previously the selector was the string `.nav-link tab_python`, which does not comply with multiple-class selection (https://www.w3.org/TR/CSS21/selector.html#class-html); I assume it worked somewhere as legacy behaviour. Now it uses the standard form `.nav-link.tab_python`.

Note that https://github.com/apache/spark/pull/42657 works because only a single class is assigned there (after we refactored the site in https://github.com/apache/spark/pull/40269).

### Why are the changes needed?

This is a regression in our documentation site.

### Does this PR introduce _any_ user-facing change?

Yes, once you click the language tab, it will apply to the examples in the 
whole page.

### How was this patch tested?

Manually tested after building the site.

![Screenshot 2023-09-19 at 12 08 17 
PM](https://github.com/apache/spark/assets/6477701/09d0c117-9774-4404-8e2e-d454b7f700a3)

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42989 from HyukjinKwon/SPARK-45210.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 docs/js/main.js | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/js/main.js b/docs/js/main.js
index 968097c8041..07df2541f74 100755
--- a/docs/js/main.js
+++ b/docs/js/main.js
@@ -63,7 +63,7 @@ function codeTabs() {
 // while retaining the scroll position
 e.preventDefault();
 var scrollOffset = $(this).offset().top - $(document).scrollTop();
-$("." + $(this).attr('class')).tab('show');
+$("." + $(this).attr('class').split(" ").join(".")).tab('show');
 $(document).scrollTop($(this).offset().top - scrollOffset);
   });
 }





[spark] branch branch-3.4 updated: [SPARK-44910][SQL][3.4] Encoders.bean does not support superclasses with generic type arguments

2023-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new e5d82875463 [SPARK-44910][SQL][3.4] Encoders.bean does not support 
superclasses with generic type arguments
e5d82875463 is described below

commit e5d8287546306f31d3888662669970e46895c155
Author: Giambattista Bloisi 
AuthorDate: Mon Sep 18 21:38:42 2023 -0700

[SPARK-44910][SQL][3.4] Encoders.bean does not support superclasses with 
generic type arguments

### What changes were proposed in this pull request?
This pull request adds Encoders.bean support for beans having a superclass 
declared with generic type arguments.
For example:

```
class JavaBeanWithGenericsA<T> {
public T getPropertyA() {
return null;
}

public void setPropertyA(T a) {

}
}

class JavaBeanWithGenericBase extends JavaBeanWithGenericsA<String> {
}

Encoders.bean(JavaBeanWithGenericBase.class); // Exception
```

That feature was meant to be part of [PR 42327](https://github.com/apache/spark/commit/1f5d78b5952fcc6c7d36d3338a5594070e3a62dd) but was missing, as I was focusing on nested beans only (cc hvanhovell).

### Why are the changes needed?
JavaTypeInference.encoderFor did not resolve TypeVariable objects declared on superclasses, so a case like the example above threw an exception.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests have been extended, and new specific tests have been added.

### Was this patch authored or co-authored using generative AI tooling?
No

hvanhovell this is branch-3.4 port of 
[PR-42634](https://github.com/apache/spark/pull/42634)

Closes #42914 from gbloisi-openaire/SPARK-44910-branch-3.4.

Authored-by: Giambattista Bloisi 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/catalyst/JavaTypeInference.scala |  6 ++-
 ...thGenerics.java => JavaTypeInferenceBeans.java} | 52 +++---
 .../sql/catalyst/JavaTypeInferenceSuite.scala  | 41 +++--
 3 files changed, 88 insertions(+), 11 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
index 75aca3ccbdd..a0341c0d9c8 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
@@ -131,10 +131,13 @@ object JavaTypeInference {
   // TODO: we should only collect properties that have getter and setter. 
However, some tests
   //   pass in scala case class as java bean class which doesn't have 
getter and setter.
   val properties = getJavaBeanReadableProperties(c)
+  // add type variables from inheritance hierarchy of the class
+  val classTV = JavaTypeUtils.getTypeArguments(c, 
classOf[Object]).asScala.toMap ++
+typeVariables
   // Note that the fields are ordered by name.
   val fields = properties.map { property =>
 val readMethod = property.getReadMethod
-val encoder = encoderFor(readMethod.getGenericReturnType, seenTypeSet 
+ c, typeVariables)
+val encoder = encoderFor(readMethod.getGenericReturnType, seenTypeSet 
+ c, classTV)
 // The existence of `javax.annotation.Nonnull`, means this field is 
not nullable.
 val hasNonNull = readMethod.isAnnotationPresent(classOf[Nonnull])
 EncoderField(
@@ -158,4 +161,3 @@ object JavaTypeInference {
   .filter(_.getReadMethod != null)
   }
 }
-
diff --git 
a/sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/JavaBeanWithGenerics.java
 
b/sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/JavaTypeInferenceBeans.java
old mode 100755
new mode 100644
similarity index 54%
rename from 
sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/JavaBeanWithGenerics.java
rename to 
sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/JavaTypeInferenceBeans.java
index b84a3122cf8..8438b75c762
--- 
a/sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/JavaBeanWithGenerics.java
+++ 
b/sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/JavaTypeInferenceBeans.java
@@ -17,25 +17,65 @@
 
 package org.apache.spark.sql.catalyst;
 
-class JavaBeanWithGenerics {
+public class JavaTypeInferenceBeans {
+
+  static class JavaBeanWithGenericsA {
+public T getPropertyA() {
+  return null;
+}
+
+public void setPropertyA(T a) {
+
+}
+  }
+
+  static class JavaBeanWithGenericsAB extends JavaBeanWithGenericsA 
{
+public T getPropertyB() {
+  return null;
+}
+
+public void setPropertyB(T a) {
+
+}
+  }
+
+  static class 

[spark] branch branch-3.5 updated: [SPARK-44910][SQL] Encoders.bean does not support superclasses with generic type arguments

2023-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 6a498087361 [SPARK-44910][SQL] Encoders.bean does not support 
superclasses with generic type arguments
6a498087361 is described below

commit 6a498087361ecbd653821fc283b9ea0fa703c820
Author: Giambattista Bloisi 
AuthorDate: Mon Sep 18 21:37:09 2023 -0700

[SPARK-44910][SQL] Encoders.bean does not support superclasses with generic 
type arguments

### What changes were proposed in this pull request?
This pull request adds Encoders.bean support for beans having a superclass 
declared with generic type arguments.
For example:

```
class JavaBeanWithGenericsA<T> {
public T getPropertyA() {
return null;
}

public void setPropertyA(T a) {

}
}

class JavaBeanWithGenericBase extends JavaBeanWithGenericsA<String> {
}

Encoders.bean(JavaBeanWithGenericBase.class); // Exception
```

That feature was meant to be part of [PR 42327](https://github.com/apache/spark/commit/1f5d78b5952fcc6c7d36d3338a5594070e3a62dd) but was missing, as I was focusing on nested beans only (cc hvanhovell).

### Why are the changes needed?
JavaTypeInference.encoderFor did not resolve TypeVariable objects declared on superclasses, so a case like the example above threw an exception.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests have been extended, and new specific tests have been added.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #42634 from gbloisi-openaire/SPARK-44910.

Lead-authored-by: Giambattista Bloisi 
Co-authored-by: gbloisi-openaire 
<141144100+gbloisi-opena...@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 7e14c8cc33f0ed0a9c53a888e8a3b17dd2a5d493)
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/catalyst/JavaTypeInference.scala |  5 ++-
 ...thGenerics.java => JavaTypeInferenceBeans.java} | 51 +++---
 .../sql/catalyst/JavaTypeInferenceSuite.scala  | 41 +++--
 3 files changed, 88 insertions(+), 9 deletions(-)

diff --git 
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala 
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
index 3d536b735db..191ccc52544 100644
--- 
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
+++ 
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
@@ -130,10 +130,13 @@ object JavaTypeInference {
   // TODO: we should only collect properties that have getter and setter. 
However, some tests
   //   pass in scala case class as java bean class which doesn't have 
getter and setter.
   val properties = getJavaBeanReadableProperties(c)
+  // add type variables from inheritance hierarchy of the class
+  val classTV = JavaTypeUtils.getTypeArguments(c, 
classOf[Object]).asScala.toMap ++
+typeVariables
   // Note that the fields are ordered by name.
   val fields = properties.map { property =>
 val readMethod = property.getReadMethod
-val encoder = encoderFor(readMethod.getGenericReturnType, seenTypeSet 
+ c, typeVariables)
+val encoder = encoderFor(readMethod.getGenericReturnType, seenTypeSet 
+ c, classTV)
 // The existence of `javax.annotation.Nonnull`, means this field is 
not nullable.
 val hasNonNull = readMethod.isAnnotationPresent(classOf[Nonnull])
 EncoderField(
diff --git 
a/sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/JavaBeanWithGenerics.java
 
b/sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/JavaTypeInferenceBeans.java
similarity index 54%
rename from 
sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/JavaBeanWithGenerics.java
rename to 
sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/JavaTypeInferenceBeans.java
index b84a3122cf8..cc3540717ee 100644
--- 
a/sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/JavaBeanWithGenerics.java
+++ 
b/sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/JavaTypeInferenceBeans.java
@@ -17,25 +17,66 @@
 
 package org.apache.spark.sql.catalyst;
 
-class JavaBeanWithGenerics {
+public class JavaTypeInferenceBeans {
+
+  static class JavaBeanWithGenericsA {
+public T getPropertyA() {
+  return null;
+}
+
+public void setPropertyA(T a) {
+
+}
+  }
+
+  static class JavaBeanWithGenericsAB extends JavaBeanWithGenericsA 
{
+public T getPropertyB() {
+  return null;
+}
+
+public void setPropertyB(T a) {
+
+}
+  }
+
+  static class JavaBeanWithGenericsABC extends JavaBeanWithGenericsAB 
{
+

[spark] branch master updated: [SPARK-44910][SQL] Encoders.bean does not support superclasses with generic type arguments

2023-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7e14c8cc33f [SPARK-44910][SQL] Encoders.bean does not support 
superclasses with generic type arguments
7e14c8cc33f is described below

commit 7e14c8cc33f0ed0a9c53a888e8a3b17dd2a5d493
Author: Giambattista Bloisi 
AuthorDate: Mon Sep 18 21:37:09 2023 -0700

[SPARK-44910][SQL] Encoders.bean does not support superclasses with generic 
type arguments

### What changes were proposed in this pull request?
This pull request adds Encoders.bean support for beans having a superclass 
declared with generic type arguments.
For example:

```
class JavaBeanWithGenericsA<T> {
public T getPropertyA() {
return null;
}

public void setPropertyA(T a) {

}
}

class JavaBeanWithGenericBase extends JavaBeanWithGenericsA<String> {
}

Encoders.bean(JavaBeanWithGenericBase.class); // Exception
```

That feature was meant to be part of [PR 42327](https://github.com/apache/spark/commit/1f5d78b5952fcc6c7d36d3338a5594070e3a62dd) but was missing, as I was focusing on nested beans only (cc hvanhovell).

### Why are the changes needed?
JavaTypeInference.encoderFor did not resolve TypeVariable objects declared on superclasses, so a case like the example above threw an exception.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests have been extended, and new specific tests have been added.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #42634 from gbloisi-openaire/SPARK-44910.

Lead-authored-by: Giambattista Bloisi 
Co-authored-by: gbloisi-openaire 
<141144100+gbloisi-opena...@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/catalyst/JavaTypeInference.scala |  5 ++-
 ...thGenerics.java => JavaTypeInferenceBeans.java} | 51 +++---
 .../sql/catalyst/JavaTypeInferenceSuite.scala  | 41 +++--
 3 files changed, 88 insertions(+), 9 deletions(-)

diff --git 
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala 
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
index 3d536b735db..191ccc52544 100644
--- 
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
+++ 
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
@@ -130,10 +130,13 @@ object JavaTypeInference {
   // TODO: we should only collect properties that have getter and setter. 
However, some tests
   //   pass in scala case class as java bean class which doesn't have 
getter and setter.
   val properties = getJavaBeanReadableProperties(c)
+  // add type variables from inheritance hierarchy of the class
+  val classTV = JavaTypeUtils.getTypeArguments(c, 
classOf[Object]).asScala.toMap ++
+typeVariables
   // Note that the fields are ordered by name.
   val fields = properties.map { property =>
 val readMethod = property.getReadMethod
-val encoder = encoderFor(readMethod.getGenericReturnType, seenTypeSet 
+ c, typeVariables)
+val encoder = encoderFor(readMethod.getGenericReturnType, seenTypeSet 
+ c, classTV)
 // The existence of `javax.annotation.Nonnull`, means this field is 
not nullable.
 val hasNonNull = readMethod.isAnnotationPresent(classOf[Nonnull])
 EncoderField(
diff --git 
a/sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/JavaBeanWithGenerics.java
 
b/sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/JavaTypeInferenceBeans.java
similarity index 54%
rename from 
sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/JavaBeanWithGenerics.java
rename to 
sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/JavaTypeInferenceBeans.java
index b84a3122cf8..cc3540717ee 100644
--- 
a/sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/JavaBeanWithGenerics.java
+++ 
b/sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/JavaTypeInferenceBeans.java
@@ -17,25 +17,66 @@
 
 package org.apache.spark.sql.catalyst;
 
-class JavaBeanWithGenerics {
+public class JavaTypeInferenceBeans {
+
+  static class JavaBeanWithGenericsA {
+public T getPropertyA() {
+  return null;
+}
+
+public void setPropertyA(T a) {
+
+}
+  }
+
+  static class JavaBeanWithGenericsAB extends JavaBeanWithGenericsA 
{
+public T getPropertyB() {
+  return null;
+}
+
+public void setPropertyB(T a) {
+
+}
+  }
+
+  static class JavaBeanWithGenericsABC extends JavaBeanWithGenericsAB 
{
+public T getPropertyC() {
+  return null;
+}
+
+public void setPropertyC(T a) {
+
+}
+  }
+
+  

[spark] branch master updated: [SPARK-43654][CONNECT][PS][TESTS] Enable `InternalFrameParityTests.test_from_pandas`

2023-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2697a9426e1 [SPARK-43654][CONNECT][PS][TESTS] Enable 
`InternalFrameParityTests.test_from_pandas`
2697a9426e1 is described below

commit 2697a9426e174e30c797a27660f20e42cf2faefc
Author: Haejoon Lee 
AuthorDate: Mon Sep 18 21:26:02 2023 -0700

[SPARK-43654][CONNECT][PS][TESTS] Enable 
`InternalFrameParityTests.test_from_pandas`

### What changes were proposed in this pull request?

This PR proposes to enable `InternalFrameParityTests.test_from_pandas`

### Why are the changes needed?

To improve the test coverage for Pandas API with Spark Connect.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added UT for Spark Connect.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42956 from itholic/enable_from_pandas.

Authored-by: Haejoon Lee 
Signed-off-by: Dongjoon Hyun 
---
 python/pyspark/pandas/tests/connect/test_parity_internal.py | 4 +---
 python/pyspark/pandas/utils.py  | 2 +-
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/python/pyspark/pandas/tests/connect/test_parity_internal.py 
b/python/pyspark/pandas/tests/connect/test_parity_internal.py
index d586fec57f7..d90b2b260f4 100644
--- a/python/pyspark/pandas/tests/connect/test_parity_internal.py
+++ b/python/pyspark/pandas/tests/connect/test_parity_internal.py
@@ -24,9 +24,7 @@ from pyspark.testing.pandasutils import PandasOnSparkTestUtils
 class InternalFrameParityTests(
 InternalFrameTestsMixin, PandasOnSparkTestUtils, ReusedConnectTestCase
 ):
-@unittest.skip("TODO(SPARK-43654): Enable 
InternalFrameParityTests.test_from_pandas.")
-def test_from_pandas(self):
-super().test_from_pandas()
+pass
 
 
 if __name__ == "__main__":
diff --git a/python/pyspark/pandas/utils.py b/python/pyspark/pandas/utils.py
index ebeb1d69d1b..b647697edf9 100644
--- a/python/pyspark/pandas/utils.py
+++ b/python/pyspark/pandas/utils.py
@@ -954,7 +954,7 @@ def spark_column_equals(left: Column, right: Column) -> 
bool:
 error_class="NOT_COLUMN",
 message_parameters={"arg_name": "right", "arg_type": 
type(right).__name__},
 )
-return repr(left) == repr(right)
+return repr(left).replace("`", "") == repr(right).replace("`", "")
 else:
 return left._jc.equals(right._jc)
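
The change above strips backticks from the column reprs before comparing them, so identifier quoting does not affect equality checks on Spark Connect columns. A small sketch of the idea; the repr strings below are hypothetical and only for illustration:

```python
# Hypothetical reprs of the "same" column, one rendered with backtick-quoted identifiers.
left_repr = "Column<'sum(`value`)'>"
right_repr = "Column<'sum(value)'>"

# A plain string comparison differs, but stripping backticks makes them equal,
# which is what the updated spark_column_equals does for Spark Connect columns.
assert left_repr != right_repr
assert left_repr.replace("`", "") == right_repr.replace("`", "")
```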
 





[GitHub] [spark-website] gatorsmile commented on a diff in pull request #478: [SPARK-45195] Update python example

2023-09-18 Thread via GitHub


gatorsmile commented on code in PR #478:
URL: https://github.com/apache/spark-website/pull/478#discussion_r1329548341


##
index.md:
##
@@ -88,13 +88,17 @@ navigation:
 
 
 Run now
-Install with 'pip' or try 
offical image
+Install with 'pip'
 
 
 $ pip install pyspark
 $ pyspark
-$ 
+
+Try offical image

Review Comment:
   Typo: offical => official 
   
   ```
   Use the official Docker image.
   ```
   
   






[GitHub] [spark-website] gatorsmile commented on a diff in pull request #478: [SPARK-45195] Update python example

2023-09-18 Thread via GitHub


gatorsmile commented on code in PR #478:
URL: https://github.com/apache/spark-website/pull/478#discussion_r1329547154


##
index.md:
##
@@ -88,13 +88,17 @@ navigation:
 
 
 Run now
-Install with 'pip' or try 
offical image
+Install with 'pip'
 
 
 $ pip install pyspark
 $ pyspark
-$ 
+
+Try offical image
+
+
 $ docker run -it --rm spark:python3 
/opt/spark/bin/pyspark
+$ 

Review Comment:
   remove this line?






[GitHub] [spark-website] zhengruifeng opened a new pull request, #478: [SPARK-45195] Update python example

2023-09-18 Thread via GitHub


zhengruifeng opened a new pull request, #478:
URL: https://github.com/apache/spark-website/pull/478

   To address 
https://github.com/apache/spark-website/pull/477#discussion_r1329098596
   
   
![image](https://github.com/apache/spark-website/assets/7322292/691cd53b-c5dd-43eb-a5fa-be1611239194)
   





[spark] branch master updated: [SPARK-45202][BUILD] Fix lint-js tool and js format

2023-09-18 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e818e9cefa0 [SPARK-45202][BUILD] Fix lint-js tool and js format
e818e9cefa0 is described below

commit e818e9cefa012037a38a99819498f331697d3a50
Author: Kent Yao 
AuthorDate: Tue Sep 19 11:05:41 2023 +0800

[SPARK-45202][BUILD] Fix lint-js tool and js format

### What changes were proposed in this pull request?

This PR fixes the lint-js tool, which covered only the JS files in the core module, and also fixes the JS formatting of the remaining modules.

```
dev/lint-js


/Users/kentyao/spark/sql/core/src/main/resources/org/apache/spark/sql/execution/ui/static/spark-sql-viz.js
   29:10  error  'renderPlanViz' is defined but never used  
no-unused-vars
   31:18  error  'd3' is not defined
no-undef
   35:11  error  'graphlibDot' is not defined   
no-undef
   37:22  error  'dagreD3' is not defined   
no-undef
   46:27  error  '$' is not defined 
no-undef
   59:38  error  'd3' is not defined
no-undef
   66:21  error  'd3' is not defined
no-undef
   67:3   error  'd3' is not defined
no-undef
   68:20  error  'd' is defined but never used. Allowed unused args must 
match /^_ignored_.*/u  no-unused-vars
   69:21  error  'd3' is not defined
no-undef
   70:7   error  '$' is not defined 
no-undef
  104:55  error  'i' is defined but never used. Allowed unused args must 
match /^_ignored_.*/u  no-unused-vars
  108:1   error  Expected indentation of 10 spaces but found 12 
indent
  109:1   error  Expected indentation of 10 spaces but found 12 
indent
  115:1   error  Expected indentation of 4 spaces but found 6   
indent
  116:1   error  Expected indentation of 6 spaces but found 8   
indent
  116:16  error  'd3' is not defined
no-undef
  117:1   error  Expected indentation of 4 spaces but found 6   
indent
  118:1   error  Expected indentation of 2 spaces but found 4   
indent
  129:13  error  'd3' is not defined
no-undef
  130:34  error  'd3' is not defined
no-undef
  133:13  error  'd3' is not defined
no-undef
  134:34  error  'd3' is not defined
no-undef
  137:13  error  'd3' is not defined
no-undef
  138:15  error  'd3' is not defined
no-undef
  142:13  error  'd3' is not defined
no-undef
  143:15  error  'd3' is not defined
no-undef
  180:11  error  'd3' is not defined
no-undef
  196:3   error  '$' is not defined 
no-undef
  198:26  error  '$' is not defined 
no-undef
  199:1   error  Expected indentation of 6 spaces but found 8   
indent
  200:1   error  Expected indentation of 8 spaces but found 10  
indent
  201:1   error  Expected indentation of 8 spaces but found 10  
indent
  201:28  error  'd3' is not defined
no-undef
  201:41  error  '$' is not defined 
no-undef
  202:1   error  Expected indentation of 8 spaces but found 10  
indent
  203:1   error  Expected indentation of 8 spaces but found 10  
indent
  204:1   error  Expected indentation of 8 spaces but found 10  

[spark] branch branch-3.5 updated: Revert "Revert "[SPARK-44742][PYTHON][DOCS] Add Spark version drop down to the PySpark doc site""

2023-09-18 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 555c8def51e Revert "Revert "[SPARK-44742][PYTHON][DOCS] Add Spark 
version drop down to the PySpark doc site""
555c8def51e is described below

commit 555c8def51e5951c7bf5165a332795e9e330ec9d
Author: Hyukjin Kwon 
AuthorDate: Tue Sep 19 10:18:18 2023 +0900

Revert "Revert "[SPARK-44742][PYTHON][DOCS] Add Spark version drop down to 
the PySpark doc site""

This reverts commit bbe12e148eb1f289cfb1f4412525f4c4381c10a9.
---
 python/docs/source/_static/css/pyspark.css | 13 
 python/docs/source/_static/versions.json   | 22 +++
 .../docs/source/_templates/version-switcher.html   | 77 ++
 python/docs/source/conf.py |  9 ++-
 4 files changed, 120 insertions(+), 1 deletion(-)

diff --git a/python/docs/source/_static/css/pyspark.css 
b/python/docs/source/_static/css/pyspark.css
index 89b7c65f27a..ccfe60f2bca 100644
--- a/python/docs/source/_static/css/pyspark.css
+++ b/python/docs/source/_static/css/pyspark.css
@@ -95,3 +95,16 @@ u.bd-sidebar .nav>li>ul>.active:hover>a,.bd-sidebar 
.nav>li>ul>.active>a {
 .spec_table tr, td, th {
 border-top: none!important;
 }
+
+/* Styling to the version dropdown */
+#version-button {
+  padding-left: 0.2rem;
+  padding-right: 3.2rem;
+}
+
+#version_switcher {
+  height: auto;
+  max-height: 300px;
+  width: 165px;
+  overflow-y: auto;
+}
diff --git a/python/docs/source/_static/versions.json 
b/python/docs/source/_static/versions.json
new file mode 100644
index 000..3d0bd148180
--- /dev/null
+++ b/python/docs/source/_static/versions.json
@@ -0,0 +1,22 @@
+[
+{
+"name": "3.4.1",
+"version": "3.4.1"
+},
+{
+"name": "3.4.0",
+"version": "3.4.0"
+},
+{
+"name": "3.3.2",
+"version": "3.3.2"
+},
+{
+"name": "3.3.1",
+"version": "3.3.1"
+},
+{
+"name": "3.3.0",
+"version": "3.3.0"
+}
+]
diff --git a/python/docs/source/_templates/version-switcher.html 
b/python/docs/source/_templates/version-switcher.html
new file mode 100644
index 000..16c443229f4
--- /dev/null
+++ b/python/docs/source/_templates/version-switcher.html
@@ -0,0 +1,77 @@
+
+
+
+
+{{ release }}
+
+
+
+
+
+
+
+
+// Function to construct the target URL from the JSON components
+function buildURL(entry) {
+var template = "{{ switcher_template_url }}";  // supplied by jinja
+template = template.replace("{version}", entry.version);
+return template;
+}
+
+// Function to check if corresponding page path exists in other version of docs
+// and, if so, go there instead of the homepage of the other docs version
+function checkPageExistsAndRedirect(event) {
+const currentFilePath = "{{ pagename }}.html",
+  otherDocsHomepage = event.target.getAttribute("href");
+let tryUrl = `${otherDocsHomepage}${currentFilePath}`;
+$.ajax({
+type: 'HEAD',
+url: tryUrl,
+// if the page exists, go there
+success: function() {
+location.href = tryUrl;
+}
+}).fail(function() {
+location.href = otherDocsHomepage;
+});
+return false;
+}
+
+// Function to populate the version switcher
+(function () {
+// get JSON config
+$.getJSON("{{ switcher_json_url }}", function(data, textStatus, jqXHR) {
+// create the nodes first (before AJAX calls) to ensure the order is
+// correct (for now, links will go to doc version homepage)
+$.each(data, function(index, entry) {
+// if no custom name specified (e.g., "latest"), use version string
+if (!("name" in entry)) {
+entry.name = entry.version;
+}
+// construct the appropriate URL, and add it to the dropdown
+entry.url = buildURL(entry);
+const node = document.createElement("a");
+node.setAttribute("class", "list-group-item list-group-item-action 
py-1");
+node.setAttribute("href", `${entry.url}`);
+node.textContent = `${entry.name}`;
+node.onclick = checkPageExistsAndRedirect;
+$("#version_switcher").append(node);
+});
+});
+})();
+
diff --git a/python/docs/source/conf.py b/python/docs/source/conf.py
index 38c331048e7..0f57cb37cee 100644
--- a/python/docs/source/conf.py
+++ b/python/docs/source/conf.py
@@ -177,10 +177,17 @@ autosummary_generate = True
 # a list of builtin themes.
 html_theme = 'pydata_sphinx_theme'
 
+html_context = {
+"switcher_json_url": "_static/versions.json",
+"switcher_template_url": 
"https://spark.apache.org/docs/{version}/api/python/index.html;,
+}
+
 # Theme options are theme-specific 

[GitHub] [spark-website] zhengruifeng commented on a diff in pull request #477: [SPARK-45195] Update examples with docker official image

2023-09-18 Thread via GitHub


zhengruifeng commented on code in PR #477:
URL: https://github.com/apache/spark-website/pull/477#discussion_r1329402279


##
index.md:
##
@@ -88,11 +88,13 @@ navigation:
 
 
 Run now
-Installing with 'pip'
+Install with 'pip' or try 
offical image
 
 
 $ pip install pyspark
 $ pyspark
+$ 
+$ docker run -it --rm spark:python3 
/opt/spark/bin/pyspark

Review Comment:
   make sense, will send a follow up pr






[spark] branch master updated: [SPARK-45113][PYTHON][DOCS][FOLLOWUP] Add sorting to the example of `collect_set/collect_list` to ensure stable results

2023-09-18 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 464a3c19f51 [SPARK-45113][PYTHON][DOCS][FOLLOWUP] Add sorting to the 
example of `collect_set/collect_list` to ensure stable results
464a3c19f51 is described below

commit 464a3c19f51081914a09d4394820da059cb2ee47
Author: yangjie01 
AuthorDate: Tue Sep 19 08:48:03 2023 +0900

[SPARK-45113][PYTHON][DOCS][FOLLOWUP] Add sorting to the example of 
`collect_set/collect_list` to ensure stable results

### What changes were proposed in this pull request?
This PR adds a `sort_array` around the outputs of `collect_set` and `collect_list` in the examples to ensure the output is stable.

### Why are the changes needed?
When the examples of `collect_set` and `collect_list` are executed with different Scala versions, the output order may differ, causing the daily tests on Scala 2.13 to fail:

- https://github.com/apache/spark/actions/runs/6209111340/job/16856005714

```
**
File "/__w/spark/spark/python/pyspark/sql/connect/functions.py", line 1030, 
in pyspark.sql.connect.functions.collect_set
Failed example:
df.select(sf.collect_set('age')).show()
Expected:
++
|collect_set(age)|
++
|  [5, 2]|
++
Got:
++
|collect_set(age)|
++
|  [2, 5]|
++

**
   1 of   9 in pyspark.sql.connect.functions.collect_set
***Test Failed*** 1 failures.

Had test failures in pyspark.sql.connect.functions with python3.9; see logs.
Error:  running /__w/spark/spark/python/run-tests --modules=pyspark-connect 
--parallelism=1 ; received return code 255
Error: Process completed with exit code 19.
```
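
The fix wraps the aggregates in `sort_array` so the displayed result no longer depends on collection order. A minimal sketch of the approach, assuming a running SparkSession bound to `spark`:

```python
from pyspark.sql import functions as sf

df = spark.createDataFrame([(2,), (5,), (5,)], ("age",))

# sort_array makes the doctest output deterministic regardless of the
# order in which collect_set happened to gather the values.
df.select(sf.sort_array(sf.collect_set("age")).alias("sorted_set")).show()
# +----------+
# |sorted_set|
# +----------+
# |    [2, 5]|
# +----------+
```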

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #42968 from LuciferYang/SPARK-45113-FOLLOWUP.

Lead-authored-by: yangjie01 
Co-authored-by: YangJie 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/functions.py | 96 -
 1 file changed, 48 insertions(+), 48 deletions(-)

diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 54bd330ebc0..5474873df7b 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -3671,39 +3671,39 @@ def collect_list(col: "ColumnOrName") -> Column:
 
 Examples
 
-Example 1: Collect values from a single column DataFrame
+Example 1: Collect values from a DataFrame and sort the result in 
ascending order
 
 >>> from pyspark.sql import functions as sf
->>> df = spark.createDataFrame([(2,), (5,), (5,)], ('age',))
->>> df.select(sf.collect_list('age')).show()
-+-+
-|collect_list(age)|
-+-+
-|[2, 5, 5]|
-+-+
-
-Example 2: Collect values from a DataFrame with multiple columns
-
->>> from pyspark.sql import functions as sf
->>> df = spark.createDataFrame([(1, "John"), (2, "John"), (3, "Ana")], 
("id", "name"))
->>> df.groupBy("name").agg(sf.collect_list('id')).show()
-+++
-|name|collect_list(id)|
-+++
-|John|  [1, 2]|
-| Ana| [3]|
-+++
+>>> df = spark.createDataFrame([(1,), (2,), (2,)], ('value',))
+>>> 
df.select(sf.sort_array(sf.collect_list('value')).alias('sorted_list')).show()
++---+
+|sorted_list|
++---+
+|  [1, 2, 2]|
++---+
 
-Example 3: Collect values from a DataFrame and sort the result
+Example 2: Collect values from a DataFrame and sort the result in 
descending order
 
 >>> from pyspark.sql import functions as sf
->>> df = spark.createDataFrame([(1,), (2,), (2,)], ('value',))
->>> 
df.select(sf.array_sort(sf.collect_list('value')).alias('sorted_list')).show()
+>>> df = spark.createDataFrame([(2,), (5,), (5,)], ('age',))
+>>> df.select(sf.sort_array(sf.collect_list('age'), 
asc=False).alias('sorted_list')).show()
 +---+
 |sorted_list|
 +---+
-|  [1, 2, 2]|
+|  [5, 5, 2]|
 +---+
+
+Example 3: Collect values from a DataFrame with multiple columns and sort 
the result
+
+>>> from pyspark.sql import functions as sf
+>>> df = spark.createDataFrame([(1, "John"), (2, "John"), (3, "Ana")], 

[spark] branch branch-3.5 updated: [SPARK-45167][CONNECT][PYTHON][FOLLOW-UP] Use lighter threading Rlock, and use the existing eventually util function

2023-09-18 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 2a9dd2b3968 [SPARK-45167][CONNECT][PYTHON][FOLLOW-UP] Use lighter 
threading Rlock, and use the existing eventually util function
2a9dd2b3968 is described below

commit 2a9dd2b3968da7c2e96c502aaf4c158ee782e5f4
Author: Hyukjin Kwon 
AuthorDate: Mon Sep 18 13:46:34 2023 +0900

[SPARK-45167][CONNECT][PYTHON][FOLLOW-UP] Use lighter threading Rlock, and 
use the existing eventually util function

This PR is a followup of https://github.com/apache/spark/pull/42929 that:
- Uses the lighter `threading.RLock` instead of the multiprocessing `RLock`. Multiprocessing does not work with PySpark because of the ser/de problem for socket connections, among other issues.
- Uses the existing `eventually` util function (`pyspark.testing.eventually`) instead of `assertEventually`, to deduplicate code; see the usage sketch below.
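
For reference, a minimal sketch of how the shared helper is invoked in the updated test; the flag below is a hypothetical stand-in, and the real checks assert on the stub's release/execute call counts as shown in the diff:

```python
from pyspark.testing.utils import eventually

background_work_is_done = True  # hypothetical stand-in for the state the test polls

def check():
    # Placeholder assertion; the real tests inspect the reattach stub instead.
    assert background_work_is_done

# Retry `check` for up to 1 second, swallowing AssertionError until the deadline.
eventually(timeout=1, catch_assertions=True)(check)()
```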

Mainly for code clean-up.

No.

Existing tests should pass them.

No.

Closes #42965 from HyukjinKwon/SPARK-45167-followup.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit d5ff04da217df483d27011f6e38417df2eaa42bd)
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/connect/client/reattach.py  |  5 ++---
 .../sql/tests/connect/client/test_client.py| 23 +-
 2 files changed, 7 insertions(+), 21 deletions(-)

diff --git a/python/pyspark/sql/connect/client/reattach.py 
b/python/pyspark/sql/connect/client/reattach.py
index e58864b965b..6addb5bd2c6 100644
--- a/python/pyspark/sql/connect/client/reattach.py
+++ b/python/pyspark/sql/connect/client/reattach.py
@@ -18,12 +18,11 @@ from pyspark.sql.connect.utils import check_dependencies
 
 check_dependencies(__name__)
 
+from threading import RLock
 import warnings
 import uuid
 from collections.abc import Generator
 from typing import Optional, Dict, Any, Iterator, Iterable, Tuple, Callable, 
cast, Type, ClassVar
-from multiprocessing import RLock
-from multiprocessing.synchronize import RLock as RLockBase
 from multiprocessing.pool import ThreadPool
 import os
 
@@ -56,7 +55,7 @@ class ExecutePlanResponseReattachableIterator(Generator):
 """
 
 # Lock to manage the pool
-_lock: ClassVar[RLockBase] = RLock()
+_lock: ClassVar[RLock] = RLock()
 _release_thread_pool: Optional[ThreadPool] = ThreadPool(os.cpu_count() if 
os.cpu_count() else 8)
 
 @classmethod
diff --git a/python/pyspark/sql/tests/connect/client/test_client.py 
b/python/pyspark/sql/tests/connect/client/test_client.py
index cf43fb16df7..93b7006799b 100644
--- a/python/pyspark/sql/tests/connect/client/test_client.py
+++ b/python/pyspark/sql/tests/connect/client/test_client.py
@@ -25,6 +25,7 @@ import grpc
 from pyspark.sql.connect.client import SparkConnectClient, ChannelBuilder
 import pyspark.sql.connect.proto as proto
 from pyspark.testing.connectutils import should_test_connect, 
connect_requirement_message
+from pyspark.testing.utils import eventually
 
 from pyspark.sql.connect.client.core import Retrying
 from pyspark.sql.connect.client.reattach import (
@@ -152,20 +153,6 @@ class 
SparkConnectClientReattachTestCase(unittest.TestCase):
 attach_ops=ResponseGenerator(attach) if attach is not None else 
None,
 )
 
-def assertEventually(self, callable, timeout_ms=1000):
-"""Helper method that will continuously evaluate the callable to not 
raise an
-exception."""
-import time
-
-limit = time.monotonic_ns() + timeout_ms * 1000 * 1000
-while time.monotonic_ns() < limit:
-try:
-callable()
-break
-except Exception:
-time.sleep(0.1)
-callable()
-
 def test_basic_flow(self):
 stub = self._stub_with([self.response, self.finished])
 ite = ExecutePlanResponseReattachableIterator(self.request, stub, 
self.policy, [])
@@ -178,7 +165,7 @@ class SparkConnectClientReattachTestCase(unittest.TestCase):
 self.assertEqual(1, stub.release_calls)
 self.assertEqual(1, stub.execute_calls)
 
-self.assertEventually(check_all, timeout_ms=1000)
+eventually(timeout=1, catch_assertions=True)(check_all)()
 
 def test_fail_during_execute(self):
 def fatal():
@@ -196,7 +183,7 @@ class SparkConnectClientReattachTestCase(unittest.TestCase):
 self.assertEqual(1, stub.release_until_calls)
 self.assertEqual(1, stub.execute_calls)
 
-self.assertEventually(check, timeout_ms=1000)
+eventually(timeout=1, catch_assertions=True)(check)()
 
 def test_fail_and_retry_during_execute(self):
 def non_fatal():
@@ -215,7 +202,7 @@ class SparkConnectClientReattachTestCase(unittest.TestCase):
 

[spark] branch branch-3.5 updated: [SPARK-45167][CONNECT][PYTHON][3.5] Python client must call `release_all`

2023-09-18 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 60073f31831 [SPARK-45167][CONNECT][PYTHON][3.5] Python client must 
call `release_all`
60073f31831 is described below

commit 60073f318313ab2329ea1504ef7538641433852e
Author: Martin Grund 
AuthorDate: Tue Sep 19 08:32:21 2023 +0900

[SPARK-45167][CONNECT][PYTHON][3.5] Python client must call `release_all`

### What changes were proposed in this pull request?

Cherry-pick of https://github.com/apache/spark/pull/42929

Previously the Python client would not call `release_all` after fetching all results, leaving the query dangling; the query would then only be removed after the five-minute timeout.

This patch adds proper testing for calling `release_all` and `release_until`.

In addition, it fixes a test race condition where closing the SparkSession would in turn close the gRPC channel while asynchronous release calls might still be in flight.
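
A standalone sketch of the pool-lifecycle pattern in the patch below: a class-level thread pool guarded by a lock, shut down before the channel closes and lazily recreated on next use. Names are simplified and it uses `threading.RLock` per the follow-up commit above; this is not the actual client class:

```python
import os
from threading import RLock
from multiprocessing.pool import ThreadPool


class ReleasePool:
    """Simplified stand-in for the class-level pool used by the reattachable iterator."""

    _lock = RLock()
    _pool = ThreadPool(os.cpu_count() or 8)

    @classmethod
    def shutdown(cls) -> None:
        # Called before the channel closes so no release call is left dangling.
        with cls._lock:
            if cls._pool is not None:
                cls._pool.close()
                cls._pool.join()
                cls._pool = None

    @classmethod
    def initialize_if_necessary(cls) -> None:
        # Recreate the pool exactly once if a new iterator is built after shutdown.
        with cls._lock:
            if cls._pool is None:
                cls._pool = ThreadPool(os.cpu_count() or 8)
```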

### Why are the changes needed?
Stability

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
new UT

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #42973 from grundprinzip/SPARK-45167-3.5.

Authored-by: Martin Grund 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/connect/client/core.py  |   1 +
 python/pyspark/sql/connect/client/reattach.py  |  37 +++-
 .../sql/tests/connect/client/test_client.py| 195 -
 3 files changed, 226 insertions(+), 7 deletions(-)

diff --git a/python/pyspark/sql/connect/client/core.py 
b/python/pyspark/sql/connect/client/core.py
index 7b3299d123b..7b1aafbefeb 100644
--- a/python/pyspark/sql/connect/client/core.py
+++ b/python/pyspark/sql/connect/client/core.py
@@ -1005,6 +1005,7 @@ class SparkConnectClient(object):
 """
 Close the channel.
 """
+ExecutePlanResponseReattachableIterator.shutdown()
 self._channel.close()
 self._closed = True
 
diff --git a/python/pyspark/sql/connect/client/reattach.py 
b/python/pyspark/sql/connect/client/reattach.py
index 7e1e722d5fd..e58864b965b 100644
--- a/python/pyspark/sql/connect/client/reattach.py
+++ b/python/pyspark/sql/connect/client/reattach.py
@@ -21,7 +21,9 @@ check_dependencies(__name__)
 import warnings
 import uuid
 from collections.abc import Generator
-from typing import Optional, Dict, Any, Iterator, Iterable, Tuple, Callable, 
cast
+from typing import Optional, Dict, Any, Iterator, Iterable, Tuple, Callable, 
cast, Type, ClassVar
+from multiprocessing import RLock
+from multiprocessing.synchronize import RLock as RLockBase
 from multiprocessing.pool import ThreadPool
 import os
 
@@ -53,7 +55,30 @@ class ExecutePlanResponseReattachableIterator(Generator):
 ReleaseExecute RPCs that instruct the server to release responses that it 
already processed.
 """
 
-_release_thread_pool = ThreadPool(os.cpu_count() if os.cpu_count() else 8)
+# Lock to manage the pool
+_lock: ClassVar[RLockBase] = RLock()
+_release_thread_pool: Optional[ThreadPool] = ThreadPool(os.cpu_count() if 
os.cpu_count() else 8)
+
+@classmethod
+def shutdown(cls: Type["ExecutePlanResponseReattachableIterator"]) -> None:
+"""
+When the channel is closed, this method will be called before, to make 
sure all
+outstanding calls are closed.
+"""
+with cls._lock:
+if cls._release_thread_pool is not None:
+cls._release_thread_pool.close()
+cls._release_thread_pool.join()
+cls._release_thread_pool = None
+
+@classmethod
+def _initialize_pool_if_necessary(cls: 
Type["ExecutePlanResponseReattachableIterator"]) -> None:
+"""
+If the processing pool for the release calls is None, initialize the 
pool exactly once.
+"""
+with cls._lock:
+if cls._release_thread_pool is None:
+cls._release_thread_pool = ThreadPool(os.cpu_count() if 
os.cpu_count() else 8)
 
 def __init__(
 self,
@@ -62,6 +87,7 @@ class ExecutePlanResponseReattachableIterator(Generator):
 retry_policy: Dict[str, Any],
 metadata: Iterable[Tuple[str, str]],
 ):
+ExecutePlanResponseReattachableIterator._initialize_pool_if_necessary()
 self._request = request
 self._retry_policy = retry_policy
 if request.operation_id:
@@ -111,7 +137,6 @@ class ExecutePlanResponseReattachableIterator(Generator):
 
 self._last_returned_response_id = ret.response_id
 if ret.HasField("result_complete"):
-self._result_complete = True
 self._release_all()
 else:
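
The diff above boils down to one pattern: a class-level lock guards the shared release
thread pool so it can be shut down exactly once before the channel closes and lazily
re-created by the next iterator. A minimal standalone sketch of that pattern follows;
the class and attribute names mirror the diff, but this is an illustration, not the
actual Spark Connect client code.

```
import os
from multiprocessing import RLock
from multiprocessing.pool import ThreadPool
from typing import Optional


class ReattachableIteratorSketch:
    # Class-level lock guarding the pool used for async ReleaseExecute calls.
    _lock = RLock()
    _release_thread_pool: Optional[ThreadPool] = ThreadPool(os.cpu_count() or 8)

    @classmethod
    def shutdown(cls) -> None:
        # Called before the channel is closed so no async release call dangles.
        with cls._lock:
            if cls._release_thread_pool is not None:
                cls._release_thread_pool.close()
                cls._release_thread_pool.join()
                cls._release_thread_pool = None

    @classmethod
    def _initialize_pool_if_necessary(cls) -> None:
        # Re-create the pool exactly once if a new iterator starts after shutdown.
        with cls._lock:
            if cls._release_thread_pool is None:
                cls._release_thread_pool = ThreadPool(os.cpu_count() or 8)

    def __init__(self) -> None:
        ReattachableIteratorSketch._initialize_pool_if_necessary()
```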

[spark] branch master updated: [SPARK-45133][CONNECT][TESTS][FOLLOWUP] Add test that queries transition to FINISHED

2023-09-18 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8323c0c48de [SPARK-45133][CONNECT][TESTS][FOLLOWUP] Add test that 
queries transition to FINISHED
8323c0c48de is described below

commit 8323c0c48de7f498ef2452059f737a167586b98d
Author: Juliusz Sompolski 
AuthorDate: Tue Sep 19 08:31:15 2023 +0900

[SPARK-45133][CONNECT][TESTS][FOLLOWUP] Add test that queries transition to 
FINISHED

### What changes were proposed in this pull request?

Add test checking that queries (also special case: local relations) 
transition to FINISHED state, even if the client does not consume the results.

### Why are the changes needed?

Add test for SPARK-45133.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This adds tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42910 from juliuszsompolski/SPARK-45133-followup.

Authored-by: Juliusz Sompolski 
Signed-off-by: Hyukjin Kwon 
---
 .../spark/sql/connect/SparkConnectServerTest.scala | 29 -
 .../service/SparkConnectServiceE2ESuite.scala  | 48 ++
 2 files changed, 76 insertions(+), 1 deletion(-)

diff --git 
a/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/SparkConnectServerTest.scala
 
b/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/SparkConnectServerTest.scala
index eddd1c6be72..7b02377f484 100644
--- 
a/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/SparkConnectServerTest.scala
+++ 
b/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/SparkConnectServerTest.scala
@@ -16,14 +16,19 @@
  */
 package org.apache.spark.sql.connect
 
-import java.util.UUID
+import java.util.{TimeZone, UUID}
 
+import scala.reflect.runtime.universe.TypeTag
+
+import org.apache.arrow.memory.RootAllocator
 import org.scalatest.concurrent.{Eventually, TimeLimits}
 import org.scalatest.time.Span
 import org.scalatest.time.SpanSugar._
 
 import org.apache.spark.connect.proto
+import org.apache.spark.sql.catalyst.ScalaReflection
 import org.apache.spark.sql.connect.client.{CloseableIterator, 
CustomSparkConnectBlockingStub, ExecutePlanResponseReattachableIterator, 
GrpcRetryHandler, SparkConnectClient, WrappedCloseableIterator}
+import org.apache.spark.sql.connect.client.arrow.ArrowSerializer
 import org.apache.spark.sql.connect.common.config.ConnectCommon
 import org.apache.spark.sql.connect.config.Connect
 import org.apache.spark.sql.connect.dsl.MockRemoteSession
@@ -43,6 +48,8 @@ trait SparkConnectServerTest extends SharedSparkSession {
 
   val eventuallyTimeout = 30.seconds
 
+  val allocator = new RootAllocator()
+
   override def beforeAll(): Unit = {
 super.beforeAll()
 // Other suites using mocks leave a mess in the global executionManager,
@@ -60,6 +67,7 @@ trait SparkConnectServerTest extends SharedSparkSession {
 
   override def afterAll(): Unit = {
 SparkConnectService.stop()
+allocator.close()
 super.afterAll()
   }
 
@@ -127,6 +135,19 @@ trait SparkConnectServerTest extends SharedSparkSession {
 proto.Plan.newBuilder().setRoot(dsl.sql(query)).build()
   }
 
+  protected def buildLocalRelation[A <: Product: TypeTag](data: Seq[A]) = {
+val encoder = ScalaReflection.encoderFor[A]
+val arrowData =
+  ArrowSerializer.serialize(data.iterator, encoder, allocator, 
TimeZone.getDefault.getID)
+val localRelation = proto.LocalRelation
+  .newBuilder()
+  .setData(arrowData)
+  .setSchema(encoder.schema.json)
+  .build()
+val relation = 
proto.Relation.newBuilder().setLocalRelation(localRelation).build()
+proto.Plan.newBuilder().setRoot(relation).build()
+  }
+
   protected def getReattachableIterator(
   stubIterator: CloseableIterator[proto.ExecutePlanResponse]) = {
 // This depends on the wrapping in 
CustomSparkConnectBlockingStub.executePlanReattachable:
@@ -188,6 +209,12 @@ trait SparkConnectServerTest extends SharedSparkSession {
 executions.head
   }
 
+  protected def eventuallyGetExecutionHolder: ExecuteHolder = {
+Eventually.eventually(timeout(eventuallyTimeout)) {
+  getExecutionHolder
+}
+  }
+
   protected def withClient(f: SparkConnectClient => Unit): Unit = {
 val client = SparkConnectClient
   .builder()
diff --git 
a/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/service/SparkConnectServiceE2ESuite.scala
 
b/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/service/SparkConnectServiceE2ESuite.scala
new file mode 100644
index 000..14ecc9a2e95
--- /dev/null
+++ 

[spark] branch master updated: [SPARK-45065][PYTHON][PS] Support Pandas 2.1.0

2023-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f83e5ec202b [SPARK-45065][PYTHON][PS] Support Pandas 2.1.0
f83e5ec202b is described below

commit f83e5ec202be68d6640f9f9403d96a39ef993f82
Author: Haejoon Lee 
AuthorDate: Mon Sep 18 16:27:09 2023 -0700

[SPARK-45065][PYTHON][PS] Support Pandas 2.1.0

### What changes were proposed in this pull request?

This PR proposes to support pandas 2.1.0 for PySpark. See [What's new in 
2.1.0](https://pandas.pydata.org/docs/dev/whatsnew/v2.1.0.html) for more detail.

### Why are the changes needed?

We should follow the latest version of pandas.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The existing CI should pass with Pandas 2.1.0.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42793 from itholic/pandas_2.1.0.

Lead-authored-by: Haejoon Lee 
Co-authored-by: Ruifeng Zheng 
Signed-off-by: Dongjoon Hyun 
---
 dev/infra/Dockerfile   |   4 +-
 .../source/migration_guide/pyspark_upgrade.rst |   3 +
 .../docs/source/reference/pyspark.pandas/frame.rst |   1 +
 python/pyspark/pandas/base.py  |   2 -
 python/pyspark/pandas/frame.py | 143 +++--
 python/pyspark/pandas/generic.py   |  42 ++
 python/pyspark/pandas/groupby.py   |  50 ++-
 python/pyspark/pandas/indexes/base.py  |   6 +-
 python/pyspark/pandas/indexes/datetimes.py |  18 +++
 python/pyspark/pandas/indexes/timedelta.py |   7 +
 python/pyspark/pandas/namespace.py |  21 ++-
 python/pyspark/pandas/plot/matplotlib.py   |   2 +-
 python/pyspark/pandas/series.py|  83 +++-
 python/pyspark/pandas/supported_api_gen.py |   2 +-
 .../pandas/tests/computation/test_corrwith.py  |   5 +-
 python/pyspark/pandas/tests/test_stats.py  |  31 -
 python/pyspark/pandas/typedef/typehints.py |  15 ++-
 python/pyspark/pandas/window.py|   4 +
 python/pyspark/sql/connect/session.py  |   3 +-
 python/pyspark/sql/pandas/conversion.py|  16 ++-
 python/pyspark/sql/pandas/serializers.py   |   8 +-
 python/pyspark/sql/pandas/types.py |  14 +-
 .../sql/tests/pandas/test_pandas_cogrouped_map.py  |   2 +-
 23 files changed, 415 insertions(+), 67 deletions(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 8e5a3cb7c05..d196d0e97c5 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -84,8 +84,8 @@ RUN Rscript -e "devtools::install_version('roxygen2', 
version='7.2.0', repos='ht
 # See more in SPARK-39735
 ENV R_LIBS_SITE 
"/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"
 
-RUN pypy3 -m pip install numpy 'pandas<=2.0.3' scipy coverage matplotlib
-RUN python3.9 -m pip install numpy pyarrow 'pandas<=2.0.3' scipy 
unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 
'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
+RUN pypy3 -m pip install numpy 'pandas<=2.1.0' scipy coverage matplotlib
+RUN python3.9 -m pip install numpy pyarrow 'pandas<=2.1.0' scipy 
unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 
'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
 
 # Add Python deps for Spark Connect.
 RUN python3.9 -m pip install 'grpcio>=1.48,<1.57' 'grpcio-status>=1.48,<1.57' 
'protobuf==3.20.3' 'googleapis-common-protos==1.56.4'
diff --git a/python/docs/source/migration_guide/pyspark_upgrade.rst 
b/python/docs/source/migration_guide/pyspark_upgrade.rst
index 3513f0a878e..d081275dc83 100644
--- a/python/docs/source/migration_guide/pyspark_upgrade.rst
+++ b/python/docs/source/migration_guide/pyspark_upgrade.rst
@@ -22,6 +22,7 @@ Upgrading PySpark
 Upgrading from PySpark 3.5 to 4.0
 -
 
+* In Spark 4.0, it is recommended to use Pandas version 2.0.0 or above with 
PySpark for optimal compatibility.
 * In Spark 4.0, the minimum supported version for Pandas has been raised from 
1.0.5 to 1.4.4 in PySpark.
 * In Spark 4.0, the minimum supported version for Numpy has been raised from 
1.15 to 1.21 in PySpark.
 * In Spark 4.0, ``Int64Index`` and ``Float64Index`` have been removed from 
pandas API on Spark, ``Index`` should be used directly.
@@ -44,6 +45,8 @@ Upgrading from PySpark 3.5 to 4.0
 * In Spark 4.0, ``squeeze`` parameter from ``ps.read_csv`` and 
``ps.read_excel`` has been removed from pandas API on Spark.
 * In Spark 4.0, ``null_counts`` parameter from ``DataFrame.info`` has been 
removed from pandas API on Spark, use ``show_counts`` 
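
To make the version guidance above concrete, here is a small, hedged check one might
run before using the pandas API on Spark; the thresholds (1.4.4 minimum, 2.0.0
recommended) come from the migration guide entries above, while the guard itself and
the use of the `packaging` library are just an example.

```
from packaging.version import Version

import pandas as pd

MIN_PANDAS = Version("1.4.4")          # minimum supported per the migration guide
RECOMMENDED_PANDAS = Version("2.0.0")  # recommended for optimal compatibility

installed = Version(pd.__version__)
if installed < MIN_PANDAS:
    raise RuntimeError(f"pandas {installed} is below the PySpark minimum {MIN_PANDAS}")
if installed < RECOMMENDED_PANDAS:
    print(f"pandas {installed} works, but {RECOMMENDED_PANDAS}+ is recommended")

import pyspark.pandas as ps  # noqa: E402

print(ps.DataFrame({"a": [1, 2, 3]}).sum())
```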

[spark] branch master updated: [SPARK-45188][SQL][DOCS] Update error messages related to parameterized `sql()`

2023-09-18 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 981312284f0 [SPARK-45188][SQL][DOCS] Update error messages related to 
parameterized `sql()`
981312284f0 is described below

commit 981312284f0776ca847c8d21411f74a72c639b22
Author: Max Gekk 
AuthorDate: Tue Sep 19 00:22:43 2023 +0300

[SPARK-45188][SQL][DOCS] Update error messages related to parameterized 
`sql()`

### What changes were proposed in this pull request?
In the PR, I propose to update some error message formats and comments regarding
`sql()` parameters: maps, arrays and structs may now be used as `sql()`
parameters. The new behaviour was added by
https://github.com/apache/spark/pull/42752.

### Why are the changes needed?
To inform users about recent changes introduced by SPARK-45033.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By running the affected test suite:
```
$ build/sbt "core/testOnly *SparkThrowableSuite"
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #42957 from MaxGekk/clean-ClientE2ETestSuite.

Authored-by: Max Gekk 
Signed-off-by: Max Gekk 
---
 .../src/main/resources/error/error-classes.json  |  4 ++--
 .../scala/org/apache/spark/sql/SparkSession.scala| 11 +++
 docs/sql-error-conditions.md |  4 ++--
 python/pyspark/pandas/sql_formatter.py   |  3 ++-
 python/pyspark/sql/session.py|  3 ++-
 .../spark/sql/catalyst/analysis/parameters.scala | 14 +-
 .../scala/org/apache/spark/sql/SparkSession.scala| 20 ++--
 7 files changed, 34 insertions(+), 25 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json 
b/common/utils/src/main/resources/error/error-classes.json
index 4740ed72f89..186e7b4640d 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -1892,7 +1892,7 @@
   },
   "INVALID_SQL_ARG" : {
 "message" : [
-  "The argument  of `sql()` is invalid. Consider to replace it by a 
SQL literal."
+  "The argument  of `sql()` is invalid. Consider to replace it 
either by a SQL literal or by collection constructor functions such as `map()`, 
`array()`, `struct()`."
 ]
   },
   "INVALID_SQL_SYNTAX" : {
@@ -2768,7 +2768,7 @@
   },
   "UNBOUND_SQL_PARAMETER" : {
 "message" : [
-  "Found the unbound parameter: . Please, fix `args` and provide a 
mapping of the parameter to a SQL literal."
+  "Found the unbound parameter: . Please, fix `args` and provide a 
mapping of the parameter to either a SQL literal or collection constructor 
functions such as `map()`, `array()`, `struct()`."
 ],
 "sqlState" : "42P02"
   },
diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala
index 8788e34893e..5aa8c5a2bd5 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala
@@ -235,8 +235,9 @@ class SparkSession private[sql] (
*   An array of Java/Scala objects that can be converted to SQL literal 
expressions. See https://spark.apache.org/docs/latest/sql-ref-datatypes.html;> 
Supported Data
*   Types for supported value types in Scala/Java. For example: 1, 
"Steven",
-   *   LocalDate.of(2023, 4, 2). A value can be also a `Column` of literal 
expression, in that
-   *   case it is taken as is.
+   *   LocalDate.of(2023, 4, 2). A value can be also a `Column` of a literal 
or collection
+   *   constructor functions such as `map()`, `array()`, `struct()`, in that 
case it is taken as
+   *   is.
*
* @since 3.5.0
*/
@@ -272,7 +273,8 @@ class SparkSession private[sql] (
*   expressions. See https://spark.apache.org/docs/latest/sql-ref-datatypes.html;>
*   Supported Data Types for supported value types in Scala/Java. For 
example, map keys:
*   "rank", "name", "birthdate"; map values: 1, "Steven", 
LocalDate.of(2023, 4, 2). Map value
-   *   can be also a `Column` of literal expression, in that case it is taken 
as is.
+   *   can be also a `Column` of a literal or collection constructor functions 
such as `map()`,
+   *   `array()`, `struct()`, in that case it is taken as is.
*
* @since 3.4.0
*/
@@ -292,7 +294,8 @@ class SparkSession private[sql] (
*   expressions. See https://spark.apache.org/docs/latest/sql-ref-datatypes.html;>
*   Supported Data Types for supported value types in Scala/Java. For 
example, map keys:
*   "rank", 
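
To see the documented behaviour above from the Python side, here is a hedged example
of binding collection-constructor columns as named `sql()` parameters. It assumes the
behaviour added by SPARK-45033 (Spark 4.0); the parameter names and values are made up,
and `create_map` is the Python counterpart of the SQL `map()` constructor.

```
from pyspark.sql import SparkSession
from pyspark.sql import functions as sf

spark = SparkSession.builder.getOrCreate()

# Bind an array() constructor column to the named parameter :arr.
spark.sql(
    "SELECT element_at(:arr, 2) AS second_element",
    args={"arr": sf.array(sf.lit(10), sf.lit(20), sf.lit(30))},
).show()

# Bind a map constructor column to the named parameter :m.
spark.sql(
    "SELECT element_at(:m, 'rank') AS rank",
    args={"m": sf.create_map(sf.lit("rank"), sf.lit(1))},
).show()
```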

[spark] branch branch-3.4 updated: [SPARK-45081][SQL][3.4] Encoders.bean does no longer work with read-only properties

2023-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 66f5f964f82 [SPARK-45081][SQL][3.4] Encoders.bean does no longer work 
with read-only properties
66f5f964f82 is described below

commit 66f5f964f8213e263e8aefb38a7e733753836995
Author: Giambattista Bloisi 
AuthorDate: Mon Sep 18 13:39:54 2023 -0700

[SPARK-45081][SQL][3.4] Encoders.bean does no longer work with read-only 
properties

### What changes were proposed in this pull request?
This PR re-enables Encoders.bean to be called against beans that have
read-only properties, that is, properties with only getters and no setter
method. Beans with read-only properties are even used in internal tests.
The setter methods of a Java bean encoder are stored within an Option wrapper
because they are missing in the case of read-only properties. When a Java bean
has to be initialized, the setter methods for its properties have to be called:
this PR filters out read-only properties from that process.

### Why are the changes needed?
The changes are required to avoid an exception being thrown when getting the
value of a None Option object.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
An additional regression test has been added

### Was this patch authored or co-authored using generative AI tooling?
No

hvanhovell this is 3.4 branch port of 
[PR-42829](https://github.com/apache/spark/pull/42829)

Closes #42913 from gbloisi-openaire/SPARK-45081-branch-3.4.

Authored-by: Giambattista Bloisi 
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/sql/catalyst/ScalaReflection.scala  |  4 +++-
 .../java/test/org/apache/spark/sql/JavaDatasetSuite.java | 16 
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
index 2e03f32a58d..b18613bdad3 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
@@ -345,7 +345,9 @@ object ScalaReflection extends ScalaReflection {
 CreateExternalRow(convertedFields, enc.schema))
 
 case JavaBeanEncoder(tag, fields) =>
-  val setters = fields.map { f =>
+  val setters = fields
+.filter(_.writeMethod.isDefined)
+.map { f =>
 val newTypePath = walkedTypePath.recordField(
   f.enc.clsTag.runtimeClass.getName,
   f.name)
diff --git 
a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java 
b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
index 228b7855142..6a9ffef6991 100644
--- a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
+++ b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
@@ -1711,6 +1711,22 @@ public class JavaDatasetSuite implements Serializable {
 Assert.assertEquals(1, df.collectAsList().size());
   }
 
+  public static class ReadOnlyPropertyBean implements Serializable {
+public boolean isEmpty() {
+  return true;
+}
+  }
+
+  @Test
+  public void testReadOnlyPropertyBean() {
+ReadOnlyPropertyBean bean = new ReadOnlyPropertyBean();
+List data = Arrays.asList(bean);
+Dataset df = spark.createDataset(data,
+Encoders.bean(ReadOnlyPropertyBean.class));
+Assert.assertEquals(1, df.schema().length());
+Assert.assertEquals(1, df.collectAsList().size());
+  }
+
   public class CircularReference1Bean implements Serializable {
 private CircularReference2Bean child;
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-45153][BUILD][PS] Rebalance testing time for `pyspark-pandas-connect-part1`

2023-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 29acdf755cd [SPARK-45153][BUILD][PS] Rebalance testing time for 
`pyspark-pandas-connect-part1`
29acdf755cd is described below

commit 29acdf755cd0a5c88c5cc5c8c16947a5b8e840f9
Author: Haejoon Lee 
AuthorDate: Mon Sep 18 12:02:28 2023 -0700

[SPARK-45153][BUILD][PS] Rebalance testing time for 
`pyspark-pandas-connect-part1`

### What changes were proposed in this pull request?

This PR proposes to rebalance the tests for `pyspark-pandas-connect-part1`.

### Why are the changes needed?

We rebalanced the CI by splitting slow tests into multiple parts, but
`pyspark-pandas-connect-part1` takes almost an hour longer than the other split
`pyspark-pandas-connect-partx` tests, as shown below:


|pyspark-pandas-connect-part0|pyspark-pandas-connect-part1|pyspark-pandas-connect-part2|
|--|--|--|
|1h 51m|2h 55m|1h 54m|

### Does this PR introduce _any_ user-facing change?

No, this PR is proposed to improve the build infra.

### How was this patch tested?

We should manually check the CI from GitHub Actions after PR is opened.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42909 from itholic/SPARK-45153.

Authored-by: Haejoon Lee 
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_and_test.yml |  2 ++
 dev/sparktestsupport/modules.py  | 46 
 dev/sparktestsupport/utils.py| 14 +--
 3 files changed, 40 insertions(+), 22 deletions(-)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 9c5d25d30af..39fd2796015 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -372,6 +372,8 @@ jobs:
 pyspark-pandas-connect-part1
   - >-
 pyspark-pandas-connect-part2
+  - >-
+pyspark-pandas-connect-part3
 env:
   MODULES_TO_TEST: ${{ matrix.modules }}
   HADOOP_PROFILE: ${{ inputs.hadoop }}
diff --git a/dev/sparktestsupport/modules.py b/dev/sparktestsupport/modules.py
index 0a751052491..a3bfa288383 100644
--- a/dev/sparktestsupport/modules.py
+++ b/dev/sparktestsupport/modules.py
@@ -965,7 +965,6 @@ pyspark_pandas_connect_part0 = Module(
 "pyspark.pandas.tests.connect.test_parity_utils",
 "pyspark.pandas.tests.connect.test_parity_window",
 "pyspark.pandas.tests.connect.indexes.test_parity_base",
-"pyspark.pandas.tests.connect.indexes.test_parity_datetime",
 "pyspark.pandas.tests.connect.indexes.test_parity_align",
 "pyspark.pandas.tests.connect.indexes.test_parity_indexing",
 "pyspark.pandas.tests.connect.indexes.test_parity_reindex",
@@ -982,7 +981,10 @@ pyspark_pandas_connect_part0 = Module(
 "pyspark.pandas.tests.connect.computation.test_parity_describe",
 "pyspark.pandas.tests.connect.computation.test_parity_eval",
 "pyspark.pandas.tests.connect.computation.test_parity_melt",
-"pyspark.pandas.tests.connect.computation.test_parity_pivot",
+"pyspark.pandas.tests.connect.groupby.test_parity_stat",
+"pyspark.pandas.tests.connect.frame.test_parity_attrs",
+"pyspark.pandas.tests.connect.diff_frames_ops.test_parity_dot_frame",
+"pyspark.pandas.tests.connect.diff_frames_ops.test_parity_dot_series",
 ],
 excluded_python_implementations=[
 "PyPy"  # Skip these tests under PyPy since they require numpy, 
pandas, and pyarrow and
@@ -999,7 +1001,6 @@ pyspark_pandas_connect_part1 = Module(
 ],
 python_test_goals=[
 # pandas-on-Spark unittests
-"pyspark.pandas.tests.connect.frame.test_parity_attrs",
 "pyspark.pandas.tests.connect.frame.test_parity_constructor",
 "pyspark.pandas.tests.connect.frame.test_parity_conversion",
 "pyspark.pandas.tests.connect.frame.test_parity_reindexing",
@@ -1012,21 +1013,12 @@ pyspark_pandas_connect_part1 = Module(
 "pyspark.pandas.tests.connect.groupby.test_parity_aggregate",
 "pyspark.pandas.tests.connect.groupby.test_parity_apply_func",
 "pyspark.pandas.tests.connect.groupby.test_parity_cumulative",
-"pyspark.pandas.tests.connect.groupby.test_parity_describe",
-"pyspark.pandas.tests.connect.groupby.test_parity_groupby",
-"pyspark.pandas.tests.connect.groupby.test_parity_head_tail",
-"pyspark.pandas.tests.connect.groupby.test_parity_index",
 "pyspark.pandas.tests.connect.groupby.test_parity_missing_data",
 "pyspark.pandas.tests.connect.groupby.test_parity_split_apply",
-"pyspark.pandas.tests.connect.groupby.test_parity_stat",
 

[spark] branch branch-3.4 updated: [SPARK-45078][SQL][3.4] Fix `array_insert` ImplicitCastInputTypes not work

2023-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 464811a243c [SPARK-45078][SQL][3.4] Fix `array_insert` 
ImplicitCastInputTypes not work
464811a243c is described below

commit 464811a243cbf1502a26d4440c484cfce13d4ddc
Author: Jia Fan 
AuthorDate: Mon Sep 18 11:55:23 2023 -0700

[SPARK-45078][SQL][3.4] Fix `array_insert` ImplicitCastInputTypes not work

### What changes were proposed in this pull request?
This is a backport PR of https://github.com/apache/spark/pull/42951 to
fix `array_insert` ImplicitCastInputTypes not working.

### Why are the changes needed?
Fix error behavior in `array_insert`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Add new test.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #42960 from Hisoka-X/arrayinsert-fix-3.4.

Authored-by: Jia Fan 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/catalyst/expressions/collectionOperations.scala | 1 -
 sql/core/src/test/resources/sql-tests/inputs/array.sql| 1 +
 sql/core/src/test/resources/sql-tests/results/ansi/array.sql.out  | 8 
 sql/core/src/test/resources/sql-tests/results/array.sql.out   | 8 
 4 files changed, 17 insertions(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
index fd4249f4776..629ae0499b4 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
@@ -4642,7 +4642,6 @@ case class ArrayInsert(
 }
   case (e1, e2, e3) => Seq.empty
 }
-Seq.empty
   }
 
   override def checkInputDataTypes(): TypeCheckResult = {
diff --git a/sql/core/src/test/resources/sql-tests/inputs/array.sql 
b/sql/core/src/test/resources/sql-tests/inputs/array.sql
index 606ed14cbe0..b3834b2e816 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/array.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/array.sql
@@ -141,6 +141,7 @@ select array_insert(array(1, 2, 3, NULL), cast(NULL as 
INT), 4);
 select array_insert(array(1, 2, 3, NULL), 4, cast(NULL as INT));
 select array_insert(array(2, 3, NULL, 4), 5, 5);
 select array_insert(array(2, 3, NULL, 4), -5, 1);
+select array_insert(array(1), 2, cast(2 as tinyint));
 
 set spark.sql.legacy.negativeIndexInArrayInsert=true;
 select array_insert(array(1, 3, 4), -2, 2);
diff --git a/sql/core/src/test/resources/sql-tests/results/ansi/array.sql.out 
b/sql/core/src/test/resources/sql-tests/results/ansi/array.sql.out
index 6d17904271d..2aa818a48ef 100644
--- a/sql/core/src/test/resources/sql-tests/results/ansi/array.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/ansi/array.sql.out
@@ -659,6 +659,14 @@ struct>
 [1,2,3,null,4]
 
 
+-- !query
+select array_insert(array(1), 2, cast(2 as tinyint))
+-- !query schema
+struct>
+-- !query output
+[1,2]
+
+
 -- !query
 set spark.sql.legacy.negativeIndexInArrayInsert=true
 -- !query schema
diff --git a/sql/core/src/test/resources/sql-tests/results/array.sql.out 
b/sql/core/src/test/resources/sql-tests/results/array.sql.out
index 8943c36ea42..322bff9a886 100644
--- a/sql/core/src/test/resources/sql-tests/results/array.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/array.sql.out
@@ -540,6 +540,14 @@ struct>
 [1,2,3,null,4]
 
 
+-- !query
+select array_insert(array(1), 2, cast(2 as tinyint))
+-- !query schema
+struct>
+-- !query output
+[1,2]
+
+
 -- !query
 set spark.sql.legacy.negativeIndexInArrayInsert=true
 -- !query schema


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
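
As context for the `array_insert` fix above, a hedged PySpark reproduction of the new
SQL test: with ImplicitCastInputTypes working again, the TINYINT value is implicitly
cast to the array's element type before insertion.

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Mirrors the query added to array.sql above; expected result after the fix: [1, 2].
spark.sql("SELECT array_insert(array(1), 2, CAST(2 AS TINYINT)) AS arr").show()
```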



[spark] branch master updated: [SPARK-45203][SQL][TEST] Join tests in KeyGroupedPartitioningSuite should use merge join hint

2023-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new eadb591f37a [SPARK-45203][SQL][TEST] Join tests in 
KeyGroupedPartitioningSuite should use merge join hint
eadb591f37a is described below

commit eadb591f37a118096bab637e4b6ca913c2753a6b
Author: Wenchen Fan 
AuthorDate: Mon Sep 18 11:50:16 2023 -0700

[SPARK-45203][SQL][TEST] Join tests in KeyGroupedPartitioningSuite should 
use merge join hint

### What changes were proposed in this pull request?

It's more robust to use join hints to enforce sort-merge join in the tests, 
instead of setting configs which may be ineffective after more advanced 
optimizations in the future.

### Why are the changes needed?

make tests more future-proof

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

N/A

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #42983 from cloud-fan/minor.

Authored-by: Wenchen Fan 
Signed-off-by: Dongjoon Hyun 
---
 .../connector/KeyGroupedPartitioningSuite.scala| 246 -
 1 file changed, 93 insertions(+), 153 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala
index ffd1c8e31e9..4cb5457b66b 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala
@@ -18,6 +18,7 @@ package org.apache.spark.sql.connector
 
 import java.util.Collections
 
+import org.apache.spark.SparkConf
 import org.apache.spark.sql.{DataFrame, Row}
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions.{Literal, TransformExpression}
@@ -44,25 +45,9 @@ class KeyGroupedPartitioningSuite extends 
DistributionAndOrderingSuiteBase {
 UnboundBucketFunction,
 UnboundTruncateFunction)
 
-  private var originalV2BucketingEnabled: Boolean = false
-  private var originalAutoBroadcastJoinThreshold: Long = -1
-
-  override def beforeAll(): Unit = {
-super.beforeAll()
-originalV2BucketingEnabled = conf.getConf(V2_BUCKETING_ENABLED)
-conf.setConf(V2_BUCKETING_ENABLED, true)
-originalAutoBroadcastJoinThreshold = 
conf.getConf(AUTO_BROADCASTJOIN_THRESHOLD)
-conf.setConf(AUTO_BROADCASTJOIN_THRESHOLD, -1L)
-  }
-
-  override def afterAll(): Unit = {
-try {
-  super.afterAll()
-} finally {
-  conf.setConf(V2_BUCKETING_ENABLED, originalV2BucketingEnabled)
-  conf.setConf(AUTO_BROADCASTJOIN_THRESHOLD, 
originalAutoBroadcastJoinThreshold)
-}
-  }
+  override def sparkConf: SparkConf = super.sparkConf
+.set(V2_BUCKETING_ENABLED, true)
+.set(AUTO_BROADCASTJOIN_THRESHOLD, -1L)
 
   before {
 functions.foreach { f =>
@@ -261,6 +246,25 @@ class KeyGroupedPartitioningSuite extends 
DistributionAndOrderingSuiteBase {
   .add("order_amount", DoubleType)
   .add("customer_id", LongType)
 
+  private def selectWithMergeJoinHint(t1: String, t2: String): String = {
+s"SELECT /*+ MERGE($t1, $t2) */ "
+  }
+
+  private def createJoinTestDF(
+  keys: Seq[(String, String)],
+  extraColumns: Seq[String] = Nil,
+  joinType: String = ""): DataFrame = {
+val extraColList = if (extraColumns.isEmpty) "" else 
extraColumns.mkString(", ", ", ", "")
+sql(
+  s"""
+ |${selectWithMergeJoinHint("i", "p")}
+ |id, name, i.price as purchase_price, p.price as sale_price 
$extraColList
+ |FROM testcat.ns.$items i $joinType JOIN testcat.ns.$purchases p
+ |ON ${keys.map(k => s"i.${k._1} = p.${k._2}").mkString(" AND ")}
+ |ORDER BY id, purchase_price, sale_price $extraColList
+ |""".stripMargin)
+  }
+
   private def testWithCustomersAndOrders(
   customers_partitions: Array[Transform],
   orders_partitions: Array[Transform],
@@ -273,9 +277,13 @@ class KeyGroupedPartitioningSuite extends 
DistributionAndOrderingSuiteBase {
 sql(s"INSERT INTO testcat.ns.$orders VALUES " +
 s"(100.0, 1), (200.0, 1), (150.0, 2), (250.0, 2), (350.0, 2), (400.50, 
3)")
 
-val df = sql("SELECT customer_name, customer_age, order_amount " +
-s"FROM testcat.ns.$customers c JOIN testcat.ns.$orders o " +
-"ON c.customer_id = o.customer_id ORDER BY c.customer_id, 
order_amount")
+val df = sql(
+  s"""
+|${selectWithMergeJoinHint("c", "o")}
+|customer_name, customer_age, order_amount
+|FROM testcat.ns.$customers c JOIN testcat.ns.$orders o
+|ON c.customer_id = o.customer_id ORDER BY c.customer_id, 
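
For readers who want the same effect outside the test suite above, a hedged
DataFrame-level sketch of forcing a sort-merge join with a hint instead of tweaking
`spark.sql.autoBroadcastJoinThreshold`; the table and column names here are invented
for the example.

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

items = spark.range(100).withColumnRenamed("id", "item_id")
purchases = spark.range(100).withColumnRenamed("id", "item_id")

# DataFrame equivalent of the SQL /*+ MERGE(i, p) */ hint used in the suite above.
joined = items.hint("merge").join(purchases, "item_id")
joined.explain()  # the physical plan should show a SortMergeJoin
```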

[GitHub] [spark-website] allisonwang-db commented on a diff in pull request #477: [SPARK-45195] Update examples with docker official image

2023-09-18 Thread via GitHub


allisonwang-db commented on code in PR #477:
URL: https://github.com/apache/spark-website/pull/477#discussion_r1329098596


##
index.md:
##
@@ -88,11 +88,13 @@ navigation:
 
 
 Run now
-Installing with 'pip'
+Install with 'pip' or try 
offical image
 
 
 $ pip install pyspark
 $ pyspark
+$ 
+$ docker run -it --rm spark:python3 
/opt/spark/bin/pyspark

Review Comment:
   @zhengruifeng I feel this "try official image" is a bit unclear. Maybe we 
can reformat to this?
   ```
   Install with pip
   
   Install with docker
   
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (8d363c6e2c8 -> 0dda75f824d)

2023-09-18 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 8d363c6e2c8 [SPARK-45196][PYTHON][DOCS] Refine docstring of 
`array/array_contains/arrays_overlap`
 add 0dda75f824d [SPARK-45137][CONNECT] Support map/array parameters in 
parameterized `sql()`

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/SparkSession.scala  |   6 +-
 .../org/apache/spark/sql/ClientE2ETestSuite.scala  |   7 +
 .../src/main/protobuf/spark/connect/commands.proto |  12 +-
 .../main/protobuf/spark/connect/relations.proto|  12 +-
 .../sql/connect/planner/SparkConnectPlanner.scala  |  26 +-
 python/pyspark/sql/connect/proto/commands_pb2.py   | 164 +++--
 python/pyspark/sql/connect/proto/commands_pb2.pyi  |  60 -
 python/pyspark/sql/connect/proto/relations_pb2.py  | 268 +++--
 python/pyspark/sql/connect/proto/relations_pb2.pyi |  60 -
 9 files changed, 396 insertions(+), 219 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-45196][PYTHON][DOCS] Refine docstring of `array/array_contains/arrays_overlap`

2023-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8d363c6e2c8 [SPARK-45196][PYTHON][DOCS] Refine docstring of 
`array/array_contains/arrays_overlap`
8d363c6e2c8 is described below

commit 8d363c6e2c84c0dbbb51b9376bb4c2a4d1be3acf
Author: yangjie01 
AuthorDate: Mon Sep 18 09:08:38 2023 -0700

[SPARK-45196][PYTHON][DOCS] Refine docstring of 
`array/array_contains/arrays_overlap`

### What changes were proposed in this pull request?
This PR refines the docstrings of `array/array_contains/arrays_overlap` and adds
some new examples.

### Why are the changes needed?
To improve PySpark documentation

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass Github Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #42972 from LuciferYang/collect-1.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 python/pyspark/sql/functions.py | 191 ++--
 1 file changed, 164 insertions(+), 27 deletions(-)

diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 3c65e8d9162..54bd330ebc0 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -11686,7 +11686,8 @@ def array(__cols: Union[List["ColumnOrName_"], 
Tuple["ColumnOrName_", ...]]) ->
 def array(
 *cols: Union["ColumnOrName", Union[List["ColumnOrName_"], 
Tuple["ColumnOrName_", ...]]]
 ) -> Column:
-"""Creates a new array column.
+"""
+Collection function: Creates a new array column from the input columns or 
column names.
 
 .. versionadded:: 1.4.0
 
@@ -11696,25 +11697,63 @@ def array(
 Parameters
 --
 cols : :class:`~pyspark.sql.Column` or str
-column names or :class:`~pyspark.sql.Column`\\s that have
-the same data type.
+Column names or :class:`~pyspark.sql.Column` objects that have the 
same data type.
 
 Returns
 ---
 :class:`~pyspark.sql.Column`
-a column of array type.
+A new Column of array type, where each value is an array containing 
the corresponding values
+from the input columns.
 
 Examples
 
+Example 1: Basic usage of array function with column names.
+
+>>> from pyspark.sql import functions as sf
 >>> df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ("name", "age"))
->>> df.select(array('age', 'age').alias("arr")).collect()
-[Row(arr=[2, 2]), Row(arr=[5, 5])]
->>> df.select(array([df.age, df.age]).alias("arr")).collect()
-[Row(arr=[2, 2]), Row(arr=[5, 5])]
->>> df.select(array('age', 'age').alias("col")).printSchema()
-root
- |-- col: array (nullable = false)
- ||-- element: long (containsNull = true)
+>>> df.select(sf.array('name', 'age').alias("arr")).show()
++--+
+|   arr|
++--+
+|[Alice, 2]|
+|  [Bob, 5]|
++--+
+
+Example 2: Usage of array function with Column objects.
+
+>>> from pyspark.sql import functions as sf
+>>> df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ("name", "age"))
+>>> df.select(sf.array(df.name, df.age).alias("arr")).show()
++--+
+|   arr|
++--+
+|[Alice, 2]|
+|  [Bob, 5]|
++--+
+
+Example 3: Single argument as list of column names.
+
+>>> from pyspark.sql import functions as sf
+>>> df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ("name", "age"))
+>>> df.select(sf.array(['name', 'age']).alias("arr")).show()
++--+
+|   arr|
++--+
+|[Alice, 2]|
+|  [Bob, 5]|
++--+
+
+Example 4: array function with a column containing null values.
+
+>>> from pyspark.sql import functions as sf
+>>> df = spark.createDataFrame([("Alice", None), ("Bob", 5)], ("name", 
"age"))
+>>> df.select(sf.array('name', 'age').alias("arr")).show()
++-+
+|  arr|
++-+
+|[Alice, NULL]|
+| [Bob, 5]|
++-+
 """
 if len(cols) == 1 and isinstance(cols[0], (list, set)):
 cols = cols[0]  # type: ignore[assignment]
@@ -11724,8 +11763,9 @@ def array(
 @_try_remote_functions
 def array_contains(col: "ColumnOrName", value: Any) -> Column:
 """
-Collection function: returns null if the array is null, true if the array 
contains the
-given value, and false otherwise.
+Collection function: This function returns a boolean indicating whether 
the array
+contains the given value, returning null if the array is null, true if the 
array
+contains the given value, and false otherwise.
 
 .. versionadded:: 1.5.0
 
@@ -11735,22 
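
The visible part of the diff above covers `array` and `array_contains`, so here is a
hedged companion example for `arrays_overlap` in the same style; the expected output
in the comment is what the function should produce, not text copied from the patch.

```
from pyspark.sql import SparkSession
from pyspark.sql import functions as sf

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(["a", "b"], ["b", "c"]), (["a"], ["c"])], ["x", "y"])
df.select(sf.arrays_overlap(df.x, df.y).alias("overlap")).show()
# Expected:
# +-------+
# |overlap|
# +-------+
# |   true|
# |  false|
# +-------+
```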

[spark] branch master updated: [SPARK-43628][SPARK-43629][CONNECT][PS][TESTS] Clear message for JVM dependent tests

2023-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 02bbd2867be [SPARK-43628][SPARK-43629][CONNECT][PS][TESTS] Clear 
message for JVM dependent tests
02bbd2867be is described below

commit 02bbd2867be52187239074f13617ef0ba2acf09e
Author: Haejoon Lee 
AuthorDate: Mon Sep 18 09:02:52 2023 -0700

[SPARK-43628][SPARK-43629][CONNECT][PS][TESTS] Clear message for JVM 
dependent tests

### What changes were proposed in this pull request?

This PR proposes to correct the skip messages for JVM-only tests under Spark
Connect, and to enable tests where a workaround without JVM features is possible.

### Why are the changes needed?

Among the JVM-dependent tests, some can be rewritten without JVM features, while a
few edge cases can only be tested with them. We need to be clearer about why the
latter cannot be tested from Spark Connect.

### Does this PR introduce _any_ user-facing change?

No. it's test-only

### How was this patch tested?

Updated the existing tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42955 from itholic/connect_jvm_tests.

Authored-by: Haejoon Lee 
Signed-off-by: Dongjoon Hyun 
---
 .../pandas/tests/connect/groupby/test_parity_apply_func.py| 2 +-
 .../tests/connect/plot/test_parity_frame_plot_matplotlib.py   | 2 +-
 .../pandas/tests/connect/plot/test_parity_frame_plot_plotly.py| 2 +-
 .../tests/connect/plot/test_parity_series_plot_matplotlib.py  | 2 +-
 .../pandas/tests/connect/plot/test_parity_series_plot_plotly.py   | 2 +-
 python/pyspark/pandas/tests/connect/test_parity_frame_spark.py| 8 
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git 
a/python/pyspark/pandas/tests/connect/groupby/test_parity_apply_func.py 
b/python/pyspark/pandas/tests/connect/groupby/test_parity_apply_func.py
index 2c0d49ebb5c..4daf84bd1b5 100644
--- a/python/pyspark/pandas/tests/connect/groupby/test_parity_apply_func.py
+++ b/python/pyspark/pandas/tests/connect/groupby/test_parity_apply_func.py
@@ -24,7 +24,7 @@ from pyspark.testing.pandasutils import PandasOnSparkTestUtils
 class GroupbyParityApplyFuncTests(
 GroupbyApplyFuncMixin, PandasOnSparkTestUtils, ReusedConnectTestCase
 ):
-@unittest.skip("TODO(SPARK-43629): Enable RDD dependent tests with Spark 
Connect.")
+@unittest.skip("Test depends on SparkContext which is not supported from 
Spark Connect.")
 def test_apply_with_side_effect(self):
 super().test_apply_with_side_effect()
 
diff --git 
a/python/pyspark/pandas/tests/connect/plot/test_parity_frame_plot_matplotlib.py 
b/python/pyspark/pandas/tests/connect/plot/test_parity_frame_plot_matplotlib.py
index 98da8858a03..3f615326f2b 100644
--- 
a/python/pyspark/pandas/tests/connect/plot/test_parity_frame_plot_matplotlib.py
+++ 
b/python/pyspark/pandas/tests/connect/plot/test_parity_frame_plot_matplotlib.py
@@ -28,7 +28,7 @@ class DataFramePlotMatplotlibParityTests(
 def test_hist_plot(self):
 super().test_hist_plot()
 
-@unittest.skip("TODO(SPARK-43629): Enable RDD with Spark Connect.")
+@unittest.skip("TODO(SPARK-44372): Enable KernelDensity within Spark 
Connect.")
 def test_kde_plot(self):
 super().test_kde_plot()
 
diff --git 
a/python/pyspark/pandas/tests/connect/plot/test_parity_frame_plot_plotly.py 
b/python/pyspark/pandas/tests/connect/plot/test_parity_frame_plot_plotly.py
index 7a3efee06df..16b97d6814e 100644
--- a/python/pyspark/pandas/tests/connect/plot/test_parity_frame_plot_plotly.py
+++ b/python/pyspark/pandas/tests/connect/plot/test_parity_frame_plot_plotly.py
@@ -32,7 +32,7 @@ class DataFramePlotPlotlyParityTests(
 def test_hist_plot(self):
 super().test_hist_plot()
 
-@unittest.skip("TODO(SPARK-43629): Enable RDD with Spark Connect.")
+@unittest.skip("TODO(SPARK-44372): Enable KernelDensity within Spark 
Connect.")
 def test_kde_plot(self):
 super().test_kde_plot()
 
diff --git 
a/python/pyspark/pandas/tests/connect/plot/test_parity_series_plot_matplotlib.py
 
b/python/pyspark/pandas/tests/connect/plot/test_parity_series_plot_matplotlib.py
index 975d78f17c6..9e264e76229 100644
--- 
a/python/pyspark/pandas/tests/connect/plot/test_parity_series_plot_matplotlib.py
+++ 
b/python/pyspark/pandas/tests/connect/plot/test_parity_series_plot_matplotlib.py
@@ -32,7 +32,7 @@ class SeriesPlotMatplotlibParityTests(
 def test_hist_plot(self):
 super().test_hist_plot()
 
-@unittest.skip("TODO(SPARK-43629): Enable RDD with Spark Connect.")
+@unittest.skip("TODO(SPARK-44372): Enable KernelDensity within Spark 
Connect.")
 def test_kde_plot(self):
 super().test_kde_plot()
 
diff --git 

[spark] branch master updated (82d0823c2f9 -> 1ae8e018164)

2023-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 82d0823c2f9 [SPARK-44788][PYTHON][DOCS][FOLLOW-UP] Move 
`from_xml`/`schema_of_xml` to `Xml Functions`
 add 1ae8e018164 [SPARK-45056][CONNECT][TESTS] Change the tests related to 
Python in `SparkConnectSessionHodlerSuite` to assume shouldTestPandasUDFs

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/connect/service/SparkConnectSessionHodlerSuite.scala| 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-44788][PYTHON][DOCS][FOLLOW-UP] Move `from_xml`/`schema_of_xml` to `Xml Functions`

2023-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 82d0823c2f9 [SPARK-44788][PYTHON][DOCS][FOLLOW-UP] Move 
`from_xml`/`schema_of_xml` to `Xml Functions`
82d0823c2f9 is described below

commit 82d0823c2f9b5b1d60c8799cf6bf7924c24acbe1
Author: Ruifeng Zheng 
AuthorDate: Mon Sep 18 08:58:25 2023 -0700

[SPARK-44788][PYTHON][DOCS][FOLLOW-UP] Move `from_xml`/`schema_of_xml` to 
`Xml Functions`

### What changes were proposed in this pull request?
Move `from_xml`/`schema_of_xml` to `Xml Functions`

### Why are the changes needed?
there is a dedicated function group for xml functions

### Does this PR introduce _any_ user-facing change?
yes

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #42977 from zhengruifeng/update_group.

Authored-by: Ruifeng Zheng 
Signed-off-by: Dongjoon Hyun 
---
 python/docs/source/reference/pyspark.sql/functions.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/python/docs/source/reference/pyspark.sql/functions.rst 
b/python/docs/source/reference/pyspark.sql/functions.rst
index 6e00c859da4..e21c05343da 100644
--- a/python/docs/source/reference/pyspark.sql/functions.rst
+++ b/python/docs/source/reference/pyspark.sql/functions.rst
@@ -264,8 +264,6 @@ Collection Functions
 str_to_map
 to_csv
 try_element_at
-from_xml
-schema_of_xml
 
 
 Partition Transformation Functions
@@ -531,6 +529,8 @@ Xml Functions
 .. autosummary::
 :toctree: api/
 
+from_xml
+schema_of_xml
 xpath
 xpath_boolean
 xpath_double


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
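
For readers unfamiliar with the functions being regrouped above, a hedged usage sketch
of `schema_of_xml` and `from_xml`; it assumes a PySpark build that includes
SPARK-44788, and the sample XML plus the `a INT` schema are made up for the example.

```
from pyspark.sql import SparkSession
from pyspark.sql import functions as sf

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("<p><a>1</a></p>",)], ["value"])

# Infer a DDL schema from a sample XML literal, then parse the column with a schema.
df.select(sf.schema_of_xml(sf.lit("<p><a>1</a></p>")).alias("schema")).show(truncate=False)
df.select(sf.from_xml(df.value, "a INT").alias("parsed")).show()
```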



[spark] branch master updated: [SPARK-45148][BUILD] Upgrade scalatest related dependencies to the 3.2.17 series

2023-09-18 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c1c58698d3d [SPARK-45148][BUILD] Upgrade scalatest related 
dependencies to the 3.2.17 series
c1c58698d3d is described below

commit c1c58698d3d6b1447045fad592f8dfb0395989d1
Author: yangjie01 
AuthorDate: Mon Sep 18 10:01:47 2023 -0500

[SPARK-45148][BUILD] Upgrade scalatest related dependencies to the 3.2.17 
series

### What changes were proposed in this pull request?
This PR aims to upgrade `scalatest`-related test dependencies to 3.2.17:

- scalatest: upgrade scalatest to 3.2.17
- scalatestplus
   - scalacheck: upgrade `scalacheck-1-17` to 3.2.17.0
   - mockito: upgrade `mockito-4-11` to 3.2.17.0
   - selenium: upgrade `selenium-4-12` to 3.2.17.0, `selenium-java` to 4.12.1,
`htmlunit-driver` to 4.12.0, and byte-buddy and byte-buddy-agent to 1.14.5

### Why are the changes needed?
The release notes are as follows:

- 
scalatest:https://github.com/scalatest/scalatest/releases/tag/release-3.2.17
- scalatestplus
   - scalacheck-1-17: 
https://github.com/scalatest/scalatestplus-scalacheck/releases/tag/release-3.2.17.0-for-scalacheck-1.17
   - mockito-4-11: 
https://github.com/scalatest/scalatestplus-mockito/releases/tag/release-3.2.17.0-for-mockito-4.11
   - selenium-4-12: 
https://github.com/scalatest/scalatestplus-selenium/releases/tag/release-3.2.17.0-for-selenium-4.12

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- Pass GitHub Actions
- Manual test:
   - ChromeUISeleniumSuite
   - RocksDBBackendChromeUIHistoryServerSuite

```
build/sbt -Dguava.version=32.1.2-jre 
-Dspark.test.webdriver.chrome.driver=/Users/yangjie01/Tools/chromedriver 
-Dtest.default.exclude.tags="" -Phive -Phive-thriftserver "core/testOnly 
org.apache.spark.ui.ChromeUISeleniumSuite"

build/sbt -Dguava.version=32.1.2-jre 
-Dspark.test.webdriver.chrome.driver=/Users/yangjie01/Tools/chromedriver 
-Dtest.default.exclude.tags="" -Phive -Phive-thriftserver "core/testOnly 
org.apache.spark.deploy.history.RocksDBBackendChromeUIHistoryServerSuite"
```

```
[info] ChromeUISeleniumSuite:
[info] - SPARK-31534: text for tooltip should be escaped (1 second, 809 
milliseconds)
[info] - SPARK-31882: Link URL for Stage DAGs should not depend on paged 
table. (604 milliseconds)
[info] - SPARK-31886: Color barrier execution mode RDD correctly (252 
milliseconds)
[info] - Search text for paged tables should not be saved (1 second, 309 
milliseconds)
[info] Run completed in 6 seconds, 116 milliseconds.
[info] Total number of tests run: 4
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 4, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

```
[info] RocksDBBackendChromeUIHistoryServerSuite:
[info] - ajax rendered relative links are prefixed with uiRoot 
(spark.ui.proxyBase) (1 second, 615 milliseconds)
[info] Run completed in 5 seconds, 130 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 27 s, completed 2023-9-14 11:29:27
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #42906 from LuciferYang/SPARK-45148.

Lead-authored-by: yangjie01 
Co-authored-by: YangJie 
Signed-off-by: Sean Owen 
---
 pom.xml | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/pom.xml b/pom.xml
index 779f9e64f1d..971cb07ea40 100644
--- a/pom.xml
+++ b/pom.xml
@@ -214,8 +214,8 @@
 
 4.9.3
 1.1
-4.9.1
-4.9.1
+4.12.1
+4.12.0
 2.70.0
 3.1.0
 1.1.0
@@ -413,7 +413,7 @@
 
 
   org.scalatestplus
-  selenium-4-9_${scala.binary.version}
+  selenium-4-12_${scala.binary.version}
   test
 
 
@@ -1137,25 +1137,25 @@
   
 org.scalatest
 scalatest_${scala.binary.version}
-3.2.16
+3.2.17
 test
   
   
 org.scalatestplus
 scalacheck-1-17_${scala.binary.version}
-3.2.16.0
+3.2.17.0
 test
   
   
 org.scalatestplus
 mockito-4-11_${scala.binary.version}
-3.2.16.0
+3.2.17.0
 test
   
   
 org.scalatestplus
-selenium-4-9_${scala.binary.version}
-3.2.16.0
+selenium-4-12_${scala.binary.version}
+3.2.17.0
 test
   
   
@@ -1173,13 +1173,13 @@
   
 net.bytebuddy
 byte-buddy
-1.14.4
+

[spark-website] branch asf-site updated: [SPARK-45195] Update examples with docker official image

2023-09-18 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 6b10f7fd85 [SPARK-45195] Update examples with docker official image
6b10f7fd85 is described below

commit 6b10f7fd85327f97cc12bede9ce5c60a744d9063
Author: Ruifeng Zheng 
AuthorDate: Mon Sep 18 07:31:46 2023 -0500

[SPARK-45195] Update examples with docker official image

1. add `docker run` commands for PySpark and SparkR;
2. switch to the Docker official image for SQL, Scala and Java;

refer to https://hub.docker.com/_/spark

Also manually checked all the commands, e.g.:
```
ruifeng.zhengx:~$ docker run -it --rm spark:python3 
/opt/spark/bin/pyspark
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
23/09/18 06:02:30 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
Welcome to
    __
 / __/__  ___ _/ /__
_\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.5.0
  /_/

Using Python version 3.8.10 (default, May 26 2023 14:05:08)
Spark context Web UI available at http://4861f70118ab:4040
Spark context available as 'sc' (master = local[*], app id = 
local-1695016951087).
SparkSession available as 'spark'.
>>> spark.range(0, 10).show()
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
+---+

```

Author: Ruifeng Zheng 

Closes #477 from zhengruifeng/offical_image.
---
 index.md| 12 +++-
 site/index.html | 12 +++-
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/index.md b/index.md
index a41e4c9e81..ada6242742 100644
--- a/index.md
+++ b/index.md
@@ -88,11 +88,13 @@ navigation:
 
 
 Run now
-Installing with 'pip'
+Install with 'pip' or try 
offical image
 
 
 $ pip install pyspark
 $ pyspark
+$ 
+$ docker run -it --rm spark:python3 
/opt/spark/bin/pyspark
 
 
 
 Run now
 
-$ docker run -it --rm apache/spark 
/opt/spark/bin/spark-sql
+$ docker run -it --rm spark /opt/spark/bin/spark-sql
 spark-sql>
 
 
@@ -175,7 +177,7 @@ FROM json.`logs.json`
 
 Run now
 
-$ docker run -it --rm apache/spark 
/opt/spark/bin/spark-shell
+$ docker run -it --rm spark 
/opt/spark/bin/spark-shell
 scala>
 
 
@@ -193,7 +195,7 @@ df.where("age > 21")
 
 Run now
 
-$ docker run -it --rm apache/spark 
/opt/spark/bin/spark-shell
+$ docker run -it --rm spark 
/opt/spark/bin/spark-shell
 scala>
 
 
@@ -210,7 +212,7 @@ df.where("age > 21")
 
 Run now
 
-$ SPARK-HOME/bin/sparkR
+$ docker run -it --rm spark:r /opt/spark/bin/sparkR
 >
 
 
diff --git a/site/index.html b/site/index.html
index e1b0b7e416..3ccc7104ce 100644
--- a/site/index.html
+++ b/site/index.html
@@ -213,11 +213,13 @@
 
 
 Run now
-Installing with 'pip'
+Install with 'pip' or try 
offical image
 
 
 $ pip install pyspark
 $ pyspark
+$ 
+$ docker run -it --rm spark:python3 
/opt/spark/bin/pyspark
 
 
 
@@ -273,7 +275,7 @@
 
 Run now
 
-$ docker run -it --rm apache/spark 
/opt/spark/bin/spark-sql
+$ docker run -it --rm spark /opt/spark/bin/spark-sql
 spark-sql
 
 
@@ -293,7 +295,7 @@
 
 Run now
 
-$ docker run -it --rm apache/spark 
/opt/spark/bin/spark-shell
+$ docker run -it --rm spark 
/opt/spark/bin/spark-shell
 

[GitHub] [spark-website] srowen closed pull request #477: [SPARK-45195] Update examples with docker official image

2023-09-18 Thread via GitHub


srowen closed pull request #477: [SPARK-45195] Update examples with docker 
official image
URL: https://github.com/apache/spark-website/pull/477


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-45197][CORE] Make `StandaloneRestServer` add `JavaModuleOptions` to drivers

2023-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 360adc2250b [SPARK-45197][CORE] Make `StandaloneRestServer` add 
`JavaModuleOptions` to drivers
360adc2250b is described below

commit 360adc2250bccdb0fbe559dcd1fc4b6b4c7c1d7a
Author: Dongjoon Hyun 
AuthorDate: Mon Sep 18 04:10:22 2023 -0700

[SPARK-45197][CORE] Make `StandaloneRestServer` add `JavaModuleOptions` to 
drivers

### What changes were proposed in this pull request?

This PR aims to make `StandaloneRestServer` add `JavaModuleOptions` to 
drivers by default.

### Why are the changes needed?

Since Apache Spark 3.3.0 (SPARK-36796, #34153), `SparkContext` adds 
`JavaModuleOptions` by default.

We had better add `JavaModuleOptions` when `StandaloneRestServer` receives 
submissions via REST API, too. Otherwise, it fails like the following if the 
users don't set it manually.

**SUBMISSION**
```bash
$ SPARK_MASTER_OPTS="-Dspark.master.rest.enabled=true" sbin/start-master.sh
$ curl -s -k -XPOST http://yourserver:6066/v1/submissions/create \
--header "Content-Type:application/json;charset=UTF-8" \
--data '{
  "appResource": "",
  "sparkProperties": {
"spark.master": "local[2]",
"spark.app.name": "Test 1",
"spark.submit.deployMode": "cluster",
"spark.jars": 
"/Users/dongjoon/APACHE/spark-release/spark-3.5.0-bin-hadoop3/examples/jars/spark-examples_2.12-3.5.0.jar"
  },
  "clientSparkVersion": "",
  "mainClass": "org.apache.spark.examples.SparkPi",
  "environmentVariables": {},
  "action": "CreateSubmissionRequest",
  "appArgs": []
}'
```

**DRIVER `stderr` LOG**
```
Exception in thread "main" java.lang.reflect.InvocationTargetException
...
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.IllegalAccessError: class 
org.apache.spark.storage.StorageUtils$
(in unnamed module 0x6d7a93c9) cannot access class sun.nio.ch.DirectBuffer
(in module java.base) because module java.base does not export sun.nio.ch
to unnamed module 0x6d7a93c9
```
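
For illustration only (not part of this patch): a minimal Python sketch of the manual workaround hinted at above, i.e. spelling out the JPMS export in the submission itself via `spark.driver.extraJavaOptions`. The endpoint and payload shape mirror the curl example; the host and jar path are placeholders, and only the single export flag named by the error is shown, whereas `JavaModuleOptions` injects a broader set of module flags.

```python
# Hedged sketch: submit through the standalone REST API with the module export
# set by hand, which is what a user had to do before this change. Host name and
# jar path are placeholders.
import json
import urllib.request

payload = {
    "action": "CreateSubmissionRequest",
    "clientSparkVersion": "",
    "mainClass": "org.apache.spark.examples.SparkPi",
    "appResource": "",
    "appArgs": [],
    "environmentVariables": {},
    "sparkProperties": {
        "spark.master": "local[2]",
        "spark.app.name": "Test 1",
        "spark.submit.deployMode": "cluster",
        "spark.jars": "/path/to/spark-examples_2.12-3.5.0.jar",  # placeholder path
        # Manual workaround before SPARK-45197: export sun.nio.ch yourself.
        "spark.driver.extraJavaOptions": "--add-exports=java.base/sun.nio.ch=ALL-UNNAMED",
    },
}

req = urllib.request.Request(
    "http://yourserver:6066/v1/submissions/create",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json;charset=UTF-8"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))
```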

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs with the newly added test case.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42975 from dongjoon-hyun/SPARK-45197.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../apache/spark/deploy/rest/StandaloneRestServer.scala| 14 +-
 .../spark/deploy/rest/StandaloneRestSubmitSuite.scala  | 12 
 .../java/org/apache/spark/launcher/JavaModuleOptions.java  |  8 
 3 files changed, 29 insertions(+), 5 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala 
b/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala
index c060ef9da8c..a298e4f6dbf 100644
--- 
a/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala
+++ 
b/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala
@@ -24,7 +24,7 @@ import org.apache.spark.{SPARK_VERSION => sparkVersion, 
SparkConf}
 import org.apache.spark.deploy.{Command, DeployMessages, DriverDescription}
 import org.apache.spark.deploy.ClientArguments._
 import org.apache.spark.internal.config
-import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.launcher.{JavaModuleOptions, SparkLauncher}
 import org.apache.spark.resource.ResourceUtils
 import org.apache.spark.rpc.RpcEndpointRef
 import org.apache.spark.util.Utils
@@ -124,7 +124,10 @@ private[rest] class StandaloneSubmitRequestServlet(
* fields used by python applications since python is not supported in 
standalone
* cluster mode yet.
*/
-  private def buildDriverDescription(request: CreateSubmissionRequest): 
DriverDescription = {
+  private[rest] def buildDriverDescription(
+  request: CreateSubmissionRequest,
+  masterUrl: String,
+  masterRestPort: Int): DriverDescription = {
 // Required fields, including the main class because python is not yet 
supported
 val appResource = Option(request.appResource).getOrElse {
   throw new SubmitRestMissingFieldException("Application jar is missing.")
@@ -149,7 +152,6 @@ private[rest] class StandaloneSubmitRequestServlet(
 // the driver.
 val masters = sparkProperties.get("spark.master")
 val (_, masterPort) = Utils.extractHostPortFromSparkUrl(masterUrl)
-val masterRestPort = this.conf.get(config.MASTER_REST_SERVER_PORT)
 val updatedMasters = masters.map(
   

[spark] branch master updated: [SPARK-44788][PYTHON][DOCS][FOLLOW-UP] Add from_xml and schema_of_xml into PySpark documentation

2023-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5b89bc156cb [SPARK-44788][PYTHON][DOCS][FOLLOW-UP] Add from_xml and 
schema_of_xml into PySpark documentation
5b89bc156cb is described below

commit 5b89bc156cb50c3ddc6bf4174e481e4f2a3f4491
Author: Hyukjin Kwon 
AuthorDate: Mon Sep 18 02:02:26 2023 -0700

[SPARK-44788][PYTHON][DOCS][FOLLOW-UP] Add from_xml and schema_of_xml into 
PySpark documentation

### What changes were proposed in this pull request?

This PR adds `from_xml` and `schema_of_xml` to the PySpark documentation.

### Why are the changes needed?

For users to know how to use them.

### Does this PR introduce _any_ user-facing change?

Yes, it adds both `from_xml` and `schema_of_xml` to the Python documentation.

### How was this patch tested?

Manually tested.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42974 from HyukjinKwon/SPARK-44788-followup.

Authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 python/docs/source/reference/pyspark.sql/functions.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/python/docs/source/reference/pyspark.sql/functions.rst 
b/python/docs/source/reference/pyspark.sql/functions.rst
index 6896efd4fb4..6e00c859da4 100644
--- a/python/docs/source/reference/pyspark.sql/functions.rst
+++ b/python/docs/source/reference/pyspark.sql/functions.rst
@@ -264,6 +264,8 @@ Collection Functions
 str_to_map
 to_csv
 try_element_at
+from_xml
+schema_of_xml
 
 
 Partition Transformation Functions





[spark] branch master updated: [SPARK-45193][PS][CONNECT][TESTS] Refactor `test_mode` to be compatible with Spark Connect

2023-09-18 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5db824aa6f9 [SPARK-45193][PS][CONNECT][TESTS] Refactor `test_mode` to 
be compatible with Spark Connect
5db824aa6f9 is described below

commit 5db824aa6f9203b24668c4e0135fab50d20831da
Author: Ruifeng Zheng 
AuthorDate: Mon Sep 18 17:00:35 2023 +0800

[SPARK-45193][PS][CONNECT][TESTS] Refactor `test_mode` to be compatible 
with Spark Connect

### What changes were proposed in this pull request?
Refactor `test_mode` to be compatible with Spark Connect

### Why are the changes needed?
for test parity

### Does this PR introduce _any_ user-facing change?
No, test-only

### How was this patch tested?
CI, and manually check:
```
In [5]: import pandas as pd

In [6]: def func(iterator):
   ...: for pdf in iterator:
   ...: if len(pdf) > 0:
   ...: if pdf["partition"][0] == 3:
   ...: yield pd.DataFrame({"num" : ["0", "1", "2", 
"3", "4"] })
   ...: else:
   ...: yield pd.DataFrame({"num" : ["3", "3", "3", 
"3", "4"]} )
   ...:

In [7]: df = spark.range(0, 4, 1, 
4).select(sf.spark_partition_id().alias("partition"))

In [8]: df.mapInPandas(func, "num string").withColumn("p", 
sf.spark_partition_id()).show()

/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py:239:
 FutureWarning: is_categorical_dtype is deprecated and will be removed in a 
future version. Use isinstance(dtype, CategoricalDtype) instead

/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py:239:
 FutureWarning: is_categorical_dtype is deprecated and will be removed in a 
future version. Use isinstance(dtype, CategoricalDtype) instead

/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py:239:
 FutureWarning: is_categorical_dtype is deprecated and will be removed in a 
future version. Use isinstance(dtype, CategoricalDtype) instead

/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py:239:
 FutureWarning: is_categorical_dtype is deprecated and will be removed in a 
future version. Use isinstance(dtype, CategoricalDtype) instead
+---+---+
|num|  p|
+---+---+
|  3|  0|
|  3|  0|
|  3|  0|
|  3|  0|
|  4|  0|
|  3|  1|
|  3|  1|
|  3|  1|
|  3|  1|
|  4|  1|
|  3|  2|
|  3|  2|
|  3|  2|
|  3|  2|
|  4|  2|
|  0|  3|
|  1|  3|
|  2|  3|
|  3|  3|
|  4|  3|
+---+---+

```
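
For illustration only (not part of this patch): the assumed reason the old test body was incompatible with Spark Connect is that its index-based helper (visible in the diff further down) relies on SparkContext/RDD access such as `mapPartitionsWithIndex`, which Connect does not expose. A condensed, script-form version of the DataFrame-only pattern the refactor adopts:

```python
# Condensed sketch of the DataFrame-only pattern used by the refactored test:
# spark_partition_id() tags each row with its partition, and mapInPandas
# branches on that tag instead of on an RDD partition index.
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql import functions as sf

spark = SparkSession.builder.getOrCreate()


def per_partition(iterator):
    for pdf in iterator:
        if len(pdf) > 0:
            if pdf["partition"][0] == 3:
                yield pd.DataFrame({"num": ["3", "3", "3", "3", "4"]})
            else:
                yield pd.DataFrame({"num": ["0", "1", "2", "3", "4"]})


df = spark.range(0, 4, 1, 4).select(sf.spark_partition_id().alias("partition"))
df.mapInPandas(per_partition, "num string").show()
```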

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #42970 from zhengruifeng/py_test_mode.

Authored-by: Ruifeng Zheng 
Signed-off-by: Ruifeng Zheng 
---
 .../pandas/tests/computation/test_compute.py   | 44 +-
 .../connect/computation/test_parity_compute.py | 30 +--
 2 files changed, 35 insertions(+), 39 deletions(-)

diff --git a/python/pyspark/pandas/tests/computation/test_compute.py 
b/python/pyspark/pandas/tests/computation/test_compute.py
index 9a29cb236a8..dc145601fca 100644
--- a/python/pyspark/pandas/tests/computation/test_compute.py
+++ b/python/pyspark/pandas/tests/computation/test_compute.py
@@ -15,11 +15,11 @@
 # limitations under the License.
 #
 import unittest
-from distutils.version import LooseVersion
 
 import numpy as np
 import pandas as pd
 
+from pyspark.sql import functions as sf
 from pyspark import pandas as ps
 from pyspark.testing.pandasutils import ComparisonTestBase
 from pyspark.testing.sqlutils import SQLTestUtils
@@ -101,16 +101,40 @@ class FrameComputeMixin:
 with self.assertRaises(ValueError):
 psdf.mode(axis=2)
 
-def f(index, iterator):
-return ["3", "3", "3", "3", "4"] if index == 3 else ["0", "1", 
"2", "3", "4"]
+def func(iterator):
+for pdf in iterator:
+if len(pdf) > 0:
+if pdf["partition"][0] == 3:
+yield pd.DataFrame(
+{
+"num": [
+"3",
+"3",
+"3",
+"3",
+"4",
+]
+}
+)
+else:
+yield pd.DataFrame(
+{
+"num": [
+"0",
+ 

[spark] branch master updated: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-18 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 89041a4a8c7 [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and 
schema_of_xml to pyspark, spark connect and sql function
89041a4a8c7 is described below

commit 89041a4a8c7b7787fa10f090d4324f20447c4dd3
Author: Sandip Agarwala <131817656+sandip...@users.noreply.github.com>
AuthorDate: Mon Sep 18 16:35:22 2023 +0900

[SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to 
pyspark, spark connect and sql function

### What changes were proposed in this pull request?
Add from_xml and schema_of_xml to pyspark, spark connect and sql function

### Why are the changes needed?
from_xml parses XML data nested in a `Column` into a struct. schema_of_xml 
infers the schema of XML data in a `Column`.
This PR adds these two functions to PySpark, Spark Connect, and the SQL function 
registry.
It is one of a series of PRs to add native support for the [XML File 
Format](https://issues.apache.org/jira/browse/SPARK-44265) in Spark.

### Does this PR introduce _any_ user-facing change?
Yes, it adds from_xml and schema_of_xml to pyspark, spark connect and sql 
function
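
For illustration only (not from this patch, and the exact inferred types and output may differ): a rough PySpark usage sketch of the two new functions on a build that contains this change.

```python
# Hedged sketch: schema_of_xml infers a DDL schema from a sample XML string,
# and from_xml parses an XML column into a struct given an explicit schema.
from pyspark.sql import Row, SparkSession
from pyspark.sql import functions as sf

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([Row(xml="<p><a>1</a><b>alice</b></p>")])

# Infer a schema from a literal sample of the payload.
df.select(
    sf.schema_of_xml(sf.lit("<p><a>1</a><b>alice</b></p>")).alias("schema")
).show(truncate=False)  # expected: something like STRUCT<a: BIGINT, b: STRING>

# Parse the XML column with an explicit DDL schema and read the struct fields.
parsed = df.select(sf.from_xml(df.xml, "a INT, b STRING").alias("value"))
parsed.select("value.a", "value.b").show()
```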

### How was this patch tested?
- Added new unit tests
- Github Action

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #42938 from sandip-db/from_xml-master.

Authored-by: Sandip Agarwala <131817656+sandip...@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon 
---
 .../scala/org/apache/spark/sql/functions.scala |  51 +++-
 .../org/apache/spark/sql/FunctionTestSuite.scala   |  10 +
 python/pyspark/errors/error_classes.py |   5 +
 python/pyspark/sql/connect/functions.py|  46 +++
 python/pyspark/sql/functions.py| 129 +
 .../sql/tests/connect/test_connect_function.py | 106 +++
 python/pyspark/sql/tests/test_functions.py |  28 +-
 .../sql/catalyst/analysis/FunctionRegistry.scala   |   6 +-
 .../sql/catalyst/expressions/xmlExpressions.scala  |  26 +-
 .../scala/org/apache/spark/sql/functions.scala |  60 +++-
 .../sql-functions/sql-expression-schema.md |   4 +-
 .../analyzer-results/xml-functions.sql.out | 271 +
 .../resources/sql-tests/inputs/xml-functions.sql   |  50 
 .../sql-tests/results/xml-functions.sql.out| 319 +
 .../apache/spark/sql/DataFrameFunctionsSuite.scala |   5 +-
 .../sql/execution/datasources/xml/XmlSuite.scala   |   2 +-
 16 files changed, 1093 insertions(+), 25 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
index 83f0ee64501..b94a33007b1 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
@@ -26,6 +26,7 @@ import org.apache.spark.sql.api.java._
 import 
org.apache.spark.sql.catalyst.encoders.AgnosticEncoders.PrimitiveLongEncoder
 import org.apache.spark.sql.connect.common.LiteralValueProtoConverter._
 import org.apache.spark.sql.connect.common.UdfUtils
+import org.apache.spark.sql.errors.DataTypeErrors
 import org.apache.spark.sql.expressions.{ScalarUserDefinedFunction, 
UserDefinedFunction}
 import org.apache.spark.sql.types.{DataType, StructType}
 import org.apache.spark.sql.types.DataType.parseTypeWithFallback
@@ -7311,15 +7312,59 @@ object functions {
*   
"https://spark.apache.org/docs/latest/sql-data-sources-xml.html#data-source-option;>
 Data
*   Source Option in the version you use.
* @group collection_funcs
+   * @since 4.0.0
+   */
+  // scalastyle:on line.size.limit
+  def from_xml(e: Column, schema: StructType, options: java.util.Map[String, 
String]): Column =
+from_xml(e, lit(schema.json), options.asScala.toIterator)
+
+  // scalastyle:off line.size.limit
+
+  /**
+   * (Java-specific) Parses a column containing a XML string into a 
`StructType` with the
+   * specified schema. Returns `null`, in the case of an unparseable string.
*
+   * @param e
+   *   a string column containing XML data.
+   * @param schema
+   *   the schema as a DDL-formatted string.
+   * @param options
+   *   options to control how the XML is parsed. accepts the same options and 
the xml data source.
+   *   See https://spark.apache.org/docs/latest/sql-data-sources-xml.html#data-source-option;>
 Data
+   *   Source Option in the version you use.
+   * @group collection_funcs
* @since 4.0.0
*/
   // scalastyle:on line.size.limit
-  def from_xml(e: Column, schema: StructType, options: Map[String, String]): 
Column =
-from_xml(e, 

[spark] branch master updated: [SPARK-44823][PYTHON] Update black to 23.9.1 and fix erroneous check

2023-09-18 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5299e5423f4 [SPARK-44823][PYTHON] Update black to 23.9.1 and fix 
erroneous check
5299e5423f4 is described below

commit 5299e5423f439ce702d14666c79f9d42eb446943
Author: panbingkun 
AuthorDate: Mon Sep 18 16:34:34 2023 +0900

[SPARK-44823][PYTHON] Update black to 23.9.1 and fix erroneous check

### What changes were proposed in this pull request?
The PR aims to update black from 22.6.0 to 23.9.1 and fix an erroneous check.

### Why are the changes needed?
Black `22.6.0` was released on Jun 28, 2022, over a year ago 
(https://pypi.org/project/black/23.7.0/#history).
This brings PySpark's code style in line with the requirements of the latest 
version of Black.

Release notes:
- 23.9.1: https://github.com/psf/black/blob/main/CHANGES.md#2391
- 23.9.0: https://github.com/psf/black/blob/main/CHANGES.md#2390
- 23.7.0: https://github.com/psf/black/blob/main/CHANGES.md#2370
- 23.3.0: https://github.com/psf/black/blob/main/CHANGES.md#2330
- 23.1.0: https://github.com/psf/black/blob/main/CHANGES.md#2310
- 22.12.0: https://github.com/psf/black/blob/main/CHANGES.md#22120
- 22.10.0: https://github.com/psf/black/blob/main/CHANGES.md#22100
- 22.8.0: https://github.com/psf/black/blob/main/CHANGES.md#2280

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

Closes #42507 from panbingkun/SPARK-44823.

Authored-by: panbingkun 
Signed-off-by: Hyukjin Kwon 
---
 .github/workflows/build_and_test.yml   |   2 +-
 dev/pyproject.toml |   2 +-
 dev/reformat-python|   2 +-
 dev/requirements.txt   |   2 +-
 dev/run-tests.py   |   3 +-
 python/pyspark/conf.py |   4 +-
 python/pyspark/context.py  |   2 +-
 python/pyspark/instrumentation_utils.py|   3 -
 python/pyspark/join.py |   8 +-
 python/pyspark/ml/connect/evaluation.py|   1 -
 .../pyspark/ml/deepspeed/deepspeed_distributor.py  |   1 -
 python/pyspark/ml/linalg/__init__.py   |   2 -
 python/pyspark/ml/param/__init__.py|   1 -
 python/pyspark/ml/tests/test_algorithms.py |   7 --
 python/pyspark/ml/tests/test_dl_util.py|   1 -
 python/pyspark/ml/tests/test_feature.py|   1 -
 python/pyspark/ml/tests/test_linalg.py |   2 -
 python/pyspark/ml/torch/distributor.py |   1 -
 python/pyspark/ml/tuning.py|   2 +-
 python/pyspark/ml/util.py  |   1 -
 python/pyspark/ml/wrapper.py   |   1 -
 python/pyspark/mllib/linalg/__init__.py|   2 -
 python/pyspark/mllib/tests/test_feature.py |   1 -
 python/pyspark/mllib/tests/test_linalg.py  |   2 -
 python/pyspark/pandas/frame.py |   7 +-
 python/pyspark/pandas/groupby.py   |   1 -
 python/pyspark/pandas/indexing.py  |   1 -
 python/pyspark/pandas/numpy_compat.py  |   1 -
 python/pyspark/pandas/plot/matplotlib.py   |   1 -
 python/pyspark/pandas/spark/functions.py   |   4 -
 python/pyspark/pandas/sql_processor.py |   2 +-
 python/pyspark/pandas/strings.py   |   1 +
 .../tests/connect/groupby/test_parity_aggregate.py |   1 -
 .../tests/connect/groupby/test_parity_describe.py  |   1 -
 .../tests/connect/groupby/test_parity_head_tail.py |   1 -
 .../tests/connect/groupby/test_parity_index.py |   1 -
 .../tests/connect/groupby/test_parity_stat.py  |   1 -
 .../tests/connect/series/test_parity_all_any.py|   1 -
 .../tests/connect/series/test_parity_as_type.py|   1 -
 .../tests/connect/series/test_parity_conversion.py |   1 -
 .../tests/connect/series/test_parity_sort.py   |   1 -
 .../pandas/tests/data_type_ops/test_string_ops.py  |   1 -
 

[GitHub] [spark-website] zhengruifeng commented on pull request #477: [SPARK-45195] Update example with docker official image

2023-09-18 Thread via GitHub


zhengruifeng commented on PR #477:
URL: https://github.com/apache/spark-website/pull/477#issuecomment-1722809681

   also cc @gatorsmile @HyukjinKwon @allisonwang-db @Yikun 





[GitHub] [spark-website] zhengruifeng opened a new pull request, #477: Update Example with docker official image

2023-09-18 Thread via GitHub


zhengruifeng opened a new pull request, #477:
URL: https://github.com/apache/spark-website/pull/477

   1. Add `docker` commands for PySpark and SparkR.
   2. Switch to the Docker official image for SQL, Scala, and Java.
   
   
   Refer to https://hub.docker.com/_/spark
   
   Also manually checked all the commands, e.g.:
   ```
   ruifeng.zheng@x:~$ docker run -it --rm spark:python3 
/opt/spark/bin/pyspark
   Python 3.8.10 (default, May 26 2023, 14:05:08)
   [GCC 9.4.0] on linux
   Type "help", "copyright", "credits" or "license" for more information.
   Setting default log level to "WARN".
   To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
   23/09/18 06:02:30 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
   Welcome to
   __
/ __/__  ___ _/ /__
   _\ \/ _ \/ _ `/ __/  '_/
  /__ / .__/\_,_/_/ /_/\_\   version 3.5.0
 /_/
   
   Using Python version 3.8.10 (default, May 26 2023 14:05:08)
   Spark context Web UI available at http://4861f70118ab:4040
   Spark context available as 'sc' (master = local[*], app id = 
local-1695016951087).
   SparkSession available as 'spark'.
   >>> spark.range(0, 10).show()
   +---+
   | id|
   +---+
   |  0|
   |  1|
   |  2|
   |  3|
   |  4|
   |  5|
   |  6|
   |  7|
   |  8|
   |  9|
   +---+
   
   ```

