[GitHub] spark pull request #22171: [SPARK-25177][SQL] When dataframe decimal type co...

2018-10-17 Thread vinodkc
Github user vinodkc closed the pull request at:

https://github.com/apache/spark/pull/22171


---




[GitHub] spark issue #22755: [SPARK-25755][SQL][Test] Supplementation of non-CodeGen ...

2018-10-17 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/22755
  
cc @gatorsmile, @HyukjinKwon


---




[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22309
  
Build finished. Test FAILed.


---




[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22309
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97511/
Test FAILed.


---




[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22309
  
**[Test build #97511 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97511/testReport)** for PR 22309 at commit [`c14b5f9`](https://github.com/apache/spark/commit/c14b5f9016ec287aee5dc06beef7377600f1d711).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---




[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22263
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22263
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97512/
Test PASSed.


---




[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...

2018-10-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22263
  
**[Test build #97512 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97512/testReport)** for PR 22263 at commit [`89502dc`](https://github.com/apache/spark/commit/89502dc912156f1b6f762a6989216a82d329d4b6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21990


---




[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22295#discussion_r226166020
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3863,6 +3863,145 @@ def test_jvm_default_session_already_set(self):
         spark.stop()


+class SparkSessionTests2(unittest.TestCase):
+
+    def test_active_session(self):
+        spark = SparkSession.builder \
+            .master("local") \
+            .getOrCreate()
+        try:
+            activeSession = SparkSession.getActiveSession()
+            df = activeSession.createDataFrame([(1, 'Alice')], ['age', 'name'])
+            self.assertEqual(df.collect(), [Row(age=1, name=u'Alice')])
+        finally:
+            spark.stop()
+
+    def test_get_active_session_when_no_active_session(self):
+        active = SparkSession.getActiveSession()
+        self.assertEqual(active, None)
+        spark = SparkSession.builder \
+            .master("local") \
+            .getOrCreate()
+        active = SparkSession.getActiveSession()
+        self.assertEqual(active, spark)
+        spark.stop()
+        active = SparkSession.getActiveSession()
+        self.assertEqual(active, None)
+
+    def test_SparkSession(self):
+        spark = SparkSession.builder \
+            .master("local") \
+            .config("some-config", "v2") \
+            .getOrCreate()
+        try:
+            self.assertEqual(spark.conf.get("some-config"), "v2")
+            self.assertEqual(spark.sparkContext._conf.get("some-config"), "v2")
+            self.assertEqual(spark.version, spark.sparkContext.version)
+            spark.sql("CREATE DATABASE test_db")
+            spark.catalog.setCurrentDatabase("test_db")
+            self.assertEqual(spark.catalog.currentDatabase(), "test_db")
+            spark.sql("CREATE TABLE table1 (name STRING, age INT) USING parquet")
+            self.assertEqual(spark.table("table1").columns, ['name', 'age'])
+            self.assertEqual(spark.range(3).count(), 3)
+        finally:
+            spark.stop()
+
+    def test_global_default_session(self):
+        spark = SparkSession.builder \
+            .master("local") \
+            .getOrCreate()
+        try:
+            self.assertEqual(SparkSession.builder.getOrCreate(), spark)
+        finally:
+            spark.stop()
+
+    def test_default_and_active_session(self):
+        spark = SparkSession.builder \
+            .master("local") \
+            .getOrCreate()
+        activeSession = spark._jvm.SparkSession.getActiveSession()
+        defaultSession = spark._jvm.SparkSession.getDefaultSession()
+        try:
+            self.assertEqual(activeSession, defaultSession)
+        finally:
+            spark.stop()
+
+    def test_config_option_propagated_to_existing_SparkSession(self):
+        session1 = SparkSession.builder \
+            .master("local") \
+            .config("spark-config1", "a") \
+            .getOrCreate()
+        self.assertEqual(session1.conf.get("spark-config1"), "a")
+        session2 = SparkSession.builder \
+            .config("spark-config1", "b") \
+            .getOrCreate()
+        try:
+            self.assertEqual(session1, session2)
+            self.assertEqual(session1.conf.get("spark-config1"), "b")
+        finally:
+            session1.stop()
+
+    def test_new_session(self):
+        session = SparkSession.builder \
+            .master("local") \
+            .getOrCreate()
+        newSession = session.newSession()
+        try:
+            self.assertNotEqual(session, newSession)
+        finally:
+            session.stop()
+            newSession.stop()
+
+    def test_create_new_session_if_old_session_stopped(self):
+        session = SparkSession.builder \
+            .master("local") \
+            .getOrCreate()
+        session.stop()
+        newSession = SparkSession.builder \
+            .master("local") \
+            .getOrCreate()
+        try:
+            self.assertNotEqual(session, newSession)
+        finally:
+            newSession.stop()
+
+    def test_active_session_with_None_and_not_None_context(self):
+        from pyspark.context import SparkContext
+        from pyspark.conf import SparkConf
+        sc = SparkContext._active_spark_context
+        self.assertEqual(sc, None)
+        activeSession = SparkSession.getActiveSession()
+        self.assertEqual(activeSession, None)
+        sparkConf = SparkConf()
+        sc = SparkContext.getOrCreate(sparkConf)
+        activeSession =

[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22295#discussion_r226166057
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3863,6 +3863,145 @@ def test_jvm_default_session_already_set(self):
         spark.stop()


+class SparkSessionTests2(unittest.TestCase):
+
+    def test_active_session(self):
+        spark = SparkSession.builder \
+            .master("local") \
+            .getOrCreate()
+        try:
+            activeSession = SparkSession.getActiveSession()
+            df = activeSession.createDataFrame([(1, 'Alice')], ['age', 'name'])
+            self.assertEqual(df.collect(), [Row(age=1, name=u'Alice')])
+        finally:
+            spark.stop()
+
+    def test_get_active_session_when_no_active_session(self):
+        active = SparkSession.getActiveSession()
+        self.assertEqual(active, None)
+        spark = SparkSession.builder \
+            .master("local") \
+            .getOrCreate()
+        active = SparkSession.getActiveSession()
+        self.assertEqual(active, spark)
+        spark.stop()
+        active = SparkSession.getActiveSession()
+        self.assertEqual(active, None)
+
+    def test_SparkSession(self):
+        spark = SparkSession.builder \
+            .master("local") \
+            .config("some-config", "v2") \
+            .getOrCreate()
+        try:
+            self.assertEqual(spark.conf.get("some-config"), "v2")
+            self.assertEqual(spark.sparkContext._conf.get("some-config"), "v2")
+            self.assertEqual(spark.version, spark.sparkContext.version)
+            spark.sql("CREATE DATABASE test_db")
+            spark.catalog.setCurrentDatabase("test_db")
+            self.assertEqual(spark.catalog.currentDatabase(), "test_db")
+            spark.sql("CREATE TABLE table1 (name STRING, age INT) USING parquet")
+            self.assertEqual(spark.table("table1").columns, ['name', 'age'])
+            self.assertEqual(spark.range(3).count(), 3)
+        finally:
+            spark.stop()
+
+    def test_global_default_session(self):
+        spark = SparkSession.builder \
+            .master("local") \
+            .getOrCreate()
+        try:
+            self.assertEqual(SparkSession.builder.getOrCreate(), spark)
+        finally:
+            spark.stop()
+
+    def test_default_and_active_session(self):
+        spark = SparkSession.builder \
+            .master("local") \
+            .getOrCreate()
+        activeSession = spark._jvm.SparkSession.getActiveSession()
+        defaultSession = spark._jvm.SparkSession.getDefaultSession()
+        try:
+            self.assertEqual(activeSession, defaultSession)
+        finally:
+            spark.stop()
+
+    def test_config_option_propagated_to_existing_SparkSession(self):
--- End diff --

Let's rename `SparkSession` in the test name just above to `spark_session`.


---




[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22295#discussion_r226165866
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2713,6 +2713,25 @@ def from_csv(col, schema, options={}):
     return Column(jc)
 
 
+@since(3.0)
+def _getActiveSession():
--- End diff --

eh.. why is it in functions.py?


---




[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21990
  
Merged to master.


---







[GitHub] spark issue #22675: [SPARK-25347][ML][DOC] Spark datasource for image/libsvm...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22675
  
Looks cool otherwise!


---




[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r226165068
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,90 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON and JDBC, we also provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory, it can load compressed image (jpeg, png, etc.) into raw image representation via ImageIO in Java library.
+The loaded DataFrame has one StructType column: "image". containing image data stored as image schema.
--- End diff --

Shall we consistently format code terms such as `StructType` as code, like `` `StructType` ``?


---




[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r226164813
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,90 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
--- End diff --

Should it be `Datasource` or `Data sources`? I am saying this because there looks to be a mismatch with the menu above.


---




[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r226164867
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,90 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON and JDBC, we also provide some specific data source for ML.
--- End diff --

really personal preference tho .. `like` -> `such as`


---




[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r226164476
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,90 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON and JDBC, we also provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory, it can load compressed image (jpeg, png, etc.) into raw image representation via ImageIO in Java library.
+The loaded DataFrame has one StructType column: "image". containing image data stored as image schema.
--- End diff --

`.` -> `,`.


---




[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r226164264
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,90 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON and JDBC, we also provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory, it can load compressed image (jpeg, png, etc.) into raw image representation via ImageIO in Java library.
+The loaded DataFrame has one StructType column: "image". containing image data stored as image schema.
+The schema of the `image` column is:
+ - origin: String (represents the file path of the image)
--- End diff --

I would use SQL types consistently, for instance, StringType, IntegerType
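For context, the fields behind that suggestion can be inspected directly; a minimal sketch (Scala, assuming a running `SparkSession` named `spark` and the sample image path from the quoted docs):

```scala
// Load the image data source and flatten the single "image" struct column
// to see each field and its SQL type.
val images = spark.read.format("image").load("data/mllib/images/origin")
images.select("image.*").printSchema()
// Expected fields, per Spark's built-in image schema:
//   origin (StringType), height/width/nChannels/mode (IntegerType), data (BinaryType)
```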


---




[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r226163959
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON, JDBC, we also provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory.
+The loaded DataFrame has one StructType column: "image". containing image data stored as image schema.
--- End diff --

Where is each field described?


---




[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r226163638
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON, JDBC, we also provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory.
+The loaded DataFrame has one StructType column: "image". containing image data stored as image schema.
+
+
+

+[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource)
+implements a Spark SQL data source API for loading image data as a DataFrame.
+
+{% highlight scala %}
+scala> spark.read.format("image").load("data/mllib/images/origin")
+res1: org.apache.spark.sql.DataFrame = [image: struct]
+{% endhighlight %}
+
+
+

+[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html)
+implements Spark SQL data source API for loading image data as DataFrame.
+
+{% highlight java %}
+Dataset<Row> imagesDF = spark.read().format("image").load("data/mllib/images/origin");
+{% endhighlight %}
+
+
+
--- End diff --

Shall we add an example for R as well then? It wouldn't be too difficult to add the equivalent examples. Also, I don't think we should add the equivalent examples in different languages on different pages.


---




[GitHub] spark issue #22675: [SPARK-25347][ML][DOC] Spark datasource for image/libsvm...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22675
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22675: [SPARK-25347][ML][DOC] Spark datasource for image/libsvm...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22675
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97514/
Test PASSed.


---




[GitHub] spark issue #22675: [SPARK-25347][ML][DOC] Spark datasource for image/libsvm...

2018-10-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22675
  
**[Test build #97514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97514/testReport)** for PR 22675 at commit [`ba0d888`](https://github.com/apache/spark/commit/ba0d888fc55de976308af29faff94a24ad1ef782).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22677: [SPARK-25683][Core] Updated the log for the firstTime ev...

2018-10-17 Thread shivusondur
Github user shivusondur commented on the issue:

https://github.com/apache/spark/pull/22677
  
cc @vanzin 


---




[GitHub] spark issue #22675: [SPARK-25347][ML][DOC] Spark datasource for image/libsvm...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22675
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4073/
Test PASSed.


---




[GitHub] spark issue #22675: [SPARK-25347][ML][DOC] Spark datasource for image/libsvm...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22675
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r226161636
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON, JDBC, we also provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory.
+The loaded DataFrame has one StructType column: "image". containing image data stored as image schema.
--- End diff --

added.


---




[GitHub] spark issue #22675: [SPARK-25347][ML][DOC] Spark datasource for image/libsvm...

2018-10-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22675
  
**[Test build #97514 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97514/testReport)** for PR 22675 at commit [`ba0d888`](https://github.com/apache/spark/commit/ba0d888fc55de976308af29faff94a24ad1ef782).


---




[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r226161623
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON, JDBC, we also provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory.
+The loaded DataFrame has one StructType column: "image". containing image data stored as image schema.
--- End diff --

added.


---




[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r226161557
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON, JDBC, we also provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory.
+The loaded DataFrame has one StructType column: "image". containing image data stored as image schema.
+
+
+

+[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource)
+implements a Spark SQL data source API for loading image data as a DataFrame.
+
+{% highlight scala %}
+scala> spark.read.format("image").load("data/mllib/images/origin")
+res1: org.apache.spark.sql.DataFrame = [image: struct]
+{% endhighlight %}
+
+
+

+[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html)
+implements Spark SQL data source API for loading image data as DataFrame.
+
+{% highlight java %}
+Dataset<Row> imagesDF = spark.read().format("image").load("data/mllib/images/origin");
+{% endhighlight %}
+
+
+
--- End diff --

This looks like a SQL feature that fits all data sources. Putting it in the Spark SQL doc would be better.


---




[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...

2018-10-17 Thread maryannxue
Github user maryannxue commented on a diff in the pull request:

https://github.com/apache/spark/pull/22732#discussion_r226156109
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
@@ -73,19 +73,21 @@ case class UserDefinedFunction protected[sql] (
    */
   @scala.annotation.varargs
   def apply(exprs: Column*): Column = {
-    if (inputTypes.isDefined && nullableTypes.isDefined) {
-      require(inputTypes.get.length == nullableTypes.get.length)
+    val numOfArgs = ScalaReflection.getParameterCount(f)
--- End diff --

Actually I think the assert would better live in `ScalaUDF`, but then the behavior would be slightly different from the current one anyway. I'm keeping everything else as it is.
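For readers following along, a self-contained miniature of the arity check under discussion (names and structure are illustrative only, not Spark's actual code; `ScalaReflection.getParameterCount` is the helper from the quoted diff):

```scala
// Illustrative only: count a function object's parameters via reflection and
// require the caller to pass exactly that many arguments.
object AritySketch {
  def parameterCount(f: AnyRef): Int =
    f.getClass.getMethods.find(_.getName == "apply").map(_.getParameterCount).getOrElse(0)

  def main(args: Array[String]): Unit = {
    val f = (x: Long, y: Any) => x * 2
    val cols = Seq("col1", "col2")
    require(cols.length == parameterCount(f),
      s"expected ${parameterCount(f)} arguments but got ${cols.length}")
    println(parameterCount(f)) // 2
  }
}
```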


---




[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...

2018-10-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22732#discussion_r226156536
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala ---
@@ -393,4 +393,30 @@ class UDFSuite extends QueryTest with SharedSQLContext {
       checkAnswer(df, Seq(Row("12"), Row("24"), Row("3null"), Row(null)))
     }
   }
+
+  test("SPARK-25044 Verify null input handling for primitive types - with udf()") {
+    val udf1 = udf({(x: Long, y: Any) => x * 2 + (if (y == null) 1 else 0)})
+    val df = spark.range(0, 3).toDF("a")
+      .withColumn("b", udf1($"a", lit(null)))
+      .withColumn("c", udf1(lit(null), $"a"))
+
+    checkAnswer(
+      df,
+      Seq(
+        Row(0, 1, null),
+        Row(1, 3, null),
+        Row(2, 5, null)))
+  }
+
+  test("SPARK-25044 Verify null input handling for primitive types - with udf.register") {
+    withTable("t") {
--- End diff --

ah we should do `(null, new Integer(1), "x"), ("N", null, "y"), ("N", new Integer(3), null)`


---




[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...

2018-10-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22732#discussion_r226156400
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1978,6 +1978,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
   - Since Spark 2.4, empty strings are saved as quoted empty strings `""`. 
In version 2.3 and earlier, empty strings are equal to `null` values and do not 
reflect to any characters in saved CSV files. For example, the row of `"a", 
null, "", 1` was writted as `a,,,1`. Since Spark 2.4, the same row is saved as 
`a,,"",1`. To restore the previous behavior, set the CSV option `emptyValue` to 
empty (not quoted) string.  
   - Since Spark 2.4, The LOAD DATA command supports wildcard `?` and `*`, 
which match any one character, and zero or more characters, respectively. 
Example: `LOAD DATA INPATH '/tmp/folder*/'` or `LOAD DATA INPATH 
'/tmp/part-?'`. Special Characters like `space` also now work in paths. 
Example: `LOAD DATA INPATH '/tmp/folder name/'`.
   - In Spark version 2.3 and earlier, HAVING without GROUP BY is treated 
as WHERE. This means, `SELECT 1 FROM range(10) HAVING true` is executed as 
`SELECT 1 FROM range(10) WHERE true`  and returns 10 rows. This violates SQL 
standard, and has been fixed in Spark 2.4. Since Spark 2.4, HAVING without 
GROUP BY is treated as a global aggregate, which means `SELECT 1 FROM range(10) 
HAVING true` will return only one row. To restore the previous behavior, set 
`spark.sql.legacy.parser.havingWithoutGroupByAsWhere` to `true`.
+  - Since Spark 2.4, use of the method `def udf(f: AnyRef, dataType: 
DataType): UserDefinedFunction` should not expect any automatic null handling 
of the input parameters, thus a null input of a Scala primitive type will be 
converted to the type's corresponding default value in the UDF. All other UDF 
declaration and registration methods remain the same behavior as before.
--- End diff --

yes, this fixes a bug introduced by a commit in 2.4
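To make the documented behavior concrete, a hedged example of the untyped `udf(f: AnyRef, dataType: DataType)` variant named in the note (assumes a `SparkSession` named `spark`; the result follows from the rule quoted above):

```scala
import org.apache.spark.sql.functions.{lit, udf}
import org.apache.spark.sql.types.LongType

// The untyped variant: no automatic null handling, so a null fed to the Scala
// primitive parameter arrives as Long's default value (0L) inside the UDF.
val plusOne = udf((x: Long) => x + 1, LongType)
val res = spark.range(1).select(plusOne(lit(null))).first().getLong(0)
// res == 1L: the null became 0L before the function ran, then 0L + 1
```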


---




[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...

2018-10-17 Thread maryannxue
Github user maryannxue commented on a diff in the pull request:

https://github.com/apache/spark/pull/22732#discussion_r226155205
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala ---
@@ -393,4 +393,30 @@ class UDFSuite extends QueryTest with SharedSQLContext {
       checkAnswer(df, Seq(Row("12"), Row("24"), Row("3null"), Row(null)))
     }
   }
+
+  test("SPARK-25044 Verify null input handling for primitive types - with udf()") {
+    val udf1 = udf({(x: Long, y: Any) => x * 2 + (if (y == null) 1 else 0)})
+    val df = spark.range(0, 3).toDF("a")
+      .withColumn("b", udf1($"a", lit(null)))
+      .withColumn("c", udf1(lit(null), $"a"))
+
+    checkAnswer(
+      df,
+      Seq(
+        Row(0, 1, null),
+        Row(1, 3, null),
+        Row(2, 5, null)))
+  }
+
+  test("SPARK-25044 Verify null input handling for primitive types - with udf.register") {
+    withTable("t") {
--- End diff --

I tried this first, but it seems `toDF()` couldn't deduce the right type when nulls are present.
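For anyone hitting the same inference issue: a hedged sketch of why the boxed `Integer` in cloud-fan's follow-up works where a plain `Int` doesn't (assumes `import spark.implicits._`):

```scala
import spark.implicits._

// With a plain Int, Seq((null, 1, "x"), ("N", null, "y")) widens the tuple's
// second element to Any, and Spark cannot derive a schema for Any.
// Boxing to java.lang.Integer keeps a concrete, nullable-friendly type:
val df = Seq(
  (null, Integer.valueOf(1), "x"),
  ("N", null, "y"),
  ("N", Integer.valueOf(3), null)
).toDF("a", "b", "c")
df.printSchema() // a: string, b: integer, c: string -- all nullable
```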


---




[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...

2018-10-17 Thread maryannxue
Github user maryannxue commented on a diff in the pull request:

https://github.com/apache/spark/pull/22732#discussion_r226155153
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala ---
@@ -393,4 +393,30 @@ class UDFSuite extends QueryTest with SharedSQLContext {
       checkAnswer(df, Seq(Row("12"), Row("24"), Row("3null"), Row(null)))
     }
   }
+
+  test("SPARK-25044 Verify null input handling for primitive types - with udf()") {
+    val udf1 = udf({(x: Long, y: Any) => x * 2 + (if (y == null) 1 else 0)})
+    val df = spark.range(0, 3).toDF("a")
+      .withColumn("b", udf1($"a", lit(null)))
+      .withColumn("c", udf1(lit(null), $"a"))
+
+    checkAnswer(
+      df,
+      Seq(
+        Row(0, 1, null),
+        Row(1, 3, null),
+        Row(2, 5, null)))
+  }
+
+  test("SPARK-25044 Verify null input handling for primitive types - with udf.register") {
+    withTable("t") {
--- End diff --

I tried this first, but it seems `toDF()` couldn't deduce the right type when nulls are present.


---




[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21990
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21990
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97510/
Test PASSed.


---




[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

2018-10-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21990
  
**[Test build #97510 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97510/testReport)** for PR 21990 at commit [`3629c78`](https://github.com/apache/spark/commit/3629c78118eb77589ae1126e877f0fd2b57441ce).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...

2018-10-17 Thread rezasafi
Github user rezasafi commented on the issue:

https://github.com/apache/spark/pull/22612
  
I would appreciate it if anyone has other concerns with this patch @squito @edwinalu @mccheah @dhruve 


---




[GitHub] spark pull request #22758: [SPARK-25332][SQL] Instead of broadcast hash join...

2018-10-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22758#discussion_r226151421
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -193,6 +193,16 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log
   None)
 val logicalRelation = cached.getOrElse {
   val updatedTable = inferIfNeeded(relation, options, fileFormat)
+  // Initialize the catalogTable stats if it's not defined. An initial value has to be defined
--- End diff --

I don't quite understand why the table must have stats. For both file sources and Hive tables, we will estimate the data size from the files if the table doesn't have stats.
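A hedged way to observe what cloud-fan describes (table name hypothetical; `stats` here is the optimizer's estimate, which falls back to file sizes when no catalog statistics exist):

```scala
// Without ANALYZE TABLE, the planner's size estimate comes from the table's files.
val before = spark.table("some_table").queryExecution.optimizedPlan.stats.sizeInBytes

// After collecting catalog statistics, the estimate reflects them instead.
spark.sql("ANALYZE TABLE some_table COMPUTE STATISTICS")
val after = spark.table("some_table").queryExecution.optimizedPlan.stats.sizeInBytes
println(s"file-based estimate: $before, catalog stats: $after")
```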


---




[GitHub] spark pull request #22758: [SPARK-25332][SQL] Instead of broadcast hash join...

2018-10-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22758#discussion_r226150341
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -1051,7 +1051,8 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
 
   test("test statistics of LogicalRelation converted from Hive serde 
tables") {
 Seq("orc", "parquet").foreach { format =>
-  Seq(true, false).foreach { isConverted =>
+  // Botth parquet and orc will have Hivestatistics, both are 
convertable to Logical Relation.
+  Seq(true, true).foreach { isConverted =>
--- End diff --

This is to test when the conversion is on and off. We shouldn't change it.
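For reference, a hedged sketch of the on/off pattern the original `Seq(true, false)` drives; the conf keys are Spark's real `convertMetastore*` flags, and `withSQLConf` is the internal test-suite helper used throughout these suites:

```scala
// Exercise both paths: Hive serde tables converted to LogicalRelation (true)
// and read through the Hive serde path (false).
Seq("orc", "parquet").foreach { format =>
  Seq(true, false).foreach { isConverted =>
    withSQLConf(
      "spark.sql.hive.convertMetastoreOrc" -> isConverted.toString,
      "spark.sql.hive.convertMetastoreParquet" -> isConverted.toString) {
      // ... assertions that must hold in both modes ...
    }
  }
}
```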


---




[GitHub] spark issue #22761: [MINOR][DOC] Spacing items in migration guide for readab...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22761
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22761: [MINOR][DOC] Spacing items in migration guide for readab...

2018-10-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22761
  
**[Test build #97513 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97513/testReport)** for PR 22761 at commit [`e1d6a0a`](https://github.com/apache/spark/commit/e1d6a0a9985d0351e4dcc2f868141a079694432d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22761: [MINOR][DOC] Spacing items in migration guide for readab...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22761
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97513/
Test PASSed.


---




[GitHub] spark pull request #22745: [SPARK-21402][SQL][FOLLOW-UP] Fix java map of str...

2018-10-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22745#discussion_r226149644
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
@@ -509,3 +509,24 @@ case class UnresolvedOrdinal(ordinal: Int)
   override def nullable: Boolean = throw new UnresolvedException(this, "nullable")
   override lazy val resolved = false
 }
+
+/**
+ * When constructing `Invoke`, the data type must be given, which may not be possible to define
+ * before analysis. This class acts like a placeholder for `Invoke`, and will be replaced by
+ * `Invoke` during analysis after the input data is resolved. Data type passed to `Invoke`
+ * will be defined by applying `dataTypeFunction` to the data type of the input data.
+ */
+case class UnresolvedInvoke(
--- End diff --

I feel this is too general. Maybe we should just create a new expression `GetArrayFromMap` and resolve it to `Invoke` later.
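A minimal sketch of the narrower placeholder cloud-fan suggests, modeled on the `UnresolvedInvoke` shape in the quoted hunk (hypothetical; not part of the patch):

```scala
import org.apache.spark.sql.catalyst.analysis.UnresolvedException
import org.apache.spark.sql.catalyst.expressions.{Expression, UnaryExpression, Unevaluable}
import org.apache.spark.sql.types.DataType

// Hypothetical: a purpose-specific placeholder that the analyzer rewrites into
// the appropriate Invoke once the child's map type is resolved.
case class GetArrayFromMap(child: Expression) extends UnaryExpression with Unevaluable {
  override def dataType: DataType = throw new UnresolvedException(this, "dataType")
  override lazy val resolved = false
}
```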


---




[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22503#discussion_r226149299
  
--- Diff: sql/core/src/test/resources/test-data/cars-crlf.csv ---
@@ -0,0 +1,7 @@
+
+year,make,model,comment,blank
+"2012","Tesla","S","No comment",
+
+1997,Ford,E350,"Go get one now they are going fast",
+2015,Chevy,Volt
+
--- End diff --

just for clarification, this file has newline variant combinations, right?
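To make the question concrete, a hedged snippet that produces a file with mixed line endings like the fixture above and reads it back (path hypothetical; assumes a `SparkSession` named `spark`):

```scala
import java.nio.file.{Files, Paths}

// Rows end in \r\n and \n in the same file, as in the test fixture above.
val mixed = "year,make,model,comment,blank\r\n" +
  "\"2012\",\"Tesla\",\"S\",\"No comment\",\n" +
  "1997,Ford,E350,\"Go get one now they are going fast\",\r\n"
Files.write(Paths.get("/tmp/cars-crlf.csv"), mixed.getBytes("UTF-8"))

// With CRLF auto-detection, all rows should parse identically regardless of ending.
spark.read.option("header", "true").csv("/tmp/cars-crlf.csv").show()
```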


---




[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22309
  
what's still missing to support top level value class?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r226149168
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -632,13 +667,17 @@ object ScalaReflection extends ScalaReflection {
   "cannot be used as field name\n" + 
walkedTypePath.mkString("\n"))
   }
 
+  // as a field, value class is represented by its underlying type
+  val trueFieldType =
+if (isValueClass(fieldType)) getUnderlyingTypeOf(fieldType) 
else fieldType
--- End diff --

shall we update `dataTypeFor` to handle value classes, instead of special-casing them here?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r226148985
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -376,6 +386,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = getUnerasedClassNameFromType(t)
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
--- End diff --

do you have an example of this message?
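For what it's worth, the format string in the quoted hunk renders along these lines (class names are made up for illustration):

```scala
// Derived from the s"""- Scala value class: $clsName($underlyingClsName)"""
// line in the diff; the wrapper class name is hypothetical.
val clsName = "com.example.IntWrapper"
val underlyingClsName = "scala.Int"
println(s"- Scala value class: $clsName($underlyingClsName)")
// prints: - Scala value class: com.example.IntWrapper(scala.Int)
```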


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r226148892
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -376,6 +386,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
--- End diff --

This is not true; a top-level value class can be a single primitive-type column.
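A hedged illustration of the case being pointed out (the encoding comment describes intended behavior, not something this snippet implements):

```scala
// A value class over a primitive. As a top-level type, a Dataset[IntWrapper]
// could be encoded as a single int column, mirroring Dataset[Int], rather
// than as a one-field struct.
case class IntWrapper(i: Int) extends AnyVal
```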


---




[GitHub] spark issue #22761: [MINOR][DOC] Spacing items in migration guide for readab...

2018-10-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22761
  
**[Test build #97513 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97513/testReport)** for PR 22761 at commit [`e1d6a0a`](https://github.com/apache/spark/commit/e1d6a0a9985d0351e4dcc2f868141a079694432d).


---




[GitHub] spark issue #22761: [MINOR][DOC] Spacing items in migration guide for readab...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22761
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22761: [MINOR][DOC] Spacing items in migration guide for readab...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22761
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4072/
Test PASSed.


---




[GitHub] spark pull request #22761: [MINOR][DOC] Spacing items in migration guide for...

2018-10-17 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/22761

[MINOR][DOC] Spacing items in migration guide for readability and consistency

## What changes were proposed in this pull request?

Currently, the migration guide has no space between items, which looks too compact and is hard to read. Some items already had spaces between them. This PR suggests formatting them consistently for readability.

Before:

![screen shot 2018-10-18 at 9 49 12 am](https://user-images.githubusercontent.com/6477701/47126706-4c43da80-d2bc-11e8-966f-ad804ab23174.png)

After:

![screen shot 2018-10-18 at 9 53 55 am](https://user-images.githubusercontent.com/6477701/47126708-4fd76180-d2bc-11e8-9aa5-546f0622ca20.png)

## How was this patch tested?

Manually tested:



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark minor-migration-doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22761.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22761


commit e1d6a0a9985d0351e4dcc2f868141a079694432d
Author: hyukjinkwon 
Date:   2018-10-18T01:56:16Z

Format migration guide for readability




---




[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

2018-10-17 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21990
  
LGTM.


---




[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...

2018-10-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22263
  
**[Test build #97512 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97512/testReport)** for PR 22263 at commit [`89502dc`](https://github.com/apache/spark/commit/89502dc912156f1b6f762a6989216a82d329d4b6).


---




[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22263
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4071/
Test PASSed.


---




[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22263
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22732#discussion_r226145444
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1978,6 +1978,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
   - Since Spark 2.4, empty strings are saved as quoted empty strings `""`. 
In version 2.3 and earlier, empty strings are equal to `null` values and do not 
reflect to any characters in saved CSV files. For example, the row of `"a", 
null, "", 1` was writted as `a,,,1`. Since Spark 2.4, the same row is saved as 
`a,,"",1`. To restore the previous behavior, set the CSV option `emptyValue` to 
empty (not quoted) string.  
   - Since Spark 2.4, The LOAD DATA command supports wildcard `?` and `*`, 
which match any one character, and zero or more characters, respectively. 
Example: `LOAD DATA INPATH '/tmp/folder*/'` or `LOAD DATA INPATH 
'/tmp/part-?'`. Special Characters like `space` also now work in paths. 
Example: `LOAD DATA INPATH '/tmp/folder name/'`.
   - In Spark version 2.3 and earlier, HAVING without GROUP BY is treated 
as WHERE. This means, `SELECT 1 FROM range(10) HAVING true` is executed as 
`SELECT 1 FROM range(10) WHERE true`  and returns 10 rows. This violates SQL 
standard, and has been fixed in Spark 2.4. Since Spark 2.4, HAVING without 
GROUP BY is treated as a global aggregate, which means `SELECT 1 FROM range(10) 
HAVING true` will return only one row. To restore the previous behavior, set 
`spark.sql.legacy.parser.havingWithoutGroupByAsWhere` to `true`.
+  - Since Spark 2.4, use of the method `def udf(f: AnyRef, dataType: 
DataType): UserDefinedFunction` should not expect any automatic null handling 
of the input parameters, thus a null input of a Scala primitive type will be 
converted to the type's corresponding default value in the UDF. All other UDF 
declaration and registration methods remain the same behavior as before.
--- End diff --

Is this targeted for backport to 2.4?


---




[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22732#discussion_r226145051
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala ---
@@ -393,4 +393,30 @@ class UDFSuite extends QueryTest with SharedSQLContext {
       checkAnswer(df, Seq(Row("12"), Row("24"), Row("3null"), Row(null)))
     }
   }
+
+  test("SPARK-25044 Verify null input handling for primitive types - with udf()") {
+    val udf1 = udf({(x: Long, y: Any) => x * 2 + (if (y == null) 1 else 0)})
--- End diff --

nit `udf((x: Long, y: Any) => x * 2 + (if (y == null) 1 else 0))`


---




[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22732#discussion_r226145150
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala ---
@@ -393,4 +393,30 @@ class UDFSuite extends QueryTest with SharedSQLContext {
       checkAnswer(df, Seq(Row("12"), Row("24"), Row("3null"), Row(null)))
     }
   }
+
+  test("SPARK-25044 Verify null input handling for primitive types - with udf()") {
+    val udf1 = udf({(x: Long, y: Any) => x * 2 + (if (y == null) 1 else 0)})
+    val df = spark.range(0, 3).toDF("a")
+      .withColumn("b", udf1($"a", lit(null)))
+      .withColumn("c", udf1(lit(null), $"a"))
+
+    checkAnswer(
+      df,
+      Seq(
+        Row(0, 1, null),
+        Row(1, 3, null),
+        Row(2, 5, null)))
+  }
+
+  test("SPARK-25044 Verify null input handling for primitive types - with udf.register") {
+    withTable("t") {
+      sql("create table t(a varchar(10), b int, c varchar(10)) using parquet")
--- End diff --

nit: upper case for keywords


---




[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22732#discussion_r226144670
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1951,7 +1951,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
         AnalysisException is thrown since integer type can not be promoted to string type in a loss-less manner.
   
   
-Users can use explict cast
+Users can use explicit cast
--- End diff --

There's the same word mistake (`explict cast`) right above, btw.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-17 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r226144119
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -125,6 +125,17 @@ object ScalaReflection extends ScalaReflection {
     case _ => false
   }
 
+  def isValueClass(tpe: `Type`): Boolean = {
+    val notNull = !(tpe <:< localTypeOf[Null])
+    notNull && definedByConstructorParams(tpe) && tpe <:< localTypeOf[AnyVal]
--- End diff --

No — now that you ask, I realize the `Product` constraint can be completely removed.

Also, after scanning through the type API, I found a built-in way (`isDerivedValueClass`) to check for value classes... Don't know why I never thought about this 🤦‍♂️
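Presumably the built-in check is `ClassSymbol.isDerivedValueClass` from `scala-reflect`; a minimal sketch:

```scala
import scala.reflect.runtime.universe._

// Replaces the hand-rolled constraints in the quoted hunk with the built-in check.
def isValueClass(tpe: Type): Boolean =
  tpe.typeSymbol.isClass && tpe.typeSymbol.asClass.isDerivedValueClass

case class IntWrapper(i: Int) extends AnyVal

println(isValueClass(typeOf[IntWrapper])) // true
println(isValueClass(typeOf[Int]))        // false: primitives are not derived value classes
println(isValueClass(typeOf[String]))     // false
```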


---




[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22309
  
**[Test build #97511 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97511/testReport)** for PR 22309 at commit [`c14b5f9`](https://github.com/apache/spark/commit/c14b5f9016ec287aee5dc06beef7377600f1d711).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...

2018-10-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22732#discussion_r226143221
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala ---
@@ -393,4 +393,30 @@ class UDFSuite extends QueryTest with SharedSQLContext {
       checkAnswer(df, Seq(Row("12"), Row("24"), Row("3null"), Row(null)))
     }
   }
+
+  test("SPARK-25044 Verify null input handling for primitive types - with udf()") {
+    val udf1 = udf({(x: Long, y: Any) => x * 2 + (if (y == null) 1 else 0)})
+    val df = spark.range(0, 3).toDF("a")
+      .withColumn("b", udf1($"a", lit(null)))
+      .withColumn("c", udf1(lit(null), $"a"))
+
+    checkAnswer(
+      df,
+      Seq(
+        Row(0, 1, null),
+        Row(1, 3, null),
+        Row(2, 5, null)))
+  }
+
+  test("SPARK-25044 Verify null input handling for primitive types - with udf.register") {
+    withTable("t") {
--- End diff --

we can use a temp view
```
withTempView("v") {
  Seq((null, 1, "x"), ("N", null, "y"), ("N", 3, null)).toDF("a", "b", "c").createTempView("v")
  ...
}
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-17 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r226142977
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
--- End diff --

I cannot use that because it returns the class name after erasure (e.g. it 
returns `Int` for `IntWrapper`). I'll create a separate method to make this 
clearer.
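
To illustrate the erasure behavior described here (a sketch using the `IntWrapper` example from this thread, not code from the PR):

```
import scala.reflect.runtime.universe._

final case class IntWrapper(i: Int) extends AnyVal

// Erasure unwraps a derived value class to its underlying type,
// while the type symbol itself keeps the declared class name:
typeOf[IntWrapper].erasure.typeSymbol.fullName  // "scala.Int"
typeOf[IntWrapper].typeSymbol.fullName          // ends with "IntWrapper"
```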


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-17 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r226142827
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

Take class `User` above as an example. After compilation, the field `id` of type `Id` becomes `Int`, so when constructing `User` we need `id` to be an `Int`.

As for why we need `NewInstance` when `Id` is itself the schema: `Id` may remain an `Id` instance if it is treated as another type (following the [allocation rules](https://docs.scala-lang.org/overviews/core/value-classes.html#allocation-details)). For example, in the method [encodeDecodeTest](https://github.com/apache/spark/blob/a40806d2bd84e9a0308165f0d6c97e9cf00aa4a3/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala#L373), if we pass an instance of `Id` as input, it will not be converted to `Int`. In the other case, when the required type is explicitly `Id`, both the input and the result returned from deserialization become `Int`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22732
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22732
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97508/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...

2018-10-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22732#discussion_r226142651
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -73,19 +73,21 @@ case class UserDefinedFunction protected[sql] (
*/
   @scala.annotation.varargs
   def apply(exprs: Column*): Column = {
-if (inputTypes.isDefined && nullableTypes.isDefined) {
-  require(inputTypes.get.length == nullableTypes.get.length)
+val numOfArgs = ScalaReflection.getParameterCount(f)
--- End diff --

since we are here, shall we also check `exprs.length == numOfArgs`?
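
Something along these lines, reusing `numOfArgs` from the diff above (the message text is just illustrative):

```
require(exprs.length == numOfArgs,
  s"Expected $numOfArgs arguments to the UDF but got ${exprs.length}")
```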


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...

2018-10-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22732
  
**[Test build #97508 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97508/testReport)**
 for PR 22732 at commit 
[`968ed26`](https://github.com/apache/spark/commit/968ed266c7aaf376d428d2ae4edf147036bee3e2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...

2018-10-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22732#discussion_r226141983
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
 ---
@@ -39,29 +42,29 @@ import org.apache.spark.sql.types.DataType
  * @param nullable  True if the UDF can return null value.
  * @param udfDeterministic  True if the UDF is deterministic. 
Deterministic UDF returns same result
  *  each time it is invoked with a particular 
input.
- * @param nullableTypes which of the inputTypes are nullable (i.e. not 
primitive)
  */
 case class ScalaUDF(
 function: AnyRef,
 dataType: DataType,
 children: Seq[Expression],
+inputsNullSafe: Seq[Boolean],
 inputTypes: Seq[DataType] = Nil,
 udfName: Option[String] = None,
 nullable: Boolean = true,
-udfDeterministic: Boolean = true,
-nullableTypes: Seq[Boolean] = Nil)
+udfDeterministic: Boolean = true)
   extends Expression with ImplicitCastInputTypes with NonSQLExpression 
with UserDefinedExpression {
 
   // The constructor for SPARK 2.1 and 2.2
   def this(
   function: AnyRef,
   dataType: DataType,
   children: Seq[Expression],
+  inputsNullSafe: Seq[Boolean],
--- End diff --

this constructor is here for backward compatibility. If we have to change 
it, I don't think we need to keep it anymore. cc @gatorsmile 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...

2018-10-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22756#discussion_r226141315
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
@@ -125,8 +125,13 @@ class BisectingKMeansModel private[ml] (
   /**
    * Computes the sum of squared distances between the input points and their corresponding cluster
    * centers.
+   *
+   * @deprecated This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator
+   *             instead. You can also get the cost on the training dataset in the summary.
    */
   @Since("2.0.0")
+  @deprecated("This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator " +
--- End diff --

It looks reasonable to me to deprecate it in 2.4 so that we can remove it 
in 3.0, if this is the last one. Then we can have a consistent ML API in 3.0 
after removing these deprecated APIs.
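
For reference, a minimal sketch of the suggested replacement, assuming `predictions` is the model's transformed output with the default features/prediction columns:

```
import org.apache.spark.ml.evaluation.ClusteringEvaluator

// Silhouette score as the clustering quality metric, instead of computeCost:
val evaluator = new ClusteringEvaluator()
val silhouette = evaluator.evaluate(predictions)
```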


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22612
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22612
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97507/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...

2018-10-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22612
  
**[Test build #97507 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97507/testReport)**
 for PR 22612 at commit 
[`18ee4ad`](https://github.com/apache/spark/commit/18ee4ad3829c3ee637d45f73c4e182b78c7661fb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22624: [SPARK-23781][CORE] Merge token renewer functionality in...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22624
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22624: [SPARK-23781][CORE] Merge token renewer functionality in...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22624
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97506/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22624: [SPARK-23781][CORE] Merge token renewer functionality in...

2018-10-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22624
  
**[Test build #97506 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97506/testReport)**
 for PR 22624 at commit 
[`b58dd8c`](https://github.com/apache/spark/commit/b58dd8c7f16624ec3f47f7c1b149b6644a6512ac).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22747: [MINOR][SQL] Set AddJarCommand return empty

2018-10-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22747
  
BTW, @wangyum, please create a minor JIRA issue. This is a behavior change for users.

cc @gatorsmile 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22756: [SPARK-25758][ML] Deprecate computeCost on Bisect...

2018-10-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22756#discussion_r226129280
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
@@ -125,8 +125,13 @@ class BisectingKMeansModel private[ml] (
   /**
    * Computes the sum of squared distances between the input points and their corresponding cluster
    * centers.
+   *
+   * @deprecated This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator
+   *             instead. You can also get the cost on the training dataset in the summary.
    */
   @Since("2.0.0")
+  @deprecated("This method is deprecated and will be removed in 3.0.0. Use ClusteringEvaluator " +
--- End diff --

I'm wondering if this PR is a blocker for Spark 2.4. According to the JIRA description (Improvement/Minor), we cannot remove this in 3.0.0 because we didn't announce the deprecation before 3.0.0.

cc @cloud-fan since he is the release manager.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22704: [SPARK-25681][K8S][WIP] Leverage a config to tune...

2018-10-17 Thread ifilonenko
Github user ifilonenko commented on a diff in the pull request:

https://github.com/apache/spark/pull/22704#discussion_r226127311
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala ---
@@ -49,8 +49,11 @@ private[deploy] class HadoopFSDelegationTokenProvider(fileSystems: Configuration
     val fetchCreds = fetchDelegationTokens(getTokenRenewer(hadoopConf), fsToGetTokens, creds)
 
     // Get the token renewal interval if it is not set. It will only be called once.
-    if (tokenRenewalInterval == null) {
-      tokenRenewalInterval = getTokenRenewalInterval(hadoopConf, sparkConf, fsToGetTokens)
+    // If running a Kerberos job on Kubernetes, you may specify that you wish to not
+    // obtain the tokenRenewal interval, as the renewal service may be external.
--- End diff --

> And BTW, I know what you mean when you mention an external renewal service. But again, that does not exist, and until it does, you should not do things that assume its existence.

Right, so Spark itself has no token renewal service ATM, but an external token service that places tokens into Secrets may well exist within a company's organization, whether or not it is something provided by Spark, no? That is why I personally thought such a comment was sensible. But I'll remove it based on your reasoning.

> If you're in this code, there are two options:

Ah, very good point. Given that

> It has always been, and always will be, the way to tell Spark that you want Spark to renew tokens itself

checking for the presence of a keytab / principal as the flag makes sense.

> The current k8s backend is broken in that regard.

The design was specifically for the popular use-case in which, in cluster mode, we would not send keytabs around and would instead read the DT from a secret. So true, it does break the traditional design because we are using a keytab without enabling renewal. Building on your work, the next step would be to parallel the other resource managers by allowing the option to use the renewal code when the keytab and principal are already in the Driver. Just out of interest, have there been any cases where the Driver was over-worked running this renewal service and managing the DTs for many executors? In essence, could there be a convincing use-case for having the Driver use the keytab for login but not do its own renewal, because it can't handle the load?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

2018-10-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21990
  
**[Test build #97510 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97510/testReport)**
 for PR 21990 at commit 
[`3629c78`](https://github.com/apache/spark/commit/3629c78118eb77589ae1126e877f0fd2b57441ce).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21990
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22757: [SPARK-24601][FOLLOWUP] Update Jackson to 2.9.6 in Kines...

2018-10-17 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22757
  
Merging this to master as a 'hotfix' for a pretty optional component


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21688
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21688
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97505/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-10-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21688
  
**[Test build #97505 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97505/testReport)**
 for PR 21688 at commit 
[`a4661fe`](https://github.com/apache/spark/commit/a4661feb5044f6d89328a4805edf0d0fbf115e89).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...

2018-10-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22760
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4070/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22760
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4070/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22760
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22760
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...

2018-10-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22760
  
**[Test build #97509 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97509/testReport)**
 for PR 22760 at commit 
[`47bd907`](https://github.com/apache/spark/commit/47bd9071f88199e4c73b43227626108481d72595).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...

2018-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22760
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97509/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...

2018-10-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22760
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4070/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...

2018-10-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22760
  
**[Test build #97509 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97509/testReport)**
 for PR 22760 at commit 
[`47bd907`](https://github.com/apache/spark/commit/47bd9071f88199e4c73b43227626108481d72595).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...

2018-10-17 Thread ifilonenko
Github user ifilonenko commented on the issue:

https://github.com/apache/spark/pull/22760
  
@liyinan926 @mccheah for review on unit testing


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerbero...

2018-10-17 Thread ifilonenko
GitHub user ifilonenko opened a pull request:

https://github.com/apache/spark/pull/22760

[SPARK-25751][K8S][TEST] Unit Testing for Kerberos Support

## What changes were proposed in this pull request?

Unit tests for Kerberos support addition

## How was this patch tested?

Unit and Integration tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ifilonenko/spark SPARK-25751

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22760.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22760


commit 47bd9071f88199e4c73b43227626108481d72595
Author: Ilan Filonenko 
Date:   2018-10-17T22:12:51Z

unit tests for all features




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


