[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/7#discussion_r217953294

--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -1803,6 +1803,18 @@ test_that("string operators", {
     collect(select(df4, split_string(df4$a, "")))[1, 1],
     list(list("a.b@c.d 1", "b"))
   )
+  expect_equal(
+    collect(select(df4, split_string(df4$a, "\\.", 2)))[1, 1],
+    list(list("a", "b@c.d 1\\b"))
+  )
+  expect_equal(
+    collect(select(df4, split_string(df4$a, "b", -2)))[1, 1],
+    list(list("a.", "@c.d 1\\", ""))
+  )
+  expect_equal(
+    collect(select(df4, split_string(df4$a, "b", 0)))[1, 1],
--- End diff --

For context, we've had some cases in the past where the wrong value was passed for a parameter - so let's at least have one test with and one without the optional parameter.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
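The limit semantics being tested mirror Java's `String.split(regex, limit)` (a JVM-side sketch for context, not the R API itself): a positive limit applies the pattern at most limit - 1 times, a limit of zero drops trailing empty strings, and a negative limit keeps them.

```scala
object SplitLimitDemo extends App {
  val s = "a.b.c.."
  // positive limit: pattern applied at most limit - 1 times
  assert(s.split("\\.", 2).toList == List("a", "b.c.."))
  // zero limit: trailing empty strings are removed
  assert(s.split("\\.", 0).toList == List("a", "b", "c"))
  // negative limit: trailing empty strings are kept
  assert(s.split("\\.", -1).toList == List("a", "b", "c", "", ""))
  println("ok")
}
```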
[GitHub] spark pull request #22418: [SPARK-25427][SQL][TEST] Add BloomFilter creation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22418#discussion_r217952724

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ---
@@ -50,6 +55,66 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
       .createOrReplaceTempView("orc_temp_table")
   }

+  protected def testBloomFilterCreation(bloomFilterKind: Kind) {
+    val tableName = "bloomFilter"
+
+    withTempDir { dir =>
+      withTable(tableName) {
+        val sqlStatement = orcImp match {
+          case "native" =>
+            s"""
+               |CREATE TABLE $tableName (a INT, b STRING)
+               |USING ORC
+               |OPTIONS (
+               |  path '${dir.toURI}',
+               |  orc.bloom.filter.columns '*',
+               |  orc.bloom.filter.fpp 0.1
+               |)
+             """.stripMargin
+          case "hive" =>
+            s"""
+               |CREATE TABLE $tableName (a INT, b STRING)
+               |STORED AS ORC
+               |LOCATION '${dir.toURI}'
+               |TBLPROPERTIES (
+               |  orc.bloom.filter.columns='*',
+               |  orc.bloom.filter.fpp=0.1
+               |)
+             """.stripMargin
+          case impl =>
+            throw new UnsupportedOperationException(s"Unknown ORC implementation: $impl")
+        }
+
+        sql(sqlStatement)
+        sql(s"INSERT INTO $tableName VALUES (1, 'str')")
+
+        val partFiles = dir.listFiles()
+          .filter(f => f.isFile && !f.getName.startsWith(".") && !f.getName.startsWith("_"))
+        assert(partFiles.length === 1)
+
+        val orcFilePath = new Path(partFiles.head.getAbsolutePath)
+        val readerOptions = OrcFile.readerOptions(new Configuration())
+        val reader = OrcFile.createReader(orcFilePath, readerOptions)
+        var recordReader: RecordReaderImpl = null
+        try {
+          recordReader = reader.rows.asInstanceOf[RecordReaderImpl]
+
+          // BloomFilter array is created for all types; `struct`, int (`a`), string (`b`)
+          val sargColumns = Array(true, true, true)
+          val orcIndex = recordReader.readRowIndex(0, null, sargColumns)
+
+          // Check the types and counts of bloom filters
+          assert(orcIndex.getBloomFilterKinds.forall(_ === bloomFilterKind))
--- End diff --

Do you mean how we extend this test case?
If so, I think it's fine, since what we need to test within Spark is whether the specified bloom filter works or not. It's rather all or none, so one test case should be okay.
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22439 **[Test build #96120 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96120/testReport)** for PR 22439 at commit [`24fbf74`](https://github.com/apache/spark/commit/24fbf742fdd8490f57d29325100036e556847c77).
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22439 Merged build finished. Test PASSed.
[GitHub] spark issue #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreate...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22439 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3146/ Test PASSed.
[GitHub] spark pull request #22439: [SPARK-25444][SQL] Refactor GenArrayData.genCodeT...
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/22439

[SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreateArrayData method

## What changes were proposed in this pull request?

This PR simplifies the `GenArrayData.genCodeToCreateArrayData` method by using the `ArrayData.createArrayData` method.

Before this PR, `genCodeToCreateArrayData` was complicated:
* Generated a temporary Java array to create `ArrayData`
* Had separate code generation paths to assign values for `GenericArrayData` and `UnsafeArrayData`

After this PR, the method:
* Directly generates `GenericArrayData` or `UnsafeArrayData` without a temporary array
* Has only one code generation path to assign values

## How was this patch tested?

Existing UTs

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-25444

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22439.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22439

commit 24fbf742fdd8490f57d29325100036e556847c77
Author: Kazuaki Ishizaki
Date: 2018-09-17T05:28:00Z
initial commit
[GitHub] spark issue #22408: [SPARK-25417][SQL] ArrayContains function may return inc...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22408 LGTM. I think the last piece is the migration guide, to explain what changed from 2.3 to 2.4.
[GitHub] spark pull request #22418: [SPARK-25427][SQL][TEST] Add BloomFilter creation...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22418#discussion_r217949342

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ---
@@ -50,6 +55,66 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
       .createOrReplaceTempView("orc_temp_table")
   }

+  protected def testBloomFilterCreation(bloomFilterKind: Kind) {
+    val tableName = "bloomFilter"
+
+    withTempDir { dir =>
+      withTable(tableName) {
+        val sqlStatement = orcImp match {
+          case "native" =>
+            s"""
+               |CREATE TABLE $tableName (a INT, b STRING)
+               |USING ORC
+               |OPTIONS (
+               |  path '${dir.toURI}',
+               |  orc.bloom.filter.columns '*',
+               |  orc.bloom.filter.fpp 0.1
+               |)
+             """.stripMargin
+          case "hive" =>
+            s"""
+               |CREATE TABLE $tableName (a INT, b STRING)
+               |STORED AS ORC
+               |LOCATION '${dir.toURI}'
+               |TBLPROPERTIES (
+               |  orc.bloom.filter.columns='*',
+               |  orc.bloom.filter.fpp=0.1
+               |)
+             """.stripMargin
+          case impl =>
+            throw new UnsupportedOperationException(s"Unknown ORC implementation: $impl")
+        }
+
+        sql(sqlStatement)
+        sql(s"INSERT INTO $tableName VALUES (1, 'str')")
+
+        val partFiles = dir.listFiles()
+          .filter(f => f.isFile && !f.getName.startsWith(".") && !f.getName.startsWith("_"))
+        assert(partFiles.length === 1)
+
+        val orcFilePath = new Path(partFiles.head.getAbsolutePath)
+        val readerOptions = OrcFile.readerOptions(new Configuration())
+        val reader = OrcFile.createReader(orcFilePath, readerOptions)
+        var recordReader: RecordReaderImpl = null
+        try {
+          recordReader = reader.rows.asInstanceOf[RecordReaderImpl]
+
+          // BloomFilter array is created for all types; `struct`, int (`a`), string (`b`)
+          val sargColumns = Array(true, true, true)
+          val orcIndex = recordReader.readRowIndex(0, null, sargColumns)
+
+          // Check the types and counts of bloom filters
+          assert(orcIndex.getBloomFilterKinds.forall(_ === bloomFilterKind))
--- End diff --

How can we extend it in the future?
How can we change the bloom filter kind via the CREATE TABLE statement?
[GitHub] spark issue #22438: [SPARK-25443][INFRA] fix issues when building docs with ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22438 **[Test build #96119 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96119/testReport)** for PR 22438 at commit [`dbb4fa2`](https://github.com/apache/spark/commit/dbb4fa2469f89c612e7c8ef966d001e828fc8b91).
[GitHub] spark issue #22438: [SPARK-25443][INFRA] fix issues when building docs with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22438 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3145/ Test PASSed.
[GitHub] spark issue #22438: [SPARK-25443][INFRA] fix issues when building docs with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22438 Merged build finished. Test PASSed.
[GitHub] spark issue #22438: [SPARK-25443][INFRA] fix issues when building docs with ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22438 cc @vanzin @felixcheung @srowen @jerryshao
[GitHub] spark pull request #22438: [SPARK-25443][INFRA] fix issues when building doc...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/22438

[SPARK-25443][INFRA] fix issues when building docs with release scripts in docker

## What changes were proposed in this pull request?

These 2 changes are required to build the docs for Spark 2.4.0 RC1:
1. install `mkdocs` in the docker image
2. set the locale to C.UTF-8; otherwise jekyll fails to build the docs

## How was this patch tested?

Tested manually when doing the 2.4.0 RC1.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark infra

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22438.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22438

commit dbb4fa2469f89c612e7c8ef966d001e828fc8b91
Author: Wenchen Fan
Date: 2018-09-17T04:48:28Z
fix issues when building docs
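The two fixes described in the PR can be sketched roughly as follows (assumed commands for illustration; the actual changes live in the release scripts and the docker image definition):

```shell
# 1) make mkdocs available inside the docker image (assumed install command):
#    pip install mkdocs
# 2) jekyll fails to build the docs without a UTF-8 locale, so force one:
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
echo "locale set to $LC_ALL"
```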
[GitHub] spark issue #22437: [SPARK-25431][SQL][EXAMPLES] Fix function examples and t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22437 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3144/ Test PASSed.
[GitHub] spark issue #22437: [SPARK-25431][SQL][EXAMPLES] Fix function examples and t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22437 Merged build finished. Test PASSed.
[GitHub] spark issue #22428: [SPARK-25430][SQL] Add map parameter for withColumnRenam...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22428 The performance issue was introduced by repeated query plan analysis, which is resolved in the current master if I am not mistaken - if you're in doubt, I would suggest doing a quick benchmark. I think this is something we should do with a one-liner helper on the application side.
[GitHub] spark issue #22435: [SPARK-25423][SQL] Output "dataFilters" in DataSourceSca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22435 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96115/ Test PASSed.
[GitHub] spark issue #22435: [SPARK-25423][SQL] Output "dataFilters" in DataSourceSca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22435 Merged build finished. Test PASSed.
[GitHub] spark issue #22435: [SPARK-25423][SQL] Output "dataFilters" in DataSourceSca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22435 **[Test build #96115 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96115/testReport)** for PR 22435 at commit [`da86846`](https://github.com/apache/spark/commit/da868465de9ccdd302699786db30fe4fe90e4cfa).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22437: [SPARK-25431][SQL][EXAMPLES] Fix function examples and t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22437 **[Test build #96118 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96118/testReport)** for PR 22437 at commit [`1ae77da`](https://github.com/apache/spark/commit/1ae77dad91a26e1390de070ab677b270c6309065).
[GitHub] spark issue #22428: [SPARK-25430][SQL] Add map parameter for withColumnRenam...
Github user goungoun commented on the issue: https://github.com/apache/spark/pull/22428 @HyukjinKwon, thanks for your review. Actually, that is the reason I opened this pull request. I think it is better to give users a reusable option than to have them repeat so much of the same code in their analysis. In a notebook environment, whenever visualization is required in the middle of an analysis, I had to convert column names rather than use them as-is, so that I could deliver the right messages to the report readers. In the process, I had to repeat withColumnRenamed too many times. So I researched how other users are trying to overcome the limitation. It seems that users tend to use foldLeft or a for loop with withColumnRenamed, which can cause a performance issue by creating too many DataFrames inside the Spark engine, even without them knowing it. The arguments can be found as follows.

StackOverflow
- https://stackoverflow.com/questions/38798567/pyspark-rename-more-than-one-column-using-withcolumnrenamed
- https://stackoverflow.com/questions/35592917/renaming-column-names-of-a-dataframe-in-spark-scala?noredirect=1=1

Spark issues
- [SPARK-12225] Support adding or replacing multiple columns at once in DataFrame API
- [SPARK-21582] DataFrame.withColumnRenamed cause huge performance overhead - if foldLeft is used, too many columns can cause a performance issue
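The foldLeft workaround discussed above has this shape. On a real DataFrame it would fold `withColumnRenamed` over a map of renames; the sketch below applies the same pattern to a plain sequence of column names so it stands alone (all names are illustrative):

```scala
object RenameFold extends App {
  // illustrative renames, in the shape the proposed API would accept
  val renames = Map("c1" -> "first_column", "c2" -> "second_column")

  // stand-in for df.columns; with Spark this pattern would be:
  //   renames.foldLeft(df) { case (d, (from, to)) => d.withColumnRenamed(from, to) }
  val columns = Seq("c1", "c2", "c3")
  val renamed = renames.foldLeft(columns) { case (cols, (from, to)) =>
    cols.map(c => if (c == from) to else c)
  }
  assert(renamed == Seq("first_column", "second_column", "c3"))
  println(renamed.mkString(","))
}
```

Each fold step produces a new intermediate value - on DataFrames, a new query plan per rename - which is exactly the overhead SPARK-21582 describes.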
[GitHub] spark issue #22418: [SPARK-25427][SQL][TEST] Add BloomFilter creation test c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22418 Merged build finished. Test PASSed.
[GitHub] spark issue #22418: [SPARK-25427][SQL][TEST] Add BloomFilter creation test c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22418 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96113/ Test PASSed.
[GitHub] spark issue #22418: [SPARK-25427][SQL][TEST] Add BloomFilter creation test c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22418 **[Test build #96113 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96113/testReport)** for PR 22418 at commit [`a378adb`](https://github.com/apache/spark/commit/a378adb85ef58a603ca4f9d6a7a527c35e0f2db5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22436: [SPARK-24654][BUILD][FOLLOWUP] Update, fix LICENSE and N...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22436 Thanks for the fix! I'm not familiar with this part though, let's ping @vanzin @felixcheung @jerryshao
[GitHub] spark issue #22437: [SPARK-25431][SQL][EXAMPLES] Fix function examples and t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22437 **[Test build #96117 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96117/testReport)** for PR 22437 at commit [`865b09b`](https://github.com/apache/spark/commit/865b09bffc964a6c7411b50abe44bb1bab68f649).
[GitHub] spark issue #22437: [SPARK-25431][SQL][EXAMPLES] Fix function examples and t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22437 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3143/ Test PASSed.
[GitHub] spark issue #22437: [SPARK-25431][SQL][EXAMPLES] Fix function examples and t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22437 Merged build finished. Test PASSed.
[GitHub] spark pull request #22437: [SPARK-25431][SQL][EXAMPLES] Fix function example...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/22437

[SPARK-25431][SQL][EXAMPLES] Fix function examples and the example results.

## What changes were proposed in this pull request?

There are some mistakes in the examples of newly added functions. Also, the format of the example results is not unified. We should fix them.

## How was this patch tested?

Manually executed the examples.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-25431/fix_examples_2

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22437.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22437

commit 865b09bffc964a6c7411b50abe44bb1bab68f649
Author: Takuya UESHIN
Date: 2018-09-14T09:19:56Z
Fix function examples and the example results.
[GitHub] spark issue #22437: [SPARK-25431][SQL][EXAMPLES] Fix function examples and t...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/22437 cc @dongjoon-hyun @gatorsmile
[GitHub] spark issue #22436: [SPARK-24654][BUILD][FOLLOWUP] Update, fix LICENSE and N...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22436 **[Test build #96116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96116/testReport)** for PR 22436 at commit [`e4cad8c`](https://github.com/apache/spark/commit/e4cad8c60f3c959af63a900232f14c378cef7928).
[GitHub] spark issue #22395: [SPARK-16323][SQL] Add IntegralDivide expression
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22395 Merged build finished. Test FAILed.
[GitHub] spark issue #22395: [SPARK-16323][SQL] Add IntegralDivide expression
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22395 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96114/ Test FAILed.
[GitHub] spark issue #22436: [SPARK-24654][BUILD][FOLLOWUP] Update, fix LICENSE and N...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22436 Merged build finished. Test PASSed.
[GitHub] spark issue #22395: [SPARK-16323][SQL] Add IntegralDivide expression
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22395 **[Test build #96114 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96114/testReport)** for PR 22395 at commit [`71255a1`](https://github.com/apache/spark/commit/71255a1787012baf2d5188991421e8197ec44733).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22436: [SPARK-24654][BUILD][FOLLOWUP] Update, fix LICENSE and N...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22436 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3142/ Test PASSed.
[GitHub] spark issue #22436: [SPARK-24654][BUILD][FOLLOWUP] Update, fix LICENSE and N...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22436 CC @cloud-fan This one doesn't block 2.4.0 but would be nice to have. Certainly if there's a second RC.
[GitHub] spark pull request #22436: [SPARK-24654][BUILD][FOLLOWUP] Update, fix LICENS...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/22436

[SPARK-24654][BUILD][FOLLOWUP] Update, fix LICENSE and NOTICE, and specialize for source vs binary

## What changes were proposed in this pull request?

Fix the location of licenses-binary in the binary release, and remove binary items from the source release.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srowen/spark SPARK-24654.2

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22436.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22436

commit e4cad8c60f3c959af63a900232f14c378cef7928
Author: Sean Owen
Date: 2018-09-17T03:42:10Z
Fix location of licenses-binary in binary release, and remove binary items from source release
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/7 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96112/ Test PASSed.
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/7 Merged build finished. Test PASSed.
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/7 **[Test build #96112 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96112/testReport)** for PR 7 at commit [`5c8f487`](https://github.com/apache/spark/commit/5c8f48715748bdeda703761fba6a4d1828a19985).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22433: [SPARK-25442][SQL][K8S] Support STS to run in k8s deploy...
Github user suryag10 commented on the issue: https://github.com/apache/spark/pull/22433

> I'm wondering, is there some reason this isn't supported in cluster mode for yarn & mesos? Or put another way, what is the rationale for k8s being added as an exception to this rule?

I don't know the specific reason why this was not supported in YARN and Mesos. The initial contributions to Spark on K8S started with cluster mode (with a restriction for client mode). So this PR enhances things such that STS can run in K8S deployments in Spark cluster mode. (In the latest Spark code I observed that client mode also works; I need to cross-verify this once.)
[GitHub] spark pull request #22395: [SPARK-16323][SQL] Add IntegralDivide expression
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22395#discussion_r217942351

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala ---
@@ -143,16 +143,14 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper
     }
   }

-  // By fixing SPARK-15776, Divide's inputType is required to be DoubleType of DecimalType.
-  // TODO: in future release, we should add a IntegerDivide to support integral types.
-  ignore("/ (Divide) for integral type") {
-    checkEvaluation(Divide(Literal(1.toByte), Literal(2.toByte)), 0.toByte)
-    checkEvaluation(Divide(Literal(1.toShort), Literal(2.toShort)), 0.toShort)
-    checkEvaluation(Divide(Literal(1), Literal(2)), 0)
-    checkEvaluation(Divide(Literal(1.toLong), Literal(2.toLong)), 0.toLong)
-    checkEvaluation(Divide(positiveShortLit, negativeShortLit), 0.toShort)
-    checkEvaluation(Divide(positiveIntLit, negativeIntLit), 0)
-    checkEvaluation(Divide(positiveLongLit, negativeLongLit), 0L)
+  test("/ (Divide) for integral type") {
+    checkEvaluation(IntegralDivide(Literal(1.toByte), Literal(2.toByte)), 0L)
+    checkEvaluation(IntegralDivide(Literal(1.toShort), Literal(2.toShort)), 0L)
+    checkEvaluation(IntegralDivide(Literal(1), Literal(2)), 0L)
+    checkEvaluation(IntegralDivide(Literal(1.toLong), Literal(2.toLong)), 0L)
+    checkEvaluation(IntegralDivide(positiveShortLit, negativeShortLit), 0L)
+    checkEvaluation(IntegralDivide(positiveIntLit, negativeIntLit), 0L)
+    checkEvaluation(IntegralDivide(positiveLongLit, negativeLongLit), 0L)
--- End diff --

Good catch! We should clearly define the behavior in the doc string too.
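For context on the expectations in the test above: the new `IntegralDivide` evaluates to a Long result even for byte/short/int inputs, and JVM integral division truncates toward zero. A plain-JVM sketch of those semantics (not Spark's expression itself; the helper name is illustrative):

```scala
object IntegralDivideSketch extends App {
  // widen any integral input to Long, then divide;
  // JVM long division truncates toward zero
  def integralDivide(a: Long, b: Long): Long = a / b

  assert(integralDivide(1L, 2L) == 0L)              // 1 div 2 = 0, as in the test
  assert(integralDivide(1.toByte.toLong, 2L) == 0L) // byte inputs still yield a Long
  assert(integralDivide(-7L, 2L) == -3L)            // truncation toward zero, not floor (-4)
  println("ok")
}
```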
[GitHub] spark pull request #22432: [SPARK-22713][CORE][TEST][FOLLOWUP] Fix flaky Ext...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22432
[GitHub] spark issue #22432: [SPARK-22713][CORE][TEST][FOLLOWUP] Fix flaky ExternalAp...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22432 thanks, merging to master/2.4!
[GitHub] spark issue #22231: [SPARK-25238][PYTHON] lint-python: Upgrade pycodestyle t...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22231 Yeah I noticed that. I think we should leave it, and, if somehow RC1 passes, we'll mark this as fixed for a later release.
[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21677 > So, are you heading main-method style with separate BM output files? Yes. So it's not reverting this PR, since writing BM results to a file is good. But we should update these BMs to use main-method style.
[GitHub] spark issue #22231: [SPARK-25238][PYTHON] lint-python: Upgrade pycodestyle t...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22231

Note that RC1 was cut before merging this PR, which means this patch is not available in 2.4.0. I hit some problems running the release scripts and spent quite a lot of time fixing them, so the final vote is several days behind the RC1 tag creation. @srowen please advise whether we should:
1. fail RC1 to include this patch
2. do nothing and release it with 2.4.1
3. revert it from 2.4 since it's an upgrade

Thanks!
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/21860 cc @maropu @kiszk @cloud-fan
[GitHub] spark issue #22428: [SPARK-25430][SQL] Add map parameter for withColumnRenam...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22428 Can we simply call the API multiple times? I think we haven't usually added such aliases for an API unless there's a strong argument for it.
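The Map-based variant under discussion is effectively a fold of the existing single-column call. A minimal Python sketch over a toy schema (the names `with_column_renamed`, `with_columns_renamed`, and the list-of-names "schema" are illustrative, not Spark's API):

```python
from functools import reduce

def with_column_renamed(schema, existing, new):
    # Rename one column; a no-op when `existing` is absent,
    # mirroring Dataset.withColumnRenamed's contract.
    return [new if c == existing else c for c in schema]

def with_columns_renamed(schema, mapping):
    # The proposed Map overload is just repeated single renames.
    return reduce(lambda s, kv: with_column_renamed(s, kv[0], kv[1]),
                  mapping.items(), schema)
```

For example, `with_columns_renamed(["c1", "c2", "c3"], {"c1": "first_column", "c2": "second_column"})` returns `["first_column", "second_column", "c3"]`, the same result as two successive single-column calls.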
[GitHub] spark pull request #22428: [SPARK-25430][SQL] Add map parameter for withColu...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22428#discussion_r217937566 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2300,6 +2300,37 @@ class Dataset[T] private[sql]( } } + /** + * Returns a new Dataset with columns renamed. + * This is a no-op if schema doesn't contain existingNames in columnMap. + * {{{ + * df.withColumnRenamed(Map( + * "c1" -> "first_column", + * "c2" -> "second_column" + * )) + * }}} + * + * @group untypedrel + * @since 2.4.0 --- End diff -- branch-2.4 is cut out. We will probably target 3.0.0 if we happen to add new APIs. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18304: [SPARK-21098] Set lineseparator csv multiline and csv wr...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18304 CSV's `lineSep` is not added yet. The problem here is specific to CSV - the newline separator is OS-dependent because of Univocity, which is not the case with Jackson, and it can be worked around once CSV's newline option is added. I was working on this feature but faced some problems with handling `multiLine` in CSV. Will make a PR when I'm available. @danielvdende, let's leave this closed for now. Will ping you in the PR I will open later.
[GitHub] spark pull request #22395: [SPARK-16323][SQL] Add IntegralDivide expression
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22395#discussion_r217935398 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala --- @@ -143,16 +143,14 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper } } - // By fixing SPARK-15776, Divide's inputType is required to be DoubleType of DecimalType. - // TODO: in future release, we should add a IntegerDivide to support integral types. - ignore("/ (Divide) for integral type") { -checkEvaluation(Divide(Literal(1.toByte), Literal(2.toByte)), 0.toByte) -checkEvaluation(Divide(Literal(1.toShort), Literal(2.toShort)), 0.toShort) -checkEvaluation(Divide(Literal(1), Literal(2)), 0) -checkEvaluation(Divide(Literal(1.toLong), Literal(2.toLong)), 0.toLong) -checkEvaluation(Divide(positiveShortLit, negativeShortLit), 0.toShort) -checkEvaluation(Divide(positiveIntLit, negativeIntLit), 0) -checkEvaluation(Divide(positiveLongLit, negativeLongLit), 0L) + test("/ (Divide) for integral type") { +checkEvaluation(IntegralDivide(Literal(1.toByte), Literal(2.toByte)), 0L) +checkEvaluation(IntegralDivide(Literal(1.toShort), Literal(2.toShort)), 0L) +checkEvaluation(IntegralDivide(Literal(1), Literal(2)), 0L) +checkEvaluation(IntegralDivide(Literal(1.toLong), Literal(2.toLong)), 0L) +checkEvaluation(IntegralDivide(positiveShortLit, negativeShortLit), 0L) +checkEvaluation(IntegralDivide(positiveIntLit, negativeIntLit), 0L) +checkEvaluation(IntegralDivide(positiveLongLit, negativeLongLit), 0L) --- End diff -- Could you add a test case for `divide by zero` like `test("/ (Divide) basic")`? For now, this PR seems to follow the behavior of Spark `/` instead of Hive `div`. We had better be clear on our decision and prevent future unintended behavior changes. 
```scala
scala> sql("select 2 / 0, 2 div 0").show()
+---------------------------------------+---------+
|(CAST(2 AS DOUBLE) / CAST(0 AS DOUBLE))|(2 div 0)|
+---------------------------------------+---------+
|                                   null|     null|
+---------------------------------------+---------+
```
```sql
0: jdbc:hive2://ctr-e138-1518143905142-477481> select 2 / 0;
+-------+
|  _c0  |
+-------+
| NULL  |
+-------+
0: jdbc:hive2://ctr-e138-1518143905142-477481> select 2 div 0;
Error: Error while compiling statement: FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments '0': org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.LongWritable org.apache.hadoop.hive.ql.udf.UDFOPLongDivide.evaluate(org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.LongWritable) with arguments {2,0}:/ by zero (state=42000,code=10014)
```
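The null-on-zero behavior being compared above can be sketched in plain Python (an illustration of the semantics, not Spark's implementation; `integral_divide` is a hypothetical helper name):

```python
def integral_divide(a, b):
    # Follow Spark's `/` convention: division by zero yields NULL
    # (None here) instead of raising, unlike Hive's `div`.
    if b == 0:
        return None
    # Truncate toward zero, matching JVM long division (-7 div 2 == -3).
    return int(a / b)
```

Spelling the divide-by-zero case out like this is what the requested test would pin down, so a later change can't silently switch to Hive's throwing behavior.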
[GitHub] spark issue #22435: [SPARK-25423][SQL] Output "dataFilters" in DataSourceSca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22435 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3141/ Test PASSed.
[GitHub] spark issue #22435: [SPARK-25423][SQL] Output "dataFilters" in DataSourceSca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22435 Merged build finished. Test PASSed.
[GitHub] spark issue #22435: [SPARK-25423][SQL] Output "dataFilters" in DataSourceSca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22435 **[Test build #96115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96115/testReport)** for PR 22435 at commit [`da86846`](https://github.com/apache/spark/commit/da868465de9ccdd302699786db30fe4fe90e4cfa).
[GitHub] spark pull request #22435: [SPARK-25423][SQL] Output "dataFilters" in DataSo...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22435#discussion_r217934875 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala --- @@ -83,4 +83,20 @@ class DataSourceScanExecRedactionSuite extends QueryTest with SharedSQLContext { } } + test("FileSourceScanExec metadata") { +withTempDir { dir => + val basePath = dir.getCanonicalPath + spark.range(0, 10).toDF("a").write.parquet(new Path(basePath, "foo=1").toString) + val df = spark.read.parquet(basePath).filter("a = 1") --- End diff -- Thanks @dongjoon-hyun I fixed it.
[GitHub] spark issue #22395: [SPARK-16323][SQL] Add IntegralDivide expression
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22395 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3140/ Test PASSed.
[GitHub] spark issue #22395: [SPARK-16323][SQL] Add IntegralDivide expression
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22395 Merged build finished. Test PASSed.
[GitHub] spark issue #22395: [SPARK-16323][SQL] Add IntegralDivide expression
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22395 **[Test build #96114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96114/testReport)** for PR 22395 at commit [`71255a1`](https://github.com/apache/spark/commit/71255a1787012baf2d5188991421e8197ec44733).
[GitHub] spark issue #22395: [SPARK-16323][SQL] Add IntegralDivide expression
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22395 Retest this please
[GitHub] spark issue #22418: [SPARK-25427][SQL][TEST] Add BloomFilter creation test c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22418 **[Test build #96113 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96113/testReport)** for PR 22418 at commit [`a378adb`](https://github.com/apache/spark/commit/a378adb85ef58a603ca4f9d6a7a527c35e0f2db5).
[GitHub] spark issue #22418: [SPARK-25427][SQL][TEST] Add BloomFilter creation test c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22418 Merged build finished. Test PASSed.
[GitHub] spark issue #22418: [SPARK-25427][SQL][TEST] Add BloomFilter creation test c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22418 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3139/ Test PASSed.
[GitHub] spark issue #22418: [SPARK-25427][SQL][TEST] Add BloomFilter creation test c...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22418 Retest this please.
[GitHub] spark pull request #22435: [SPARK-25423][SQL] Output "dataFilters" in DataSo...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22435#discussion_r217933342 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala --- @@ -83,4 +83,20 @@ class DataSourceScanExecRedactionSuite extends QueryTest with SharedSQLContext { } } + test("FileSourceScanExec metadata") { +withTempDir { dir => + val basePath = dir.getCanonicalPath + spark.range(0, 10).toDF("a").write.parquet(new Path(basePath, "foo=1").toString) + val df = spark.read.parquet(basePath).filter("a = 1") --- End diff -- Hi, @wangyum . I know that you follow the style of the other test cases in this suite, but could you simplify it like the following? We had better keep a single test case as simple as possible by excluding irrelevant stuff. ```scala withTempPath { path => val dir = path.getCanonicalPath spark.range(0, 10).toDF("a").write.parquet(dir) val df = spark.read.parquet(dir).filter("a = 1") ```
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/7 **[Test build #96112 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96112/testReport)** for PR 7 at commit [`5c8f487`](https://github.com/apache/spark/commit/5c8f48715748bdeda703761fba6a4d1828a19985).
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user phegstrom commented on a diff in the pull request: https://github.com/apache/spark/pull/7#discussion_r217930978 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R --- @@ -1803,6 +1803,18 @@ test_that("string operators", { collect(select(df4, split_string(df4$a, "")))[1, 1], list(list("a.b@c.d 1", "b")) ) + expect_equal( +collect(select(df4, split_string(df4$a, "\\.", 2)))[1, 1], +list(list("a", "b@c.d 1\\b")) + ) + expect_equal( +collect(select(df4, split_string(df4$a, "b", -2)))[1, 1], +list(list("a.", "@c.d 1\\", "")) + ) + expect_equal( +collect(select(df4, split_string(df4$a, "b", 0)))[1, 1], --- End diff -- per @felixcheung's comment, I added back the `limit = 0` case
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user phegstrom commented on a diff in the pull request: https://github.com/apache/spark/pull/7#discussion_r217930893 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R --- @@ -1803,6 +1803,10 @@ test_that("string operators", { collect(select(df4, split_string(df4$a, "")))[1, 1], list(list("a.b@c.d 1", "b")) ) + expect_equal( +collect(select(df4, split_string(df4$a, "\\.", 2)))[1, 1], +list(list("a", "b@c.d 1\\b")) --- End diff -- added a test for `limit = 0` to catch the behavior-change case
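The three `limit` cases exercised in these R tests follow `java.lang.String.split` semantics, which the underlying SQL function delegates to. A Python sketch of those rules (illustrative only; `java_split` is not part of any Spark API):

```python
import re

def java_split(s, regex, limit=-1):
    # limit > 0:  at most `limit` parts; the last part keeps the remainder.
    # limit < 0:  unlimited parts; trailing empty strings are kept.
    # limit == 0: unlimited parts; trailing empty strings are removed.
    if limit > 0:
        return re.split(regex, s, maxsplit=limit - 1)
    parts = re.split(regex, s)
    if limit == 0:
        while parts and parts[-1] == "":
            parts.pop()
    return parts
```

On the test string `"a.b@c.d 1\\b"` (Python literal), `java_split(s, r"\.", 2)` gives `["a", "b@c.d 1\\b"]`, `java_split(s, "b", -2)` gives `["a.", "@c.d 1\\", ""]`, and `java_split(s, "b", 0)` drops the trailing empty string — matching the three `expect_equal` cases above.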
[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22429#discussion_r217928631 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala --- @@ -469,7 +470,17 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product { def treeString: String = treeString(verbose = true) def treeString(verbose: Boolean, addSuffix: Boolean = false): String = { -generateTreeString(0, Nil, new StringBuilder, verbose = verbose, addSuffix = addSuffix).toString +val baos = new ByteArrayOutputStream() --- End diff -- In this particular method, there is no benefit. This was changed to reuse the method which accepts `OutputStream` instead of `StringBuilder`. The benefit of an `OutputStream` over a `StringBuilder` is no full materialization in memory and no string size limit.
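The trade-off described here — streaming a plan tree to a writer rather than accumulating one giant string — can be sketched in Python (illustrative only; names like `write_tree` are not Spark's):

```python
import io

def write_tree(node, out, depth=0):
    # Stream each rendered line immediately: no full in-memory
    # materialization of the tree string, and no maximum-string-size cap
    # when `out` is backed by a file or socket.
    out.write("  " * depth + node["name"] + "\n")
    for child in node.get("children", []):
        write_tree(child, out, depth + 1)

def tree_string(node):
    # The string-returning variant is just the stream variant
    # pointed at an in-memory buffer.
    buf = io.StringIO()
    write_tree(node, buf)
    return buf.getvalue()
```

This mirrors the refactoring's shape: `treeString` keeps its signature while delegating to the stream-based method, and callers that dump to a file can pass a file-backed stream instead.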
[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/22429#discussion_r217928428 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala --- @@ -250,5 +254,36 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) { def codegenToSeq(): Seq[(String, String)] = { org.apache.spark.sql.execution.debug.codegenStringSeq(executedPlan) } + +/** + * Dumps debug information about query execution into the specified file. + */ +def toFile(path: String): Unit = { + val maxFields = SparkEnv.get.conf.getInt(Utils.MAX_TO_STRING_FIELDS, +Utils.DEFAULT_MAX_TO_STRING_FIELDS) + val filePath = new Path(path) + val fs = FileSystem.get(filePath.toUri, sparkSession.sessionState.newHadoopConf()) + val writer = new BufferedWriter(new OutputStreamWriter(fs.create(filePath))) + + try { +SparkEnv.get.conf.set(Utils.MAX_TO_STRING_FIELDS, Int.MaxValue.toString) +writer.write("== Parsed Logical Plan ==\n") --- End diff -- Can we combine this entire block with what is done in the `toString()` method?
[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/22429#discussion_r217928334 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala --- @@ -250,5 +254,36 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) { def codegenToSeq(): Seq[(String, String)] = { org.apache.spark.sql.execution.debug.codegenStringSeq(executedPlan) } + +/** + * Dumps debug information about query execution into the specified file. + */ +def toFile(path: String): Unit = { + val maxFields = SparkEnv.get.conf.getInt(Utils.MAX_TO_STRING_FIELDS, +Utils.DEFAULT_MAX_TO_STRING_FIELDS) + val filePath = new Path(path) + val fs = FileSystem.get(filePath.toUri, sparkSession.sessionState.newHadoopConf()) + val writer = new BufferedWriter(new OutputStreamWriter(fs.create(filePath))) + + try { +SparkEnv.get.conf.set(Utils.MAX_TO_STRING_FIELDS, Int.MaxValue.toString) --- End diff -- It is generally a bad idea to change this conf as people expect that it is immutable. Also this change has some far reaching consequences, others will now also be exposed to a different `Utils.MAX_TO_STRING_FIELDS` value when calling `explain()`. Can you please just pass the parameter down the tree?
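The suggestion — thread the limit through as an argument instead of mutating shared configuration — can be illustrated with a small Python sketch (hypothetical names, not Spark code):

```python
def truncated_string(fields, max_fields):
    # `max_fields` arrives as a plain parameter, so different callers
    # (an explain() with the default limit, a toFile() with a huge one)
    # can coexist without touching any process-global setting.
    if len(fields) <= max_fields:
        return ", ".join(fields)
    omitted = len(fields) - max_fields
    return ", ".join(fields[:max_fields]) + f", ... {omitted} more fields"
```

Because the limit is an argument, concurrent callers cannot observe each other's value — which is exactly the hazard of temporarily rewriting a shared conf entry.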
[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/22429#discussion_r217928262 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala --- @@ -469,7 +470,17 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product { def treeString: String = treeString(verbose = true) def treeString(verbose: Boolean, addSuffix: Boolean = false): String = { -generateTreeString(0, Nil, new StringBuilder, verbose = verbose, addSuffix = addSuffix).toString +val baos = new ByteArrayOutputStream() --- End diff -- What is the benefit of using this instead of using a `java.io.StringWriter` or `org.apache.commons.io.output.StringBuilderWriter`?
[GitHub] spark pull request #22396: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22396#discussion_r217926918 --- Diff: docs/sql-programming-guide.md --- @@ -1897,7 +1897,8 @@ working with timestamps in `pandas_udf`s to get the best performance, see - In version 2.3 and earlier, CSV rows are considered as malformed if at least one column value in the row is malformed. CSV parser dropped such rows in the DROPMALFORMED mode or outputs an error in the FAILFAST mode. Since Spark 2.4, CSV row is considered as malformed only when it contains malformed column values requested from CSV datasource, other values can be ignored. As an example, CSV file contains the "id,name" header and one row "1234". In Spark 2.4, selection of the id column consists of a row with one column value 1234 but in Spark 2.3 and earlier it is empty in the DROPMALFORMED mode. To restore the previous behavior, set `spark.sql.csv.parser.columnPruning.enabled` to `false`. - Since Spark 2.4, File listing for compute statistics is done in parallel by default. This can be disabled by setting `spark.sql.parallelFileListingInStatsComputation.enabled` to `False`. - Since Spark 2.4, Metadata files (e.g. Parquet summary files) and temporary files are not counted as data files when calculating table size during Statistics computation. - - Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In version 2.3 and earlier, empty strings are equal to `null` values and do not reflect to any characters in saved CSV files. For example, the row of `"a", null, "", 1` was writted as `a,,,1`. Since Spark 2.4, the same row is saved as `a,,"",1`. To restore the previous behavior, set the CSV option `emptyValue` to empty (not quoted) string. + - Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In version 2.3 and earlier, empty strings are equal to `null` values and do not reflect to any characters in saved CSV files. For example, the row of `"a", null, "", 1` was writted as `a,,,1`. Since Spark 2.4, the same row is saved as `a,,"",1`. To restore the previous behavior, set the CSV option `emptyValue` to empty (not quoted) string. + - Since Spark 2.4, The LOAD DATA command supports wildcard characters ? and *, which match any one character, and zero or more characters, respectively. Example: LOAD DATA INPATH '/tmp/folder*/ or LOAD DATA INPATH /tmp/part-?. Special Characters like spaces also now work in paths. Example: LOAD DATA INPATH /tmp/folder name/. --- End diff -- The commands and paths should be back-tick-quoted for readability. I think they may be interpreted as markdown otherwise.
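The `?`/`*` semantics the doc change describes match classic glob matching; a quick illustration with the Python standard library (this shows the matching rules only, not how Spark's LOAD DATA resolves paths internally):

```python
from fnmatch import fnmatchcase

# `?` matches exactly one character; `*` matches zero or more characters.
assert fnmatchcase("part-7", "part-?")
assert not fnmatchcase("part-10", "part-?")  # two characters after the dash
assert fnmatchcase("folder123", "folder*")
assert fnmatchcase("folder", "folder*")      # `*` may match the empty string
```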
[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22429 Merged build finished. Test PASSed.
[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22429 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96109/ Test PASSed.
[GitHub] spark issue #22435: [SPARK-25423][SQL] Output "dataFilters" in DataSourceSca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22435 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96111/ Test PASSed.
[GitHub] spark issue #22435: [SPARK-25423][SQL] Output "dataFilters" in DataSourceSca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22435 Merged build finished. Test PASSed.
[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22429 **[Test build #96109 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96109/testReport)** for PR 22429 at commit [`ce2c086`](https://github.com/apache/spark/commit/ce2c08688bb8b51e97f686c95279a5f42b52116a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22435: [SPARK-25423][SQL] Output "dataFilters" in DataSourceSca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22435 **[Test build #96111 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96111/testReport)** for PR 22435 at commit [`830e188`](https://github.com/apache/spark/commit/830e1881b4ef4d9bb661d8b6635470e2596d4eaa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22393: [MINOR][DOCS] Axe deprecated doc refs
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22393
[GitHub] spark issue #22393: [MINOR][DOCS] Axe deprecated doc refs
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22393 thx. merged to master/2.4
[GitHub] spark issue #22432: [SPARK-22713][CORE][TEST][FOLLOWUP] Fix flaky ExternalAp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22432 Merged build finished. Test PASSed.
[GitHub] spark issue #22432: [SPARK-22713][CORE][TEST][FOLLOWUP] Fix flaky ExternalAp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22432 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96108/ Test PASSed.
[GitHub] spark issue #22432: [SPARK-22713][CORE][TEST][FOLLOWUP] Fix flaky ExternalAp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22432 **[Test build #96108 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96108/testReport)** for PR 22432 at commit [`04c3f7b`](https://github.com/apache/spark/commit/04c3f7b3c2a1b6a79d571ca2079ca6cc477027a7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22396: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in HDFS p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22396 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96110/ Test PASSed.
[GitHub] spark issue #22396: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in HDFS p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22396 Merged build finished. Test PASSed.
[GitHub] spark issue #22396: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in HDFS p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22396 **[Test build #96110 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96110/testReport)** for PR 22396 at commit [`b34b962`](https://github.com/apache/spark/commit/b34b96208dc86e9642dbc65e33a643df7b7ee406). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22343: [SPARK-25391][SQL] Make behaviors consistent when conver...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22343 Thank YOU for your PR and open discussion on this, @seancxmao. Let's continue in other PRs.
[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21677 Yes, @cloud-fan. We can extend that concept to all the other main-method style benchmarks. Previously, we manually copied the result to the nearest place next to the corresponding BM code, which was not easy to automate. With @wangyum's contribution, we can automate all benchmarks. Possibly, we can use that in the release process, too. So, are you heading toward `main-method` style with separate BM output files? For me, +1.
[GitHub] spark pull request #22427: [SPARK-25438][SQL][TEST] Fix FilterPushdownBenchm...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22427#discussion_r217923482 --- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt --- @@ -2,737 +2,669 @@
Pushdown for many distinct value case
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
-Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz

Select 0 string row (value IS NULL):       Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
-Parquet Vectorized                              8970 / 9122         1.8         570.3       1.0X
-Parquet Vectorized (Pushdown)                    471 / 491          33.4          30.0      19.0X
-Native ORC Vectorized                           7661 / 7853         2.1         487.0       1.2X
-Native ORC Vectorized (Pushdown)                1134 / 1161        13.9          72.1       7.9X
+Parquet Vectorized                             11405 / 11485        1.4         725.1       1.0X
+Parquet Vectorized (Pushdown)                    675 / 690          23.3          42.9      16.9X
+Native ORC Vectorized                           7127 / 7170         2.2         453.1       1.6X
+Native ORC Vectorized (Pushdown)                 519 / 541          30.3          33.0      22.0X

Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
-Parquet Vectorized                              9246 / 9297         1.7         587.8       1.0X
-Parquet Vectorized (Pushdown)                    480 / 488          32.8          30.5      19.3X
-Native ORC Vectorized                           7838 / 7850         2.0         498.3       1.2X
-Native ORC Vectorized (Pushdown)                1054 / 1118        14.9          67.0       8.8X
+Parquet Vectorized                             11457 / 11473        1.4         728.4       1.0X
+Parquet Vectorized (Pushdown)                    656 / 686          24.0          41.7      17.5X
+Native ORC Vectorized                           7328 / 7342         2.1         465.9       1.6X
+Native ORC Vectorized (Pushdown)                 539 / 565          29.2          34.2      21.3X

Select 1 string row (value = '7864320'):   Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
-Parquet Vectorized                              8989 / 9100         1.7         571.5       1.0X
-Parquet Vectorized (Pushdown)                    448 / 467          35.1          28.5      20.1X
-Native ORC Vectorized                           7680 / 7768         2.0         488.3       1.2X
-Native ORC Vectorized (Pushdown)                1067 / 1118        14.7          67.8       8.4X
+Parquet Vectorized                             11878 / 11888        1.3         755.2       1.0X
+Parquet Vectorized (Pushdown)                    630 / 654          25.0          40.1      18.9X
+Native ORC Vectorized                           7342 / 7362         2.1         466.8       1.6X
+Native ORC Vectorized (Pushdown)                 519 / 537          30.3          33.0      22.9X

Select 1 string row (value <=> '7864320'): Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
-Parquet Vectorized                              9115 / 9266         1.7         579.5       1.0X
-Parquet Vectorized (Pushdown)                    466 / 492          33.7          29.7      19.5X
-Native ORC Vectorized                           7800 / 7914         2.0         495.9
[GitHub] spark issue #22433: [SPARK-25442][SQL][K8S] Support STS to run in k8s deploy...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/22433 > As this script is common start point for all the resource managers(k8s/yarn/mesos/standalone/local), i guess changing this to fit for all the cases has a value add, instead of doing at each resource manager level. Thoughts? Please note that I am specifically referring only to the need for changing the application `name`. The rationale given, that `name` should be DNS compliant, is a restriction specific to k8s and not Spark. Instead of doing one-off renames, the right approach would be to handle this name translation so that it benefits not just STS, but any user application. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22433: [SPARK-25442][SQL][K8S] Support STS to run in k8s deploy...
Github user jacobdr commented on the issue: https://github.com/apache/spark/pull/22433 > a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.' Your changes to the name handling don't comply with this, so I agree with @mridulm: you should move this change elsewhere and more broadly support name validation/sanitization for submitted applications in Kubernetes.
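The generic name translation suggested in the two comments above could look something like the sketch below: lowercase the application name, replace every character outside the DNS-1123 allowed set with '-', and trim to a valid length. This is a hypothetical helper, not code from the PR or from Spark itself:

```python
import re

def sanitize_app_name(name, max_len=253):
    """Translate an arbitrary Spark application name into a DNS-1123
    subdomain: lowercase alphanumerics, '-' or '.', at most max_len
    characters, starting and ending with an alphanumeric."""
    name = name.lower()
    name = re.sub(r"[^a-z0-9.-]", "-", name)  # allowed: a-z, 0-9, '-', '.'
    name = name[:max_len]
    name = name.strip("-.")                   # must start/end alphanumeric
    return name or "spark-app"                # fall back if nothing survives

# The Spark Thrift Server's default app name contains spaces and slashes:
print(sanitize_app_name("Thrift JDBC/ODBC Server"))  # → thrift-jdbc-odbc-server
```

Doing the translation once in a shared place, as proposed, would make any user-supplied application name safe for k8s resource naming rather than special-casing STS.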
[GitHub] spark issue #22434: [SPARK-24685][BUILD][FOLLOWUP] Fix the nonexist profile ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22434 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96107/ Test PASSed.
[GitHub] spark issue #22434: [SPARK-24685][BUILD][FOLLOWUP] Fix the nonexist profile ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22434 Merged build finished. Test PASSed.
[GitHub] spark issue #22434: [SPARK-24685][BUILD][FOLLOWUP] Fix the nonexist profile ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22434 **[Test build #96107 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96107/testReport)** for PR 22434 at commit [`18a9135`](https://github.com/apache/spark/commit/18a91354abdf793a569a84046f3bf2016b2ccd03). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22396: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22396#discussion_r217920696 --- Diff: docs/sql-programming-guide.md --- @@ -1898,6 +1898,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see - Since Spark 2.4, File listing for compute statistics is done in parallel by default. This can be disabled by setting `spark.sql.parallelFileListingInStatsComputation.enabled` to `False`. - Since Spark 2.4, Metadata files (e.g. Parquet summary files) and temporary files are not counted as data files when calculating table size during Statistics computation. - Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In version 2.3 and earlier, empty strings are equal to `null` values and do not reflect to any characters in saved CSV files. For example, the row of `"a", null, "", 1` was writted as `a,,,1`. Since Spark 2.4, the same row is saved as `a,,"",1`. To restore the previous behavior, set the CSV option `emptyValue` to empty (not quoted) string. + - Since Spark 2.4 load command from local filesystem supports wildcards in the folder level paths(e.g. LOAD DATA LOCAL INPATH 'tmp/folder*/).Also in Older versions space in folder/file names has been represented using '%20'(e.g. LOAD DATA INPATH 'tmp/folderName/myFile%20Name.csv), this usage will not be supported from spark 2.4 version. Since Spark 2.4, Spark supports normal space character in folder/file names (e.g. LOAD DATA INPATH 'hdfs://tmp/folderName/file Name.csv') and wildcard character '?' can be used. (e.g. LOAD DATA INPATH 'hdfs://tmp/folderName/fileName?.csv') --- End diff -- @gatorsmile I just used a common encoding (%20) in our example.
[GitHub] spark pull request #22396: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22396#discussion_r217920417 --- Diff: docs/sql-programming-guide.md --- @@ -1898,6 +1898,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see - Since Spark 2.4, File listing for compute statistics is done in parallel by default. This can be disabled by setting `spark.sql.parallelFileListingInStatsComputation.enabled` to `False`. - Since Spark 2.4, Metadata files (e.g. Parquet summary files) and temporary files are not counted as data files when calculating table size during Statistics computation. - Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In version 2.3 and earlier, empty strings are equal to `null` values and do not reflect to any characters in saved CSV files. For example, the row of `"a", null, "", 1` was writted as `a,,,1`. Since Spark 2.4, the same row is saved as `a,,"",1`. To restore the previous behavior, set the CSV option `emptyValue` to empty (not quoted) string. + - Since Spark 2.4 load command from local filesystem supports wildcards in the folder level paths(e.g. LOAD DATA LOCAL INPATH 'tmp/folder*/).Also in Older versions space in folder/file names has been represented using '%20'(e.g. LOAD DATA INPATH 'tmp/folderName/myFile%20Name.csv), this usage will not be supported from spark 2.4 version. Since Spark 2.4, Spark supports normal space character in folder/file names (e.g. LOAD DATA INPATH 'hdfs://tmp/folderName/file Name.csv') and wildcard character '?' can be used. (e.g. LOAD DATA INPATH 'hdfs://tmp/folderName/fileName?.csv') --- End diff -- @srowen Sorry Sean, I missed your suggested text; I have updated the message based on your suggestions. Actually, I got a bit confused because this PR is a combination of a bug fix and an improvement :)
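The wildcard semantics described in the migration note above ('*' matching any run of characters at the folder level, '?' matching exactly one character) behave like ordinary filesystem globbing. A minimal illustration with Python's `fnmatch`, using hypothetical file names rather than anything from the PR:

```python
from fnmatch import fnmatch

paths = [
    "tmp/folder1/fileName1.csv",
    "tmp/folder2/fileName2.csv",
    "tmp/other/file Name.csv",  # spaces in names are now matched literally
]

# '*' matches any run of characters at the folder level ...
print([p for p in paths if fnmatch(p, "tmp/folder*/*.csv")])

# ... while '?' matches exactly one character in a file name.
print([p for p in paths if fnmatch(p, "tmp/*/fileName?.csv")])

# A literal space in a path now matches as-is, with no %20 encoding.
print(fnmatch("tmp/folderName/file Name.csv", "tmp/*/file Name.csv"))
```

Note that `fnmatch` is a simplification: unlike path-aware globbing, its '*' can also cross '/' boundaries, so this only sketches the matching behavior, not Spark's actual path resolution.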
[GitHub] spark issue #22435: [SPARK-25423][SQL] Output "dataFilters" in DataSourceSca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22435 **[Test build #96111 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96111/testReport)** for PR 22435 at commit [`830e188`](https://github.com/apache/spark/commit/830e1881b4ef4d9bb661d8b6635470e2596d4eaa).
[GitHub] spark issue #22435: [SPARK-25423][SQL] Output "dataFilters" in DataSourceSca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22435 Merged build finished. Test PASSed.