[GitHub] spark pull request #20525: [SPARK-23271[SQL] Parquet output contains only _S...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20525#discussion_r167159659 --- Diff: docs/sql-programming-guide.md --- @@ -1930,6 +1930,9 @@ working with timestamps in `pandas_udf`s to get the best performance, see - Literal values used in SQL operations are converted to DECIMAL with the exact precision and scale needed by them. - The configuration `spark.sql.decimalOperations.allowPrecisionLoss` has been introduced. It defaults to `true`, which means the new behavior described here; if set to `false`, Spark uses previous rules, ie. it doesn't adjust the needed scale to represent the values and it returns NULL if an exact representation of the value is not possible. + - Since Spark 2.3, writing an empty dataframe (a dataframe with 0 partitions) in parquet or orc format, creates a format specific metadata only file. In prior versions the metadata only file was not created. As a result, subsequent attempt to read from this directory fails with AnalysisException while inferring schema of the file. For example : df.write.format("parquet").save("outDir") --- End diff -- yea the above 2 changes are good! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20516: [SPARK-23343][CORE][TEST] Increase the exception ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20516#discussion_r167159549 --- Diff: core/src/test/scala/org/apache/spark/SparkFunSuite.scala --- @@ -59,6 +59,7 @@ abstract class SparkFunSuite protected val enableAutoThreadAudit = true protected override def beforeAll(): Unit = { +System.setProperty("spark.testing", "true") --- End diff -- if we are already doing this, let's make it more explicit that we should remove `./project/SparkBuild.scala:795: javaOptions in Test += "-Dspark.testing=1"` and set `spark.testing` in `SparkFunSuite.beforeAll`.
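A sketch of what the suggested change could look like in `SparkFunSuite` (illustrative only, assuming the suite's existing `beforeAll`/`afterAll` structure; not the PR's final code):

```scala
import org.scalatest.{BeforeAndAfterAll, FunSuite}

abstract class SparkFunSuite extends FunSuite with BeforeAndAfterAll {

  protected override def beforeAll(): Unit = {
    // Set spark.testing here instead of relying on the SBT-only
    // `javaOptions in Test += "-Dspark.testing=1"` in SparkBuild.scala,
    // so the flag is also present when tests are launched from an IDE.
    System.setProperty("spark.testing", "true")
    super.beforeAll()
  }

  protected override def afterAll(): Unit = {
    try {
      // Avoid leaking the property into anything else running in this JVM.
      System.clearProperty("spark.testing")
    } finally {
      super.afterAll()
    }
  }
}
```

Setting the property in `beforeAll` keeps the behavior identical between SBT, Maven, and IDE test runs, which is the inconsistency discussed in this thread.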
[GitHub] spark pull request #20525: [SPARK-23271[SQL] Parquet output contains only _S...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/20525#discussion_r167159534 --- Diff: docs/sql-programming-guide.md --- @@ -1930,6 +1930,9 @@ working with timestamps in `pandas_udf`s to get the best performance, see - Literal values used in SQL operations are converted to DECIMAL with the exact precision and scale needed by them. - The configuration `spark.sql.decimalOperations.allowPrecisionLoss` has been introduced. It defaults to `true`, which means the new behavior described here; if set to `false`, Spark uses previous rules, ie. it doesn't adjust the needed scale to represent the values and it returns NULL if an exact representation of the value is not possible. + - Since Spark 2.3, writing an empty dataframe (a dataframe with 0 partitions) in parquet or orc format, creates a format specific metadata only file. In prior versions the metadata only file was not created. As a result, subsequent attempt to read from this directory fails with AnalysisException while inferring schema of the file. For example : df.write.format("parquet").save("outDir") --- End diff -- even -> even if ? self-described -> self-describing ? @cloud-fan Nicely written. Thanks. Let me know if you are OK with the above two changes?
[GitHub] spark pull request #20449: [SPARK-23040][CORE]: Returns interruptible iterat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20449#discussion_r167158719 --- Diff: core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala --- @@ -104,9 +104,16 @@ private[spark] class BlockStoreShuffleReader[K, C]( context.taskMetrics().incMemoryBytesSpilled(sorter.memoryBytesSpilled) context.taskMetrics().incDiskBytesSpilled(sorter.diskBytesSpilled) context.taskMetrics().incPeakExecutionMemory(sorter.peakMemoryUsedBytes) +// Use completion callback to stop sorter if task was cancelled. --- End diff -- `if task is completed(either finished or canceled)` ---
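For context, the comment being reworded describes a pattern like the following (a sketch; `context` and `sorter` are the surrounding `BlockStoreShuffleReader` values, not new names):

```scala
// Use completion callback to stop sorter if task is completed
// (either finished or canceled). This releases the sorter's memory
// and deletes its spill files even when the returned iterator is
// abandoned partway through.
context.addTaskCompletionListener { _ =>
  sorter.stop()
}
```

Registering the callback on task completion, rather than only on cancellation, is what the suggested wording captures.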
[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/20525 @tdas @brkyvz Do we still need the fix for 0-partition DataFrame in Structured Streaming after this change? ---
[GitHub] spark pull request #20525: [SPARK-23271[SQL] Parquet output contains only _S...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20525#discussion_r167158557 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala --- @@ -301,7 +301,6 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be intercept[AnalysisException] { spark.range(10).write.format("csv").mode("overwrite").partitionBy("id").save(path) } - spark.emptyDataFrame.write.format("parquet").mode("overwrite").save(path) --- End diff -- How does it fail? If it's a runtime error, we should fail earlier, during analysis. This is worth a new JIRA.
[GitHub] spark pull request #20516: [SPARK-23343][CORE][TEST] Increase the exception ...
Github user heary-cao commented on a diff in the pull request: https://github.com/apache/spark/pull/20516#discussion_r167158512 --- Diff: core/src/test/scala/org/apache/spark/SparkFunSuite.scala --- @@ -59,6 +59,7 @@ abstract class SparkFunSuite protected val enableAutoThreadAudit = true protected override def beforeAll(): Unit = { +System.setProperty("spark.testing", "true") --- End diff -- My debugging tool is IDEA; I think the IDE is not relevant to how the property is set. This is similar to HiveSparkSubmitSuite, RPackageUtilsSuite, and SparkSubmitSuite, which also manually add System.setProperty("spark.testing", "true"). Of course, when I run the test case with Maven (using the command line), it passes. Thanks.
[GitHub] spark pull request #20525: [SPARK-23271[SQL] Parquet output contains only _S...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20525#discussion_r167158389 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala --- @@ -19,6 +19,7 @@ package org.apache.spark.sql.execution.datasources import org.apache.spark.sql.{QueryTest, Row} import org.apache.spark.sql.test.SharedSQLContext +import org.apache.spark.sql.types.{StringType, StructField, StructType} --- End diff -- please remove it ---
[GitHub] spark pull request #20525: [SPARK-23271[SQL] Parquet output contains only _S...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20525#discussion_r167158260 --- Diff: docs/sql-programming-guide.md --- @@ -1930,6 +1930,9 @@ working with timestamps in `pandas_udf`s to get the best performance, see - Literal values used in SQL operations are converted to DECIMAL with the exact precision and scale needed by them. - The configuration `spark.sql.decimalOperations.allowPrecisionLoss` has been introduced. It defaults to `true`, which means the new behavior described here; if set to `false`, Spark uses previous rules, ie. it doesn't adjust the needed scale to represent the values and it returns NULL if an exact representation of the value is not possible. + - Since Spark 2.3, writing an empty dataframe (a dataframe with 0 partitions) in parquet or orc format, creates a format specific metadata only file. In prior versions the metadata only file was not created. As a result, subsequent attempt to read from this directory fails with AnalysisException while inferring schema of the file. For example : df.write.format("parquet").save("outDir") --- End diff -- `Since Spark 2.3, writing an empty dataframe to a directory launches at least one write task, even physically the dataframe has no partition. This introduces a small behavior change that for self-described file formats like Parquet and Orc, Spark creates a metadata-only file in the target directory when writing 0-partition dataframe, so that schema inference can still work if users read that directory later. The new behavior is more reasonable and more consistent regarding writing empty dataframe.` ---
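The behavior change being documented can be reproduced with a snippet along these lines (a sketch; `outDir` is a placeholder path and `spark` is an active `SparkSession`):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

val schema = StructType(StructField("id", IntegerType) :: Nil)
// An emptyRDD has 0 partitions, so this dataframe has no partitions at all.
val df = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

df.write.format("parquet").save("outDir")

// Since Spark 2.3 the write above produces a metadata-only Parquet file,
// so this read can still infer the schema; in prior versions it threw
// AnalysisException because the directory contained only _SUCCESS.
spark.read.parquet("outDir").printSchema()
```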
[GitHub] spark pull request #20516: [SPARK-23343][CORE][TEST] Increase the exception ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20516#discussion_r167156814 --- Diff: core/src/test/scala/org/apache/spark/SparkFunSuite.scala --- @@ -59,6 +59,7 @@ abstract class SparkFunSuite protected val enableAutoThreadAudit = true protected override def beforeAll(): Unit = { +System.setProperty("spark.testing", "true") --- End diff -- Sorry, let me make the question clearer. Why do we need this if `./project/SparkBuild.scala:795: javaOptions in Test += "-Dspark.testing=1"` works?
[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20555 Merged build finished. Test PASSed. ---
[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20555 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87243/ Test PASSed. ---
[GitHub] spark issue #20516: [SPARK-23343][CORE][TEST] Increase the exception test fo...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/20516 When I run the test case with Maven (using the command line), it passes. Thanks. Then, should we add System.setProperty("spark.testing", "true") in SparkFunSuite to solve the IDE test tool problem? This is similar to HiveSparkSubmitSuite, RPackageUtilsSuite, and SparkSubmitSuite. Thanks.
[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20555 **[Test build #87243 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87243/testReport)** for PR 20555 at commit [`b26ffce`](https://github.com/apache/spark/commit/b26ffce6780078dbc38bff658e1ef7e9c56c3dd8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. ---
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87247/ Test FAILed. ---
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Merged build finished. Test FAILed. ---
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20477 **[Test build #87247 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87247/testReport)** for PR 20477 at commit [`0cc0600`](https://github.com/apache/spark/commit/0cc0600b8f6f3a46189ae38850835f34b57bd945). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. ---
[GitHub] spark issue #20490: [SPARK-23323][SQL]: Support commit coordinator for DataS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20490 Merged build finished. Test PASSed. ---
[GitHub] spark issue #20490: [SPARK-23323][SQL]: Support commit coordinator for DataS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20490 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87244/ Test PASSed. ---
[GitHub] spark issue #20490: [SPARK-23323][SQL]: Support commit coordinator for DataS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20490 **[Test build #87244 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87244/testReport)** for PR 20490 at commit [`e9964ca`](https://github.com/apache/spark/commit/e9964ca2fc831819662056210db594f613bce5d0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. ---
[GitHub] spark pull request #20501: [SPARK-22430][Docs] Unknown tag warnings when bui...
Github user rekhajoshm closed the pull request at: https://github.com/apache/spark/pull/20501 ---
[GitHub] spark issue #20501: [SPARK-22430][Docs] Unknown tag warnings when building R...
Github user rekhajoshm commented on the issue: https://github.com/apache/spark/pull/20501 Ack. Thanks for the update @felixcheung @srowen. Closing this.
[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20499 Yup, I should fix the guide for 2.2 anyway :-) Will open a backport tonight KST. ---
[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20525 **[Test build #87250 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87250/testReport)** for PR 20525 at commit [`30e5aa5`](https://github.com/apache/spark/commit/30e5aa50a5bb01f18eab134a206d72a73e501baf). ---
[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20525 Merged build finished. Test PASSed. ---
[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20525 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/743/ Test PASSed. ---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20537 LGTM too ---
[GitHub] spark pull request #20551: [SPARK-23271][DOC] Document the empty dataframe w...
Github user dilipbiswal closed the pull request at: https://github.com/apache/spark/pull/20551 ---
[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/20449 @jerryshao @cloud-fan I have updated my code. Do you have any other concerns? ---
[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/20525 @cloud-fan @gatorsmile Done. ---
[GitHub] spark pull request #20378: [SPARK-11222][Build][Python] Python document styl...
Github user rekhajoshm closed the pull request at: https://github.com/apache/spark/pull/20378 ---
[GitHub] spark issue #20378: [SPARK-11222][Build][Python] Python document style check...
Github user rekhajoshm commented on the issue: https://github.com/apache/spark/pull/20378 @HyukjinKwon @holdenk @ueshin @viirya @icexelloss @felixcheung @BryanCutler and @MrBago - This was one of the possible approaches that I was running by you. I have proposed another approach at #20556 with features as below: - Use a sphinx-like check; run only if pydocstyle is installed on the machine/Jenkins - Use pydocstyle rather than the single-file pep257.py - Verify that the latest pydocstyle 2.1.1 is in use, to ensure the latest doc checks are executed - Support ignore (inclusion/exclusion) features via tox.ini - Be a non-breaking change and allow updating docstyle to the standard at an easy pace. Closing this. Thanks!
[GitHub] spark issue #20378: [SPARK-11222][Build][Python] Python document style check...
Github user rekhajoshm commented on the issue: https://github.com/apache/spark/pull/20378 @HyukjinKwon Identifying docstyle failures does not help much, as it is not straightforward to exclude them in this version.
[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20525 No, we can't merge 2 PRs together. Please pick one of your PRs and put all the changes there, thanks!
[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20556 **[Test build #87249 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87249/testReport)** for PR 20556 at commit [`ee14cf7`](https://github.com/apache/spark/commit/ee14cf708603bd904505a110c0ca5d3607d5cdb8). ---
[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20556 Merged build finished. Test PASSed. ---
[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20556 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/742/ Test PASSed. ---
[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/20525 @cloud-fan Actually, I had already created the doc PR in the morning using the same JIRA number. Wenchen, if we want to have both changes in the same commit, will we be able to do that when we merge the patch? If not, please let me know, and I will close that PR and move the change over to this branch.
[GitHub] spark pull request #20378: [SPARK-11222][Build][Python] Python document styl...
Github user rekhajoshm commented on a diff in the pull request: https://github.com/apache/spark/pull/20378#discussion_r167148657 --- Diff: dev/lint-python --- @@ -83,6 +84,53 @@ else rm "$PEP8_REPORT_PATH" fi + Python Document Style Checks + +# Get PYDOCSTYLE at runtime so that we don't rely on it being installed on the build server. +# Using pep257.py which is the single file version of pydocstyle. +PYDOCSTYLE_VERSION="0.2.1" --- End diff -- As called out earlier, this was the single-file Python doc style checker; the latest version does not have a single-file checker that can be included.
[GitHub] spark pull request #20499: [SPARK-23328][PYTHON] Disallow default value None...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20499 ---
[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20499 Thanks, merging to master/2.3! Can you send a new PR for 2.2? It conflicts...
[GitHub] spark pull request #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable l...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r167147910 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -17,17 +17,130 @@ package org.apache.spark.sql.execution.datasources.v2 +import java.util.UUID + +import scala.collection.JavaConverters._ +import scala.collection.mutable + +import org.apache.spark.sql.{AnalysisException, SaveMode} import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation -import org.apache.spark.sql.catalyst.expressions.AttributeReference -import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, Statistics} -import org.apache.spark.sql.sources.v2.reader._ +import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression} +import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics} +import org.apache.spark.sql.execution.datasources.DataSourceStrategy +import org.apache.spark.sql.sources.{DataSourceRegister, Filter} +import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport, ReadSupportWithSchema, WriteSupport} +import org.apache.spark.sql.sources.v2.reader.{DataSourceReader, SupportsPushDownCatalystFilters, SupportsPushDownFilters, SupportsPushDownRequiredColumns, SupportsReportStatistics} +import org.apache.spark.sql.sources.v2.writer.DataSourceWriter +import org.apache.spark.sql.types.StructType case class DataSourceV2Relation( -output: Seq[AttributeReference], -reader: DataSourceReader) - extends LeafNode with MultiInstanceRelation with DataSourceReaderHolder { +source: DataSourceV2, +options: Map[String, String], +projection: Option[Seq[AttributeReference]] = None, +filters: Option[Seq[Expression]] = None, +userSchema: Option[StructType] = None) extends LeafNode with MultiInstanceRelation { + + override def simpleString: String = { +s"DataSourceV2Relation(source=$sourceName, " + + s"schema=[${output.map(a => s"$a ${a.dataType.simpleString}").mkString(", ")}], " + + s"filters=[${pushedFilters.mkString(", ")}], options=$options)" + } + + override lazy val schema: StructType = reader.readSchema() + + override lazy val output: Seq[AttributeReference] = { --- End diff -- I pulled your code and played with it. So your PR does fix the bug, but in a hacky way. Let me explain what happened. 1. `QueryPlan.canonicalized` is called, and every expression in `DataSourceV2Relation` is canonicalized, including `DataSourceV2Relation.projection`. This means the attributes in `projection` are all renamed to "none". 2. `DataSourceV2Relation.output` is called, which triggers the creation of the reader and applies filter push-down and column pruning. Note that because all attributes are renamed to "none", we are actually pushing invalid filters and columns to data sources. 3. `reader.schema` and `projection` are lined up to get the actual output. Because all names are "none", it works. However, step 2 is pretty dangerous: Spark doesn't define the behavior of pushing invalid filters and columns, especially what `reader.schema` should return after invalid columns are pushed down. I prefer my original fix, which puts `output` in `DataSourceV2Relation`'s constructor parameters and updates it when doing column pruning in `PushDownOperatorsToDataSource`.
[GitHub] spark issue #20557: [SPARK-23364][SQL]'desc table' command in spark-sql add ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20557 Can one of the admins verify this patch? ---
[GitHub] spark pull request #20557: [SPARK-23364][SQL]'desc table' command in spark-s...
GitHub user guoxiaolongzte opened a pull request: https://github.com/apache/spark/pull/20557 [SPARK-23364][SQL]'desc table' command in spark-sql add column head display ## What changes were proposed in this pull request? Using the 'desc partition_table' command in the spark-sql client, I think it should add a column header display. Add 'col_name', 'data_type', 'comment' column headers. fix before: ![2](https://user-images.githubusercontent.com/26266482/36013945-283fea8c-0da2-11e8-8265-63d816dabd9b.png) fix after: ![1](https://user-images.githubusercontent.com/26266482/36013954-3252fd7a-0da2-11e8-8e63-3b586f238072.png) ## How was this patch tested? manual tests Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/guoxiaolongzte/spark SPARK-23364 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20557.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20557 commit 5699c0dc2810a4500f0ee34414b77b80afd0e9c1 Author: guoxiaolong Date: 2018-02-09T06:00:40Z [SPARK-23364][SQL]'desc table' command in spark-sql add column head display
[GitHub] spark issue #20359: [SPARK-23186][SQL] Initialize DriverManager first before...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20359 Thank you for merging, @cloud-fan. And thank you again, @HyukjinKwon, @gatorsmile, and @srowen! ---
[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20556 Merged build finished. Test FAILed. ---
[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20556 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87248/ Test FAILed. ---
[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20556 **[Test build #87248 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87248/testReport)** for PR 20556 at commit [`85ca69d`](https://github.com/apache/spark/commit/85ca69de956cd3255eee5c51e830b9aa8f451308). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds no public classes. ---
[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20556 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/741/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20556 **[Test build #87248 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87248/testReport)** for PR 20556 at commit [`85ca69d`](https://github.com/apache/spark/commit/85ca69de956cd3255eee5c51e830b9aa8f451308). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20556 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20532: [SPARK-23353][CORE] Allow ExecutorMetricsUpdate events t...
Github user squito commented on the issue: https://github.com/apache/spark/pull/20532 I can see why you want this sometimes, but I'm trying to figure out if it's really valuable for users in general. You could always add a custom listener to log this info. It would go into a separate file, not the standard event log file, which means you'd have a little more work to do to stitch them together. OTOH that could be a good thing, as it means the history server wouldn't have to parse those extra lines.
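The "custom listener" route suggested above boils down to: subscribe to executor metrics updates and append them to a side log, keeping the standard event log clean. A self-contained mock of that idea — the event and listener types below stand in for Spark's `SparkListener` API (in `org.apache.spark.scheduler`) and are not the real interfaces:

```scala
import scala.collection.mutable.ArrayBuffer

// Mock stand-in for SparkListenerExecutorMetricsUpdate; the real event
// carries task-level accumulator updates per executor.
case class ExecutorMetricsUpdate(execId: String, jvmHeapUsed: Long)

// Mock stand-in for a SparkListener subclass. In a real listener this
// buffer would be a writer on a separate file, so the history server
// never has to parse these high-frequency lines.
class MetricsFileListener {
  val lines = ArrayBuffer.empty[String]
  def onExecutorMetricsUpdate(e: ExecutorMetricsUpdate): Unit =
    lines += s"${e.execId},${e.jvmHeapUsed}"
}

val listener = new MetricsFileListener
listener.onExecutorMetricsUpdate(ExecutorMetricsUpdate("exec-1", 512L))
```

Stitching the side log back together with the event log is the extra work the comment mentions, but it keeps the event log parser fast.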
[GitHub] spark pull request #20556: [SPARK-23367][Build] Include python document styl...
GitHub user rekhajoshm opened a pull request: https://github.com/apache/spark/pull/20556 [SPARK-23367][Build] Include python document style checking ## What changes were proposed in this pull request? Include python document style checking. This PR includes the pydocstyle checking if pydocstyle is installed, similar to the sphinx checking. It takes care of exclusion/inclusion of explicit document error codes via tox.ini. Currently all error codes are ignored so this is a non-breaking change. ## How was this patch tested? ./dev/run-tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/rekhajoshm/spark SPARK-23367 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20556.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20556 commit e3677c9fa9697e0d34f9df52442085a6a481c9e9 Author: Rekha Joshi Date: 2015-05-05T23:10:08Z Merge pull request #1 from apache/master Pulling functionality from apache spark commit 106fd8eee8f6a6f7c67cfc64f57c1161f76d8f75 Author: Rekha Joshi Date: 2015-05-08T21:49:09Z Merge pull request #2 from apache/master pull latest from apache spark commit 0be142d6becba7c09c6eba0b8ea1efe83d649e8c Author: Rekha Joshi Date: 2015-06-22T00:08:08Z Merge pull request #3 from apache/master Pulling functionality from apache spark commit 6c6ee12fd733e3f9902e10faf92ccb78211245e3 Author: Rekha Joshi Date: 2015-09-17T01:03:09Z Merge pull request #4 from apache/master Pulling functionality from apache spark commit b123c601e459d1ad17511fd91dd304032154882a Author: Rekha Joshi Date: 2015-11-25T18:50:32Z Merge pull request #5 from apache/master pull request from apache/master commit c73c32aadd6066e631956923725a48d98a18777e Author: Rekha Joshi Date: 2016-03-18T19:13:51Z Merge pull request #6 from apache/master pull latest from apache spark commit 7dbf7320057978526635bed09dabc8cf8657a28a
Author: Rekha Joshi Date: 2016-04-05T20:26:40Z Merge pull request #8 from apache/master pull latest from apache spark commit 5e9d71827f8e2e4d07027281b80e4e073e7fecd1 Author: Rekha Joshi Date: 2017-05-01T23:00:30Z Merge pull request #9 from apache/master Pull apache spark commit 63d99b3ce5f222d7126133170a373591f0ac67dd Author: Rekha Joshi Date: 2017-09-30T22:26:44Z Merge pull request #10 from apache/master pull latest apache spark commit a7fc787466b71784ff86f9694f617db0f1042da8 Author: Rekha Joshi Date: 2018-01-21T00:17:58Z Merge pull request #11 from apache/master Apache spark pull latest commit 3a2d45377ed4397de802badd764bc2588cfd275b Author: Rekha Joshi Date: 2018-02-09T04:55:12Z Merge pull request #12 from apache/master Apache spark latest pull commit 85ca69de956cd3255eee5c51e830b9aa8f451308 Author: rjoshi2 Date: 2018-02-09T05:54:03Z [SPARK-23367][Build] Include python document style checking
[GitHub] spark pull request #20244: [SPARK-23053][CORE] taskBinarySerialization and t...
Github user ivoson commented on a diff in the pull request: https://github.com/apache/spark/pull/20244#discussion_r167145734 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -2399,6 +2424,121 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi } } + /** + * In this test, we simulate a scenario in which concurrent jobs use the same + * rdd, which is marked for checkpointing: + * Job one has already finished the spark job, and starts the process of doCheckpoint; + * Job two is submitted, and submitMissingTasks is called. + * In submitMissingTasks, if taskSerialization is called before doCheckpoint is done, + * while the part calculated from stage.rdd.partitions is called after doCheckpoint is done, + * we may get a ClassCastException when executing the task, because some rdds will do + * a Partition cast. + * + * With this test case, we just want to indicate that we should do taskSerialization and + * the part calculation in submitMissingTasks with the same rdd checkpoint status. + */ + test("SPARK-23053: avoid ClassCastException in concurrent execution with checkpoint") { --- End diff -- hi @squito , it's fine. The pr and jira have been updated. Thanks for your patience and review.
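The race described in that test comment is a check-then-act problem: the partitions and the serialized task binary must both be taken from the same checkpoint state. A self-contained toy model of the fix — all types here are invented for illustration; the real logic lives in `DAGScheduler.submitMissingTasks`:

```scala
// Toy model: an RDD whose partition representation flips once checkpointing
// completes, as happens when an RDD's data is replaced by its checkpoint.
class ToyRdd {
  @volatile private var checkpointed = false
  def doCheckpoint(): Unit = { checkpointed = true }
  // The string stands in for the concrete Partition subclass of each partition.
  def partitions: Seq[String] =
    if (checkpointed) Seq("CheckpointRDDPartition") else Seq("ParallelCollectionPartition")
}

// The fix: read the checkpoint-dependent state under one lock so task
// serialization and `stage.rdd.partitions` always agree; interleaving a
// concurrent doCheckpoint between the two reads is what caused the
// ClassCastException.
def submitMissingTasks(rdd: ToyRdd): (Seq[String], Seq[String]) = rdd.synchronized {
  val parts = rdd.partitions      // partitions used to build the tasks
  val captured = rdd.partitions   // stands in for partitions captured in taskBinary
  (parts, captured)
}

val rdd = new ToyRdd
val pair = submitMissingTasks(rdd)
```

The invariant the real test asserts is the same: both views of the partitions come from a single checkpoint status.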
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20554 Merged build finished. Test PASSed.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20554 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87239/ Test PASSed.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20554 **[Test build #87239 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87239/testReport)** for PR 20554 at commit [`05c9d20`](https://github.com/apache/spark/commit/05c9d20da4361d631d8839bd4a45e4966964afa0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20554 Build finished. Test PASSed.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20554 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87238/ Test PASSed.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20554 **[Test build #87238 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87238/testReport)** for PR 20554 at commit [`3ed2a50`](https://github.com/apache/spark/commit/3ed2a509276194214875f39e1e18d8093155c54c). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20477 **[Test build #87247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87247/testReport)** for PR 20477 at commit [`0cc0600`](https://github.com/apache/spark/commit/0cc0600b8f6f3a46189ae38850835f34b57bd945).
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/740/ Test PASSed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Merged build finished. Test PASSed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20477 retest this please
[GitHub] spark issue #20516: [SPARK-23343][CORE][TEST] Increase the exception test fo...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20516 Can you try with SBT (using the command line)? Usually we don't trust the test results from an IDE.
[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20525 I think it's better to have the doc change in the same PR; then it's clearer which patch caused the behavior change.
[GitHub] spark issue #20516: [SPARK-23343][CORE][TEST] Increase the exception test fo...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/20516 Sure. Operating environment: the IDEA test runner. Test case: test("can bind to a specific port"). Test code: val maxRetries = portMaxRetries(conf); println("maxRetries:" + maxRetries). Run result: maxRetries: 16. If and only if we add System.setProperty("spark.testing", "true") in SparkFunSuite, the run result is maxRetries: 100. Thanks.
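The 16-vs-100 results above reflect Spark's retry policy: many port retries under tests (so parallel suites don't collide on ports), a small default otherwise. A simplified, self-contained model of that logic — the real implementation is `Utils.portMaxRetries`, and the `Map`-based signature here is invented for the sketch:

```scala
// Simplified model of Spark's port-retry policy: when `spark.testing` is
// set, use a large retry count (100) so concurrent test suites can find
// free ports; otherwise use the configured value or the default of 16.
def portMaxRetries(conf: Map[String, String]): Int = {
  val explicit = conf.get("spark.port.maxRetries").map(_.toInt)
  if (conf.contains("spark.testing")) explicit.getOrElse(100)
  else explicit.getOrElse(16)
}

val defaultRetries = portMaxRetries(Map.empty)
val testingRetries = portMaxRetries(Map("spark.testing" -> "true"))
```

This is why the test only sees 100 retries once `spark.testing` is set as a system property — exactly the observation in the comment above.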
[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20387 > We've added a resolution rule from UnresolvedRelation to DataSourceV2Relation that uses our implementation. UnresolvedRelation needs to pass its TableIdentifier to the v2 relation, which is why I added this. I've been thinking about this a little more. This is actually an existing problem for file-based data sources. The solution is, when converting an unresolved relation to a data source relation, to add some new options to the existing data source options before passing them to the data source relation. See `FindDataSourceTable.readDataSourceTable` for how we handle the path option.
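The approach described here — fold extra entries into the options map before constructing the relation — is plain map manipulation. A hedged sketch; the helper name and the `database`/`table` keys are illustrative, not necessarily the keys Spark actually uses (file sources use a `path` option, as the referenced `FindDataSourceTable.readDataSourceTable` does):

```scala
// Hypothetical: merge catalog identity into user-supplied data source
// options before building the relation, mirroring how the path option is
// injected for file-based sources during resolution.
def withTableOptions(
    userOptions: Map[String, String],
    database: String,
    table: String): Map[String, String] = {
  // Identity keys are appended last, so they override any user-supplied
  // duplicates -- the catalog is authoritative about which table this is.
  userOptions ++ Map("database" -> database, "table" -> table)
}

val opts = withTableOptions(Map("path" -> "/data/t"), "db1", "t1")
```

The relation then only ever sees a complete options map, so no per-source special casing is needed at resolution time.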
[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20545 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/739/ Test PASSed.
[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20545 **[Test build #87246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87246/testReport)** for PR 20545 at commit [`664a62c`](https://github.com/apache/spark/commit/664a62c7da9ba5da2007d40ef9c157f7e82938c5).
[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20545 Merged build finished. Test PASSed.
[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20545 retest this please.
[GitHub] spark pull request #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable l...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r167142433 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -17,17 +17,130 @@ package org.apache.spark.sql.execution.datasources.v2 +import java.util.UUID + +import scala.collection.JavaConverters._ +import scala.collection.mutable + +import org.apache.spark.sql.{AnalysisException, SaveMode} import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation -import org.apache.spark.sql.catalyst.expressions.AttributeReference -import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, Statistics} -import org.apache.spark.sql.sources.v2.reader._ +import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression} +import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics} +import org.apache.spark.sql.execution.datasources.DataSourceStrategy +import org.apache.spark.sql.sources.{DataSourceRegister, Filter} +import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport, ReadSupportWithSchema, WriteSupport} +import org.apache.spark.sql.sources.v2.reader.{DataSourceReader, SupportsPushDownCatalystFilters, SupportsPushDownFilters, SupportsPushDownRequiredColumns, SupportsReportStatistics} +import org.apache.spark.sql.sources.v2.writer.DataSourceWriter +import org.apache.spark.sql.types.StructType case class DataSourceV2Relation( -output: Seq[AttributeReference], -reader: DataSourceReader) - extends LeafNode with MultiInstanceRelation with DataSourceReaderHolder { +source: DataSourceV2, +options: Map[String, String], +projection: Option[Seq[AttributeReference]] = None, +filters: Option[Seq[Expression]] = None, +userSchema: Option[StructType] = None) extends LeafNode with MultiInstanceRelation { + + override def simpleString: String = { +s"DataSourceV2Relation(source=$sourceName, " + + s"schema=[${output.map(a => s"$a 
${a.dataType.simpleString}").mkString(", ")}], " + + s"filters=[${pushedFilters.mkString(", ")}], options=$options)" + } + + override lazy val schema: StructType = reader.readSchema() + + override lazy val output: Seq[AttributeReference] = { +projection match { + case Some(attrs) => +// use the projection attributes to avoid assigning new ids. fields that are not projected +// will be assigned new ids, which is okay because they are not projected. +val attrMap = attrs.map(a => a.name -> a).toMap +schema.map(f => attrMap.getOrElse(f.name, + AttributeReference(f.name, f.dataType, f.nullable, f.metadata)())) + case _ => +schema.toAttributes +} + } + + private lazy val v2Options: DataSourceOptions = { +// ensure path and table options are set correctly +val updatedOptions = new mutable.HashMap[String, String] +updatedOptions ++= options + +new DataSourceOptions(options.asJava) --- End diff -- We all agree that duplicating the logic of creating `DataSourceOptions` in many places is a bad idea. Currently there are 2 proposals: 1. Have a central place to take care the data source v2 resolution logic, including option creating. This is the approach of data source v1, i.e. the class `DataSource`. 2. Similar to proposal 1, but make `DataSourceV2Relation` the central place. For now we don't know which one is better, it depends on how data source v2 evolves in the future. At this point of time, I think we should pick the simplest approach, which is passing the `DataSourceOptions` to `DataSourceV2Relation`. Then we just need a one-line change in `DataFrameReader`, and don't need to add `v2Options` here. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
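The "one central place" proposal in the review above amounts to a single construction point that every code path uses to turn a plain options map into the case-insensitive options object. A self-contained sketch — `CaseInsensitiveOptions` below is an invented stand-in for Spark's `DataSourceOptions` (in `org.apache.spark.sql.sources.v2`), not the real class:

```scala
// Stand-in for DataSourceOptions: case-insensitive lookup over a String map.
final class CaseInsensitiveOptions private (raw: Map[String, String]) {
  private val lower = raw.map { case (k, v) => k.toLowerCase -> v }
  def get(key: String): Option[String] = lower.get(key.toLowerCase)
}

object CaseInsensitiveOptions {
  // Single construction point: DataFrameReader, the relation, and the
  // writer path would all go through here, so normalization (and any
  // path/table fix-ups) is never duplicated across call sites.
  def apply(raw: Map[String, String]): CaseInsensitiveOptions =
    new CaseInsensitiveOptions(raw)
}

val o = CaseInsensitiveOptions(Map("Path" -> "/tmp/out"))
```

Passing the already-built options object into the relation (the one-line `DataFrameReader` change suggested above) is the simplest way to get this today; a dedicated resolution class like v1's `DataSource` remains an option if v2 grows more logic.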
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87242/ Test FAILed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Merged build finished. Test FAILed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20477 **[Test build #87242 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87242/testReport)** for PR 20477 at commit [`0cc0600`](https://github.com/apache/spark/commit/0cc0600b8f6f3a46189ae38850835f34b57bd945). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20552: [SPARK-23099][SS] Migrate foreach sink to DataSourceV2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20552 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87241/ Test FAILed.
[GitHub] spark issue #20552: [SPARK-23099][SS] Migrate foreach sink to DataSourceV2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20552 Merged build finished. Test FAILed.
[GitHub] spark issue #20552: [SPARK-23099][SS] Migrate foreach sink to DataSourceV2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20552 **[Test build #87241 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87241/testReport)** for PR 20552 at commit [`a33a35c`](https://github.com/apache/spark/commit/a33a35ccbae7350519a3faf8d5d3d6f35692feb3). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20303: [SPARK-23128][SQL] A new approach to do adaptive executi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20303 Merged build finished. Test PASSed.
[GitHub] spark issue #20303: [SPARK-23128][SQL] A new approach to do adaptive executi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20303 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87236/ Test PASSed.
[GitHub] spark pull request #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable l...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r167141001 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -17,17 +17,130 @@ package org.apache.spark.sql.execution.datasources.v2 +import java.util.UUID + +import scala.collection.JavaConverters._ +import scala.collection.mutable + +import org.apache.spark.sql.{AnalysisException, SaveMode} import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation -import org.apache.spark.sql.catalyst.expressions.AttributeReference -import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, Statistics} -import org.apache.spark.sql.sources.v2.reader._ +import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression} +import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics} +import org.apache.spark.sql.execution.datasources.DataSourceStrategy +import org.apache.spark.sql.sources.{DataSourceRegister, Filter} +import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport, ReadSupportWithSchema, WriteSupport} +import org.apache.spark.sql.sources.v2.reader.{DataSourceReader, SupportsPushDownCatalystFilters, SupportsPushDownFilters, SupportsPushDownRequiredColumns, SupportsReportStatistics} +import org.apache.spark.sql.sources.v2.writer.DataSourceWriter +import org.apache.spark.sql.types.StructType case class DataSourceV2Relation( -output: Seq[AttributeReference], -reader: DataSourceReader) - extends LeafNode with MultiInstanceRelation with DataSourceReaderHolder { +source: DataSourceV2, +options: Map[String, String], +projection: Option[Seq[AttributeReference]] = None, +filters: Option[Seq[Expression]] = None, +userSchema: Option[StructType] = None) extends LeafNode with MultiInstanceRelation { + + override def simpleString: String = { +s"DataSourceV2Relation(source=$sourceName, " + + s"schema=[${output.map(a => s"$a 
${a.dataType.simpleString}").mkString(", ")}], " + + s"filters=[${pushedFilters.mkString(", ")}], options=$options)" + } + + override lazy val schema: StructType = reader.readSchema() + + override lazy val output: Seq[AttributeReference] = { +projection match { + case Some(attrs) => +// use the projection attributes to avoid assigning new ids. fields that are not projected +// will be assigned new ids, which is okay because they are not projected. +val attrMap = attrs.map(a => a.name -> a).toMap +schema.map(f => attrMap.getOrElse(f.name, + AttributeReference(f.name, f.dataType, f.nullable, f.metadata)())) + case _ => +schema.toAttributes +} + } + + private lazy val v2Options: DataSourceOptions = { +// ensure path and table options are set correctly +val updatedOptions = new mutable.HashMap[String, String] +updatedOptions ++= options + +new DataSourceOptions(options.asJava) + } + + private val sourceName: String = { +source match { + case registered: DataSourceRegister => +registered.shortName() + case _ => +source.getClass.getSimpleName +} + } + + lazy val ( + reader: DataSourceReader, + unsupportedFilters: Seq[Expression], + pushedFilters: Seq[Expression]) = { +val newReader = userSchema match { + case Some(s) => +asReadSupportWithSchema.createReader(s, v2Options) + case _ => +asReadSupport.createReader(v2Options) +} + +projection.foreach { attrs => + DataSourceV2Relation.pushRequiredColumns(newReader, attrs.toStructType) +} + +val (remainingFilters, pushedFilters) = filters match { + case Some(filterSeq) => +DataSourceV2Relation.pushFilters(newReader, filterSeq) + case _ => +(Nil, Nil) +} + +(newReader, remainingFilters, pushedFilters) + } - override def canEqual(other: Any): Boolean = other.isInstanceOf[DataSourceV2Relation] + def writer(dfSchema: StructType, mode: SaveMode): Option[DataSourceWriter] = { --- End diff -- I think we should avoid adding unused code that is needed in the future. 
The streaming data source v2 was a bad example, and you already pointed it out. I hope we don't make the same mistake in the future.
[GitHub] spark issue #20303: [SPARK-23128][SQL] A new approach to do adaptive executi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20303 **[Test build #87236 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87236/testReport)** for PR 20303 at commit [`603c6d5`](https://github.com/apache/spark/commit/603c6d58ae9a72f8202236682c78cd48a9bb320e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20521 **[Test build #87245 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87245/testReport)** for PR 20521 at commit [`6bc913f`](https://github.com/apache/spark/commit/6bc913f71bab6a7d5f04dfa465e1e67951489dc6).
[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20521 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/738/ Test PASSed.
[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20521 Merged build finished. Test PASSed.
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20541 Merged build finished. Test PASSed.
[GitHub] spark pull request #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20521#discussion_r167140801 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveExplainSuite.scala --- @@ -128,32 +128,6 @@ class HiveExplainSuite extends QueryTest with SQLTestUtils with TestHiveSingleto "src") } - test("SPARK-17409: The EXPLAIN output of CTAS only shows the analyzed plan") { --- End diff -- This is kind of a "bad" test. The bug was that we optimized the CTAS input query twice, but here we are testing whether the EXPLAIN result of CTAS contains only the analyzed query, which is specific to how we fixed that bug at the time.
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20541 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87237/ Test PASSed.
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20541 **[Test build #87237 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87237/testReport)** for PR 20541 at commit [`4f5d46b`](https://github.com/apache/spark/commit/4f5d46baca612caaa882cbabb3b35665e9c7ed8b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20359: [SPARK-23186][SQL] Initialize DriverManager first...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20359 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20359: [SPARK-23186][SQL] Initialize DriverManager first before...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20359 thanks, merging to master/2.3!
[GitHub] spark issue #20516: [SPARK-23343][CORE][TEST] Increase the exception test fo...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20516 are you sure `./project/SparkBuild.scala:795: javaOptions in Test += "-Dspark.testing=1"` only affects the non-test code path? If so, we have a lot of places to fix.
[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20545 Merged build finished. Test FAILed.
[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20545 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87240/
[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20545 **[Test build #87240 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87240/testReport)** for PR 20545 at commit [`664a62c`](https://github.com/apache/spark/commit/664a62c7da9ba5da2007d40ef9c157f7e82938c5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20244: [SPARK-23053][CORE] taskBinarySerialization and t...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20244#discussion_r167138603 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -2399,6 +2424,121 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi } } + /** + * In this test, we simulate a scenario where concurrent jobs use the same + * RDD, which is marked for checkpointing: + * Job one has already finished its Spark job and starts the doCheckpoint process; + * Job two is submitted, and submitMissingTasks is called. + * In submitMissingTasks, if task serialization happens before doCheckpoint is done, + * while the partitions computed from stage.rdd.partitions are read after doCheckpoint is done, + * we may get a ClassCastException when executing the task, because some RDDs cast their partitions. + * + * This test case is meant to show that submitMissingTasks should perform task serialization + * and the partition computation against the same RDD checkpoint state. + */ + test("SPARK-23053: avoid ClassCastException in concurrent execution with checkpoint") { --- End diff -- hi @ivoson -- I haven't come up with a better way to test this, so I think for now you should (1) change the PR to *only* include the changes to the DAGScheduler (also undo the `protected[spark]` changes elsewhere) and (2) put this repro on the JIRA, as it's pretty good for showing what's going on. If we come up with a way to test it, we can always do that later on. Thanks, and sorry for the back and forth.
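The race described in the test's doc comment can be summarized with a sketch. All names here are illustrative, not the actual DAGScheduler internals, and the "fix direction" at the end is a sketch of the idea, not the merged patch:

```scala
// Illustrative sketch of the SPARK-23053 race -- NOT real DAGScheduler code.
//
// Thread A (job one, finished):     Thread B (job two, submitMissingTasks):
//   rdd.doCheckpoint()                val taskBinary = serialize(rdd) // pre-checkpoint RDD
//     ... swaps rdd's partitions      ... A's doCheckpoint completes ...
//     to checkpoint partitions ...    val parts = rdd.partitions      // post-checkpoint partitions
//
// The launched tasks deserialize the pre-checkpoint RDD but are handed
// post-checkpoint partitions; when compute() casts its Partition back to
// the pre-checkpoint subclass, it throws ClassCastException.
//
// Fix direction (sketch only): read both pieces of state under one
// consistent view of the RDD's checkpoint status, e.g.
//   val (taskBinaryBytes, parts) = stage.rdd.synchronized {
//     (serializeTaskBinary(stage), stage.rdd.partitions)
//   }
```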
[GitHub] spark pull request #20516: [SPARK-23343][CORE][TEST] Increase the exception ...
Github user heary-cao commented on a diff in the pull request: https://github.com/apache/spark/pull/20516#discussion_r167137766 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -,7 +,7 @@ private[spark] object Utils extends Logging { */ def portMaxRetries(conf: SparkConf): Int = { val maxRetries = conf.getOption("spark.port.maxRetries").map(_.toInt) -if (conf.contains("spark.testing")) { +if (isTesting || conf.contains("spark.testing")) { --- End diff -- Sorry, my understanding may have been one-sided. This is not only called from tests. When we want the default value of `spark.port.maxRetries` to be 100, we still need `spark.testing` to be set, either in the SparkConf or as the test-mode flag set in Spark unit tests, so I added the `isTesting` check here. Thanks.
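To make the two "test mode" signals in this exchange concrete, here is a simplified sketch of the logic under discussion. It follows the diff above, but it is not the verbatim `Utils` source; the non-test default of 16 is illustrative:

```scala
// Simplified sketch of the portMaxRetries logic -- not the verbatim source.
// Test mode can be signaled two ways: the -Dspark.testing system property
// (set by the build for test JVMs) or "spark.testing" in the SparkConf.
def isTesting: Boolean = sys.props.contains("spark.testing")

def portMaxRetries(conf: SparkConf): Int = {
  val maxRetries = conf.getOption("spark.port.maxRetries").map(_.toInt)
  if (isTesting || conf.contains("spark.testing")) {
    // In test mode, default to 100 retries to reduce port-collision flakiness.
    maxRetries.getOrElse(100)
  } else {
    maxRetries.getOrElse(16) // illustrative non-test default
  }
}
```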
[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Support commit coordinator fo...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/20490#discussion_r167137165 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java --- @@ -62,6 +62,16 @@ */ DataWriterFactory createWriterFactory(); + /** + * Returns whether Spark should use the commit coordinator to ensure that only one attempt for --- End diff -- This is actually not a guarantee, is it?
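For context, the method under review has roughly this shape. This is a Scala sketch of an interface that in Spark is Java (`DataSourceWriter`); the trait name and comment wording here are illustrative:

```scala
// Sketch of the proposed API shape -- the actual interface is Java, in
// org.apache.spark.sql.sources.v2.writer.DataSourceWriter.
trait CommitCoordinationSketch {
  // If true, each task asks the driver-side commit coordinator for
  // permission before committing, so at most one attempt per partition is
  // *authorized* to commit. As the review comment points out, this is
  // best-effort arbitration rather than a hard guarantee: sinks that need
  // exactly-once behavior should still make their commits idempotent.
  def useCommitCoordinator: Boolean = true
}
```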