[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21590#discussion_r197347130

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---

@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option '$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  //   1. One of the option (dbtable or query) must be specified.
+  //   2. Both of them can not be specified at the same time as they are conflicting in nature.
+  require(
+    tableName.isDefined || query.isDefined,
+    s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+    !(tableName.isDefined && query.isDefined),
+    s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be specified."
+  )
+
+  // table name or a table expression.
+  val tableOrQuery = tableName.map(_.trim).getOrElse {
+    // We have ensured in the code above that either dbtable or query is specified.
+    query.get match {
+      case subQuery if subQuery.nonEmpty => s"(${subQuery}) spark_gen_${curId.getAndIncrement()}"
+      case subQuery => subQuery
+    }
+  }
+
+  require(tableOrQuery.nonEmpty,
+    s"Empty string is not allowed in either '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' options"
+  )
+
-

--- End diff --

nit: revert this line

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
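For reference, the dbtable/query mutual-exclusion logic in the diff above can be sketched as a standalone method. This is a hedged, Spark-free model: the option names mirror the patch, but the exception type and the fixed generated alias (`spark_gen_0` instead of the patch's `curId.getAndIncrement()`) are illustrative assumptions.

```java
import java.util.Map;

// Hypothetical, Spark-free model of the validation added in JDBCOptions.
public class JdbcOptionsModel {
    public static String resolveTableOrQuery(Map<String, String> parameters) {
        String tableName = parameters.get("dbtable");
        String query = parameters.get("query");
        // 1. One of the options (dbtable or query) must be specified.
        if (tableName == null && query == null) {
            throw new IllegalArgumentException("Option 'dbtable' or 'query' is required.");
        }
        // 2. Both cannot be specified at the same time; they conflict.
        if (tableName != null && query != null) {
            throw new IllegalArgumentException("Both 'dbtable' and 'query' can not be specified.");
        }
        String tableOrQuery;
        if (tableName != null) {
            tableOrQuery = tableName.trim();
        } else {
            // A non-empty query is wrapped as a derived table with a generated
            // alias, so it can be used anywhere a table name is expected.
            tableOrQuery = query.isEmpty() ? query : "(" + query + ") spark_gen_0";
        }
        if (tableOrQuery.isEmpty()) {
            throw new IllegalArgumentException(
                "Empty string is not allowed in either 'dbtable' or 'query' options");
        }
        return tableOrQuery;
    }
}
```

With this shape, `resolveTableOrQuery(Map.of("query", "select 1"))` yields a parenthesized subquery with an alias, while passing both options or neither fails fast.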
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92197/ Test FAILed.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Merged build finished. Test FAILed.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21061 **[Test build #92197 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92197/testReport)** for PR 21061 at commit [`195f3bd`](https://github.com/apache/spark/commit/195f3bd6b47da19b27cd0c8140bcd9aa6a063843).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve Analyze Table command
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21608 Does this PR improve actual performance numbers? (My question is: is the calculation a bottleneck?)
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92192/ Test FAILed.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606 Merged build finished. Test FAILed.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21606 **[Test build #92192 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92192/testReport)** for PR 21606 at commit [`a16d9f9`](https://github.com/apache/spark/commit/a16d9f907b3ce0078da72b7e7bcc56e187cbc8f9).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21482 I have no more comments except the one above.
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r197340906

--- Diff: python/pyspark/sql/functions.py ---

@@ -468,6 +468,18 @@ def input_file_name():
     return Column(sc._jvm.functions.input_file_name())
+@since(2.4)
+def isinf(col):

--- End diff --

Yes, please, because I see it's exposed in Column.scala.
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21594 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92194/ Test PASSed.
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21594 Merged build finished. Test PASSed.
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21594 **[Test build #92194 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92194/testReport)** for PR 21594 at commit [`2f00f2f`](https://github.com/apache/spark/commit/2f00f2fe0e1cf9a0d44285aab306ed55bd176d9c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21603: [SPARK-17091][SQL] Add rule to convert IN predicate to e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21603 **[Test build #92198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92198/testReport)** for PR 21603 at commit [`b9b3160`](https://github.com/apache/spark/commit/b9b3160061ef1e17ae32599ed9fbcfd44b0565b4).
[GitHub] spark issue #21603: [SPARK-17091][SQL] Add rule to convert IN predicate to e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21603 Merged build finished. Test PASSed.
[GitHub] spark issue #21603: [SPARK-17091][SQL] Add rule to convert IN predicate to e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21603 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/399/ Test PASSed.
[GitHub] spark pull request #21603: [SPARK-17091][SQL] Add rule to convert IN predica...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/21603#discussion_r197338867

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---

@@ -270,6 +270,11 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean) {
       case sources.Not(pred) =>
         createFilter(schema, pred).map(FilterApi.not)
+      case sources.In(name, values) if canMakeFilterOn(name) && values.length < 20 =>

--- End diff --

It seems that the push-down performance is better when the threshold is less than `300`: https://user-images.githubusercontent.com/5399861/41757743-7e411532-7616-11e8-8844-45132c50c535.png

The code:
```scala
withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true") {
  import testImplicits._
  withTempPath { path =>
    val total = 1000
    (0 to total).toDF().coalesce(1)
      .write.option("parquet.block.size", 512)
      .parquet(path.getAbsolutePath)
    val df = spark.read.parquet(path.getAbsolutePath)
    // scalastyle:off println
    var lastSize = -1
    var i = 16000
    while (i < total) {
      val filter = Range(0, total).filter(_ % i == 0)
      i += 100
      if (lastSize != filter.size) {
        if (lastSize == -1) println(s"start size: ${filter.size}")
        lastSize = filter.size
        sql("set spark.sql.parquet.pushdown.inFilterThreshold=100")
        val begin1 = System.currentTimeMillis()
        df.where(s"id in(${filter.mkString(",")})").count()
        val end1 = System.currentTimeMillis()
        val time1 = end1 - begin1
        sql("set spark.sql.parquet.pushdown.inFilterThreshold=10")
        val begin2 = System.currentTimeMillis()
        df.where(s"id in(${filter.mkString(",")})").count()
        val end2 = System.currentTimeMillis()
        val time2 = end2 - begin2
        if (time1 <= time2) println(s"Max threshold: $lastSize")
      }
    }
  }
}
```
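The rewrite being tuned in this thread can be modeled without Spark as follows. This is a hedged sketch: the `Filter`/`EqualTo`/`Or` shapes merely mimic `org.apache.spark.sql.sources`, and the threshold parameter corresponds to the `20` in the diff (which the benchmark above argues could be nearer 300).

```java
import java.util.List;

// Spark-free sketch: an IN predicate becomes a chain of OR'ed equality
// filters, but only while the value count stays below the threshold.
public class InPushDown {
    interface Filter {}
    record EqualTo(String attribute, Object value) implements Filter {}
    record Or(Filter left, Filter right) implements Filter {}

    /** Returns the OR chain, or null when push-down should be skipped. */
    public static Filter pushDownIn(String name, List<?> values, int threshold) {
        if (values.isEmpty() || values.size() >= threshold) {
            return null; // too many values: evaluate IN after the scan instead
        }
        Filter result = new EqualTo(name, values.get(0));
        for (int i = 1; i < values.size(); i++) {
            result = new Or(result, new EqualTo(name, values.get(i)));
        }
        return result;
    }
}
```

The design trade-off the benchmark is probing: each extra `Or`/`EqualTo` pair adds per-row-group evaluation cost in Parquet, so beyond some value count it is cheaper to scan and filter afterwards.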
[GitHub] spark issue #21610: Updates to LICENSE and NOTICE
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21610 Can one of the admins verify this patch?
[GitHub] spark pull request #21610: Updates to LICENSE and NOTICE
GitHub user justinmclean opened a pull request:

    https://github.com/apache/spark/pull/21610

    Updates to LICENSE and NOTICE

## What changes were proposed in this pull request?

LICENSE and NOTICE changes as per ASF policy

## How was this patch tested?

N/A

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/justinmclean/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21610.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21610

commit b9d12d700b9cb83402e42f264f21bca090e0d1e3
Author: Justin Mclean
Date: 2018-06-22T04:20:59Z

    Updates to LICENSE and NOTICE
[GitHub] spark issue #21609: [SPARK-22897][CORE] Expose stageAttemptId in TaskContext
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21609 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92196/ Test PASSed.
[GitHub] spark issue #21609: [SPARK-22897][CORE] Expose stageAttemptId in TaskContext
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21609 Merged build finished. Test PASSed.
[GitHub] spark issue #21609: [SPARK-22897][CORE] Expose stageAttemptId in TaskContext
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21609 **[Test build #92196 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92196/testReport)** for PR 21609 at commit [`3040763`](https://github.com/apache/spark/commit/3040763e51c8d32309f2dc38ce8b9fcc740ceb3d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #21603: [SPARK-17091][SQL] Add rule to convert IN predica...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21603#discussion_r197336527

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---

@@ -270,6 +270,11 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean) {
       case sources.Not(pred) =>
         createFilter(schema, pred).map(FilterApi.not)
+      case sources.In(name, values) if canMakeFilterOn(name) && values.length < 20 =>

--- End diff --

+1
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21607 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92195/ Test FAILed.
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21607 **[Test build #92195 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92195/testReport)** for PR 21607 at commit [`9d7e6ea`](https://github.com/apache/spark/commit/9d7e6eafff3daa519f7fda0b1f219f74d499874d).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21607 Merged build finished. Test FAILed.
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21607 Merged build finished. Test PASSed.
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21607 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92193/ Test PASSed.
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21607 **[Test build #92193 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92193/testReport)** for PR 21607 at commit [`0520d60`](https://github.com/apache/spark/commit/0520d60b44987369fa62d7237427cb0cf022ed41).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92190/ Test PASSed.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606 Merged build finished. Test PASSed.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21606 **[Test build #92190 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92190/testReport)** for PR 21606 at commit [`227d513`](https://github.com/apache/spark/commit/227d513ade176fd56f7e6d75a16deb6c654982db).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92189/ Test PASSed.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606 Merged build finished. Test PASSed.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21606 **[Test build #92189 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92189/testReport)** for PR 21606 at commit [`5efaae7`](https://github.com/apache/spark/commit/5efaae74bf340fed4223b5209bed63475cc35516).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21320 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92191/ Test PASSed.
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21320 Merged build finished. Test PASSed.
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21320 **[Test build #92191 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92191/testReport)** for PR 21320 at commit [`a255bcb`](https://github.com/apache/spark/commit/a255bcb4c480d3c97f7ff0590bca0c20de034a31).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21609: [SPARK-22897][CORE] Expose stageAttemptId in TaskContext
Github user zzcclp commented on the issue: https://github.com/apache/spark/pull/21609 Can this PR be merged ASAP? Currently there is an error on branch-2.2.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/398/ Test PASSed.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Merged build finished. Test PASSed.
[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21588 Yup, will fix the hive fork thing and be back.
[GitHub] spark pull request #21570: [SPARK-24564][TEST] Add test suite for RecordBina...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21570#discussion_r197328626

--- Diff: sql/core/src/test/java/test/org/apache/spark/sql/execution/sort/RecordBinaryComparatorSuite.java ---

@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package test.org.apache.spark.sql.execution.sort;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.memory.TaskMemoryManager;

--- End diff --

cc @jiangxb1987
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21061 **[Test build #92197 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92197/testReport)** for PR 21061 at commit [`195f3bd`](https://github.com/apache/spark/commit/195f3bd6b47da19b27cd0c8140bcd9aa6a063843).
[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21588 @HyukjinKwon, I'm in favor of @vanzin's comment: we should fix things first and then come back to this one.
[GitHub] spark pull request #21548: [SPARK-24518][CORE] Using Hadoop credential provi...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21548#discussion_r197327620

--- Diff: core/src/main/scala/org/apache/spark/SSLOptions.scala ---

@@ -179,9 +185,11 @@ private[spark] object SSLOptions extends Logging {
       .orElse(defaults.flatMap(_.keyStore))

     val keyStorePassword = conf.getWithSubstitution(s"$ns.keyStorePassword")
+      .orElse(Option(hadoopConf.getPassword(s"$ns.keyStorePassword")).map(new String(_)))

--- End diff --

Hi @vanzin, I checked the jdk8 doc again; I can't find a String constructor that takes both a char array and a charset as parameters.
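For illustration, a minimal sketch of the point under discussion: `java.lang.String` has a `new String(char[])` constructor but no `(char[], Charset)` overload, so the `char[]` that Hadoop's `Configuration.getPassword(...)` returns is wrapped directly, as the diff does. The helper name below is hypothetical.

```java
// Hypothetical helper mirroring the null-safe wrap in the diff:
// Option(hadoopConf.getPassword(...)).map(new String(_))
public class PasswordUtil {
    public static String passwordToString(char[] chars) {
        // new String(char[]) copies the characters; no charset is involved
        // because the input is already decoded characters, not bytes.
        return (chars == null) ? null : new String(chars);
    }
}
```

A charset would only matter for a `byte[]` source (`new String(byte[], Charset)` does exist); `char[]` is already character data.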
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92187/ Test PASSed.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606 Merged build finished. Test PASSed.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21606 **[Test build #92187 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92187/testReport)** for PR 21606 at commit [`c884f4f`](https://github.com/apache/spark/commit/c884f4f27199b3c91f56ba0042b42d09bc243883).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #21598: [SPARK-24605][SQL] size(null) returns null instea...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21598#discussion_r197326162

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---

@@ -1314,6 +1314,13 @@ object SQLConf {
         "Other column values can be ignored during parsing even if they are malformed.")
       .booleanConf
       .createWithDefault(true)
+
+  val LEGACY_SIZE_OF_NULL = buildConf("spark.sql.legacy.sizeOfNull")

--- End diff --

That's basically the same except that the postfix includes a specific version, which was just a rough idea.
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user zzcclp commented on the issue: https://github.com/apache/spark/pull/21577 @vanzin @tgravescs, after merging this PR into branch-2.2, there is an error "stageAttemptNumber is not a member of org.apache.spark.TaskContext" in SparkHadoopMapRedUtil; I think PR-20082 needs to be merged first.
[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21598 My assumption was that the PR and JIRA claim that it's the right behaviour, as I said multiple times. If there's no such thing, there should be of course no need to argue about the default value, as I said above.
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21542 Even after we stopped forking SpotBugs, the same error occurred. @HyukjinKwon, do you have any ideas? I would appreciate your thoughts.
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21607 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92188/ Test PASSed.
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21607 Merged build finished. Test PASSed.
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21607 **[Test build #92188 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92188/testReport)** for PR 21607 at commit [`d1f3219`](https://github.com/apache/spark/commit/d1f3219a58f4dc4f1e65a793c6d01572b25a609e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21588 Will try to fix it then. We can just enable it back later: if we want to support those Hive versions in Hadoop 3, we could simply enable them back with some fixes at that time. Adding that support sounds like an incremental improvement.
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r197319579 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -2355,3 +2355,347 @@ case class ArrayRemove(left: Expression, right: Expression) override def prettyName: String = "array_remove" } + +object ArraySetLike { + def useGenericArrayData(elementSize: Int, length: Int): Boolean = { +// Use the same calculation in UnsafeArrayData.fromPrimitiveArray() +val headerInBytes = UnsafeArrayData.calculateHeaderPortionInBytes(length) +val valueRegionInBytes = elementSize.toLong * length +val totalSizeInLongs = (headerInBytes + valueRegionInBytes + 7) / 8 +totalSizeInLongs > Integer.MAX_VALUE / 8 + } + + def throwUnionLengthOverflowException(length: Int): Unit = { +throw new RuntimeException(s"Unsuccessful try to union arrays with $length " + + s"elements due to exceeding the array size limit " + + s"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}.") + } +} + + +abstract class ArraySetLike extends BinaryArrayExpressionWithImplicitCast { + override def dataType: DataType = left.dataType + + override def checkInputDataTypes(): TypeCheckResult = { +val typeCheckResult = super.checkInputDataTypes() +if (typeCheckResult.isSuccess) { + TypeUtils.checkForOrderingExpr(dataType.asInstanceOf[ArrayType].elementType, +s"function $prettyName") +} else { + typeCheckResult +} + } + + @transient protected lazy val ordering: Ordering[Any] = +TypeUtils.getInterpretedOrdering(elementType) + + @transient protected lazy val elementTypeSupportEquals = elementType match { +case BinaryType => false +case _: AtomicType => true +case _ => false + } +} + +/** + * Returns an array of the elements in the union of x and y, without duplicates + */ +@ExpressionDescription( + usage = """ +_FUNC_(array1, array2) - Returns an array of the elements in the union of array1 and array2, + without duplicates. 
+ """, + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5)); + array(1, 2, 3, 5) + """, + since = "2.4.0") +case class ArrayUnion(left: Expression, right: Expression) extends ArraySetLike { + var hsInt: OpenHashSet[Int] = _ + var hsLong: OpenHashSet[Long] = _ + + def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = { +val elem = array.getInt(idx) +if (!hsInt.contains(elem)) { + resultArray.setInt(pos, elem) + hsInt.add(elem) + true +} else { + false +} + } + + def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = { +val elem = array.getLong(idx) +if (!hsLong.contains(elem)) { + resultArray.setLong(pos, elem) + hsLong.add(elem) + true +} else { + false +} + } + + def evalPrimitiveType( + array1: ArrayData, + array2: ArrayData, + size: Int, + resultArray: ArrayData, + isLongType: Boolean): ArrayData = { +// store elements into resultArray +var foundNullElement = false +var pos = 0 +Seq(array1, array2).foreach(array => { + var i = 0 + while (i < array.numElements()) { +if (array.isNullAt(i)) { + if (!foundNullElement) { +resultArray.setNullAt(pos) +pos += 1 +foundNullElement = true + } +} else { + val assigned = if (!isLongType) { +assignInt(array, i, resultArray, pos) + } else { +assignLong(array, i, resultArray, pos) + } + if (assigned) { +pos += 1 + } +} +i += 1 + } +}) +resultArray + } + + override def nullSafeEval(input1: Any, input2: Any): Any = { +val array1 = input1.asInstanceOf[ArrayData] +val array2 = input2.asInstanceOf[ArrayData] + +if (elementTypeSupportEquals) { + elementType match { +case IntegerType => + // avoid boxing of primitive int array elements + // calculate result array size + val hsSize = new OpenHashSet[Int] + Seq(array1, array2).foreach(array => { +var i = 0 +while (i < array.numElements()) { + if (hsSize.size > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) { +
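The `useGenericArrayData` helper at the top of the quoted diff decides when the union result is too large for `UnsafeArrayData`'s primitive layout. Below is a minimal, self-contained sketch of that arithmetic; the header formula is an assumption mirroring `UnsafeArrayData.calculateHeaderPortionInBytes` (8 bytes for the element count plus a null bitmap rounded up to 8-byte words), not a verbatim copy of the Spark source:

```scala
// Sketch of the overflow gate in the quoted diff (not the exact Spark source).
// Header layout assumption: 8 bytes for numElements, then 1 null bit per
// element, rounded up to whole 8-byte words.
def calculateHeaderPortionInBytes(numElements: Int): Long =
  8L + ((numElements.toLong + 63) / 64) * 8L

// True when the backing array for `length` primitive elements of
// `elementSize` bytes would exceed the rounded array-size limit, in which
// case the code falls back to GenericArrayData instead of UnsafeArrayData.
def useGenericArrayData(elementSize: Int, length: Int): Boolean = {
  val headerInBytes = calculateHeaderPortionInBytes(length)
  val valueRegionInBytes = elementSize.toLong * length
  val totalSizeInLongs = (headerInBytes + valueRegionInBytes + 7) / 8
  totalSizeInLongs > Integer.MAX_VALUE / 8
}
```

Small arrays stay on the fast unsafe path; only arrays whose total size in 8-byte words would exceed `Integer.MAX_VALUE / 8` take the generic fallback.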
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21607 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21609: [SPARK-22897][CORE] Expose stageAttemptId in TaskContext
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21609 +1 pending tests. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21607 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/397/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21609: [SPARK-22897][CORE] Expose stageAttemptId in TaskContext
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21609 **[Test build #92196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92196/testReport)** for PR 21609 at commit [`3040763`](https://github.com/apache/spark/commit/3040763e51c8d32309f2dc38ce8b9fcc740ceb3d). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21609: [SPARK-22897][CORE] Expose stageAttemptId in TaskContext
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21609 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/396/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21607 **[Test build #92195 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92195/testReport)** for PR 21607 at commit [`9d7e6ea`](https://github.com/apache/spark/commit/9d7e6eafff3daa519f7fda0b1f219f74d499874d). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21609: [SPARK-22897][CORE] Expose stageAttemptId in TaskContext
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21609 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21588 > The tests were passed in this PR builder Against your private build of the Hive stuff. Again, fix that and this will become a lot easier to discuss. I'm also against disabling these tests without a proper discussion of what that means, as I've said multiple times. If we want to support those Hive versions in Hadoop 3, then this is the wrong change.
[GitHub] spark issue #21609: [SPARK-22897][CORE] Expose stageAttemptId in TaskContext
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21609 Backport to branch-2.2; the only changes were to MimaExcludes and a test file that had one more call to TaskContext. @vanzin
[GitHub] spark pull request #21609: [SPARK-22897][CORE] Expose stageAttemptId in Task...
GitHub user tgravescs opened a pull request: https://github.com/apache/spark/pull/21609 [SPARK-22897][CORE] Expose stageAttemptId in TaskContext stageAttemptId added in TaskContext and corresponding construction modification Added a new test in TaskContextSuite, two cases are tested: 1. Normal case without failure 2. Exception case with resubmitted stages Link to [SPARK-22897](https://issues.apache.org/jira/browse/SPARK-22897) Author: Xianjin YE Closes #20082 from advancedxy/SPARK-22897. Conflicts: project/MimaExcludes.scala You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgravescs/spark SPARK-22897 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21609.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21609 commit 4bc8d2805949b6b9d4d06ff4ad0493d9b33c7063 Author: Xianjin YE Date: 2018-01-02T15:30:38Z [SPARK-22897][CORE] Expose stageAttemptId in TaskContext stageAttemptId added in TaskContext and corresponding construction modification Added a new test in TaskContextSuite, two cases are tested: 1. Normal case without failure 2. Exception case with resubmitted stages Link to [SPARK-22897](https://issues.apache.org/jira/browse/SPARK-22897) Author: Xianjin YE Closes #20082 from advancedxy/SPARK-22897. Conflicts: project/MimaExcludes.scala
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21594 **[Test build #92194 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92194/testReport)** for PR 21594 at commit [`2f00f2f`](https://github.com/apache/spark/commit/2f00f2fe0e1cf9a0d44285aab306ed55bd176d9c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21594 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21594 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/395/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/394/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21607 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/393/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21607 **[Test build #92193 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92193/testReport)** for PR 21607 at commit [`0520d60`](https://github.com/apache/spark/commit/0520d60b44987369fa62d7237427cb0cf022ed41). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21607 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21588 The tests passed in this PR builder. The only hack I used is that I landed a one-liner fix to an artifact so it could be used in this PR; that fix is already in Hive, and is proposed in Hive's fork, where it is blocked for non-technical reasons. I am working on getting this through. Okay, if you think it should be blocked, let me get that through first. I am not dropping it. Isn't this what we already cover? I believe this is the most minimal and conservative fix to make Hadoop 3 work within Spark, since we already added it. FWIW, we haven't documented the Hadoop 3 profile yet, so my impression is that it's still in progress.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21606 **[Test build #92192 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92192/testReport)** for PR 21606 at commit [`a16d9f9`](https://github.com/apache/spark/commit/a16d9f907b3ce0078da72b7e7bcc56e187cbc8f9). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21606: [SPARK-24552][core][SQL] Use task ID instead of a...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21606#discussion_r197316565 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -76,13 +76,29 @@ object SparkHadoopWriter extends Logging { // Try to write all RDD partitions as a Hadoop OutputFormat. try { val ret = sparkContext.runJob(rdd, (context: TaskContext, iter: Iterator[(K, V)]) => { +// Generate a positive integer task ID that is unique for the current stage. This makes a +// few assumptions: +// - the task ID is always positive +// - stages cannot have more than Int.MaxValue tasks +// - the sum of task counts of all active stages doesn't exceed Int.MaxValue +// +// The first two are currently the case in Spark, while the last one is very unlikely to +// occur. If it does, two task IDs on a single stage could have a clashing integer value, +// which could lead to code that generates clashing file names for different tasks. Still, +// if the commit coordinator is enabled, only one task would be allowed to commit. --- End diff -- Ok, I'll use that. I think Spark might fail everything before you even go that high in attempt numbers anyway...
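For illustration, the ID scheme described in the quoted comment reduces to folding Spark's application-unique `Long` task attempt ID into a non-negative `Int`. The helper name below is hypothetical; this is a sketch of the idea under the comment's stated assumptions, not the PR's exact code:

```scala
// Hypothetical sketch: fold a Long task attempt ID into an Int for APIs
// (e.g. Hadoop committers) that only accept an int. A collision requires two
// live task IDs exactly Int.MaxValue apart -- the "very unlikely" case the
// quoted comment discusses.
def intTaskId(taskAttemptId: Long): Int = {
  require(taskAttemptId >= 0, "Spark task IDs are always non-negative")
  (taskAttemptId % Int.MaxValue).toInt
}
```

Even in the collision case, the comment notes the output commit coordinator still allows only one task to commit, so the clash costs correctness only when the coordinator is disabled.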
[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21588 I already explained my view of why I don't think this should get in, in its current form. Passing tests in someone's private environment, for me, is not a worthy goal. You say the fix is needed, but I'm not even sure this is the right fix. You're dropping support for a bunch of Hive versions, effectively. Is that what we want? If it is, you need to properly document that, and fix places where you need a proper error message so users are not confused. If it's not, you need to find a solution to that problem. And for that it would be easier if you could actually test your change here. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/21607#discussion_r197315441 --- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala --- @@ -97,48 +102,48 @@ private[spark] class OutputCommitCoordinator(conf: SparkConf, isDriver: Boolean) } /** - * Called by the DAGScheduler when a stage starts. + * Called by the DAGScheduler when a stage starts. Initializes the stage's state if it hasn't + * yet been initialized. * * @param stage the stage id. * @param maxPartitionId the maximum partition id that could appear in this stage's tasks (i.e. * the maximum possible value of `context.partitionId`). */ - private[scheduler] def stageStart( - stage: StageId, - maxPartitionId: Int): Unit = { -val arr = new Array[TaskAttemptNumber](maxPartitionId + 1) -java.util.Arrays.fill(arr, NO_AUTHORIZED_COMMITTER) + private[scheduler] def stageStart(stage: Int, maxPartitionId: Int): Unit = synchronized { +val arr = Array.fill[TaskIdentifier](maxPartitionId + 1)(null) synchronized { --- End diff -- we have 2 nested synchronized --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
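The nit stands because JVM monitors are reentrant: the inner `synchronized` re-enters a lock the method already holds, so it adds no protection and no deadlock, just noise. A toy illustration (hypothetical class, not the Spark code under review):

```scala
// Reentrant locking: the inner synchronized block re-enters the monitor the
// method already holds, so it neither deadlocks nor adds any protection --
// hence the review nit to drop one of the two.
class Counter {
  private var n = 0
  def inc(): Int = synchronized {
    synchronized { // redundant: same monitor, already held by this thread
      n += 1
      n
    }
  }
}
```

The fix suggested by the review is simply to keep one `synchronized` (on the method) and delete the other.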
[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21588 I at least checked manually that this passes with that fix to the fork. It fixes everything else that can be fixed on the Spark side. To be honest, I still wonder why this should be blocked. It can't be run via Jenkins, and I accept that the change could be blocked for that reason, but this fix is needed anyway and can be unblocked later. If something is needed, I can just review and merge it.
[GitHub] spark issue #21247: [SPARK-24190][SQL] Allow saving of JSON files in UTF-16 ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21247 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92183/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21247: [SPARK-24190][SQL] Allow saving of JSON files in UTF-16 ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21247 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21247: [SPARK-24190][SQL] Allow saving of JSON files in UTF-16 ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21247 **[Test build #92183 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92183/testReport)** for PR 21247 at commit [`ca1b243`](https://github.com/apache/spark/commit/ca1b24322edd119d1e15b39f79bb15dd22cae482). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class ForeachBatchFunction(object):` * `case class ArrayDistinct(child: Expression)` * `class PythonForeachWriter(func: PythonFunction, schema: StructType)` * ` class UnsafeRowBuffer(taskMemoryManager: TaskMemoryManager, tempDir: File, numFields: Int)` * `trait MemorySinkBase extends BaseStreamingSink with Logging ` * `class MemorySink(val schema: StructType, outputMode: OutputMode, options: DataSourceOptions)` * `class ForeachBatchSink[T](batchWriter: (Dataset[T], Long) => Unit, encoder: ExpressionEncoder[T])` * `trait PythonForeachBatchFunction ` * `case class ForeachWriterProvider[T](` * `case class ForeachWriterFactory[T](` * `class ForeachDataWriter[T](` * `class MemoryWriter(` * `class MemoryStreamWriter(` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidati...
Github user maryannxue commented on a diff in the pull request: https://github.com/apache/spark/pull/21594#discussion_r197314689 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala --- @@ -801,4 +800,67 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext } assert(cachedData.collect === Seq(1001)) } + + test("SPARK-24596 Non-cascading Cache Invalidation - uncache temporary view") { +withView("t1", "t2") { --- End diff -- Yes.. good catch! A mistake caused by copy-paste. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidati...
Github user maryannxue commented on a diff in the pull request: https://github.com/apache/spark/pull/21594#discussion_r197314556 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetCacheSuite.scala --- @@ -143,9 +153,57 @@ class DatasetCacheSuite extends QueryTest with SharedSQLContext with TimeLimits df.count() df2.cache() -val plan = df2.queryExecution.withCachedData -assert(plan.isInstanceOf[InMemoryRelation]) -val internalPlan = plan.asInstanceOf[InMemoryRelation].cacheBuilder.cachedPlan - assert(internalPlan.find(_.isInstanceOf[InMemoryTableScanExec]).isDefined) +assertCacheDependency(df2) + } + + test("SPARK-24596 Non-cascading Cache Invalidation") { +val df = Seq(("a", 1), ("b", 2)).toDF("s", "i") +val df2 = df.filter('i > 1) +val df3 = df.filter('i < 2) + +df2.cache() +df.cache() +df.count() +df3.cache() + +df.unpersist() + +// df un-cached; df2 and df3's cache plan re-compiled +assert(df.storageLevel == StorageLevel.NONE) +assertCacheDependency(df2, 0) +assertCacheDependency(df3, 0) + } + + test("SPARK-24596 Non-cascading Cache Invalidation - verify cached data reuse") { +val expensiveUDF = udf({ x: Int => Thread.sleep(5000); x }) +val df = spark.range(0, 10).toDF("a") +val df1 = df.withColumn("b", expensiveUDF($"a")) +val df2 = df1.groupBy('a).agg(sum('b)) +val df3 = df.agg(sum('a)) + +df1.cache() +df2.cache() +df2.collect() +df3.cache() + +assertCacheDependency(df2) + +df1.unpersist(blocking = true) + +// df1 un-cached; df2's cache plan re-compiled +assert(df1.storageLevel == StorageLevel.NONE) +assertCacheDependency(df1.groupBy('a).agg(sum('b)), 0) + +val df4 = df1.groupBy('a).agg(sum('b)).select("sum(b)") +assertCached(df4) +// reuse loaded cache +failAfter(3 seconds) { + df4.collect() +} + +val df5 = df.agg(sum('a)).filter($"sum(a)" > 1) +assertCached(df5) +// first time use, load cache +df5.collect() --- End diff -- We just need to prove the new InMemoryRelation works alright for building cache (since the plan has been re-compiled) ... 
maybe we should check the result, though. Plus, I deliberately made this dataframe not dependent on the UDF so it can finish quickly.
[GitHub] spark issue #21192: [SPARK-24118][SQL] Flexible format for the lineSep optio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21192 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92184/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21192: [SPARK-24118][SQL] Flexible format for the lineSep optio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21192 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21192: [SPARK-24118][SQL] Flexible format for the lineSep optio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21192 **[Test build #92184 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92184/testReport)** for PR 21192 at commit [`eab96b4`](https://github.com/apache/spark/commit/eab96b4ed078263d8eb1df6b1204c007f6b4be4a). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class ForeachBatchFunction(object):` * `case class ArrayDistinct(child: Expression)` * `class PythonForeachWriter(func: PythonFunction, schema: StructType)` * ` class UnsafeRowBuffer(taskMemoryManager: TaskMemoryManager, tempDir: File, numFields: Int)` * `trait MemorySinkBase extends BaseStreamingSink with Logging ` * `class MemorySink(val schema: StructType, outputMode: OutputMode, options: DataSourceOptions)` * `class ForeachBatchSink[T](batchWriter: (Dataset[T], Long) => Unit, encoder: ExpressionEncoder[T])` * `trait PythonForeachBatchFunction ` * `case class ForeachWriterProvider[T](` * `case class ForeachWriterFactory[T](` * `class ForeachDataWriter[T](` * `class MemoryWriter(` * `class MemoryStreamWriter(` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21594 LGTM except some comments about test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21594#discussion_r197311829 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala --- @@ -801,4 +800,67 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext } assert(cachedData.collect === Seq(1001)) } + + test("SPARK-24596 Non-cascading Cache Invalidation - uncache temporary view") { +withView("t1", "t2") { --- End diff -- `withTempView` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21594#discussion_r197312423 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetCacheSuite.scala --- @@ -143,9 +153,57 @@ class DatasetCacheSuite extends QueryTest with SharedSQLContext with TimeLimits df.count() df2.cache() -val plan = df2.queryExecution.withCachedData -assert(plan.isInstanceOf[InMemoryRelation]) -val internalPlan = plan.asInstanceOf[InMemoryRelation].cacheBuilder.cachedPlan - assert(internalPlan.find(_.isInstanceOf[InMemoryTableScanExec]).isDefined) +assertCacheDependency(df2) + } + + test("SPARK-24596 Non-cascading Cache Invalidation") { +val df = Seq(("a", 1), ("b", 2)).toDF("s", "i") +val df2 = df.filter('i > 1) +val df3 = df.filter('i < 2) + +df2.cache() +df.cache() +df.count() +df3.cache() + +df.unpersist() + +// df un-cached; df2 and df3's cache plan re-compiled +assert(df.storageLevel == StorageLevel.NONE) +assertCacheDependency(df2, 0) +assertCacheDependency(df3, 0) + } + + test("SPARK-24596 Non-cascading Cache Invalidation - verify cached data reuse") { +val expensiveUDF = udf({ x: Int => Thread.sleep(5000); x }) +val df = spark.range(0, 10).toDF("a") +val df1 = df.withColumn("b", expensiveUDF($"a")) +val df2 = df1.groupBy('a).agg(sum('b)) +val df3 = df.agg(sum('a)) + +df1.cache() +df2.cache() +df2.collect() +df3.cache() + +assertCacheDependency(df2) + +df1.unpersist(blocking = true) + +// df1 un-cached; df2's cache plan re-compiled +assert(df1.storageLevel == StorageLevel.NONE) +assertCacheDependency(df1.groupBy('a).agg(sum('b)), 0) + +val df4 = df1.groupBy('a).agg(sum('b)).select("sum(b)") +assertCached(df4) +// reuse loaded cache +failAfter(3 seconds) { + df4.collect() +} + +val df5 = df.agg(sum('a)).filter($"sum(a)" > 1) +assertCached(df5) +// first time use, load cache +df5.collect() --- End diff -- how do we prove this takes more than 5 seconds? 
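On the timing question above: one way to demonstrate it is to measure the first (cache-building) evaluation against a later cache hit. A self-contained sketch, with a 50 ms sleep standing in for the test's 5-second UDF (illustrative only, not the Spark test itself):

```scala
// Time a block in milliseconds.
def timeMillis[T](body: => T): (T, Long) = {
  val start = System.nanoTime()
  val result = body
  (result, (System.nanoTime() - start) / 1000000L)
}

// Stand-in for an expensive UDF behind a cache: the first call pays the
// sleep, later calls return the memoized value immediately.
var memo: Option[Int] = None
def expensive(): Int = memo.getOrElse {
  Thread.sleep(50)
  val v = 42
  memo = Some(v)
  v
}
```

Asserting that the first call takes at least the sleep time while a repeat call stays well under it is the same proof the `failAfter(3 seconds)` bound in the test under review relies on.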
[GitHub] spark pull request #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21594#discussion_r197311907 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala --- @@ -801,4 +800,67 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext } assert(cachedData.collect === Seq(1001)) } + + test("SPARK-24596 Non-cascading Cache Invalidation - uncache temporary view") { +withView("t1", "t2") { + sql("CACHE TABLE t1 AS SELECT * FROM testData WHERE key > 1") + sql("CACHE TABLE t2 as SELECT * FROM t1 WHERE value > 1") + + assert(spark.catalog.isCached("t1")) + assert(spark.catalog.isCached("t2")) + sql("UNCACHE TABLE t1") + assert(!spark.catalog.isCached("t1")) + assert(spark.catalog.isCached("t2")) +} + } + + test("SPARK-24596 Non-cascading Cache Invalidation - drop temporary view") { +withView("t1", "t2") { --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92181/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21606 **[Test build #92181 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92181/testReport)** for PR 21606 at commit [`7233a5f`](https://github.com/apache/spark/commit/7233a5fd7b154e2a1400c5fac11d0356a22f5f98). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark
Github user tdyas commented on the issue: https://github.com/apache/spark/pull/11105 I was curious whether there are any active plans to complete this PR.