[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...

2016-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16013
  
**[Test build #69219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69219/consoleFull)** for PR 16013 at commit [`29d65cc`](https://github.com/apache/spark/commit/29d65cce3e5f2e29010609c9323cd79ca889b9f8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16013
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16013
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69219/
Test FAILed.





[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15994
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69229/
Test FAILed.





[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16013
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69228/
Test FAILed.





[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16013
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16029: [MINOR][ML] Remove duplicate import in GLR

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16029
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16013
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15983: [SPARK-18544] [SQL] Append with df.saveAsTable writes da...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15983
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15976
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15976
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69224/
Test FAILed.





[GitHub] spark issue #16029: [MINOR][ML] Remove duplicate import in GLR

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16029
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69227/
Test FAILed.





[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16013
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15994
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16013
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69222/
Test FAILed.





[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16013
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69225/
Test FAILed.





[GitHub] spark issue #15983: [SPARK-18544] [SQL] Append with df.saveAsTable writes da...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15983
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69223/
Test FAILed.





[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15780
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69221/
Test FAILed.





[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15780
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16029: [MINOR][ML] Remove duplicate import in GLR

2016-11-28 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16029
  
This is too trivial to bother with.





[GitHub] spark pull request #16030: [SPARK-18108][SQL] Fix a bug to fail partition sc...

2016-11-28 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/spark/pull/16030

[SPARK-18108][SQL] Fix a bug to fail partition schema inference

## What changes were proposed in this pull request?
This PR fixes a bug that makes partition schema inference fail:
```
scala> case class A(a: Long, b: Int)
scala> val as = Seq(A(1, 2))
scala> spark.createDataFrame(as).write.parquet("/data/a=1/")
scala> val df = spark.read.parquet("/data/")
scala> df.printSchema
root
 |-- a: long (nullable = true)
 |-- b: integer (nullable = true)
scala> df.collect
java.lang.NullPointerException
at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:283)
at org.apache.spark.sql.execution.vectorized.ColumnarBatch$Row.getLong(ColumnarBatch.java:191)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
```

This happened because Spark failed to infer the partition column as `LongType` and wrongly regarded it as `IntegerType` in `DataSource`. As a result, the query failed while scanning the column from a Parquet file.

## How was this patch tested?
Add tests in `ParquetPartitionDiscoverySuite`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/spark SPARK-18108

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16030.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16030


commit 6bd8b4cdb63b20bc292a5ec1d8ca38281ee5bfbf
Author: Takeshi YAMAMURO 
Date:   2016-11-28T07:45:30Z

Fix a bug to fail partition schema inference







[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-11-28 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16030
  
This query passed in the released Spark 2.0.2, so this regression seems to have been introduced by SPARK-18510.





[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...

2016-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15994
  
**[Test build #69231 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69231/consoleFull)** for PR 15994 at commit [`662acfb`](https://github.com/apache/spark/commit/662acfb9ab046842f0fbe2f9344dd3c0df12ad7a).





[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16030
  
**[Test build #69230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69230/consoleFull)** for PR 16030 at commit [`6bd8b4c`](https://github.com/apache/spark/commit/6bd8b4cdb63b20bc292a5ec1d8ca38281ee5bfbf).





[GitHub] spark issue #16000: [SPARK-18537][Web UI]Add a REST api to spark streaming

2016-11-28 Thread ChorPangChan
Github user ChorPangChan commented on the issue:

https://github.com/apache/spark/pull/16000
  
If there are no other comments, I believe this PR is ready to go.

@ajbozarth
Please forgive me if it's not appropriate to ask, but would you please take a look at the code?





[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...

2016-11-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15994#discussion_r89736733
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -153,19 +168,20 @@ final class DataFrameNaFunctions private[sql](df: 
DataFrame) {
* (Scala-specific) Returns a new [[DataFrame]] that replaces null or 
NaN values in specified
* numeric columns. If a specified column is not a numeric column, it is 
ignored.
*
+   * @since 2.1.0
+   */
+  def fill(value: Long, cols: Seq[String]): DataFrame = {
+fill1(value, cols)
--- End diff --

nit: put it in one line? i.e. `def fill... = fill1...`





[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...

2016-11-28 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15976
  
Retest this please





[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.

2016-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14136
  
**[Test build #69233 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69233/consoleFull)** for PR 14136 at commit [`3c699ad`](https://github.com/apache/spark/commit/3c699adfee609781c1e4ce2c08493308f5e7f511).





[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...

2016-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15780
  
**[Test build #69232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69232/consoleFull)** for PR 15780 at commit [`2a1287a`](https://github.com/apache/spark/commit/2a1287a84cb303a8df9f8c310aad154e04b6b4d4).





[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...

2016-11-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15994#discussion_r89736915
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -437,4 +444,38 @@ final class DataFrameNaFunctions private[sql](df: 
DataFrame) {
 case v => throw new IllegalArgumentException(
   s"Unsupported value type ${v.getClass.getName} ($v).")
   }
+
+  /**
+   * Returns a new [[DataFrame]] that replaces null or NaN values in 
specified
+   * numeric, string columns. If a specified column is not a numeric, 
string column,
+   * it is ignored.
+   */
+  private def fill1[T](value: T, cols: Seq[String]): DataFrame = {
+// the fill[T] which T is  Long/Integer/Float/Double,
--- End diff --

Why can `T` be `Integer` and `Float`?





[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...

2016-11-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15994#discussion_r89736984
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -437,4 +444,38 @@ final class DataFrameNaFunctions private[sql](df: 
DataFrame) {
 case v => throw new IllegalArgumentException(
   s"Unsupported value type ${v.getClass.getName} ($v).")
   }
+
+  /**
+   * Returns a new [[DataFrame]] that replaces null or NaN values in 
specified
+   * numeric, string columns. If a specified column is not a numeric, 
string column,
+   * it is ignored.
+   */
+  private def fill1[T](value: T, cols: Seq[String]): DataFrame = {
+// the fill[T] which T is  Long/Integer/Float/Double,
+// should apply on all the NumericType Column, for example:
+// val input = Seq[(java.lang.Integer, java.lang.Double)]((null, 
164.3)).toDF("a","b")
+// input.na.fill(3.1)
+// the result is (3,164.3), not (null, 164.3)
--- End diff --

`(3, 164.3)`? Shouldn't it be 3.1?
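
For context, a minimal sketch of the behavior the code comment above describes, assuming a spark-shell session with `spark.implicits._` in scope:

```
// na.fill(3.1) applies to every numeric column; the value is cast to each
// column's type, so the null Integer cell becomes 3 (truncated) while the
// Double column keeps 164.3.
val input = Seq[(java.lang.Integer, java.lang.Double)]((null, 164.3)).toDF("a", "b")
input.na.fill(3.1).show()
// +---+-----+
// |  a|    b|
// +---+-----+
// |  3|164.3|
// +---+-----+
```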





[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...

2016-11-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15994#discussion_r89737081
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -437,4 +444,38 @@ final class DataFrameNaFunctions private[sql](df: 
DataFrame) {
 case v => throw new IllegalArgumentException(
   s"Unsupported value type ${v.getClass.getName} ($v).")
   }
+
+  /**
+   * Returns a new [[DataFrame]] that replaces null or NaN values in 
specified
+   * numeric, string columns. If a specified column is not a numeric, 
string column,
+   * it is ignored.
+   */
+  private def fill1[T](value: T, cols: Seq[String]): DataFrame = {
+// the fill[T] which T is  Long/Integer/Float/Double,
+// should apply on all the NumericType Column, for example:
+// val input = Seq[(java.lang.Integer, java.lang.Double)]((null, 
164.3)).toDF("a","b")
+// input.na.fill(3.1)
+// the result is (3,164.3), not (null, 164.3)
+val targetType = value match {
+  case _: jl.Double | _: jl.Integer | _: jl.Float | _: jl.Long => 
NumericType
--- End diff --

Why do we match `jl.Double` here instead of Scala's `Double`?
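
As an aside not taken from the thread: since `value: T` is a generic parameter, the matched value is boxed on the JVM, but Scala type patterns unbox automatically, so the two spellings test the same thing. A quick REPL check:

```
// A boxed double held as Any satisfies both type tests, so matching
// java.lang.Double and matching Scala's Double are interchangeable here.
val v: Any = 3.1
v.isInstanceOf[java.lang.Double] // true
v.isInstanceOf[Double]           // true as well
```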





[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...

2016-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15976
  
**[Test build #69234 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69234/consoleFull)** for PR 15976 at commit [`6db5af9`](https://github.com/apache/spark/commit/6db5af95e456d6529a37c243f41a4632a69f40d0).





[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...

2016-11-28 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15780#discussion_r89737732
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -643,8 +645,9 @@ case class ExternalMapToCatalyst private(
 
   override def foldable: Boolean = false
 
-  override def dataType: MapType = MapType(
-keyConverter.dataType, valueConverter.dataType, valueContainsNull = 
valueConverter.nullable)
+  override def dataType: MapType = {
+MapType(keyConverter.dataType, valueConverter.dataType, 
valueConverter.nullable)
+  }
--- End diff --

Looks like there's no difference here?





[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...

2016-11-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15994#discussion_r89737784
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -130,6 +130,13 @@ final class DataFrameNaFunctions private[sql](df: 
DataFrame) {
   /**
* Returns a new [[DataFrame]] that replaces null or NaN values in 
numeric columns with `value`.
*
+   * @since 2.1.0
+   */
+  def fill(value: Long): DataFrame = fill(value, df.columns)
+
+  /**
+   * Returns a new [[DataFrame]] that replaces null or NaN values in 
numeric columns with `value`.
--- End diff --

Could I ask you to change `[[DataFrame]]` to `` `DataFrame` ``? It seems `DataFrame` is unrecognisable via unidoc/genjavadoc (see https://github.com/apache/spark/pull/16013), which ends up breaking the documentation build with Java 8.
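
For illustration, a hedged before/after sketch of the requested change, using the doc line from this diff:

```
// Before: genjavadoc turns the wiki-style link into an unresolvable @link,
// which breaks the javadoc8 build.
/** Returns a new [[DataFrame]] that replaces null or NaN values in numeric columns with `value`. */

// After: a plain code literal renders safely in both scaladoc and javadoc.
/** Returns a new `DataFrame` that replaces null or NaN values in numeric columns with `value`. */
```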





[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...

2016-11-28 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15780#discussion_r89737836
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ReferenceToExpressions.scala
 ---
@@ -74,7 +74,8 @@ case class ReferenceToExpressions(result: Expression, 
children: Seq[Expression])
 ctx.addMutableState("boolean", classChildVarIsNull, "")
 
 val classChildVar =
-  LambdaVariable(classChildVarName, classChildVarIsNull, 
child.dataType)
+  LambdaVariable(classChildVarName, classChildVarIsNull, 
child.dataType,
+childGen.isNull != "false")
--- End diff --

Use `child.nullable` if you want to specify it here.
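
Applied to the hunk above, the suggestion would read roughly as follows (a sketch, not the final patch):

```
// Take nullability from the child expression itself instead of comparing
// the generated isNull code against the string "false".
val classChildVar =
  LambdaVariable(classChildVarName, classChildVarIsNull, child.dataType, child.nullable)
```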





[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...

2016-11-28 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15780#discussion_r89739080
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
 ---
@@ -396,12 +396,15 @@ object JavaTypeInference {
 
 case _ if mapType.isAssignableFrom(typeToken) =>
   val (keyType, valueType) = mapKeyValueType(typeToken)
+  val (_, valueNullable) = inferDataType(valueType)
--- End diff --

good catch. done





[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...

2016-11-28 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15780#discussion_r89739106
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -643,8 +645,9 @@ case class ExternalMapToCatalyst private(
 
   override def foldable: Boolean = false
 
-  override def dataType: MapType = MapType(
-keyConverter.dataType, valueConverter.dataType, valueContainsNull = 
valueConverter.nullable)
+  override def dataType: MapType = {
+MapType(keyConverter.dataType, valueConverter.dataType, 
valueConverter.nullable)
+  }
--- End diff --

good catch. done





[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.

2016-11-28 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/14136
  
Currently `ImplicitTypeCasts` doesn't support casts between `ArrayType(elementType)`s, so we have to support `ArrayType(NumericType)` for now. Once that support is added, we can make the code that analyzes `percentageExpression` more concise.





[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...

2016-11-28 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15780#discussion_r89739433
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ReferenceToExpressions.scala
 ---
@@ -74,7 +74,8 @@ case class ReferenceToExpressions(result: Expression, 
children: Seq[Expression])
 ctx.addMutableState("boolean", classChildVarIsNull, "")
 
 val classChildVar =
-  LambdaVariable(classChildVarName, classChildVarIsNull, 
child.dataType)
+  LambdaVariable(classChildVarName, classChildVarIsNull, 
child.dataType,
+childGen.isNull != "false")
--- End diff --

Thank you very much for your point. done





[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...

2016-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15780
  
**[Test build #69235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69235/consoleFull)** for PR 15780 at commit [`b7bf966`](https://github.com/apache/spark/commit/b7bf966a808668c08787c39632fc4634c9a8d3da).





[GitHub] spark pull request #16013: [SPARK-3359][DOCS] Make javadoc8 working for unid...

2016-11-28 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16013#discussion_r89739977
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/random/SamplingUtils.scala ---
@@ -67,17 +67,19 @@ private[spark] object SamplingUtils {
   }
 
   /**
-   * Returns a sampling rate that guarantees a sample of size >= 
sampleSizeLowerBound 99.99% of
-   * the time.
+   * Returns a sampling rate that guarantees a sample of size greater than 
or equal to
+   * sampleSizeLowerBound 99.99% of the time.
*
* How the sampling rate is determined:
+   *
* Let p = num / total, where num is the sample size and total is the 
total number of
-   * datapoints in the RDD. We're trying to compute q > p such that
+   * datapoints in the RDD. We're trying to compute q {@literal >} p such 
that
*   - when sampling with replacement, we're drawing each datapoint with 
prob_i ~ Pois(q),
-   * where we want to guarantee Pr[s < num] < 0.0001 for s = 
sum(prob_i for i from 0 to total),
-   * i.e. the failure rate of not having a sufficiently large sample < 
0.0001.
+   * where we want to guarantee
+   * Pr[s {@literal <} num] {@literal <} 0.0001 for s = sum(prob_i for 
i from 0 to total),
+   * i.e. the failure rate of not having a sufficiently large sample 
{@literal <} 0.0001.
* Setting q = p + 5 * sqrt(p/total) is sufficient to guarantee 
0. success rate for
-   * num > 12, but we need a slightly larger q (9 empirically 
determined).
+   * num {@literal >} 12, but we need a slightly larger q (9 
empirically determined).
--- End diff --

That's fine, but outside of actual mathematical equations I think it's fine to use prose like "greater than". No big deal either way; it's up to your taste which ones to change.
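
For reference, the two alternatives being weighed, sketched as they would appear in a doc comment:

```
/** We're trying to compute q {@literal >} p such that ... */ // escaped form from the diff
/** We're trying to compute q greater than p such that ... */ // prose form suggested above
```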





[GitHub] spark pull request #16020: [SPARK-18596][ML] add checking and caching to bis...

2016-11-28 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/16020#discussion_r89740085
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -334,10 +334,8 @@ class KMeans @Since("1.5.0") (
 val summary = new KMeansSummary(
   model.transform(dataset), $(predictionCol), $(featuresCol), $(k))
 model.setSummary(Some(summary))
+if (handlePersistence) instances.unpersist()
 instr.logSuccess(model)
-if (handlePersistence) {
--- End diff --

I'd prefer to keep this form, per the style guide.





[GitHub] spark pull request #16020: [SPARK-18596][ML] add checking and caching to bis...

2016-11-28 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/16020#discussion_r89740051
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
@@ -255,10 +256,19 @@ class BisectingKMeans @Since("2.0.0") (
 
   @Since("2.0.0")
   override def fit(dataset: Dataset[_]): BisectingKMeansModel = {
+val handlePersistence = dataset.rdd.getStorageLevel == 
StorageLevel.NONE
--- End diff --

By the way, I've been meaning to log a ticket for this issue, but have been 
tied up.

This will actually never work. `dataset.rdd` will always have storage level 
`NONE`. To see this:

```
scala> import org.apache.spark.storage.StorageLevel
import org.apache.spark.storage.StorageLevel

scala> val df = spark.range(10).toDF("num")
df: org.apache.spark.sql.DataFrame = [num: bigint]

scala> df.storageLevel == StorageLevel.NONE
res0: Boolean = true

scala> df.persist
res1: df.type = [num: bigint]

scala> df.storageLevel == StorageLevel.MEMORY_AND_DISK
res2: Boolean = true

scala> df.rdd.getStorageLevel == StorageLevel.MEMORY_AND_DISK
res3: Boolean = false

scala> df.rdd.getStorageLevel == StorageLevel.NONE
res4: Boolean = true
```

So in fact, all the algorithms that check the storage level via `dataset.rdd` are double-caching the data whenever the input DataFrame is cached, because the RDD will never appear to be cached.

We should therefore migrate all the checks to `dataset.storageLevel`, which was added in https://github.com/apache/spark/pull/13780.
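
A minimal sketch of that migration (helper name hypothetical; `Dataset.storageLevel` is the API referenced above):

```
import org.apache.spark.sql.Dataset
import org.apache.spark.storage.StorageLevel

// Decide whether fit() needs to cache the input itself.
def needsHandlePersistence(dataset: Dataset[_]): Boolean = {
  // Broken: dataset.rdd builds a fresh, uncached RDD, so this check is true
  // even when the DataFrame itself is already cached.
  //   dataset.rdd.getStorageLevel == StorageLevel.NONE

  // Fixed: ask the Dataset for its own storage level.
  dataset.storageLevel == StorageLevel.NONE
}
```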





[GitHub] spark pull request #16020: [SPARK-18596][ML] add checking and caching to bis...

2016-11-28 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/16020#discussion_r89740159
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
@@ -273,6 +283,7 @@ class BisectingKMeans @Since("2.0.0") (
 val summary = new BisectingKMeansSummary(
   model.transform(dataset), $(predictionCol), $(featuresCol), $(k))
 model.setSummary(Some(summary))
+if (handlePersistence) rdd.unpersist()
--- End diff --

Prefer 

```
if (handlePersistence) {
  rdd.unpersist()
}
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...

2016-11-28 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15994#discussion_r89740299
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -437,4 +444,38 @@ final class DataFrameNaFunctions private[sql](df: 
DataFrame) {
 case v => throw new IllegalArgumentException(
   s"Unsupported value type ${v.getClass.getName} ($v).")
   }
+
+  /**
+   * Returns a new [[DataFrame]] that replaces null or NaN values in 
specified
+   * numeric, string columns. If a specified column is not a numeric, 
string column,
+   * it is ignored.
+   */
+  private def fill1[T](value: T, cols: Seq[String]): DataFrame = {
+// the fill[T] which T is  Long/Integer/Float/Double,
--- End diff --

Removing them is OK.





[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...

2016-11-28 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15994#discussion_r89740897
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -130,6 +130,13 @@ final class DataFrameNaFunctions private[sql](df: 
DataFrame) {
   /**
* Returns a new [[DataFrame]] that replaces null or NaN values in 
numeric columns with `value`.
*
+   * @since 2.1.0
+   */
+  def fill(value: Long): DataFrame = fill(value, df.columns)
+
+  /**
+   * Returns a new [[DataFrame]] that replaces null or NaN values in 
numeric columns with `value`.
--- End diff --

Would it be better to change them in #16013?





[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...

2016-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15994
  
**[Test build #69236 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69236/consoleFull)** for PR 15994 at commit [`d1ba27f`](https://github.com/apache/spark/commit/d1ba27f96dba9f69b3c92a0f15fa5b3ada50dfaf).





[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...

2016-11-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15994#discussion_r89741302
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -130,6 +130,13 @@ final class DataFrameNaFunctions private[sql](df: 
DataFrame) {
   /**
* Returns a new [[DataFrame]] that replaces null or NaN values in 
numeric columns with `value`.
*
+   * @since 2.1.0
+   */
+  def fill(value: Long): DataFrame = fill(value, df.columns)
+
+  /**
+   * Returns a new [[DataFrame]] that replaces null or NaN values in 
numeric columns with `value`.
--- End diff --

What if that one is merged first?





[GitHub] spark pull request #16013: [SPARK-3359][DOCS] Make javadoc8 working for unid...

2016-11-28 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16013#discussion_r89741513
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/BoostingStrategy.scala
 ---
@@ -36,14 +36,14 @@ import org.apache.spark.mllib.tree.loss.{LogLoss, Loss, 
SquaredError}
  * @param validationTol validationTol is a condition which decides 
iteration termination when
  *  runWithValidation is used.
  *  The end of iteration is decided based on below 
logic:
- *  If the current loss on the validation set is > 
0.01, the diff
+ *  If the current loss on the validation set is 
greater than 0.01, the diff
  *  of validation error is compared to relative 
tolerance which is
  *  validationTol * (current loss on the validation 
set).
- *  If the current loss on the validation set is <= 
0.01, the diff
- *  of validation error is compared to absolute 
tolerance which is
+ *  If the current loss on the validation set is less 
than or euqal to 0.01,
--- End diff --

typo: euqal -> equal





[GitHub] spark pull request #16013: [SPARK-3359][DOCS] Make javadoc8 working for unid...

2016-11-28 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16013#discussion_r89741431
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/SlidingRDD.scala 
---
@@ -42,8 +42,8 @@ class SlidingRDDPartition[T](val idx: Int, val prev: 
Partition, val tail: Seq[T]
  * @param windowSize the window size, must be greater than 1
  * @param step step size for windows
  *
- * @see [[org.apache.spark.mllib.rdd.RDDFunctions.sliding(Int, Int)*]]
- * @see [[scala.collection.IterableLike.sliding(Int, Int)*]]
+ * @see `org.apache.spark.mllib.rdd.RDDFunctions.sliding(Int, Int)*`
--- End diff --

Is the trailing `*` intentional or a typo? No big deal either way.





[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...

2016-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15994
  
**[Test build #69237 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69237/consoleFull)** for PR 15994 at commit [`d7dc343`](https://github.com/apache/spark/commit/d7dc34341e8d17e892e500f7445a887b59a5f841).





[GitHub] spark pull request #16013: [SPARK-3359][DOCS] Make javadoc8 working for unid...

2016-11-28 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16013#discussion_r89740182
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2063,6 +2063,7 @@ class SparkContext(config: SparkConf) extends Logging 
{
* @param jobId the job ID to cancel
* @throws InterruptedException if the cancel message cannot be sent
*/
+  @throws(classOf[InterruptedException])
--- End diff --

I think these need to be reverted too; we don't want to introduce checked exceptions.





[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...

2016-11-28 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15994#discussion_r89742186
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -130,6 +130,13 @@ final class DataFrameNaFunctions private[sql](df: 
DataFrame) {
   /**
* Returns a new [[DataFrame]] that replaces null or NaN values in 
numeric columns with `value`.
*
+   * @since 2.1.0
+   */
+  def fill(value: Long): DataFrame = fill(value, df.columns)
+
+  /**
+   * Returns a new [[DataFrame]] that replaces null or NaN values in 
numeric columns with `value`.
--- End diff --

ok
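
For reference, a hedged usage sketch of the `fill(value: Long)` overload being added here, assuming the overload lands as shown in the diff and an active SparkSession `spark`; the column names are illustrative.

    import spark.implicits._

    // A nullable boxed-Long column; fill(0L) should replace the null.
    val df = Seq[(java.lang.Long, String)]((null, "x"), (3L, "y")).toDF("n", "s")
    df.na.fill(0L).show()
    // +---+---+
    // |  n|  s|
    // +---+---+
    // |  0|  x|
    // |  3|  y|
    // +---+---+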


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...

2016-11-28 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15994#discussion_r89742322
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -437,4 +444,38 @@ final class DataFrameNaFunctions private[sql](df: 
DataFrame) {
 case v => throw new IllegalArgumentException(
   s"Unsupported value type ${v.getClass.getName} ($v).")
   }
+
+  /**
+   * Returns a new [[DataFrame]] that replaces null or NaN values in the
+   * specified numeric and string columns. If a specified column is not a
+   * numeric or string column, it is ignored.
+   */
+  private def fill1[T](value: T, cols: Seq[String]): DataFrame = {
+// fill[T], where T is Long/Integer/Float/Double, should apply to all
+// NumericType columns, for example:
+// val input = Seq[(java.lang.Integer, java.lang.Double)]((null, 164.3)).toDF("a", "b")
+// input.na.fill(3.1)
+// the result is (3, 164.3), not (null, 164.3)
+val targetType = value match {
+  case _: jl.Double | _: jl.Integer | _: jl.Float | _: jl.Long => 
NumericType
--- End diff --

fixed it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...

2016-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15994
  
**[Test build #69238 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69238/consoleFull)**
 for PR 15994 at commit 
[`36bff41`](https://github.com/apache/spark/commit/36bff418825a8ac98a266549b1f11d9ce87ddd15).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15961: [SPARK-18523][PySpark]Make SparkContext.stop more reliab...

2016-11-28 Thread kxepal
Github user kxepal commented on the issue:

https://github.com/apache/spark/pull/15961
  
@holdenk 
Agree with you here. The message is fixed, PR rebased.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16031: [SPARK-18606][HISTORYSERVER]remove useless elemen...

2016-11-28 Thread WangTaoTheTonic
GitHub user WangTaoTheTonic opened a pull request:

https://github.com/apache/spark/pull/16031

[SPARK-18606][HISTORYSERVER]remove useless elements while searching

## What changes were proposed in this pull request?

When we search applications in HistoryServer, the search matches all contents
between the table cell tags, including useless hidden elements (markup that is
not visibly displayed on the page).

Before:

![before](https://cloud.githubusercontent.com/assets/5276001/20662840/28bcc874-b590-11e6-9115-12fb64e49898.jpg)

After:

![after](https://cloud.githubusercontent.com/assets/5276001/20662844/2f717af2-b590-11e6-97dc-a48b08a54247.jpg)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WangTaoTheTonic/spark span

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16031.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16031


commit 37aa3a2d2fddfa46fb4c5427cebed5683530153d
Author: WangTaoTheTonic 
Date:   2016-11-28T08:37:13Z

remove useless elements while searching




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16031: [SPARK-18606][HISTORYSERVER]remove useless elements whil...

2016-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16031
  
**[Test build #69239 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69239/consoleFull)**
 for PR 16031 at commit 
[`37aa3a2`](https://github.com/apache/spark/commit/37aa3a2d2fddfa46fb4c5427cebed5683530153d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...

2016-11-28 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15780#discussion_r89744772
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala
 ---
@@ -86,7 +86,7 @@ class FileStreamSinkSuite extends StreamTest {
 
   val outputDf = spark.read.parquet(outputDir)
   val expectedSchema = new StructType()
-.add(StructField("value", IntegerType))
+.add(StructField("value", IntegerType, nullable = false))
 .add(StructField("id", IntegerType))
--- End diff --

BTW, do you know why `id` is not `nullable == false`?
It looks like both `value` and `id` are `nullable == false`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16032: [SPARK-18118][SQL] fix a compilation error due to...

2016-11-28 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/16032

[SPARK-18118][SQL] fix a compilation error due to nested JavaBeans

## What changes were proposed in this pull request?

This PR avoids a compilation error caused by exceeding the 64KB Java bytecode
limit per method. The error occurs because the generated Java code for
`SpecificSafeProjection.apply()` is too big for nested JavaBeans. This PR
avoids the error by splitting the big code chunk into multiple methods,
calling `CodegenContext.splitExpression` in `InitializeJavaBean.doGenCode`.
An object reference for the JavaBean is stored in an instance variable
`javaBean...`; the split methods then reference that instance variable.

Generated code with this PR

/* 22098 */   private void apply130_0(InternalRow i) {
...
/* 22125 */ boolean isNull238 = i.isNullAt(2);
/* 22126 */ InternalRow value238 = isNull238 ? null : (i.getStruct(2, 
3));
/* 22127 */ boolean isNull236 = false;
/* 22128 */ test.org.apache.spark.sql.JavaDatasetSuite$Nesting1 
value236 = null;
/* 22129 */ if (!false && isNull238) {
/* 22130 */
/* 22131 */   final test.org.apache.spark.sql.JavaDatasetSuite$Nesting1 
value239 = null;
/* 22132 */   isNull236 = true;
/* 22133 */   value236 = value239;
/* 22134 */ } else {
/* 22135 */
/* 22136 */   final test.org.apache.spark.sql.JavaDatasetSuite$Nesting1 
value241 = false ? null : new 
test.org.apache.spark.sql.JavaDatasetSuite$Nesting1();
/* 22137 */   this.javaBean14 = value241;
/* 22138 */   if (!false) {
/* 22139 */ apply25_0(i);
/* 22140 */ apply25_1(i);
/* 22141 */ apply25_2(i);
/* 22142 */   }
/* 22143 */   isNull236 = false;
/* 22144 */   value236 = value241;
/* 22145 */ }
/* 22146 */ this.javaBean.setField2(value236);
/* 22147 */
/* 22148 */   }
...
/* 22928 */   public java.lang.Object apply(java.lang.Object _i) {
/* 22929 */ InternalRow i = (InternalRow) _i;
/* 22930 */
/* 22931 */ final 
test.org.apache.spark.sql.JavaDatasetSuite$NestedComplicatedJavaBean value1 = 
false ? null : new 
test.org.apache.spark.sql.JavaDatasetSuite$NestedComplicatedJavaBean();
/* 22932 */ this.javaBean = value1;
/* 22933 */ if (!false) {
/* 22934 */   apply130_0(i);
/* 22935 */   apply130_1(i);
/* 22936 */   apply130_2(i);
/* 22937 */   apply130_3(i);
/* 22938 */   apply130_4(i);
/* 22939 */ }
/* 22940 */ if (false) {
/* 22941 */   mutableRow.setNullAt(0);
/* 22942 */ } else {
/* 22943 */
/* 22944 */   mutableRow.update(0, value1);
/* 22945 */ }
/* 22946 */
/* 22947 */ return mutableRow;
/* 22948 */   }
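
To make the idea concrete, an illustrative Scala sketch of the splitting pattern, not the actual codegen API: a long run of generated statements is chunked, and each chunk is emitted as its own helper method, so no single generated method exceeds the JVM's 64KB bytecode limit.

    // Sketch only: chunk generated statements into helper methods.
    def splitIntoMethods(statements: Seq[String], perMethod: Int): String =
      statements
        .grouped(perMethod)           // one chunk per helper method
        .zipWithIndex
        .map { case (chunk, idx) =>
          s"private void apply_$idx(InternalRow i) {\n  " +
            chunk.mkString("\n  ") + "\n}"
        }
        .mkString("\n\n")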



## How was this patch tested?

Added a test suite to `JavaDatasetSuite.java`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-18118

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16032.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16032






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16013: [SPARK-3359][DOCS] Make javadoc8 working for unid...

2016-11-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16013#discussion_r89744913
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/SlidingRDD.scala 
---
@@ -42,8 +42,8 @@ class SlidingRDDPartition[T](val idx: Int, val prev: 
Partition, val tail: Seq[T]
  * @param windowSize the window size, must be greater than 1
  * @param step step size for windows
  *
- * @see [[org.apache.spark.mllib.rdd.RDDFunctions.sliding(Int, Int)*]]
- * @see [[scala.collection.IterableLike.sliding(Int, Int)*]]
+ * @see `org.apache.spark.mllib.rdd.RDDFunctions.sliding(Int, Int)*`
--- End diff --

Let me please leave them as they are. I am worried about getting blamed in the
future. I will keep in mind that I should leave a comment on this when someone
tries to change something around here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16032: [SPARK-18118][SQL] fix a compilation error due to nested...

2016-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16032
  
**[Test build #69240 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69240/consoleFull)**
 for PR 16032 at commit 
[`5debc84`](https://github.com/apache/spark/commit/5debc847bba6dd2824d856e6325e0647df525870).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16013: [SPARK-3359][DOCS] Make javadoc8 working for unid...

2016-11-28 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16013#discussion_r89745428
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/SlidingRDD.scala 
---
@@ -42,8 +42,8 @@ class SlidingRDDPartition[T](val idx: Int, val prev: 
Partition, val tail: Seq[T]
  * @param windowSize the window size, must be greater than 1
  * @param step step size for windows
  *
- * @see [[org.apache.spark.mllib.rdd.RDDFunctions.sliding(Int, Int)*]]
- * @see [[scala.collection.IterableLike.sliding(Int, Int)*]]
+ * @see `org.apache.spark.mllib.rdd.RDDFunctions.sliding(Int, Int)*`
--- End diff --

Heh, OK. I don't think it has any meaning in a hyperlink or javadoc syntax, 
and this isn't either one anyway, but it's OK to leave it.
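
For readers following along, the two forms under discussion, as I read the PR: the scaladoc link syntax gets carried into the generated javadoc by genjavadoc, where javadoc 8's stricter parser rejects it, while the backtick form renders as plain monospace in both tools.

    // Scaladoc link syntax -- breaks under javadoc 8 via genjavadoc:
    /** @see [[org.apache.spark.mllib.rdd.RDDFunctions.sliding(Int, Int)*]] */

    // Plain monospace -- safe in both scaladoc and javadoc 8:
    /** @see `org.apache.spark.mllib.rdd.RDDFunctions.sliding(Int, Int)*` */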


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15992: [SPARK-18560][CORE][STREAMING] Receiver data can not be ...

2016-11-28 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15992
  
I am not familiar enough with this code to review it. I do think @JoshRosen 
is the right person, given https://issues.apache.org/jira/browse/SPARK-13990, 
and I believe he has said he will start reviewing again this week, after the holiday.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...

2016-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16013
  
**[Test build #69241 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69241/consoleFull)**
 for PR 16013 at commit 
[`7d44dc5`](https://github.com/apache/spark/commit/7d44dc5ee69a75aa58132bac65de2f46a21845ba).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...

2016-11-28 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15994#discussion_r89747135
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -130,6 +130,13 @@ final class DataFrameNaFunctions private[sql](df: 
DataFrame) {
   /**
* Returns a new [[DataFrame]] that replaces null or NaN values in 
numeric columns with `value`.
*
+   * @since 2.1.0
+   */
+  def fill(value: Long): DataFrame = fill(value, df.columns)
+
+  /**
+   * Returns a new [[DataFrame]] that replaces null or NaN values in 
numeric columns with `value`.
--- End diff --

@HyukjinKwon yeah, the bad news is that I'm sure the javadoc generation is 
going to re-break periodically. We can try to catch it with reviews, and your 
work at least gets it to a working state, but we'll have to clean it up again 
regularly before releases.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...

2016-11-28 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15949#discussion_r89424696
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/WatermarkSuite.scala ---
@@ -96,28 +96,58 @@ class WatermarkSuite extends StreamTest with 
BeforeAndAfter with Logging {
 )
   }
 
-  ignore("recovery") {
-val inputData = MemoryStream[Int]
-
-val windowedAggregation = inputData.toDF()
+  test("recovery") {
+val ms = new MemoryStream[Int](0, sqlContext)
+val df = ms.toDF().toDF("a")
+val tableName = "recovery"
+def startQuery: StreamingQuery = {
+  ms.toDF()
 .withColumn("eventTime", $"value".cast("timestamp"))
 .withWatermark("eventTime", "10 seconds")
 .groupBy(window($"eventTime", "5 seconds") as 'window)
 .agg(count("*") as 'count)
 .select($"window".getField("start").cast("long").as[Long], 
$"count".as[Long])
+.writeStream
+.format("memory")
+.queryName(tableName)
+.outputMode("append")
+.start()
+}
 
-testStream(windowedAggregation)(
-  AddData(inputData, 10, 11, 12, 13, 14, 15),
-  CheckAnswer(),
-  AddData(inputData, 25), // Advance watermark to 15 seconds
-  StopStream,
-  StartStream(),
-  CheckAnswer(),
-  AddData(inputData, 25), // Evict items less than previous watermark.
-  StopStream,
-  StartStream(),
-  CheckAnswer((10, 5))
+var q = startQuery
+ms.addData(10, 11, 12, 13, 14, 15)
+q.processAllAvailable()
+
+checkAnswer(
+  spark.table(tableName), Seq()
+)
+
+// Advance watermark to 15 seconds,
+// but do not process batch
+ms.addData(25)
+q.stop()
--- End diff --

Why don't you want to process the batch?
Let it process the batch, check whether the results are correct (i.e. things 
were evicted), and then stop.
Drop the table, restart, call processAllAvailable, and check whether the same 
result is recreated.
This will then actually verify that the watermark is recovered and used to 
evict the records again.
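
Concretely, a hedged sketch of that flow, reusing `ms`, `tableName`, and `startQuery` from the diff above; the expected answer (10, 5) follows the window counts in the original test.

    ms.addData(25)                 // advance watermark to 15 seconds
    var q = startQuery
    q.processAllAvailable()        // process the batch this time
    checkAnswer(spark.table(tableName), Seq(Row(10L, 5L)))  // window emitted/evicted
    q.stop()

    spark.sql(s"drop table $tableName")
    q = startQuery                 // restart from the same checkpoint
    q.processAllAvailable()
    checkAnswer(spark.table(tableName), Seq(Row(10L, 5L)))  // recreated on replay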



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...

2016-11-28 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15949#discussion_r89746924
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamExecutionMetadataSuite.scala
 ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import java.io.File
+
+import org.apache.spark.sql.{AnalysisException, Row}
+import org.apache.spark.sql.execution.streaming.{MemoryStream, 
StreamExecutionMetadata}
+import org.apache.spark.sql.functions._
+import org.apache.spark.util.{SystemClock, Utils}
+
+class StreamExecutionMetadataSuite extends StreamTest {
+
+  private def newMetadataDir =
+Utils.createTempDir(namePrefix = "streaming.metadata").getCanonicalPath
+
+  test("stream execution metadata") {
+assert(StreamExecutionMetadata(0, 0) ===
+  StreamExecutionMetadata("""{}"""))
+assert(StreamExecutionMetadata(1, 0) ===
+  StreamExecutionMetadata("""{"batchWatermarkMs":1}"""))
+assert(StreamExecutionMetadata(0, 2) ===
+  StreamExecutionMetadata("""{"batchTimestampMs":2}"""))
+assert(StreamExecutionMetadata(1, 2) ===
+  StreamExecutionMetadata(
+"""{"batchWatermarkMs":1,"batchTimestampMs":2}"""))
+  }
+
+  test("metadata is recovered from log when query is restarted") {
+import testImplicits._
+val clock = new SystemClock()
+val ms = new MemoryStream[Long](0, sqlContext)
+val df = ms.toDF().toDF("a")
+val checkpointLoc = newMetadataDir
+val checkpointDir = new File(checkpointLoc, "complete")
+checkpointDir.mkdirs()
+assert(checkpointDir.exists())
+val tableName = "test"
+// Query that prunes timestamps less than current_timestamp, making
+// it easy to use for ensuring that a batch is re-processed with the
+// timestamp used when it was first processed.
+def startQuery: StreamingQuery = {
+  df.groupBy("a")
+.count()
+.where('a >= current_timestamp().cast("long"))
+.writeStream
+.format("memory")
+.queryName(tableName)
+.option("checkpointLocation", checkpointLoc)
+.outputMode("complete")
+.start()
+}
+// no exception here
--- End diff --

what does this comment mean?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...

2016-11-28 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15949#discussion_r89747718
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamExecutionMetadataSuite.scala
 ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import java.io.File
+
+import org.apache.spark.sql.{AnalysisException, Row}
+import org.apache.spark.sql.execution.streaming.{MemoryStream, 
StreamExecutionMetadata}
+import org.apache.spark.sql.functions._
+import org.apache.spark.util.{SystemClock, Utils}
+
+class StreamExecutionMetadataSuite extends StreamTest {
+
+  private def newMetadataDir =
+Utils.createTempDir(namePrefix = "streaming.metadata").getCanonicalPath
+
+  test("stream execution metadata") {
+assert(StreamExecutionMetadata(0, 0) ===
+  StreamExecutionMetadata("""{}"""))
+assert(StreamExecutionMetadata(1, 0) ===
+  StreamExecutionMetadata("""{"batchWatermarkMs":1}"""))
+assert(StreamExecutionMetadata(0, 2) ===
+  StreamExecutionMetadata("""{"batchTimestampMs":2}"""))
+assert(StreamExecutionMetadata(1, 2) ===
+  StreamExecutionMetadata(
+"""{"batchWatermarkMs":1,"batchTimestampMs":2}"""))
+  }
+
+  test("metadata is recovered from log when query is restarted") {
+import testImplicits._
+val clock = new SystemClock()
+val ms = new MemoryStream[Long](0, sqlContext)
+val df = ms.toDF().toDF("a")
+val checkpointLoc = newMetadataDir
+val checkpointDir = new File(checkpointLoc, "complete")
+checkpointDir.mkdirs()
+assert(checkpointDir.exists())
+val tableName = "test"
+// Query that prunes timestamps less than current_timestamp, making
+// it easy to use for ensuring that a batch is re-processed with the
+// timestamp used when it was first processed.
+def startQuery: StreamingQuery = {
+  df.groupBy("a")
+.count()
+.where('a >= current_timestamp().cast("long"))
+.writeStream
+.format("memory")
+.queryName(tableName)
+.option("checkpointLocation", checkpointLoc)
+.outputMode("complete")
+.start()
+}
+// no exception here
+val t1 = clock.getTimeMillis() + 60L * 1000L
+val t2 = clock.getTimeMillis() + 60L * 1000L + 1000L
+val q = startQuery
+ms.addData(t1, t2)
+q.processAllAvailable()
+
+checkAnswer(
+  spark.table(tableName),
+  Seq(Row(t1, 1), Row(t2, 1))
+)
+
+q.stop()
+Thread.sleep(60L * 1000L + 5000L) // Expire t1 and t2
--- End diff --

This test will now take 60 seconds! I think I didn't quite understand the 
test earlier, but now I do. The earlier 5 seconds was closer to being fine; 
okay, let's just use 10 seconds. And instead of sleep, use `eventually` to 
check the condition `t2 < clock.getTimeMillis()`; this would make the test 
sleep no more than necessary.
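
A sketch of what that might look like with ScalaTest's Eventually; the timeout and polling interval here are illustrative.

    import org.scalatest.concurrent.Eventually._
    import org.scalatest.time.SpanSugar._

    // Poll instead of a fixed sleep: returns as soon as the condition
    // holds, failing only if it never holds within the timeout.
    eventually(timeout(15.seconds), interval(100.milliseconds)) {
      assert(t2 < clock.getTimeMillis())
    }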


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...

2016-11-28 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15949#discussion_r89748128
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamExecutionMetadataSuite.scala
 ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import java.io.File
+
+import org.apache.spark.sql.{AnalysisException, Row}
+import org.apache.spark.sql.execution.streaming.{MemoryStream, 
StreamExecutionMetadata}
+import org.apache.spark.sql.functions._
+import org.apache.spark.util.{SystemClock, Utils}
+
+class StreamExecutionMetadataSuite extends StreamTest {
+
+  private def newMetadataDir =
+Utils.createTempDir(namePrefix = "streaming.metadata").getCanonicalPath
+
+  test("stream execution metadata") {
+assert(StreamExecutionMetadata(0, 0) ===
+  StreamExecutionMetadata("""{}"""))
+assert(StreamExecutionMetadata(1, 0) ===
+  StreamExecutionMetadata("""{"batchWatermarkMs":1}"""))
+assert(StreamExecutionMetadata(0, 2) ===
+  StreamExecutionMetadata("""{"batchTimestampMs":2}"""))
+assert(StreamExecutionMetadata(1, 2) ===
+  StreamExecutionMetadata(
+"""{"batchWatermarkMs":1,"batchTimestampMs":2}"""))
+  }
+
+  test("metadata is recovered from log when query is restarted") {
+import testImplicits._
+val clock = new SystemClock()
+val ms = new MemoryStream[Long](0, sqlContext)
+val df = ms.toDF().toDF("a")
+val checkpointLoc = newMetadataDir
+val checkpointDir = new File(checkpointLoc, "complete")
+checkpointDir.mkdirs()
+assert(checkpointDir.exists())
+val tableName = "test"
+// Query that prunes timestamps less than current_timestamp, making
+// it easy to use for ensuring that a batch is re-processed with the
+// timestamp used when it was first processed.
+def startQuery: StreamingQuery = {
+  df.groupBy("a")
+.count()
+.where('a >= current_timestamp().cast("long"))
+.writeStream
+.format("memory")
+.queryName(tableName)
+.option("checkpointLocation", checkpointLoc)
+.outputMode("complete")
+.start()
+}
+// no exception here
+val t1 = clock.getTimeMillis() + 60L * 1000L
+val t2 = clock.getTimeMillis() + 60L * 1000L + 1000L
+val q = startQuery
+ms.addData(t1, t2)
+q.processAllAvailable()
+
+checkAnswer(
+  spark.table(tableName),
+  Seq(Row(t1, 1), Row(t2, 1))
+)
+
+q.stop()
+Thread.sleep(60L * 1000L + 5000L) // Expire t1 and t2
+assert(t1 < clock.getTimeMillis())
+assert(t2 < clock.getTimeMillis())
+
+spark.sql(s"drop table $tableName")
+
+// verify table is dropped
+intercept[AnalysisException](spark.table(tableName).collect())
--- End diff --

I think you can use `spark.catalog.tableExists`
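
That would turn the intercept-based check into an explicit assertion, assuming `Catalog.tableExists` is available in this branch:

    // Verify the table is gone without relying on an AnalysisException.
    assert(!spark.catalog.tableExists(tableName))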


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...

2016-11-28 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15949#discussion_r89747045
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamExecutionMetadataSuite.scala
 ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import java.io.File
+
+import org.apache.spark.sql.{AnalysisException, Row}
+import org.apache.spark.sql.execution.streaming.{MemoryStream, 
StreamExecutionMetadata}
+import org.apache.spark.sql.functions._
+import org.apache.spark.util.{SystemClock, Utils}
+
+class StreamExecutionMetadataSuite extends StreamTest {
+
+  private def newMetadataDir =
+Utils.createTempDir(namePrefix = "streaming.metadata").getCanonicalPath
+
+  test("stream execution metadata") {
+assert(StreamExecutionMetadata(0, 0) ===
+  StreamExecutionMetadata("""{}"""))
+assert(StreamExecutionMetadata(1, 0) ===
+  StreamExecutionMetadata("""{"batchWatermarkMs":1}"""))
+assert(StreamExecutionMetadata(0, 2) ===
+  StreamExecutionMetadata("""{"batchTimestampMs":2}"""))
+assert(StreamExecutionMetadata(1, 2) ===
+  StreamExecutionMetadata(
+"""{"batchWatermarkMs":1,"batchTimestampMs":2}"""))
+  }
+
+  test("metadata is recovered from log when query is restarted") {
+import testImplicits._
+val clock = new SystemClock()
+val ms = new MemoryStream[Long](0, sqlContext)
+val df = ms.toDF().toDF("a")
+val checkpointLoc = newMetadataDir
+val checkpointDir = new File(checkpointLoc, "complete")
+checkpointDir.mkdirs()
+assert(checkpointDir.exists())
+val tableName = "test"
+// Query that prunes timestamps less than current_timestamp, making
+// it easy to use for ensuring that a batch is re-processed with the
+// timestamp used when it was first processed.
+def startQuery: StreamingQuery = {
+  df.groupBy("a")
+.count()
+.where('a >= current_timestamp().cast("long"))
+.writeStream
+.format("memory")
+.queryName(tableName)
+.option("checkpointLocation", checkpointLoc)
+.outputMode("complete")
+.start()
+}
+// no exception here
+val t1 = clock.getTimeMillis() + 60L * 1000L
+val t2 = clock.getTimeMillis() + 60L * 1000L + 1000L
--- End diff --

add a comment explaining how the test works.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...

2016-11-28 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15949#discussion_r89745095
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ---
@@ -235,4 +239,85 @@ class StreamingAggregationSuite extends StreamTest 
with BeforeAndAfterAll {
   CheckLastBatch(("a", 30), ("b", 3), ("c", 1))
 )
   }
+
+  test("prune results by current_time, complete mode") {
+import testImplicits._
+import StreamingAggregationSuite._
+clock = new StreamManualClock
+
+val inputData = MemoryStream[Long]
+
+val aggregated =
+  inputData.toDF()
+.groupBy($"value")
+.agg(count("*"))
+.where('value >= current_timestamp().cast("long") - 10L)
+
+testStream(aggregated, Complete)(
+  StartStream(ProcessingTime("10 seconds"), triggerClock = clock),
+
+  // advance clock to 10 seconds
+  AddData(inputData, 0L, 5L, 5L, 10L),
+  AdvanceManualClock(10 * 1000),
+  CheckLastBatch((0L, 1), (5L, 2), (10L, 1)),
+
+  // advance clock to 20 seconds, should retain keys >= 10
+  AddData(inputData, 15L, 15L, 20L),
+  AdvanceManualClock(10 * 1000),
+  CheckLastBatch((10L, 1), (15L, 2), (20L, 1)),
+
+  // advance clock to 30 seconds, should retain keys >= 20
+  AddData(inputData, 0L),
+  AdvanceManualClock(10 * 1000),
+  CheckLastBatch((20L, 1)),
+
+  // advance clock to 40 seconds, should retain keys >= 30
+  AddData(inputData, 25L, 30L, 40L, 45L),
+  AdvanceManualClock(10 * 1000),
+  CheckLastBatch((30L, 1), (40L, 1), (45L, 1))
+)
+  }
+
--- End diff --

nit: extra line


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...

2016-11-28 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15949#discussion_r89744923
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/WatermarkSuite.scala ---
@@ -96,27 +96,42 @@ class WatermarkSuite extends StreamTest with 
BeforeAndAfter with Logging {
 )
   }
 
-  ignore("recovery") {
+  test("recovery") {
 val inputData = MemoryStream[Int]
-
-val windowedAggregation = inputData.toDF()
-.withColumn("eventTime", $"value".cast("timestamp"))
-.withWatermark("eventTime", "10 seconds")
-.groupBy(window($"eventTime", "5 seconds") as 'window)
-.agg(count("*") as 'count)
-.select($"window".getField("start").cast("long").as[Long], 
$"count".as[Long])
-
-testStream(windowedAggregation)(
+val df = inputData.toDF()
+  .withColumn("eventTime", $"value".cast("timestamp"))
+  .withWatermark("eventTime", "10 seconds")
+  .groupBy(window($"eventTime", "5 seconds") as 'window)
+  .agg(count("*") as 'count)
+  .select($"window".getField("start").cast("long").as[Long], 
$"count".as[Long])
+val outputMode = OutputMode.Append
+val memorySink = new MemorySink(df.schema, outputMode)
+testStream(df)(
   AddData(inputData, 10, 11, 12, 13, 14, 15),
   CheckAnswer(),
   AddData(inputData, 25), // Advance watermark to 15 seconds
   StopStream,
   StartStream(),
-  CheckAnswer(),
+  CheckLastBatch(),
   AddData(inputData, 25), // Evict items less than previous watermark.
+  CheckLastBatch((10, 5)),
   StopStream,
+  AssertOnQuery { q => // clear the sink
+q.sink.asInstanceOf[MemorySink].clear()
+true
+  },
   StartStream(),
-  CheckAnswer((10, 5))
+  CheckLastBatch((10, 5)),
--- End diff --

nit: add a comment to explain, e.g. // Should recompute the last batch and 
re-evict timestamp 10


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...

2016-11-28 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15949#discussion_r89739836
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ---
@@ -235,4 +239,85 @@ class StreamingAggregationSuite extends StreamTest 
with BeforeAndAfterAll {
   CheckLastBatch(("a", 30), ("b", 3), ("c", 1))
 )
   }
+
+  test("prune results by current_time, complete mode") {
+import testImplicits._
+import StreamingAggregationSuite._
+clock = new StreamManualClock
+
+val inputData = MemoryStream[Long]
+
+val aggregated =
+  inputData.toDF()
+.groupBy($"value")
+.agg(count("*"))
+.where('value >= current_timestamp().cast("long") - 10L)
+
+testStream(aggregated, Complete)(
+  StartStream(ProcessingTime("10 seconds"), triggerClock = clock),
+
+  // advance clock to 10 seconds
+  AddData(inputData, 0L, 5L, 5L, 10L),
+  AdvanceManualClock(10 * 1000),
+  CheckLastBatch((0L, 1), (5L, 2), (10L, 1)),
+
+  // advance clock to 20 seconds, should retain keys >= 10
+  AddData(inputData, 15L, 15L, 20L),
+  AdvanceManualClock(10 * 1000),
+  CheckLastBatch((10L, 1), (15L, 2), (20L, 1)),
+
+  // advance clock to 30 seconds, should retain keys >= 20
+  AddData(inputData, 0L),
+  AdvanceManualClock(10 * 1000),
+  CheckLastBatch((20L, 1)),
+
+  // advance clock to 40 seconds, should retain keys >= 30
+  AddData(inputData, 25L, 30L, 40L, 45L),
+  AdvanceManualClock(10 * 1000),
+  CheckLastBatch((30L, 1), (40L, 1), (45L, 1))
+)
+  }
+
+
+  test("prune results by current_date, complete mode") {
+import testImplicits._
+import StreamingAggregationSuite._
+clock = new StreamManualClock
+val tz = TimeZone.getDefault.getID
+val inputData = MemoryStream[Long]
+val aggregated =
+  inputData.toDF()
+.select(to_utc_timestamp(from_unixtime('value * 
DateTimeUtils.SECONDS_PER_DAY), tz))
+.toDF("value")
+.groupBy($"value")
+.agg(count("*"))
+// .select('value, date_sub(current_date(), 
10).cast("timestamp").alias("t"))
+// .select('value, 't, 'value >= 't)
+.where($"value".cast("date") >= date_sub(current_date(), 10))
+.select(($"value".cast("long") / 
DateTimeUtils.SECONDS_PER_DAY).cast("long"), $"count(1)")
+testStream(aggregated, Complete)(
+  StartStream(ProcessingTime("10 day"), triggerClock = clock),
+  // advance clock to 10 days, should retain all keys
+  AddData(inputData, 0L, 5L, 5L, 10L),
+  AdvanceManualClock(DateTimeUtils.MILLIS_PER_DAY * 10),
+  CheckLastBatch((0L, 1), (5L, 2), (10L, 1)),
+  // advance clock to 20 days, should retain keys >= 10
+  AddData(inputData, 15L, 15L, 20L),
+  AdvanceManualClock(DateTimeUtils.MILLIS_PER_DAY * 10),
+  CheckLastBatch((10L, 1), (15L, 2), (20L, 1)),
+  // advance clock to 30 days, should retain keys >= 20
+  AddData(inputData, 0L),
+  AdvanceManualClock(DateTimeUtils.MILLIS_PER_DAY * 10),
+  CheckLastBatch((20L, 1)),
+  // advance clock to 40 seconds, should retain keys >= 30
--- End diff --

40 seconds -> days


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...

2016-11-28 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15949#discussion_r89739983
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/WatermarkSuite.scala ---
@@ -96,27 +96,42 @@ class WatermarkSuite extends StreamTest with 
BeforeAndAfter with Logging {
 )
   }
 
-  ignore("recovery") {
+  test("recovery") {
 val inputData = MemoryStream[Int]
-
-val windowedAggregation = inputData.toDF()
-.withColumn("eventTime", $"value".cast("timestamp"))
-.withWatermark("eventTime", "10 seconds")
-.groupBy(window($"eventTime", "5 seconds") as 'window)
-.agg(count("*") as 'count)
-.select($"window".getField("start").cast("long").as[Long], 
$"count".as[Long])
-
-testStream(windowedAggregation)(
+val df = inputData.toDF()
+  .withColumn("eventTime", $"value".cast("timestamp"))
+  .withWatermark("eventTime", "10 seconds")
+  .groupBy(window($"eventTime", "5 seconds") as 'window)
+  .agg(count("*") as 'count)
+  .select($"window".getField("start").cast("long").as[Long], 
$"count".as[Long])
+val outputMode = OutputMode.Append
+val memorySink = new MemorySink(df.schema, outputMode)
+testStream(df)(
   AddData(inputData, 10, 11, 12, 13, 14, 15),
   CheckAnswer(),
--- End diff --

nit: Make this CheckAnswer -> CheckLastBatch to be consistent with the rest 
of the checks in this test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...

2016-11-28 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15949#discussion_r89746706
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamExecutionMetadataSuite.scala
 ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import java.io.File
+
+import org.apache.spark.sql.{AnalysisException, Row}
+import org.apache.spark.sql.execution.streaming.{MemoryStream, 
StreamExecutionMetadata}
+import org.apache.spark.sql.functions._
+import org.apache.spark.util.{SystemClock, Utils}
+
+class StreamExecutionMetadataSuite extends StreamTest {
+
+  private def newMetadataDir =
+Utils.createTempDir(namePrefix = "streaming.metadata").getCanonicalPath
+
+  test("stream execution metadata") {
+assert(StreamExecutionMetadata(0, 0) ===
+  StreamExecutionMetadata("""{}"""))
+assert(StreamExecutionMetadata(1, 0) ===
+  StreamExecutionMetadata("""{"batchWatermarkMs":1}"""))
+assert(StreamExecutionMetadata(0, 2) ===
+  StreamExecutionMetadata("""{"batchTimestampMs":2}"""))
+assert(StreamExecutionMetadata(1, 2) ===
+  StreamExecutionMetadata(
+"""{"batchWatermarkMs":1,"batchTimestampMs":2}"""))
+  }
+
+  test("metadata is recovered from log when query is restarted") {
+import testImplicits._
+val clock = new SystemClock()
+val ms = new MemoryStream[Long](0, sqlContext)
+val df = ms.toDF().toDF("a")
+val checkpointLoc = newMetadataDir
+val checkpointDir = new File(checkpointLoc, "complete")
+checkpointDir.mkdirs()
+assert(checkpointDir.exists())
+val tableName = "test"
+// Query that prunes timestamps less than current_timestamp, making
+// it easy to use for ensuring that a batch is re-processed with the
+// timestamp used when it was first processed.
+def startQuery: StreamingQuery = {
--- End diff --

nit: functions that have side effects (like starting a thread) usually have 
`()` at the end and are called with `()`; for example, `val q = startQuery()`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...

2016-11-28 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15949#discussion_r89739781
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ---
@@ -235,4 +239,85 @@ class StreamingAggregationSuite extends StreamTest 
with BeforeAndAfterAll {
   CheckLastBatch(("a", 30), ("b", 3), ("c", 1))
 )
   }
+
+  test("prune results by current_time, complete mode") {
+import testImplicits._
+import StreamingAggregationSuite._
+clock = new StreamManualClock
+
+val inputData = MemoryStream[Long]
+
+val aggregated =
+  inputData.toDF()
+.groupBy($"value")
+.agg(count("*"))
+.where('value >= current_timestamp().cast("long") - 10L)
+
+testStream(aggregated, Complete)(
+  StartStream(ProcessingTime("10 seconds"), triggerClock = clock),
+
+  // advance clock to 10 seconds
+  AddData(inputData, 0L, 5L, 5L, 10L),
+  AdvanceManualClock(10 * 1000),
+  CheckLastBatch((0L, 1), (5L, 2), (10L, 1)),
+
+  // advance clock to 20 seconds, should retain keys >= 10
+  AddData(inputData, 15L, 15L, 20L),
+  AdvanceManualClock(10 * 1000),
+  CheckLastBatch((10L, 1), (15L, 2), (20L, 1)),
+
+  // advance clock to 30 seconds, should retain keys >= 20
+  AddData(inputData, 0L),
+  AdvanceManualClock(10 * 1000),
+  CheckLastBatch((20L, 1)),
+
+  // advance clock to 40 seconds, should retain keys >= 30
+  AddData(inputData, 25L, 30L, 40L, 45L),
+  AdvanceManualClock(10 * 1000),
+  CheckLastBatch((30L, 1), (40L, 1), (45L, 1))
+)
+  }
+
+
+  test("prune results by current_date, complete mode") {
+import testImplicits._
+import StreamingAggregationSuite._
+clock = new StreamManualClock
+val tz = TimeZone.getDefault.getID
+val inputData = MemoryStream[Long]
+val aggregated =
+  inputData.toDF()
+.select(to_utc_timestamp(from_unixtime('value * 
DateTimeUtils.SECONDS_PER_DAY), tz))
+.toDF("value")
+.groupBy($"value")
+.agg(count("*"))
+// .select('value, date_sub(current_date(), 
10).cast("timestamp").alias("t"))
+// .select('value, 't, 'value >= 't)
--- End diff --

please remove these lines


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...

2016-11-28 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15949#discussion_r89745152
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamExecutionMetadataSuite.scala
 ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import java.io.File
+
+import org.apache.spark.sql.{AnalysisException, Row}
+import org.apache.spark.sql.execution.streaming.{MemoryStream, 
StreamExecutionMetadata}
+import org.apache.spark.sql.functions._
+import org.apache.spark.util.{SystemClock, Utils}
+
+class StreamExecutionMetadataSuite extends StreamTest {
+
+  private def newMetadataDir =
+Utils.createTempDir(namePrefix = "streaming.metadata").getCanonicalPath
+
+  test("stream execution metadata") {
+assert(StreamExecutionMetadata(0, 0) ===
+  StreamExecutionMetadata("""{}"""))
+assert(StreamExecutionMetadata(1, 0) ===
+  StreamExecutionMetadata("""{"batchWatermarkMs":1}"""))
+assert(StreamExecutionMetadata(0, 2) ===
+  StreamExecutionMetadata("""{"batchTimestampMs":2}"""))
+assert(StreamExecutionMetadata(1, 2) ===
+  StreamExecutionMetadata(
+"""{"batchWatermarkMs":1,"batchTimestampMs":2}"""))
+  }
+
+  test("metadata is recovered from log when query is restarted") {
+import testImplicits._
+val clock = new SystemClock()
+val ms = new MemoryStream[Long](0, sqlContext)
+val df = ms.toDF().toDF("a")
+val checkpointLoc = newMetadataDir
+val checkpointDir = new File(checkpointLoc, "complete")
+checkpointDir.mkdirs()
+assert(checkpointDir.exists())
+val tableName = "test"
+// Query that prunes timestamps less than current_timestamp, making
+// it easy to use for ensuring that a batch is re-processed with the
+// timestamp used when it was first processed.
+def startQuery: StreamingQuery = {
+  df.groupBy("a")
+.count()
+.where('a >= current_timestamp().cast("long"))
+.writeStream
+.format("memory")
+.queryName(tableName)
+.option("checkpointLocation", checkpointLoc)
+.outputMode("complete")
+.start()
+}
+// no exception here
+val t1 = clock.getTimeMillis() + 60L * 1000L
+val t2 = clock.getTimeMillis() + 60L * 1000L + 1000L
+val q = startQuery
+ms.addData(t1, t2)
+q.processAllAvailable()
+
+checkAnswer(
+  spark.table(tableName),
+  Seq(Row(t1, 1), Row(t2, 1))
+)
+
+q.stop()
+Thread.sleep(60L * 1000L + 5000L) // Expire t1 and t2
+assert(t1 < clock.getTimeMillis())
+assert(t2 < clock.getTimeMillis())
+
+spark.sql(s"drop table $tableName")
+
+// verify table is dropped
+intercept[AnalysisException](spark.table(tableName).collect())
+val q2 = startQuery
+q2.processAllAvailable()
+checkAnswer(
+  spark.table(tableName),
+  Seq(Row(t1, 1), Row(t2, 1))
+)
+
+q2.stop()
+
--- End diff --

nit: extra line.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16030
  
**[Test build #69230 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69230/consoleFull)**
 for PR 16030 at commit 
[`6bd8b4c`](https://github.com/apache/spark/commit/6bd8b4cdb63b20bc292a5ec1d8ca38281ee5bfbf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16030
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69230/
Test PASSed.


[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16030
  
Merged build finished. Test PASSed.


[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-11-28 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16030
  
@brkyvz @tdas Could you check this?


[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...

2016-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15994
  
**[Test build #69231 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69231/consoleFull)** for PR 15994 at commit [`662acfb`](https://github.com/apache/spark/commit/662acfb9ab046842f0fbe2f9344dd3c0df12ad7a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15994
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69231/
Test PASSed.


[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15994
  
Merged build finished. Test PASSed.


[GitHub] spark issue #16027: [SPARK-18604][SQL] Make sure CollapseWindow returns the ...

2016-11-28 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/16027
  
Merging to master/2.1. Thanks for the review!


[GitHub] spark pull request #16027: [SPARK-18604][SQL] Make sure CollapseWindow retur...

2016-11-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16027


[GitHub] spark pull request #16031: [SPARK-18606][HISTORYSERVER]remove useless elemen...

2016-11-28 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on a diff in the pull request:

https://github.com/apache/spark/pull/16031#discussion_r89758734
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/historypage.js 
---
@@ -78,6 +78,12 @@ jQuery.extend( jQuery.fn.dataTableExt.oSort, {
 }
 } );
 
+jQuery.extend( jQuery.fn.dataTableExt.ofnSearch, {
+"appid-numeric": function ( a ) {
+return a.replace(/[\r\n]/g, " ").replace(/<.*?>/g, "");
--- End diff --

Refer to `jquery.dataTables.1.10.4.min.js`. I'd be happy to change it to a better 
style if there is one :)
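
To illustrate, the `appid-numeric` normalizer above makes a rendered cell searchable in two passes: collapse newlines to spaces, then strip HTML tags. A quick sketch of the same transform in Scala, with a hypothetical app-id cell:

```scala
// Hypothetical app-id cell as rendered in the history page table.
val cell = "<a href=\"/history/app-20161128-0001\">app-20161128-0001</a>\ncompleted"

// Same two regex passes as the JS above: newlines to spaces, then drop tags.
val searchable = cell.replaceAll("[\r\n]", " ").replaceAll("<.*?>", "")

assert(searchable == "app-20161128-0001 completed")
```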


[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...

2016-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15976
  
**[Test build #69234 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69234/consoleFull)** for PR 15976 at commit [`6db5af9`](https://github.com/apache/spark/commit/6db5af95e456d6529a37c243f41a4632a69f40d0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15976
  
Merged build finished. Test PASSed.


[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15976
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69234/
Test PASSed.


[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...

2016-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15780
  
**[Test build #69232 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69232/consoleFull)** for PR 15780 at commit [`2a1287a`](https://github.com/apache/spark/commit/2a1287a84cb303a8df9f8c310aad154e04b6b4d4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class LambdaVariable(value: String, isNull: String, dataType: DataType,`


[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15780
  
Merged build finished. Test PASSed.


[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15780
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69232/
Test PASSed.


[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.

2016-11-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14136
  
**[Test build #69233 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69233/consoleFull)** for PR 14136 at commit [`3c699ad`](https://github.com/apache/spark/commit/3c699adfee609781c1e4ce2c08493308f5e7f511).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14136
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69233/
Test PASSed.


[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14136
  
Merged build finished. Test PASSed.


[GitHub] spark pull request #16033: SPARK-18607 get a result on a percent of the task...

2016-11-28 Thread Ru-Xiang
GitHub user Ru-Xiang opened a pull request:

https://github.com/apache/spark/pull/16033

SPARK-18607 get a result on a percent of the tasks succeed

## What changes were proposed in this pull request?

In this patch, we modify the code around `runApproximateJob` so that a result can 
be returned once a specified percentage of tasks has succeeded.
In a production environment, the 'long tail' of straggler tasks is a common and 
pressing problem. In practice, as long as we can get the results of a specified 
percentage of tasks, the final result is good enough, and this is a common 
requirement when running machine learning algorithms.
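
For comparison, the approximate-result machinery that exists today triggers on a wall-clock timeout rather than a fraction of succeeded tasks. A sketch against the current public `countApprox` API (the timeout and confidence values are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("approx-demo").setMaster("local[4]"))
val rdd = sc.parallelize(1L to 10000000L, numSlices = 100)

// Today: return whatever partial count is available after 2 seconds,
// together with a 95% confidence interval around the estimate.
val partial = rdd.countApprox(timeout = 2000, confidence = 0.95)
println(partial.initialValue)  // a BoundedDouble estimate with [low, high] bounds

sc.stop()
```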
## How was this patch tested?

We compiled the code with dev/make-distribution.sh, deployed it on a cluster, ran 
a test reduce job on it, and got the desired results.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Ru-Xiang/spark my_change

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16033.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16033






[GitHub] spark issue #16033: SPARK-18607 get a result on a percent of the tasks succe...

2016-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16033
  
Can one of the admins verify this patch?

