[GitHub] spark issue #16011: [SPARK-18587][ML] Remove handleInvalid from QuantileDisc...
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/16011 As far as I recall, the idea is that `Bucketizer` can be used standalone, and because `QuantileDiscretizer` itself produces the same thing as a bucketizer, it was used as the model rather than adding a dedicated `QuantileDiscretizerModel`. `Bucketizer` is already a separate transformer (it is not required to be produced by a `QuantileDiscretizer`), since it's a `Model` and its constructor is public (by design). So it can be used in a pipeline on its own, and the `splits` param could be selected via cross-validation (for example). What you propose here makes it impossible to use `QuantileDiscretizer` with a non-default `handleInvalid` param together with cross-validation. In addition, as you've pointed out in your code example above, this would force a pretty clunky workaround to set the `handleInvalid` param in a pipeline. Why do this? What is the actual problem with what exists currently? To me it seems better the way it is. Also, I don't see any major benefit to adding a new `QuantileDiscretizerModel`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
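For context on the discussion above: the bucketing a `Bucketizer` applies is just an interval lookup against its `splits` param, which is why the param is a natural target for cross-validation. A pure-Python sketch of those semantics (the `bucketize` helper is illustrative only, not the Spark API):

```python
import bisect
import math

def bucketize(value, splits):
    """Map a value to a bucket index given sorted split points.

    Buckets are [splits[i], splits[i+1]); the last bucket is closed on
    the right, mirroring Spark Bucketizer semantics. NaN handling is
    what the handleInvalid param governs, so it is left as an error here.
    """
    if math.isnan(value):
        raise ValueError("NaN encountered; handleInvalid decides this case")
    if value < splits[0] or value > splits[-1]:
        raise ValueError(f"value {value} outside splits range")
    if value == splits[-1]:
        return len(splits) - 2  # last bucket is closed on the right
    return bisect.bisect_right(splits, value) - 1

splits = [float("-inf"), 0.0, 10.0, float("inf")]
print(bucketize(-5.0, splits))  # 0
print(bucketize(3.0, splits))   # 1
print(bucketize(42.0, splits))  # 2
```

Selecting `splits` via cross-validation then just means evaluating a pipeline per candidate splits array, which is the standalone use the comment defends.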
[GitHub] spark issue #15995: [SPARK-18566][SQL] remove OverwriteOptions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15995 Merged build finished. Test FAILed.
[GitHub] spark issue #15995: [SPARK-18566][SQL] remove OverwriteOptions
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15995 **[Test build #69220 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69220/consoleFull)** for PR 15995 at commit [`354a860`](https://github.com/apache/spark/commit/354a8605b5e539341f67f59ea507cc6f07a23eb3). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15995: [SPARK-18566][SQL] remove OverwriteOptions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15995 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69220/ Test FAILed.
[GitHub] spark pull request #16008: [SPARK-18585][SQL] Use `ev.isNull = "false"` if p...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16008
[GitHub] spark issue #16008: [SPARK-18585][SQL] Use `ev.isNull = "false"` if possible...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16008 Thanks - merging in master/branch-2.1. We should look into the Janino change in the future.
[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16013 I think it is ready to be reviewed - @srowen. Thank you for your close look.
[GitHub] spark issue #16008: [SPARK-18585][SQL] Use `ev.isNull = "false"` if possible...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/16008 Hmm, it is a great idea, but I think it would be very hard to submit such a patch (at least for me now).
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89727420 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala --- @@ -40,14 +40,9 @@ case class JdbcType(databaseTypeDefinition : String, jdbcNullType : Int) * SQL dialect of a certain database or jdbc driver. * Lots of databases define types that aren't explicitly supported * by the JDBC spec. Some JDBC drivers also report inaccurate - * information---for instance, - * - * {{{ - * BIT(n>1) - * }}} - * - * being reported as a BIT type is quite common, even though BIT in JDBC is meant for single-bit - * values. Also, there does not appear to be a standard name for an unbounded string or binary + * information---for instance, BIT(n{@literal >}1) being reported as a BIT type is quite --- End diff -- - Java ![2016-11-28 4 20 51](https://cloud.githubusercontent.com/assets/6477701/20659399/b19d2c4c-b586-11e6-9098-f56c75676711.png) - Scala ![2016-11-28 4 21 03](https://cloud.githubusercontent.com/assets/6477701/20659401/b3f8dfa4-b586-11e6-9792-e8a8a1f85ca3.png)
[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/15994#discussion_r89727312 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala --- @@ -437,4 +444,38 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) { case v => throw new IllegalArgumentException( s"Unsupported value type ${v.getClass.getName} ($v).") } + + /** + * (Scala-specific) Returns a new [[DataFrame]] that replaces null or NaN values in specified + * numeric, bool, string columns. If a specified column is not a numeric, boolean, string column, + * it is ignored. + * + * @since 2.1.0 + */ + private def fill1[T](value: T, cols: Seq[String]): DataFrame = { +value match { + case _: jl.Double | _: jl.Integer | _: jl.Float | _: jl.Boolean | _: jl.Long | _: String => + case _ => +throw new IllegalArgumentException( + s"Unsupported value type ${value.getClass.getName} ($value).") +} + +val targetColumnType = value match { + case _: jl.Double | _: jl.Integer | _: jl.Float | _: jl.Long => NumericType + case _: jl.Boolean => BooleanType + case _: String => StringType +} + +val columnEquals = df.sparkSession.sessionState.analyzer.resolver +val projections = df.schema.fields.map { f => + // Only fill if the column is part of the cols list. + if (((f.dataType.isInstanceOf[NumericType] && targetColumnType == NumericType) --- End diff -- Thanks! I have addressed everything except one point: if T is a Double, should this apply to all numeric columns (including LongType/IntegerType), or only to fractional columns? fill(value: Double) applies to all numeric columns, and I think fill(value: Long) should keep the same logic.
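The semantics under discussion — fill only those requested columns whose declared type matches the fill value's type family, and ignore the rest — can be sketched in plain Python over a dict-of-lists "frame". This is an illustrative paraphrase of the PR's logic, not the Spark implementation; all names here are made up:

```python
import math

# Illustrative type families, mirroring the PR's targetColumnType match.
NUMERIC, BOOLEAN, STRING = "numeric", "boolean", "string"

def type_family(value):
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return BOOLEAN
    if isinstance(value, (int, float)):
        return NUMERIC
    if isinstance(value, str):
        return STRING
    raise ValueError(f"Unsupported value type {type(value).__name__} ({value})")

def fill(frame, schema, value, cols):
    """Replace None (and NaN, for numeric columns) in the columns listed in
    `cols` whose declared type matches the fill value's family; all other
    columns are returned untouched."""
    family = type_family(value)
    out = {}
    for name, data in frame.items():
        if name in cols and schema[name] == family:
            out[name] = [
                value
                if v is None or (family == NUMERIC
                                 and isinstance(v, float) and math.isnan(v))
                else v
                for v in data
            ]
        else:
            out[name] = list(data)
    return out

frame = {"a": [1.0, float("nan"), None], "b": ["x", None, "y"]}
schema = {"a": NUMERIC, "b": STRING}
print(fill(frame, schema, 0.0, ["a", "b"]))  # only "a" is filled
```

The open question in the comment maps to whether an integral fill value's family should span all numeric columns or only integral ones.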
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89727297 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -41,7 +41,7 @@ private[ml] trait VectorIndexerParams extends Params with HasInputCol with HasOu /** * Threshold for the number of values a categorical feature can take. - * If a feature is found to have greater than maxCategories values, then it is declared + * If a feature is found to have {@literal >} maxCategories values, then it is declared --- End diff -- Scaladoc/javadoc not found.
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89727185 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDDCheckpointData.scala --- @@ -24,9 +24,7 @@ import org.apache.spark.Partition /** * Enumeration to manage state transitions of an RDD through checkpointing * - * {{{ - * [ Initialized --> checkpointing in progress --> checkpointed ] - * }}} + * [ Initialized --{@literal >} checkpointing in progress --{@literal >} checkpointed ] --- End diff -- - Java ![2016-11-28 3 41 11](https://cloud.githubusercontent.com/assets/6477701/20658544/1c962f72-b581-11e6-9126-1b0a6fc8354a.png) Scaladoc not found.
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89727091 --- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala --- @@ -422,13 +422,8 @@ private[spark] object UIUtils extends Logging { * the whole string will rendered as a simple escaped text. * * Note: In terms of security, only anchor tags with root relative links are supported. So any - * attempts to embed links outside Spark UI, or other tags like - * - * {{{ - *
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89727059 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/SQLTransformer.scala --- @@ -33,9 +33,9 @@ import org.apache.spark.sql.types.StructType * use Spark SQL built-in function and UDFs to operate on these selected columns. * For example, [[SQLTransformer]] supports statements like: * {{{ - * - SELECT a, a + b AS a_b FROM __THIS__ - * - SELECT a, SQRT(b) AS b_sqrt FROM __THIS__ where a > 5 - * - SELECT a, b, SUM(c) AS c_sum FROM __THIS__ GROUP BY a, b + * SELECT a, a + b AS a_b FROM __THIS__ + * SELECT a, SQRT(b) AS b_sqrt FROM __THIS__ where a > 5 + * SELECT a, b, SUM(c) AS c_sum FROM __THIS__ GROUP BY a, b --- End diff -- - Java ![2016-11-28 2 28 18](https://cloud.githubusercontent.com/assets/6477701/20657138/f208e7ae-b576-11e6-82ee-bb709ee03c2f.png) - Scala ![2016-11-28 2 28 24](https://cloud.githubusercontent.com/assets/6477701/20657139/f35c7242-b576-11e6-9a4e-1b68943ad4b1.png)
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89727120 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -41,9 +41,9 @@ private[spark] class JdbcPartition(idx: Int, val lower: Long, val upper: Long) e * The RDD takes care of closing the connection. * @param sql the text of the query. * The query must contain two ? placeholders for parameters used to partition the results. - * + * For example, * {{{ - * E.g. "select title, author from books where ? <= id and id <= ?" + * select title, author from books where ? <= id and id <= ? * }}} --- End diff -- - Java ![2016-11-28 2 21 59](https://cloud.githubusercontent.com/assets/6477701/20657043/29e4d314-b576-11e6-9946-fc6502920d4b.png) - Scala ![2016-11-28 2 22 34](https://cloud.githubusercontent.com/assets/6477701/20657044/29eaa686-b576-11e6-86cd-c0fcb9449e0f.png)
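For readers unfamiliar with the JdbcRDD contract above: the two `?` placeholders are bound to each partition's lower and upper id bound. The same pattern can be exercised with the standard-library sqlite3 module (a standalone sketch, not JdbcRDD itself; table contents are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER, title TEXT, author TEXT)")
conn.executemany("INSERT INTO books VALUES (?, ?, ?)",
                 [(i, f"title{i}", f"author{i}") for i in range(1, 7)])

# The query from the doc comment, with its two partition-bound placeholders.
sql = "select title, author from books where ? <= id and id <= ?"

# JdbcRDD-style partitioning: split the id range [1, 6] into two
# inclusive sub-ranges, one query per partition.
partitions = [(1, 3), (4, 6)]
for lower, upper in partitions:
    rows = conn.execute(sql, (lower, upper)).fetchall()
    print(lower, upper, rows)
```

Each partition's query is independent, which is what lets the real RDD fetch ranges in parallel across executors.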
[GitHub] spark issue #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for unidoc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16013 **[Test build #69228 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69228/consoleFull)** for PR 16013 at commit [`d2c6e86`](https://github.com/apache/spark/commit/d2c6e8606fd61e21f5bbe9bee4f70b7599b525f4).
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15994 **[Test build #69229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69229/consoleFull)** for PR 15994 at commit [`2043283`](https://github.com/apache/spark/commit/2043283f84fe046aa80232f3921918a176b06540).
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89726626 --- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala --- @@ -153,11 +153,9 @@ class DoubleRDDFunctions(self: RDD[Double]) extends Logging with Serializable { /** * Compute a histogram using the provided buckets. The buckets are all open * to the right except for the last which is closed. - * {{{ * e.g. for the array * [1, 10, 20, 50] the buckets are [1, 10) [10, 20) [20, 50] - * e.g 1<=x<10 , 10<=x<20, 20<=x<=50 - * }}} + * e.g {@code <=x<10, 10<=x<20, 20<=x<=50} --- End diff -- - Java ![2016-11-28 3 03 44](https://cloud.githubusercontent.com/assets/6477701/20657813/4176191a-b57c-11e6-92b5-72e88667354f.png) - Scala ![2016-11-28 3 03 31](https://cloud.githubusercontent.com/assets/6477701/20657814/4177599c-b57c-11e6-83b1-83f98a57ef2a.png)
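The bucket semantics the doc comment describes — every interval half-open to the right except the last, which is closed — can be checked with a small Python sketch (illustrative only, not Spark's implementation):

```python
def histogram(values, buckets):
    """Count values per bucket for sorted bucket boundaries.

    Buckets are [b0, b1), [b1, b2), ..., with the last interval
    [b_{n-1}, b_n] closed on the right, so the maximum value counts.
    """
    counts = [0] * (len(buckets) - 1)
    for v in values:
        if v == buckets[-1]:
            counts[-1] += 1  # last bucket is closed: e.g. 20 <= x <= 50
            continue
        for i in range(len(buckets) - 1):
            if buckets[i] <= v < buckets[i + 1]:
                counts[i] += 1
                break
    return counts

# For boundaries [1, 10, 20, 50] the buckets are [1,10) [10,20) [20,50]:
print(histogram([1, 9, 10, 20, 50], [1, 10, 20, 50]))  # [2, 1, 2]
```

Closing the last bucket is what keeps the array maximum (50 here) from falling outside every interval.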
[GitHub] spark issue #16029: [MINOR][ML] Remove duplicate import in GLR
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16029 **[Test build #69227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69227/consoleFull)** for PR 16029 at commit [`2e01a62`](https://github.com/apache/spark/commit/2e01a622ce06a2d92390d3e32da145c556231520).
[GitHub] spark pull request #16029: [MINOR][ML] Remove duplicate import in GLR
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/16029 [MINOR][ML] Remove duplicate import in GLR ## What changes were proposed in this pull request? there were two `import GeneralizedLinearRegression._` in trait GLR.GeneralizedLinearRegressionBase ## How was this patch tested? existing tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhengruifeng/spark del_duplicate_import Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16029.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16029 commit 2e01a622ce06a2d92390d3e32da145c556231520 Author: Zheng RuiFeng Date: 2016-11-28T07:04:29Z create pr
[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16028 **[Test build #69226 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69226/consoleFull)** for PR 16028 at commit [`ae74a3e`](https://github.com/apache/spark/commit/ae74a3e3272e8a9e40cc3225f65ae80e87e7e0ed). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16028 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69226/ Test FAILed.
[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16028 Merged build finished. Test FAILed.
[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16028 **[Test build #69226 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69226/consoleFull)** for PR 16028 at commit [`ae74a3e`](https://github.com/apache/spark/commit/ae74a3e3272e8a9e40cc3225f65ae80e87e7e0ed).
[GitHub] spark issue #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskSetManag...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15986 Merged build finished. Test PASSed.
[GitHub] spark issue #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskSetManag...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15986 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69218/ Test PASSed.
[GitHub] spark pull request #16028: [SPARK-18518][ML] HasSolver supports override
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/16028 [SPARK-18518][ML] HasSolver supports override ## What changes were proposed in this pull request? 1, make param support non-final with `finalFields` option 2, generate `HasSolver` with `finalFields = false` 3, override `solver` in LiR, GLR, and make MLPC inherit `HasSolver` ## How was this patch tested? existing tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhengruifeng/spark param_non_final Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16028.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16028 commit 58349ca56267350241b9714810aa1411dd3a5d71 Author: Zheng RuiFeng Date: 2016-11-25T11:18:55Z create pr commit ae74a3e3272e8a9e40cc3225f65ae80e87e7e0ed Author: Zheng RuiFeng Date: 2016-11-28T06:16:15Z create pr
[GitHub] spark issue #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskSetManag...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15986 **[Test build #69218 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69218/consoleFull)** for PR 15986 at commit [`9c6ce7e`](https://github.com/apache/spark/commit/9c6ce7e8ceadcdae3ce36a147aac7cf680d5a86f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89724422 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDDCheckpointData.scala --- @@ -23,7 +23,8 @@ import org.apache.spark.Partition /** * Enumeration to manage state transitions of an RDD through checkpointing - * [ Initialized --> checkpointing in progress --> checkpointed ]. + * + * [ Initialized --{@literal >} checkpointing in progress --{@literal >} checkpointed ] --- End diff -- - Java ![2016-11-28 3 41 11](https://cloud.githubusercontent.com/assets/6477701/20658544/1c962f72-b581-11e6-9126-1b0a6fc8354a.png) Scaladoc not found.
[GitHub] spark issue #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for unidoc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16013 **[Test build #69225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69225/consoleFull)** for PR 16013 at commit [`7b13fad`](https://github.com/apache/spark/commit/7b13fad10fe93a8ee2c6f84626209d98745dc313).
[GitHub] spark issue #15983: [SPARK-18544] [SQL] Append with df.saveAsTable writes da...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15983 **[Test build #69223 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69223/consoleFull)** for PR 15983 at commit [`ca75331`](https://github.com/apache/spark/commit/ca753311a6d61452d7c29a349b8c34e66998f5ee).
[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15976 **[Test build #69224 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69224/consoleFull)** for PR 15976 at commit [`6db5af9`](https://github.com/apache/spark/commit/6db5af95e456d6529a37c243f41a4632a69f40d0).
[GitHub] spark issue #15979: [SPARK-18251][SQL] the type of Dataset can't be Option o...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15979 retest it please
[GitHub] spark issue #15983: [SPARK-18544] [SQL] Append with df.saveAsTable writes da...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15983 retest this please
[GitHub] spark issue #15837: [SPARK-18395][SQL] Evaluate common subexpression like la...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15837 @kiszk Do you mean to avoid subexpression elimination?
[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15976 retest this please
[GitHub] spark issue #15975: [SPARK-18538] [SQL] Fix Concurrent Table Fetching Using ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15975 LGTM except https://github.com/apache/spark/pull/15975/files#r89722356, what's the status of it?
[GitHub] spark pull request #15975: [SPARK-18538] [SQL] Fix Concurrent Table Fetching...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15975#discussion_r89722356 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -404,6 +425,7 @@ class JDBCSuite extends SparkFunSuite numPartitions = 0, --- End diff -- it's merged, has it been fixed?
[GitHub] spark issue #15837: [SPARK-18395][SQL] Evaluate common subexpression like la...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15837 @cloud-fan Sure, no problem.
[GitHub] spark issue #15837: [SPARK-18395][SQL] Evaluate common subexpression like la...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15837 Sorry for the delay, but I may not have time to review it before the 2.1 release, can you hold it off until 2.1 release? thanks!
[GitHub] spark issue #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for unidoc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16013 **[Test build #69222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69222/consoleFull)** for PR 16013 at commit [`0e6ed2b`](https://github.com/apache/spark/commit/0e6ed2b5098af4c5d2abbdeca6e2ed45523e00e5).
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89721863 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -405,13 +406,14 @@ case class WrapOption(child: Expression, optType: DataType) * A place holder for the loop variable used in [[MapObjects]]. This should never be constructed * manually, but will instead be passed into the provided lambda function. */ -case class LambdaVariable(value: String, isNull: String, dataType: DataType) extends LeafExpression +case class LambdaVariable(value: String, isNull: String, dataType: DataType, +valueNullable: Boolean = true) extends LeafExpression --- End diff -- I meant that we could use the parameter name `nullable` like:

```scala
case class LambdaVariable(value: String, isNull: String, dataType: DataType, nullable: Boolean = true)
  extends LeafExpression
```

and remove `override def nullable: Boolean = valueNullable`.
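The naming suggestion above relies on a Scala feature worth making explicit: a case-class constructor parameter is a `val`, and a `val` can directly implement (or override) a parent's parameterless `def`, so no forwarding method is needed. The following standalone sketch illustrates that mechanism with a simplified stand-in trait; `Expr` and `LambdaVar` are illustrative names, not Spark's actual classes.

```scala
// Minimal illustration: a constructor parameter named `nullable`
// satisfies the abstract `def nullable` in the parent trait,
// so no separate `override def nullable = ...` forwarder is needed.
trait Expr {
  def nullable: Boolean
}

// The `nullable` parameter implements Expr.nullable directly.
case class LambdaVar(value: String, isNull: String, nullable: Boolean = true)
  extends Expr

object LambdaVarDemo {
  def main(args: Array[String]): Unit = {
    assert(LambdaVar("v", "n").nullable)                     // default is true
    assert(!LambdaVar("v", "n", nullable = false).nullable)  // overridable at call site
    println("ok")
  }
}
```

This is the same pattern the reviewer proposes for `LambdaVariable`: rename `valueNullable` to `nullable` and let the parameter itself serve as the override.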
[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15780 **[Test build #69221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69221/consoleFull)** for PR 15780 at commit [`214c6bb`](https://github.com/apache/spark/commit/214c6bb2d7aaf773d01a846795eb78f1e07e4ed1).
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89721435 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -177,9 +177,10 @@ case class Invoke( functionName: String, dataType: DataType, arguments: Seq[Expression] = Nil, -propagateNull: Boolean = true) extends InvokeLike { +propagateNull: Boolean = true, +returnNullable : Boolean = true) extends InvokeLike { - override def nullable: Boolean = true + override def nullable: Boolean = targetObject.nullable || returnNullable --- End diff -- i see. done
[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15994#discussion_r89721405 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala --- @@ -437,4 +444,38 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) { case v => throw new IllegalArgumentException( s"Unsupported value type ${v.getClass.getName} ($v).") } + + /** + * (Scala-specific) Returns a new [[DataFrame]] that replaces null or NaN values in specified + * numeric, bool, string columns. If a specified column is not a numeric, boolean, string column, + * it is ignored. + * + * @since 2.1.0 + */ + private def fill1[T](value: T, cols: Seq[String]): DataFrame = { +value match { + case _: jl.Double | _: jl.Integer | _: jl.Float | _: jl.Boolean | _: jl.Long | _: String => + case _ => +throw new IllegalArgumentException( + s"Unsupported value type ${value.getClass.getName} ($value).") +} + +val targetColumnType = value match { --- End diff -- nit: we can combine the check here:

```
val targetType = value match {
  case _: Long => LongType
  case _: Double => DoubleType
  case _: String => StringType
  case _ => throw ...
}
```
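The review comment's point is that a single pattern match can both validate the value's type and select the target type, instead of a separate validation match followed by a mapping match. A self-contained sketch of that shape follows; the `TargetType` objects and `targetTypeOf` helper are hypothetical stand-ins for illustration, not Spark's `DataType` hierarchy or `DataFrameNaFunctions` code.

```scala
// Sketch of the "combine the check" suggestion: one match both
// validates the fill value and picks its target type; unsupported
// values fall through to the IllegalArgumentException case.
object FillValueCheck {
  sealed trait TargetType
  case object LongType extends TargetType
  case object DoubleType extends TargetType
  case object StringType extends TargetType
  case object BooleanType extends TargetType

  def targetTypeOf(value: Any): TargetType = value match {
    case _: java.lang.Long | _: java.lang.Integer => LongType
    case _: java.lang.Double | _: java.lang.Float => DoubleType
    case _: String                                => StringType
    case _: java.lang.Boolean                     => BooleanType
    case v => throw new IllegalArgumentException(
      s"Unsupported value type ${v.getClass.getName} ($v).")
  }

  def main(args: Array[String]): Unit = {
    assert(targetTypeOf(1L) == LongType)
    assert(targetTypeOf("a") == StringType)
    assert(targetTypeOf(true) == BooleanType)
    println("ok")
  }
}
```

The benefit is that validation and mapping cannot drift apart: any type accepted by the check necessarily has a target type, because they are the same match.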
[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15994#discussion_r89721142 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala --- @@ -437,4 +444,38 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) { case v => throw new IllegalArgumentException( s"Unsupported value type ${v.getClass.getName} ($v).") } + + /** + * (Scala-specific) Returns a new [[DataFrame]] that replaces null or NaN values in specified --- End diff -- we don't need `(Scala-specific)` and the `since` tag for private methods.
[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15994#discussion_r89721112 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala --- @@ -153,19 +168,20 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) { * (Scala-specific) Returns a new [[DataFrame]] that replaces null or NaN values in specified * numeric columns. If a specified column is not a numeric column, it is ignored. * + * @since 2.1.0 + */ + def fill(value: Long, cols: Seq[String]): DataFrame = { +fill1[Long](value, cols) --- End diff -- nit: `fill1(value, cols)` should work, scala has type inference.
[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/15994#discussion_r89721087 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala --- @@ -128,66 +128,49 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) { } /** - * Returns a new [[DataFrame]] that replaces null or NaN values in numeric columns with `value`. + * Returns a new [[DataFrame]] that replaces null or NaN values + * in numeric, boolean, string columns with `value`. * * @since 1.3.1 */ - def fill(value: Double): DataFrame = fill(value, df.columns) + def fill[T](value: T): DataFrame = fill(value, df.columns) --- End diff -- ok, thanks a lot ! I have put it as a private. @cloud-fan
[GitHub] spark pull request #16003: [SPARK-18482][SQL] make sure Spark can access the...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16003
[GitHub] spark issue #15995: [SPARK-18566][SQL] remove OverwriteOptions
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15995 **[Test build #69220 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69220/consoleFull)** for PR 15995 at commit [`354a860`](https://github.com/apache/spark/commit/354a8605b5e539341f67f59ea507cc6f07a23eb3).
[GitHub] spark issue #15995: [SPARK-18566][SQL] remove OverwriteOptions
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15995 @ericl you are right, I pushed a new commit to do `convertStaticPartitions` right before we convert `InsertIntoTable` to `InsertIntoHadoopFsRelation`, so the partitioning information won't be erased.
[GitHub] spark issue #16003: [SPARK-18482][SQL] make sure Spark can access the table ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16003 Merging in master/branch-2.1.
[GitHub] spark issue #16003: [SPARK-18482][SQL] make sure Spark can access the table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16003 Merged build finished. Test PASSed.
[GitHub] spark issue #16003: [SPARK-18482][SQL] make sure Spark can access the table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16003 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69217/ Test PASSed.
[GitHub] spark issue #16003: [SPARK-18482][SQL] make sure Spark can access the table ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16003 **[Test build #69217 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69217/consoleFull)** for PR 16003 at commit [`117f532`](https://github.com/apache/spark/commit/117f5321cac62f01a5726c308efaf7369a9cdc9d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89719801 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLS.scala --- @@ -53,8 +53,13 @@ private[spark] object NNLS { * projected gradient method. That is, find x minimising ||Ax - b||_2 given A^T A and A^T b. * * We solve the problem - * min_x 1/2 x^T ata x^T - x^T atb - * subject to x = 0 + * + * + *$$ + *min_x 1/2 x^T ata x^T - x^T atb + *$$ + * --- End diff -- - Java ![2016-11-28 2 32 13](https://cloud.githubusercontent.com/assets/6477701/20657212/799229f6-b577-11e6-9616-30a1e3f7ee1f.png) - Scala (not found but manually built after changing the access modifier) ![2016-11-28 2 03 57](https://cloud.githubusercontent.com/assets/6477701/20657201/5a8d37a8-b577-11e6-8af5-8ed07c65a0ac.png)
[GitHub] spark issue #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskSetManag...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15986 Merged build finished. Test FAILed.
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89719600 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/SQLTransformer.scala --- @@ -32,9 +32,11 @@ import org.apache.spark.sql.types.StructType * the output, it can be any select clause that Spark SQL supports. Users can also * use Spark SQL built-in function and UDFs to operate on these selected columns. * For example, [[SQLTransformer]] supports statements like: - * - SELECT a, a + b AS a_b FROM __THIS__ - * - SELECT a, SQRT(b) AS b_sqrt FROM __THIS__ where a > 5 - * - SELECT a, b, SUM(c) AS c_sum FROM __THIS__ GROUP BY a, b + * {{{ + * SELECT a, a + b AS a_b FROM __THIS__ + * SELECT a, SQRT(b) AS b_sqrt FROM __THIS__ where a > 5 + * SELECT a, b, SUM(c) AS c_sum FROM __THIS__ GROUP BY a, b + * }}} --- End diff -- - Java ![2016-11-28 2 28 18](https://cloud.githubusercontent.com/assets/6477701/20657138/f208e7ae-b576-11e6-82ee-bb709ee03c2f.png) - Scala ![2016-11-28 2 28 24](https://cloud.githubusercontent.com/assets/6477701/20657139/f35c7242-b576-11e6-9a4e-1b68943ad4b1.png)
[GitHub] spark issue #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskSetManag...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15986 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69216/ Test FAILed.
[GitHub] spark issue #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskSetManag...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15986 **[Test build #69216 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69216/consoleFull)** for PR 15986 at commit [`3392903`](https://github.com/apache/spark/commit/3392903734bf5f00258f0652c971938846e64bcd).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89719290 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -41,7 +41,10 @@ private[spark] class JdbcPartition(idx: Int, val lower: Long, val upper: Long) e * The RDD takes care of closing the connection. * @param sql the text of the query. * The query must contain two ? placeholders for parameters used to partition the results. - * E.g. "select title, author from books where ? <= id and id <= ?" + * For example, + * {{{ + * select title, author from books where ? <= id and id <= ? + * }}} --- End diff -- - Java ![2016-11-28 2 21 59](https://cloud.githubusercontent.com/assets/6477701/20657043/29e4d314-b576-11e6-9946-fc6502920d4b.png) - Scala ![2016-11-28 2 22 34](https://cloud.githubusercontent.com/assets/6477701/20657044/29eaa686-b576-11e6-86cd-c0fcb9449e0f.png)
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89719223 --- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala --- @@ -152,10 +152,10 @@ class DoubleRDDFunctions(self: RDD[Double]) extends Logging with Serializable { /** * Compute a histogram using the provided buckets. The buckets are all open - * to the right except for the last which is closed + * to the right except for the last which is closed. * e.g. for the array * [1, 10, 20, 50] the buckets are [1, 10) [10, 20) [20, 50] - * e.g 1<=x<10 , 10<=x<20, 20<=x<=50 + * e.g {@code <=x<10 , 10<=x<20, 20<=x<=50} --- End diff -- - Java ![2016-11-28 2 20 56](https://cloud.githubusercontent.com/assets/6477701/20657020/f32869bc-b575-11e6-968b-59642b5edfc6.png) - Scala ![2016-11-28 2 21 14](https://cloud.githubusercontent.com/assets/6477701/20657021/f48b5cce-b575-11e6-9d0e-d399a539acb4.png)
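The doc comment under discussion describes a specific bucket semantics: for buckets `[1, 10, 20, 50]` the ranges are `[1, 10)`, `[10, 20)`, `[20, 50]`, i.e. half-open on the right except for the last bucket, which is closed. A small standalone sketch of that rule follows; `bucketOf` is an illustrative helper written for this example, not Spark's actual histogram implementation.

```scala
// Sketch of the bucket semantics described in the doc comment:
// each bucket i spans [buckets(i), buckets(i+1)), except the last,
// which also includes its right endpoint.
object BucketDemo {
  def bucketOf(buckets: Array[Double], x: Double): Option[Int] = {
    val last = buckets.length - 2  // index of the last bucket
    (0 to last).find { i =>
      val upperOk =
        if (i == last) x <= buckets(i + 1)  // last bucket is closed
        else x < buckets(i + 1)             // others are open on the right
      x >= buckets(i) && upperOk
    }
  }

  def main(args: Array[String]): Unit = {
    val b = Array(1.0, 10.0, 20.0, 50.0)
    assert(bucketOf(b, 1.0).contains(0))   // 1 <= x < 10
    assert(bucketOf(b, 10.0).contains(1))  // boundary goes to the right bucket
    assert(bucketOf(b, 50.0).contains(2))  // 50 is included in the last bucket
    assert(bucketOf(b, 51.0).isEmpty)      // outside all buckets
    println("ok")
  }
}
```

This also makes visible why the javadoc escaping matters: the `<=` and `<` characters in the original comment are exactly what javadoc 8 rejects without `{@code ...}`.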
[GitHub] spark issue #10942: [SPARK-12850] [SQL] Support Bucket Pruning (Predicate Pu...
Github user yucai commented on the issue: https://github.com/apache/spark/pull/10942 @gatorsmile, seems like getBuckets() is removed by the PR below, which makes this feature no longer work, could you kindly help check? [SPARK-14535][SQL] Remove buildInternalScan from FileFormat Much thanks!
[GitHub] spark issue #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for unidoc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16013 **[Test build #69219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69219/consoleFull)** for PR 16013 at commit [`29d65cc`](https://github.com/apache/spark/commit/29d65cce3e5f2e29010609c9323cd79ca889b9f8).
[GitHub] spark issue #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for unidoc...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16013 Let me leave some images that I changed and some comments to double check.
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89718720

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -57,9 +57,17 @@ import org.apache.spark.util.SerializableJobConf
  * @param partition a map from the partition key to the partition value (optional). If the partition
  *                  value is optional, dynamic partition insert will be performed.
  *                  As an example, `INSERT INTO tbl PARTITION (a=1, b=2) AS ...` would have
- *                  Map('a' - Some('1'), 'b' - Some('2')),
+ *
+ *                  {{{
+ *                  Map('a' -> Some('1'), 'b' -> Some('2'))
+ *                  }}},
+ *
  *                  and `INSERT INTO tbl PARTITION (a=1, b) AS ...`
- *                  would have Map('a' - Some('1'), 'b' - None).
+ *                  would have
+ *
+ *                  {{{
+ *                  Map('a' -> Some('1'), 'b' -> None)
+ *                  }}}.
--- End diff --

Fixed.
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89718517

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -177,9 +177,10 @@ case class Invoke(
     functionName: String,
     dataType: DataType,
     arguments: Seq[Expression] = Nil,
-    propagateNull: Boolean = true) extends InvokeLike {
+    propagateNull: Boolean = true,
+    returnNullable : Boolean = true) extends InvokeLike {

-  override def nullable: Boolean = true
+  override def nullable: Boolean = targetObject.nullable || returnNullable
--- End diff --

Yes, that's right. When `needNullCheck == true`, i.e. `propagateNull && arguments.exists(_.nullable)`, if there is a null argument, `Invoke` returns null.
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89718401

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -177,9 +177,10 @@ case class Invoke(
     functionName: String,
     dataType: DataType,
     arguments: Seq[Expression] = Nil,
-    propagateNull: Boolean = true) extends InvokeLike {
+    propagateNull: Boolean = true,
+    returnNullable : Boolean = true) extends InvokeLike {

-  override def nullable: Boolean = true
+  override def nullable: Boolean = targetObject.nullable || returnNullable
--- End diff --

OK, I see. What you want is to have `Invoke` be null if any input argument is null, no matter what the invoked method returns.
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89718250

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -177,9 +177,10 @@ case class Invoke(
     functionName: String,
     dataType: DataType,
     arguments: Seq[Expression] = Nil,
-    propagateNull: Boolean = true) extends InvokeLike {
+    propagateNull: Boolean = true,
+    returnNullable : Boolean = true) extends InvokeLike {

-  override def nullable: Boolean = true
+  override def nullable: Boolean = targetObject.nullable || returnNullable
--- End diff --

When `returnNullable` is false, meaning the invoked method doesn't return null, and `targetObject` is also not null, why is `Invoke.nullable` true?
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89718035

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -177,9 +177,10 @@ case class Invoke(
     functionName: String,
     dataType: DataType,
     arguments: Seq[Expression] = Nil,
-    propagateNull: Boolean = true) extends InvokeLike {
+    propagateNull: Boolean = true,
+    returnNullable : Boolean = true) extends InvokeLike {

-  override def nullable: Boolean = true
+  override def nullable: Boolean = targetObject.nullable || returnNullable
--- End diff --

What do you think the behavior of `Invoke` should be when `returnNullable == false && needNullCheck == true`? IMO, `returnNullable` is for the called method itself, not for `Invoke`. I think there will be cases where we want to propagate null even if the method won't return a null value.
[GitHub] spark issue #16011: [SPARK-18587][ML] Remove handleInvalid from QuantileDisc...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16011 @MLnick Yeah, I think this is the most common case: copying params from estimators to models. However, I also found some algorithms that do not comply with this rule, such as ```ALS```, which has ```ALSParams``` and ```ALSModelParams``` for the estimator and model separately. I think we can set params on models without going through the estimator, for example:
```
val discretizer = new QuantileDiscretizer()
val pipeline = new Pipeline().setStages(Array(discretizer))
val model = pipeline.fit(df)
model.stages(0).asInstanceOf[Bucketizer].setHandleInvalid("skip")
```
I know this way is a little tricky; a better way may be to have a ```QuantileDiscretizerModel``` which is produced by ```QuantileDiscretizer```. Thinking more about it, ```Bucketizer``` is a separate transformer which mainly has two params (```splits``` and ```handleInvalid```) that can be set. Users can provide candidates for these two params when doing cross-validation to select the best model. But if we constrain it to be produced by ```QuantileDiscretizer```, then ```splits``` would be a member variable of the model rather than a param. From this perspective, it makes more sense to see ```Bucketizer``` as a separate transformer.
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89716676

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -177,9 +177,10 @@ case class Invoke(
     functionName: String,
     dataType: DataType,
     arguments: Seq[Expression] = Nil,
-    propagateNull: Boolean = true) extends InvokeLike {
+    propagateNull: Boolean = true,
+    returnNullable : Boolean = true) extends InvokeLike {

-  override def nullable: Boolean = true
+  override def nullable: Boolean = targetObject.nullable || returnNullable
--- End diff --

If `returnNullable` is given as false by the caller, I think it means we are sure this `Invoke` won't return a null value, e.g., for a primitive type. And that should hold even when `needNullCheck` is true. In this case (`returnNullable == false`), the only way this `Invoke` can return null is if `targetObject` is nullable.
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15994 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69214/ Test PASSed.
[GitHub] spark pull request #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskS...
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/15986#discussion_r89716522

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -350,20 +350,16 @@ private[spark] class TaskSchedulerImpl(
           removeExecutor(execId, reason.get)
           failedExecutor = Some(execId)
         }
+      }
+      if (TaskState.isFinished(state)) {
+        cleanupTaskState(tid)
--- End diff --

I don't think that's necessary, because all access to the TSM is gated on the TaskSchedulerImpl, so even though the TaskResultGetter might do some stuff, the TSM's state won't be accessed until the later handleSuccessfulTask call to the TaskSchedulerImpl.
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15994 Merged build finished. Test PASSed.
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15994 **[Test build #69214 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69214/consoleFull)** for PR 15994 at commit [`508aaa0`](https://github.com/apache/spark/commit/508aaa0b68f049fb463f1334784b6417d739e816).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16027: Make sure CollapseWindow returns the attributes in the s...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16027 Could you add `[SPARK-18604][SQL]` before merging?
[GitHub] spark issue #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskSetManag...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15986 **[Test build #69218 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69218/consoleFull)** for PR 15986 at commit [`9c6ce7e`](https://github.com/apache/spark/commit/9c6ce7e8ceadcdae3ce36a147aac7cf680d5a86f).
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15994 Merged build finished. Test PASSed.
[GitHub] spark pull request #16027: Make sure CollapseWindow returns the attributes i...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16027#discussion_r89716395

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseWindowSuite.scala ---
@@ -46,12 +46,15 @@ class CollapseWindowSuite extends PlanTest {
       .window(Seq(sum(b).as('sum_b)), partitionSpec1, orderSpec1)
       .window(Seq(avg(b).as('avg_b)), partitionSpec1, orderSpec1)

-    val optimized = Optimize.execute(query.analyze)
+    val analyzed = query.analyze
+    val optimized = Optimize.execute(analyzed)
+    assert(analyzed.output === optimized.output)
+
     val correctAnswer = testRelation.window(Seq(
-      avg(b).as('avg_b),
-      sum(b).as('sum_b),
-      max(a).as('max_a),
-      min(a).as('min_a)), partitionSpec1, orderSpec1)
--- End diff --

While making this, I didn't notice this was strange. :(
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15994 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69213/ Test PASSed.
[GitHub] spark issue #16027: Make sure CollapseWindow returns the attributes in the s...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16027 LGTM. Thank you for correcting this.
[GitHub] spark pull request #16027: Make sure CollapseWindow returns the attributes i...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16027#discussion_r89716308

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -545,7 +545,7 @@ object CollapseRepartition extends Rule[LogicalPlan] {
 object CollapseWindow extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
     case w @ Window(we1, ps1, os1, Window(we2, ps2, os2, grandChild)) if ps1 == ps2 && os1 == os2 =>
-      w.copy(windowExpressions = we1 ++ we2, child = grandChild)
--- End diff --

Thank you for fixing this!
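The ordering issue under discussion can be illustrated with a small toy model (plain Scala, not Spark code; the attribute names are made up for illustration). A `Window` node's output is its child's output followed by its own window expressions, so when `Window(we1, Window(we2, child))` is collapsed into one node, concatenating `we1 ++ we2` reverses the two groups, while `we2 ++ we1` preserves the original attribute order:

```scala
// Toy model of collapsing two stacked Window nodes.
object CollapseOrder {
  def main(args: Array[String]): Unit = {
    val childOutput = Seq("a", "b")     // grandchild attributes
    val we2 = Seq("max_a", "min_a")     // inner window expressions
    val we1 = Seq("sum_b", "avg_b")     // outer window expressions

    // Output of the original plan Window(we1, Window(we2, child)):
    // the inner node emits child ++ we2, then the outer node appends we1.
    val original = childOutput ++ we2 ++ we1

    // Collapsed node with the old we1 ++ we2 concatenation:
    val collapsedOld = childOutput ++ (we1 ++ we2)
    // Collapsed node with we2 ++ we1:
    val collapsedNew = childOutput ++ (we2 ++ we1)

    assert(collapsedNew == original)   // order preserved
    assert(collapsedOld != original)   // order changed by the old rule
    println(collapsedNew.mkString(", "))
  }
}
```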
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15994 **[Test build #69213 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69213/consoleFull)** for PR 15994 at commit [`7117447`](https://github.com/apache/spark/commit/71174472e1d01be450162cd22843345e4d14b00c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskS...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/15986#discussion_r89716251

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -350,20 +350,16 @@ private[spark] class TaskSchedulerImpl(
           removeExecutor(execId, reason.get)
           failedExecutor = Some(execId)
         }
+      }
+      if (TaskState.isFinished(state)) {
+        cleanupTaskState(tid)
--- End diff --

I can do that, but we might want to make sure that `taskSet.removeRunningTask` is called prior to the `taskResultGetter` call.
[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Merged build finished. Test PASSed.
[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69215/ Test PASSed.
[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #69215 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69215/consoleFull)** for PR 15874 at commit [`e198080`](https://github.com/apache/spark/commit/e198080557c598286363184855a6f368d60b45e3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class ClusteringSummary(JavaWrapper):`
  * `class GaussianMixtureSummary(ClusteringSummary):`
  * `class BisectingKMeansSummary(ClusteringSummary):`
  * `trait CollectionGenerator extends Generator `
  * `case class Stack(children: Seq[Expression]) extends Generator `
  * `abstract class ExplodeBase extends UnaryExpression with CollectionGenerator with Serializable `
  * `case class Explode(child: Expression) extends ExplodeBase `
  * `case class PosExplode(child: Expression) extends ExplodeBase `
  * `case class Inline(child: Expression) extends UnaryExpression with CollectionGenerator `
  * `case class OuterReference(e: NamedExpression)`
  * `trait InvokeLike extends Expression with NonSQLExpression `
  * `case class ColumnStat(`
  * `case class UncacheTableCommand(`
  * `case class OffsetSeq(offsets: Seq[Option[Offset]], metadata: Option[String] = None) `
  * `case class SparkListenerDriverAccumUpdates(`
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89715587

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -590,7 +591,11 @@ object ScalaReflection extends ScalaReflection {
           "cannot be used as field name\n" + walkedTypePath.mkString("\n"))
       }

-      val fieldValue = Invoke(inputObject, fieldName, dataTypeFor(fieldType))
+      // primitive take only non-null or struct takes non-null object guarded by isNull
--- End diff --

Yes, I agree that this part is a little tricky. After waiting for other comments, I will rephrase the comment on Tuesday.
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89715546

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -590,7 +591,11 @@ object ScalaReflection extends ScalaReflection {
           "cannot be used as field name\n" + walkedTypePath.mkString("\n"))
       }

-      val fieldValue = Invoke(inputObject, fieldName, dataTypeFor(fieldType))
+      // primitive take only non-null or struct takes non-null object guarded by isNull
--- End diff --

I think we can only guarantee that the `inputObject` is not null in the false case, so we should use `AssertNotNull()` for `inputObject`, and the `fieldValue` will be like:
```scala
val fieldValue = Invoke(
  AssertNotNull(inputObject, walkedTypePath),
  fieldName,
  dataTypeFor(fieldType),
  returnNullable = !fieldType.typeSymbol.asClass.isPrimitive)
```
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89712321

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -177,9 +177,10 @@ case class Invoke(
     functionName: String,
    dataType: DataType,
     arguments: Seq[Expression] = Nil,
-    propagateNull: Boolean = true) extends InvokeLike {
+    propagateNull: Boolean = true,
+    returnNullable : Boolean = true) extends InvokeLike {

-  override def nullable: Boolean = true
+  override def nullable: Boolean = targetObject.nullable || returnNullable
--- End diff --

I think this would be `targetObject.nullable || needNullCheck || returnNullable`.
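The rule proposed here can be checked with a small self-contained model (plain Scala, independent of Spark; the names mirror `Invoke` but the classes are toys): the result can be null when the target is nullable, when null propagation may fire (`needNullCheck`), or when the called method itself may return null.

```scala
// Toy model of the Invoke nullability rule discussed in this thread.
object InvokeNullability {
  final case class Expr(nullable: Boolean)

  final case class Invoke(
      targetObject: Expr,
      arguments: Seq[Expr] = Nil,
      propagateNull: Boolean = true,
      returnNullable: Boolean = true) {
    // Null check is needed when null propagation is on and any argument may be null.
    def needNullCheck: Boolean = propagateNull && arguments.exists(_.nullable)
    // The rule: any of the three sources of null makes the result nullable.
    def nullable: Boolean = targetObject.nullable || needNullCheck || returnNullable
  }

  def main(args: Array[String]): Unit = {
    // Non-null target, method never returns null, no nullable args: not nullable.
    assert(!Invoke(Expr(false), Seq(Expr(false)), returnNullable = false).nullable)
    // Same, but one nullable argument triggers null propagation: nullable.
    assert(Invoke(Expr(false), Seq(Expr(true)), returnNullable = false).nullable)
    // Nullable target alone is enough: nullable.
    assert(Invoke(Expr(true), Nil, returnNullable = false).nullable)
  }
}
```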
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89713304

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -177,9 +177,10 @@ case class Invoke(
     functionName: String,
     dataType: DataType,
     arguments: Seq[Expression] = Nil,
-    propagateNull: Boolean = true) extends InvokeLike {
+    propagateNull: Boolean = true,
+    returnNullable : Boolean = true) extends InvokeLike {
--- End diff --

Add a `@param` document for `returnNullable`.
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89712401

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -405,13 +406,14 @@ case class WrapOption(child: Expression, optType: DataType)
 * A place holder for the loop variable used in [[MapObjects]]. This should never be constructed
 * manually, but will instead be passed into the provided lambda function.
 */
-case class LambdaVariable(value: String, isNull: String, dataType: DataType) extends LeafExpression
+case class LambdaVariable(value: String, isNull: String, dataType: DataType,
+    valueNullable: Boolean = true) extends LeafExpression
--- End diff --

We can use `nullable: Boolean` here.
[GitHub] spark issue #16027: Make sure CollapseWindow returns the attributes in the s...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16027 Oh, I missed this. Yep. I'll take a look at this, too.
[GitHub] spark pull request #16026: [SPARK-18597][SQL] Do push-down predicates to rig...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16026#discussion_r89715298

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala ---

```scala
@@ -514,6 +514,39 @@ class FilterPushdownSuite extends PlanTest {
     comparePlans(optimized, analysis.EliminateSubqueryAliases(correctAnswer))
   }

+  test("joins: push down where clause into left anti join") {
+    val x = testRelation.subquery('x)
+    val y = testRelation.subquery('y)
+    val originalQuery =
+      x.join(y, LeftAnti, Some("x.b".attr === "y.b".attr))
+        .where("x.a".attr > 10)
+        .analyze
+    val optimized = Optimize.execute(originalQuery)
+    val correctAnswer =
+      x.where("x.a".attr > 10)
+        .join(y, LeftAnti, Some("x.b".attr === "y.b".attr))
+        .analyze
+    comparePlans(optimized, analysis.EliminateSubqueryAliases(correctAnswer))
+  }
+
+  test("joins: only push down to the right of a left anti join") {
```

--- End diff --

Do we need a JIRA issue number here?
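The test above checks the optimizer rewrite that the PR enables: a `WHERE` clause referencing only left-side columns can be evaluated before a left anti join instead of after it, without changing the result. A minimal sketch of why the two plans are equivalent, in plain Python rather than Spark (the `left_anti_join` helper and the sample rows are hypothetical, purely for illustration):

```python
# Hypothetical illustration of predicate pushdown through a LEFT ANTI join.
def left_anti_join(left, right, on):
    """Rows of `left` that have no matching row in `right` on column `on`."""
    right_keys = {row[on] for row in right}
    return [row for row in left if row[on] not in right_keys]

x = [{"a": 5, "b": 1}, {"a": 20, "b": 2}, {"a": 30, "b": 3}]
y = [{"a": 0, "b": 2}]

# Plan 1: filter applied after the join (the original query shape).
after = [row for row in left_anti_join(x, y, "b") if row["a"] > 10]

# Plan 2: filter pushed below the join (the optimized query shape).
# This is safe because the predicate only references left-side columns,
# and an anti join never adds, modifies, or duplicates left-side rows.
before = left_anti_join([row for row in x if row["a"] > 10], y, "b")

assert after == before  # both plans return [{"a": 30, "b": 3}]
```

Pushing the filter down is the cheaper plan, since the join then processes fewer rows.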
[GitHub] spark issue #16026: [SPARK-18597][SQL] Do push-down predicates to right side...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16026 LGTM, @hvanhovell.
[GitHub] spark issue #16026: [SPARK-18597][SQL] Do push-down predicates to right side...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16026 LGTM except a minor comment.
[GitHub] spark pull request #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskS...
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/15986#discussion_r89715118

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---

```scala
@@ -350,20 +350,16 @@ private[spark] class TaskSchedulerImpl(
           removeExecutor(execId, reason.get)
           failedExecutor = Some(execId)
         }
+      }
+      if (TaskState.isFinished(state)) {
+        cleanupTaskState(tid)
```

--- End diff --

ok last comment: do you think it's more readable to structure this code as:

```scala
if (TaskState.isFinished(state)) {
  if (state == TaskState.LOST) {
    taskResultGetter.enqueueFailed
  } else if (Set(TaskState.FAILED, TaskState.KILLED).contains(state)) {
    taskResultGetter.enqueueFailedTask(taskSet, tid, state, serializedData)
  } else if (state == TaskState.FINISHED) {
    taskResultGetter.enqueueSuccessful(...)
  }
  cleanupTaskState
  taskSet.removeRunningTask
}
```
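The structure suggested above has one property worth spelling out: because the cleanup calls sit after the state dispatch but inside the same `isFinished` guard, every terminal state runs the cleanup exactly once, and no non-terminal state runs it at all. A small sketch of that shape in plain Python (the state names and event strings are hypothetical stand-ins, not the Spark API):

```python
# Hypothetical sketch of the "dispatch, then unconditional cleanup" structure.
FINISHED_STATES = {"LOST", "FAILED", "KILLED", "FINISHED"}

def handle_status_update(state, events):
    if state in FINISHED_STATES:
        # Dispatch on the specific terminal state first...
        if state == "LOST":
            events.append("enqueue_failed")
        elif state in {"FAILED", "KILLED"}:
            events.append("enqueue_failed_task")
        elif state == "FINISHED":
            events.append("enqueue_successful")
        # ...then clean up unconditionally, so no terminal state can skip it.
        events.append("cleanup_task_state")
        events.append("remove_running_task")
    return events
```

This is exactly the leak-prevention argument of SPARK-18553: cleanup is tied to "the task reached a terminal state" rather than to each individual branch.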
[GitHub] spark pull request #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskS...
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/15986#discussion_r89715211

--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala ---

```scala
@@ -274,4 +276,70 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with L
     assert("executor1" === taskDescriptions3(0).executorId)
   }

+  test("if an executor is lost then the state for its running tasks is cleaned up (SPARK-18553)") {
+    sc = new SparkContext("local", "TaskSchedulerImplSuite")
+    val taskScheduler = new TaskSchedulerImpl(sc)
+    taskScheduler.initialize(new FakeSchedulerBackend)
+    // Need to initialize a DAGScheduler for the taskScheduler to use for callbacks.
+    new DAGScheduler(sc, taskScheduler) {
+      override def taskStarted(task: Task[_], taskInfo: TaskInfo) {}
+      override def executorAdded(execId: String, host: String) {}
+    }
+
+    val e0Offers = Seq(WorkerOffer("executor0", "host0", 1))
+    val attempt1 = FakeTask.createTaskSet(1)
+
+    // submit attempt 1, offer resources, task gets scheduled
+    taskScheduler.submitTasks(attempt1)
+    val taskDescriptions = taskScheduler.resourceOffers(e0Offers).flatten
+    assert(1 === taskDescriptions.length)
+
+    // mark executor0 as dead
+    taskScheduler.executorLost("executor0", SlaveLost())
+    assert(!taskScheduler.isExecutorAlive("executor0"))
+    assert(!taskScheduler.hasExecutorsAliveOnHost("host0"))
+    assert(taskScheduler.getExecutorsAliveOnHost("host0").isEmpty)
+
+    // Check that state associated with the lost task attempt is cleaned up:
+    assert(taskScheduler.taskIdToExecutorId.isEmpty)
+    assert(taskScheduler.taskIdToTaskSetManager.isEmpty)
+    assert(taskScheduler.runningTasksByExecutors().get("executor0").isEmpty)
+  }
+
+  test("if a task finishes with TaskState.LOST then mark its executor as dead") {
```

--- End diff --

super nit but can you write this as "if a task finishes with TaskState.LOST its executor is marked as dead"
[GitHub] spark pull request #16017: [SPARK-18592][ML] Move DT/RF/GBT Param setter met...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16017#discussion_r89715218

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala ---

```scala
@@ -52,33 +52,49 @@ class DecisionTreeClassifier @Since("1.4.0") (

   // Override parameter setters from parent trait for Java API compatibility.

+  /** @group setParam */
   @Since("1.4.0")
-  override def setMaxDepth(value: Int): this.type = super.setMaxDepth(value)
+  override def setMaxDepth(value: Int): this.type = set(maxDepth, value)

+  /** @group setParam */
   @Since("1.4.0")
-  override def setMaxBins(value: Int): this.type = super.setMaxBins(value)
+  override def setMaxBins(value: Int): this.type = set(maxBins, value)

+  /** @group setParam */
   @Since("1.4.0")
-  override def setMinInstancesPerNode(value: Int): this.type =
-    super.setMinInstancesPerNode(value)
+  override def setMinInstancesPerNode(value: Int): this.type = set(minInstancesPerNode, value)

+  /** @group setParam */
   @Since("1.4.0")
-  override def setMinInfoGain(value: Double): this.type = super.setMinInfoGain(value)
+  override def setMinInfoGain(value: Double): this.type = set(minInfoGain, value)

+  /** @group expertSetParam */
   @Since("1.4.0")
-  override def setMaxMemoryInMB(value: Int): this.type = super.setMaxMemoryInMB(value)
+  override def setMaxMemoryInMB(value: Int): this.type = set(maxMemoryInMB, value)

+  /** @group expertSetParam */
   @Since("1.4.0")
-  override def setCacheNodeIds(value: Boolean): this.type = super.setCacheNodeIds(value)
+  override def setCacheNodeIds(value: Boolean): this.type = set(cacheNodeIds, value)

+  /**
+   * Specifies how often to checkpoint the cached node IDs.
+   * E.g. 10 means that the cache will get checkpointed every 10 iterations.
+   * This is only used if cacheNodeIds is true and if the checkpoint directory is set in
+   * [[org.apache.spark.SparkContext]].
+   * Must be >= 1.
+   * (default = 10)
+   * @group setParam
```

--- End diff --

This change was suggested at https://github.com/apache/spark/pull/15913#discussion_r89662469, because Param setter methods defined in traits have the wrong return type in Java. We would like to remove the setter methods from the traits, since it does not make sense to have them in the Model classes; instead we can define the setters in each subclass and deprecate them in the Model classes. Once the setters are removed from the traits we can no longer inherit their docs, which is why each subclass carries its own. BTW, the current change is consistent with other ML algorithms that inherit from these traits.