[GitHub] spark issue #13690: [SPARK-15767][R][ML] Decision Tree Regression wrapper in...

2016-11-11 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/13690
  
@shivaram I will update this today.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15365: [SPARK-17157][SPARKR]: Add multiclass logistic re...

2016-10-08 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15365#discussion_r82497112
  
--- Diff: R/pkg/R/mllib.R ---
@@ -117,7 +124,7 @@ NULL
 #' @export
 #' @seealso \link{spark.glm}, \link{glm},
 #' @seealso \link{spark.als}, \link{spark.gaussianMixture}, 
\link{spark.isoreg}, \link{spark.kmeans},
-#' @seealso \link{spark.mlp}, \link{spark.naiveBayes}, \link{spark.survreg}
+#' @seealso \link{spark.mlp}, \link{spark.naiveBayes}, 
\link{spark.survreg}, \link{spark.logit}
--- End diff --

same here.





[GitHub] spark pull request #15365: [SPARK-17157][SPARKR]: Add multiclass logistic re...

2016-10-08 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15365#discussion_r82497107
  
--- Diff: R/pkg/R/mllib.R ---
@@ -105,7 +112,7 @@ setClass("KSTest", representation(jobj = "jobj"))
 #' @seealso \link{spark.glm}, \link{glm},
 #' @seealso \link{spark.als}, \link{spark.gaussianMixture}, 
\link{spark.isoreg}, \link{spark.kmeans},
 #' @seealso \link{spark.lda}, \link{spark.mlp}, \link{spark.naiveBayes}, 
\link{spark.survreg}
-#' @seealso \link{read.ml}
+#' @seealso \link{spark.logit}, \link{read.ml}
--- End diff --

this group of links could be sorted





[GitHub] spark issue #13690: [SPARK-15767][R][ML] Decision Tree Regression wrapper in...

2016-10-06 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/13690
  
@felixcheung @shivaram @junyangq It's ready for review.





[GitHub] spark issue #15365: [SPARK-17157][SPARKR]: Add multiclass logistic regressio...

2016-10-06 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/15365
  
@wangmiao1981 I saw a similar error on Jenkins and have the same question as you.
Regarding `e1071`, I think we only need to install that package locally.





[GitHub] spark issue #13690: [SPARK-15767][R][ML] Decision Tree Regression wrapper in...

2016-09-29 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/13690
  
@felixcheung I'll update the changes within the next two days.





[GitHub] spark issue #13690: [SPARK-15767][R][ML] Decision Tree Regression wrapper in...

2016-08-23 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/13690
  
@junyangq I have started working on the random forest wrapper. I will open a PR
as soon as possible. Also, I'll update this PR very soon. Thanks.





[GitHub] spark issue #13690: [SPARK-15767][R][ML] Decision Tree Regression wrapper in...

2016-08-13 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/13690
  
Yes, sure. But I'm on vacation this week. I will keep working on this and
update it as soon as possible when I get back next week.

On Thu, Aug 11, 2016, 19:46 Felix Cheung <notificati...@github.com> wrote:

> Hi @vectorijk <https://github.com/vectorijk> would you be interested in
> continuing this work?
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/apache/spark/pull/13690#issuecomment-239326553>, or 
mute
> the thread
> 
<https://github.com/notifications/unsubscribe-auth/ADQu6Z5doEmjTpTXYESSYVlyiIM0c2sJks5qe7RFgaJpZM4I2xmp>
> .
>






[GitHub] spark pull request #13922: [SPARK-11938][PySpark] Expose numFeatures in all ...

2016-07-18 Thread vectorijk
Github user vectorijk closed the pull request at:

https://github.com/apache/spark/pull/13922





[GitHub] spark issue #13922: [SPARK-11938][PySpark] Expose numFeatures in all ML Pred...

2016-07-18 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/13922
  
@MLnick Thanks! I will close this PR.





[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.

2016-07-14 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/14136
  
We also need to remove the line here:
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala#L240.





[GitHub] spark pull request #14136: [SPARK-16282][SQL] Implement percentile SQL funct...

2016-07-14 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/14136#discussion_r70834638
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
@@ -475,4 +475,20 @@ class DataFrameAggregateSuite extends QueryTest with 
SharedSQLContext {
   spark.sql("select avg(a) over () from values 1.0, 2.0, 3.0 T(a)"),
   Row(2.0) :: Row(2.0) :: Row(2.0) :: Nil)
   }
+
+  test("percentile functions") {
+    val df = Seq(1, 3, 3, 6, 5, 4, 17, 38, 29, 400).toDF("a")
+    checkAnswer(
+      df.select(percentile($"a", 0.5d), percentile($"a", Seq(0d, 0.75d, 1d))),
+      Seq(Row(Seq(5.5), Seq(1.0, 26.0, 400.0)))
+    )
+  }
+
+  test("percentile functions with zero input rows.") {
+    val df = Seq(1, 3, 3, 6, 5, 4, 17, 38, 29, 400).toDF("a").where($"a" < 0)
+    checkAnswer(
+      df.select(percentile($"a", 0.5d)),
+      Seq(Row(Seq.empty))
+    )
+  }
--- End diff --

I think we should also add a test for what happens if an empty array of
percentiles is passed. It may throw an exception or error.
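For reference, the exact-percentile semantics the tests above expect (Hive-style linear interpolation over the sorted values) can be sketched in plain Python. This is an illustration of the expected behavior, not Spark's implementation; the function name and the empty-input handling are assumptions chosen to mirror the test expectations above.

```python
# A minimal pure-Python sketch of exact percentile computation with linear
# interpolation, matching the values asserted in the Scala tests above.
def exact_percentiles(values, percentages):
    """Return the exact percentile of `values` for each p in [0, 1]."""
    if not values:
        return []  # zero input rows -> empty result, mirroring Row(Seq.empty)
    for p in percentages:
        if not 0.0 <= p <= 1.0:
            raise ValueError("percentage must be in [0, 1]")
    data = sorted(values)
    n = len(data)
    results = []
    for p in percentages:
        pos = p * (n - 1)                      # fractional rank in sorted data
        lo, hi = int(pos), min(int(pos) + 1, n - 1)
        frac = pos - lo
        results.append(data[lo] + frac * (data[hi] - data[lo]))
    return results

rows = [1, 3, 3, 6, 5, 4, 17, 38, 29, 400]
print(exact_percentiles(rows, [0.5]))             # [5.5]
print(exact_percentiles(rows, [0.0, 0.75, 1.0]))  # [1.0, 26.0, 400.0]
print(exact_percentiles(rows, []))                # [] -- the empty-array case in question
```

Note that under this sketch an empty percentile array simply yields an empty result rather than an error, which is exactly the ambiguity the suggested test would pin down.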





[GitHub] spark issue #14182: [SPARK-16444][WIP][SparkR]: Isotonic Regression wrapper ...

2016-07-13 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/14182
  
@wangmiao1981 
TODO:
`summary()`?





[GitHub] spark pull request #14182: [SPARK-16444][WIP][SparkR]: Isotonic Regression w...

2016-07-13 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/14182#discussion_r70682078
  
--- Diff: R/pkg/R/mllib.R ---
@@ -53,6 +53,13 @@ setClass("AFTSurvivalRegressionModel", 
representation(jobj = "jobj"))
 #' @note KMeansModel since 2.0.0
 setClass("KMeansModel", representation(jobj = "jobj"))
 
+#' S4 class that represents an IsotonicRegressionModel
+#'
+#' @param jobj a Java object reference to the backing Scala 
IsotonicRegressionModel
+#' @export
+#' @note IsotonicRegressionModel since 2.0.0
--- End diff --

`2.1.0`?





[GitHub] spark pull request #14136: [SPARK-16282][SQL] Implement percentile SQL funct...

2016-07-11 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/14136#discussion_r70294379
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Percentile.scala
 ---
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.GenericArrayData
+import org.apache.spark.sql.types._
+import org.apache.spark.util.collection.OpenHashMap
+
+/**
+ * The Percentile aggregate function computes the exact percentile(s) of expr
+ * at pc, with pc in the range [0, 1].
+ * The parameter pc can be a DoubleType or a DoubleType array.
+ */
+@ExpressionDescription(
+  usage = """_FUNC_(expr, pc) - Returns the percentile(s) of expr at pc (range: [0, 1]).
+  pc can be a double or double array.""")
+case class Percentile(
+    child: Expression,
+    pc: Seq[Double],
+    mutableAggBufferOffset: Int = 0,
+    inputAggBufferOffset: Int = 0)
+  extends ImperativeAggregate {
+
+  def this(child: Expression, pc: Double) = {
+    this(child = child, pc = Seq(pc), mutableAggBufferOffset = 0, inputAggBufferOffset = 0)
+  }
+
+  override def prettyName: String = "percentile"
+
+  override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): ImperativeAggregate =
+    copy(mutableAggBufferOffset = newMutableAggBufferOffset)
+
+  override def withNewInputAggBufferOffset(newInputAggBufferOffset: Int): ImperativeAggregate =
+    copy(inputAggBufferOffset = newInputAggBufferOffset)
+
+  var counts = new OpenHashMap[Long, Long]()
--- End diff --

@hvanhovell Thanks!





[GitHub] spark pull request #14136: [SPARK-16282][SQL] Implement percentile SQL funct...

2016-07-11 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/14136#discussion_r70285450
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Percentile.scala
 ---
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.GenericArrayData
+import org.apache.spark.sql.types._
+import org.apache.spark.util.collection.OpenHashMap
+
+/**
+ * The Percentile aggregate function computes the exact percentile(s) of expr
+ * at pc, with pc in the range [0, 1].
+ * The parameter pc can be a DoubleType or a DoubleType array.
+ */
+@ExpressionDescription(
+  usage = """_FUNC_(expr, pc) - Returns the percentile(s) of expr at pc (range: [0, 1]).
+  pc can be a double or double array.""")
+case class Percentile(
+    child: Expression,
+    pc: Seq[Double],
+    mutableAggBufferOffset: Int = 0,
+    inputAggBufferOffset: Int = 0)
+  extends ImperativeAggregate {
+
+  def this(child: Expression, pc: Double) = {
+    this(child = child, pc = Seq(pc), mutableAggBufferOffset = 0, inputAggBufferOffset = 0)
+  }
+
+  override def prettyName: String = "percentile"
+
+  override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): ImperativeAggregate =
+    copy(mutableAggBufferOffset = newMutableAggBufferOffset)
+
+  override def withNewInputAggBufferOffset(newInputAggBufferOffset: Int): ImperativeAggregate =
+    copy(inputAggBufferOffset = newInputAggBufferOffset)
+
+  var counts = new OpenHashMap[Long, Long]()
--- End diff --

@jiangxb1987 I am just curious why we use `OpenHashMap` here instead of
`mutable.Map`, which would correspond to the code
[here](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/UDAFPercentile.java#L58)
in Hive. Is there any specific reason?
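For context, the buffer being discussed maps each input value to its occurrence count; the aggregate updates it per row and merges buffers across partitions. A rough, purely illustrative Python sketch of that counting approach (a `Counter` standing in for Scala's `OpenHashMap`; not Spark's actual code):

```python
# Sketch of the value -> count aggregation buffer: per-row updates on each
# partition, then a merge of partial buffers, as an ImperativeAggregate does.
from collections import Counter

def update(counts, value):
    counts[value] += 1          # per-row update of the aggregation buffer

def merge(counts, other):
    counts.update(other)        # Counter.update adds counts together

buf1, buf2 = Counter(), Counter()
for v in [1, 3, 3, 6, 5]:       # "partition 1"
    update(buf1, v)
for v in [4, 17, 38, 29, 400]:  # "partition 2"
    update(buf2, v)
merge(buf1, buf2)

print(buf1[3])                  # 2 -- value 3 seen twice across partitions
print(sum(buf1.values()))       # 10 -- total rows aggregated
```

A specialized open-addressing map like `OpenHashMap` mainly trades generality for lower per-entry overhead, which matters when the buffer holds many distinct values.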





[GitHub] spark issue #13922: [SPARK-11938][PySpark] Expose numFeatures in all ML Pred...

2016-06-27 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/13922
  
cc @jkbradley @yanboliang @Lewuathe 





[GitHub] spark issue #9936: [SPARK-11938][ML] Expose numFeatures in all ML Prediction...

2016-06-27 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/9936
  
@Lewuathe Thanks! I opened a new PR, #13922, for this issue. Would you
mind closing this one later?





[GitHub] spark pull request #13922: [SPARK-11938][PySpark] Expose numFeatures in all ...

2016-06-27 Thread vectorijk
GitHub user vectorijk opened a pull request:

https://github.com/apache/spark/pull/13922

[SPARK-11938][PySpark] Expose numFeatures in all ML PredictionModel for 
PySpark

## What changes were proposed in this pull request?
JIRA: 
[https://issues.apache.org/jira/browse/SPARK-11938](https://issues.apache.org/jira/browse/SPARK-11938)

[SPARK-9715](https://issues.apache.org/jira/browse/SPARK-9715) provided
support for numFeatures in all ML `PredictionModel`s; we should also expose it
in PySpark.
## How was this patch tested?
Unit tests for PySpark.
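The pattern this PR follows can be sketched in plain Python: the Python model exposes `numFeatures` as a read-only property that delegates to the backing JVM model. The class names and the stubbed JVM side below are illustrative stand-ins so the sketch runs without Spark; the real PySpark wrappers call through Py4J (e.g. via `_call_java`).

```python
# Simplified stand-in for the PySpark wrapper pattern: a Python property
# delegating to the backing JVM PredictionModel. Not the actual class
# hierarchy -- FakeJavaModel mocks the Py4J reference for illustration.

class FakeJavaModel:
    """Stand-in for the Py4J reference to the Scala PredictionModel."""
    def numFeatures(self):
        return 692  # hypothetical feature count

class PredictionModel:
    def __init__(self, java_model):
        self._java_model = java_model

    @property
    def numFeatures(self):
        """Number of features the model was trained on."""
        # The real wrappers delegate to the JVM side instead of a mock.
        return self._java_model.numFeatures()

model = PredictionModel(FakeJavaModel())
print(model.numFeatures)  # 692
```

Because the Scala side already implements `numFeatures` (SPARK-9715), the Python change is only this thin delegating property plus a unit test.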

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vectorijk/spark spark-11938

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13922.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13922


commit 8f2e5c67b7adc6f5a0ae853837ebb516967bca6c
Author: lewuathe <lewua...@me.com>
Date:   2016-04-01T14:29:53Z

[SPARK-11938] Expose numFeatures in all ML PredictionModel for PySpark

Aggregate numFeatures property in HasNumFeaturesModel in base.py.

commit 461b7c64a274a745f88f8eeb0d9c39470c7aa958
Author: lewuathe <lewua...@me.com>
Date:   2016-04-02T02:16:33Z

[SPARK-11938] Fix style

commit 872d384be9710eb5fd5c381ac4dd9eb7e40fa00a
Author: Kai Jiang <jiang...@gmail.com>
Date:   2016-06-27T07:50:14Z

export numFeatures in ML PredictionModel







[GitHub] spark issue #13248: [SPARK-15194] [ML] Add Python ML API for MultivariateGau...

2016-06-24 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/13248
  
ping @praveendareddy21 Is this still active? If not, I could help with this.





[GitHub] spark issue #9936: [SPARK-11938][ML] Expose numFeatures in all ML Prediction...

2016-06-24 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/9936
  
ping @Lewuathe Is this still active? If not, I could help with this.





[GitHub] spark pull request #13820: [SPARK-16107] [R] group glm methods in documentat...

2016-06-21 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13820#discussion_r67968653
  
--- Diff: R/pkg/R/mllib.R ---
@@ -99,10 +114,8 @@ setMethod("spark.glm", signature(data = 
"SparkDataFrame", formula = "formula"),
 return(new("GeneralizedLinearRegressionModel", jobj = jobj))
   })
 
-#' Fits a generalized linear model (R-compliant).
-#'
-#' Fits a generalized linear model, similarly to R's glm().
-#'
+#' @title Generalized Linear Models (R-compliant)
--- End diff --

Should we also change the ones below? @mengxr





[GitHub] spark pull request #13820: [SPARK-16107] [R] group glm methods in documentat...

2016-06-21 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13820#discussion_r67968505
  
--- Diff: R/pkg/R/mllib.R ---
@@ -99,10 +114,8 @@ setMethod("spark.glm", signature(data = 
"SparkDataFrame", formula = "formula"),
 return(new("GeneralizedLinearRegressionModel", jobj = jobj))
   })
 
-#' Fits a generalized linear model (R-compliant).
-#'
-#' Fits a generalized linear model, similarly to R's glm().
-#'
+#' @title Generalized Linear Models (R-compliant)
--- End diff --

Also, following the discussion at
https://github.com/apache/spark/pull/13394#discussion-diff-66177097, should we
adopt the convention of using the first sentence as the title in this case?





[GitHub] spark pull request #13660: [SPARK-15672][R][DOC] R programming guide update

2016-06-21 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13660#discussion_r67945982
  
--- Diff: docs/sparkr.md ---
@@ -262,6 +262,83 @@ head(df)
 {% endhighlight %}
 
 
+### Applying User-defined Function
+In SparkR, we support several kinds of user-defined functions:
+
+#### Run a given function on a large dataset using `dapply` or `dapplyCollect`
+
+##### dapply
+Apply a function to each partition of a `SparkDataFrame`. The function to be
+applied to each partition of the `SparkDataFrame`
--- End diff --

@NarineK I filed a JIRA,
[SPARK-16112](https://issues.apache.org/jira/browse/SPARK-16112), for the gapply
programming guide so that you can open a PR for it.





[GitHub] spark pull request #13660: [SPARK-15672][R][DOC] R programming guide update

2016-06-20 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13660#discussion_r67782797
  
--- Diff: docs/sparkr.md ---
@@ -262,6 +262,79 @@ head(df)
 {% endhighlight %}
 
 
+### Applying User-defined Function
+In SparkR, we support several kinds of user-defined functions:
+
+#### Run a given function on a large dataset using `dapply` or `dapplyCollect`
+
+##### dapply
+Apply a function to each partition of a `SparkDataFrame`. The function to be
+applied to each partition should have only one parameter, to which the
+`data.frame` corresponding to that partition will be passed. The output of the
+function should be a `data.frame`. The schema specifies the row format of the
+resulting `SparkDataFrame` and must match the output of the R function.
+
+{% highlight r %}
+
+# Convert waiting time from minutes to seconds.
+# Note that we can apply a UDF to a DataFrame.
+schema <- structType(structField("eruptions", "double"), 
structField("waiting", "double"),
+ structField("waiting_secs", "double"))
+df1 <- dapply(df, function(x) {x <- cbind(x, x$waiting * 60)}, schema)
+head(collect(df1))
+##  eruptions waiting waiting_secs
+##1 3.600  79 4740
+##2 1.800  54 3240
+##3 3.333  74 4440
+##4 2.283  62 3720
+##5 4.533  85 5100
+##6 2.883  55 3300
+{% endhighlight %}
+
+
+##### dapplyCollect
+Like `dapply`, apply a function to each partition of a `SparkDataFrame` and
+collect the result back.
+
+{% highlight r %}
+
+# Convert waiting time from minutes to seconds.
+# Note that we can apply a UDF to a DataFrame and return an R data.frame.
+ldf <- dapplyCollect(
+ df,
+ function(x) {
+   x <- cbind(x, "waiting_secs"=x$waiting * 60)
+ })
+head(ldf, 3)
+##  eruptions waiting waiting_secs
+##1 3.600  79 4740
+##2 1.800  54 3240
+##3 3.333  74 4440
+
+{% endhighlight %}
+
+
+#### Run many functions in parallel using `spark.lapply`
+
+##### lapply
+Similar to `lapply` in native R, `spark.lapply` runs a function over a list of
+elements and distributes the computations with Spark. It applies a function to
+the elements of a list in a manner similar to `doParallel` or `lapply`.
--- End diff --

Thanks so much for pointing this out! I will update those very soon.
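The per-partition semantics of `dapply` described in the diff above can be imitated in plain Python: the user function receives one whole partition at a time and its outputs are concatenated. The data, the helper names, and the list-of-dicts stand-in for a `data.frame` below are all illustrative assumptions, not SparkR's implementation.

```python
# Plain-Python imitation of dapply: the user function is called once per
# partition (a list of dicts standing in for a data.frame) and must return
# rows in the declared output schema; results are concatenated.

def dapply(partitions, func):
    out = []
    for part in partitions:     # one call per partition, like dapply
        out.extend(func(part))
    return out

def add_waiting_secs(part):
    # waiting is in minutes; add a waiting_secs column (waiting * 60)
    return [dict(row, waiting_secs=row["waiting"] * 60) for row in part]

partitions = [
    [{"eruptions": 3.6, "waiting": 79}, {"eruptions": 1.8, "waiting": 54}],
    [{"eruptions": 3.333, "waiting": 74}],
]
result = dapply(partitions, add_waiting_secs)
print(result[0]["waiting_secs"])   # 4740
```

`spark.lapply` differs in that it distributes one function call per *element* of a list rather than per partition of a distributed data frame, much like a distributed `lapply`.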


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13660: [SPARK-15672][R][DOC] R programming guide update

2016-06-20 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/13660
  
Jenkins test this again.





[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-19 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/12836
  
@NarineK I am not quite sure. Maybe you could create a new JIRA for 
gapply's programming guide.





[GitHub] spark issue #13660: [SPARK-15672][R][DOC] R programming guide update

2016-06-18 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/13660
  
@jkbradley @shivaram @felixcheung addressed comments.





[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-17 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/12836
  
@NarineK Cool~ I think it is better to open a separate PR to track `gapply` 
programming guide.





[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-17 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/12836
  
@NarineK Which way would you prefer to include the programming guide for
`gapply`: in a separate PR, or in #13660?





[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-15 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r67259794
  
--- Diff: R/pkg/R/mllib.R ---
@@ -402,6 +406,8 @@ setMethod("spark.naiveBayes", signature(data = 
"SparkDataFrame", formula = "form
 return(new("NaiveBayesModel", jobj = jobj))
 })
 
+#' Save fitted MLlib model to the input path
--- End diff --

@jkbradley Likewise, I changed the title `write.ml` to `Save fitted MLlib model
to the input path`, rather than `Save the Bernoulli naive Bayes model to the
input path`, for all four different models.





[GitHub] spark pull request #13690: [SPARK-15767][R][ML][WIP] Decision Tree Regressio...

2016-06-15 Thread vectorijk
GitHub user vectorijk opened a pull request:

https://github.com/apache/spark/pull/13690

[SPARK-15767][R][ML][WIP] Decision Tree Regression wrapper in SparkR

## What changes were proposed in this pull request?
Implement a wrapper in SparkR to support decision tree regression. R's
native Decision Tree Regression implementation comes from the package rpart,
with the signature `rpart(formula, dataframe, method="anova")`. I propose we
implement an API like `spark.rpart(dataframe, formula, ...)`. After having
implemented decision tree classification, we could refactor these two into an
API more like `rpart()`.
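For context, the rpart usage the wrapper would mirror looks like this (a sketch; `spark.rpart` itself is the proposed, not-yet-existing API):

```r
library(rpart)

# Fit a regression tree; method = "anova" selects regression.
# car.test.frame ships with the rpart package.
fit <- rpart(Mileage ~ Weight, data = car.test.frame, method = "anova")
printcp(fit)  # display the complexity-parameter table

# The proposed SparkR wrapper would mirror this signature (hypothetical):
# model <- spark.rpart(df, Mileage ~ Weight)
```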
## How was this patch tested?
Tested with unit tests in SparkR.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vectorijk/spark DEV-DTRegression

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13690.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13690


commit 7ea95448829c3e981ae05542e1aaae77f2f996ce
Author: Kai Jiang <jiang...@gmail.com>
Date:   2016-06-04T11:29:41Z

init step

lack of implementing unit test for spark.rpart







[GitHub] spark issue #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs and API ...

2016-06-15 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/13394
  
Thanks! @jkbradley @felixcheung @shivaram Sure. How about using the title
`Predicted values based on model object` instead of `predict` (like
[https://stat.ethz.ch/R-manual/R-devel/library/stats/html/predict.lm.html](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/predict.lm.html))
and the title `Compute histogram statistics for given column` instead of
`Histogram`?





[GitHub] spark pull request #13660: [SPARK-15672][R][DOC] R programming guide update

2016-06-14 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13660#discussion_r67020817
  
--- Diff: docs/sparkr.md ---
@@ -262,6 +262,67 @@ head(df)
 {% endhighlight %}
 
 
+### Applying User-defined Function
+
+#### dapply
+Apply a function to each partition of a `SparkDataFrame`. The function to be
applied to each partition of the `SparkDataFrame` should have only one
parameter, to which a `data.frame` corresponding to each partition will be
passed. The output of the function should be a `data.frame`.
+
+{% highlight r %}
+
+# Convert waiting time from hours to seconds.
+# Note that we can apply UDF to DataFrame.
+
+df1 <- dapply(df, function(x) {x}, schema(df))
--- End diff --

ok, I will improve this example more specifically.
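For instance, a more concrete version could look like this (a sketch, assuming `df` was created from the `faithful` dataset, whose `waiting` column is in minutes):

```r
df <- createDataFrame(faithful)
# The output schema must be declared: the input columns plus the new one.
schema <- structType(structField("eruptions", "double"),
                     structField("waiting", "double"),
                     structField("waiting_secs", "double"))
# Add a column converting waiting time from minutes to seconds.
df1 <- dapply(df, function(x) { cbind(x, x$waiting * 60) }, schema)
head(collect(df1))
```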





[GitHub] spark issue #13660: [SPARK-15672][R][DOC] R programming guide update

2016-06-14 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/13660
  
cc @jkbradley @shivaram 





[GitHub] spark pull request #13660: [SPARK-15672][R][DOC] R programming guide update

2016-06-14 Thread vectorijk
GitHub user vectorijk opened a pull request:

https://github.com/apache/spark/pull/13660

[SPARK-15672][R][DOC] R programming guide update

## What changes were proposed in this pull request?
Guide for
- UDFs with dapply, dapplyCollect
- spark.lapply for running parallel R functions

## How was this patch tested?
build locally
https://cloud.githubusercontent.com/assets/3419881/16039344/12a3b6a0-31de-11e6-8d77-fe23308075c0.png


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vectorijk/spark spark-15672-R-guide-update

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13660.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13660


commit adee4d46551c379cb3c092d603041551f779c630
Author: Kai Jiang <jiang...@gmail.com>
Date:   2016-06-06T04:41:42Z

revise documentation for sparkr

commit 9081a0bf5072da5f5255f7ffcf398758ff19b46c
Author: Kai Jiang <jiang...@gmail.com>
Date:   2016-06-06T04:41:42Z

revise documentation for sparkr

commit 1ba263498c68ec41ec0096cdb305205a8f99f058
Author: Kai Jiang <jiang...@gmail.com>
Date:   2016-06-14T09:31:08Z

Merge branch 'spark-15672-R-guide-update' of github.com:vectorijk/spark 
into spark-15672-R-guide-update

commit 2611549e60f68d6ba12ec10b471dc96944508873
Author: Kai Jiang <jiang...@gmail.com>
Date:   2016-06-14T10:08:25Z

update







[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-14 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66917105
  
--- Diff: R/pkg/R/mllib.R ---
@@ -197,7 +201,7 @@ print.summary.GeneralizedLinearRegressionModel <- 
function(x, ...) {
   invisible(x)
   }
 
-#' Make predictions from a generalized linear model
+#' predict
--- End diff --

@jkbradley The reason I want to add a title here is that the current
documentation's title reads `Make predictions from a generalized linear model`,
not `predict`.
![qq20160613-0 2x](https://cloud.githubusercontent.com/assets/3419881/16033524/4ab5e288-31c1-11e6-892c-c9c15258cc05.png)
But after adding the title `predict`, it looks like this:
![qq20160613-1 2x](https://cloud.githubusercontent.com/assets/3419881/16033671/24ab8fce-31c2-11e6-86a4-7b7771c10451.png)
So which one do you think is better?








[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-12 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66719273
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -851,6 +849,8 @@ setMethod("nrow",
 count(x)
   })
 
+#' ncol
--- End diff --

Yes, this doesn't seem to be consistent. I reverted these since the 
description is sufficient to explain.





[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-12 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66719192
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2766,18 +2780,21 @@ setMethod("histogram",
 return(histStats)
   })
 
-#' Saves the content of the SparkDataFrame to an external database table 
via JDBC
+#' Save the content of DataFrame to an external database table via JDBC.
 #'
-#' Additional JDBC database connection properties can be set (...)
+#' Saves the content of the SparkDataFrame to an external database table 
via JDBC. Additional JDBC
--- End diff --

I also think we should make the plurality of the first word, which is a verb,
consistent across the descriptions.





[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-12 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66718939
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2766,18 +2780,21 @@ setMethod("histogram",
 return(histStats)
   })
 
-#' Saves the content of the SparkDataFrame to an external database table 
via JDBC
+#' Save the content of DataFrame to an external database table via JDBC.
 #'
-#' Additional JDBC database connection properties can be set (...)
+#' Saves the content of the SparkDataFrame to an external database table 
via JDBC. Additional JDBC
--- End diff --

They should be the same, so I changed both of them to the plural form.
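For reference, the call being documented looks like this (a sketch; the connection URL, table name, and credentials are hypothetical):

```r
# Save the content of a SparkDataFrame to an external database table via JDBC.
# Additional JDBC connection properties are passed through `...`.
write.jdbc(df, "jdbc:postgresql://localhost:5432/testdb", "people",
           mode = "overwrite", user = "spark", password = "secret")
```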





[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-12 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66718189
  
--- Diff: R/pkg/R/mllib.R ---
@@ -197,11 +197,10 @@ print.summary.GeneralizedLinearRegressionModel <- 
function(x, ...) {
   invisible(x)
   }
 
-#' Make predictions from a generalized linear model
-#'
 #' Makes predictions from a generalized linear model produced by glm() or 
spark.glm(),
 #' similarly to R's predict().
 #'
+#' @title predict
--- End diff --

Ok, I will do this in this PR.





[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-06 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r65849144
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -628,8 +628,6 @@ setMethod("repartition",
 #'
 #' @param x A SparkDataFrame
 #' @return A StringRRDD of JSON objects
-#' @family SparkDataFrame functions
--- End diff --

@felixcheung I removed these two lines in the `toJSON` part. Correct me if I
am wrong.





[GitHub] spark issue #13488: [MINOR][R][DOC] Fix R documentation generation instructi...

2016-06-05 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/13488
  
Thanks

On Sun, Jun 5, 2016, 13:05 asfgit <notificati...@github.com> wrote:

> Closed #13488 <https://github.com/apache/spark/pull/13488> via 8a91105
> 
<https://github.com/apache/spark/commit/8a9110510c9e4cbbcb0dede62cb4b9dd1c6bc8cc>
> .
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/apache/spark/pull/13488#event-682240859>, or mute the
> thread
> 
<https://github.com/notifications/unsubscribe/ADQu6VB435ELngH30LYYhvbF_QfbWW19ks5qIyv1gaJpZM4ItHJr>
> .
>






[GitHub] spark issue #13248: [SPARK-15194] [ML] Add Python ML API for MultivariateGau...

2016-06-04 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/13248
  
@praveendareddy21 To generate documentation for this API correctly, you could
include this in `spark/python/docs/pyspark.ml.rst`:
```
pyspark.ml.stat module
----------------------

.. automodule:: pyspark.ml.stat
    :members:
    :undoc-members:
    :inherited-members:
```

Also, just like @MechCoder said, you could run `spark/dev/lint-python` to 
make sure you pass all PEP8 checking.





[GitHub] spark pull request #13248: [SPARK-15194] [ML] Add Python ML API for Multivar...

2016-06-04 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13248#discussion_r65798896
  
--- Diff: python/pyspark/ml/stat/distribution.py ---
@@ -0,0 +1,267 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark.ml.linalg import DenseVector, DenseMatrix, Vector
+import numpy as np
+
+__all__ = ['MultivariateGaussian']
+
+
+
+class MultivariateGaussian():
+"""
+This class provides basic functionality for a Multivariate Gaussian 
(Normal) Distribution. In
+ the event that the covariance matrix is singular, the density will be 
computed in a
+reduced dimensional subspace under which the distribution is supported.
+(see 
[[http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Degenerate_case]])
+
+mu The mean vector of the distribution
+sigma The covariance matrix of the distribution
+
+
+>>> mu = Vectors.dense([0.0, 0.0])
+>>> sigma= DenseMatrix(2, 2, [1.0, 1.0, 1.0, 1.0])
+>>> x = Vectors.dense([1.0, 1.0])
+>>> m = MultivariateGaussian(mu, sigma)
+>>> m.pdf(x)
+0.0682586811486
+
+"""
+
+def __init__(self, mu, sigma):
+"""
+__init__(self, mu, sigma)
+
+mu The mean vector of the distribution
+sigma The covariance matrix of the distribution
+
+mu and sigma must be instances of DenseVector and DenseMatrix 
respectively.
+
+"""
+
+
+assert (isinstance(mu, DenseVector)), "mu must be a DenseVector 
Object"
+assert (isinstance(sigma, DenseMatrix)), "sigma must be a 
DenseMatrix Object"
+
+sigma_shape=sigma.toArray().shape
+assert (sigma_shape[0]==sigma_shape[1]) , "Covariance matrix must 
be square"
+assert (sigma_shape[0]==mu.size) , "Mean vector length must match 
covariance matrix size"
+
+# initialize eagerly precomputed attributes
+
+self.mu=mu
+
+# storing sigma as numpy.ndarray
+# furthur calculations are done ndarray only
+self.sigma=sigma.toArray()
+
+
+# initialize attributes to be computed later
+
+self.prec_U = None
+self.log_det_cov = None
+
+# compute distribution dependent constants
+self.__calculateCovarianceConstants()
+
+
+def pdf(self,x):
+"""
+Returns density of this multivariate Gaussian at a point given by 
Vector x
+"""
+assert (isinstance(x, Vector)), "x must be of Vector Type"
+return float(self.__pdf(x))
+
+def logpdf(self,x):
+"""
+Returns the log-density of this multivariate Gaussian at a point 
given by Vector x
+"""
+assert (isinstance(x, Vector)), "x must be of Vector Type"
+return float(self.__logpdf(x))
+
+def __calculateCovarianceConstants(self):
+"""
+Calculates distribution dependent components used for the density 
function
+based on scipy multivariate library
+refer 
https://github.com/scipy/scipy/blob/master/scipy/stats/_multivariate.py
--- End diff --

same here.
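As a cross-check on what the `pdf` method above computes, here is a minimal NumPy sketch of the non-degenerate multivariate normal density (the singular-covariance branch the class also handles is omitted; the function name `mvn_pdf` is illustrative, not part of the PR):

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Density of a non-degenerate multivariate normal at point x."""
    k = mu.size
    diff = x - mu
    # Solve rather than invert the covariance, for numerical stability.
    maha = diff @ np.linalg.solve(sigma, diff)
    norm = np.sqrt((2 * np.pi) ** k * np.linalg.det(sigma))
    return np.exp(-0.5 * maha) / norm

mu = np.array([0.0, 0.0])
sigma = np.array([[2.0, 0.0], [0.0, 2.0]])
x = np.array([1.0, 1.0])
print(round(mvn_pdf(x, mu, sigma), 6))  # 0.048266
```

Note that the doctest in the diff uses a singular covariance matrix, which this sketch does not support; that degenerate case is exactly why the class precomputes a pseudo-inverse-style factorization.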





[GitHub] spark pull request #13248: [SPARK-15194] [ML] Add Python ML API for Multivar...

2016-06-04 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13248#discussion_r65798890
  
--- Diff: python/pyspark/ml/stat/distribution.py ---
@@ -0,0 +1,267 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark.ml.linalg import DenseVector, DenseMatrix, Vector
+import numpy as np
+
+__all__ = ['MultivariateGaussian']
+
+
+
+class MultivariateGaussian():
+"""
+This class provides basic functionality for a Multivariate Gaussian 
(Normal) Distribution. In
+ the event that the covariance matrix is singular, the density will be 
computed in a
+reduced dimensional subspace under which the distribution is supported.
+(see 
[[http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Degenerate_case]])
--- End diff --

you could use
```

`<http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Degenerate_case>`_
```
to make sure the link is displayed correctly in the documentation.






[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-03 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r65691008
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1069,6 +1079,8 @@ setMethod("first",
 #'
 #' @param x A SparkDataFrame
 #'
+#' @family SparkDataFrame functions
+#' @rdname tordd
--- End diff --

@felixcheung ok, should we also remove these two lines in the `toJSON` part at
line 631?





[GitHub] spark issue #13488: [MINOR][R][DOC] Fix R documentation generation instructi...

2016-06-02 Thread vectorijk
Github user vectorijk commented on the issue:

https://github.com/apache/spark/pull/13488
  
cc @jkbradley @shivaram @felixcheung 
The result can be seen
[here](https://github.com/vectorijk/spark/tree/R-Readme/R).





[GitHub] spark pull request #13488: [MINOR][R][DOC] Fix R documentation generation in...

2016-06-02 Thread vectorijk
GitHub user vectorijk opened a pull request:

https://github.com/apache/spark/pull/13488

[MINOR][R][DOC] Fix R documentation generation instruction.

## What changes were proposed in this pull request?
changes in R/README.md

- Make step of generating SparkR document more clear.
- link R/DOCUMENTATION.md from R/README.md
- turn on some code syntax highlight in R/README.md

## How was this patch tested?
local test




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vectorijk/spark R-Readme

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13488.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13488


commit cc504337940f7303806a77154c2eb77713966a0e
Author: Kai Jiang <jiang...@gmail.com>
Date:   2016-06-02T23:02:13Z

changes in R/README.md







[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-01 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r65418448
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2514,7 +2529,9 @@ setMethod("attach",
 #' environment. Then, the given expression is evaluated in this new
 #' environment.
 #'
+#' @title with
--- End diff --

@jkbradley I will do it ASAP.





[GitHub] spark pull request: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs and API ...

2016-05-31 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r65284357
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2514,7 +2529,9 @@ setMethod("attach",
 #' environment. Then, the given expression is evaluated in this new
 #' environment.
 #'
+#' @title with
--- End diff --

@shivaram Yes, I also noticed that the titles of other examples are not
consistent. Which one should we use: a short description, or just the name of
the method?





[GitHub] spark pull request: [SPARK-15177] [SparkR] [ML] SparkR 2.0 QA: New R APIs an...

2016-05-31 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/13023
  
As suggested by this
[comment](https://github.com/apache/spark/pull/13394#issuecomment-222560187), I
was wondering if we also need to update the docs for k-means and naive Bayes in
http://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/sparkr.html.
Maybe we can include that change in this PR.





[GitHub] spark pull request: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs and API ...

2016-05-31 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/13394
  
@shivaram For updating the programming guide, I'd love to do this in a 
separate PR.





[GitHub] spark pull request: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs and API ...

2016-05-31 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r65163283
  
--- Diff: R/pkg/R/stats.R ---
@@ -19,12 +19,11 @@
 
 setOldClass("jobj")
 
-#' crosstab
-#'
 #' Computes a pair-wise frequency table of the given columns. Also known 
as a contingency
 #' table. The number of distinct values for each column should be less 
than 1e4. At most 1e6
 #' non-zero pair frequencies will be returned.
 #'
+#' @title Statistic functions for SparkDataFrames
--- End diff --

I will remove the title here. Meanwhile, I will keep the link revisions here.





[GitHub] spark pull request: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs and API ...

2016-05-31 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r65162041
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1069,7 +1080,10 @@ setMethod("first",
 #'
 #' @param x A SparkDataFrame
 #'
-#' @noRd
+#' @family SparkDataFrame functions
+#' @rdname toRDD
--- End diff --

ok, I will change this.





[GitHub] spark pull request: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R API...

2016-05-29 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/13394#issuecomment-222385824
  
Jenkins test this please





[GitHub] spark pull request: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R API...

2016-05-29 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/13394#issuecomment-222378580
  
Jenkins test this again please.





[GitHub] spark pull request: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R API...

2016-05-29 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/13394#issuecomment-222354451
  
cc @felixcheung @shivaram @sun-rui 





[GitHub] spark pull request: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R API...

2016-05-29 Thread vectorijk
GitHub user vectorijk opened a pull request:

https://github.com/apache/spark/pull/13394

[SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs and API docs for non-MLib 
changes

## What changes were proposed in this pull request?
R Docs changes
include typos, format, layout.
## How was this patch tested?
Test locally.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vectorijk/spark spark-15490

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13394.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13394


commit 7961bbe8b346ae47a70fa324b18219070197ded8
Author: Kai Jiang <jiang...@gmail.com>
Date:   2016-05-29T02:09:22Z

QA for non-MLlib changes







[GitHub] spark pull request: [SPARK-15439][SparkR]:Failed to run unit test ...

2016-05-25 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/13284#issuecomment-221486634
  
> @shivaram I will make the change with R version check. 

@wangmiao1981 FYI, I switched to R 3.1.3 on Mac earlier and it seems to fail 
too. Would you mind trying it again?





[GitHub] spark pull request: [SPARK-15439][SparkR]:Failed to run unit test ...

2016-05-25 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/13284#issuecomment-221485421
  
We may also want to investigate why the unit tests run differently on Jenkins and locally.





[GitHub] spark pull request: [SPARK-10592] [ML] [PySpark] Deprecate weights...

2016-05-24 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/9311#issuecomment-221444959
  
@bharath-official I don't think it's really necessary, because this was 
changed in Spark 2.0.





[GitHub] spark pull request: [SPARK-10592] [ML] [PySpark] Deprecate weights...

2016-05-24 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/9311#issuecomment-221440730
  
@bharath-official You're right. 
As you said, it should work fine if this warning message only shows up 
while calling the model.





[GitHub] spark pull request: [SPARK-10592] [ML] [PySpark] Deprecate weights...

2016-05-24 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/9311#issuecomment-221437455
  
@bharath-official Try using `coefficients` instead. That is just a warning 
message. Did you get an actual error message?





[GitHub] spark pull request: [SPARK-928][CORE] Add support for Unsafe-based...

2016-05-04 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/12913#issuecomment-217031852
  
@techaddict According to 
[this](https://github.com/EsotericSoftware/kryo#-disclaimer-about-using-unsafe-based-io-),
 I am just wondering what will happen when Unsafe-based IO is not compatible 
with Kryo's Input and Output streams.





[GitHub] spark pull request: [SPARK-14978][PySpark] PySpark TrainValidation...

2016-04-28 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/12767#discussion_r61508511
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -586,10 +589,13 @@ def test_fit_maximize_metric(self):
 tvsModel = tvs.fit(dataset)
 bestModel = tvsModel.bestModel
 bestModelMetric = evaluator.evaluate(bestModel.transform(dataset))
+validationMetrics = tvsModel.validationMetrics
 
 self.assertEqual(0.0, bestModel.getOrDefault('inducedError'),
  "Best model should have zero induced error")
 self.assertEqual(1.0, bestModelMetric, "Best model has R-squared 
of 1")
+self.assertEqual(len(grid), len(validationMetrics),
--- End diff --

Same here, I suppose.





[GitHub] spark pull request: [SPARK-14978][PySpark] PySpark TrainValidation...

2016-04-28 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/12767#discussion_r61508260
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -616,6 +622,7 @@ def test_save_load(self):
 tvsModel.save(tvsModelPath)
 loadedModel = TrainValidationSplitModel.load(tvsModelPath)
 self.assertEqual(loadedModel.bestModel.uid, tvsModel.bestModel.uid)
+self.assertEqual(len(loadedModel.validationMetrics), 
len(tvsModel.validationMetrics))
--- End diff --

It would be better to compare each `validationMetric` in the list, not only 
the length of `validationMetrics`. cc @jkbradley 
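A minimal sketch of the suggested check, comparing the saved and loaded metrics element-wise with a tolerance rather than only comparing list lengths. The helper name and metric values here are illustrative, not from the actual Spark test suite:

```python
# Illustrative helper: compare two metric lists pairwise with a
# tolerance, failing fast with a descriptive message on any mismatch.
def assert_metrics_equal(expected, actual, tol=1e-6):
    assert len(expected) == len(actual), "metric lists differ in length"
    for i, (e, a) in enumerate(zip(expected, actual)):
        assert abs(e - a) < tol, "metric %d differs: %r vs %r" % (i, e, a)

# Made-up metric values standing in for tvsModel.validationMetrics
# before and after a save/load round trip.
saved_metrics = [0.95, 0.91, 0.88]
loaded_metrics = [0.95, 0.91, 0.88]
assert_metrics_equal(saved_metrics, loaded_metrics)  # passes silently
```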





[GitHub] spark pull request: [SPARK-14978][PySpark] PySpark TrainValidation...

2016-04-28 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/12767#discussion_r61507716
  
--- Diff: python/pyspark/ml/tuning.py ---
@@ -613,7 +615,9 @@ def copy(self, extra=None):
 """
 if extra is None:
 extra = dict()
-return TrainValidationSplitModel(self.bestModel.copy(extra))
+bestModel = self.bestModel.copy(extra)
+validationMetrics = self.validationMetrics
+return TrainValidationSplitModel(bestModel, validationMetrics)
--- End diff --

It would be great if you could add a test case for the 
`TrainValidationSplitModel` copy method.
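A sketch of what such a copy test could check; the `Model` class below is a minimal stand-in mirroring the shape of `TrainValidationSplitModel` (a `bestModel` plus `validationMetrics`), not the real PySpark class:

```python
# Stand-in model for illustration only: carries a bestModel and a
# validationMetrics list, like TrainValidationSplitModel does.
class Model(object):
    def __init__(self, bestModel, validationMetrics):
        self.bestModel = bestModel
        self.validationMetrics = validationMetrics

    def copy(self):
        # Copy the metrics list as well, so the copy is independent
        # of the original (the point of the review comment above).
        return Model(self.bestModel, list(self.validationMetrics))

model = Model("lr", [0.9, 0.8])
copied = model.copy()
assert copied.bestModel == model.bestModel
assert copied.validationMetrics == model.validationMetrics
copied.validationMetrics.append(0.7)
assert len(model.validationMetrics) == 2  # original list unchanged
```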





[GitHub] spark pull request: [SPARK-12810][PySpark] PySpark CrossValidatorM...

2016-04-28 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/12464#discussion_r61491169
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -461,6 +461,31 @@ def _fit(self, dataset):
 
 class CrossValidatorTests(PySparkTestCase):
 
+def test_copy(self):
+sqlContext = SQLContext(self.sc)
+dataset = sqlContext.createDataFrame([
+(10, 10.0),
+(50, 50.0),
+(100, 100.0),
+(500, 500.0)] * 10,
+["feature", "label"])
+
+iee = InducedErrorEstimator()
+evaluator = RegressionEvaluator(metricName="rmse")
+
+grid = (ParamGridBuilder()
+.addGrid(iee.inducedError, [100.0, 0.0, 1.0])
+.build())
+cv = CrossValidator(estimator=iee, estimatorParamMaps=grid, 
evaluator=evaluator)
+cvCopied = cv.copy()
+self.assertEqual(cv.getEstimator().uid, 
cvCopied.getEstimator().uid)
+
+cvModel = cv.fit(dataset)
+cvModelCopied = cvModel.copy()
+for index in range(len(cvModel.avgMetrics)):
+self.assertTrue(abs(cvModel.avgMetrics[index] - 
cvModelCopied.avgMetrics[index])
--- End diff --

I tried `assertEqual` before. With `assertEqual`, this test case fails under 
Python 2 due to loss of floating-point precision, but it passes under Python 3.
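The underlying issue is exact float equality; a tolerance-based check, as used in the test above, avoids it. A minimal illustration:

```python
# 0.1 + 0.2 is not exactly 0.3 in binary floating point, so an exact
# equality assertion can fail even when the values are "the same".
a = 0.1 + 0.2
assert a != 0.3              # exact comparison fails
assert abs(a - 0.3) < 1e-6   # tolerance-based comparison passes
```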





[GitHub] spark pull request: [SPARK-12810][PySpark] PySpark CrossValidatorM...

2016-04-27 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/12464#issuecomment-215067357
  
I also noticed that `validationMetrics` in `TrainValidationSplitModel` 
should be supported in Python as well.
Should we add that after this PR?





[GitHub] spark pull request: [SPARK-12810][PySpark] PySpark CrossValidatorM...

2016-04-27 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/12464#issuecomment-215065890
  
@jkbradley Commit 25959e5 tries to:
- update `metrics` in `CrossValidator` to a float list (like `[0.0] * number`).
- use `numpy.testing.assert_almost_equal` to compare two float lists.
- test `CrossValidator` and `CrossValidatorModel` with `copy`.

Would you mind reviewing again?
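For reference, `numpy.testing.assert_almost_equal` compares scalars or sequences element-wise up to a given number of decimal places and raises `AssertionError` on a mismatch. A small sketch, assuming NumPy is available:

```python
from numpy.testing import assert_almost_equal

# Passes: the element-wise differences are far below the decimal=6
# tolerance.
assert_almost_equal([0.5, 0.25], [0.5 + 1e-9, 0.25 - 1e-9], decimal=6)

# A difference beyond the tolerance raises AssertionError.
try:
    assert_almost_equal([0.5], [0.6], decimal=6)
    raised = False
except AssertionError:
    raised = True
assert raised
```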





[GitHub] spark pull request: [SPARK-12810][PySpark] PySpark CrossValidatorM...

2016-04-27 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/12464#discussion_r61247126
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -534,6 +534,8 @@ def test_save_load(self):
 cvModel.save(cvModelPath)
 loadedModel = CrossValidatorModel.load(cvModelPath)
 self.assertEqual(loadedModel.bestModel.uid, cvModel.bestModel.uid)
+for index in range(len(loadedModel.avgMetrics)):
--- End diff --

@holdenk Thanks for suggestion. I used `assert_almost_equal` 
[here](https://github.com/vectorijk/spark/blob/spark-12810/python/pyspark/ml/tests.py#L561).





[GitHub] spark pull request: [SPARK-12810][PySpark] PySpark CrossValidatorM...

2016-04-19 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/12464#discussion_r60221706
  
--- Diff: python/pyspark/ml/tuning.py ---
@@ -367,7 +368,9 @@ def copy(self, extra=None):
 """
 if extra is None:
 extra = dict()
-return CrossValidatorModel(self.bestModel.copy(extra))
+bestModel = self.bestModel.copy(extra)
+avgMetrics = [am.copy(extra) for am in self.avgMetrics]
--- End diff --

Sure, I will do this.





[GitHub] spark pull request: [SPARK-12810][PySpark] PySpark CrossValidatorM...

2016-04-17 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/12464#issuecomment-211210916
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-12810][PySpark] PySpark CrossValidatorM...

2016-04-17 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/12464#issuecomment-211205576
  
cc @feynmanliang @jkbradley @mengxr 





[GitHub] spark pull request: [SPARK-12810][PySpark] PySpark CrossValidatorM...

2016-04-17 Thread vectorijk
GitHub user vectorijk opened a pull request:

https://github.com/apache/spark/pull/12464

[SPARK-12810][PySpark] PySpark CrossValidatorModel should support avgMetrics

## What changes were proposed in this pull request?
support avgMetrics in CrossValidatorModel with Python
## How was this patch tested?
Doctest and `test_save_load` in `pyspark/ml/test.py`
[JIRA](https://issues.apache.org/jira/browse/SPARK-12810)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vectorijk/spark spark-12810

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12464.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12464


commit 93a43bc5007acbc9c55a3eaf591f2b16df614c68
Author: Kai Jiang <jiang...@gmail.com>
Date:   2016-04-16T19:27:44Z

supporting avgMetrics in CrossValidatorModel with Python







[GitHub] spark pull request: [SPARK-13597][PySpark][ML] Python API for Gene...

2016-04-11 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/11468#issuecomment-208559614
  
@jkbradley I have addressed all the comments. Could you review this again?





[GitHub] spark pull request: [SPARK-13597][PySpark][ML] Python API for Gene...

2016-04-08 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/11468#issuecomment-207689752
  
@yanboliang Thanks! I have addressed your comments.





[GitHub] spark pull request: [SPARK-14373] [PySpark] PySpark RandomForestCl...

2016-04-08 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/12238#issuecomment-207639513
  
Thanks! @holdenk @jkbradley @yanboliang 





[GitHub] spark pull request: [SPARK-13597][PySpark][ML] Python API for Gene...

2016-04-08 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11468#discussion_r59099605
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -934,6 +935,146 @@ def predict(self, features):
 return self._call_java("predict", features)
 
 
+@inherit_doc
+class GeneralizedLinearRegression(JavaEstimator, HasLabelCol, 
HasFeaturesCol, HasPredictionCol,
+  HasFitIntercept, HasMaxIter, HasTol, 
HasRegParam, HasWeightCol,
+  HasSolver, JavaMLWritable, 
JavaMLReadable):
+"""
+Generalized Linear Regression.
+
+Fit a Generalized Linear Model specified by giving a symbolic 
description of the linear
+predictor (link function) and a description of the error distribution 
(family). It supports
+"gaussian", "binomial", "poisson" and "gamma" as family. Valid link 
functions for each family
+is listed below. The first link function of each family is the default 
one.
+- "gaussian" -> "identity", "log", "inverse"
+- "binomial" -> "logit", "probit", "cloglog"
+- "poisson"  -> "log", "identity", "sqrt"
+- "gamma"-> "inverse", "identity", "log"
+
+.. seealso:: `GLM 
<https://en.wikipedia.org/wiki/Generalized_linear_model>`_
+
+>>> from pyspark.mllib.linalg import Vectors
+>>> df = sqlContext.createDataFrame([
+... (1.0, Vectors.dense(1.0, 0.0)),
+... (1.0, Vectors.dense(1.0, 2.0)),], ["label", "features"])
+>>> glr = GeneralizedLinearRegression(family="gaussian", 
link="identity")
+>>> model = glr.fit(df)
+>>> abs(model.transform(df).head().prediction - 1.0) < 0.001
+True
+>>> model.coefficients
+DenseVector([0.0, 0.0])
+>>> abs(model.intercept - 1.0) < 0.001
+True
+>>> glr_path = temp_path + "/glr"
+>>> glr.save(glr_path)
+>>> glr2 = GeneralizedLinearRegression.load(glr_path)
+>>> glr.getFamily() == glr2.getFamily()
+True
+>>> model_path = temp_path + "/glr_model"
+>>> model.save(model_path)
+>>> model2 = GeneralizedLinearRegressionModel.load(model_path)
+>>> abs(model.intercept - model2.intercept) < 0.001
+True
--- End diff --

@yanboliang addressed your comments. Also, I modified the test data so that 
coefficients are not zero.





[GitHub] spark pull request: [SPARK-14373] [PySpark] PySpark RandomForestCl...

2016-04-07 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/12238#issuecomment-206906967
  
cc @jkbradley @mengxr @yanboliang It is ready for review.





[GitHub] spark pull request: [SPARK-14373] [PySpark] PySpark RandomForestCl...

2016-04-07 Thread vectorijk
GitHub user vectorijk opened a pull request:

https://github.com/apache/spark/pull/12238

[SPARK-14373] [PySpark] PySpark RandomForestClassifier, Regressor support 
export/import

## What changes were proposed in this pull request?
supporting `RandomForest{Classifier, Regressor}` save/load for Python API.
[JIRA](https://issues.apache.org/jira/browse/SPARK-14373)
## How was this patch tested?
doctest

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vectorijk/spark spark-14373

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12238.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12238


commit d5d21e26667ac346daa12b06de27d6d74fb5fe23
Author: Kai Jiang <jiang...@gmail.com>
Date:   2016-04-07T13:14:10Z

RandomForest{Classifier, Regressor} supports export/import for Python API







[GitHub] spark pull request: [SPARK-13597][PySpark][ML] Python API for Gene...

2016-04-05 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11468#discussion_r58511272
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -934,6 +935,146 @@ def predict(self, features):
 return self._call_java("predict", features)
 
 
+@inherit_doc
+class GeneralizedLinearRegression(JavaEstimator, HasLabelCol, HasFeaturesCol, HasPredictionCol,
+  HasFitIntercept, HasMaxIter, HasTol, HasRegParam, HasWeightCol,
+  HasSolver, JavaMLWritable, JavaMLReadable):
+"""
+Generalized Linear Regression.
+
+Fit a Generalized Linear Model specified by giving a symbolic description of the linear
+predictor (link function) and a description of the error distribution (family). It supports
+"gaussian", "binomial", "poisson" and "gamma" as family. Valid link functions for each family
+is listed below. The first link function of each family is the default one.
+- "gaussian" -> "identity", "log", "inverse"
+- "binomial" -> "logit", "probit", "cloglog"
+- "poisson"  -> "log", "identity", "sqrt"
+- "gamma"    -> "inverse", "identity", "log"
+
+.. seealso:: `GLM <https://en.wikipedia.org/wiki/Generalized_linear_model>`_
+
+>>> from pyspark.mllib.linalg import Vectors
+>>> df = sqlContext.createDataFrame([
+... (1.0, Vectors.dense(1.0, 0.0)),
+... (1.0, Vectors.dense(1.0, 2.0)),], ["label", "features"])
+>>> glr = GeneralizedLinearRegression(family="gaussian", link="identity")
+>>> model = glr.fit(df)
+>>> abs(model.transform(df).head().prediction - 1.0) < 0.001
+True
+>>> model.coefficients
+DenseVector([0.0, 0.0])
+>>> abs(model.intercept - 1.0) < 0.001
+True
+>>> glr_path = temp_path + "/glr"
+>>> glr.save(glr_path)
+>>> glr2 = GeneralizedLinearRegression.load(glr_path)
+>>> glr.getFamily() == glr2.getFamily()
+True
+>>> model_path = temp_path + "/glr_model"
+>>> model.save(model_path)
+>>> model2 = GeneralizedLinearRegressionModel.load(model_path)
+>>> abs(model.intercept - model2.intercept) < 0.001
+True
+
+.. versionadded:: 2.0.0
+"""
+
+family = Param(Params._dummy(), "family", "The name of family which is a description of " +
+   "the error distribution to be used in the model. Supported options: " +
+   "gaussian(default), binomial, poisson and gamma.")
+link = Param(Params._dummy(), "link", "The name of link function which provides the " +
+ "relationship between the linear predictor and the mean of the distribution " +
+ "function. Supported options: identity, log, inverse, logit, probit, cloglog " +
+ "and sqrt.")
+
+@keyword_only
+def __init__(self, labelCol="label", featuresCol="features", predictionCol="prediction",
+ family="gaussian", link="identity", fitIntercept=True, maxIter=25, tol=1e-6,
+ regParam=0.0, weightCol=None, solver="irls"):
+"""
+__init__(self, labelCol="label", featuresCol="features", predictionCol="prediction", \
+ family="gaussian", link="identity", fitIntercept=True, maxIter=25, tol=1e-6, \
+ regParam=0.0, weightCol=None, solver="irls")
+"""
+super(GeneralizedLinearRegression, self).__init__()
+self._java_obj = self._new_java_obj(
+"org.apache.spark.ml.regression.GeneralizedLinearRegression", self.uid)
+self._setDefault(family="gaussian", maxIter=25, tol=1e-6, regParam=0.0, solver="irls")
--- End diff --

@yanboliang addressed what you mentioned. Could you review this again? Thx!
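[Editor's note] The family-to-link table quoted in the diff above can be read as a small validation rule. Below is a minimal pure-Python sketch of that rule; the `check_family_link` helper and `VALID_LINKS` table are hypothetical illustrations, not part of Spark's API.

```python
# Hypothetical helper mirroring the family -> link table from the GLR
# docstring above; a sketch of the validation logic, not Spark's code.
VALID_LINKS = {
    "gaussian": ["identity", "log", "inverse"],
    "binomial": ["logit", "probit", "cloglog"],
    "poisson":  ["log", "identity", "sqrt"],
    "gamma":    ["inverse", "identity", "log"],
}

def check_family_link(family, link):
    """Raise ValueError if the family/link combination is unsupported."""
    if family not in VALID_LINKS:
        raise ValueError("unsupported family: %s" % family)
    if link not in VALID_LINKS[family]:
        raise ValueError("link %r is not valid for family %r" % (link, family))
    return True
```

For example, `check_family_link("poisson", "probit")` raises, because "probit" is only listed under the binomial family.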





[GitHub] spark pull request: [SPARK-13597][PySpark][ML] Python API for Gene...

2016-04-05 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/11468#issuecomment-205730049
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-12461] [SQL] Add ExpressionDescription ...

2016-04-01 Thread vectorijk
Github user vectorijk closed the pull request at:

https://github.com/apache/spark/pull/10489





[GitHub] spark pull request: [SPARK-12461] [SQL] Add ExpressionDescription ...

2016-04-01 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/10489#issuecomment-204528226
  
I will close this PR at this time. If needed, I might re-open this.





[GitHub] spark pull request: [SPARK-13597][PySpark][ML] Python API for Gene...

2016-04-01 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/11468#issuecomment-204452107
  
Sorry about the late response. Yes, I will catch this today.

On Fri, Apr 1, 2016, 09:02 Yanbo Liang <notificati...@github.com> wrote:

    > @vectorijk <https://github.com/vectorijk> Do you have time to update this
> PR? If not, I can help.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/spark/pull/11468#issuecomment-204450309>
>






[GitHub] spark pull request: [SPARK-13597][PySpark][ML] Python API for Gene...

2016-03-02 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11468#discussion_r54834301
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -857,6 +858,146 @@ def predict(self, features):
 return self._call_java("predict", features)
 
 
+@inherit_doc
+class GeneralizedLinearRegression(JavaEstimator, HasLabelCol, HasFeaturesCol, HasPredictionCol,
+  HasFitIntercept, HasMaxIter, HasTol, HasRegParam, HasWeightCol,
+  HasSolver):
+"""
+Generalized Linear Regression.
+
+Fit a Generalized Linear Model specified by giving a symbolic description of the linear
+predictor (link function) and a description of the error distribution (family). It supports
+"gaussian", "binomial", "poisson" and "gamma" as family. Valid link functions for each family
+is listed below. The first link function of each family is the default one.
+- "gaussian" -> "identity", "log", "inverse"
+- "binomial" -> "logit", "probit", "cloglog"
+- "poisson"  -> "log", "identity", "sqrt"
+- "gamma"    -> "inverse", "identity", "log"
+
+.. seealso:: `GLM <https://en.wikipedia.org/wiki/Generalized_linear_model>`_
+
+>>> from pyspark.mllib.linalg import Vectors
+>>> df = sqlContext.createDataFrame([
+... (17.05224, Vectors.dense(3.55954, 11.19528)),
+... (13.46161, Vectors.dense(2.34561, 9.65407)),
+... (17.13384, Vectors.dense(3.37980, 12.03069)),
+... (13.84938, Vectors.dense(2.51969, 9.64902)),], ["label", "features"])
+>>> glr = GeneralizedLinearRegression()
+>>> model = glr.setFamily("gaussian").setLink("identity").fit(df)
+>>> model.transform(df).show()
++--------+------------------+------------------+
+|   label|          features|        prediction|
++--------+------------------+------------------+
+|17.05224|[3.55954,11.19528]|17.052776698886376|
+|13.46161| [2.34561,9.65407]|13.463078911930246|
+|17.13384| [3.3798,12.03069]| 17.13348844246882|
+|13.84938| [2.51969,9.64902]|13.847725946714558|
++--------+------------------+------------------+
+...
+>>> model.coefficients
+DenseVector([2.2263, 0.5756])
+>>> model.intercept
+2.6841196897757795
+
+.. versionadded:: 2.0.0
+"""
+
+family = Param(Params._dummy(), "family", "The name of family which is a description of " +
+   "the error distribution to be used in the model. Supported options: " +
+   "gaussian(default), binomial, poisson and gamma.")
+link = Param(Params._dummy(), "link", "The name of link function which provides the " +
+ "relationship between the linear predictor and the mean of the distribution " +
+ "function. Supported options: identity, log, inverse, logit, probit, cloglog " +
+ "and sqrt.")
+
+@keyword_only
+def __init__(self, labelCol="label", featuresCol="features", predictionCol="prediction",
+ fitIntercept=True, maxIter=25, tol=1e-6, regParam=0.0, weightCol=None,
+ solver="irls"):
+"""
+__init__(self, labelCol="label", featuresCol="features", predictionCol="prediction", \
+ fitIntercept=True, maxIter=25, tol=1e-6, regParam=0.0, weightCol=None, \
+ solver="irls")
+"""
+super(GeneralizedLinearRegression, self).__init__()
+self._java_obj = self._new_java_obj(
+"org.apache.spark.ml.regression.GeneralizedLinearRegression", self.uid)
+self._setDefault(family="gaussian", link="identity")
--- End diff --

You mean the link would be set once the family is set, right? If so, could we just leave `link` empty and not pass it anything?
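[Editor's note] Per the docstring, the first link listed for each family is the default, so an unset link could be resolved from the family. Below is a hedged pure-Python sketch of that resolution; the `resolve_link` helper is a hypothetical illustration, not Spark's actual code.

```python
# Hypothetical sketch of resolving an unset link from the family. The
# family -> links table comes from the GLR docstring, where the first
# entry of each tuple is the documented default link.
DEFAULT_LINKS = {
    "gaussian": ("identity", "log", "inverse"),
    "binomial": ("logit", "probit", "cloglog"),
    "poisson":  ("log", "identity", "sqrt"),
    "gamma":    ("inverse", "identity", "log"),
}

def resolve_link(family, link=None):
    """Return the link to use: the caller's choice, or the family default."""
    links = DEFAULT_LINKS[family]
    if link is None:
        return links[0]  # first listed link is the default
    if link not in links:
        raise ValueError("link %r is not valid for family %r" % (link, family))
    return link
```

Under this scheme, `resolve_link("gamma")` yields "inverse" without the caller ever passing a link.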





[GitHub] spark pull request: [SPARK-13597][PySpark][ML] Python API for Gene...

2016-03-02 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/11468#issuecomment-191228337
  
cc @mengxr @yanboliang 





[GitHub] spark pull request: [SPARK-13597][PySpark][ML] Python API for Gene...

2016-03-02 Thread vectorijk
GitHub user vectorijk opened a pull request:

https://github.com/apache/spark/pull/11468

[SPARK-13597][PySpark][ML] Python API for GeneralizedLinearRegression

## What changes were proposed in this pull request?

Python API for GeneralizedLinearRegression
JIRA: https://issues.apache.org/jira/browse/SPARK-13597

## How was this patch tested?

The patch is tested with Python doctest.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vectorijk/spark spark-13597

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11468.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11468


commit af5f7ebc17e5e64b3bf3a7c05d3c02c26dbca801
Author: Kai Jiang <jiang...@gmail.com>
Date:   2016-03-02T12:03:30Z

[SPARK-13597][PySpark][ML] Python API for GeneralizedLinearRegression







[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

2016-02-24 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/11321#issuecomment-188658913
  
@mengxr, Thanks for replying!

Definitely, I will post a rough draft proposal on JIRA later.

On Wed, Feb 24, 2016 at 11:31 PM, Xiangrui Meng <notificati...@github.com>
wrote:

> LGTM. Merged into master. Thanks!
>
> For GSoC, I created https://issues.apache.org/jira/browse/SPARK-13489 to
> collect some project ideas. Let's move our discussion there. If I don't
> have time to mentor a GSoC project, other committers might be interested.
> Could you prepare a draft proposal and post it on the JIRA?
>
> —
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/spark/pull/11321#issuecomment-188651306>.
>






[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

2016-02-24 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11321#discussion_r53924161
  
--- Diff: python/pyspark/mllib/fpm.py ---
@@ -40,6 +41,11 @@ class FPGrowthModel(JavaModelWrapper):
 >>> model = FPGrowth.train(rdd, 0.6, 2)
 >>> sorted(model.freqItemsets().collect())
 [FreqItemset(items=[u'a'], freq=4), FreqItemset(items=[u'c'], freq=3), ...
+>>> model_path = temp_path + "/fpg_model"
--- End diff --

ok, I have done this.





[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

2016-02-23 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/11321#issuecomment-188006126
  
@mengxr Thanks, I didn't notice that. Addressed the comment.
Also, I took a look at 9ca79c1 and it only moved the temp-file cleanup code for doctests under the ml directory. It seems we should do the same thing under mllib. Should we create a JIRA issue for this?

_off-topic_

I was wondering if the Spark community would be interested in mentoring 
students for Google Summer of Code (GSoC) under the Apache Software Foundation this 
year. Last year, I was impressed by MechCoder's project mentored by 
@mengxr, so I look forward to a chance to do something 
interesting and continue contributing to the codebase this summer.

@mengxr, are you still interested in mentoring an MLlib/PySpark-related 
project this summer? If so, I am happy to brainstorm with you and others 
on JIRA about what could be worked on, and I might start writing the GSoC 
proposal. If there are pre-GSoC issues related to the project, I would 
love to work on those.

P.S. Here is the 
[post](http://apache-spark-developers-list.1001551.n3.nabble.com/GSoC-Interested-in-GSoC-2016-ideas-td16224.html)
 I published on the dev mailing list.





[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

2016-02-23 Thread vectorijk
GitHub user vectorijk opened a pull request:

https://github.com/apache/spark/pull/11321

[SPARK-7106][MLlib][PySpark] Support model save/load in Python's FPGrowth

## What changes were proposed in this pull request?

Python API supports model save/load in FPGrowth
JIRA: [https://issues.apache.org/jira/browse/SPARK-7106](https://issues.apache.org/jira/browse/SPARK-7106)
## How was this patch tested?

The patch is tested with Python doctest.
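[Editor's note] The save/load round trip this PR adds follows the usual model-persistence pattern: write the model's data under a path, then reconstruct the model from that path. Below is a minimal pure-Python sketch of the pattern; the `ToyModel` class is hypothetical, and Spark's real `FPGrowthModel` delegates save/load to the JVM implementation.

```python
# Minimal sketch of a model save/load round trip, mirroring the doctest
# pattern (save under temp_path, load back, compare). ToyModel is a
# hypothetical stand-in for a trained model.
import json
import os
import tempfile

class ToyModel(object):
    def __init__(self, freq_itemsets):
        self.freq_itemsets = freq_itemsets

    def save(self, path):
        # Persist the model's data under a directory, like Spark does.
        os.makedirs(path)
        with open(os.path.join(path, "data.json"), "w") as f:
            json.dump(self.freq_itemsets, f)

    @classmethod
    def load(cls, path):
        # Reconstruct the model from the persisted data.
        with open(os.path.join(path, "data.json")) as f:
            return cls(json.load(f))

temp_path = tempfile.mkdtemp()
model = ToyModel({"a": 4, "c": 3})
model.save(temp_path + "/fpg_model")
same_model = ToyModel.load(temp_path + "/fpg_model")
print(same_model.freq_itemsets == model.freq_itemsets)  # True
```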

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vectorijk/spark spark-7106

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11321.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11321


commit b7145ac7e09d4dcc9b5b5874c59d48cf9e0f0860
Author: Kai Jiang <jiang...@gmail.com>
Date:   2016-02-21T05:06:59Z

[SPARK-7106][MLlib][PySpark] Support model save/load in Python's FPGrowth







[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

2016-02-23 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/11321#issuecomment-187676196
  
cc @mengxr @yanboliang  Could you take a look at this?





[GitHub] spark pull request: [SPARK-13037][ML][PySpark] PySpark ml.recommen...

2016-02-22 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/11044#issuecomment-187588613
  
_off-topic_

Hi Spark Devs!

I was wondering if the Spark community would be interested in mentoring 
students for Google Summer of Code (GSoC) under the Apache Software Foundation this 
year. Last year, I was impressed by MechCoder's project mentored by 
@mengxr, so I look forward to a chance to do something 
interesting and continue contributing to the codebase this summer.

@mengxr, are you still interested in mentoring an MLlib/PySpark-related 
project this summer? If so, I am happy to brainstorm with you and others 
on JIRA about what could be worked on, and I might start writing the GSoC 
proposal. If there are pre-GSoC issues related to the project, I would 
love to work on those.

P.S. Here is the 
[post](http://apache-spark-developers-list.1001551.n3.nabble.com/GSoC-Interested-in-GSoC-2016-ideas-td16224.html)
 I published on the dev mailing list.





[GitHub] spark pull request: [SPARK-12567][SQL] Add aes_{encrypt,decrypt} U...

2016-02-20 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/10527#issuecomment-186738840
  
Thanks so much for the suggestion! I will open a new PR for the update.





[GitHub] spark pull request: [SPARK-12567][SQL] Add aes_{encrypt,decrypt} U...

2016-02-19 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/10527#issuecomment-186523636
  
Ok, Sure. I will do.





[GitHub] spark pull request: [SPARK-12567][SQL] Add aes_{encrypt,decrypt} U...

2016-02-19 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/10527#discussion_r53545904
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1931,6 +1931,42 @@ object functions extends LegacyFunctions {
 new Murmur3Hash(cols.map(_.expr))
   }
 
+  /**
+   * Encrypts input using AES and returns the result as a binary column.
+   * Key lengths of 128, 192 or 256 bits can be used. 192 and 256 bits keys can be used if Java
+   * Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files are installed. If
+   * either argument is NULL, the result will also be null. If input is invalid, key length is not
+   * one of the permitted values or using 192/256 bits key before installing JCE, an exception will
+   * be thrown.
+   *
+   * @param input binary column to encrypt input
+   * @param key binary column of 128, 192 or 256 bits key
+   *
+   * @group misc_funcs
+   * @since 2.0.0
+   */
+  def aes_encrypt(input: Column, key: Column): Column = withExpr {
--- End diff --

I also think so. The example is just one of those cases. The key could also be `abcdef1234567890`.
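[Editor's note] The permitted AES key lengths discussed in this PR are 128, 192 or 256 bits, i.e. 16-, 24- or 32-byte keys, which is why `abcdef1234567890` (16 bytes) qualifies. Below is a minimal pure-Python sketch of that key-length check; the `check_aes_key` helper is hypothetical and raises `ValueError` where Spark's expression throws `java.security.InvalidKeyException`.

```python
# Hypothetical sketch of the key-length validation performed by the
# proposed aes_encrypt / aes_decrypt expressions: AES keys must be
# 128, 192 or 256 bits (16, 24 or 32 bytes).
VALID_KEY_BYTES = (16, 24, 32)

def check_aes_key(key):
    """Raise ValueError for key lengths other than 128/192/256 bits."""
    if len(key) not in VALID_KEY_BYTES:
        raise ValueError(
            "key length (%d bits) is not one of the permitted values "
            "(128, 192 or 256 bits)" % (len(key) * 8))
    return True
```

For instance, a 10-byte key like `1234567890` (80 bits) fails this check, matching the case exercised in the test suite below.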





[GitHub] spark pull request: [SPARK-12567][SQL] Add aes_{encrypt,decrypt} U...

2016-02-19 Thread vectorijk
Github user vectorijk commented on a diff in the pull request:

https://github.com/apache/spark/pull/10527#discussion_r53442582
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MiscFunctionsSuite.scala ---
@@ -132,4 +132,88 @@ class MiscFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper {
   }
 }
   }
+
+  test("aesEncrypt") {
+val expr1 = AesEncrypt(Literal("ABC".getBytes), Literal("1234567890123456".getBytes))
+val expr2 = AesEncrypt(Literal("".getBytes), Literal("1234567890123456".getBytes))
+
+checkEvaluation(Base64(expr1), "y6Ss+zCYObpCbgfWfyNWTw==")
+checkEvaluation(Base64(expr2), "BQGHoM3lqYcsurCRq3PlUw==")
+
+// input is null
+checkEvaluation(AesEncrypt(Literal.create(null, BinaryType),
+  Literal("1234567890123456".getBytes)), null)
+// key is null
+checkEvaluation(AesEncrypt(Literal("ABC".getBytes),
+  Literal.create(null, BinaryType)), null)
+// both are null
+checkEvaluation(AesEncrypt(Literal.create(null, BinaryType),
+  Literal.create(null, BinaryType)), null)
+
+val expr3 = AesEncrypt(Literal("ABC".getBytes), Literal("1234567890".getBytes))
+// key length (80 bits) is not one of the permitted values (128, 192 or 256 bits)
+intercept[java.security.InvalidKeyException] {
+  evaluate(expr3)
+}
+intercept[java.security.InvalidKeyException] {
+  UnsafeProjection.create(expr3::Nil).apply(null)
+}
+  }
+
+  test("aesDecrypt") {
+val expr1 = AesDecrypt(UnBase64(Literal("y6Ss+zCYObpCbgfWfyNWTw==")),
+  Literal("1234567890123456".getBytes))
+val expr2 = AesDecrypt(UnBase64(Literal("BQGHoM3lqYcsurCRq3PlUw==")),
+  Literal("1234567890123456".getBytes))
+
+checkEvaluation(expr1, "ABC")
+checkEvaluation(expr2, "")
+
+// input is null
+checkEvaluation(AesDecrypt(UnBase64(Literal.create(null, StringType)),
+  Literal("1234567890123456".getBytes)), null)
+// key is null
+checkEvaluation(AesDecrypt(UnBase64(Literal("y6Ss+zCYObpCbgfWfyNWTw==")),
+  Literal.create(null, BinaryType)), null)
+// both are null
+checkEvaluation(AesDecrypt(UnBase64(Literal.create(null, StringType)),
+  Literal.create(null, BinaryType)), null)
+
+val expr3 = AesDecrypt(UnBase64(Literal("y6Ss+zCYObpCbgfWfyNWTw==")),
+  Literal("1234567890".getBytes))
+val expr4 = AesDecrypt(UnBase64(Literal("y6Ss+zCsdYObpCbgfWfyNW3Twewr")),
+  Literal("1234567890123456".getBytes))
+val expr5 = AesDecrypt(UnBase64(Literal("t6Ss+zCYObpCbgfWfyNWTw==")),
+  Literal("1234567890123456".getBytes))
+
+// key length (80 bits) is not one of the permitted values (128, 192 or 256 bits)
+intercept[java.security.InvalidKeyException] {
+  evaluate(expr3)
+}
+intercept[java.security.InvalidKeyException] {
+  UnsafeProjection.create(expr3::Nil).apply(null)

done




