[GitHub] spark issue #16015: [SPARK-17251][SQL] Improve `OuterReference` to be `Named...

2016-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16015
  
Merged build finished. Test FAILed.


[GitHub] spark issue #16015: [SPARK-17251][SQL] Improve `OuterReference` to be `Named...

2016-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16015
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69181/
Test FAILed.


[GitHub] spark issue #16015: [SPARK-17251][SQL] Improve `OuterReference` to be `Named...

2016-11-26 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/16015
  
Unknown Jenkins failure.
```
Traceback (most recent call last):
  File "./dev/run-tests-jenkins.py", line 232, in <module>
    main()
  File "./dev/run-tests-jenkins.py", line 219, in main
    test_result_code, test_result_note = run_tests(tests_timeout)
  File "./dev/run-tests-jenkins.py", line 140, in run_tests
    test_result_note = ' * This patch **fails %s**.' % failure_note_by_errcode[test_result_code]
KeyError: -9
```
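
For context: `run_tests` here returned `-9`, which `failure_note_by_errcode` has no key for; a negative subprocess exit code conventionally means the process was killed by that signal number (9 = SIGKILL, e.g. by the OOM killer). A minimal defensive sketch of the lookup, assuming the mapping is a plain dict (the dict contents below are an illustrative subset, not the script's real table):
```
# Hypothetical subset of the errcode-to-note mapping in run-tests-jenkins.py.
failure_note_by_errcode = {
    1: 'some tests',
}

def failure_note(test_result_code):
    # dict.get with a fallback avoids the KeyError for exit codes the
    # mapping does not anticipate, such as -9 (killed by SIGKILL).
    default = 'with unknown exit code %d' % test_result_code
    return failure_note_by_errcode.get(test_result_code, default)

print(' * This patch **fails %s**.' % failure_note(-9))
```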


[GitHub] spark issue #16015: [SPARK-17251][SQL] Improve `OuterReference` to be `Named...

2016-11-26 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/16015
  
Retest this please.


[GitHub] spark issue #16015: [SPARK-17251][SQL] Improve `OuterReference` to be `Named...

2016-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16015
  
**[Test build #69182 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69182/consoleFull)** for PR 16015 at commit [`9d965e7`](https://github.com/apache/spark/commit/9d965e74be85dcb1ae75ee102ee63a15c411a4d8).


[GitHub] spark issue #15963: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-11-26 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15963
  
Ping @AnthonyTruchet


[GitHub] spark issue #15960: [SPARK-18521] Add `NoRedundantStringInterpolator` Scala ...

2016-11-26 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15960
  
Let's close this for now and consider changing these on a case-by-case basis if they turn out to be a real performance bottleneck, or when the code is being changed anyway.


[GitHub] spark pull request #16014: [SPARK-18590][SPARKR] build R source package when...

2016-11-26 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16014#discussion_r89668616
  
--- Diff: dev/create-release/release-build.sh ---
@@ -189,6 +189,9 @@ if [[ "$1" == "package" ]]; then
   SHA512 $PYTHON_DIST_NAME > \
   $PYTHON_DIST_NAME.sha
 
+echo "Copying R source package"
+cp spark-$SPARK_VERSION-bin-$NAME/R/SparkR_$SPARK_VERSION.tar.gz .
--- End diff --

For clarity, this is the heart of the change? We were already including the R source in releases, right, at least in the source release? Does this add something different?


[GitHub] spark pull request #14136: [SPARK-16282][SQL] Implement percentile SQL funct...

2016-11-26 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14136#discussion_r89669049
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Percentile.scala ---
@@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate.Percentile.Countings
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.Platform.BYTE_ARRAY_OFFSET
+import org.apache.spark.util.collection.OpenHashMap
+
+
+/**
+ * The Percentile aggregate function returns the exact percentile(s) of numeric column `expr` at
+ * the given percentage(s) with value range in [0.0, 1.0].
+ *
+ * The operator is bound to the slower sort-based aggregation path because the number of elements
+ * and their partial order cannot be determined in advance. Therefore we have to store all the
+ * elements in memory, and too many elements can cause GC pauses and eventually OutOfMemoryErrors.
+ *
+ * @param child child expression that produces a numeric column value with `child.eval(inputRow)`
+ * @param percentageExpression Expression that represents a single percentage value or an array of
+ *                             percentage values. Each percentage value must be in the range
+ *                             [0.0, 1.0].
+ */
+@ExpressionDescription(
+  usage =
+    """
+      _FUNC_(col, percentage) - Returns the exact percentile value of numeric column `col` at the
+      given percentage. The value of percentage must be between 0.0 and 1.0.

+      _FUNC_(col, array(percentage1 [, percentage2]...)) - Returns the exact percentile value array
+      of numeric column `col` at the given percentage(s). Each value of the percentage array must
+      be between 0.0 and 1.0.
+    """)
+case class Percentile(
+  child: Expression,
+  percentageExpression: Expression,
+  mutableAggBufferOffset: Int = 0,
+  inputAggBufferOffset: Int = 0) extends TypedImperativeAggregate[Countings] {
+
+  def this(child: Expression, percentageExpression: Expression) = {
+    this(child, percentageExpression, 0, 0)
+  }
+
+  override def prettyName: String = "percentile"
+
+  override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): Percentile =
+    copy(mutableAggBufferOffset = newMutableAggBufferOffset)
+
+  override def withNewInputAggBufferOffset(newInputAggBufferOffset: Int): Percentile =
+    copy(inputAggBufferOffset = newInputAggBufferOffset)
+
+  // Mark as lazy so that percentageExpression is not evaluated during tree transformation.
+  private lazy val (returnPercentileArray: Boolean, percentages: Seq[Number]) =
+    evalPercentages(percentageExpression)
+
+  override def children: Seq[Expression] = child :: percentageExpression :: Nil
+
+  // Returns null for empty inputs
+  override def nullable: Boolean = true
+
+  override def dataType: DataType =
+    if (returnPercentileArray) ArrayType(DoubleType) else DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] =
+    Seq(NumericType, TypeCollection(NumericType, ArrayType))
+
+  override def checkInputDataTypes(): TypeCheckResult =
+    TypeUtils.checkForNumericExpr(child.dataType, "function percentile")
+
+  override def createAggregationBuffer(): Countings = {
+    // Initialize new Countings instance here.
+    Countings()
+  }
+
+  private def evalPercentages(expr: Expression): (Boolean, Seq[Number]) = {
+    val (isArrayType, values) = (expr.da
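
For reference, a minimal usage sketch of the `percentile` function this PR adds, following the `_FUNC_` usage strings quoted above (the local session and table `t` are illustrative; this assumes a Spark build that already contains the function):
```
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("demo").getOrCreate()
spark.range(1, 11).createOrReplaceTempView("t")  # numeric column `id`: 1..10

# A single percentage yields a double; an array of percentages yields an
# array of doubles, matching the dataType logic in the expression above.
spark.sql("SELECT percentile(id, 0.5) FROM t").show()
spark.sql("SELECT percentile(id, array(0.25, 0.75)) FROM t").show()
```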

[GitHub] spark issue #16015: [SPARK-17251][SQL] Improve `OuterReference` to be `Named...

2016-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16015
  
**[Test build #69182 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69182/consoleFull)** for PR 16015 at commit [`9d965e7`](https://github.com/apache/spark/commit/9d965e74be85dcb1ae75ee102ee63a15c411a4d8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class OuterReference(e: NamedExpression)`


[GitHub] spark issue #16015: [SPARK-17251][SQL] Improve `OuterReference` to be `Named...

2016-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16015
  
Merged build finished. Test PASSed.


[GitHub] spark issue #16015: [SPARK-17251][SQL] Improve `OuterReference` to be `Named...

2016-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16015
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69182/
Test PASSed.


[GitHub] spark issue #15913: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods f...

2016-11-26 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/15913
  
Jenkins, test this please.


[GitHub] spark issue #15913: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods f...

2016-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15913
  
**[Test build #69183 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69183/consoleFull)** for PR 15913 at commit [`864be6e`](https://github.com/apache/spark/commit/864be6e1fd09080af0234800bf26d1e248e245d4).


[GitHub] spark pull request #15913: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated me...

2016-11-26 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/15913#discussion_r89670270
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
@@ -432,24 +418,26 @@ private[ml] trait GBTParams extends TreeEnsembleParams with HasMaxIter with HasS
   // final val validationTol: DoubleParam = new DoubleParam(this, "validationTol", "")
   // validationTol -> 1e-5
 
-  setDefault(maxIter -> 20, stepSize -> 0.1)
-
   /** @group setParam */
   def setMaxIter(value: Int): this.type = set(maxIter, value)
 
   /**
-   * Step size (a.k.a. learning rate) in interval (0, 1] for shrinking the contribution of each
-   * estimator.
+   * Param for Step size (a.k.a. learning rate) in interval (0, 1] for shrinking
+   * the contribution of each estimator.
    * (default = 0.1)
-   * @group setParam
+   * @group param
    */
+  final val stepSize: DoubleParam = new DoubleParam(this, "stepSize", "Step size " +
+    "(a.k.a. learning rate) in interval (0, 1] for shrinking the contribution of each estimator.",
+    ParamValidators.inRange(0, 1, lowerInclusive = false, upperInclusive = true))
+
+  /** @group getParam */
+  final def getStepSize: Double = $(stepSize)
+
+  /** @group setParam */
   def setStepSize(value: Double): this.type = set(stepSize, value)
--- End diff --

Yeah, I understand what you mean. If we would like to correct the setter methods in traits, it involves changes to lots of traits, including ```DecisionTreeParams```, ```TreeClassifierParams```, ```TreeRegressorParams```, ```RandomForestParams```, ```GBTParams```, etc. So I will merge this first once it passes Jenkins, and address this issue in a separate follow-up PR. Thanks.
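
For reference, a small sketch of how the `stepSize` param in the diff surfaces to users, here through the PySpark wrapper (hedged: assumes a Spark version where the param is wired through to `GBTClassifier`):
```
from pyspark.ml.classification import GBTClassifier

# stepSize (a.k.a. learning rate) must lie in (0, 1], per the
# ParamValidators.inRange(...) check in the diff above; 0.1 is the default.
gbt = GBTClassifier(maxIter=20, stepSize=0.1)
print(gbt.getStepSize())  # -> 0.1
```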


[GitHub] spark issue #16009: [SPARK-18318][ML] ML, Graph 2.1 QA: API: New Scala APIs,...

2016-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16009
  
**[Test build #69184 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69184/consoleFull)** for PR 16009 at commit [`0b1cced`](https://github.com/apache/spark/commit/0b1cced9324db61d5a592410e55f20725bfafa30).


[GitHub] spark pull request #16009: [SPARK-18318][ML] ML, Graph 2.1 QA: API: New Scal...

2016-11-26 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/16009#discussion_r89671075
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -49,15 +49,13 @@ private[feature] trait ChiSqSelectorParams extends Params
   *
   * @group param
   */
-  @Since("1.6.0")
--- End diff --

Usually we don't add ```since``` tags to variables and functions in traits, since they may be inherited by new child classes later on and the tag would be incorrect for them.


[GitHub] spark issue #16009: [SPARK-18318][ML] ML, Graph 2.1 QA: API: New Scala APIs,...

2016-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16009
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69184/
Test FAILed.


[GitHub] spark issue #16009: [SPARK-18318][ML] ML, Graph 2.1 QA: API: New Scala APIs,...

2016-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16009
  
Merged build finished. Test FAILed.


[GitHub] spark issue #16009: [SPARK-18318][ML] ML, Graph 2.1 QA: API: New Scala APIs,...

2016-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16009
  
**[Test build #69184 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69184/consoleFull)** for PR 16009 at commit [`0b1cced`](https://github.com/apache/spark/commit/0b1cced9324db61d5a592410e55f20725bfafa30).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #16009: [SPARK-18318][ML] ML, Graph 2.1 QA: API: New Scala APIs,...

2016-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16009
  
**[Test build #69185 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69185/consoleFull)** for PR 16009 at commit [`eae0d2c`](https://github.com/apache/spark/commit/eae0d2c6a58a7002f8558a7b18a4a277abd510a9).


[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2016-11-26 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/15297
  
This is a really big change, and handling skewed data in joins is certainly an important consideration. Have you considered writing a design document and running it by the dev list? Maybe something similar to the recently proposed Spark Improvement Proposals process?


[GitHub] spark pull request #15961: [SPARK-18523][PySpark]Make SparkContext.stop more...

2016-11-26 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15961#discussion_r89671365
  
--- Diff: python/pyspark/context.py ---
@@ -373,8 +375,19 @@ def stop(self):
         Shut down the SparkContext.
         """
         if getattr(self, "_jsc", None):
-            self._jsc.stop()
-            self._jsc = None
+            try:
+                self._jsc.stop()
+            except Py4JError:
+                # Case: SPARK-18523
+                warnings.warn(
+                    'Unable to cleanly shutdown Spark JVM process.'
+                    ' It is possible that the process has crashed'
+                    ' or been killed, but may also be in a zombie state.',
--- End diff --

sgtm either way :)
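
To make the behavior concrete: with the patch, a Py4JError during JVM shutdown surfaces as a Python warning rather than an exception, so a caller can observe it with the standard warnings machinery. A rough sketch (a clean local shutdown emits no warning, and the crash itself is not reproduced here):
```
import warnings
from pyspark import SparkContext

sc = SparkContext("local[1]", "stop-demo")

# catch_warnings lets a caller inspect any warning emitted by the
# patched stop() instead of it going straight to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    sc.stop()

for w in caught:
    print(w.category.__name__, w.message)
```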


[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.

2016-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14136
  
**[Test build #69186 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69186/consoleFull)** for PR 14136 at commit [`b0aabf9`](https://github.com/apache/spark/commit/b0aabf9824b85f1d249b25870ccda9a3a79d9691).


[GitHub] spark issue #15913: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods f...

2016-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15913
  
**[Test build #69183 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69183/consoleFull)** for PR 15913 at commit [`864be6e`](https://github.com/apache/spark/commit/864be6e1fd09080af0234800bf26d1e248e245d4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #15913: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods f...

2016-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15913
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69183/
Test PASSed.


[GitHub] spark issue #15913: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods f...

2016-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15913
  
Merged build finished. Test PASSed.


[GitHub] spark issue #15817: [SPARK-18366][PYSPARK][ML] Add handleInvalid to Pyspark ...

2016-11-26 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/15817
  
Thanks for working on this @techaddict - one super minor point, but could you also maybe update the PR description to mention that the testing is done with new doctests? This is really minor, but for people skimming the changelog, the PR description will end up as the commit message.


[GitHub] spark issue #15913: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods f...

2016-11-26 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/15913
  
Merged into master and branch-2.1. Thanks to everyone for reviewing.


[GitHub] spark pull request #15913: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated me...

2016-11-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15913


[GitHub] spark issue #15961: [SPARK-18523][PySpark]Make SparkContext.stop more reliab...

2016-11-26 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/15961
  
This looks good to me pending @rxin's proposed wording change. I think restarting a stopped SparkContext that has been killed by the OOM killer or some other issue is generally not a good thing to do, and the warning text makes it clear enough that the machine may be in a bad state, so the user can investigate if necessary.

While I'm unlikely to use this myself, it sounds like it could make life easier for some notebook users by sparing them a restart of their notebook kernel. Thanks for taking the time on this PR @kxepal :)

Note: if you do have automatic retry logic, it likely should not be implemented this way - there is a chance it might lead to a very bad state (hence the warning message). [Just mentioning this since it came up as one of the possible uses in our discussion.]


[GitHub] spark issue #15871: [SPARK-17116][Pyspark] Allow parameters to be {string,va...

2016-11-26 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/15871
  
@aditya1702 - thanks, that sounds like a good reason for us to be consistent between the two. Let me know how adding the tests goes :)


[GitHub] spark issue #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/cache() s...

2016-11-26 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/14579
  
Just a gentle ping - it would be cool to add this for 2.1 if we have the time :)


[GitHub] spark issue #16009: [SPARK-18318][ML] ML, Graph 2.1 QA: API: New Scala APIs,...

2016-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16009
  
**[Test build #69185 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69185/consoleFull)** for PR 16009 at commit [`eae0d2c`](https://github.com/apache/spark/commit/eae0d2c6a58a7002f8558a7b18a4a277abd510a9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #16009: [SPARK-18318][ML] ML, Graph 2.1 QA: API: New Scala APIs,...

2016-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16009
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69185/
Test PASSed.


[GitHub] spark issue #16009: [SPARK-18318][ML] ML, Graph 2.1 QA: API: New Scala APIs,...

2016-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16009
  
Merged build finished. Test PASSed.


[GitHub] spark pull request #14136: [SPARK-16282][SQL] Implement percentile SQL funct...

2016-11-26 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/14136#discussion_r89672415
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Percentile.scala ---
@@ -0,0 +1,326 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
+import java.util
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{TypeCheckFailure, TypeCheckSuccess}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate.Percentile.Countings
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.types._
+import org.apache.spark.util.collection.OpenHashMap
+
+
+/**
+ * The Percentile aggregate function returns the exact percentile(s) of numeric column `expr` at
+ * the given percentage(s) with value range in [0.0, 1.0].
+ *
+ * The operator is bound to the slower sort-based aggregation path because the number of elements
+ * and their partial order cannot be determined in advance. Therefore we have to store all the
+ * elements in memory, and too many elements can cause GC pauses and eventually OutOfMemoryErrors.
+ *
+ * @param child child expression that produces a numeric column value with `child.eval(inputRow)`
+ * @param percentageExpression Expression that represents a single percentage value or an array of
+ *                             percentage values. Each percentage value must be in the range
+ *                             [0.0, 1.0].
+ */
+@ExpressionDescription(
+  usage =
+    """
+      _FUNC_(col, percentage) - Returns the exact percentile value of numeric column `col` at the
+      given percentage. The value of percentage must be between 0.0 and 1.0.

+      _FUNC_(col, array(percentage1 [, percentage2]...)) - Returns the exact percentile value array
+      of numeric column `col` at the given percentage(s). Each value of the percentage array must
+      be between 0.0 and 1.0.
+    """)
+case class Percentile(
+  child: Expression,
+  percentageExpression: Expression,
+  mutableAggBufferOffset: Int = 0,
+  inputAggBufferOffset: Int = 0) extends TypedImperativeAggregate[Countings] {
+
+  def this(child: Expression, percentageExpression: Expression) = {
+    this(child, percentageExpression, 0, 0)
+  }
+
+  override def prettyName: String = "percentile"
+
+  override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): Percentile =
+    copy(mutableAggBufferOffset = newMutableAggBufferOffset)
+
+  override def withNewInputAggBufferOffset(newInputAggBufferOffset: Int): Percentile =
+    copy(inputAggBufferOffset = newInputAggBufferOffset)
+
+  // Mark as lazy so that percentageExpression is not evaluated during tree transformation.
+  private lazy val returnPercentileArray = percentageExpression.dataType.isInstanceOf[ArrayType]
+
+  @transient
+  private lazy val percentages = evalPercentages(percentageExpression)
+
+  override def children: Seq[Expression] = child :: percentageExpression :: Nil
+
+  // Returns null for empty inputs
+  override def nullable: Boolean = true
+
+  override lazy val dataType: DataType = percentageExpression.dataType match {
+    case _: ArrayType => ArrayType(DoubleType, false)
+    case _ => DoubleType
+  }
+
+  override def inputTypes: Seq[AbstractDataType] = percentageExpression.dataType match {
+    case _: ArrayType => Seq(NumericType, ArrayType(DoubleType, false))
+    case _ => Seq(NumericType, DoubleType)
+  }
+

[GitHub] spark pull request #14136: [SPARK-16282][SQL] Implement percentile SQL funct...

2016-11-26 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/14136#discussion_r89672476
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Percentile.scala ---
@@ -0,0 +1,326 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
+import java.util
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{TypeCheckFailure, TypeCheckSuccess}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate.Percentile.Countings
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.types._
+import org.apache.spark.util.collection.OpenHashMap
+
+
+/**
+ * The Percentile aggregate function returns the exact percentile(s) of numeric column `expr` at
+ * the given percentage(s) with value range in [0.0, 1.0].
+ *
+ * The operator is bound to the slower sort-based aggregation path because the number of elements
+ * and their partial order cannot be determined in advance. Therefore we have to store all the
+ * elements in memory, and too many elements can cause GC pauses and eventually OutOfMemoryErrors.
+ *
+ * @param child child expression that produces a numeric column value with `child.eval(inputRow)`
+ * @param percentageExpression Expression that represents a single percentage value or an array of
+ *                             percentage values. Each percentage value must be in the range
+ *                             [0.0, 1.0].
+ */
+@ExpressionDescription(
+  usage =
+    """
+      _FUNC_(col, percentage) - Returns the exact percentile value of numeric column `col` at the
+      given percentage. The value of percentage must be between 0.0 and 1.0.

+      _FUNC_(col, array(percentage1 [, percentage2]...)) - Returns the exact percentile value array
+      of numeric column `col` at the given percentage(s). Each value of the percentage array must
+      be between 0.0 and 1.0.
+    """)
+case class Percentile(
+  child: Expression,
+  percentageExpression: Expression,
+  mutableAggBufferOffset: Int = 0,
+  inputAggBufferOffset: Int = 0) extends TypedImperativeAggregate[Countings] {
+
+  def this(child: Expression, percentageExpression: Expression) = {
+    this(child, percentageExpression, 0, 0)
+  }
+
+  override def prettyName: String = "percentile"
+
+  override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): Percentile =
+    copy(mutableAggBufferOffset = newMutableAggBufferOffset)
+
+  override def withNewInputAggBufferOffset(newInputAggBufferOffset: Int): Percentile =
+    copy(inputAggBufferOffset = newInputAggBufferOffset)
+
+  // Mark as lazy so that percentageExpression is not evaluated during tree transformation.
+  private lazy val returnPercentileArray = percentageExpression.dataType.isInstanceOf[ArrayType]
--- End diff --

Mark it `@transient`.


[GitHub] spark pull request #14136: [SPARK-16282][SQL] Implement percentile SQL funct...

2016-11-26 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/14136#discussion_r89672494
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Percentile.scala ---
@@ -0,0 +1,326 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
+import java.util
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{TypeCheckFailure, TypeCheckSuccess}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate.Percentile.Countings
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.types._
+import org.apache.spark.util.collection.OpenHashMap
+
+
+/**
+ * The Percentile aggregate function returns the exact percentile(s) of numeric column `expr` at
+ * the given percentage(s) with value range in [0.0, 1.0].
+ *
+ * The operator is bound to the slower sort-based aggregation path because the number of elements
+ * and their partial order cannot be determined in advance. Therefore we have to store all the
+ * elements in memory, and too many elements can cause GC pauses and eventually OutOfMemoryErrors.
+ *
+ * @param child child expression that produces a numeric column value with `child.eval(inputRow)`
+ * @param percentageExpression Expression that represents a single percentage value or an array of
+ *                             percentage values. Each percentage value must be in the range
+ *                             [0.0, 1.0].
+ */
+@ExpressionDescription(
+  usage =
+    """
+      _FUNC_(col, percentage) - Returns the exact percentile value of numeric column `col` at the
+      given percentage. The value of percentage must be between 0.0 and 1.0.

+      _FUNC_(col, array(percentage1 [, percentage2]...)) - Returns the exact percentile value array
+      of numeric column `col` at the given percentage(s). Each value of the percentage array must
+      be between 0.0 and 1.0.
+    """)
+case class Percentile(
+  child: Expression,
+  percentageExpression: Expression,
+  mutableAggBufferOffset: Int = 0,
+  inputAggBufferOffset: Int = 0) extends TypedImperativeAggregate[Countings] {
+
+  def this(child: Expression, percentageExpression: Expression) = {
+    this(child, percentageExpression, 0, 0)
+  }
+
+  override def prettyName: String = "percentile"
+
+  override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): Percentile =
+    copy(mutableAggBufferOffset = newMutableAggBufferOffset)
+
+  override def withNewInputAggBufferOffset(newInputAggBufferOffset: Int): Percentile =
+    copy(inputAggBufferOffset = newInputAggBufferOffset)
+
+  // Mark as lazy so that percentageExpression is not evaluated during tree transformation.
+  private lazy val returnPercentileArray = percentageExpression.dataType.isInstanceOf[ArrayType]
+
+  @transient
+  private lazy val percentages = evalPercentages(percentageExpression)
+
+  override def children: Seq[Expression] = child :: percentageExpression :: Nil
+
+  // Returns null for empty inputs
+  override def nullable: Boolean = true
+
+  override lazy val dataType: DataType = percentageExpression.dataType match {
+    case _: ArrayType => ArrayType(DoubleType, false)
+    case _ => DoubleType
+  }
+
+  override def inputTypes: Seq[AbstractDataType] = percentageExpression.dataType match {
+    case _: ArrayType => Seq(NumericType, ArrayType(DoubleType, false))
+    case _ => Seq(NumericType, DoubleType)
+  }
+

[GitHub] spark pull request #14136: [SPARK-16282][SQL] Implement percentile SQL funct...

2016-11-26 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/14136#discussion_r89672499
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Percentile.scala ---
@@ -0,0 +1,326 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
+import java.util
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{TypeCheckFailure, TypeCheckSuccess}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate.Percentile.Countings
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.types._
+import org.apache.spark.util.collection.OpenHashMap
+
+
+/**
+ * The Percentile aggregate function returns the exact percentile(s) of numeric column `expr` at
+ * the given percentage(s) with value range in [0.0, 1.0].
+ *
+ * The operator is bound to the slower sort-based aggregation path because the number of elements
+ * and their partial order cannot be determined in advance. Therefore we have to store all the
+ * elements in memory, and too many elements can cause GC pauses and eventually OutOfMemoryErrors.
+ *
+ * @param child child expression that produces a numeric column value with `child.eval(inputRow)`
+ * @param percentageExpression Expression that represents a single percentage value or an array of
+ *                             percentage values. Each percentage value must be in the range
+ *                             [0.0, 1.0].
+ */
+@ExpressionDescription(
+  usage =
+    """
+      _FUNC_(col, percentage) - Returns the exact percentile value of numeric column `col` at the
+      given percentage. The value of percentage must be between 0.0 and 1.0.

+      _FUNC_(col, array(percentage1 [, percentage2]...)) - Returns the exact percentile value array
+      of numeric column `col` at the given percentage(s). Each value of the percentage array must
+      be between 0.0 and 1.0.
+    """)
+case class Percentile(
+  child: Expression,
+  percentageExpression: Expression,
+  mutableAggBufferOffset: Int = 0,
+  inputAggBufferOffset: Int = 0) extends TypedImperativeAggregate[Countings] {
+
+  def this(child: Expression, percentageExpression: Expression) = {
+    this(child, percentageExpression, 0, 0)
+  }
+
+  override def prettyName: String = "percentile"
+
+  override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): Percentile =
+    copy(mutableAggBufferOffset = newMutableAggBufferOffset)
+
+  override def withNewInputAggBufferOffset(newInputAggBufferOffset: Int): Percentile =
+    copy(inputAggBufferOffset = newInputAggBufferOffset)
+
+  // Mark as lazy so that percentageExpression is not evaluated during tree transformation.
+  private lazy val returnPercentileArray = percentageExpression.dataType.isInstanceOf[ArrayType]
+
+  @transient
+  private lazy val percentages = evalPercentages(percentageExpression)
+
+  override def children: Seq[Expression] = child :: percentageExpression :: Nil
+
+  // Returns null for empty inputs
+  override def nullable: Boolean = true
+
+  override lazy val dataType: DataType = percentageExpression.dataType match {
+    case _: ArrayType => ArrayType(DoubleType, false)
+    case _ => DoubleType
+  }
+
+  override def inputTypes: Seq[AbstractDataType] = percentageExpression.dataType match {
+    case _: ArrayType => Seq(NumericType, ArrayType(DoubleType, false))
+    case _ => Seq(NumericType, DoubleType)
+  }
+

[GitHub] spark pull request #14136: [SPARK-16282][SQL] Implement percentile SQL funct...

2016-11-26 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/14136#discussion_r89672530
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Percentile.scala
 ---
@@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import 
org.apache.spark.sql.catalyst.expressions.aggregate.Percentile.Countings
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.Platform.BYTE_ARRAY_OFFSET
+import org.apache.spark.util.collection.OpenHashMap
+
+
+/**
+ * The Percentile aggregate function returns the exact percentile(s) of numeric column `expr`
+ * at the given percentage(s), with each percentage value in the range [0.0, 1.0].
+ *
+ * The operator is bound to the slower sort-based aggregation path because the number of elements
+ * and their partial order cannot be determined in advance. Therefore we have to store all the
+ * elements in memory, and too many elements can cause GC pauses and eventually OutOfMemoryErrors.
+ *
+ * @param child child expression that produces the numeric column value with `child.eval(inputRow)`
+ * @param percentageExpression Expression that represents a single percentage value or an array of
+ *                             percentage values. Each percentage value must be in the range
+ *                             [0.0, 1.0].
+ */
+@ExpressionDescription(
+  usage =
+    """
+      _FUNC_(col, percentage) - Returns the exact percentile value of numeric column `col` at the
+      given percentage. The value of percentage must be between 0.0 and 1.0.
+
+      _FUNC_(col, array(percentage1 [, percentage2]...)) - Returns the exact percentile value array
+      of numeric column `col` at the given percentage(s). Each value of the percentage array must
+      be between 0.0 and 1.0.
+    """)
+case class Percentile(
+    child: Expression,
+    percentageExpression: Expression,
+    mutableAggBufferOffset: Int = 0,
+    inputAggBufferOffset: Int = 0) extends TypedImperativeAggregate[Countings] {
+
+  def this(child: Expression, percentageExpression: Expression) = {
+    this(child, percentageExpression, 0, 0)
+  }
+
+  override def prettyName: String = "percentile"
+
+  override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): Percentile =
+    copy(mutableAggBufferOffset = newMutableAggBufferOffset)
+
+  override def withNewInputAggBufferOffset(newInputAggBufferOffset: Int): Percentile =
+    copy(inputAggBufferOffset = newInputAggBufferOffset)
+
+  // Mark as lazy so that percentageExpression is not evaluated during tree transformation.
+  private lazy val (returnPercentileArray: Boolean, percentages: Seq[Number]) =
+    evalPercentages(percentageExpression)
+
+  override def children: Seq[Expression] = child :: percentageExpression :: Nil
+
+  // Returns null for empty inputs
+  override def nullable: Boolean = true
+
+  override def dataType: DataType =
+    if (returnPercentileArray) ArrayType(DoubleType) else DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] =
+    Seq(NumericType, TypeCollection(NumericType, ArrayType))
+
+  override def checkInputDataTypes(): TypeCheckResult =
+    TypeUtils.checkForNumericExpr(child.dataType, "function percentile")
+
+  override def createAggregationBuffer(): Countings = {
+    // Initialize new Countings instance here.
+    Countings()
+  }
+
+  private def evalPercentages(expr: Expression): (Boolean, Seq[Number]) = {
+    val (isArrayType, values) = (expr.dat
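An aside on the `lazy val (returnPercentileArray, percentages)` pattern this revision of the diff introduces: every name bound by a lazy pattern definition is itself lazy, which is what keeps the percentage expression from being evaluated during tree transformation. A minimal standalone Scala sketch (the names mirror the diff, but this is not the Spark code):

```
object LazyPercentagesDemo {
  // Stand-in for the real evalPercentages, which folds the percentage expression.
  def evalPercentages(raw: Any): (Boolean, Seq[Double]) = raw match {
    case xs: Seq[_] => (true, xs.map(_.toString.toDouble))
    case x          => (false, Seq(x.toString.toDouble))
  }

  // Nothing runs until the first access of either extracted name.
  lazy val (returnPercentileArray: Boolean, percentages: Seq[Double]) =
    evalPercentages(Seq(0.25, 0.5, 0.75))

  def main(args: Array[String]): Unit = {
    println(returnPercentileArray) // true; evalPercentages runs here, not at construction
    println(percentages)           // List(0.25, 0.5, 0.75)
  }
}
```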

[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.

2016-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14136
  
**[Test build #69186 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69186/consoleFull)** for PR 14136 at commit [`b0aabf9`](https://github.com/apache/spark/commit/b0aabf9824b85f1d249b25870ccda9a3a79d9691).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.

2016-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14136
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69186/
Test FAILed.





[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.

2016-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14136
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...

2016-11-26 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15780#discussion_r89672577
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -603,7 +603,14 @@ case class ExternalMapToCatalyst private(
 
   override def foldable: Boolean = false
 
-  override def dataType: MapType = MapType(keyConverter.dataType, valueConverter.dataType)
+  override def dataType: MapType = {
+    val isPrimitiveType = valueType match {
+      case BooleanType | ByteType | ShortType | IntegerType | LongType |
+        FloatType | DoubleType => true
+      case _ => false
+    }
+    MapType(keyConverter.dataType, valueConverter.dataType, !isPrimitiveType)
--- End diff --

I fixed the failures for a nested struct based on the [suggested approach](https://github.com/apache/spark/pull/15780#discussion_r88587549) by adding an argument to `Invoke`.
@cloud-fan @ueshin @viirya Is it OK?
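For reference, a standalone sketch of the valueContainsNull rule in the hunk above. It mirrors the diff but is not the actual Catalyst code; `mapTypeFor` is a made-up helper.

```
import org.apache.spark.sql.types._

// Map values of primitive external types can never be null, so the resulting
// MapType can safely declare valueContainsNull = false.
def mapTypeFor(keyType: DataType, valueType: DataType): MapType = {
  val isPrimitiveType = valueType match {
    case BooleanType | ByteType | ShortType | IntegerType | LongType |
         FloatType | DoubleType => true
    case _ => false
  }
  MapType(keyType, valueType, valueContainsNull = !isPrimitiveType)
}

// mapTypeFor(StringType, IntegerType).valueContainsNull == false
// mapTypeFor(StringType, StringType).valueContainsNull  == true
```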





[GitHub] spark issue #15941: [WIP][SQL][DOC] Fix incorrect `code` tag

2016-11-26 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15941
  
Merged to master/2.1





[GitHub] spark pull request #15941: [WIP][SQL][DOC] Fix incorrect `code` tag

2016-11-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15941





[GitHub] spark pull request #16017: [SPARK-18592][ML] Move DT/RF/GBT Param setter met...

2016-11-26 Thread yanboliang
GitHub user yanboliang opened a pull request:

https://github.com/apache/spark/pull/16017

[SPARK-18592][ML] Move DT/RF/GBT Param setter methods to subclasses

## What changes were proposed in this pull request?
There are mainly two changes:
* Move the DT/RF/GBT Param setter methods to the subclasses.
* Deprecate the corresponding setter methods in the model classes.

See the discussion at https://github.com/apache/spark/pull/15913#discussion_r89662469.

## How was this patch tested?
Existing tests.
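A hedged sketch of the proposed pattern with made-up names (the real change touches the DT/RF/GBT estimator and model classes, not these):

```
// The shared Params trait keeps only the getter; the estimator subclass owns the
// setter; the model keeps a deprecated shim for source compatibility.
trait TreeParamsSketch {
  protected var maxDepthValue: Int = 5
  final def getMaxDepth: Int = maxDepthValue
}

class TreeClassifierSketch extends TreeParamsSketch {
  // Setter now lives on the estimator subclass only.
  def setMaxDepth(value: Int): this.type = { maxDepthValue = value; this }
}

class TreeClassificationModelSketch extends TreeParamsSketch {
  @deprecated("Setters will be removed from fitted models", "2.1.0")
  def setMaxDepth(value: Int): this.type = { maxDepthValue = value; this }
}
```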

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanboliang/spark spark-18592

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16017.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16017


commit 7e52a0856c1381cd751ddc10f0195866bd40b404
Author: Yanbo Liang 
Date:   2016-11-26T15:36:23Z

Put DT/RF/GBT setter methods into each subclass.

commit 39cbf4267d9d136f6a16f85e8e2d88939a35e22f
Author: Yanbo Liang 
Date:   2016-11-26T15:46:40Z

Deprecate DT/RF/GBT setter methods in the Model classes







[GitHub] spark issue #16017: [SPARK-18592][ML] Move DT/RF/GBT Param setter methods to...

2016-11-26 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/16017
  
cc @jkbradley 





[GitHub] spark issue #16017: [SPARK-18592][ML] Move DT/RF/GBT Param setter methods to...

2016-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16017
  
**[Test build #69187 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69187/consoleFull)** for PR 16017 at commit [`39cbf42`](https://github.com/apache/spark/commit/39cbf4267d9d136f6a16f85e8e2d88939a35e22f).





[GitHub] spark pull request #15913: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated me...

2016-11-26 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/15913#discussion_r89673786
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
@@ -432,24 +418,26 @@ private[ml] trait GBTParams extends TreeEnsembleParams with HasMaxIter with HasS
   // final val validationTol: DoubleParam = new DoubleParam(this, "validationTol", "")
   // validationTol -> 1e-5
 
-  setDefault(maxIter -> 20, stepSize -> 0.1)
-
   /** @group setParam */
   def setMaxIter(value: Int): this.type = set(maxIter, value)
 
   /**
-   * Step size (a.k.a. learning rate) in interval (0, 1] for shrinking the contribution of each
-   * estimator.
+   * Param for Step size (a.k.a. learning rate) in interval (0, 1] for shrinking
+   * the contribution of each estimator.
    * (default = 0.1)
-   * @group setParam
+   * @group param
    */
+  final val stepSize: DoubleParam = new DoubleParam(this, "stepSize", "Step size " +
+    "(a.k.a. learning rate) in interval (0, 1] for shrinking the contribution of each estimator.",
+    ParamValidators.inRange(0, 1, lowerInclusive = false, upperInclusive = true))
+
+  /** @group getParam */
+  final def getStepSize: Double = $(stepSize)
+
+  /** @group setParam */
   def setStepSize(value: Double): this.type = set(stepSize, value)
--- End diff --

@jkbradley I have sent #16017 to fix this issue, please feel free to comment on it. Thanks.
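For reference, the validator attached to `stepSize` in the hunk above models the interval (0, 1]. A small behavioral sketch, assuming spark-mllib on the classpath:

```
import org.apache.spark.ml.param.ParamValidators

// lowerInclusive = false, upperInclusive = true gives (0, 1].
val inZeroOneRightClosed: Double => Boolean =
  ParamValidators.inRange(0, 1, lowerInclusive = false, upperInclusive = true)

inZeroOneRightClosed(0.0) // false: 0 is excluded
inZeroOneRightClosed(0.1) // true: the default step size
inZeroOneRightClosed(1.0) // true: 1 is included
```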





[GitHub] spark issue #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for unidoc...

2016-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16013
  
**[Test build #3439 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3439/consoleFull)** for PR 16013 at commit [`73fcd35`](https://github.com/apache/spark/commit/73fcd355a565c5ea433b1f8ca11e08ee6c3f2a9e).





[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.

2016-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14136
  
**[Test build #69188 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69188/consoleFull)** for PR 14136 at commit [`5b8cd4d`](https://github.com/apache/spark/commit/5b8cd4d5ba5b2cec4e7dac45a9831303f52a84ba).





[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...

2016-11-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16013#discussion_r89674331
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala ---
@@ -155,7 +155,7 @@ class DoubleRDDFunctions(self: RDD[Double]) extends Logging with Serializable {
* to the right except for the last which is closed
*  e.g. for the array
*  [1, 10, 20, 50] the buckets are [1, 10) [10, 20) [20, 50]
-   *  e.g 1<=x<10 , 10<=x<20, 20<=x<=50
+   *  e.g 1<=x<10 , 10<=x<20, 20<=x<=50
--- End diff --

Note to myself: inline tags such as

```
{@code ... < ...}
```

and

```
{@literal >}
```

also seem to render fine in both, but they are only valid tags for javadoc. Scaladoc treats them as monospace text (like `<` or `... < ...`). Since genjavadoc does not seem to replace them, they only appear to work. I guess we should avoid them, though.
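A hedged illustration of the two escaping options discussed above, as an assumed doc-comment fragment (not taken from the PR):

```
/**
 * Buckets are half-open, e.g. 1 &lt;= x &lt; 10 (HTML entities, which render in both tools),
 * or with an inline tag: {@code 1 <= x < 10} (valid javadoc; scaladoc shows it as monospace).
 */
def histogramDocExample(): Unit = ()
```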





[GitHub] spark issue #16017: [SPARK-18592][ML] Move DT/RF/GBT Param setter methods to...

2016-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16017
  
**[Test build #69187 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69187/consoleFull)** for PR 16017 at commit [`39cbf42`](https://github.com/apache/spark/commit/39cbf4267d9d136f6a16f85e8e2d88939a35e22f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16017: [SPARK-18592][ML] Move DT/RF/GBT Param setter methods to...

2016-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16017
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16017: [SPARK-18592][ML] Move DT/RF/GBT Param setter methods to...

2016-11-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16017
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69187/
Test PASSed.





[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...

2016-11-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16013#discussion_r89675432
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDD.scala ---
@@ -103,7 +103,8 @@ class JavaRDD[T](val rdd: RDD[T])(implicit val classTag: ClassTag[T])
   * @param withReplacement can elements be sampled multiple times (replaced when sampled out)
   * @param fraction expected size of the sample as a fraction of this RDD's size
   *  without replacement: probability that each element is chosen; fraction must be [0, 1]
-   *  with replacement: expected number of times each element is chosen; fraction must be >= 0
+   *  with replacement: expected number of times each element is chosen; fraction must be greater
+   *  than or equal to 0
--- End diff --

I can change this to `{@code >=}` if this looks too verbose.
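For context, a sketch of the two `fraction` semantics the doc describes, written against the Scala `RDD.sample` API and assuming an existing SparkContext named `sc`:

```
val nums = sc.parallelize(1 to 5)
nums.sample(withReplacement = false, fraction = 0.4) // probability each element is kept; in [0, 1]
nums.sample(withReplacement = true, fraction = 2.0)  // expected copies of each element; >= 0
```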





[GitHub] spark issue #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for unidoc...

2016-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16013
  
**[Test build #69189 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69189/consoleFull)** for PR 16013 at commit [`a2a2011`](https://github.com/apache/spark/commit/a2a2011da220b272121b01734bbf640567ef6ae3).





[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...

2016-11-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16013#discussion_r89675600
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -750,8 +751,10 @@ abstract class RDD[T: ClassTag](
    *print line function (like out.println()) as the 2nd parameter.
    *An example of pipe the RDD data of groupBy() in a streaming way,
    *instead of constructing a huge String to concat all the elements:
-   *def printRDDElement(record:(String, Seq[String]), f:String=>Unit) =
-   *  for (e <- record._2) {f(e)}
+   *{{{
+   *def printRDDElement(record:(String, Seq[String]), f:String=>Unit) =
+   *  for (e <- record._2) {f(e)}
+   *}}}
--- End diff --

- Scala: https://cloud.githubusercontent.com/assets/6477701/20642307/492d3acc-b44f-11e6-8ba2-8ed6276324f6.png
- Java: https://cloud.githubusercontent.com/assets/6477701/20642309/4d80d8a4-b44f-11e6-8710-60f6b10f42cf.png
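For context, a runnable sketch of the streaming printRDDElement pattern shown in the doc above, assuming an existing SparkContext named `sc` and a POSIX `cat` binary:

```
val grouped = sc.parallelize(Seq("a" -> "1", "a" -> "2", "b" -> "3"))
  .groupByKey()
  .map { case (k, vs) => (k, vs.toSeq) }

// Stream each grouped element to the external process one line at a time
// instead of first concatenating one huge String.
def printRDDElement(record: (String, Seq[String]), f: String => Unit): Unit =
  for (e <- record._2) f(e)

val piped = grouped.pipe(Seq("cat"), printRDDElement = printRDDElement)
piped.collect().foreach(println) // 1, 2, 3 (in per-partition order)
```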






[GitHub] spark issue #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for unidoc...

2016-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16013
  
**[Test build #69190 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69190/consoleFull)** for PR 16013 at commit [`ee3b96b`](https://github.com/apache/spark/commit/ee3b96b82d5fab48c14754c13a826c7507bcbef8).





[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...

2016-11-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16013#discussion_r89675629
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1184,8 +1187,13 @@ abstract class RDD[T: ClassTag](
    *
    * @note This method should only be used if the resulting map is expected to be small, as
    * the whole thing is loaded into the driver's memory.
-   * To handle very large results, consider using rdd.map(x => (x, 1L)).reduceByKey(_ + _), which
-   * returns an RDD[T, Long] instead of a map.
+   * To handle very large results, consider using
+   *
+   * {{{
+   * rdd.map(x => (x, 1L)).reduceByKey(_ + _)
+   * }}},
--- End diff --

- Scala: https://cloud.githubusercontent.com/assets/6477701/20642332/8a64265e-b44f-11e6-9406-02d6c32ab710.png
- Java: https://cloud.githubusercontent.com/assets/6477701/20642333/8cf8f728-b44f-11e6-9019-1d773244aee7.png
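For context, a sketch of the scalable alternative the new doc text recommends, assuming an existing SparkContext named `sc`:

```
val words = sc.parallelize(Seq("a", "b", "a", "c", "a"))
// Stays distributed as an RDD[(String, Long)] instead of a driver-side Map.
val counts = words.map(x => (x, 1L)).reduceByKey(_ + _)
counts.collect() // Array((a,3), (b,1), (c,2)) in some order
// whereas words.countByValue() loads the whole Map[String, Long] into driver memory
```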






[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.

2016-11-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14136
  
**[Test build #69188 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69188/consoleFull)** for PR 14136 at commit [`5b8cd4d`](https://github.com/apache/spark/commit/5b8cd4d5ba5b2cec4e7dac45a9831303f52a84ba).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...

2016-11-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16013#discussion_r89675738
  
--- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala ---
@@ -422,8 +422,13 @@ private[spark] object UIUtils extends Logging {
* the whole string will rendered as a simple escaped text.
*
   * Note: In terms of security, only anchor tags with root relative links are supported. So any
-   * attempts to embed links outside Spark UI, or other tags like