[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...

2017-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18555
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...

2017-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18555
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79639/
Test FAILed.





[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...

2017-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18555
  
**[Test build #79639 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79639/testReport)**
 for PR 18555 at commit 
[`3983744`](https://github.com/apache/spark/commit/3983744794359f36b20111d5d7e60d042c027426).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18644: case class should be independent

2017-07-15 Thread piyushknoldus
GitHub user piyushknoldus opened a pull request:

https://github.com/apache/spark/pull/18644

case class should be independent

Scala style says a case class should be independent and not be part of any 
object, so that it can be accessed from outside that object without affecting it.

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/piyushknoldus/spark patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18644.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18644


commit 80f95c93f1bafb2d5359c1f2d6528603d56e324d
Author: Piyush Rana 
Date:   2017-07-16T06:26:37Z

case class should be independent

Scala style says a case class should be independent and not be part of any 
object, so that it can be accessed from outside that object without affecting it.







[GitHub] spark issue #18644: case class should be independent

2017-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18644
  
Can one of the admins verify this patch?





[GitHub] spark issue #18615: [SPARK-21394][PYTHON] Reviving callable object support i...

2017-07-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18615
  
(gentle ping @holdenk)





[GitHub] spark issue #18643: [SPARK-21426] [2.0] [SQL] [TEST] Fix test failure due to...

2017-07-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18643
  
cc @cloud-fan @jiangxb1987 





[GitHub] spark issue #14995: [Test Only][SPARK-6235][CORE]Address various 2G limits

2017-07-15 Thread j143
Github user j143 commented on the issue:

https://github.com/apache/spark/pull/14995
  
Hi @witgo, did you test this PR in a production environment? If so, could you 
share the results on this 
[jira](https://issues.apache.org/jira/browse/SPARK-6235)? If you have problems 
downloading this branch, please let me know.





[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...

2017-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18555
  
**[Test build #79639 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79639/testReport)**
 for PR 18555 at commit 
[`3983744`](https://github.com/apache/spark/commit/3983744794359f36b20111d5d7e60d042c027426).





[GitHub] spark issue #18643: [SPARK-21426] [2.0] [SQL] [TEST] Fix test failure due to...

2017-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18643
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79638/
Test PASSed.





[GitHub] spark issue #18643: [SPARK-21426] [2.0] [SQL] [TEST] Fix test failure due to...

2017-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18643
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18643: [SPARK-21426] [2.0] [SQL] [TEST] Fix test failure due to...

2017-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18643
  
**[Test build #79638 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79638/consoleFull)**
 for PR 18643 at commit 
[`c40bfaf`](https://github.com/apache/spark/commit/c40bfaf4f479c78719083ebbd8a8bf2738bf36e6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18643: [SPARK-21426] [2.0] [SQL] [TEST] Fix test failure due to...

2017-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18643
  
**[Test build #79638 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79638/consoleFull)**
 for PR 18643 at commit 
[`c40bfaf`](https://github.com/apache/spark/commit/c40bfaf4f479c78719083ebbd8a8bf2738bf36e6).





[GitHub] spark pull request #18643: [SPARK-21426] [2.0] [SQL] [TEST] Fix test failure...

2017-07-15 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/18643

[SPARK-21426] [2.0] [SQL] [TEST] Fix test failure due to missing literal 
representation 

## What changes were proposed in this pull request?
Spark 2.0 does not support hex literals. Thus, the test case failed after 
backporting https://github.com/apache/spark/pull/18571

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark fixTestFailure2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18643.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18643


commit 6e9bca7f674f57a3cbd66f04550ca1080682707a
Author: gatorsmile 
Date:   2017-07-16T00:23:03Z

fix.

commit c40bfaf4f479c78719083ebbd8a8bf2738bf36e6
Author: gatorsmile 
Date:   2017-07-16T00:27:10Z

fix.







[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r127595777
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
 ---
@@ -1186,3 +1186,51 @@ case class BRound(child: Expression, scale: 
Expression)
 with Serializable with ImplicitCastInputTypes {
   def this(child: Expression) = this(child, Literal(0))
 }
+
+/**
+ *  Returns the bucket number into which
+ *  the value of this expression would fall after being evaluated.
+ *
+ * @param expr is the expression for which the histogram is being created
+ * @param minValue is an expression that resolves
+ * to the minimum end point of the acceptable range for 
expr
+ * @param maxValue is an expression that resolves
+ * to the maximum end point of the acceptable range for 
expr
+ * @param numBucket is an An expression that resolves to
+ *  a constant indicating the number of buckets
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr, min_value, max_value, num_bucket) - Returns an 
long between 0 and `num_buckets`+1 by mapping the `expr` into buckets defined 
by the range [`min_value`, `max_value`].",
+  extended = """
+Examples:
+  > SELECT _FUNC_(5.35, 0.024, 10.06, 5);
+   3
+  """)
+// scalastyle:on line.size.limit
+case class WidthBucket(
+  expr: Expression,
+  minValue: Expression,
+  maxValue: Expression,
+  numBucket: Expression) extends QuaternaryExpression with 
ImplicitCastInputTypes {
+
+  override def children: Seq[Expression] = Seq(expr, minValue, maxValue, 
numBucket)
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, 
DoubleType, DoubleType, LongType)
+  override def dataType: DataType = LongType
+  override def nullable: Boolean = true
+
+  override def nullSafeEval(ex: Any, min: Any, max: Any, num: Any): Any = {
+MathUtils.widthBucket(
+  ex.asInstanceOf[Double],
--- End diff --

What happens if the input is not a constant but a foldable expression?
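
For readers following the diff, the bucket arithmetic described in the doc comment can be sketched as below. This is only an illustration under stated assumptions: the `MathUtils.widthBucket` body is not shown in this thread, so the helper here is hypothetical and follows the conventional SQL `WIDTH_BUCKET` definition (0 below the range, `numBucket + 1` at or above the maximum, equal-width buckets in between), and it only handles `min < max`.

```scala
object WidthBucketSketch {
  /**
   * Hypothetical helper, NOT the MathUtils.widthBucket from the PR.
   * Assumes the conventional SQL WIDTH_BUCKET semantics with min < max.
   */
  def widthBucket(expr: Double, min: Double, max: Double, numBucket: Long): Long = {
    require(numBucket > 0, "numBucket must be positive")
    require(min < max, "this sketch only handles min < max")
    if (expr < min) {
      0L // underflow bucket
    } else if (expr >= max) {
      numBucket + 1 // overflow bucket
    } else {
      // Equal-width buckets numbered 1..numBucket; clamp guards against rounding.
      val raw = ((expr - min) / (max - min) * numBucket).toLong + 1
      math.min(raw, numBucket)
    }
  }

  def main(args: Array[String]): Unit = {
    // Same inputs as the example in the ExpressionDescription above; prints 3.
    println(widthBucket(5.35, 0.024, 10.06, 5L))
  }
}
```

Because the helper takes plain values, it says nothing about how the expression itself treats constant versus foldable inputs; that part depends on the `WidthBucket` expression in the PR.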





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r127595711
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
 ---
@@ -1186,3 +1186,51 @@ case class BRound(child: Expression, scale: 
Expression)
 with Serializable with ImplicitCastInputTypes {
   def this(child: Expression) = this(child, Literal(0))
 }
+
+/**
+ *  Returns the bucket number into which
+ *  the value of this expression would fall after being evaluated.
+ *
+ * @param expr is the expression for which the histogram is being created
+ * @param minValue is an expression that resolves
+ * to the minimum end point of the acceptable range for 
expr
+ * @param maxValue is an expression that resolves
+ * to the maximum end point of the acceptable range for 
expr
+ * @param numBucket is an An expression that resolves to
--- End diff --

`an An` -> `an`





[GitHub] spark issue #18641: [SPARK-21413][SQL] Fix 64KB JVM bytecode limit problem i...

2017-07-15 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/18641
  
@viirya @cloud-fan could you please take a look?





[GitHub] spark pull request #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-15 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/18513#discussion_r127590053
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import org.apache.spark.annotation.Since
+import org.apache.spark.ml.Transformer
+import org.apache.spark.ml.attribute.AttributeGroup
+import org.apache.spark.ml.linalg.Vectors
+import org.apache.spark.ml.param.{IntParam, ParamMap, ParamValidators}
+import org.apache.spark.ml.param.shared.{HasInputCols, HasOutputCol}
+import org.apache.spark.ml.util.{DefaultParamsReadable, 
DefaultParamsWritable, Identifiable, SchemaUtils}
+import org.apache.spark.mllib.feature.{HashingTF => OldHashingTF}
+import org.apache.spark.sql.{DataFrame, Dataset, Row}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
+import org.apache.spark.util.collection.OpenHashMap
+
+/**
+ * Feature hashing projects a set of categorical or numerical features 
into a feature vector of
+ * specified dimension (typically substantially smaller than that of the 
original feature
+ * space). This is done using the hashing trick 
(https://en.wikipedia.org/wiki/Feature_hashing)
+ * to map features to indices in the feature vector.
+ *
+ * The [[FeatureHasher]] transformer operates on multiple columns. Each 
column may be numeric
+ * (representing a real feature) or string (representing a categorical 
feature). Boolean columns
+ * are also supported, and treated as categorical features. For numeric 
features, the hash value of
+ * the column name is used to map the feature value to its index in the 
feature vector.
+ * For categorical features, the hash value of the string 
"column_name=value" is used to map to the
+ * vector index, with an indicator value of `1.0`. Thus, categorical 
features are "one-hot" encoded
+ * (similarly to using [[OneHotEncoder]] with `dropLast=false`).
+ *
+ * Null (missing) values are ignored (implicitly zero in the resulting 
feature vector).
+ *
+ * Since a simple modulo is used to transform the hash function to a 
vector index,
+ * it is advisable to use a power of two as the numFeatures parameter;
+ * otherwise the features will not be mapped evenly to the vector indices.
+ *
+ * {{{
+ *   val df = Seq(
+ *(2.0, true, "1", "foo"),
+ *(3.0, false, "2", "bar")
+ *   ).toDF("real", "bool", "stringNum", "string")
+ *
+ *   val hasher = new FeatureHasher()
+ *.setInputCols("real", "bool", "stringNum", "num")
+ *.setOutputCol("features")
+ *
+ *   hasher.transform(df).show()
+ *
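+ *   +----+-----+---------+------+--------------------+
+ *   |real| bool|stringNum|string|            features|
+ *   +----+-----+---------+------+--------------------+
+ *   | 2.0| true|        1|   foo|(262144,[51871,63...|
+ *   | 3.0|false|        2|   bar|(262144,[6031,806...|
+ *   +----+-----+---------+------+--------------------+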
+ * }}}
+ */
+@Since("2.3.0")
+class FeatureHasher(@Since("2.3.0") override val uid: String) extends 
Transformer
+  with HasInputCols with HasOutputCol with DefaultParamsWritable {
+
+  @Since("2.3.0")
+  def this() = this(Identifiable.randomUID("featureHasher"))
+
+  /**
+   * Number of features. Should be greater than 0.
+   * (default = 2^18^)
+   * @group param
+   */
+  @Since("2.3.0")
+  val numFeatures = new IntParam(this, "numFeatures", "number of features 
(> 0)",
+ParamValidators.gt(0))
+
+  setDefault(numFeatures -> (1 << 18))
+
+  /** @group getParam */
+  @Since("2.3.0")
+  def getNumFeatures: Int = $(numFeatures)
+
+  /** @group setParam */
+  @Since("2.3.0")
+  def setNumFeatures(value: Int): this.type = set(numFeatures, value)
+
+  /** @group s
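
The quoted diff is cut off here. As a reading aid, the hashing scheme described in the class doc comment can be sketched roughly as follows. This is a minimal sketch under assumptions, not the code from the PR: it uses the Scala standard library's `MurmurHash3.stringHash` in place of whatever hash function the transformer actually uses, it represents a row as a hypothetical `Map[String, Any]`, and it assumes that values landing on the same index are summed.

```scala
import scala.util.hashing.MurmurHash3

object FeatureHashingSketch {
  // Map a possibly negative hash code into the range [0, mod).
  private def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  /**
   * Hypothetical helper: hashes one row into a sparse index -> value map.
   * Numeric columns hash the column name and keep the value; everything else
   * hashes "column=value" with an indicator of 1.0; nulls are skipped.
   */
  def hashFeatures(row: Map[String, Any], numFeatures: Int): Map[Int, Double] = {
    require(numFeatures > 0, "numFeatures must be positive")
    row.foldLeft(Map.empty[Int, Double]) { case (acc, (name, value)) =>
      val entry: Option[(String, Double)] = value match {
        case null => None
        case d: Double => Some(name -> d)
        case other => Some(s"$name=$other" -> 1.0)
      }
      entry match {
        case Some((key, v)) =>
          val idx = nonNegativeMod(MurmurHash3.stringHash(key), numFeatures)
          // Assumption: colliding features are added together.
          acc.updated(idx, acc.getOrElse(idx, 0.0) + v)
        case None => acc
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val row = Map[String, Any]("real" -> 2.0, "bool" -> true, "string" -> "foo")
    println(hashFeatures(row, 1 << 18))
  }
}
```

The indices produced by this sketch will not match the ones in the doc comment's sample output, since the hash function and seed are stand-ins.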

[GitHub] spark pull request #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-15 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/18513#discussion_r127589970
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import org.apache.spark.annotation.Since
+import org.apache.spark.ml.Transformer
+import org.apache.spark.ml.attribute.AttributeGroup
+import org.apache.spark.ml.linalg.Vectors
+import org.apache.spark.ml.param.{IntParam, ParamMap, ParamValidators}
+import org.apache.spark.ml.param.shared.{HasInputCols, HasOutputCol}
+import org.apache.spark.ml.util.{DefaultParamsReadable, 
DefaultParamsWritable, Identifiable, SchemaUtils}
+import org.apache.spark.mllib.feature.{HashingTF => OldHashingTF}
+import org.apache.spark.sql.{DataFrame, Dataset, Row}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
+import org.apache.spark.util.collection.OpenHashMap
+
+/**
+ * Feature hashing projects a set of categorical or numerical features 
into a feature vector of
+ * specified dimension (typically substantially smaller than that of the 
original feature
+ * space). This is done using the hashing trick 
(https://en.wikipedia.org/wiki/Feature_hashing)
+ * to map features to indices in the feature vector.
+ *
+ * The [[FeatureHasher]] transformer operates on multiple columns. Each 
column may be numeric
+ * (representing a real feature) or string (representing a categorical 
feature). Boolean columns
+ * are also supported, and treated as categorical features. For numeric 
features, the hash value of
+ * the column name is used to map the feature value to its index in the 
feature vector.
+ * For categorical features, the hash value of the string 
"column_name=value" is used to map to the
+ * vector index, with an indicator value of `1.0`. Thus, categorical 
features are "one-hot" encoded
+ * (similarly to using [[OneHotEncoder]] with `dropLast=false`).
+ *
+ * Null (missing) values are ignored (implicitly zero in the resulting 
feature vector).
+ *
+ * Since a simple modulo is used to transform the hash function to a 
vector index,
+ * it is advisable to use a power of two as the numFeatures parameter;
+ * otherwise the features will not be mapped evenly to the vector indices.
+ *
+ * {{{
+ *   val df = Seq(
+ *(2.0, true, "1", "foo"),
+ *(3.0, false, "2", "bar")
+ *   ).toDF("real", "bool", "stringNum", "string")
+ *
+ *   val hasher = new FeatureHasher()
+ *.setInputCols("real", "bool", "stringNum", "num")
+ *.setOutputCol("features")
+ *
+ *   hasher.transform(df).show()
+ *
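+ *   +----+-----+---------+------+--------------------+
+ *   |real| bool|stringNum|string|            features|
+ *   +----+-----+---------+------+--------------------+
+ *   | 2.0| true|        1|   foo|(262144,[51871,63...|
+ *   | 3.0|false|        2|   bar|(262144,[6031,806...|
+ *   +----+-----+---------+------+--------------------+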
+ * }}}
+ */
+@Since("2.3.0")
+class FeatureHasher(@Since("2.3.0") override val uid: String) extends 
Transformer
+  with HasInputCols with HasOutputCol with DefaultParamsWritable {
+
+  @Since("2.3.0")
+  def this() = this(Identifiable.randomUID("featureHasher"))
+
+  /**
+   * Number of features. Should be greater than 0.
+   * (default = 2^18^)
+   * @group param
+   */
+  @Since("2.3.0")
+  val numFeatures = new IntParam(this, "numFeatures", "number of features 
(> 0)",
+ParamValidators.gt(0))
+
+  setDefault(numFeatures -> (1 << 18))
+
+  /** @group getParam */
+  @Since("2.3.0")
+  def getNumFeatures: Int = $(numFeatures)
+
+  /** @group setParam */
+  @Since("2.3.0")
+  def setNumFeatures(value: Int): this.type = set(numFeatures, value)
+
+  /** @group s

[GitHub] spark issue #18605: [SparkR][SPARK-21381]:SparkR: pass on setHandleInvalid f...

2017-07-15 Thread wangmiao1981
Github user wangmiao1981 commented on the issue:

https://github.com/apache/spark/pull/18605
  
Sure. I am reading the #18613 comments. I just came back from a business 
trip. Thanks!





[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...

2017-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18555
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...

2017-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18555
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79637/
Test FAILed.





[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...

2017-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18555
  
**[Test build #79637 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79637/testReport)**
 for PR 18555 at commit 
[`17afdd7`](https://github.com/apache/spark/commit/17afdd7a2f5fa4333bd243cad78265f28eb1f62d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18641: [SPARK-21413][SQL] Fix 64KB JVM bytecode limit problem i...

2017-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18641
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79636/
Test PASSed.





[GitHub] spark issue #18641: [SPARK-21413][SQL] Fix 64KB JVM bytecode limit problem i...

2017-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18641
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18641: [SPARK-21413][SQL] Fix 64KB JVM bytecode limit problem i...

2017-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18641
  
**[Test build #79636 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79636/testReport)**
 for PR 18641 at commit 
[`acfdf54`](https://github.com/apache/spark/commit/acfdf542e39d9eae9780af5260fde03c96df8a7e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/18640
  
Hi, @rxin , @srowen , @sameeragarwal , @cloud-fan , @hvanhovell , 
@gatorsmile , @ueshin , @viirya , @kiszk .

Could you review this small PR about a dependency change?

This is the first step of an upgrade to Apache ORC, intended to reduce the 
old Hive dependency in Apache Spark 2.3, for the following issues.

- SPARK-20901 Feature parity for ORC with Parquet
- SPARK-20682 Support a new faster ORC data source based on Apache ORC
- SPARK-20728 Make ORCFileFormat configurable between sql/hive and sql/core
- SPARK-16060 Vectorized Orc Reader

I've heard from @sameeragarwal that Apache Spark will not drop the ORC data 
source. If so, could we move forward with a small step like this?





[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...

2017-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18555
  
**[Test build #79637 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79637/testReport)**
 for PR 18555 at commit 
[`17afdd7`](https://github.com/apache/spark/commit/17afdd7a2f5fa4333bd243cad78265f28eb1f62d).





[GitHub] spark issue #18641: [SPARK-21413][SQL] Fix 64KB JVM bytecode limit problem i...

2017-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18641
  
**[Test build #79636 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79636/testReport)**
 for PR 18641 at commit 
[`acfdf54`](https://github.com/apache/spark/commit/acfdf542e39d9eae9780af5260fde03c96df8a7e).





[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...

2017-07-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/17980
  
I developed the new ORC component and its tests in the `sql/core` module, 
without the `sql/hive` module. Do you mean adding a `sql/hive` module 
test dependency to the `sql` module?





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r127584948
  
--- Diff: python/pyspark/sql/types.py ---
@@ -938,12 +1023,17 @@ def _infer_type(obj):
 return MapType(_infer_type(key), _infer_type(value), True)
 else:
 return MapType(NullType(), NullType(), True)
-elif isinstance(obj, (list, array)):
+elif isinstance(obj, list):
 for v in obj:
 if v is not None:
 return ArrayType(_infer_type(obj[0]), True)
 else:
 return ArrayType(NullType(), True)
+elif isinstance(obj, array):
+if obj.typecode in _array_type_mappings:
--- End diff --

Oh, I take it back. This is possibly a hot path.





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r127584880
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -30,6 +30,10 @@
 import functools
 import time
 import datetime
+import array
+import math
--- End diff --

BTW, this import looks unused.





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r127584800
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala ---
@@ -57,11 +57,11 @@ private[spark] object SerDeUtil extends Logging {
 //  };
 // TODO: support Py_UNICODE with 2 bytes
--- End diff --

Yeah, actually, this is the reason why I don't like leaving a TODO. Yes, 
it apparently looks fixed already.





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r127584731
  
--- Diff: python/pyspark/sql/types.py ---
@@ -915,6 +916,90 @@ def _parse_datatype_json_value(json_value):
 long: LongType,
 })
 
+# Mapping Python array types to Spark SQL DataType
+# We should be careful here. The size of these types in python depends on C
+# implementation. We need to make sure that this conversion does not lose any
+# precision. Also, JVM only support signed types, when converting unsigned types,
+# keep in mind that it required 1 more bit when stored as singed types.
+#
+# Reference for C integer size, see:
+# ISO/IEC 9899:201x specification, chapter 5.2.4.2.1 Sizes of integer types .
+# Reference for python array typecode, see:
+# https://docs.python.org/2/library/array.html
+# https://docs.python.org/3.6/library/array.html
+# Reference for JVM's supported integral types:
+# http://docs.oracle.com/javase/specs/jvms/se8/html/jvms-2.html#jvms-2.3.1
+
+_array_signed_int_typecode_ctype_mappings = {
+'b': ctypes.c_byte,
+'h': ctypes.c_short,
+'i': ctypes.c_int,
+'l': ctypes.c_long,
+}
+
+_array_unsigned_int_typecode_ctype_mappings = {
+'B': ctypes.c_ubyte,
+'H': ctypes.c_ushort,
+'I': ctypes.c_uint,
+'L': ctypes.c_ulong
+}
+
+# TODO: [SPARK-21420]
+# Uncomment this when 'q' and 'Q' are supported by net.razorvine.pickle
+# Type code 'q' and 'Q' are not available at python 2
+# if sys.version_info[0] >= 3:
+# _array_signed_int_typecode_ctype_mappings['q'] = ctypes.c_longlong
+# _array_unsigned_int_typecode_ctype_mappings['Q'] = ctypes.c_ulonglong
--- End diff --

Personally, I don't like leaving TODOs in the code. I am pretty sure that we 
have unresolved TODOs (or ones that were already fixed but never removed). At 
least, I have cleared a few TODOs before.

I guess `SPARK-21420` has a good description. Let's remove this, since the 
commit log and the discussion preserve the code changes and context.
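
The size caveat in the diff comment quoted above (C integer sizes are platform dependent, and an unsigned value needs one extra bit once it is stored in a signed type) can be checked with a small sketch like the one below. The two tables are copied from the diff; the helper name `required_signed_bits` is hypothetical and is not part of the PR.

```python
import ctypes

# Typecode-to-ctypes tables, copied from the quoted diff.
signed_typecodes = {
    'b': ctypes.c_byte,
    'h': ctypes.c_short,
    'i': ctypes.c_int,
    'l': ctypes.c_long,
}
unsigned_typecodes = {
    'B': ctypes.c_ubyte,
    'H': ctypes.c_ushort,
    'I': ctypes.c_uint,
    'L': ctypes.c_ulong,
}


def required_signed_bits(typecode):
    """Hypothetical helper: bits a signed type needs to hold every value."""
    if typecode in signed_typecodes:
        return ctypes.sizeof(signed_typecodes[typecode]) * 8
    # An unsigned N-bit value needs N + 1 bits once a sign bit is added.
    return ctypes.sizeof(unsigned_typecodes[typecode]) * 8 + 1


# The printed widths depend on the C implementation the interpreter was built with.
for code in 'bBhHiIlL':
    print(code, required_signed_bits(code))
```

The result for `'l'` and `'L'` in particular varies between platforms, which is presumably why the mapping is computed from `ctypes` rather than hard-coded.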





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r127584576
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -30,6 +30,10 @@
 import functools
 import time
 import datetime
+import array
+import math
+import ctypes
+
--- End diff --

little nit: could we remove this extra newline?





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r127584600
  
--- Diff: python/pyspark/sql/types.py ---
@@ -938,12 +1023,17 @@ def _infer_type(obj):
 return MapType(_infer_type(key), _infer_type(value), True)
 else:
 return MapType(NullType(), NullType(), True)
-elif isinstance(obj, (list, array)):
+elif isinstance(obj, list):
 for v in obj:
 if v is not None:
 return ArrayType(_infer_type(obj[0]), True)
 else:
 return ArrayType(NullType(), True)
+elif isinstance(obj, array):
+if obj.typecode in _array_type_mappings:
--- End diff --

Could we explicitly write `_array_type_mappings.keys()`, if that is fine with 
you? I find the implicit form confusing sometimes.
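
For reference, the two spellings of the membership test give the same answer; a minimal sketch, assuming a stand-in dictionary (`array_type_mappings` below is hypothetical, with placeholder strings instead of Spark SQL `DataType` classes):

```python
from array import array

# Hypothetical stand-in for the PR's _array_type_mappings; the values are
# placeholder strings rather than Spark SQL DataType classes.
array_type_mappings = {'b': 'ByteType', 'h': 'ShortType', 'd': 'DoubleType'}

obj = array('d', [1.0, 2.0])

# Implicit form: membership is tested against the dictionary's keys.
implicit = obj.typecode in array_type_mappings
# Explicit form: same result, spelled out via .keys().
explicit = obj.typecode in array_type_mappings.keys()

print(implicit, explicit)
```

One difference to keep in mind: on Python 2, `dict.keys()` builds a new list on every call, so the explicit form turns a hash lookup into a linear scan, which can matter on a frequently executed path. On Python 3, `keys()` returns a view and the lookup stays hash based.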





[GitHub] spark pull request #18612: [SPARK-21388][ML][PySpark] GBTs inherit from HasS...

2017-07-15 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/18612#discussion_r127584690
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala ---
@@ -162,7 +162,7 @@ private[ml] trait HasThreshold extends Params {
* Param for threshold in binary classification prediction, in range [0, 1].
* @group param
*/
-  final val threshold: DoubleParam = new DoubleParam(this, "threshold", "threshold in binary classification prediction, in range [0, 1]", ParamValidators.inRange(0, 1))
+  val threshold: DoubleParam = new DoubleParam(this, "threshold", "threshold in binary classification prediction, in range [0, 1]", ParamValidators.inRange(0, 1))
 
   setDefault(threshold, 0.5)
--- End diff --

Maybe the error can be eliminated by removing this line; actually, we 
should set the default value in the concrete class if the param is inherited.





[GitHub] spark pull request #18613: [SPARK-20307][ML][SPARKR][FOLLOW-UP] RFormula sho...

2017-07-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18613





[GitHub] spark issue #18613: [SPARK-20307][ML][SPARKR][FOLLOW-UP] RFormula should han...

2017-07-15 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/18613
  
Merged into master. Thanks, all, for reviewing.





[GitHub] spark issue #18642: [MINOR][REFACTORING] KeyValueGroupedDataset.mapGroupsWit...

2017-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18642
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18642: [MINOR][REFACTORING] KeyValueGroupedDataset.mapGroupsWit...

2017-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18642
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79633/
Test PASSed.





[GitHub] spark issue #18642: [MINOR][REFACTORING] KeyValueGroupedDataset.mapGroupsWit...

2017-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18642
  
**[Test build #79633 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79633/testReport)**
 for PR 18642 at commit 
[`ce51466`](https://github.com/apache/spark/commit/ce51466411983181b52cac07610d2bece2ffc5e8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...

2017-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18555
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79635/
Test FAILed.





[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...

2017-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18555
  
**[Test build #79635 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79635/testReport)**
 for PR 18555 at commit 
[`5b2c9cc`](https://github.com/apache/spark/commit/5b2c9cc9464b00a0e9925b7bc06e16389bbcfc7c).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...

2017-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18555
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...

2017-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18555
  
**[Test build #79635 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79635/testReport)**
 for PR 18555 at commit 
[`5b2c9cc`](https://github.com/apache/spark/commit/5b2c9cc9464b00a0e9925b7bc06e16389bbcfc7c).





[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...

2017-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18555
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...

2017-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18555
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79634/
Test FAILed.





[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...

2017-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18555
  
**[Test build #79634 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79634/testReport)**
 for PR 18555 at commit 
[`1c04036`](https://github.com/apache/spark/commit/1c04036d347f51a6899b91fd83f8c7e97cada3f0).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...

2017-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18555
  
**[Test build #79634 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79634/testReport)**
 for PR 18555 at commit 
[`1c04036`](https://github.com/apache/spark/commit/1c04036d347f51a6899b91fd83f8c7e97cada3f0).





[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...

2017-07-15 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/18555
  
@gatorsmile 
Thank you.
I have updated it.
Please review it again.
Thanks.





[GitHub] spark issue #18641: [SPARK-21413][SQL] Fix 64KB JVM bytecode limit problem i...

2017-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18641
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79632/
Test FAILed.





[GitHub] spark issue #18641: [SPARK-21413][SQL] Fix 64KB JVM bytecode limit problem i...

2017-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18641
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18641: [SPARK-21413][SQL] Fix 64KB JVM bytecode limit problem i...

2017-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18641
  
**[Test build #79632 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79632/testReport)**
 for PR 18641 at commit 
[`19ae0dc`](https://github.com/apache/spark/commit/19ae0dce58f468e672cb03a6c94d0eab54504473).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

2017-07-15 Thread LeoIV
Github user LeoIV commented on the issue:

https://github.com/apache/spark/pull/17373
  
Right 🙈 That explains why it works perfectly fine with my classifier :-)





[GitHub] spark issue #18642: [MINOR][REFACTORING] KeyValueGroupedDataset.mapGroupsWit...

2017-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18642
  
**[Test build #79633 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79633/testReport)**
 for PR 18642 at commit 
[`ce51466`](https://github.com/apache/spark/commit/ce51466411983181b52cac07610d2bece2ffc5e8).





[GitHub] spark pull request #18642: [MINOR][REFACTORING] KeyValueGroupedDataset.mapGr...

2017-07-15 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/18642

[MINOR][REFACTORING] KeyValueGroupedDataset.mapGroupsWithState uses 
flatMapGroupsWithState

## What changes were proposed in this pull request?

Refactored `KeyValueGroupedDataset.mapGroupsWithState` to use 
`flatMapGroupsWithState` explicitly (so it's clear that the former is almost 
the latter).
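
The relationship described above, where the one-result-per-group operation is a special case of the many-results-per-group operation, can be sketched in plain Python. This is only an analogy: the function names are hypothetical, it does not use the Spark API, and state handling is omitted entirely.

```python
def flat_map_groups(groups, func):
    """Apply func to each (key, values) pair and concatenate the iterators it returns."""
    for key, values in groups.items():
        for result in func(key, values):
            yield result


def map_groups(groups, func):
    """One result per group, expressed through flat_map_groups."""
    # Wrap the single result in a one-element iterator.
    return flat_map_groups(groups, lambda key, values: iter([func(key, values)]))


groups = {'a': [1, 2, 3], 'b': [4, 5]}
totals = sorted(map_groups(groups, lambda key, values: (key, sum(values))))
print(totals)  # [('a', 6), ('b', 9)]
```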

## How was this patch tested?

local build

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark mapGroupsWithState

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18642.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18642


commit ce51466411983181b52cac07610d2bece2ffc5e8
Author: Jacek Laskowski 
Date:   2017-07-15T09:31:55Z

[MINOR][REFACTORING] KeyValueGroupedDataset.mapGroupsWithState uses 
flatMapGroupsWithState







[GitHub] spark issue #18641: [SPARK-21413][SQL] Fix 64KB JVM bytecode limit problem i...

2017-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18641
  
**[Test build #79632 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79632/testReport)**
 for PR 18641 at commit 
[`19ae0dc`](https://github.com/apache/spark/commit/19ae0dce58f468e672cb03a6c94d0eab54504473).





[GitHub] spark pull request #18641: [SPARK-21413][SQL] Fix 64KB JVM bytecode limit pr...

2017-07-15 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/18641

[SPARK-21413][SQL] Fix 64KB JVM bytecode limit problem in multiple 
projections with CASE WHEN

## What changes were proposed in this pull request?

This PR changes the code generation for CASE WHEN so that the generated code 
for the condition and then expressions is placed into separate methods when 
that code could be large.

When such a method is generated, the `isNull` and `value` variables are 
declared as instance variables.
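
A minimal hand-written sketch of this technique, assuming a single `CASE WHEN input = 0 THEN -1 ELSE input END` expression: the condition is evaluated in its own method and passes its result back through instance fields instead of local variables. `SplitCaseWhen`, `evalCondition`, `condIsNull` and `condValue` are hypothetical names chosen for illustration, not names produced by Spark's code generator.

```java
public class SplitCaseWhen {
    // Instance fields take the place of the local isNull/value variables,
    // so a separate method can write them and apply() can read them.
    private boolean condIsNull;
    private boolean condValue;

    // Hypothetical helper standing in for a split-out condition method.
    private void evalCondition(Integer input) {
        condIsNull = (input == null);
        condValue = !condIsNull && input == 0;
    }

    // Evaluates CASE WHEN input = 0 THEN -1 ELSE input END.
    public Integer apply(Integer input) {
        evalCondition(input);
        if (!condIsNull && condValue) {
            return -1;
        }
        return input;
    }

    public static void main(String[] args) {
        SplitCaseWhen projection = new SplitCaseWhen();
        System.out.println(projection.apply(0));
        System.out.println(projection.apply(5));
    }
}
```

Because each condition lives in its own method, the body of `apply` grows by only a call and a branch per CASE WHEN, which is what keeps a single method's bytecode from becoming too large.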

Before this PR
```java
class SpecificMutableProjection extends org.apache.spark.sql.catalyst.expressions.codegen.BaseMutableProjection {
...
  public java.lang.Object apply(java.lang.Object _i) {
    InternalRow i = (InternalRow) _i;

    boolean isNull = true;
    int value = -1;

    boolean isNull1 = true;
    boolean value1 = false;

    boolean isNull2 = true;
    int value2 = -1;

    boolean isNull3 = true;
    boolean value3 = false;

    boolean isNull4 = true;
    int value4 = -1;

    boolean isNull5 = true;
    boolean value5 = false;

    boolean isNull6 = true;
    int value6 = -1;

    boolean isNull7 = true;
    boolean value7 = false;

    boolean isNull8 = true;
    int value8 = -1;

    boolean isNull9 = true;
    boolean value9 = false;

    boolean isNull10 = true;
    int value10 = -1;

    boolean isNull11 = true;
    boolean value11 = false;

    boolean isNull12 = true;
    int value12 = -1;

    boolean isNull13 = true;
    boolean value13 = false;

    boolean isNull14 = true;
    int value14 = -1;

    boolean isNull15 = true;
    boolean value15 = false;

    boolean isNull16 = true;
    int value16 = -1;

    boolean isNull17 = true;
    boolean value17 = false;

    boolean isNull18 = true;
    int value18 = -1;

    boolean isNull19 = true;
    boolean value19 = false;

    boolean isNull20 = i.isNullAt(0);
    int value20 = isNull20 ? -1 : (i.getInt(0));
    if (!isNull20) {
      isNull19 = false; // resultCode could change nullability.
      value19 = value20 == 0;
    }
    if (!isNull19 && value19) {
      isNull18 = false;
      value18 = -1;
    } else {
      boolean isNull23 = i.isNullAt(0);
      int value23 = isNull23 ? -1 : (i.getInt(0));
      isNull18 = isNull23;
      value18 = value23;
    }
...
```

After this PR
```java
class SpecificMutableProjection extends org.apache.spark.sql.catalyst.expressions.codegen.BaseMutableProjection {
...
  private boolean isNull1409;
  private boolean value1409;
...
  private boolean isNull2815;
  private boolean value2815;
...
  public java.lang.Object apply(java.lang.Object _i) {
    InternalRow i = (InternalRow) _i;

    boolean isNull = true;
    int value = -1;

    caseWhenCondExpr255(i);
    if (!isNull2815 && value2815) {
      isNull = false;
      value = -1;
    } else {
      boolean isNull2816 = true;
      int value2816 = -1;

      caseWhenCondExpr383(i);
      if (!isNull4223 && value4223) {
        isNull2816 = false;
        value2816 = -1;
      }
...
  private void caseWhenCondExpr255(InternalRow i) {
...
```

[GitHub] spark pull request #18625: [SPARK-21267][DOCS][MINOR] Follow up to avoid ref...

2017-07-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18625





[GitHub] spark issue #18625: [SPARK-21267][DOCS][MINOR] Follow up to avoid referencin...

2017-07-15 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18625
  
Merged to master/2.2





[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

2017-07-15 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/17373
  
They are. I think some values are something like 4.7532244532E-10, and the 
display truncates them.
Thanks
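
To illustrate how this can happen, here is a minimal sketch in plain Java. It assumes the display simply keeps the first few characters of a number's text form; `TruncatedDisplay`, `truncate` and `inUnitInterval` are hypothetical helpers, not part of Spark. Cutting the text before the exponent makes a tiny probability look like a value greater than 1, even though the underlying number is still inside [0,1].

```java
public class TruncatedDisplay {
    // Hypothetical display helper: keeps only the first maxChars
    // characters of the number's text form, dropping the rest.
    static String truncate(double v, int maxChars) {
        String s = Double.toString(v);
        return s.length() <= maxChars ? s : s.substring(0, maxChars);
    }

    // Checks the actual numeric value, independent of how it is shown.
    static boolean inUnitInterval(double v) {
        return v >= 0.0 && v <= 1.0;
    }

    public static void main(String[] args) {
        double p = 4.7532244532E-10;
        // The shortened text loses the "E-10" exponent.
        System.out.println("shown:  " + truncate(p, 8));
        System.out.println("actual: " + p);
        System.out.println("in [0,1]: " + inUnitInterval(p));
    }
}
```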

On 15 Jul 2017, at 12:35 AM, Leonard Hövelmann wrote:


Alright, thanks. But have a look at the probabilities. They aren’t in 
[0,1] either.




[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

2017-07-15 Thread LeoIV
Github user LeoIV commented on the issue:

https://github.com/apache/spark/pull/17373
  
Alright, thanks. But have a look at the probabilities. They aren’t in 
[0,1] either.

