[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive

2017-06-25 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/17995
  
@yanboliang I updated this PR and reverted the changes on `setSolver` in GLR and 
LiR. Thanks for your review.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17995
  
**[Test build #78607 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78607/testReport)**
 for PR 17995 at commit 
[`28941f3`](https://github.com/apache/spark/commit/28941f39187380b9f7ca6a49d24fbee8a759a505).





[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...

2017-06-25 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/17995#discussion_r123931580
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
 ---
@@ -81,7 +83,8 @@ private[classification] trait MultilayerPerceptronParams 
extends PredictorParams
   final val solver: Param[String] = new Param[String](this, "solver",
 "The solver algorithm for optimization. Supported options: " +
   s"${MultilayerPerceptronClassifier.supportedSolvers.mkString(", ")}. 
(Default l-bfgs)",
-
ParamValidators.inArray[String](MultilayerPerceptronClassifier.supportedSolvers))
+(value: String) => MultilayerPerceptronClassifier.supportedSolvers
+  .contains(value.toLowerCase(Locale.ROOT)))
--- End diff --

I think it's a good idea.
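As a self-contained sketch of the case-insensitive validator shown in the diff above (the `supportedSolvers` values here are assumptions for illustration; the real list lives in `MultilayerPerceptronClassifier.supportedSolvers`):

```scala
import java.util.Locale

object SolverValidation {
  // Assumed solver list, for illustration only.
  val supportedSolvers: Array[String] = Array("l-bfgs", "gd")

  // Case-insensitive membership check: normalize the user-supplied value
  // with Locale.ROOT before comparing, as the diff does.
  val isValidSolver: String => Boolean =
    (value: String) => supportedSolvers.contains(value.toLowerCase(Locale.ROOT))
}
```

Using `Locale.ROOT` keeps the lowercasing locale-neutral, so validation does not change behavior under locales such as Turkish where `"I".toLowerCase` is not `"i"`.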





[GitHub] spark pull request #18334: [SPARK-21127] [SQL] Update statistics after data ...

2017-06-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18334#discussion_r123930251
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala
 ---
@@ -0,0 +1,112 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.command
+
+import java.net.URI
+
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.fs.{FileSystem, Path}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.{CatalogStatistics, 
CatalogTable}
+import org.apache.spark.sql.internal.SessionState
+
+
+object CommandUtils extends Logging {
+
+  /**
+   * Update statistics (currently only sizeInBytes) after data-changing 
commands.
+   */
+  def updateTableStats(
+  sparkSession: SparkSession,
+  table: CatalogTable,
+  newTableSize: Option[BigInt] = None,
+  newRowCount: Option[BigInt] = None): Unit = {
+if (sparkSession.sessionState.conf.autoStatsUpdate && 
table.stats.nonEmpty) {
+  val catalog = sparkSession.sessionState.catalog
+  val newTable = catalog.getTableMetadata(table.identifier)
+  val newSize = newTableSize.getOrElse(
+CommandUtils.calculateTotalSize(sparkSession.sessionState, 
newTable))
+  catalog.alterTableStats(table.identifier,
+CatalogStatistics(sizeInBytes = newSize, rowCount = newRowCount))
--- End diff --

since we are protected by a flag, can we be more aggressive and auto update 
all stats?





[GitHub] spark issue #18419: [SPARK-20213][SQL][follow-up] introduce SQLExecution.ign...

2017-06-25 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18419
  
LGTM





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-25 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123930006
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala 
---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr is the expression for which the histogram is being created
+   * @param minValue is an expression that resolves
+   * to the minimum end point of the acceptable range for 
expr
+   * @param maxValue is an expression that resolves
+   * to the maximum end point of the acceptable range for 
expr
+   * @param numBucket is an expression that resolves to
+   *  a constant indicating the number of buckets
+   * @return a long between 0 and numBucket+1, obtained by mapping expr 
into buckets defined by
+   * the range [minValue, maxValue]. For example:
+   * widthBucket(0, 1, 1, 1) -> 0, widthBucket(20, 1, 1, 1) -> 2.
+   */
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, 
numBucket: Long): Long = {
+
+if (numBucket <= 0) {
+  throw new AnalysisException(s"The number of buckets must be greater than 
0, but got ${numBucket}")
+}
--- End diff --

If `minValue == maxValue`, then `lower == upper`, so for any `expr >= upper` 
the result is `numBucket + 1L`:
```
val lower: Double = Math.min(minValue, maxValue)
val upper: Double = Math.max(minValue, maxValue)

val result: Long = if (expr < lower) {
  0
} else if (expr >= upper) {
  numBucket + 1L
} else {
  (numBucket.toDouble * (expr - lower) / (upper - lower) + 1).toLong
}

if (minValue > maxValue) (numBucket - result) + 1 else result
```
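For reference, the quoted logic can be packaged as a self-contained method (a sketch, not the PR's final code) that reproduces the boundary behavior discussed above:

```scala
object WidthBucket {
  // Sketch of the widthBucket logic quoted above. When minValue == maxValue,
  // lower == upper, so any expr >= upper falls into bucket numBucket + 1 and
  // any expr < lower falls into bucket 0.
  def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = {
    require(numBucket > 0, s"The number of buckets must be greater than 0, but got $numBucket")
    val lower: Double = math.min(minValue, maxValue)
    val upper: Double = math.max(minValue, maxValue)
    val result: Long =
      if (expr < lower) 0L
      else if (expr >= upper) numBucket + 1L
      else (numBucket.toDouble * (expr - lower) / (upper - lower) + 1).toLong
    // For a descending range (minValue > maxValue) the bucket numbering is mirrored.
    if (minValue > maxValue) (numBucket - result) + 1 else result
  }
}
```

This matches the examples in the doc comment: `widthBucket(0, 1, 1, 1)` is 0 (below the degenerate range) and `widthBucket(20, 1, 1, 1)` is 2, i.e. `numBucket + 1`.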





[GitHub] spark pull request #18419: [SPARK-20213][SQL][follow-up] introduce SQLExecut...

2017-06-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18419#discussion_r123929987
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala ---
@@ -118,4 +125,19 @@ object SQLExecution {
   sc.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, oldExecutionId)
 }
   }
+
+  /**
+   * Wrap an action which may have nested execution id. This method can be 
used to run an execution
+   * inside another execution, e.g., `CacheTableCommand` needs to call 
`Dataset.collect`.
--- End diff --

nit: all Spark jobs in the body won't be tracked in the UI.





[GitHub] spark pull request #18334: [SPARK-21127] [SQL] Update statistics after data ...

2017-06-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18334#discussion_r123929887
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala ---
@@ -165,6 +167,22 @@ private[sql] trait SQLTestUtils
   }
 
   /**
+   * Creates the specified number of temporary directories, which are then 
passed to `f` and
+   * deleted after `f` returns.
+   */
+  protected def withTempPaths(numPaths: Int)(f: Seq[File] => Unit): Unit = 
{
+val files = mutable.Buffer[File]()
--- End diff --

nit: we can just create an array as we know the size.
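A sketch of what that suggestion might look like (self-contained, with a simple recursive delete standing in for Spark's `Utils.deleteRecursively`):

```scala
import java.io.File
import java.nio.file.Files

object TempPathUtils {
  // Minimal stand-in for Utils.deleteRecursively, to keep the sketch self-contained.
  private def deleteRecursively(f: File): Unit = {
    Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
    f.delete()
  }

  // The reviewer's point: numPaths is known up front, so fill an Array
  // directly instead of appending to a mutable.Buffer.
  def withTempPaths(numPaths: Int)(f: Seq[File] => Unit): Unit = {
    val files = Array.fill(numPaths)(Files.createTempDirectory("spark-test").toFile)
    try f(files.toSeq) finally files.foreach(deleteRecursively)
  }
}
```

`Array.fill` both allocates and populates in one expression, avoiding the intermediate buffer entirely.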





[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17084
  
**[Test build #78606 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78606/testReport)**
 for PR 17084 at commit 
[`60fc2a7`](https://github.com/apache/spark/commit/60fc2a78d4c3e985e91fd14522642d861df58d99).





[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-06-25 Thread imatiach-msft
Github user imatiach-msft commented on the issue:

https://github.com/apache/spark/pull/17084
  
Jenkins, test this please





[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-06-25 Thread imatiach-msft
Github user imatiach-msft commented on the issue:

https://github.com/apache/spark/pull/17084
  
the pip packaging failure seems to be unrelated to the code... let me try 
this again





[GitHub] spark pull request #18419: [SPARK-20213][SQL][follow-up] introduce SQLExecut...

2017-06-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18419#discussion_r123928964
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala ---
@@ -118,4 +125,19 @@ object SQLExecution {
   sc.setLocalProperty(SQLExecution.EXECUTION_ID_KEY, oldExecutionId)
 }
   }
+
+  /**
+   * Wrap an action which may have nested execution id. This method can be 
used to run an execution
+   * inside another execution, e.g., `CacheTableCommand` needs to call 
`Dataset.collect`.
+   */
+  def ignoreNestedExecutionId[T](sparkSession: SparkSession)(body: => T): 
T = {
--- End diff --

Although we ignore the nested execution id, the job stages and metrics created 
by the body here will still be recorded into the `SQLExecutionUIData` referred 
to by the current execution id. But it looks like that should be fine.
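The save/clear/restore pattern behind `ignoreNestedExecutionId` can be sketched generically with a `ThreadLocal` (names here are illustrative, not Spark's actual implementation, which stores the id in a SparkContext local property under `EXECUTION_ID_KEY`):

```scala
object ExecutionIdScope {
  // Illustrative stand-in for the EXECUTION_ID_KEY local property.
  private val executionId = new ThreadLocal[Option[String]] {
    override def initialValue(): Option[String] = None
  }

  def current: Option[String] = executionId.get()

  // Run body with an execution id set, restoring the previous value after.
  def withId[T](id: String)(body: => T): T = {
    val saved = executionId.get()
    executionId.set(Some(id))
    try body finally executionId.set(saved)
  }

  // Run body with no execution id, so work started inside is not attached
  // to the outer execution; restore the outer id afterwards.
  def ignoreNested[T](body: => T): T = {
    val saved = executionId.get()
    executionId.set(None)
    try body finally executionId.set(saved)
  }
}
```

The `try`/`finally` restore guarantees the outer execution id survives even if the nested body throws.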





[GitHub] spark pull request #18334: [SPARK-21127] [SQL] Update statistics after data ...

2017-06-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18334#discussion_r123928827
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala
 ---
@@ -0,0 +1,112 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.command
+
+import java.net.URI
+
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.fs.{FileSystem, Path}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.{CatalogStatistics, 
CatalogTable}
+import org.apache.spark.sql.internal.SessionState
+
+
+object CommandUtils extends Logging {
+
+  /**
+   * Update statistics (currently only sizeInBytes) after data-changing 
commands.
+   */
+  def updateTableStats(
+  sparkSession: SparkSession,
+  table: CatalogTable,
+  newTableSize: Option[BigInt] = None,
+  newRowCount: Option[BigInt] = None): Unit = {
+if (sparkSession.sessionState.conf.autoStatsUpdate && 
table.stats.nonEmpty) {
+  val catalog = sparkSession.sessionState.catalog
+  val newTable = catalog.getTableMetadata(table.identifier)
+  val newSize = newTableSize.getOrElse(
+CommandUtils.calculateTotalSize(sparkSession.sessionState, 
newTable))
+  catalog.alterTableStats(table.identifier,
+CatalogStatistics(sizeInBytes = newSize, rowCount = newRowCount))
--- End diff --

so we never auto update column stats?





[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11994
  
**[Test build #78605 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78605/testReport)**
 for PR 11994 at commit 
[`dd981ba`](https://github.com/apache/spark/commit/dd981ba1db4066109d61af1cfb18a06819b4bed5).





[GitHub] spark issue #18368: [SPARK-21102][SQL] Make refresh resource command less ag...

2017-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18368
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78596/
Test FAILed.





[GitHub] spark issue #18368: [SPARK-21102][SQL] Make refresh resource command less ag...

2017-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18368
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18405
  
**[Test build #78604 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78604/testReport)**
 for PR 18405 at commit 
[`255c50a`](https://github.com/apache/spark/commit/255c50a87051df42933bbd83aea14ccd54c18826).





[GitHub] spark issue #18368: [SPARK-21102][SQL] Make refresh resource command less ag...

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18368
  
**[Test build #78596 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78596/testReport)**
 for PR 18368 at commit 
[`fc2b7c0`](https://github.com/apache/spark/commit/fc2b7c02fab7f570ae3ca080ae1c2c9502300de7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...

2017-06-25 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18405
  
Jenkins, retest this please.





[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17084
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78598/
Test FAILed.





[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17084
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17084
  
**[Test build #78598 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78598/testReport)**
 for PR 17084 at commit 
[`60fc2a7`](https://github.com/apache/spark/commit/60fc2a78d4c3e985e91fd14522642d861df58d99).
 * This patch **fails PySpark pip packaging tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17758: [SPARK-20460][SPARK-21144][SQL] Make it more consistent ...

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17758
  
**[Test build #78603 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78603/testReport)**
 for PR 17758 at commit 
[`3f56d04`](https://github.com/apache/spark/commit/3f56d04c7131fe833a3efbf56e7318e2c08f79dc).





[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17084
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17084
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78597/
Test PASSed.





[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17084
  
**[Test build #78597 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78597/testReport)**
 for PR 17084 at commit 
[`cf59c62`](https://github.com/apache/spark/commit/cf59c62f272ade192dfbf28ab53881251ea0d95e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class BinaryClassificationMetrics @Since(\"2.2.0\") (`





[GitHub] spark issue #17758: [SPARK-20460][SPARK-21144][SQL] Make it more consistent ...

2017-06-25 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17758
  
Jenkins, retest this please.





[GitHub] spark issue #18334: [SPARK-21127] [SQL] Update statistics after data changin...

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18334
  
**[Test build #78602 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78602/testReport)**
 for PR 18334 at commit 
[`5a43594`](https://github.com/apache/spark/commit/5a43594fb8a2fb2885c4d268140f28827a65ff5a).





[GitHub] spark issue #18419: [SPARK-20213][SQL][follow-up] introduce SQLExecution.ign...

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18419
  
**[Test build #78600 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78600/testReport)**
 for PR 18419 at commit 
[`0795c16`](https://github.com/apache/spark/commit/0795c16b4beaf70430e8dc62f135f99ac801960e).





[GitHub] spark issue #18418: [SPARK-19104][SQL] Lambda variables should work when par...

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18418
  
**[Test build #78601 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78601/testReport)**
 for PR 18418 at commit 
[`bd0221a`](https://github.com/apache/spark/commit/bd0221a6b745be938ade7596658e788dbddbab91).





[GitHub] spark issue #18419: [SPARK-20213][SQL][follow-up] introduce SQLExecution.ign...

2017-06-25 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18419
  
cc @rdblue @gatorsmile 





[GitHub] spark pull request #18419: [SPARK-20213][SQL][follow-up] introduce SQLExecut...

2017-06-25 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/18419

[SPARK-20213][SQL][follow-up] introduce SQLExecution.ignoreNestedExecutionId

## What changes were proposed in this pull request?

In https://github.com/apache/spark/pull/18064, to work around the nested 
SQL execution id issue, we introduced several internal methods in `Dataset`, 
such as `collectInternal`, `countInternal`, and `showInternal`, to avoid 
creating a nested execution id.

However, this approach does not extend well: whenever we hit another nested 
execution id case, we may need to add yet more internal methods to `Dataset`.

Our goal is to ignore the nested execution id in some cases, and we can 
achieve it more cleanly by introducing 
`SQLExecution.ignoreNestedExecutionId`. Whenever we find a place that needs to 
ignore the nested execution id, we can simply wrap the action in 
`SQLExecution.ignoreNestedExecutionId`, which is more extensible than the 
previous approach.
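The wrapping pattern described above can be sketched in plain Scala. This is an illustrative mock, not Spark's actual `SQLExecution` implementation; the thread-local flag and the `shouldIgnore` helper are assumptions made for the sketch:

```scala
// Illustrative sketch (NOT Spark's actual SQLExecution API): a thread-local
// flag is set while the wrapped body runs, so any nested "execution" started
// inside the body can check the flag and skip allocating a new execution id.
object SQLExecutionSketch {
  private val ignoreNested = new ThreadLocal[Boolean] {
    override def initialValue(): Boolean = false
  }

  // Run `body` with nested-execution-id tracking disabled, restoring the
  // previous flag value afterwards, so nesting the wrapper itself is safe.
  def ignoreNestedExecutionId[T](body: => T): T = {
    val previous = ignoreNested.get()
    ignoreNested.set(true)
    try body finally ignoreNested.set(previous)
  }

  // What a nested execution would consult before registering a new id.
  def shouldIgnore: Boolean = ignoreNested.get()
}
```

Any action that must not create a nested id is then wrapped as `SQLExecutionSketch.ignoreNestedExecutionId { action }`, which is the extensibility win over adding one internal method per action.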

The idea comes from 
https://github.com/apache/spark/pull/17540/files#diff-ab49028253e599e6e74cc4f4dcb2e3a8R57
 by @rdblue 

## How was this patch tested?

existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark follow

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18419.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18419


commit 0795c16b4beaf70430e8dc62f135f99ac801960e
Author: Wenchen Fan 
Date:   2017-06-26T04:36:59Z

introduce SQLExecution.ignoreNestedExecutionId







[GitHub] spark pull request #18366: [SPARK-20889][SparkR] Grouped documentation for S...

2017-06-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18366#discussion_r123925897
  
--- Diff: R/pkg/R/functions.R ---
@@ -635,20 +652,16 @@ setMethod("dayofyear",
 column(jc)
   })
 
-#' decode
-#'
-#' Computes the first argument into a string from a binary using the 
provided character set
-#' (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 
'UTF-16').
+#' @details
+#' \code{decode}: Computes the first argument into a string from a binary 
using the provided
+#' character set.
 #'
-#' @param x Column to compute on.
-#' @param charset Character set to use
+#' @param charset Character set to use (one of "US-ASCII", "ISO-8859-1", 
"UTF-8", "UTF-16BE",
+#'"UTF-16LE", "UTF-16").
--- End diff --

Not a big deal, since they contain the same information, so this is just a 
weak opinion: IMHO it'd be nicer to match this to Scala/Python too, but 
leaving it as is is also fine with me.





[GitHub] spark pull request #18366: [SPARK-20889][SparkR] Grouped documentation for S...

2017-06-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18366#discussion_r123926400
  
--- Diff: R/pkg/R/functions.R ---
@@ -1503,18 +1491,12 @@ setMethod("skewness",
 column(jc)
   })
 
-#' soundex
-#'
-#' Return the soundex code for the specified expression.
-#'
-#' @param x Column to compute on.
+#' @details
+#' \code{soundex}: Returns the soundex code for the specified expression.
 #'
-#' @rdname soundex
-#' @name soundex
-#' @family string functions
-#' @aliases soundex,Column-method
+#' @rdname column_string_functions
+#' @aliases soundex soundex,Column-method
 #' @export
-#' @examples \dontrun{soundex(df$c)}
--- End diff --

It looks like this example went missing.





[GitHub] spark issue #18334: [SPARK-21127] [SQL] Update statistics after data changin...

2017-06-25 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/18334
  
retest this please





[GitHub] spark issue #18418: [SPARK-19104][SQL] Lambda variables should work when par...

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18418
  
**[Test build #78599 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78599/testReport)**
 for PR 18418 at commit 
[`c417e22`](https://github.com/apache/spark/commit/c417e229f4563a6ee857d7ee55582e3b6ca2ed6b).





[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...

2017-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18405
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78593/
Test FAILed.





[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...

2017-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18405
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18405
  
**[Test build #78593 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78593/testReport)**
 for PR 18405 at commit 
[`255c50a`](https://github.com/apache/spark/commit/255c50a87051df42933bbd83aea14ccd54c18826).
 * This patch **fails PySpark pip packaging tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18418: [SPARK-19104][SQL] Lambda variables should work w...

2017-06-25 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/18418

[SPARK-19104][SQL] Lambda variables should work when parent expression 
splits generated codes

## What changes were proposed in this pull request?

When an expression that uses lambda variables splits its generated code, the 
local variables generated for the lambda variables can't be accessed from the 
split-out functions. This patch fixes the issue by adding the lambda variables 
to the function parameter lists.
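To see the shape of the fix being described, here is a plain-Scala analogy; it is illustrative only and is not Spark's generated code. The point is that a variable local to the caller (standing in for a lambda variable) must appear in the split-out helper's parameter list, or the helper cannot reference it:

```scala
// Illustrative sketch (not Spark's codegen): when a long body is split into
// helper functions, any local variable the split-out code reads must be
// threaded through as an explicit parameter. Omitting it is the analogue of
// the compile failure the patch fixes by adding lambda variables to the
// generated functions' parameter lists.
object SplitCodegenSketch {
  // The "split-out" helper: `lambdaVar` arrives as an explicit parameter.
  private def splitPart(lambdaVar: Int, acc: Int): Int = acc + lambdaVar

  def consume(input: Seq[Int]): Int = {
    var acc = 0
    for (lambdaVar <- input) {
      acc = splitPart(lambdaVar, acc) // lambda variable passed explicitly
    }
    acc
  }
}
```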

## How was this patch tested?

Jenkins tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-19104

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18418.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18418


commit c417e229f4563a6ee857d7ee55582e3b6ca2ed6b
Author: Liang-Chi Hsieh 
Date:   2017-06-26T04:26:00Z

Add lambda variables into the parameters of functions generated by 
splitExpressions.







[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17084
  
**[Test build #78598 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78598/testReport)**
 for PR 17084 at commit 
[`60fc2a7`](https://github.com/apache/spark/commit/60fc2a78d4c3e985e91fd14522642d861df58d99).





[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-06-25 Thread imatiach-msft
Github user imatiach-msft commented on the issue:

https://github.com/apache/spark/pull/17084
  
Jenkins, test this please





[GitHub] spark pull request #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2017-06-25 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/17084#discussion_r123925697
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/evaluation/binary/BinaryConfusionMatrix.scala
 ---
@@ -22,22 +22,22 @@ package org.apache.spark.mllib.evaluation.binary
  */
 private[evaluation] trait BinaryConfusionMatrix {
   /** number of true positives */
-  def numTruePositives: Long
+  def numTruePositives: Double
--- End diff --

good idea, updated the names of the variables





[GitHub] spark pull request #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2017-06-25 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/17084#discussion_r123925523
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala
 ---
@@ -146,11 +160,13 @@ class BinaryClassificationMetrics @Since("1.3.0") (
   private lazy val (
 cumulativeCounts: RDD[(Double, BinaryLabelCounter)],
 confusions: RDD[(Double, BinaryConfusionMatrix)]) = {
-// Create a bin for each distinct score value, count positives and 
negatives within each bin,
-// and then sort by score values in descending order.
-val counts = scoreAndLabels.combineByKey(
-  createCombiner = (label: Double) => new BinaryLabelCounter(0L, 0L) 
+= label,
-  mergeValue = (c: BinaryLabelCounter, label: Double) => c += label,
+// Create a bin for each distinct score value, count weighted 
positives and
+// negatives within each bin, and then sort by score values in 
descending order.
+val counts = scoreAndLabelsWithWeights.combineByKey(
+  createCombiner = (labelAndWeight: (Double, Double)) =>
+new BinaryLabelCounter(0L, 0L) += (labelAndWeight._1, 
labelAndWeight._2),
--- End diff --

updated, thanks!
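The weighted counting performed by the `combineByKey` call in the diff above can be sketched without Spark as follows. `BinLabelCounter` and `countPerScore` are made-up names standing in for `BinaryLabelCounter` and the RDD aggregation; the local fold merely mimics what `combineByKey` does per key:

```scala
// Plain-Scala sketch of weighted per-score counting: for each distinct score,
// positives and negatives are accumulated by weight rather than by count.
final case class BinLabelCounter(var weightedPositives: Double = 0.0,
                                 var weightedNegatives: Double = 0.0) {
  def add(label: Double, weight: Double): this.type = {
    if (label > 0.5) weightedPositives += weight else weightedNegatives += weight
    this
  }
}

object WeightedBinning {
  // Input rows are (score, (label, weight)), matching the PR's RDD element type.
  def countPerScore(rows: Seq[(Double, (Double, Double))]): Map[Double, BinLabelCounter] =
    rows.foldLeft(Map.empty[Double, BinLabelCounter]) {
      case (bins, (score, (label, weight))) =>
        val counter = bins.getOrElse(score, BinLabelCounter())
        bins.updated(score, counter.add(label, weight))
    }
}
```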





[GitHub] spark pull request #17451: [SPARK-19866][ML][PySpark] Add local version of W...

2017-06-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17451#discussion_r123925184
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -2869,6 +2869,18 @@ def findSynonyms(self, word, num):
 word = _convert_to_vector(word)
 return self._call_java("findSynonyms", word, num)
 
+@since("2.2.0")
+def findSynonymsTuple(self, word, num):
--- End diff --

```findSynonymsTuple``` -> ```findSynonymsArray```; we should keep the 
function name and return type consistent with Scala.





[GitHub] spark pull request #17451: [SPARK-19866][ML][PySpark] Add local version of W...

2017-06-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17451#discussion_r123925233
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -2869,6 +2869,18 @@ def findSynonyms(self, word, num):
 word = _convert_to_vector(word)
 return self._call_java("findSynonyms", word, num)
 
+@since("2.2.0")
+def findSynonymsTuple(self, word, num):
+"""
+Find "num" number of words closest in similarity to "word".
+word can be a string or vector representation.
+Returns an array with two fields word and similarity (which
+gives the cosine similarity).
+"""
+if not isinstance(word, basestring):
+word = _convert_to_vector(word)
+return self._call_java("findSynonymsTuple", word, num)
+
--- End diff --

We need to convert the result back to an array of tuples, to be consistent 
with the Scala output.





[GitHub] spark pull request #17451: [SPARK-19866][ML][PySpark] Add local version of W...

2017-06-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17451#discussion_r123925086
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala 
---
@@ -274,6 +274,31 @@ class Word2VecModel private[ml] (
 wordVectors.findSynonyms(word, num)
   }
 
+  /**
+   * Find "num" number of words whose vector representation is most 
similar to the supplied vector.
+   * If the supplied vector is the vector representation of a word in the 
model's vocabulary,
+   * that word will be in the results.
+   * @return a tuple of the words list and the cosine similarities list 
between the synonyms given
+   * word vector.
+   */
+  @Since("2.2.0")
+  def findSynonymsTuple(vec: Vector, num: Int): (Array[String], 
Array[Double]) = {
+val result = findSynonymsArray(vec, num)
+(result.map(e => e._1), result.map(e => e._2))
+  }
+
+  /**
+   * Find "num" number of words closest in similarity to the given word, 
not
+   * including the word itself.
+   * @return a tuple of the words list and the cosine similarities list 
between the synonyms given
+   * word vector.
+   */
+  @Since("2.2.0")
+  def findSynonymsTuple(word: String, num: Int): (Array[String], 
Array[Double]) = {
--- End diff --

Ditto, should be private.





[GitHub] spark pull request #17451: [SPARK-19866][ML][PySpark] Add local version of W...

2017-06-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17451#discussion_r123925064
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala 
---
@@ -274,6 +274,31 @@ class Word2VecModel private[ml] (
 wordVectors.findSynonyms(word, num)
   }
 
+  /**
+   * Find "num" number of words whose vector representation is most 
similar to the supplied vector.
+   * If the supplied vector is the vector representation of a word in the 
model's vocabulary,
+   * that word will be in the results.
+   * @return a tuple of the words list and the cosine similarities list 
between the synonyms given
+   * word vector.
+   */
+  @Since("2.2.0")
+  def findSynonymsTuple(vec: Vector, num: Int): (Array[String], 
Array[Double]) = {
--- End diff --

This should be private. Also, add a comment clarifying that this is only a 
Java stub for the Python bindings.





[GitHub] spark pull request #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2017-06-25 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/17084#discussion_r123925437
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala
 ---
@@ -41,13 +41,27 @@ import org.apache.spark.sql.DataFrame
  *partition boundaries.
  */
 @Since("1.0.0")
-class BinaryClassificationMetrics @Since("1.3.0") (
-@Since("1.3.0") val scoreAndLabels: RDD[(Double, Double)],
-@Since("1.3.0") val numBins: Int) extends Logging {
+class BinaryClassificationMetrics @Since("2.2.0") (
+val numBins: Int,
+@Since("2.2.0") val scoreAndLabelsWithWeights: RDD[(Double, (Double, 
Double))])
+  extends Logging {
 
   require(numBins >= 0, "numBins must be nonnegative")
 
   /**
+   * Retrieves the score and labels (for binary compatibility).
+   * @return The score and labels.
+   */
+  @Since("1.0.0")
--- End diff --

good catch, updated the version to 1.3.0 for both:
1) `def scoreAndLabels: RDD[(Double, Double)]`
2) `def this(@Since("1.3.0") scoreAndLabels: RDD[(Double, Double)], @Since("1.3.0") numBins: Int)`





[GitHub] spark pull request #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2017-06-25 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/17084#discussion_r123925177
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala
 ---
@@ -77,12 +87,16 @@ class BinaryClassificationEvaluator @Since("1.4.0") 
(@Since("1.4.0") override va
 SchemaUtils.checkNumericType(schema, $(labelCol))
 
 // TODO: When dataset metadata has been implemented, check 
rawPredictionCol vector length = 2.
-val scoreAndLabels =
-  dataset.select(col($(rawPredictionCol)), 
col($(labelCol)).cast(DoubleType)).rdd.map {
-case Row(rawPrediction: Vector, label: Double) => 
(rawPrediction(1), label)
-case Row(rawPrediction: Double, label: Double) => (rawPrediction, 
label)
+val scoreAndLabelsWithWeights =
+  dataset.select(col($(rawPredictionCol)), 
col($(labelCol)).cast(DoubleType),
+if (!isDefined(weightCol) || $(weightCol).isEmpty) lit(1.0) else 
col($(weightCol)))
--- End diff --

added check for numeric type and did cast to Double, similar to labelCol





[GitHub] spark pull request #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2017-06-25 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/17084#discussion_r123924688
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala
 ---
@@ -36,12 +36,18 @@ import org.apache.spark.sql.types.DoubleType
 @Since("1.2.0")
 @Experimental
 class BinaryClassificationEvaluator @Since("1.4.0") (@Since("1.4.0") 
override val uid: String)
-  extends Evaluator with HasRawPredictionCol with HasLabelCol with 
DefaultParamsWritable {
+  extends Evaluator with HasRawPredictionCol with HasLabelCol
+with HasWeightCol with DefaultParamsWritable {
 
   @Since("1.2.0")
   def this() = this(Identifiable.randomUID("binEval"))
 
   /**
+   * Default number of bins to use for binary classification evaluation.
+   */
+  val defaultNumberOfBins = 1000
--- End diff --

It seemed like a good default value to use: for graphing the ROC curve it's 
not too large for most plots, but not so small that the graph would be 
jagged. The user can always specify a value to override the default. However, 
it's usually not a good idea to sort over the entire set of label/score 
values, since the dataset will probably be very large, the operation will be 
very slow, and the visualization won't look any different; so by default we 
should discourage the user from skipping the down-sampling of the bins.
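As a toy illustration of the down-sampling argument above (the grouping scheme here is made up for the sketch and is not the actual `BinaryClassificationMetrics` binning), keeping one representative score per group bounds the number of plotted points by roughly `numBins`:

```scala
// Down-sample distinct scores into at most ~numBins representatives instead
// of plotting every point: visually indistinguishable for an ROC curve, but
// far cheaper than carrying the full sorted score list downstream.
def downsampleScores(scores: Seq[Double], numBins: Int): Seq[Double] = {
  require(numBins > 0, "numBins must be positive")
  val groupSize = math.max(1, scores.size / numBins)
  // Sort descending and keep the first score of each group as its representative.
  scores.sorted(Ordering[Double].reverse).grouped(groupSize).map(_.head).toSeq
}
```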





[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17084
  
**[Test build #78597 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78597/testReport)**
 for PR 17084 at commit 
[`cf59c62`](https://github.com/apache/spark/commit/cf59c62f272ade192dfbf28ab53881251ea0d95e).





[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-06-25 Thread heary-cao
Github user heary-cao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r123924134
  
--- Diff: 
core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
@@ -123,6 +126,8 @@
 this.inMemSorter = new ShuffleInMemorySorter(
   this, initialSize, 
conf.getBoolean("spark.shuffle.sort.useRadixSort", true));
 this.peakMemoryUsedBytes = getMemoryUsage();
+this.diskWriteBufferSize =
+conf.getInt("spark.shuffle.spill.diskWriteBufferSize", 
DISK_WRITE_BUFFER_SIZE);
--- End diff --

@cloud-fan 
Thanks for reviewing, and thank you for the advice. I tried to fix it as 
suggested. However, `org.apache.spark.internal.config` cannot be imported in 
the `ShuffleExternalSorter.java` class, and moving the setting into 
`org.apache.spark.internal.config` would affect other code as well, so I 
suggest modifying it in a separate PR. Do you agree?
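For reference, a hedged sketch of what such an entry in `org.apache.spark.internal.config` might look like, following Spark's `ConfigBuilder` pattern. This is an untested config fragment: the key name comes from the diff above, but the constant name, doc string, and default value are assumptions.

```scala
// Hypothetical config entry (illustrative only; default value is an assumption):
private[spark] val SHUFFLE_DISK_WRITE_BUFFER_SIZE =
  ConfigBuilder("spark.shuffle.spill.diskWriteBufferSize")
    .doc("The buffer size, in bytes, to use when writing the sorted records " +
      "to an on-disk file.")
    .intConf
    .createWithDefault(1024 * 1024)
```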






[GitHub] spark issue #18334: [SPARK-21127] [SQL] Update statistics after data changin...

2017-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18334
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18334: [SPARK-21127] [SQL] Update statistics after data changin...

2017-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18334
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78592/
Test FAILed.





[GitHub] spark issue #18334: [SPARK-21127] [SQL] Update statistics after data changin...

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18334
  
**[Test build #78592 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78592/testReport)**
 for PR 18334 at commit 
[`5a43594`](https://github.com/apache/spark/commit/5a43594fb8a2fb2885c4d268140f28827a65ff5a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-06-25 Thread heary-cao
Github user heary-cao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r123923859
  
--- Diff: 
core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
@@ -82,6 +82,9 @@
   /** The buffer size to use when writing spills using 
DiskBlockObjectWriter */
   private final int fileBufferSizeBytes;
 
+  /** The buffer size to use when writing the sorted records to an on-disk file */
--- End diff --

@jiangxb1987 
thanks for reviewing it.
The UnsafeSorterSpillWriter changes have been updated.
Please take another look.





[GitHub] spark issue #18417: [INFRA] Close stale PRs

2017-06-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18417
  
Sure.





[GitHub] spark issue #18417: [INFRA] Close stale PRs

2017-06-25 Thread imatiach-msft
Github user imatiach-msft commented on the issue:

https://github.com/apache/spark/pull/18417
  
Can you please keep 17084 open? Thanks!





[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-06-25 Thread imatiach-msft
Github user imatiach-msft commented on the issue:

https://github.com/apache/spark/pull/17084
  
yes, will update the PR, thanks for the ping





[GitHub] spark issue #18368: [SPARK-21102][SQL] Make refresh resource command less ag...

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18368
  
**[Test build #78596 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78596/testReport)**
 for PR 18368 at commit 
[`fc2b7c0`](https://github.com/apache/spark/commit/fc2b7c02fab7f570ae3ca080ae1c2c9502300de7).





[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...

2017-06-25 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18346
  
ping @cloud-fan any more feedback on this?





[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface

2017-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11994
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78595/
Test FAILed.





[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11994
  
**[Test build #78595 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78595/testReport)**
 for PR 11994 at commit 
[`15c79f2`](https://github.com/apache/spark/commit/15c79f26aae206a390ae5609d911bd8f0ad6).
 * This patch **fails to generate documentation**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface

2017-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11994
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18368: [SPARK-21102][SQL] Make refresh resource command less ag...

2017-06-25 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/18368
  
test this please





[GitHub] spark pull request #18235: [SPARK-21012][Submit] Add glob support for resour...

2017-06-25 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18235#discussion_r123922136
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -310,33 +310,28 @@ object SparkSubmit extends CommandLineUtils {
   RPackageUtils.checkAndBuildRPackage(args.jars, printStream, 
args.verbose)
 }
 
-// In client mode, download remote files.
-if (deployMode == CLIENT) {
-  val hadoopConf = new HadoopConfiguration()
-  args.primaryResource = 
Option(args.primaryResource).map(downloadFile(_, hadoopConf)).orNull
-  args.jars = Option(args.jars).map(downloadFileList(_, 
hadoopConf)).orNull
-  args.pyFiles = Option(args.pyFiles).map(downloadFileList(_, 
hadoopConf)).orNull
-  args.files = Option(args.files).map(downloadFileList(_, 
hadoopConf)).orNull
-}
+val hadoopConf = new HadoopConfiguration()
+val targetDir = Files.createTempDirectory("tmp").toFile
--- End diff --

From my understanding currently no code is responsible for deleting, let me 
check the code.





[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11994
  
**[Test build #78595 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78595/testReport)**
 for PR 11994 at commit 
[`15c79f2`](https://github.com/apache/spark/commit/15c79f26aae206a390ae5609d911bd8f0ad6).





[GitHub] spark issue #18305: [SPARK-20988][ML] Logistic regression uses aggregator hi...

2017-06-25 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/18305
  
@sethah I will take a look in a few days after some backlog, thanks for 
your patience.





[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface

2017-06-25 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/11994
  
Jenkins, retest this please.





[GitHub] spark issue #9518: [SPARK-11574][Core] Add metrics StatsD sink

2017-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/9518
  
Merged build finished. Test PASSed.





[GitHub] spark issue #9518: [SPARK-11574][Core] Add metrics StatsD sink

2017-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/9518
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78588/
Test PASSed.





[GitHub] spark issue #9518: [SPARK-11574][Core] Add metrics StatsD sink

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/9518
  
**[Test build #78588 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78588/testReport)**
 for PR 9518 at commit 
[`1d50f6f`](https://github.com/apache/spark/commit/1d50f6f5237ca01f7611677795d19e4975244316).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...

2017-06-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17995#discussion_r123921193
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
 ---
@@ -81,7 +83,8 @@ private[classification] trait MultilayerPerceptronParams 
extends PredictorParams
   final val solver: Param[String] = new Param[String](this, "solver",
 "The solver algorithm for optimization. Supported options: " +
   s"${MultilayerPerceptronClassifier.supportedSolvers.mkString(", ")}. 
(Default l-bfgs)",
-
ParamValidators.inArray[String](MultilayerPerceptronClassifier.supportedSolvers))
+(value: String) => MultilayerPerceptronClassifier.supportedSolvers
+  .contains(value.toLowerCase(Locale.ROOT)))
--- End diff --

What do you think of adding a new function in ```object ParamValidators``` 
as
```
def inStringArray(allowed: Array[String]): String => Boolean = { (value: String) =>
  allowed.contains(value.toLowerCase(java.util.Locale.ROOT))
}
```
to facilitate similar checks here and in other places.
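For illustration, here is a minimal standalone sketch of how such a validator could be exercised. The object name `CaseInsensitiveValidators` is hypothetical; the actual suggestion targets `object ParamValidators` in Spark ML.

```scala
import java.util.Locale

// Hypothetical standalone helper mirroring the inStringArray suggestion.
// Assumes the allowed options are already stored lower-cased.
object CaseInsensitiveValidators {
  def inStringArray(allowed: Array[String]): String => Boolean =
    (value: String) => allowed.contains(value.toLowerCase(Locale.ROOT))
}

// Usage: a solver-name check that accepts any casing of the supported names.
val isSupportedSolver = CaseInsensitiveValidators.inStringArray(Array("l-bfgs", "gd"))
```

With the options kept lower-cased, one shared predicate covers every string param that should be case-insensitive.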





[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...

2017-06-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17995#discussion_r123920923
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
 ---
@@ -313,7 +313,11 @@ class GeneralizedLinearRegression @Since("2.0.0") 
(@Since("2.0.0") override val
* @group setParam
*/
   @Since("2.0.0")
-  def setSolver(value: String): this.type = set(solver, value)
+  def setSolver(value: String): this.type = {
+require("irls" == value.toLowerCase(Locale.ROOT),
+  s"Solver $value was not supported. Supported options: irls")
+set(solver, value)
+  }
--- End diff --

Actually we can't do this, since MLlib supports setting params via other
entry points. For now we can leave it as is, until we resolve #16028.
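A minimal sketch of the bypass being described, with a simplified stand-in for the Param machinery. The class and method names here are illustrative, not Spark's actual API.

```scala
// Simplified model: a require placed inside a typed setter is skipped when
// the value arrives through a generic set(name, value) entry point instead.
class FakeRegression {
  private var solver: String = "irls"

  // Setter with a check, as in the diff above.
  def setSolver(value: String): this.type = {
    require(value.equalsIgnoreCase("irls"),
      s"Solver $value was not supported. Supported options: irls")
    solver = value
    this
  }

  // Stand-in for the other param entry points MLlib also supports.
  def set(name: String, value: String): this.type = {
    if (name == "solver") solver = value  // no validation happens here
    this
  }

  def getSolver: String = solver
}

val model = new FakeRegression
model.set("solver", "bogus")  // the setter's require never runs
```

This is why validation belongs in the Param's `isValid` function rather than in individual setters.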





[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...

2017-06-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17995#discussion_r123919667
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -128,7 +130,8 @@ private[feature] trait ChiSqSelectorParams extends 
Params
   final val selectorType = new Param[String](this, "selectorType",
 "The selector type of the ChisqSelector. " +
   "Supported options: " + 
OldChiSqSelector.supportedSelectorTypes.mkString(", "),
-
ParamValidators.inArray[String](OldChiSqSelector.supportedSelectorTypes))
+(value: String) => 
OldChiSqSelector.supportedSelectorTypes.map(_.toLowerCase(Locale.ROOT))
--- End diff --

Supported selector types should always be stored in lower case; please
update the corresponding code snippet in ```mllib.feature.ChiSqSelector``` from:
```
private[spark] val NumTopFeatures: String = "numTopFeatures"
..
```
to
```
private[spark] val NumTopFeatures: String = "numTopFeatures".toLowerCase
..
```





[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...

2017-06-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17995#discussion_r123919494
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/evaluation/RegressionEvaluator.scala 
---
@@ -48,10 +50,11 @@ final class RegressionEvaluator @Since("1.4.0") 
(@Since("1.4.0") override val ui
* @group param
*/
   @Since("1.4.0")
-  val metricName: Param[String] = {
-val allowedParams = ParamValidators.inArray(Array("mse", "rmse", "r2", 
"mae"))
-new Param(this, "metricName", "metric name in evaluation 
(mse|rmse|r2|mae)", allowedParams)
-  }
+  val metricName: Param[String] = new Param[String](this, "metricName", 
"metric name in" +
+" evaluation (mse|rmse|r2|mae)",
+(value: String) => Array("mse", "rmse", "r2", "mae")
--- End diff --

Ditto.





[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...

2017-06-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17995#discussion_r123888757
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala
 ---
@@ -46,11 +48,10 @@ class BinaryClassificationEvaluator @Since("1.4.0") 
(@Since("1.4.0") override va
* @group param
*/
   @Since("1.2.0")
-  val metricName: Param[String] = {
-val allowedParams = ParamValidators.inArray(Array("areaUnderROC", 
"areaUnderPR"))
-new Param(
-  this, "metricName", "metric name in evaluation 
(areaUnderROC|areaUnderPR)", allowedParams)
-  }
+  val metricName: Param[String] = new Param[String](this, "metricName", 
"metric name in" +
+" evaluation (areaUnderROC|areaUnderPR)",
+(value: String) => Array("areaunderroc", "areaunderpr").contains(
+  value.toLowerCase(Locale.ROOT)))
--- End diff --

Could we organize as 
```
val AreaUnderROC: String = "areaUnderROC".toLowerCase
val AreaUnderPR: String = "areaUnderPR".toLowerCase
val supportedMetricNames = Set(AreaUnderROC, AreaUnderPR)
```
in ```object BinaryClassificationEvaluator```? That would be clearer.
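As a runnable sketch, the layout above could look like this. The object name is hypothetical, and `Locale.ROOT` is added here to match the lower-casing used elsewhere in the PR.

```scala
import java.util.Locale

// Hypothetical companion-object layout for the metric-name constants.
object BinaryClassificationMetricNames {
  val AreaUnderROC: String = "areaUnderROC".toLowerCase(Locale.ROOT)
  val AreaUnderPR: String = "areaUnderPR".toLowerCase(Locale.ROOT)
  val supportedMetricNames: Set[String] = Set(AreaUnderROC, AreaUnderPR)
}

// A user-supplied metric name then validates case-insensitively:
val ok = BinaryClassificationMetricNames.supportedMetricNames
  .contains("AreaUnderPR".toLowerCase(Locale.ROOT))
```

Keeping the canonical spellings in one place avoids scattering string literals like "areaunderroc" through the validators.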





[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...

2017-06-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17995#discussion_r123920437
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala ---
@@ -45,7 +47,7 @@ private[feature] trait ImputerParams extends Params with 
HasInputCols {
   final val strategy: Param[String] = new Param(this, "strategy", 
s"strategy for imputation. " +
 s"If ${Imputer.mean}, then replace missing values using the mean value 
of the feature. " +
 s"If ${Imputer.median}, then replace missing values using the median 
value of the feature.",
-ParamValidators.inArray[String](Array(Imputer.mean, Imputer.median)))
+(value: String) => Array(Imputer.mean, 
Imputer.median).contains(value.toLowerCase(Locale.ROOT)))
--- End diff --

Ditto.





[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...

2017-06-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17995#discussion_r123919458
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
 ---
@@ -44,12 +46,10 @@ class MulticlassClassificationEvaluator @Since("1.5.0") 
(@Since("1.5.0") overrid
* @group param
*/
   @Since("1.5.0")
-  val metricName: Param[String] = {
-val allowedParams = ParamValidators.inArray(Array("f1", 
"weightedPrecision",
-  "weightedRecall", "accuracy"))
-new Param(this, "metricName", "metric name in evaluation " +
-  "(f1|weightedPrecision|weightedRecall|accuracy)", allowedParams)
-  }
+  val metricName: Param[String] = new Param[String](this, "metricName", 
"metric name in" +
+" evaluation (f1|weightedPrecision|weightedRecall|accuracy)",
+(value: String) => Array("f1", "weightedprecision", "weightedrecall", 
"accuracy")
+  .contains(value.toLowerCase(Locale.ROOT)))
--- End diff --

Ditto.





[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...

2017-06-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17995#discussion_r123920352
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala
 ---
@@ -70,6 +71,10 @@ class BinaryClassificationEvaluator @Since("1.4.0") 
(@Since("1.4.0") override va
 
   setDefault(metricName -> "areaUnderROC")
 
+  private def getFormattedMetricName =
--- End diff --

Is this really necessary?





[GitHub] spark issue #18414: [SPARK-21169] [core] Make sure to update application sta...

2017-06-25 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/18414
  
@srini-daruna I think I already addressed this issue in SPARK-12552, here 
is the 
[code](https://github.com/srini-daruna/spark/blob/b3ea3358a7bf55cedaa5cd7d08860bc625e83cd2/core/src/main/scala/org/apache/spark/deploy/master/Master.scala#L552).
 Did you test with SPARK-12552 in or not? Also is that fix not enough to 
address the problem?





[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface

2017-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11994
  
Merged build finished. Test FAILed.





[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface

2017-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11994
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78594/
Test FAILed.





[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11994
  
**[Test build #78594 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78594/testReport)**
 for PR 11994 at commit 
[`15c79f2`](https://github.com/apache/spark/commit/15c79f26aae206a390ae5609d911bd8f0ad6).
 * This patch **fails to generate documentation**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18416: [SPARK-21204][SQL][WIP] Add support for Scala Set collec...

2017-06-25 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18416
  
cc @cloud-fan I'd like to hear your opinion about this `Set` support. Can 
you provide some insights?





[GitHub] spark pull request #18416: [SPARK-21204][SQL][WIP] Add support for Scala Set...

2017-06-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18416#discussion_r123920728
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -992,6 +1123,128 @@ case class ExternalMapToCatalyst private(
   }
 }
 
+object ExternalSetToCatalystArray {
+  private val curId = new java.util.concurrent.atomic.AtomicInteger()
+
+  def apply(
+  inputSet: Expression,
+  elementType: DataType,
+  elementConverter: Expression => Expression,
+  elementNullable: Boolean): ExternalSetToCatalystArray = {
+val id = curId.getAndIncrement()
+val elementName = "ExternalSetToCatalystArray_element" + id
+val elementIsNull = "ExternalSetToCatalystArray_element_isNull" + id
+
+ExternalSetToCatalystArray(
+  elementName,
+  elementIsNull,
+  elementType,
+  elementConverter(LambdaVariable(elementName, elementIsNull, 
elementType, elementNullable)),
+  inputSet
+)
+  }
+}
+
+/**
+ * Converts a Scala/Java set object into catalyst array format, by 
applying the converter when
+ * iterate the set.
+ *
+ * @param element the name of the set element variable that used when 
iterate the set, and used as
+ *input for the `elementConverter`
+ * @param elementIsNull the nullability of the element variable that used 
when iterate the set, and
+ *used as input for the `elementConverter`
+ * @param elementType the data type of the element variable that used when 
iterate the set, and
+ *  used as input for the `elementConverter`
+ * @param elementConverter A function that take the `element` as input, 
and converts it to catalyst
+ *   array format.
+ * @param child An expression that when evaluated returns the input set 
object.
+ */
+case class ExternalSetToCatalystArray private(
+element: String,
+elementIsNull: String,
+elementType: DataType,
+elementConverter: Expression,
+child: Expression)
+  extends UnaryExpression with NonSQLExpression {
+
+  override def foldable: Boolean = false
+
+  override def dataType: ArrayType = ArrayType(
+elementType = elementConverter.dataType, containsNull = 
elementConverter.nullable)
+
+  override def eval(input: InternalRow): Any =
+throw new UnsupportedOperationException("Only code-generated 
evaluation is supported")
+
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
+val inputSet = child.genCode(ctx)
+val genElementConverter = elementConverter.genCode(ctx)
+val length = ctx.freshName("length")
+val index = ctx.freshName("index")
+
+val iter = ctx.freshName("iter")
+val (defineIterator, defineElement) = child.dataType match {
+  case ObjectType(cls) if 
classOf[java.util.Set[_]].isAssignableFrom(cls) =>
+val javaIteratorCls = classOf[java.util.Iterator[_]].getName
--- End diff --

I'd prefer to leave Java set support to another PR.





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-25 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123918251
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql ---
@@ -92,3 +92,8 @@ select abs(-3.13), abs('-2.19');
 
 -- positive/negative
 select positive('-1.11'), positive(-1.11), negative('-1.11'), 
negative(-1.11);
+
+-- width_bucket
+select width_bucket(5.35, 0.024, 10.06, 5);
+select width_bucket(5.35, 0.024, 10.06, -5);
--- End diff --

Add a test case for a wrong input type: `select width_bucket(5.35, 0.024, 10.06, 
0.5);`





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-25 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123918240
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala 
---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr id the expression for which the histogram is being created
--- End diff --

nit: id -> is





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-25 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123919502
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala 
---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr id the expression for which the histogram is being created
+   * @param minValue is an expression that resolves
+   * to the minimum end point of the acceptable range for 
expr
+   * @param maxValue is an expression that resolves
+   * to the maximum end point of the acceptable range for 
expr
+   * @param numBucket is an An expression that resolves to
+   *  a constant indicating the number of buckets
+   * @return Returns an long between 0 and numBucket+1 by mapping the expr 
into buckets defined by
+   * the range [minValue, maxValue]. For example:
+   * widthBucket(0, 1, 1, 1) -> 0, widthBucket(20, 1, 1, 1) -> 2.
--- End diff --

Let's remove these examples from the description; they are just corner cases. 
My previous comment was just to make sure both ends are included.





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-25 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123919350
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala 
---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr id the expression for which the histogram is being created
+   * @param minValue is an expression that resolves
+   * to the minimum end point of the acceptable range for 
expr
+   * @param maxValue is an expression that resolves
+   * to the maximum end point of the acceptable range for 
expr
+   * @param numBucket is an An expression that resolves to
+   *  a constant indicating the number of buckets
+   * @return Returns an long between 0 and numBucket+1 by mapping the expr 
into buckets defined by
+   * the range [minValue, maxValue]. For example:
+   * widthBucket(0, 1, 1, 1) -> 0, widthBucket(20, 1, 1, 1) -> 2.
+   */
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, 
numBucket: Long): Long = {
+
+if (numBucket <= 0) {
+  throw new AnalysisException(s"The num of bucket must be greater than 
0, but got ${numBucket}")
+}
--- End diff --

Do we consider minValue == maxValue and numBucket > 1 valid input or not?
Please also add a test case for this.
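The semantics under discussion can be sketched as a standalone Scala function. This is a minimal sketch, not the PR's actual implementation: the `numBucket` check mirrors the guard in the diff above (with `require` standing in for `AnalysisException`), and the edge-case behavior follows the corner cases mentioned in the docstring, where values below `minValue` map to bucket 0 and values at or above `maxValue` map to `numBucket + 1`.

```scala
object WidthBucketSketch {
  // Map `expr` into one of `numBucket` equal-width buckets over
  // [minValue, maxValue): bucket 0 is the underflow bucket and
  // bucket numBucket + 1 is the overflow bucket.
  def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = {
    // Stand-in for the AnalysisException thrown in the PR's version.
    require(numBucket > 0, s"The number of buckets must be greater than 0, but got $numBucket")
    if (expr < minValue) 0L
    else if (expr >= maxValue) numBucket + 1
    else (((expr - minValue) / (maxValue - minValue)) * numBucket).toLong + 1
  }
}
```

Note that with this formulation the `minValue == maxValue` case raises no division by zero: any input falls into either the underflow or the overflow bucket.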





[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18405
  
**[Test build #78593 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78593/testReport)**
 for PR 18405 at commit 
[`255c50a`](https://github.com/apache/spark/commit/255c50a87051df42933bbd83aea14ccd54c18826).





[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-25 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123920115
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
 ---
@@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression {
   s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", 
isNull = "false")
   }
 }
+
+/**
+ * Returns date truncated to the unit specified by the format or
+ * numeric truncated to scale decimal places.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+  _FUNC_(data[, fmt]) - Returns `data` truncated by the format model 
`fmt`.
+If `data` is DateType, returns `data` with the time portion of the 
day truncated to the unit specified by the format model `fmt`.
+If `data` is DecimalType/DoubleType, returns `data` truncated to 
`fmt` decimal places.
+  """,
+  extended = """
+Examples:
+  > SELECT _FUNC_('2009-02-12', 'MM');
--- End diff --

Yes, this is what I worry about.





[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11994
  
**[Test build #78594 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78594/testReport)**
 for PR 11994 at commit 
[`15c79f2`](https://github.com/apache/spark/commit/15c79f26aae206a390ae5609d911bd8f0ad6).





[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...

2017-06-25 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18405
  
Jenkins, retest this please.





[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-25 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123919660
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
 ---
@@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression {
   s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", 
isNull = "false")
   }
 }
+
+/**
+ * Returns date truncated to the unit specified by the format or
+ * numeric truncated to scale decimal places.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+  _FUNC_(data[, fmt]) - Returns `data` truncated by the format model 
`fmt`.
+If `data` is DateType, returns `data` with the time portion of the 
day truncated to the unit specified by the format model `fmt`.
+If `data` is DecimalType/DoubleType, returns `data` truncated to 
`fmt` decimal places.
+  """,
+  extended = """
+Examples:
+  > SELECT _FUNC_('2009-02-12', 'MM');
+   2009-02-01.
+  > SELECT _FUNC_('2015-10-27', 'YEAR');
+   2015-01-01
+  > SELECT _FUNC_('1989-03-13');
+   1989-03-01
+  > SELECT _FUNC_(1234567891.1234567891, 4);
+   1234567891.1234
+  > SELECT _FUNC_(1234567891.1234567891, -4);
+   123456
+  > SELECT _FUNC_(1234567891.1234567891);
+   1234567891
+  """)
+// scalastyle:on line.size.limit
+case class Trunc(data: Expression, format: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  def this(data: Expression) = {
+this(data, Literal(if (data.dataType.isInstanceOf[DateType]) "MM" else 
0))
+  }
+
+  override def left: Expression = data
+  override def right: Expression = format
+
+  override def dataType: DataType = data.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = dataType match {
+case NullType => Seq(dataType, TypeCollection(StringType, IntegerType))
+case DateType => Seq(dataType, StringType)
+case DoubleType | DecimalType.Fixed(_, _) => Seq(dataType, IntegerType)
+case _ => Seq(TypeCollection(DateType, DoubleType, DecimalType),
--- End diff --

Add this case to show all supported types:
```
 > select trunc(false, 'MON'); 
Error in query: cannot resolve 'trunc(false, 'MON')' due to data type 
mismatch: argument 1 requires (date or double or decimal) type, however, 
'false' is of boolean type.; line 1 pos 7;
'Project [unresolvedalias(trunc(false, MON), None)]
+- OneRowRelation$
```
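As a hypothetical standalone sketch (not the PR's code), the numeric branch of `trunc` described in the docstring above amounts to truncating toward zero at a given decimal scale:

```scala
object TruncNumberSketch {
  // Truncate `data` to `scale` decimal places without rounding; a negative
  // `scale` would truncate digits to the left of the decimal point under
  // this Oracle-style interpretation.
  def truncNumber(data: BigDecimal, scale: Int): BigDecimal =
    data.setScale(scale, BigDecimal.RoundingMode.DOWN)
}
```

For example, `truncNumber(BigDecimal("1234567891.1234567891"), 4)` yields `1234567891.1234`; the exact behavior for negative scales is one thing the expected-output examples in the docstring would need to pin down.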





[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...

2017-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18405
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...

2017-06-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18405
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78590/
Test FAILed.





[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...

2017-06-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18405
  
**[Test build #78590 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78590/testReport)**
 for PR 18405 at commit 
[`255c50a`](https://github.com/apache/spark/commit/255c50a87051df42933bbd83aea14ccd54c18826).
 * This patch **fails due to an unknown error code, -10**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18416: [SPARK-21204][SQL][WIP] Add support for Scala Set collec...

2017-06-25 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18416
  
Currently I can't think of any issues with serializing a `Set` as an array, 
but comments pointing out possible problems are welcome.
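For what it's worth, the core of the conversion can be sketched in plain Scala — a hypothetical analogue of `ExternalSetToCatalystArray`, with the Catalyst array format replaced by a plain `Array`; `setToArray` and its signature are illustrative only:

```scala
import scala.reflect.ClassTag

object SetToArraySketch {
  // Iterate the set once, applying a per-element converter, and collect
  // the results into an array. Element order follows the set's iteration
  // order, which is generally unspecified.
  def setToArray[T, U: ClassTag](input: Set[T])(convert: T => U): Array[U] = {
    val result = new Array[U](input.size)
    var i = 0
    for (elem <- input) {
      result(i) = convert(elem)
      i += 1
    }
    result
  }
}
```

The unspecified iteration order is exactly why round-tripping a `Set` through an array is lossy only in ordering, not in membership.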




