[GitHub] spark issue #17467: [SPARK-20140][DStream] Remove hardcoded kinesis retry wa...

2017-03-29 Thread yssharma
Github user yssharma commented on the issue:

https://github.com/apache/spark/pull/17467
  
@HyukjinKwon What should the next steps be for this PR? Are there any 
Spark-Kinesis experts who can review the patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

2017-03-29 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/17324
  
The test was interrupted and needs a retest.





[GitHub] spark pull request #17450: [SPARK-20121][SQL] simplify NullPropagation with ...

2017-03-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17450#discussion_r108843221
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -297,8 +297,8 @@ case class Lower(child: Expression) extends 
UnaryExpression with String2StringEx
 }
 
 /** A base trait for functions that compare two strings, returning a 
boolean. */
-trait StringPredicate extends Predicate with ImplicitCastInputTypes {
-  self: BinaryExpression =>
+abstract class StringPredicate extends BinaryExpression
+  with Predicate with ImplicitCastInputTypes {
--- End diff --

Yeah. :-)





[GitHub] spark pull request #17419: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-03-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17419#discussion_r108840757
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/stat/SummarizerSuite.scala ---
@@ -0,0 +1,399 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.stat
+
+import org.scalatest.exceptions.TestFailedException
+
+import org.apache.spark.{SparkException, SparkFunSuite}
+import org.apache.spark.ml.linalg.{Vector, Vectors}
+import org.apache.spark.ml.stat.SummaryBuilderImpl.Buffer
+import org.apache.spark.ml.util.TestingUtils._
+import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => 
OldVectors}
+import org.apache.spark.mllib.stat.{MultivariateOnlineSummarizer, 
Statistics}
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
+
+class SummarizerSuite extends SparkFunSuite with MLlibTestSparkContext {
+
+  import testImplicits._
+  import Summarizer._
+
+  private case class ExpectedMetrics(
+  mean: Seq[Double],
+  variance: Seq[Double],
+  count: Long,
+  numNonZeros: Seq[Long],
+  max: Seq[Double],
+  min: Seq[Double],
+  normL2: Seq[Double],
+  normL1: Seq[Double])
+
+  // The input is expected to be either a sparse vector, a dense vector or 
an array of doubles
+  // (which will be converted to a dense vector)
+  // The expected is the list of all the known metrics.
+  //
+  // The tests take an list of input vectors and a list of all the summary 
values that
+  // are expected for this input. They currently test against some fixed 
subset of the
+  // metrics, but should be made fuzzy in the future.
+
+  private def testExample(name: String, input: Seq[Any], exp: 
ExpectedMetrics): Unit = {
+def inputVec: Seq[Vector] = input.map {
+  case x: Array[Double @unchecked] => Vectors.dense(x)
+  case x: Seq[Double @unchecked] => Vectors.dense(x.toArray)
+  case x: Vector => x
+  case x => throw new Exception(x.toString)
+}
+
+val s = {
+  val s2 = new MultivariateOnlineSummarizer
+  inputVec.foreach(v => s2.add(OldVectors.fromML(v)))
+  s2
+}
+
+// Because the Spark context is reset between tests, we cannot hold a 
reference onto it.
+def wrapped() = {
+  val df = sc.parallelize(inputVec).map(Tuple1.apply).toDF("features")
+  val c = df.col("features")
+  (df, c)
+}
+
+registerTest(s"$name - mean only") {
+  val (df, c) = wrapped()
+  compare(df.select(metrics("mean").summary(c), mean(c)), 
Seq(Row(exp.mean), s.mean))
+}
+
+registerTest(s"$name - mean only (direct)") {
+  val (df, c) = wrapped()
+  compare(df.select(mean(c)), Seq(exp.mean))
+}
+
+registerTest(s"$name - variance only") {
+  val (df, c) = wrapped()
+  compare(df.select(metrics("variance").summary(c), variance(c)),
+Seq(Row(exp.variance), s.variance))
+}
+
+registerTest(s"$name - variance only (direct)") {
+  val (df, c) = wrapped()
+  compare(df.select(variance(c)), Seq(s.variance))
+}
+
+registerTest(s"$name - count only") {
+  val (df, c) = wrapped()
+  compare(df.select(metrics("count").summary(c), count(c)),
+Seq(Row(exp.count), exp.count))
+}
+
+registerTest(s"$name - count only (direct)") {
+  val (df, c) = wrapped()
+  compare(df.select(count(c)),
+Seq(exp.count))
+}
+
+registerTest(s"$name - numNonZeros only") {
+  val (df, c) = wrapped()
+  compare(df.select(metrics("numNonZeros").summary(c), numNonZeros(c)),
+Seq(Row(exp.numNonZeros), exp.numNonZeros))
+}
+
+registerTest(s"$name - 

[GitHub] spark pull request #17419: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-03-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17419#discussion_r108840709
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/stat/SummarizerSuite.scala ---
@@ -0,0 +1,338 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.stat
+
+import org.scalatest.exceptions.TestFailedException
+
+import org.apache.spark.{SparkException, SparkFunSuite}
+import org.apache.spark.ml.linalg.{Vector, Vectors}
+import org.apache.spark.ml.stat.SummaryBuilderImpl.Buffer
+import org.apache.spark.ml.util.TestingUtils._
+import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => 
OldVectors}
+import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
+
+class SummarizerSuite extends SparkFunSuite with MLlibTestSparkContext {
+
+  import testImplicits._
+  import Summarizer._
+
+  private case class ExpectedMetrics(
+  mean: Seq[Double],
+  variance: Seq[Double],
+  count: Long,
+  numNonZeros: Seq[Long],
+  max: Seq[Double],
+  min: Seq[Double],
+  normL2: Seq[Double],
+  normL1: Seq[Double])
+
+  // The input is expected to be either a sparse vector, a dense vector or 
an array of doubles
+  // (which will be converted to a dense vector)
+  // The expected is the list of all the known metrics.
+  //
+  // The tests take an list of input vectors and a list of all the summary 
values that
+  // are expected for this input. They currently test against some fixed 
subset of the
+  // metrics, but should be made fuzzy in the future.
+
+  private def testExample(name: String, input: Seq[Any], exp: 
ExpectedMetrics): Unit = {
+def inputVec: Seq[Vector] = input.map {
+  case x: Array[Double @unchecked] => Vectors.dense(x)
+  case x: Seq[Double @unchecked] => Vectors.dense(x.toArray)
+  case x: Vector => x
+  case x => throw new Exception(x.toString)
+}
+
+val s = {
+  val s2 = new MultivariateOnlineSummarizer
+  inputVec.foreach(v => s2.add(OldVectors.fromML(v)))
+  s2
+}
+
+// Because the Spark context is reset between tests, we cannot hold a 
reference onto it.
+def wrapped() = {
+  val df = sc.parallelize(inputVec).map(Tuple1.apply).toDF("features")
+  val c = df.col("features")
+  (df, c)
+}
+
+registerTest(s"$name - mean only") {
+  val (df, c) = wrapped()
+  compare(df.select(metrics("mean").summary(c), mean(c)), 
Seq(Row(exp.mean), s.mean))
+}
+
+registerTest(s"$name - mean only (direct)") {
+  val (df, c) = wrapped()
+  compare(df.select(mean(c)), Seq(exp.mean))
+}
+
+registerTest(s"$name - variance only") {
+  val (df, c) = wrapped()
+  compare(df.select(metrics("variance").summary(c), variance(c)),
+Seq(Row(exp.variance), s.variance))
+}
+
+registerTest(s"$name - variance only (direct)") {
+  val (df, c) = wrapped()
+  compare(df.select(variance(c)), Seq(s.variance))
+}
+
+registerTest(s"$name - count only") {
+  val (df, c) = wrapped()
+  compare(df.select(metrics("count").summary(c), count(c)),
+Seq(Row(exp.count), exp.count))
+}
+
+registerTest(s"$name - count only (direct)") {
+  val (df, c) = wrapped()
+  compare(df.select(count(c)),
+Seq(exp.count))
+}
+
+registerTest(s"$name - numNonZeros only") {
+  val (df, c) = wrapped()
+  compare(df.select(metrics("numNonZeros").summary(c), numNonZeros(c)),
+Seq(Row(exp.numNonZeros), exp.numNonZeros))
+}
+
+registerTest(s"$name - numNonZeros only 

[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...

2017-03-29 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/17251
  
Thank you so much! :)





[GitHub] spark issue #17476: [SPARK-20151][SQL] Account for partition pruning in scan...

2017-03-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17476
  
**[Test build #75379 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75379/testReport)**
 for PR 17476 at commit 
[`8789cf0`](https://github.com/apache/spark/commit/8789cf04ea4f7addbcd8da9d83615ee96d9bd192).





[GitHub] spark issue #17476: [SPARK-20151][SQL] Account for partition pruning in scan...

2017-03-29 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17476
  
cc @ericl, @bogdanrdc, @adrian-ionescu, @cloud-fan





[GitHub] spark pull request #17476: [SPARK-20151][SQL] Account for partition pruning ...

2017-03-29 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/17476

[SPARK-20151][SQL] Account for partition pruning in scan metadataTime 
metrics

## What changes were proposed in this pull request?
After SPARK-20136, we report metadata timing metrics in the scan operator. 
However, that timing metric doesn't include one of the most important parts of 
metadata handling, which is partition pruning. This patch adds that time 
measurement to the scan metrics.
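
As a rough illustration of the idea (not the code in this patch), the sketch below times a pruning step and reports it under a metadata timing metric. The function `plan_scan` and the dictionary layout are hypothetical; only the metric name `metadataTime` comes from the PR title.

```python
import time


def plan_scan(partitions, predicate):
    """Hypothetical planner step: prune partitions and time the pruning."""
    start = time.perf_counter()
    # Partition pruning: keep only the partitions the predicate selects.
    selected = [p for p in partitions if predicate(p)]
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Report the pruning time as part of the scan's metadata timing.
    return selected, {"metadataTime": elapsed_ms}
```

The point is only that the clock starts before pruning rather than after it, so the cost of pruning shows up in the reported metric.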

## How was this patch tested?
N/A - I tried adding a test in SQLMetricsSuite but it was extremely 
convoluted to the point that I'm not sure if this is worth it.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-20151

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17476.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17476


commit 8789cf04ea4f7addbcd8da9d83615ee96d9bd192
Author: Reynold Xin 
Date:   2017-03-30T04:46:45Z

[SPARK-20151][SQL] Account for partition pruning in scan metadataTime 
metrics







[GitHub] spark pull request #17474: [Minor][SparkR]: Add run command comment in examp...

2017-03-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17474





[GitHub] spark issue #17474: [Minor][SparkR]: Add run command comment in examples

2017-03-29 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17474
  
merged to master, thanks!





[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...

2017-03-29 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17251
  
Will review it tonight. 





[GitHub] spark pull request #17419: [SPARK-19634][ML] Multivariate summarizer - dataf...

2017-03-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17419#discussion_r108838518
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala ---
@@ -0,0 +1,746 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.stat
+
+import breeze.{linalg => la}
+import breeze.linalg.{Vector => BV}
+import breeze.numerics
+
+import org.apache.spark.SparkException
+import org.apache.spark.annotation.Since
+import org.apache.spark.internal.Logging
+import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vector, 
Vectors, VectorUDT}
+import org.apache.spark.sql.Column
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.{Expression, 
UnsafeArrayData, UnsafeProjection, UnsafeRow}
+import 
org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, 
Complete, TypedImperativeAggregate}
+import org.apache.spark.sql.types._
+
+
+/**
+ * A builder object that provides summary statistics about a given column.
+ *
+ * Users should not directly create such builders, but instead use one of 
the methods in
+ * [[Summarizer]].
+ */
+@Since("2.2.0")
+abstract class SummaryBuilder {
+  /**
+   * Returns an aggregate object that contains the summary of the column 
with the requested metrics.
+   * @param column a column that contains Vector object.
+   * @return an aggregate column that contains the statistics. The exact 
content of this
+   * structure is determined during the creation of the builder.
+   */
+  @Since("2.2.0")
+  def summary(column: Column): Column
+}
+
+/**
+ * Tools for vectorized statistics on MLlib Vectors.
+ *
+ * The methods in this package provide various statistics for Vectors 
contained inside DataFrames.
+ *
+ * This class lets users pick the statistics they would like to extract 
for a given column. Here is
+ * an example in Scala:
+ * {{{
+ *   val dataframe = ... // Some dataframe containing a feature column
+ *   val allStats = dataframe.select(Summarizer.metrics("min", 
"max").summary($"features"))
+ *   val Row(min_, max_) = allStats.first()
+ * }}}
+ *
+ * If one wants to get a single metric, shortcuts are also available:
+ * {{{
+ *   val meanDF = dataframe.select(Summarizer.mean($"features"))
+ *   val Row(mean_) = meanDF.first()
+ * }}}
+ */
+@Since("2.2.0")
+object Summarizer extends Logging {
+
+  import SummaryBuilderImpl._
+
+  /**
+   * Given a list of metrics, provides a builder that it turns computes 
metrics from a column.
+   *
+   * See the documentation of [[Summarizer]] for an example.
+   *
+   * The following metrics are accepted (case sensitive):
+   *  - mean: a vector that contains the coefficient-wise mean.
+   *  - variance: a vector tha contains the coefficient-wise variance.
+   *  - count: the count of all vectors seen.
+   *  - numNonzeros: a vector with the number of non-zeros for each 
coefficients
+   *  - max: the maximum for each coefficient.
+   *  - min: the minimum for each coefficient.
+   *  - normL2: the Euclidian norm for each coefficient.
+   *  - normL1: the L1 norm of each coefficient (sum of the absolute 
values).
+   * @param firstMetric the metric being provided
+   * @param metrics additional metrics that can be provided.
+   * @return a builder.
+   * @throws IllegalArgumentException if one of the metric names is not 
understood.
+   */
+  @Since("2.2.0")
+  def metrics(firstMetric: String, metrics: String*): SummaryBuilder = {
+val (typedMetrics, computeMetrics) = 
getRelevantMetrics(Seq(firstMetric) ++ metrics)
+new SummaryBuilderImpl(typedMetrics, computeMetrics)
+  }
+
+  def mean(col: Column): Column = getSingleMetric(col, "mean")
+
+  def variance(col: Column): Column = getSingleMetric(col, "variance")

[GitHub] spark pull request #16019: [SPARK-18595] [SQL] Handling ignoreIfExists in Hi...

2017-03-29 Thread gatorsmile
Github user gatorsmile closed the pull request at:

https://github.com/apache/spark/pull/16019





[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17451
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75378/
Test FAILed.





[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...

2017-03-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17451
  
**[Test build #75378 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75378/testReport)**
 for PR 17451 at commit 
[`ecdcbf6`](https://github.com/apache/spark/commit/ecdcbf665a3f91e06cae4879cf041940b583e2ee).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17451
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...

2017-03-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17451
  
**[Test build #75378 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75378/testReport)**
 for PR 17451 at commit 
[`ecdcbf6`](https://github.com/apache/spark/commit/ecdcbf665a3f91e06cae4879cf041940b583e2ee).





[GitHub] spark pull request #17472: [SPARK-19999]: Fix for flakey tests due to java.n...

2017-03-29 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17472#discussion_r108837689
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java 
---
@@ -46,18 +46,22 @@
   private static final boolean unaligned;
   static {
 boolean _unaligned;
-// use reflection to access unaligned field
-try {
-  Class bitsClass =
-Class.forName("java.nio.Bits", false, 
ClassLoader.getSystemClassLoader());
-  Method unalignedMethod = bitsClass.getDeclaredMethod("unaligned");
-  unalignedMethod.setAccessible(true);
-  _unaligned = Boolean.TRUE.equals(unalignedMethod.invoke(null));
-} catch (Throwable t) {
-  // We at least know x86 and x64 support unaligned access.
-  String arch = System.getProperty("os.arch", "");
-  //noinspection DynamicRegexReplaceableByCompiledPattern
-  _unaligned = 
arch.matches("^(i[3-6]86|x86(_64)?|x64|amd64|aarch64)$");
+if (arch.matches("^(ppc64le | ppc64)$")) {
+  // Since java.nio.Bits.unaligned() doesn't return true on ppc (See 
JDK-8165231), but ppc64 and ppc64le support it
+  _unaligned = true;
+} else {
+  try {
+Class bitsClass =
+  Class.forName("java.nio.Bits", false, 
ClassLoader.getSystemClassLoader());
+Method unalignedMethod = bitsClass.getDeclaredMethod("unaligned");
+unalignedMethod.setAccessible(true);
+_unaligned = Boolean.TRUE.equals(unalignedMethod.invoke(null));
+  } catch (Throwable t) {
+// We at least know x86 and x64 support unaligned access.
+String arch = System.getProperty("os.arch", "");
--- End diff --

Should we define `arch` before the `if` statement, now?
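
A minimal sketch of the reordering being asked about, written in Python rather than Java so it can be run standalone. The helper `supports_unaligned` and its `probe` argument are hypothetical stand-ins for the static initializer and the reflective `java.nio.Bits.unaligned()` call. The ppc pattern is written without the spaces that appear in the quoted diff, which is an assumption about the intent.

```python
import re


def supports_unaligned(arch, probe):
    """Hypothetical helper mirroring the shape of the initializer in the diff.

    `arch` is resolved by the caller, so it is available to both branches.
    `probe` stands in for the reflective unaligned() lookup.
    """
    # ppc64 and ppc64le are treated as unaligned-capable without probing.
    if re.fullmatch(r"ppc64le|ppc64", arch):
        return True
    try:
        return probe() is True
    except Exception:
        # Fallback: architectures known to support unaligned access.
        return re.fullmatch(r"i[3-6]86|x86(_64)?|x64|amd64|aarch64", arch) is not None
```

With `arch` computed once up front, both the ppc check and the fallback in the exception handler read the same value.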





[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...

2017-03-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17436
  
**[Test build #75377 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75377/testReport)**
 for PR 17436 at commit 
[`9d14d33`](https://github.com/apache/spark/commit/9d14d3337ccf3e2255dfc79959823b3cf6bf3c0a).





[GitHub] spark pull request #17450: [SPARK-20121][SQL] simplify NullPropagation with ...

2017-03-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17450#discussion_r108837093
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -297,8 +297,8 @@ case class Lower(child: Expression) extends 
UnaryExpression with String2StringEx
 }
 
 /** A base trait for functions that compare two strings, returning a 
boolean. */
-trait StringPredicate extends Predicate with ImplicitCastInputTypes {
-  self: BinaryExpression =>
+abstract class StringPredicate extends BinaryExpression
+  with Predicate with ImplicitCastInputTypes {
--- End diff --

I finally got your point. `StringPredicate` is used for inferring the null 
constants in the rule `NullPropagation`. Thus, we should mark it as 
`NullIntolerant`.
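
To make the idea concrete, here is a toy model in Python (not Spark's Catalyst code) of how a marker type can drive null propagation. All class names below are illustrative stand-ins; `propagate_nulls` is a hypothetical, simplified rule that looks only at direct children.

```python
from dataclasses import dataclass


class NullIntolerant:
    """Marker: the expression evaluates to null if any input is null."""


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class StartsWith(NullIntolerant):
    # Stand-in for a string predicate marked as null-intolerant.
    left: object
    right: object


@dataclass(frozen=True)
class Coalesce:
    # Not null-intolerant: a null input does not force a null result.
    left: object
    right: object


def propagate_nulls(expr):
    """Fold a null-intolerant expression with a null child into a null literal."""
    if isinstance(expr, Literal):
        return expr
    children = (expr.left, expr.right)
    if isinstance(expr, NullIntolerant) and any(c == Literal(None) for c in children):
        return Literal(None)
    return expr
```

In this model the rule never needs to know about individual predicates; carrying the marker is enough for an expression to be folded.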





[GitHub] spark pull request #17475: [SPARK-20148] [SQL] Extend the file commit API to...

2017-03-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17475





[GitHub] spark issue #17475: [SPARK-20148] [SQL] Extend the file commit API to allow ...

2017-03-29 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17475
  
Merging in master.






[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-03-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r108835449
  
--- Diff: python/pyspark/sql/column.py ---
@@ -124,6 +124,35 @@ def _(self, other):
 return _
 
 
+like_doc = """ Return a Boolean :class:`Column` based on a SQL LIKE 
match.\n
+   :param other: a SQL LIKE pattern\n
+   See :func:`pyspark.sql.Column.rlike` for a regex version\n
+
+   >>> df.filter( df.name.like('Al%') ).collect()
+   [Row(name=u'Alice', age=1)]
+"""
+rlike_doc = """ Return a Boolean :class:`Column` based on a regex match.\n
+:param other: an extended regex expression\n
+
+>>> df.filter( df.name.rlike('ice$') ).collect()
+[Row(name=u'Alice', age=1)]
+"""
+endswith_doc = ''' Return a Boolean :class:`Column` based on matching end 
of string.\n
+   :param other: string at end of line (do not use a regex 
`$`)\n
+   >>> df.filter(df.name.endswith('ice')).collect()
+   [Row(name=u'Alice', age=1)]
+   >>> df.filter(df.name.endswith('ice$')).collect()
+   []
+   '''
+startswith_doc = ''' Return a Boolean :class:`Column` based on a string 
match.\n
--- End diff --

Mind adding `_` as a prefix in this variable to indicate this is a private 
one?
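To make the docstrings in the diff above concrete, here is a small plain-Python sketch of the difference between a literal suffix match and a regex match. It uses Python's `re` module as a stand-in for the regex engine behind `rlike`, which is an assumption; the helper names are hypothetical. The leading underscore on `_names` follows the private-naming convention requested in this comment.

```python
import re

# Leading underscore: private to this module by convention.
_names = ["Alice", "Bob"]


def ends_with(value, suffix):
    """Literal suffix match, analogous in spirit to Column.endswith."""
    return value.endswith(suffix)


def regex_match(value, pattern):
    """Regex search, analogous in spirit to Column.rlike."""
    return re.search(pattern, value) is not None


literal_hits = [n for n in _names if ends_with(n, "ice")]
literal_dollar_hits = [n for n in _names if ends_with(n, "ice$")]
regex_hits = [n for n in _names if regex_match(n, "ice$")]
```

The literal match treats `$` as an ordinary character, which is why the docstring warns against passing a regex anchor to `endswith`.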





[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-03-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r108835232
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2747,6 +2747,17 @@ class Dataset[T] private[sql](
 }
   }
 
+  /**
+   * Collect a Dataset as ArrowPayload byte arrays and serve to PySpark.
+   */
+  private[sql] def collectAsArrowToPython(): Int = {
+val payloadRdd = toArrowPayloadBytes()
+val payloadByteArrays = payloadRdd.collect()
--- End diff --

@BryanCutler Btw, I don't think it is for performance gain. 
`toLocalIteratorAndServe` can avoid collecting all data into the driver at once, 
so it may be good for memory usage on the driver side.
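The memory argument can be sketched in plain Python, without Spark. The function names and sample data below are hypothetical; the sketch only contrasts eager collection with lazy, partition-at-a-time iteration.

```python
def collect_all(partitions):
    """Eagerly gather every row from every partition into one list."""
    return [row for partition in partitions for row in partition]


def iterate_locally(partitions):
    """Lazily yield rows, touching only one partition at a time."""
    for partition in partitions:
        for row in partition:
            yield row


# Hypothetical data: three partitions of two rows each.
partitions = [[1, 2], [3, 4], [5, 6]]

eager_rows = collect_all(partitions)
lazy_rows = iterate_locally(partitions)
first_row = next(lazy_rows)
```

The lazy version only helps if the consumer processes rows incrementally; a consumer that materializes everything ends up holding the full data set anyway.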





[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-03-29 Thread wesm
Github user wesm commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r108834631
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2747,6 +2747,17 @@ class Dataset[T] private[sql](
 }
   }
 
+  /**
+   * Collect a Dataset as ArrowPayload byte arrays and serve to PySpark.
+   */
+  private[sql] def collectAsArrowToPython(): Int = {
+val payloadRdd = toArrowPayloadBytes()
+val payloadByteArrays = payloadRdd.collect()
--- End diff --

You can stream out payloads as they come into the driver (maybe this is 
already happening). We may be able to play with the StreamWriter to reduce the 
driver memory usage in a follow-up patch.





[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-03-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r108834618
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2828,4 +2839,16 @@ class Dataset[T] private[sql](
   Dataset(sparkSession, logicalPlan)
 }
   }
+
+  /** Convert to an RDD of ArrowPayload byte arrays */
+  private[sql] def toArrowPayloadBytes(): RDD[Array[Byte]] = {
+val schema_captured = this.schema
+queryExecution.toRdd.mapPartitionsInternal { iter =>
+  val converter = new ArrowConverters
+  val payload = converter.interalRowIterToPayload(iter, 
schema_captured)
+  val payloadBytes = ArrowConverters.payloadToByteArray(payload, 
schema_captured)
--- End diff --

Do you think we need a dedicated config for it? Or maybe a constant like 
1000 (rows)?





[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...

2017-03-29 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/14617
  
Hi @squito , would you please review the code again? Thanks a lot.





[GitHub] spark issue #17452: [SPARK-20123][build]$SPARK_HOME variable might have spac...

2017-03-29 Thread zuotingbing
Github user zuotingbing commented on the issue:

https://github.com/apache/spark/pull/17452
  
OK, will do. Thanks!





[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-03-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r108834139
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2747,6 +2747,17 @@ class Dataset[T] private[sql](
 }
   }
 
+  /**
+   * Collect a Dataset as ArrowPayload byte arrays and serve to PySpark.
+   */
+  private[sql] def collectAsArrowToPython(): Int = {
+val payloadRdd = toArrowPayloadBytes()
+val payloadByteArrays = payloadRdd.collect()
--- End diff --

OK. Since building the pandas DataFrame requires loading all the data into 
the driver's memory anyway, `toLocalIteratorAndServe` can't improve the memory 
usage in the end.





[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-03-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r108833855
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -56,6 +56,15 @@
 from pyspark.sql.utils import AnalysisException, ParseException, 
IllegalArgumentException
 
 
+_have_arrow = False
+try:
+import pyarrow
+_have_arrow = True
--- End diff --

Maybe give the param doc string as exception message?

I.e., `To make use of Apache Arrow for conversion, pyarrow must be 
installed and available on the calling Python process (Experimental)`.
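A minimal sketch of that suggestion, assuming a hypothetical helper `require_arrow` and a boolean `use_arrow` flag (neither is taken from the patch); the message text is the one quoted above.

```python
_have_arrow = False
try:
    import pyarrow  # noqa: F401
    _have_arrow = True
except ImportError:
    pass

_ARROW_MESSAGE = (
    "To make use of Apache Arrow for conversion, pyarrow must be "
    "installed and available on the calling Python process (Experimental)"
)


def require_arrow(use_arrow, have_arrow=_have_arrow):
    """Raise ImportError if Arrow conversion is requested but unavailable.

    have_arrow is a parameter only so the check can be exercised
    without depending on the local environment.
    """
    if use_arrow and not have_arrow:
        raise ImportError(_ARROW_MESSAGE)
    return use_arrow
```

Failing early with a descriptive message is friendlier than letting a later `NameError` or attribute error surface from deep inside the conversion path.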





[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-03-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r108833416
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -56,6 +56,15 @@
 from pyspark.sql.utils import AnalysisException, ParseException, 
IllegalArgumentException
 
 
+_have_arrow = False
+try:
+import pyarrow
+_have_arrow = True
--- End diff --

I mean we should throw an exception when `useArrow` is used but pyarrow 
is not installed.





[GitHub] spark issue #17424: [SPARK-20089] [SQL] [TEST] Added DESC FUNCTION and DESC ...

2017-03-29 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17424
  
I am not sure that dumping the output of `DESC EXTENDED FUNCTION` into the 
test helps review. As far as I can see, we do not change the output frequently. 
IMHO, it is hard to tell what the "correct" output for a function is, except for 
obvious errors such as wrong parameters or results. It is also hard to check the 
consistency of the output in a 3000-line text file.





[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-03-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17415#discussion_r108830896
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala
 ---
@@ -450,6 +467,69 @@ class FilterEstimationSuite extends 
StatsEstimationTestBase {
 }
   }
 
+  test("cint = cint2") {
+validateEstimatedStats(
+  Filter(EqualTo(attrInt, attrInt2), childStatsTestPlan(Seq(attrInt, 
attrInt2), 10L)),
+  Seq(attrInt -> ColumnStat(distinctCount = 3, min = Some(7), max = 
Some(10),
+nullCount = 0, avgLen = 4, maxLen = 4),
+attrInt2 -> ColumnStat(distinctCount = 3, min = Some(7), max = 
Some(10),
+  nullCount = 0, avgLen = 4, maxLen = 4)),
+  expectedRowCount = 4)
+  }
+
+  test("cint > cint2") {
+validateEstimatedStats(
+  Filter(GreaterThan(attrInt, attrInt2), 
childStatsTestPlan(Seq(attrInt, attrInt2), 10L)),
+  Seq(attrInt -> ColumnStat(distinctCount = 3, min = Some(7), max = 
Some(10),
+nullCount = 0, avgLen = 4, maxLen = 4),
+attrInt2 -> ColumnStat(distinctCount = 3, min = Some(7), max = 
Some(10),
+  nullCount = 0, avgLen = 4, maxLen = 4)),
+  expectedRowCount = 4)
+  }
+
+  test("cint < cint2") {
+validateEstimatedStats(
+  Filter(LessThan(attrInt, attrInt2), childStatsTestPlan(Seq(attrInt, 
attrInt2), 10L)),
+  Seq(attrInt -> ColumnStat(distinctCount = 3, min = Some(1), max = 
Some(10),
+nullCount = 0, avgLen = 4, maxLen = 4),
+attrInt2 -> ColumnStat(distinctCount = 3, min = Some(7), max = 
Some(16),
+  nullCount = 0, avgLen = 4, maxLen = 4)),
+  expectedRowCount = 4)
+  }
+
+  test("cint = cint3") {
+// no records qualify due to no overlap
+validateEstimatedStats(
+  Filter(EqualTo(attrInt, attrInt3), childStatsTestPlan(Seq(attrInt, 
attrInt3), 10L)),
+  Seq(attrInt -> ColumnStat(distinctCount = 0, min = Some(1), max = 
Some(10),
--- End diff --

When there is no overlap, is it still meaningful to keep `min` and `max`?
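For context, the overlap test behind this question can be sketched in plain Python. The function name and the sample ranges are hypothetical and are not taken from the estimation code under review.

```python
def intersect_ranges(min_a, max_a, min_b, max_b):
    """Return the overlapping (min, max) of two closed ranges, or None."""
    low = max(min_a, min_b)
    high = min(max_a, max_b)
    if low > high:
        # The ranges are disjoint, so no value can satisfy a = b.
        return None
    return (low, high)


# Hypothetical column ranges.
overlapping = intersect_ranges(1, 10, 7, 16)
disjoint = intersect_ranges(1, 10, 30, 39)
```

In the disjoint case there is no surviving value range, so whatever `min` and `max` are reported describe zero rows.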





[GitHub] spark issue #16476: [SPARK-19084][SQL] Implement expression field

2017-03-29 Thread gczsjdy
Github user gczsjdy commented on the issue:

https://github.com/apache/spark/pull/16476
  
@cloud-fan Do you have any comments on this version?





[GitHub] spark issue #17475: [SPARK-20148] [SQL] Extend the file commit API to allow ...

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17475
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17475: [SPARK-20148] [SQL] Extend the file commit API to allow ...

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17475
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75374/
Test PASSed.





[GitHub] spark issue #17475: [SPARK-20148] [SQL] Extend the file commit API to allow ...

2017-03-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17475
  
**[Test build #75374 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75374/testReport)**
 for PR 17475 at commit 
[`a541fdd`](https://github.com/apache/spark/commit/a541fdd34d71656c6932eadb3edad9b782a1ae22).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15334: [SPARK-10367][SQL][WIP] Support Parquet logical type INT...

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15334
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15334: [SPARK-10367][SQL][WIP] Support Parquet logical type INT...

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15334
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75376/
Test PASSed.





[GitHub] spark issue #15334: [SPARK-10367][SQL][WIP] Support Parquet logical type INT...

2017-03-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15334
  
**[Test build #75376 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75376/testReport)**
 for PR 15334 at commit 
[`35ec9f1`](https://github.com/apache/spark/commit/35ec9f18aea900caace5e6dc5e053ce10a3e5b5c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17465: [SPARK-20136][SQL] Add num files and metadata ope...

2017-03-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17465





[GitHub] spark issue #17465: [SPARK-20136][SQL] Add num files and metadata operation ...

2017-03-29 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17465
  
Let me merge this now. I will send a follow-up PR to take the logical 
planning time into account (otherwise in the vast majority of cases, i.e. 
pruned partitions, the metadata operation time will be approximately 0).






[GitHub] spark issue #17465: [SPARK-20136][SQL] Add num files and metadata operation ...

2017-03-29 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17465
  
Merging in master.






[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17415
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75375/
Test PASSed.





[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17415
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...

2017-03-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17415
  
**[Test build #75375 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75375/testReport)**
 for PR 17415 at commit 
[`9b98ff1`](https://github.com/apache/spark/commit/9b98ff1f7c8521e7d1277fd1f0c6e9a809a0d337).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17470: [SPARK-20146][SQL] fix comment missing issue for ...

2017-03-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17470





[GitHub] spark issue #17470: [SPARK-20146][SQL] fix comment missing issue for thrift ...

2017-03-29 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17470
  
Merging in master. Thanks.






[GitHub] spark issue #17475: [SPARK-20148] [SQL] Extend the file commit API to allow ...

2017-03-29 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17475
  
LGTM





[GitHub] spark issue #17375: [SPARK-19019][PYTHON][BRANCH-1.6] Fix hijacked `collecti...

2017-03-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17375
  
Yeah, it might be less important, but I guess it is still a valid backport.





[GitHub] spark issue #17375: [SPARK-19019][PYTHON][BRANCH-1.6] Fix hijacked `collecti...

2017-03-29 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17375
  
Tentatively looks good. My only question: if someone wants to use Python 
3.6 (first released December 2016), are they likely to want to use it with Spark 
1.6 (first released January 2016)?





[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-03-29 Thread zjffdu
GitHub user zjffdu reopened a pull request:

https://github.com/apache/spark/pull/17222

[SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFunction Should Support 
UDAFs

## What changes were proposed in this pull request?

Support registering Java UDAFs in PySpark so that users can use Java UDAFs 
in PySpark. In addition, I also added an API in `UDFRegistration`.

## How was this patch tested?

A unit test is added.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zjffdu/spark SPARK-19439

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17222.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17222


commit 8c1e837e2e97c08c4a5753c79aea71da772b0eaa
Author: Jeff Zhang 
Date:   2017-03-09T07:06:50Z

[SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFunction Should Support 
UDAFs

commit 89b8d6588d4d6258f9c4d84339775544d93e6e3c
Author: Jeff Zhang 
Date:   2017-03-10T00:28:12Z

add scala doc







[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-03-29 Thread zjffdu
Github user zjffdu closed the pull request at:

https://github.com/apache/spark/pull/17222





[GitHub] spark pull request #17473: [SPARK-19088][SQL] Fix 2.10 build.

2017-03-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17473


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17473: [SPARK-19088][SQL] Fix 2.10 build.

2017-03-29 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/17473
  
Merging to master!





[GitHub] spark issue #17473: [SPARK-19088][SQL] Fix 2.10 build.

2017-03-29 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17473
  
LGTM, thanks!



[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17394
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75370/
Test PASSed.



[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17394
  
Merged build finished. Test PASSed.



[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...

2017-03-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17394
  
**[Test build #75370 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75370/testReport)**
 for PR 17394 at commit 
[`36b501e`](https://github.com/apache/spark/commit/36b501ebb18dca3195e44be92accd3fada479152).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #17473: [SPARK-19088][SQL] Fix 2.10 build.

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17473
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75371/
Test PASSed.



[GitHub] spark issue #17473: [SPARK-19088][SQL] Fix 2.10 build.

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17473
  
Merged build finished. Test PASSed.



[GitHub] spark issue #17473: [SPARK-19088][SQL] Fix 2.10 build.

2017-03-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17473
  
**[Test build #75371 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75371/testReport)**
 for PR 17473 at commit 
[`36d12fd`](https://github.com/apache/spark/commit/36d12fd26f0919f06f887e8cb0b1f4b19a16f989).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15334: [SPARK-10367][SQL][WIP] Support Parquet logical type INT...

2017-03-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15334
  
**[Test build #75376 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75376/testReport)**
 for PR 15334 at commit 
[`35ec9f1`](https://github.com/apache/spark/commit/35ec9f18aea900caace5e6dc5e053ce10a3e5b5c).



[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17415
  
Merged build finished. Test PASSed.



[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17415
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75369/
Test PASSed.



[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...

2017-03-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17415
  
**[Test build #75369 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75369/testReport)**
 for PR 17415 at commit 
[`70ac70c`](https://github.com/apache/spark/commit/70ac70cf0ab403e136d4114869174db171673364).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...

2017-03-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17415
  
**[Test build #75375 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75375/testReport)**
 for PR 17415 at commit 
[`9b98ff1`](https://github.com/apache/spark/commit/9b98ff1f7c8521e7d1277fd1f0c6e9a809a0d337).



[GitHub] spark issue #17475: [SPARK-20148] [SQL] Extend the file commit API to allow ...

2017-03-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17475
  
**[Test build #75374 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75374/testReport)**
 for PR 17475 at commit 
[`a541fdd`](https://github.com/apache/spark/commit/a541fdd34d71656c6932eadb3edad9b782a1ae22).



[GitHub] spark issue #17417: [DOCS] Docs-only improvements

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17417
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75368/
Test PASSed.



[GitHub] spark issue #17417: [DOCS] Docs-only improvements

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17417
  
Build finished. Test PASSed.



[GitHub] spark pull request #17475: [SPARK-20148] [SQL] Extend the file commit API to...

2017-03-29 Thread ericl
GitHub user ericl opened a pull request:

https://github.com/apache/spark/pull/17475

[SPARK-20148] [SQL] Extend the file commit API to allow subscribing to task 
commit messages

## What changes were proposed in this pull request?

The internal FileCommitProtocol interface returns all task commit messages 
in bulk to the implementation when a job finishes. However, it is sometimes 
useful to access those messages before the job completes, so that the driver 
gets incremental progress updates before the job finishes.

This adds an `onTaskCommit` listener to the internal API.
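
As a rough illustration of the idea described above, the following is a minimal sketch and not the PR's actual code. `SketchCommitProtocol`, `ProgressTrackingProtocol`, and the simplified `TaskCommitMessage` are hypothetical stand-ins for the internal interface; the only names taken from the description are `onTaskCommit` and the notion of per-task commit messages.

```scala
// Hypothetical, simplified stand-ins; these are NOT Spark's real classes.
final case class TaskCommitMessage(payload: Any)

abstract class SketchCommitProtocol {
  // Called once on the driver, with every task's message, when the job finishes.
  def commitJob(taskCommits: Seq[TaskCommitMessage]): Unit

  // Called on the driver each time a single task commits. The default is a
  // no-op, so implementations that do not care about progress are unaffected.
  def onTaskCommit(taskCommit: TaskCommitMessage): Unit = {}
}

// An implementation that subscribes to the per-task messages to track progress.
class ProgressTrackingProtocol extends SketchCommitProtocol {
  private val seen = scala.collection.mutable.ArrayBuffer.empty[TaskCommitMessage]
  var jobCommitted = false

  override def onTaskCommit(taskCommit: TaskCommitMessage): Unit = {
    seen += taskCommit
  }

  override def commitJob(taskCommits: Seq[TaskCommitMessage]): Unit = {
    jobCommitted = true
  }

  // Number of task commit messages observed before (or after) the job commit.
  def tasksCommittedSoFar: Int = seen.size
}
```

With a default no-op listener, existing implementations would keep compiling, while an implementation such as the one above could report incremental progress before `commitJob` runs.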

## How was this patch tested?

Unit tests.

cc @rxin

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ericl/spark file-commit-api-ext

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17475.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17475


commit a541fdd34d71656c6932eadb3edad9b782a1ae22
Author: Eric Liang 
Date:   2017-03-29T23:16:40Z

initial commit





[GitHub] spark issue #17417: [DOCS] Docs-only improvements

2017-03-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17417
  
**[Test build #75368 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75368/testReport)**
 for PR 17417 at commit 
[`913dbb8`](https://github.com/apache/spark/commit/913dbb81c6680e6063875a3fd7ddd0214bf7a7c4).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.



[GitHub] spark issue #17416: [SPARK-20075][CORE][WIP] Support classifier, packaging i...

2017-03-29 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/17416
  
So is the problem that it downloads `stanford-corenlp-3.4.1-models.jar` but 
thinks it is `stanford-corenlp-3.4.1.jar`?

It looks like it might be possible to add the classifier to 
`ModuleRevisionId.newInstance`, have you tried just doing that instead of 
`dd.addDependencyArtifact`?

If I have some time, I'll try running it to see what's going on.




[GitHub] spark issue #17445: [SPARK-20115] [CORE] Fix DAGScheduler to recompute all t...

2017-03-29 Thread umehrot2
Github user umehrot2 commented on the issue:

https://github.com/apache/spark/pull/17445
  
@kayousterhout Thanks for your response, and for that link. Well it does 
seem like #17088 addresses the same issue as this PR.

However, I would like you all to review this PR as well, because I 
think it more clearly separates the handling of internal and external 
shuffle failures. It also removes much of the code duplication that 
is part of the other PR. Further, it adds an epoch check for the 'host'.



[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-03-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17415#discussion_r108810108
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
@@ -550,8 +565,143 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo
     Some(percent.toDouble)
   }
 
+  /**
+   * Returns a percentage of rows meeting a binary comparison expression containing two columns.
+   * In SQL queries, we also see predicate expressions involving two columns
+   * such as "column-1 (op) column-2" where column-1 and column-2 belong to same table.
+   * Note that, if column-1 and column-2 belong to different tables, then it is a join
+   * operator's work, NOT a filter operator's work.
+   *
+   * @param op a binary comparison operator such as =, <, <=, >, >=
+   * @param attrLeft the left Attribute (or a column)
+   * @param attrRight the right Attribute (or a column)
+   * @param update a boolean flag to specify if we need to update ColumnStat of the given columns
+   *               for subsequent conditions
+   * @return an optional double value to show the percentage of rows meeting a given condition
+   */
+  def evaluateBinaryForTwoColumns(
+      op: BinaryComparison,
+      attrLeft: Attribute,
+      attrRight: Attribute,
+      update: Boolean): Option[Double] = {
+
+    if (!colStatsMap.contains(attrLeft)) {
+      logDebug("[CBO] No statistics for " + attrLeft)
+      return None
+    }
+    if (!colStatsMap.contains(attrRight)) {
+      logDebug("[CBO] No statistics for " + attrRight)
+      return None
+    }
+
+    attrLeft.dataType match {
+      case StringType | BinaryType =>
+        // TODO: It is difficult to support other binary comparisons for String/Binary
+        // type without min/max and advanced statistics like histogram.
+        logDebug("[CBO] No range comparison statistics for String/Binary type " + attrLeft)
+        return None
+      case _ =>
+    }
+
+    val colStatLeft = colStatsMap(attrLeft)
+    val statsRangeLeft = Range(colStatLeft.min, colStatLeft.max, attrLeft.dataType)
+      .asInstanceOf[NumericRange]
+    val maxLeft = BigDecimal(statsRangeLeft.max)
+    val minLeft = BigDecimal(statsRangeLeft.min)
+    val ndvLeft = BigDecimal(colStatLeft.distinctCount)
+
+    val colStatRight = colStatsMap(attrRight)
+    val statsRangeRight = Range(colStatRight.min, colStatRight.max, attrRight.dataType)
+      .asInstanceOf[NumericRange]
+    val maxRight = BigDecimal(statsRangeRight.max)
+    val minRight = BigDecimal(statsRangeRight.min)
+    val ndvRight = BigDecimal(colStatRight.distinctCount)
+
+    // determine the overlapping degree between predicate range and column's range
+    val (noOverlap: Boolean, completeOverlap: Boolean) = op match {
+      case _: LessThan =>
+        (minLeft >= maxRight, maxLeft < minRight)
+      case _: LessThanOrEqual =>
+        (minLeft > maxRight, maxLeft <= minRight)
+      case _: GreaterThan =>
+        (maxLeft <= minRight, minLeft > maxRight)
+      case _: GreaterThanOrEqual =>
+        (maxLeft < minRight, minLeft >= maxRight)
+      case _: EqualTo =>
+        ((maxLeft < minRight) || (maxRight < minLeft),
+          (minLeft == minRight) && (maxLeft == maxRight))
+      case _: EqualNullSafe =>
+        // For null-safe equality, we use a very restrictive condition to evaluate its overlap.
+        // If null values exists, we set it to partial overlap.
+        (((maxLeft < minRight) || (maxRight < minLeft))
+            && colStatLeft.nullCount == 0 && colStatRight.nullCount == 0,
+          ((minLeft == minRight) && (maxLeft == maxRight))
+            && colStatLeft.nullCount == 0 && colStatRight.nullCount == 0
+        )
+    }
+
+    var percent = BigDecimal(1.0)
+    if (noOverlap) {
+      percent = 0.0
+    } else if (completeOverlap) {
+      percent = 1.0
+    } else {
+      // For partial overlap, we use an empirical value 1/3 as suggested by the book
+      // "Database Systems, the complete book".
+      percent = 1.0/3.0
+
+      if (update) {
+        // Need to adjust new min/max after the filter condition is applied
+
+        val ndvLeft = BigDecimal(colStatLeft.distinctCount)
+        var newNdvLeft = (ndvLeft * percent).setScale(0, RoundingMode.HALF_UP).toBigInt()
+        if (newNdvLeft < 1) newNdvLeft = 1
+        val ndvRight = BigDecimal(colStatRight.distinctCount)
+        var newNdvRight = (ndvRight * percent).setScale(0, RoundingMode.HALF_UP).toBigInt()
+        if 
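
The range-overlap test in the quoted diff can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the PR's code: `OverlapSketch`, `lessThan`, and `selectivity` are hypothetical names, only the `<` operator is covered, and the 1/3 fallback is the empirical value mentioned in the diff's own comment.

```scala
object OverlapSketch {
  sealed trait Overlap
  case object NoOverlap extends Overlap
  case object CompleteOverlap extends Overlap
  case object PartialOverlap extends Overlap

  // Classifies the predicate `left < right` from each column's [min, max] range.
  def lessThan(
      minLeft: BigDecimal,
      maxLeft: BigDecimal,
      minRight: BigDecimal,
      maxRight: BigDecimal): Overlap = {
    if (minLeft >= maxRight) NoOverlap            // no left value is below any right value
    else if (maxLeft < minRight) CompleteOverlap  // every left value is below every right value
    else PartialOverlap
  }

  // Maps the classification to an estimated fraction of rows that pass the filter.
  def selectivity(overlap: Overlap): Double = overlap match {
    case NoOverlap => 0.0
    case CompleteOverlap => 1.0
    case PartialOverlap => 1.0 / 3.0 // empirical fallback when the ranges intersect
  }
}
```

For example, a left column ranging over [0, 5] and a right column over [10, 20] would be classified as a complete overlap for `<`, so all rows are estimated to pass, whereas [0, 10] against [5, 15] falls back to the 1/3 estimate.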

[GitHub] spark pull request #17450: [SPARK-20121][SQL] simplify NullPropagation with ...

2017-03-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17450#discussion_r108809796
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -297,8 +297,8 @@ case class Lower(child: Expression) extends UnaryExpression with String2StringEx
 }
 
 /** A base trait for functions that compare two strings, returning a boolean. */
-trait StringPredicate extends Predicate with ImplicitCastInputTypes {
-  self: BinaryExpression =>
+abstract class StringPredicate extends BinaryExpression
+  with Predicate with ImplicitCastInputTypes {
--- End diff --

See `StringRegexExpression` above. Similarly, in order to simplify 
`NullPropagation`, we need to add `NullIntolerant` so that it can propagate 
null values...



[GitHub] spark issue #17474: [Minor][SparkR]: Add run command comment in examples

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17474
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75373/
Test PASSed.



[GitHub] spark issue #17474: [Minor][SparkR]: Add run command comment in examples

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17474
  
Merged build finished. Test PASSed.



[GitHub] spark issue #17474: [Minor][SparkR]: Add run command comment in examples

2017-03-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17474
  
**[Test build #75373 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75373/testReport)**
 for PR 17474 at commit 
[`5460e78`](https://github.com/apache/spark/commit/5460e78b3907a0ce6f8983bd4dcc83d02acc2b2d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15326: [SPARK-17759] [CORE] Avoid adding duplicate schedulables

2017-03-29 Thread erenavsarogullari
Github user erenavsarogullari commented on the issue:

https://github.com/apache/spark/pull/15326
  
Jenkins test this please



[GitHub] spark issue #17474: [Minor][SparkR]: Add run command comment in examples

2017-03-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17474
  
**[Test build #75373 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75373/testReport)**
 for PR 17474 at commit 
[`5460e78`](https://github.com/apache/spark/commit/5460e78b3907a0ce6f8983bd4dcc83d02acc2b2d).



[GitHub] spark pull request #17474: [Minor][SparkR]: Add run command comment in examp...

2017-03-29 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request:

https://github.com/apache/spark/pull/17474

[Minor][SparkR]: Add run command comment in examples

## What changes were proposed in this pull request?

Two examples in the R folder are missing the run commands.

In this PR, I just add the missing comments, which is consistent with the 
other examples.

## How was this patch tested?

Manual test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangmiao1981/spark stat

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17474.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17474


commit e095333508a28cf024925610fb127e1f05b3eec2
Author: wm...@hotmail.com 
Date:   2017-03-29T21:40:59Z

simple fix

commit 5460e78b3907a0ce6f8983bd4dcc83d02acc2b2d
Author: wm...@hotmail.com 
Date:   2017-03-29T22:33:12Z

revert ignore





[GitHub] spark issue #17472: [SPARK-19999]: Fix for flakey tests due to java.nio.Bits...

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17472
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75372/
Test FAILed.



[GitHub] spark issue #17472: [SPARK-19999]: Fix for flakey tests due to java.nio.Bits...

2017-03-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17472
  
**[Test build #75372 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75372/testReport)**
 for PR 17472 at commit 
[`bf7cc24`](https://github.com/apache/spark/commit/bf7cc24f213a2cf043a579846859647da850f1f8).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #17472: [SPARK-19999]: Fix for flakey tests due to java.nio.Bits...

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17472
  
Merged build finished. Test FAILed.



[GitHub] spark issue #17472: [SPARK-19999]: Fix for flakey tests due to java.nio.Bits...

2017-03-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17472
  
**[Test build #75372 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75372/testReport)**
 for PR 17472 at commit 
[`bf7cc24`](https://github.com/apache/spark/commit/bf7cc24f213a2cf043a579846859647da850f1f8).



[GitHub] spark issue #17472: [SPARK-19999]: Fix for flakey tests due to java.nio.Bits...

2017-03-29 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17472
  
Clear code comments can help code reading. : )





[GitHub] spark issue #17472: [SPARK-19999]: Fix for flakey tests due to java.nio.Bits...

2017-03-29 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17472
  
ok to test





[GitHub] spark pull request #17472: [SPARK-19999]: Fix for flakey tests due to java.n...

2017-03-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17472#discussion_r108802741
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java ---
@@ -46,18 +46,22 @@
   private static final boolean unaligned;
   static {
 boolean _unaligned;
-// use reflection to access unaligned field
-try {
-  Class bitsClass =
-Class.forName("java.nio.Bits", false, ClassLoader.getSystemClassLoader());
-  Method unalignedMethod = bitsClass.getDeclaredMethod("unaligned");
-  unalignedMethod.setAccessible(true);
-  _unaligned = Boolean.TRUE.equals(unalignedMethod.invoke(null));
-} catch (Throwable t) {
-  // We at least know x86 and x64 support unaligned access.
-  String arch = System.getProperty("os.arch", "");
-  //noinspection DynamicRegexReplaceableByCompiledPattern
-  _unaligned = arch.matches("^(i[3-6]86|x86(_64)?|x64|amd64|aarch64)$");
+if (arch.matches("^(ppc64le | ppc64)$")) {
+  // Since java.nio.Bits.unaligned() doesn't return true on ppc (See JDK-8165231), but ppc64 and ppc64le support it
--- End diff --

It is longer than 101 characters. It will fail the style test.

You can check it in your local environment using the command:
> dev/lint-scala
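
As a side note on the quoted hunk, here is a minimal Java sketch of an `os.arch` check that treats ppc64 and ppc64le as supporting unaligned access. It is not the code in `Platform.java`; the class name `UnalignedArchSketch` and the method `supportsUnaligned` are hypothetical, and the architecture list is simply the one from the diff plus the two ppc names. It also assumes the intent is to match the bare strings `ppc64` and `ppc64le`: in a Java regex a space is a literal character, so a pattern written as `^(ppc64le | ppc64)$` would not match `ppc64le`.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of Spark. */
final class UnalignedArchSketch {
  // No whitespace around '|': a space inside the group would have to match literally.
  private static final Pattern KNOWN_UNALIGNED =
      Pattern.compile("^(i[3-6]86|x86(_64)?|x64|amd64|aarch64|ppc64le|ppc64)$");

  /** Returns true if the given os.arch value is in the known-unaligned list. */
  static boolean supportsUnaligned(String arch) {
    return arch != null && KNOWN_UNALIGNED.matcher(arch).matches();
  }
}
```

A caller would typically pass `System.getProperty("os.arch", "")`. Precompiling the pattern is a design choice that avoids recompiling the regex on each call.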






[GitHub] spark pull request #17449: [SPARK-20120][SQL] spark-sql support silent mode

2017-03-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17449





[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...

2017-03-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15009#discussion_r108802255
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -735,7 +749,12 @@ object SparkSubmit extends CommandLineUtils {
 }
 
 try {
-  mainMethod.invoke(null, childArgs.toArray)
+  if (isSparkApp) {
+val envvars = Map[String, String]() ++ sys.env
+mainMethod.invoke(null, childArgs.toArray, childSparkConf, envvars.toMap)
--- End diff --

In that case it might be worth it to add a check in `SparkLauncher` to 
throw an exception in case env variables are set, and the app is started in a 
thread.
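
A minimal sketch of the kind of guard described above, assuming the launcher knows whether the application will run in a thread and holds the environment variables the caller asked for. The class `InProcessLaunchGuard`, the method `check`, and both parameters are hypothetical names for illustration; none of them come from `SparkLauncher`.

```java
import java.util.Map;

/** Hypothetical guard for illustration only; not part of SparkLauncher. */
final class InProcessLaunchGuard {
  /**
   * Rejects a launch that sets child environment variables while running in-process,
   * since a thread shares the parent JVM's environment and cannot receive its own.
   */
  static void check(Map<String, String> childEnv, boolean inProcess) {
    if (inProcess && childEnv != null && !childEnv.isEmpty()) {
      throw new IllegalStateException(
          "Environment variables cannot be set when the application is started in a thread: "
              + childEnv.keySet());
    }
  }
}
```

Failing fast here would surface the misconfiguration at launch time rather than letting the variables be silently ignored.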





[GitHub] spark issue #17449: [SPARK-20120][SQL] spark-sql support silent mode

2017-03-29 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17449
  
Thanks! Merging to master.





[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...

2017-03-29 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/15009#discussion_r108801550
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -735,7 +749,12 @@ object SparkSubmit extends CommandLineUtils {
 }
 
 try {
-  mainMethod.invoke(null, childArgs.toArray)
+  if (isSparkApp) {
+val envvars = Map[String, String]() ++ sys.env
+mainMethod.invoke(null, childArgs.toArray, childSparkConf, envvars.toMap)
--- End diff --

Let's just remove it. @kishorvpatil 





[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17415
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75367/
Test PASSed.





[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...

2017-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17415
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17400: [SPARK-19981][SQL] Update output partitioning info. in P...

2017-03-29 Thread allengeorge
Github user allengeorge commented on the issue:

https://github.com/apache/spark/pull/17400
  
I suggest the following code for `outputOrdering`:

```
override def outputOrdering: Seq[SortOrder] = child.outputOrdering.map {
  case s @ SortOrder(e, _) =>
    s.copy(child = maybeReplaceExpr(e))
  case s =>
    s
}
```





[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...

2017-03-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17415
  
**[Test build #75367 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75367/testReport)** for PR 17415 at commit [`64bf43e`](https://github.com/apache/spark/commit/64bf43e562a3c257b847502eae651a8887eaddcf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17472: [SPARK-19999]: Fix for flakey tests due to java.nio.Bits...

2017-03-29 Thread samelamin
Github user samelamin commented on the issue:

https://github.com/apache/spark/pull/17472
  
@gatorsmile I moved the comment per your suggestion, but to be honest, if the 
comment is unclear, surely the first thing someone will do is check that JIRA? 





[GitHub] spark issue #16541: [SPARK-19088][SQL] Optimize sequence type deserializatio...

2017-03-29 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/16541
  
I sent a PR: #17473.




