[GitHub] spark issue #17467: [SPARK-20140][DStream] Remove hardcoded kinesis retry wa...
Github user yssharma commented on the issue: https://github.com/apache/spark/pull/17467 @HyukjinKwon What should be the next steps for this PR? Are there any Spark-Kinesis experts who can review the patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17324 The test was interrupted and needs a retest.
[GitHub] spark pull request #17450: [SPARK-20121][SQL] simplify NullPropagation with ...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17450#discussion_r108843221

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
    @@ -297,8 +297,8 @@ case class Lower(child: Expression) extends UnaryExpression with String2StringEx
     }

     /** A base trait for functions that compare two strings, returning a boolean. */
    -trait StringPredicate extends Predicate with ImplicitCastInputTypes {
    -  self: BinaryExpression =>
    +abstract class StringPredicate extends BinaryExpression
    +  with Predicate with ImplicitCastInputTypes {
    --- End diff --

    Yeah. :-)
[GitHub] spark pull request #17419: [SPARK-19634][ML] Multivariate summarizer - dataf...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17419#discussion_r108840757 --- Diff: mllib/src/test/scala/org/apache/spark/ml/stat/SummarizerSuite.scala --- @@ -0,0 +1,399 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.stat + +import org.scalatest.exceptions.TestFailedException + +import org.apache.spark.{SparkException, SparkFunSuite} +import org.apache.spark.ml.linalg.{Vector, Vectors} +import org.apache.spark.ml.stat.SummaryBuilderImpl.Buffer +import org.apache.spark.ml.util.TestingUtils._ +import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors} +import org.apache.spark.mllib.stat.{MultivariateOnlineSummarizer, Statistics} +import org.apache.spark.mllib.util.MLlibTestSparkContext +import org.apache.spark.sql.{DataFrame, Row} +import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema + +class SummarizerSuite extends SparkFunSuite with MLlibTestSparkContext { + + import testImplicits._ + import Summarizer._ + + private case class ExpectedMetrics( + mean: Seq[Double], + variance: Seq[Double], + count: Long, + numNonZeros: Seq[Long], + max: Seq[Double], + min: Seq[Double], + normL2: Seq[Double], + normL1: Seq[Double]) + + // The input is expected to be either a sparse vector, a dense vector or an array of doubles + // (which will be converted to a dense vector) + // The expected is the list of all the known metrics. + // + // The tests take an list of input vectors and a list of all the summary values that + // are expected for this input. They currently test against some fixed subset of the + // metrics, but should be made fuzzy in the future. + + private def testExample(name: String, input: Seq[Any], exp: ExpectedMetrics): Unit = { +def inputVec: Seq[Vector] = input.map { + case x: Array[Double @unchecked] => Vectors.dense(x) + case x: Seq[Double @unchecked] => Vectors.dense(x.toArray) + case x: Vector => x + case x => throw new Exception(x.toString) +} + +val s = { + val s2 = new MultivariateOnlineSummarizer + inputVec.foreach(v => s2.add(OldVectors.fromML(v))) + s2 +} + +// Because the Spark context is reset between tests, we cannot hold a reference onto it. 
+def wrapped() = { + val df = sc.parallelize(inputVec).map(Tuple1.apply).toDF("features") + val c = df.col("features") + (df, c) +} + +registerTest(s"$name - mean only") { + val (df, c) = wrapped() + compare(df.select(metrics("mean").summary(c), mean(c)), Seq(Row(exp.mean), s.mean)) +} + +registerTest(s"$name - mean only (direct)") { + val (df, c) = wrapped() + compare(df.select(mean(c)), Seq(exp.mean)) +} + +registerTest(s"$name - variance only") { + val (df, c) = wrapped() + compare(df.select(metrics("variance").summary(c), variance(c)), +Seq(Row(exp.variance), s.variance)) +} + +registerTest(s"$name - variance only (direct)") { + val (df, c) = wrapped() + compare(df.select(variance(c)), Seq(s.variance)) +} + +registerTest(s"$name - count only") { + val (df, c) = wrapped() + compare(df.select(metrics("count").summary(c), count(c)), +Seq(Row(exp.count), exp.count)) +} + +registerTest(s"$name - count only (direct)") { + val (df, c) = wrapped() + compare(df.select(count(c)), +Seq(exp.count)) +} + +registerTest(s"$name - numNonZeros only") { + val (df, c) = wrapped() + compare(df.select(metrics("numNonZeros").summary(c), numNonZeros(c)), +Seq(Row(exp.numNonZeros), exp.numNonZeros)) +} + +registerTest(s"$name -
[GitHub] spark pull request #17419: [SPARK-19634][ML] Multivariate summarizer - dataf...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17419#discussion_r108840709 --- Diff: mllib/src/test/scala/org/apache/spark/ml/stat/SummarizerSuite.scala --- @@ -0,0 +1,338 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.stat + +import org.scalatest.exceptions.TestFailedException + +import org.apache.spark.{SparkException, SparkFunSuite} +import org.apache.spark.ml.linalg.{Vector, Vectors} +import org.apache.spark.ml.stat.SummaryBuilderImpl.Buffer +import org.apache.spark.ml.util.TestingUtils._ +import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors} +import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer +import org.apache.spark.mllib.util.MLlibTestSparkContext +import org.apache.spark.sql.{DataFrame, Row} +import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema + +class SummarizerSuite extends SparkFunSuite with MLlibTestSparkContext { + + import testImplicits._ + import Summarizer._ + + private case class ExpectedMetrics( + mean: Seq[Double], + variance: Seq[Double], + count: Long, + numNonZeros: Seq[Long], + max: Seq[Double], + min: Seq[Double], + normL2: Seq[Double], + normL1: Seq[Double]) + + // The input is expected to be either a sparse vector, a dense vector or an array of doubles + // (which will be converted to a dense vector) + // The expected is the list of all the known metrics. + // + // The tests take an list of input vectors and a list of all the summary values that + // are expected for this input. They currently test against some fixed subset of the + // metrics, but should be made fuzzy in the future. + + private def testExample(name: String, input: Seq[Any], exp: ExpectedMetrics): Unit = { +def inputVec: Seq[Vector] = input.map { + case x: Array[Double @unchecked] => Vectors.dense(x) + case x: Seq[Double @unchecked] => Vectors.dense(x.toArray) + case x: Vector => x + case x => throw new Exception(x.toString) +} + +val s = { + val s2 = new MultivariateOnlineSummarizer + inputVec.foreach(v => s2.add(OldVectors.fromML(v))) + s2 +} + +// Because the Spark context is reset between tests, we cannot hold a reference onto it. 
+def wrapped() = { + val df = sc.parallelize(inputVec).map(Tuple1.apply).toDF("features") + val c = df.col("features") + (df, c) +} + +registerTest(s"$name - mean only") { + val (df, c) = wrapped() + compare(df.select(metrics("mean").summary(c), mean(c)), Seq(Row(exp.mean), s.mean)) +} + +registerTest(s"$name - mean only (direct)") { + val (df, c) = wrapped() + compare(df.select(mean(c)), Seq(exp.mean)) +} + +registerTest(s"$name - variance only") { + val (df, c) = wrapped() + compare(df.select(metrics("variance").summary(c), variance(c)), +Seq(Row(exp.variance), s.variance)) +} + +registerTest(s"$name - variance only (direct)") { + val (df, c) = wrapped() + compare(df.select(variance(c)), Seq(s.variance)) +} + +registerTest(s"$name - count only") { + val (df, c) = wrapped() + compare(df.select(metrics("count").summary(c), count(c)), +Seq(Row(exp.count), exp.count)) +} + +registerTest(s"$name - count only (direct)") { + val (df, c) = wrapped() + compare(df.select(count(c)), +Seq(exp.count)) +} + +registerTest(s"$name - numNonZeros only") { + val (df, c) = wrapped() + compare(df.select(metrics("numNonZeros").summary(c), numNonZeros(c)), +Seq(Row(exp.numNonZeros), exp.numNonZeros)) +} + +registerTest(s"$name - numNonZeros only
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17251 Thank you so much! :)
[GitHub] spark issue #17476: [SPARK-20151][SQL] Account for partition pruning in scan...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17476 **[Test build #75379 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75379/testReport)** for PR 17476 at commit [`8789cf0`](https://github.com/apache/spark/commit/8789cf04ea4f7addbcd8da9d83615ee96d9bd192).
[GitHub] spark issue #17476: [SPARK-20151][SQL] Account for partition pruning in scan...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17476 cc @ericl, @bogdanrdc, @adrian-ionescu, @cloud-fan
[GitHub] spark pull request #17476: [SPARK-20151][SQL] Account for partition pruning ...
GitHub user rxin opened a pull request:

    https://github.com/apache/spark/pull/17476

    [SPARK-20151][SQL] Account for partition pruning in scan metadataTime metrics

    ## What changes were proposed in this pull request?
    After SPARK-20136, we report metadata timing metrics in the scan operator. However, that timing metric doesn't include one of the most important parts of metadata, which is partition pruning. This patch adds that time measurement to the scan metrics.

    ## How was this patch tested?
    N/A - I tried adding a test in SQLMetricsSuite but it was extremely convoluted to the point that I'm not sure if this is worth it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-20151

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17476.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17476

commit 8789cf04ea4f7addbcd8da9d83615ee96d9bd192
Author: Reynold Xin
Date:   2017-03-30T04:46:45Z

    [SPARK-20151][SQL] Account for partition pruning in scan metadataTime metrics
[GitHub] spark pull request #17474: [Minor][SparkR]: Add run command comment in examp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17474
[GitHub] spark issue #17474: [Minor][SparkR]: Add run command comment in examples
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17474 merged to master, thanks!
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17251 Will review it tonight.
[GitHub] spark pull request #17419: [SPARK-19634][ML] Multivariate summarizer - dataf...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17419#discussion_r108838518 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -0,0 +1,746 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.stat + +import breeze.{linalg => la} +import breeze.linalg.{Vector => BV} +import breeze.numerics + +import org.apache.spark.SparkException +import org.apache.spark.annotation.Since +import org.apache.spark.internal.Logging +import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vector, Vectors, VectorUDT} +import org.apache.spark.sql.Column +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.{Expression, UnsafeArrayData, UnsafeProjection, UnsafeRow} +import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, Complete, TypedImperativeAggregate} +import org.apache.spark.sql.types._ + + +/** + * A builder object that provides summary statistics about a given column. + * + * Users should not directly create such builders, but instead use one of the methods in + * [[Summarizer]]. 
+ */ +@Since("2.2.0") +abstract class SummaryBuilder { + /** + * Returns an aggregate object that contains the summary of the column with the requested metrics. + * @param column a column that contains Vector object. + * @return an aggregate column that contains the statistics. The exact content of this + * structure is determined during the creation of the builder. + */ + @Since("2.2.0") + def summary(column: Column): Column +} + +/** + * Tools for vectorized statistics on MLlib Vectors. + * + * The methods in this package provide various statistics for Vectors contained inside DataFrames. + * + * This class lets users pick the statistics they would like to extract for a given column. Here is + * an example in Scala: + * {{{ + * val dataframe = ... // Some dataframe containing a feature column + * val allStats = dataframe.select(Summarizer.metrics("min", "max").summary($"features")) + * val Row(min_, max_) = allStats.first() + * }}} + * + * If one wants to get a single metric, shortcuts are also available: + * {{{ + * val meanDF = dataframe.select(Summarizer.mean($"features")) + * val Row(mean_) = meanDF.first() + * }}} + */ +@Since("2.2.0") +object Summarizer extends Logging { + + import SummaryBuilderImpl._ + + /** + * Given a list of metrics, provides a builder that it turns computes metrics from a column. + * + * See the documentation of [[Summarizer]] for an example. + * + * The following metrics are accepted (case sensitive): + * - mean: a vector that contains the coefficient-wise mean. + * - variance: a vector tha contains the coefficient-wise variance. + * - count: the count of all vectors seen. + * - numNonzeros: a vector with the number of non-zeros for each coefficients + * - max: the maximum for each coefficient. + * - min: the minimum for each coefficient. + * - normL2: the Euclidian norm for each coefficient. + * - normL1: the L1 norm of each coefficient (sum of the absolute values). 
+ * @param firstMetric the metric being provided + * @param metrics additional metrics that can be provided. + * @return a builder. + * @throws IllegalArgumentException if one of the metric names is not understood. + */ + @Since("2.2.0") + def metrics(firstMetric: String, metrics: String*): SummaryBuilder = { +val (typedMetrics, computeMetrics) = getRelevantMetrics(Seq(firstMetric) ++ metrics) +new SummaryBuilderImpl(typedMetrics, computeMetrics) + } + + def mean(col: Column): Column = getSingleMetric(col, "mean") + + def variance(col: Column): Column = getSingleMetric(col, "variance")
[GitHub] spark pull request #16019: [SPARK-18595] [SQL] Handling ignoreIfExists in Hi...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/16019
[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17451 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75378/ Test FAILed.
[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17451 **[Test build #75378 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75378/testReport)** for PR 17451 at commit [`ecdcbf6`](https://github.com/apache/spark/commit/ecdcbf665a3f91e06cae4879cf041940b583e2ee).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17451 Merged build finished. Test FAILed.
[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17451 **[Test build #75378 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75378/testReport)** for PR 17451 at commit [`ecdcbf6`](https://github.com/apache/spark/commit/ecdcbf665a3f91e06cae4879cf041940b583e2ee).
[GitHub] spark pull request #17472: [SPARK-19999]: Fix for flakey tests due to java.n...
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17472#discussion_r108837689

    --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java ---
    @@ -46,18 +46,22 @@
       private static final boolean unaligned;
       static {
         boolean _unaligned;
    -    // use reflection to access unaligned field
    -    try {
    -      Class bitsClass =
    -        Class.forName("java.nio.Bits", false, ClassLoader.getSystemClassLoader());
    -      Method unalignedMethod = bitsClass.getDeclaredMethod("unaligned");
    -      unalignedMethod.setAccessible(true);
    -      _unaligned = Boolean.TRUE.equals(unalignedMethod.invoke(null));
    -    } catch (Throwable t) {
    -      // We at least know x86 and x64 support unaligned access.
    -      String arch = System.getProperty("os.arch", "");
    -      //noinspection DynamicRegexReplaceableByCompiledPattern
    -      _unaligned = arch.matches("^(i[3-6]86|x86(_64)?|x64|amd64|aarch64)$");
    +    if (arch.matches("^(ppc64le | ppc64)$")) {
    +      // Since java.nio.Bits.unaligned() doesn't return true on ppc (See JDK-8165231), but ppc64 and ppc64le support it
    +      _unaligned = true;
    +    } else {
    +      try {
    +        Class bitsClass =
    +          Class.forName("java.nio.Bits", false, ClassLoader.getSystemClassLoader());
    +        Method unalignedMethod = bitsClass.getDeclaredMethod("unaligned");
    +        unalignedMethod.setAccessible(true);
    +        _unaligned = Boolean.TRUE.equals(unalignedMethod.invoke(null));
    +      } catch (Throwable t) {
    +        // We at least know x86 and x64 support unaligned access.
    +        String arch = System.getProperty("os.arch", "");
    --- End diff --

    Should we define `arch` before the `if` statement, now?
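The reviewer's question can be sketched as follows: read `os.arch` once, before the `if`, so that both the ppc64 check and the fallback regex see the same variable. This is a minimal illustration rather than the patch itself. `ArchCheck`, `supportsUnaligned`, and its `reflected` parameter (a stand-in for the result of the reflective `java.nio.Bits.unaligned()` call) are hypothetical names. The sketch also assumes the alternation is written without spaces: Java treats spaces in a pattern as literal characters, so `"^(ppc64le | ppc64)$"` as quoted in the diff would match neither architecture name.

```java
// Hypothetical helper, not Spark's Platform class: shows `arch` hoisted above the `if`.
final class ArchCheck {

  // `reflected` stands in for java.nio.Bits.unaligned() obtained via reflection;
  // null models the case where the reflective call failed.
  static boolean supportsUnaligned(String arch, Boolean reflected) {
    if (arch.matches("^(ppc64le|ppc64)$")) {
      // Per the diff's comment (JDK-8165231), trust the architecture name on ppc64.
      return true;
    }
    if (reflected != null) {
      return reflected;
    }
    // Fallback list taken from the original code quoted in the diff.
    return arch.matches("^(i[3-6]86|x86(_64)?|x64|amd64|aarch64)$");
  }

  public static void main(String[] args) {
    // Defined once, before any branch uses it.
    String arch = System.getProperty("os.arch", "");
    System.out.println(arch + " -> " + supportsUnaligned(arch, null));
    // Spaces inside the alternation are literal, so this prints false.
    System.out.println("ppc64le".matches("^(ppc64le | ppc64)$"));
  }
}
```

Passing the reflection result in as a parameter is only a convenience for the sketch; it keeps the example runnable on any JDK without touching `java.nio.Bits`.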
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17436 **[Test build #75377 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75377/testReport)** for PR 17436 at commit [`9d14d33`](https://github.com/apache/spark/commit/9d14d3337ccf3e2255dfc79959823b3cf6bf3c0a).
[GitHub] spark pull request #17450: [SPARK-20121][SQL] simplify NullPropagation with ...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17450#discussion_r108837093

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
    @@ -297,8 +297,8 @@ case class Lower(child: Expression) extends UnaryExpression with String2StringEx
     }

     /** A base trait for functions that compare two strings, returning a boolean. */
    -trait StringPredicate extends Predicate with ImplicitCastInputTypes {
    -  self: BinaryExpression =>
    +abstract class StringPredicate extends BinaryExpression
    +  with Predicate with ImplicitCastInputTypes {
    --- End diff --

    I finally got your point. `StringPredicate` is used for inferring the null constants in the rule `NullPropagation`. Thus, we should mark it as `NullIntolerant`.
[GitHub] spark pull request #17475: [SPARK-20148] [SQL] Extend the file commit API to...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17475
[GitHub] spark issue #17475: [SPARK-20148] [SQL] Extend the file commit API to allow ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17475 Merging in master.
[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17469#discussion_r108835449

    --- Diff: python/pyspark/sql/column.py ---
    @@ -124,6 +124,35 @@ def _(self, other):
         return _


    +like_doc = """ Return a Boolean :class:`Column` based on a SQL LIKE match.\n
    +      :param other: a SQL LIKE pattern\n
    +      See :func:`pyspark.sql.Column.rlike` for a regex version\n
    +
    +      >>> df.filter( df.name.like('Al%') ).collect()
    +      [Row(name=u'Alice', age=1)]
    +"""
    +rlike_doc = """ Return a Boolean :class:`Column` based on a regex match.\n
    +    :param other: an extended regex expression\n
    +
    +    >>> df.filter( df.name.rlike('ice$') ).collect()
    +    [Row(name=u'Alice', age=1)]
    +"""
    +endswith_doc = ''' Return a Boolean :class:`Column` based on matching end of string.\n
    +      :param other: string at end of line (do not use a regex `$`)\n
    +      >>> df.filter(df.name.endswith('ice')).collect()
    +      [Row(name=u'Alice', age=1)]
    +      >>> df.filter(df.name.endswith('ice$')).collect()
    +      []
    +      '''
    +startswith_doc = ''' Return a Boolean :class:`Column` based on a string match.\n
    --- End diff --

    Mind adding `_` as a prefix in this variable to indicate this is a private one?
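The distinction the quoted docstrings draw between a literal suffix match (`endswith`) and a regex match (`rlike`) can be illustrated outside Spark. The sketch below uses plain `java.lang.String` and `java.util.regex` as stand-ins; it is not PySpark's implementation, `MatchKinds` and its method names are hypothetical, and it assumes search-style regex matching, which is what the `rlike('ice$')` docstring example implies.

```java
import java.util.regex.Pattern;

// Hypothetical stand-ins for the Column methods, using only the JDK.
final class MatchKinds {

  // Literal suffix comparison: '$' has no special meaning here.
  static boolean endsWithLiteral(String value, String suffix) {
    return value.endsWith(suffix);
  }

  // Regex search anywhere in the value: '$' anchors the match to the end.
  static boolean regexFind(String value, String regex) {
    return Pattern.compile(regex).matcher(value).find();
  }

  public static void main(String[] args) {
    System.out.println(endsWithLiteral("Alice", "ice"));   // true
    System.out.println(endsWithLiteral("Alice", "ice$"));  // false
    System.out.println(regexFind("Alice", "ice$"));        // true
  }
}
```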
[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15821#discussion_r108835232

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2747,6 +2747,17 @@ class Dataset[T] private[sql](
     }
   }
 
+  /**
+   * Collect a Dataset as ArrowPayload byte arrays and serve to PySpark.
+   */
+  private[sql] def collectAsArrowToPython(): Int = {
+    val payloadRdd = toArrowPayloadBytes()
+    val payloadByteArrays = payloadRdd.collect()
--- End diff --

@BryanCutler Btw, I don't think it is for a performance gain. `toLocalIteratorAndServe` can avoid collecting all data into the driver at once, so it may be good for memory usage on the driver side.
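The memory argument in this comment can be illustrated with a small pure-Python sketch. It is not Spark code: `make_partitions`, `collect_all` and `iterate_locally` are hypothetical stand-ins for an RDD, for `collect()`, and for a local iterator respectively, and "peak" simply counts how many rows are held at once.

```python
def make_partitions(num_partitions, rows_per_partition):
    """Hypothetical stand-in for an RDD: lazily yields one list of rows per partition."""
    for p in range(num_partitions):
        start = p * rows_per_partition
        yield list(range(start, start + rows_per_partition))


def collect_all(partitions):
    """Mimics collect(): every partition is buffered before anything is handed on."""
    buffered = list(partitions)
    peak_rows_held = sum(len(part) for part in buffered)
    rows = [row for part in buffered for row in part]
    return rows, peak_rows_held


def iterate_locally(partitions):
    """Mimics a local iterator: only one partition is held at a time."""
    peak_rows_held = 0
    rows_served = 0
    for part in partitions:
        peak_rows_held = max(peak_rows_held, len(part))
        rows_served += len(part)  # a real server would write the rows out here
    return rows_served, peak_rows_held


all_rows, collect_peak = collect_all(make_partitions(4, 3))
served, iterate_peak = iterate_locally(make_partitions(4, 3))
print(len(all_rows), collect_peak, served, iterate_peak)
```

The saving only holds while the consumer does not itself need every row at once, which is the caveat raised elsewhere in this review thread.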
[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...
Github user wesm commented on a diff in the pull request: https://github.com/apache/spark/pull/15821#discussion_r108834631

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2747,6 +2747,17 @@ class Dataset[T] private[sql](
     }
   }
 
+  /**
+   * Collect a Dataset as ArrowPayload byte arrays and serve to PySpark.
+   */
+  private[sql] def collectAsArrowToPython(): Int = {
+    val payloadRdd = toArrowPayloadBytes()
+    val payloadByteArrays = payloadRdd.collect()
--- End diff --

You can stream out payloads as they come into the driver (maybe this is already happening). We may be able to play with the StreamWriter to reduce the driver memory usage in a follow-up patch.
[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15821#discussion_r108834618

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2828,4 +2839,16 @@ class Dataset[T] private[sql](
       Dataset(sparkSession, logicalPlan)
     }
   }
+
+  /** Convert to an RDD of ArrowPayload byte arrays */
+  private[sql] def toArrowPayloadBytes(): RDD[Array[Byte]] = {
+    val schema_captured = this.schema
+    queryExecution.toRdd.mapPartitionsInternal { iter =>
+      val converter = new ArrowConverters
+      val payload = converter.interalRowIterToPayload(iter, schema_captured)
+      val payloadBytes = ArrowConverters.payloadToByteArray(payload, schema_captured)
--- End diff --

Do you think we need a dedicated config for it? Or maybe a constant like 1000 (rows)?
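The comment does not spell out what "it" is. Assuming it refers to a cap on the number of rows converted into a single payload, the "constant versus config" choice could be sketched as below. `DEFAULT_MAX_ROWS_PER_BATCH` and `batched` are hypothetical names, and the code is plain Python rather than anything from the patch.

```python
from itertools import islice

# Hypothetical default, matching the "constant like 1000 (rows)" suggestion.
DEFAULT_MAX_ROWS_PER_BATCH = 1000


def batched(rows, max_rows=DEFAULT_MAX_ROWS_PER_BATCH):
    """Yield lists of at most max_rows rows from any iterable of rows."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, max_rows))
        if not batch:
            return
        yield batch


# Default constant versus an explicit, config-like override.
default_sizes = [len(batch) for batch in batched(range(2500))]
override_sizes = [len(batch) for batch in batched(range(10), max_rows=4)]
print(default_sizes, override_sizes)
```

Keeping the limit as a keyword argument with a default leaves room to wire it to a configuration value later without changing callers.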
[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/14617 Hi @squito, would you please review the code again? Thanks a lot.
[GitHub] spark issue #17452: [SPARK-20123][build]$SPARK_HOME variable might have spac...
Github user zuotingbing commented on the issue: https://github.com/apache/spark/pull/17452 OK, will do. Thanks!
[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15821#discussion_r108834139

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2747,6 +2747,17 @@ class Dataset[T] private[sql](
     }
   }
 
+  /**
+   * Collect a Dataset as ArrowPayload byte arrays and serve to PySpark.
+   */
+  private[sql] def collectAsArrowToPython(): Int = {
+    val payloadRdd = toArrowPayloadBytes()
+    val payloadByteArrays = payloadRdd.collect()
--- End diff --

OK. Since building the Pandas DataFrame requires loading all the data into the driver's memory anyway, `toLocalIteratorAndServe` can't improve memory usage in the end.
[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15821#discussion_r108833855

--- Diff: python/pyspark/sql/tests.py ---
@@ -56,6 +56,15 @@ from pyspark.sql.utils import AnalysisException, ParseException, IllegalArgumentException
 
+_have_arrow = False
+try:
+    import pyarrow
+    _have_arrow = True
--- End diff --

Maybe use the param doc string as the exception message? I.e., `To make use of Apache Arrow for conversion, pyarrow must be installed and available on the calling Python process (Experimental)`.
[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15821#discussion_r108833416

--- Diff: python/pyspark/sql/tests.py ---
@@ -56,6 +56,15 @@ from pyspark.sql.utils import AnalysisException, ParseException, IllegalArgumentException
 
+_have_arrow = False
+try:
+    import pyarrow
+    _have_arrow = True
--- End diff --

I mean we should throw an exception when `useArrow` is used but pyarrow is not installed.
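A minimal sketch of the behaviour requested here: fail with a clear message when the Arrow path is requested but the optional dependency is missing. `require_module` and `convert_rows` are hypothetical helpers, not PySpark functions, and the `arrow_module` parameter exists only so the failure path can be exercised without depending on whether pyarrow is installed. The message text is the one proposed in the review.

```python
import importlib

ARROW_REQUIRED_MSG = (
    "To make use of Apache Arrow for conversion, pyarrow must be installed "
    "and available on the calling Python process (Experimental)"
)


def require_module(name, message):
    """Import an optional dependency, or raise ImportError carrying `message`."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(message) from exc


def convert_rows(rows, use_arrow=False, arrow_module="pyarrow"):
    """Hypothetical conversion entry point: the Arrow path checks its dependency first."""
    if use_arrow:
        require_module(arrow_module, ARROW_REQUIRED_MSG)
    # The actual conversion is out of scope for this sketch.
    return list(rows)


print(convert_rows([1, 2, 3]))
```

Checking at call time, rather than only setting a module-level flag at import time, means the user sees the error at the point where the Arrow option is actually used.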
[GitHub] spark issue #17424: [SPARK-20089] [SQL] [TEST] Added DESC FUNCTION and DESC ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17424 I am not sure that dumping the output of `DESC EXTENDED FUNCTION` into the test helps review. As far as I can see, we may not change the output frequently. IMHO, it is hard to tell which output is "correct" for a function, except for obvious errors such as wrong parameters or results. It is also hard to check the consistency of the output in a 3000-line text file.
[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r108830896

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala ---
@@ -450,6 +467,69 @@ class FilterEstimationSuite extends StatsEstimationTestBase {
     }
   }
 
+  test("cint = cint2") {
+    validateEstimatedStats(
+      Filter(EqualTo(attrInt, attrInt2), childStatsTestPlan(Seq(attrInt, attrInt2), 10L)),
+      Seq(attrInt -> ColumnStat(distinctCount = 3, min = Some(7), max = Some(10),
+        nullCount = 0, avgLen = 4, maxLen = 4),
+        attrInt2 -> ColumnStat(distinctCount = 3, min = Some(7), max = Some(10),
+          nullCount = 0, avgLen = 4, maxLen = 4)),
+      expectedRowCount = 4)
+  }
+
+  test("cint > cint2") {
+    validateEstimatedStats(
+      Filter(GreaterThan(attrInt, attrInt2), childStatsTestPlan(Seq(attrInt, attrInt2), 10L)),
+      Seq(attrInt -> ColumnStat(distinctCount = 3, min = Some(7), max = Some(10),
+        nullCount = 0, avgLen = 4, maxLen = 4),
+        attrInt2 -> ColumnStat(distinctCount = 3, min = Some(7), max = Some(10),
+          nullCount = 0, avgLen = 4, maxLen = 4)),
+      expectedRowCount = 4)
+  }
+
+  test("cint < cint2") {
+    validateEstimatedStats(
+      Filter(LessThan(attrInt, attrInt2), childStatsTestPlan(Seq(attrInt, attrInt2), 10L)),
+      Seq(attrInt -> ColumnStat(distinctCount = 3, min = Some(1), max = Some(10),
+        nullCount = 0, avgLen = 4, maxLen = 4),
+        attrInt2 -> ColumnStat(distinctCount = 3, min = Some(7), max = Some(16),
+          nullCount = 0, avgLen = 4, maxLen = 4)),
+      expectedRowCount = 4)
+  }
+
+  test("cint = cint3") {
+    // no records qualify due to no overlap
+    validateEstimatedStats(
+      Filter(EqualTo(attrInt, attrInt3), childStatsTestPlan(Seq(attrInt, attrInt3), 10L)),
+      Seq(attrInt -> ColumnStat(distinctCount = 0, min = Some(1), max = Some(10),
--- End diff --

Once there is no overlap, is it still meaningful to keep `min` and `max`?
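The question about `min` and `max` is easier to see with the range logic written out. This is a plain-Python sketch of interval intersection for an equality predicate between two columns, not the estimator in the patch. `narrow_for_equality` is a hypothetical helper. The ranges [1, 10] and [7, 16] are taken from the expected stats of the `cint < cint2` test and assumed to be the input ranges, and the third range is made up.

```python
def narrow_for_equality(range_a, range_b):
    """Intersect two inclusive (min, max) ranges for a predicate `a = b`.

    Returns the shared (min, max), or None when the ranges do not overlap,
    in which case no row can satisfy the predicate.
    """
    low = max(range_a[0], range_b[0])
    high = min(range_a[1], range_b[1])
    if low > high:
        return None
    return (low, high)


cint = (1, 10)
cint2 = (7, 16)
cint3 = (30, 39)  # made-up range that does not overlap cint

overlapping = narrow_for_equality(cint, cint2)
disjoint = narrow_for_equality(cint, cint3)
print(overlapping, disjoint)
```

With overlap, both columns narrow to the same range, which matches the `min = Some(7), max = Some(10)` expected for `cint = cint2`. Without overlap there is no range left to report, which is the situation the review comment asks about.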
[GitHub] spark issue #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on the issue: https://github.com/apache/spark/pull/16476 @cloud-fan Do you have any comments on this version?
[GitHub] spark issue #17475: [SPARK-20148] [SQL] Extend the file commit API to allow ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17475 Merged build finished. Test PASSed.
[GitHub] spark issue #17475: [SPARK-20148] [SQL] Extend the file commit API to allow ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17475 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75374/ Test PASSed.
[GitHub] spark issue #17475: [SPARK-20148] [SQL] Extend the file commit API to allow ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17475 **[Test build #75374 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75374/testReport)** for PR 17475 at commit [`a541fdd`](https://github.com/apache/spark/commit/a541fdd34d71656c6932eadb3edad9b782a1ae22). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15334: [SPARK-10367][SQL][WIP] Support Parquet logical type INT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15334 Merged build finished. Test PASSed.
[GitHub] spark issue #15334: [SPARK-10367][SQL][WIP] Support Parquet logical type INT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15334 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75376/ Test PASSed.
[GitHub] spark issue #15334: [SPARK-10367][SQL][WIP] Support Parquet logical type INT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15334 **[Test build #75376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75376/testReport)** for PR 15334 at commit [`35ec9f1`](https://github.com/apache/spark/commit/35ec9f18aea900caace5e6dc5e053ce10a3e5b5c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17465: [SPARK-20136][SQL] Add num files and metadata ope...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17465
[GitHub] spark issue #17465: [SPARK-20136][SQL] Add num files and metadata operation ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17465 Let me merge this now. I will send a follow-up PR to take the logical planning time into account (otherwise in the vast majority of cases, i.e. pruned partitions, the metadata operation time will be approximately 0).
[GitHub] spark issue #17465: [SPARK-20136][SQL] Add num files and metadata operation ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17465 Merging in master.
[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17415 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75375/ Test PASSed.
[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17415 Merged build finished. Test PASSed.
[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17415 **[Test build #75375 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75375/testReport)** for PR 17415 at commit [`9b98ff1`](https://github.com/apache/spark/commit/9b98ff1f7c8521e7d1277fd1f0c6e9a809a0d337). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17470: [SPARK-20146][SQL] fix comment missing issue for ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17470
[GitHub] spark issue #17470: [SPARK-20146][SQL] fix comment missing issue for thrift ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17470 Merging in master. Thanks.
[GitHub] spark issue #17475: [SPARK-20148] [SQL] Extend the file commit API to allow ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17475 LGTM
[GitHub] spark issue #17375: [SPARK-19019][PYTHON][BRANCH-1.6] Fix hijacked `collecti...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17375 Yea, it might be less important, but I guess it is still a valid backport.
[GitHub] spark issue #17375: [SPARK-19019][PYTHON][BRANCH-1.6] Fix hijacked `collecti...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17375 Tentatively looks good. My only question is: if someone wants to use Python 3.6 (first released December 2016), are they likely to want to use it with Spark 1.6 (first released January 2016)?
[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...
GitHub user zjffdu reopened a pull request: https://github.com/apache/spark/pull/17222

[SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFunction Should Support UDAFs

## What changes were proposed in this pull request?

Support registering Java UDAFs in PySpark so that users can use Java UDAFs in PySpark. Besides that, I also added an API in `UDFRegistration`.

## How was this patch tested?

A unit test is added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zjffdu/spark SPARK-19439

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17222.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17222

commit 8c1e837e2e97c08c4a5753c79aea71da772b0eaa
Author: Jeff Zhang
Date: 2017-03-09T07:06:50Z

    [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFunction Should Support UDAFs

commit 89b8d6588d4d6258f9c4d84339775544d93e6e3c
Author: Jeff Zhang
Date: 2017-03-10T00:28:12Z

    add scala doc
[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...
Github user zjffdu closed the pull request at: https://github.com/apache/spark/pull/17222
[GitHub] spark pull request #17473: [SPARK-19088][SQL] Fix 2.10 build.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17473
[GitHub] spark issue #17473: [SPARK-19088][SQL] Fix 2.10 build.
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/17473 Merging to master!
[GitHub] spark issue #17473: [SPARK-19088][SQL] Fix 2.10 build.
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17473 LGTM, thanks!
[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17394 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75370/ Test PASSed.
[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17394 Merged build finished. Test PASSed.
[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17394 **[Test build #75370 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75370/testReport)** for PR 17394 at commit [`36b501e`](https://github.com/apache/spark/commit/36b501ebb18dca3195e44be92accd3fada479152). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17473: [SPARK-19088][SQL] Fix 2.10 build.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17473 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75371/ Test PASSed.
[GitHub] spark issue #17473: [SPARK-19088][SQL] Fix 2.10 build.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17473 Merged build finished. Test PASSed.
[GitHub] spark issue #17473: [SPARK-19088][SQL] Fix 2.10 build.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17473 **[Test build #75371 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75371/testReport)** for PR 17473 at commit [`36d12fd`](https://github.com/apache/spark/commit/36d12fd26f0919f06f887e8cb0b1f4b19a16f989). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15334: [SPARK-10367][SQL][WIP] Support Parquet logical type INT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15334 **[Test build #75376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75376/testReport)** for PR 15334 at commit [`35ec9f1`](https://github.com/apache/spark/commit/35ec9f18aea900caace5e6dc5e053ce10a3e5b5c).
[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17415 Merged build finished. Test PASSed.
[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17415 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75369/ Test PASSed.
[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17415 **[Test build #75369 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75369/testReport)** for PR 17415 at commit [`70ac70c`](https://github.com/apache/spark/commit/70ac70cf0ab403e136d4114869174db171673364). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17415 **[Test build #75375 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75375/testReport)** for PR 17415 at commit [`9b98ff1`](https://github.com/apache/spark/commit/9b98ff1f7c8521e7d1277fd1f0c6e9a809a0d337).
[GitHub] spark issue #17475: [SPARK-20148] [SQL] Extend the file commit API to allow ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17475 **[Test build #75374 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75374/testReport)** for PR 17475 at commit [`a541fdd`](https://github.com/apache/spark/commit/a541fdd34d71656c6932eadb3edad9b782a1ae22).
[GitHub] spark issue #17417: [DOCS] Docs-only improvements
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17417 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75368/ Test PASSed.
[GitHub] spark issue #17417: [DOCS] Docs-only improvements
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17417 Build finished. Test PASSed.
[GitHub] spark pull request #17475: [SPARK-20148] [SQL] Extend the file commit API to...
GitHub user ericl opened a pull request: https://github.com/apache/spark/pull/17475 [SPARK-20148] [SQL] Extend the file commit API to allow subscribing to task commit messages ## What changes were proposed in this pull request? The internal FileCommitProtocol interface returns all task commit messages in bulk to the implementation when a job finishes. However, it is sometimes useful to access those messages before the job completes, so that the driver gets incremental progress updates before the job finishes. This adds an `onTaskCommit` listener to the internal API. ## How was this patch tested? Unit tests. cc @rxin You can merge this pull request into a Git repository by running: $ git pull https://github.com/ericl/spark file-commit-api-ext Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17475.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17475 commit a541fdd34d71656c6932eadb3edad9b782a1ae22 Author: Eric Liang Date: 2017-03-29T23:16:40Z initial commit
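The idea in the description above (a per-task callback in addition to the bulk hand-off at job end) can be sketched roughly as follows. This is a minimal Java illustration under stated assumptions, not Spark's Scala `FileCommitProtocol`: the names `IncrementalCommitSketch`, `TaskCommit`, `CommitProtocol`, `ProgressTrackingProtocol` and `runJob` are hypothetical, and the "driver" is just a loop.

```java
import java.util.ArrayList;
import java.util.List;

final class IncrementalCommitSketch {
    /** Hypothetical stand-in for a per-task commit message. */
    static final class TaskCommit {
        final int taskId;
        TaskCommit(int taskId) { this.taskId = taskId; }
    }

    /** Hypothetical protocol: a no-op per-task hook plus the bulk job commit. */
    abstract static class CommitProtocol {
        /** Called as each task commits; the default does nothing. */
        void onTaskCommit(TaskCommit msg) { }
        /** Called once at job end with every task's message. */
        abstract void commitJob(List<TaskCommit> all);
    }

    /** Example implementation that records progress incrementally. */
    static final class ProgressTrackingProtocol extends CommitProtocol {
        final List<String> events = new ArrayList<>();
        @Override void onTaskCommit(TaskCommit msg) {
            events.add("task " + msg.taskId + " committed");
        }
        @Override void commitJob(List<TaskCommit> all) {
            events.add("job committed with " + all.size() + " tasks");
        }
    }

    /** Simulated driver: notify per task, then hand over all messages in bulk. */
    static List<String> runJob(ProgressTrackingProtocol protocol, int numTasks) {
        List<TaskCommit> all = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            TaskCommit msg = new TaskCommit(i);
            protocol.onTaskCommit(msg);   // incremental update before the job finishes
            all.add(msg);
        }
        protocol.commitJob(all);          // existing bulk behaviour
        return protocol.events;
    }
}
```

Giving the hook a no-op default is one way to keep existing implementations working unchanged; whether the PR does this is not stated in the description.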
[GitHub] spark issue #17417: [DOCS] Docs-only improvements
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17417 **[Test build #75368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75368/testReport)** for PR 17417 at commit [`913dbb8`](https://github.com/apache/spark/commit/913dbb81c6680e6063875a3fd7ddd0214bf7a7c4). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark issue #17416: [SPARK-20075][CORE][WIP] Support classifier, packaging i...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/17416 So is the problem that it downloads `stanford-corenlp-3.4.1-models.jar` but thinks it is `stanford-corenlp-3.4.1.jar`? It looks like it might be possible to add the classifier to `ModuleRevisionId.newInstance`; have you tried just doing that instead of `dd.addDependencyArtifact`? If I have some time, I'll try running it to see what's going on.
[GitHub] spark issue #17445: [SPARK-20115] [CORE] Fix DAGScheduler to recompute all t...
Github user umehrot2 commented on the issue: https://github.com/apache/spark/pull/17445 @kayousterhout Thanks for your response, and for that link. Well, it does seem like #17088 addresses the same issue as this PR. However, I would like you all to review this PR as well, because I think it more clearly organizes the code between handling of internal and external shuffle failures. It also removes a lot of the code duplication which is part of the other PR. Further, it adds an epoch check for the 'host'.
[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r108810108 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,8 +565,143 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo Some(percent.toDouble) } + /** + * Returns a percentage of rows meeting a binary comparison expression containing two columns. + * In SQL queries, we also see predicate expressions involving two columns + * such as "column-1 (op) column-2" where column-1 and column-2 belong to same table. + * Note that, if column-1 and column-2 belong to different tables, then it is a join + * operator's work, NOT a filter operator's work. + * + * @param op a binary comparison operator such as =, <, <=, >, >= + * @param attrLeft the left Attribute (or a column) + * @param attrRight the right Attribute (or a column) + * @param update a boolean flag to specify if we need to update ColumnStat of the given columns + * for subsequent conditions + * @return an optional double value to show the percentage of rows meeting a given condition + */ + def evaluateBinaryForTwoColumns( + op: BinaryComparison, + attrLeft: Attribute, + attrRight: Attribute, + update: Boolean): Option[Double] = { + +if (!colStatsMap.contains(attrLeft)) { + logDebug("[CBO] No statistics for " + attrLeft) + return None +} +if (!colStatsMap.contains(attrRight)) { + logDebug("[CBO] No statistics for " + attrRight) + return None +} + +attrLeft.dataType match { + case StringType | BinaryType => +// TODO: It is difficult to support other binary comparisons for String/Binary +// type without min/max and advanced statistics like histogram. 
+logDebug("[CBO] No range comparison statistics for String/Binary type " + attrLeft) +return None + case _ => +} + +val colStatLeft = colStatsMap(attrLeft) +val statsRangeLeft = Range(colStatLeft.min, colStatLeft.max, attrLeft.dataType) + .asInstanceOf[NumericRange] +val maxLeft = BigDecimal(statsRangeLeft.max) +val minLeft = BigDecimal(statsRangeLeft.min) +val ndvLeft = BigDecimal(colStatLeft.distinctCount) + +val colStatRight = colStatsMap(attrRight) +val statsRangeRight = Range(colStatRight.min, colStatRight.max, attrRight.dataType) + .asInstanceOf[NumericRange] +val maxRight = BigDecimal(statsRangeRight.max) +val minRight = BigDecimal(statsRangeRight.min) +val ndvRight = BigDecimal(colStatRight.distinctCount) + +// determine the overlapping degree between predicate range and column's range +val (noOverlap: Boolean, completeOverlap: Boolean) = op match { + case _: LessThan => +(minLeft >= maxRight, maxLeft < minRight) + case _: LessThanOrEqual => +(minLeft > maxRight, maxLeft <= minRight) + case _: GreaterThan => +(maxLeft <= minRight, minLeft > maxRight) + case _: GreaterThanOrEqual => +(maxLeft < minRight, minLeft >= maxRight) + case _: EqualTo => +((maxLeft < minRight) || (maxRight < minLeft), + (minLeft == minRight) && (maxLeft == maxRight)) + case _: EqualNullSafe => +// For null-safe equality, we use a very restrictive condition to evaluate its overlap. +// If null values exists, we set it to partial overlap. +(((maxLeft < minRight) || (maxRight < minLeft)) +&& colStatLeft.nullCount == 0 && colStatRight.nullCount == 0, + ((minLeft == minRight) && (maxLeft == maxRight)) +&& colStatLeft.nullCount == 0 && colStatRight.nullCount == 0 +) +} + +var percent = BigDecimal(1.0) +if (noOverlap) { + percent = 0.0 +} else if (completeOverlap) { + percent = 1.0 +} else { + // For partial overlap, we use an empirical value 1/3 as suggested by the book + // "Database Systems, the complete book". 
+ percent = 1.0/3.0 + + if (update) { +// Need to adjust new min/max after the filter condition is applied + +val ndvLeft = BigDecimal(colStatLeft.distinctCount) +var newNdvLeft = (ndvLeft * percent).setScale(0, RoundingMode.HALF_UP).toBigInt() +if (newNdvLeft < 1) newNdvLeft = 1 +val ndvRight = BigDecimal(colStatLeft.distinctCount) +var newNdvRight = (ndvRight * percent).setScale(0, RoundingMode.HALF_UP).toBigInt() +if
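The overlap test in the diff quoted above can be read in isolation as the following minimal sketch. It is a Java illustration rather than the PR's Scala code: the class `TwoColumnSelectivity`, its method names and the use of plain `double` ranges are hypothetical, and null-safe equality (which the diff also gates on null counts) is left out. Note that the quoted diff builds `ndvRight` from `colStatLeft.distinctCount`, which may be unintended; the sketch simply takes the count as a parameter.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

final class TwoColumnSelectivity {
    /** Comparison operators covered by the sketch. */
    enum Op { LT, LE, GT, GE, EQ }

    /** Estimated fraction of rows meeting "left op right", given each column's [min, max]. */
    static double estimate(Op op, double minLeft, double maxLeft,
                           double minRight, double maxRight) {
        boolean noOverlap;
        boolean completeOverlap;
        switch (op) {
            case LT:
                noOverlap = minLeft >= maxRight;
                completeOverlap = maxLeft < minRight;
                break;
            case LE:
                noOverlap = minLeft > maxRight;
                completeOverlap = maxLeft <= minRight;
                break;
            case GT:
                noOverlap = maxLeft <= minRight;
                completeOverlap = minLeft > maxRight;
                break;
            case GE:
                noOverlap = maxLeft < minRight;
                completeOverlap = minLeft >= maxRight;
                break;
            case EQ:
                noOverlap = maxLeft < minRight || maxRight < minLeft;
                completeOverlap = minLeft == minRight && maxLeft == maxRight;
                break;
            default:
                throw new IllegalArgumentException("unsupported operator: " + op);
        }
        if (noOverlap) return 0.0;        // ranges rule the predicate out
        if (completeOverlap) return 1.0;  // treated as every row qualifying, as in the diff
        return 1.0 / 3.0;                 // partial overlap: the empirical 1/3 from the diff
    }

    /** Scales a distinct-value count by the selectivity, rounding half up, never below 1. */
    static BigInteger scaleNdv(long ndv, double percent) {
        BigInteger scaled = BigDecimal.valueOf(ndv)
            .multiply(BigDecimal.valueOf(percent))
            .setScale(0, RoundingMode.HALF_UP)
            .toBigInteger();
        return scaled.max(BigInteger.ONE);
    }
}
```

For example, with the left column in [0, 15] and the right column in [10, 20], `left < right` is neither ruled out nor guaranteed by the ranges, so the sketch falls back to 1/3.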
[GitHub] spark pull request #17450: [SPARK-20121][SQL] simplify NullPropagation with ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17450#discussion_r108809796 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -297,8 +297,8 @@ case class Lower(child: Expression) extends UnaryExpression with String2StringEx } /** A base trait for functions that compare two strings, returning a boolean. */ -trait StringPredicate extends Predicate with ImplicitCastInputTypes { - self: BinaryExpression => +abstract class StringPredicate extends BinaryExpression + with Predicate with ImplicitCastInputTypes { --- End diff -- See `StringRegexExpression` above. Similarly, in order to simplify `NullPropagation`, we need to add `NullIntolerant` here so that it can propagate null values.
[GitHub] spark issue #17474: [Minor][SparkR]: Add run command comment in examples
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17474 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75373/ Test PASSed.
[GitHub] spark issue #17474: [Minor][SparkR]: Add run command comment in examples
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17474 Merged build finished. Test PASSed.
[GitHub] spark issue #17474: [Minor][SparkR]: Add run command comment in examples
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17474 **[Test build #75373 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75373/testReport)** for PR 17474 at commit [`5460e78`](https://github.com/apache/spark/commit/5460e78b3907a0ce6f8983bd4dcc83d02acc2b2d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15326: [SPARK-17759] [CORE] Avoid adding duplicate schedulables
Github user erenavsarogullari commented on the issue: https://github.com/apache/spark/pull/15326 Jenkins test this please
[GitHub] spark issue #17474: [Minor][SparkR]: Add run command comment in examples
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17474 **[Test build #75373 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75373/testReport)** for PR 17474 at commit [`5460e78`](https://github.com/apache/spark/commit/5460e78b3907a0ce6f8983bd4dcc83d02acc2b2d).
[GitHub] spark pull request #17474: [Minor][SparkR]: Add run command comment in examp...
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/17474 [Minor][SparkR]: Add run command comment in examples ## What changes were proposed in this pull request? Two examples in the r folder are missing the run commands. This PR adds the missing comments, consistent with the other examples. ## How was this patch tested? Manual test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangmiao1981/spark stat Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17474.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17474 commit e095333508a28cf024925610fb127e1f05b3eec2 Author: wm...@hotmail.com Date: 2017-03-29T21:40:59Z simple fix commit 5460e78b3907a0ce6f8983bd4dcc83d02acc2b2d Author: wm...@hotmail.com Date: 2017-03-29T22:33:12Z revert ignore
[GitHub] spark issue #17472: [SPARK-19999]: Fix for flakey tests due to java.nio.Bits...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17472 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75372/ Test FAILed.
[GitHub] spark issue #17472: [SPARK-19999]: Fix for flakey tests due to java.nio.Bits...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17472 **[Test build #75372 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75372/testReport)** for PR 17472 at commit [`bf7cc24`](https://github.com/apache/spark/commit/bf7cc24f213a2cf043a579846859647da850f1f8). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17472: [SPARK-19999]: Fix for flakey tests due to java.nio.Bits...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17472 Merged build finished. Test FAILed.
[GitHub] spark issue #17472: [SPARK-19999]: Fix for flakey tests due to java.nio.Bits...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17472 **[Test build #75372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75372/testReport)** for PR 17472 at commit [`bf7cc24`](https://github.com/apache/spark/commit/bf7cc24f213a2cf043a579846859647da850f1f8).
[GitHub] spark issue #17472: [SPARK-19999]: Fix for flakey tests due to java.nio.Bits...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17472 Clear code comments can help code reading. : )
[GitHub] spark issue #17472: [SPARK-19999]: Fix for flakey tests due to java.nio.Bits...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17472 ok to test
[GitHub] spark pull request #17472: [SPARK-19999]: Fix for flakey tests due to java.n...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17472#discussion_r108802741 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java --- @@ -46,18 +46,22 @@ private static final boolean unaligned; static { boolean _unaligned; -// use reflection to access unaligned field -try { - Class bitsClass = -Class.forName("java.nio.Bits", false, ClassLoader.getSystemClassLoader()); - Method unalignedMethod = bitsClass.getDeclaredMethod("unaligned"); - unalignedMethod.setAccessible(true); - _unaligned = Boolean.TRUE.equals(unalignedMethod.invoke(null)); -} catch (Throwable t) { - // We at least know x86 and x64 support unaligned access. - String arch = System.getProperty("os.arch", ""); - //noinspection DynamicRegexReplaceableByCompiledPattern - _unaligned = arch.matches("^(i[3-6]86|x86(_64)?|x64|amd64|aarch64)$"); +if (arch.matches("^(ppc64le | ppc64)$")) { + // Since java.nio.Bits.unaligned() doesn't return true on ppc (See JDK-8165231), but ppc64 and ppc64le support it --- End diff -- It is longer than 101 characters. It will fail the style test. You can check it in your local environment using the command: > dev/lint-scala
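Separately from the line-length comment, one detail of the diff quoted above can be checked on its own: the pattern `"^(ppc64le | ppc64)$"` contains literal spaces, and `String.matches` does not ignore whitespace in a pattern unless `Pattern.COMMENTS` is set, so as written it would match neither `ppc64le` nor `ppc64`. A minimal sketch, assuming nothing about the rest of `Platform.java`; the class `ArchPatternCheck` and its members are hypothetical names:

```java
import java.util.regex.Pattern;

final class ArchPatternCheck {
    /** Pattern exactly as it appears in the quoted diff (spaces are literal). */
    static final String WITH_SPACES = "^(ppc64le | ppc64)$";
    /** Same alternation with the spaces removed. */
    static final String WITHOUT_SPACES = "^(ppc64le|ppc64)$";

    /** Plain String.matches, as used in the diff. */
    static boolean matches(String pattern, String arch) {
        return arch.matches(pattern);
    }

    /** Same pattern compiled with COMMENTS, which makes the regex engine skip whitespace. */
    static boolean matchesIgnoringWhitespace(String pattern, String arch) {
        return Pattern.compile(pattern, Pattern.COMMENTS).matcher(arch).matches();
    }

    public static void main(String[] args) {
        for (String arch : new String[] {"ppc64le", "ppc64", "amd64"}) {
            System.out.println(arch
                + " with spaces=" + matches(WITH_SPACES, arch)
                + " without spaces=" + matches(WITHOUT_SPACES, arch));
        }
    }
}
```

Dropping the spaces is the smaller change; the `Pattern.COMMENTS` variant is shown only to make clear where the whitespace sensitivity comes from.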
[GitHub] spark pull request #17449: [SPARK-20120][SQL] spark-sql support silent mode
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17449
[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/15009#discussion_r108802255 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -735,7 +749,12 @@ object SparkSubmit extends CommandLineUtils { } try { - mainMethod.invoke(null, childArgs.toArray) + if (isSparkApp) { +val envvars = Map[String, String]() ++ sys.env +mainMethod.invoke(null, childArgs.toArray, childSparkConf, envvars.toMap) --- End diff -- In that case it might be worth it to add a check in `SparkLauncher` to throw an exception in case env variables are set, and the app is started in a thread.
[GitHub] spark issue #17449: [SPARK-20120][SQL] spark-sql support silent mode
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17449 Thanks! Merging to master.
[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/15009#discussion_r108801550
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -735,7 +749,12 @@ object SparkSubmit extends CommandLineUtils {
     }
     try {
-      mainMethod.invoke(null, childArgs.toArray)
+      if (isSparkApp) {
+        val envvars = Map[String, String]() ++ sys.env
+        mainMethod.invoke(null, childArgs.toArray, childSparkConf, envvars.toMap)
--- End diff --
Let's just remove it. @kishorvpatil
[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17415 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75367/ Test PASSed.
[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17415 Merged build finished. Test PASSed.
[GitHub] spark issue #17400: [SPARK-19981][SQL] Update output partitioning info. in P...
Github user allengeorge commented on the issue: https://github.com/apache/spark/pull/17400 I suggest the following code for `outputOrdering`:
```
override def outputOrdering: Seq[SortOrder] = child.outputOrdering.map {
  case s @ SortOrder(e, _) => s.copy(child = maybeReplaceExpr(e))
  case s => s
}
```
[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17415 **[Test build #75367 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75367/testReport)** for PR 17415 at commit [`64bf43e`](https://github.com/apache/spark/commit/64bf43e562a3c257b847502eae651a8887eaddcf).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17472: [SPARK-19999]: Fix for flakey tests due to java.nio.Bits...
Github user samelamin commented on the issue: https://github.com/apache/spark/pull/17472 @gatorsmile moved the comment per your suggestion, but to be honest, if the comment is unclear, surely the first thing someone will do is check that JIRA?
[GitHub] spark issue #16541: [SPARK-19088][SQL] Optimize sequence type deserializatio...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/16541 I sent a pr #17473.