[GitHub] spark pull request #16720: [SPARK-19387][SPARKR] Tests do not run with Spark...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/16720#discussion_r98838566 --- Diff: R/pkg/inst/tests/testthat/test_utils.R --- @@ -17,6 +17,9 @@ context("functions in utils.R") +# Ensure Spark is installed +sparkCheckInstall() --- End diff -- Ah, that's a great idea. Can you see if that works? (Unfortunately, it needs manual verification.) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16043 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72229/ Test PASSed.
[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16043 Merged build finished. Test PASSed.
[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16043 **[Test build #72229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72229/testReport)** for PR 16043 at commit [`bce45ea`](https://github.com/apache/spark/commit/bce45ea2720ba11e8a70979afe027e7f93633e33). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16761: [BackPort-2.1][SPARK-19319][SparkR]:SparkR Kmeans summar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16761 Merged build finished. Test FAILed.
[GitHub] spark issue #16761: [BackPort-2.1][SPARK-19319][SparkR]:SparkR Kmeans summar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16761 **[Test build #72233 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72233/testReport)** for PR 16761 at commit [`5eb2835`](https://github.com/apache/spark/commit/5eb2835231892054573c560c57d0ebcbca9b988a). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16761: [BackPort-2.1][SPARK-19319][SparkR]:SparkR Kmeans summar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16761 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72233/ Test FAILed.
[GitHub] spark pull request #16720: [SPARK-19387][SPARKR] Tests do not run with Spark...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16720#discussion_r98837222 --- Diff: R/pkg/inst/tests/testthat/test_utils.R --- @@ -17,6 +17,9 @@ context("functions in utils.R") +# Ensure Spark is installed +sparkCheckInstall() --- End diff -- hmm, we could put it in https://github.com/apache/spark/blob/master/R/pkg/tests/run-all.R?
[GitHub] spark pull request #16720: [SPARK-19387][SPARKR] Tests do not run with Spark...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/16720#discussion_r98837066 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -27,6 +27,9 @@ library(SparkR) We use default settings in which it runs in local mode. It auto downloads Spark package in the background if no previous installation is found. For more details about setup, see [Spark Session](#SetupSparkSession). +```{r, include=FALSE} +SparkR:::sparkCheckInstall() --- End diff -- Is the Rmd file part of the install that users see? I just don't want to put in any code that people might copy-paste. Isn't it good enough to pass in `master=local[*]` here?
[GitHub] spark issue #16736: [WIP][SPARK][SQL][Follow-up] Configurable `tableRelation...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16736 BTW, you need to update the PR title and description.
[GitHub] spark issue #16736: [WIP][SPARK][SQL][Follow-up] Configurable `tableRelation...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16736 Overall, looks good to me. cc @cloud-fan
[GitHub] spark pull request #16736: [WIP][SPARK][SQL][Follow-up] Configurable `tableR...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16736#discussion_r98836889 --- Diff: core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala --- @@ -90,6 +90,14 @@ private[spark] class TypedConfigBuilder[T]( new TypedConfigBuilder(parent, s => fn(converter(s)), stringConverter) } + /** Check that user-provided value for the config match a validator */ --- End diff -- `Check that user-provided value for the config match a validator` -> `Checks if the user-provided value for the config matches the validator`
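For readers following the Scaladoc wording above, here is a minimal sketch of what "checking a user-provided value against a validator" can look like. `SimpleTypedConfig`, its `checkValue` method, and the key `example.cache.size` are hypothetical names made up for this illustration; this is not the code from the PR, only one plausible shape under the assumption that the validator wraps the string converter.

```scala
// Hypothetical, simplified stand-in for a typed config builder.
// It is NOT Spark's TypedConfigBuilder; names and behavior are assumptions.
final class SimpleTypedConfig[T](key: String, converter: String => T) {

  /** Checks if the user-provided value for the config matches the validator. */
  def checkValue(validator: T => Boolean, errorMsg: String): SimpleTypedConfig[T] =
    new SimpleTypedConfig[T](key, { raw =>
      val value = converter(raw)
      // require throws IllegalArgumentException when the validator rejects the value
      require(validator(value), s"$key: $errorMsg")
      value
    })

  /** Converts (and, if a validator was attached, validates) a raw string. */
  def parse(raw: String): T = converter(raw)
}

object SimpleTypedConfigDemo {
  // A made-up integer setting that must not be negative.
  val cacheSize: SimpleTypedConfig[Int] =
    new SimpleTypedConfig[Int]("example.cache.size", _.trim.toInt)
      .checkValue(_ >= 0, "must not be negative")
}
```

With this sketch, `SimpleTypedConfigDemo.cacheSize.parse("1000")` returns the parsed integer, while parsing `"-1"` fails with an `IllegalArgumentException`.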
[GitHub] spark issue #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16758 Merged build finished. Test FAILed.
[GitHub] spark issue #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16758 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72230/ Test FAILed.
[GitHub] spark issue #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16758 **[Test build #72230 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72230/testReport)** for PR 16758 at commit [`8b18fa1`](https://github.com/apache/spark/commit/8b18fa1c5a457198e3b99d41aaf770c8cb11106d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16720: [SPARK-19387][SPARKR] Tests do not run with Spark...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/16720#discussion_r98836638 --- Diff: R/pkg/inst/tests/testthat/test_utils.R --- @@ -17,6 +17,9 @@ context("functions in utils.R") +# Ensure Spark is installed +sparkCheckInstall() --- End diff -- Sure, that sounds fine. I was looking to see if `testthat` had any support for writing a `setup` that gets called before each test, but it doesn't look like it has that.
[GitHub] spark pull request #16736: [WIP][SPARK][SQL][Follow-up] Configurable `tableR...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16736#discussion_r98836570 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala --- @@ -221,6 +221,10 @@ class SQLConfSuite extends QueryTest with SharedSQLContext { .sessionState.conf.warehousePath.stripSuffix("/")) } + test("default value of FILESOURCE_TABLE_RELATION_CACHE_SIZE") { + assert(spark.conf.get(StaticSQLConf.FILESOURCE_TABLE_RELATION_CACHE_SIZE) === 1000) + } --- End diff -- This test case is not needed. : )
[GitHub] spark issue #16737: [SPARK-19397] [SQL] Make option names of LIBSVM and TEXT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16737 **[Test build #72238 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72238/testReport)** for PR 16737 at commit [`c2c145d`](https://github.com/apache/spark/commit/c2c145d55c586af703228d46ee04f5e0027739e8).
[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r98836498 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.fpm + +import org.apache.hadoop.fs.Path + +import org.apache.spark.annotation.{Experimental, Since} +import org.apache.spark.ml.{Estimator, Model} +import org.apache.spark.ml.param._ +import org.apache.spark.ml.param.shared.{HasFeaturesCol, HasPredictionCol} +import org.apache.spark.ml.util._ +import org.apache.spark.mllib.fpm.{FPGrowth => MLlibFPGrowth, FPGrowthModel => MLlibFPGrowthModel} +import org.apache.spark.sql.{DataFrame, _} +import org.apache.spark.sql.functions._ +import org.apache.spark.sql.types.{ArrayType, StringType, StructType} + +/** + * Common params for FPGrowth and FPGrowthModel + */ +private[fpm] trait FPGrowthParams extends Params with HasFeaturesCol with HasPredictionCol { + + /** + * Validates and transforms the input schema. 
+ * @param schema input schema + * @return output schema + */ + protected def validateAndTransformSchema(schema: StructType): StructType = { +SchemaUtils.checkColumnType(schema, $(featuresCol), new ArrayType(StringType, false)) +SchemaUtils.appendColumn(schema, $(predictionCol), new ArrayType(StringType, false)) + } + + /** + * Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears + * more than (minSupport * size-of-the-dataset) times will be output + * Default: 0.3 + * @group param + */ + @Since("2.2.0") + val minSupport: DoubleParam = new DoubleParam(this, "minSupport", +"the minimal support level of the frequent pattern (Default: 0.3)", +ParamValidators.inRange(0.0, 1.0)) + setDefault(minSupport -> 0.3) + + /** @group getParam */ + @Since("2.2.0") + def getMinSupport: Double = $(minSupport) + + /** + * Number of partitions used by parallel FP-growth + * @group param + */ + @Since("2.2.0") + val numPartitions: IntParam = new IntParam(this, "numPartitions", +"Number of partitions used by parallel FP-growth", ParamValidators.gtEq[Int](1)) + + /** @group getParam */ + @Since("2.2.0") + def getNumPartitions: Int = $(numPartitions) + +} + +/** + * :: Experimental :: + * A parallel FP-growth algorithm to mine frequent itemsets. 
+ * + * @see [[http://dx.doi.org/10.1145/1454008.1454027 Li et al., PFP: Parallel FP-Growth for Query + * Recommendation]] + */ +@Since("2.2.0") +@Experimental +class FPGrowth @Since("2.2.0") ( +@Since("2.2.0") override val uid: String) + extends Estimator[FPGrowthModel] with FPGrowthParams with DefaultParamsWritable { + + @Since("2.2.0") + def this() = this(Identifiable.randomUID("FPGrowth")) + + /** @group setParam */ + @Since("2.2.0") + def setMinSupport(value: Double): this.type = set(minSupport, value) + + /** @group setParam */ + @Since("2.2.0") + def setNumPartitions(value: Int): this.type = set(numPartitions, value) + + /** @group setParam */ + @Since("2.2.0") + def setFeaturesCol(value: String): this.type = set(featuresCol, value) + + /** @group setParam */ + @Since("2.2.0") + def setPredictionCol(value: String): this.type = set(predictionCol, value) + + def fit(dataset: Dataset[_]): FPGrowthModel = { +val data = dataset.select($(featuresCol)).rdd.map(r => r.getSeq[String](0).toArray) +val parentModel = new MLlibFPGrowth().setMinSupport($(minSupport)).run(data) +copyValues(new FPGrowthModel(uid, parentModel)) + } + + @Since("2.2.0") + override def transformSchema(schema: StructType): StructType = { +validateAndTransformSchema(schema) + } + + override def
[GitHub] spark issue #16737: [SPARK-19397] [SQL] Make option names of LIBSVM and TEXT...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16737 I think `LIBSVM` and `TEXT` are the last two built-in sources that do not support case-insensitive option names.
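As a rough illustration of what case-insensitive option names mean for a data source, the sketch below normalizes keys to lower case on both storage and lookup. `LowerCaseKeyOptions` is a hypothetical helper written for this note; it is not Spark's `CaseInsensitiveMap`, and the option key shown is used only as an example.

```scala
import java.util.Locale

// Hypothetical helper for illustration only; not Spark's CaseInsensitiveMap.
final class LowerCaseKeyOptions(original: Map[String, String]) {

  // Keys are lower-cased once, with a fixed locale so the result does not
  // depend on the JVM's default locale.
  private val normalized: Map[String, String] =
    original.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  /** Looks up an option regardless of how the caller capitalized the key. */
  def get(key: String): Option[String] =
    normalized.get(key.toLowerCase(Locale.ROOT))
}

object LowerCaseKeyOptionsDemo {
  // The key and value here are examples, not taken from the PR.
  val options = new LowerCaseKeyOptions(Map("NumFeatures" -> "780"))
}
```

Under this sketch, `options.get("numfeatures")` and `options.get("NUMFEATURES")` both return the same value, which is the user-visible effect the PR title describes.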
[GitHub] spark pull request #16737: [SPARK-19397] [SQL] Make option names of LIBSVM a...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16737#discussion_r98836103 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMOptions.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.source.libsvm + +import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap + +/** + * Options for the Text data source. --- End diff -- uh. Thanks!
[GitHub] spark issue #16753: [SPARK-19296][SQL] Deduplicate url and table in JdbcUtil...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16753 LGTM except for two comments.
[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/12135 @holdenk Updated! Thanks for your careful checking.
[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12135 **[Test build #72237 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72237/testReport)** for PR 12135 at commit [`ccf4d8d`](https://github.com/apache/spark/commit/ccf4d8d078e733fb0a2346686e851fee95a9d582).
[GitHub] spark pull request #16753: [SPARK-19296][SQL] Deduplicate url and table in J...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16753#discussion_r98835277 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -53,33 +53,31 @@ class JdbcRelationProvider extends CreatableRelationProvider parameters: Map[String, String], df: DataFrame): BaseRelation = { val jdbcOptions = new JDBCOptions(parameters) -val url = jdbcOptions.url val table = jdbcOptions.table -val createTableOptions = jdbcOptions.createTableOptions val isTruncate = jdbcOptions.isTruncate --- End diff -- The same here.
[GitHub] spark pull request #16753: [SPARK-19296][SQL] Deduplicate url and table in J...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16753#discussion_r98835197 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala --- @@ -53,33 +53,31 @@ class JdbcRelationProvider extends CreatableRelationProvider parameters: Map[String, String], df: DataFrame): BaseRelation = { val jdbcOptions = new JDBCOptions(parameters) -val url = jdbcOptions.url val table = jdbcOptions.table --- End diff -- Maybe we do not need this either.
[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16751 Hi, @rxin. Sure, I'll try to put them in a single PR, except the ongoing one. BTW, I have noticed every time that committers have a better and broader perspective than I do. Do you have anything more in mind besides the issues mentioned in #16281 and here?
[GitHub] spark issue #16727: [SPARK-19421][ML][PySpark] Remove numClasses and numFeat...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16727 @holdenk I created another JIRA to track this issue. Thanks all for reviewing!
[GitHub] spark pull request #16720: [SPARK-19387][SPARKR] Tests do not run with Spark...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16720#discussion_r98834936 --- Diff: R/pkg/inst/tests/testthat/test_utils.R --- @@ -17,6 +17,9 @@ context("functions in utils.R") +# Ensure Spark is installed +sparkCheckInstall() --- End diff -- I understand that, but as pointed out in https://github.com/apache/spark/pull/16720#issuecomment-275832013, some tests don't need a SparkSession, and some tests will create/stop one as needed, so wouldn't having a function that does all of that just mean more complexity?
[GitHub] spark pull request #16720: [SPARK-19387][SPARKR] Tests do not run with Spark...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16720#discussion_r98834935 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -27,6 +27,9 @@ library(SparkR) We use default settings in which it runs in local mode. It auto downloads Spark package in the background if no previous installation is found. For more details about setup, see [Spark Session](#SetupSparkSession). +```{r, include=FALSE} +SparkR:::sparkCheckInstall() --- End diff -- this has `include=FALSE` so it will run but the code and output will not be included in the vignettes text
[GitHub] spark issue #16690: [SPARK-19347] ReceiverSupervisorImpl can add block to Re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16690 Merged build finished. Test PASSed.
[GitHub] spark issue #16690: [SPARK-19347] ReceiverSupervisorImpl can add block to Re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16690 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72228/ Test PASSed.
[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16762#discussion_r98834480 --- Diff: sql/core/src/test/resources/sql-tests/inputs/cross-join.sql --- @@ -33,3 +33,5 @@ create temporary view D(d, vd) as select * from nt1; -- Allowed since cross join with C is explicit select * from ((A join B on (a = b)) cross join C) join D on (a = d); +-- Cross joins with non-equal predicates +SELECT * FROM nt1 CROSS JOIN nt2 ON (nt1.k > nt2.k); --- End diff -- So far, the SQL syntax allows users to specify the join condition. The Dataset API does not allow users to do it, but users can still do it by using a filter, which will be pushed into the cross join.
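To make the equivalence described in the comment above concrete, here is a minimal plain-Python sketch (not Spark code). It shows that filtering a full cartesian product with a non-equal predicate yields the same rows as a join evaluated directly on that predicate. The table names `nt1` and `nt2` come from the test query; the row contents and the helpers `join_on`, `cross_join_then_filter` and `key_greater` are assumptions made up for illustration.

```python
from itertools import product

# Hypothetical rows for the two tables, as (k, v) pairs.
nt1 = [("one", 1), ("two", 2), ("three", 3)]
nt2 = [("one", 1), ("two", 22), ("four", 4)]


def join_on(left, right, predicate):
    """Nested-loop join: keep every (l, r) pair that satisfies the predicate."""
    return [(l, r) for l in left for r in right if predicate(l, r)]


def cross_join_then_filter(left, right, predicate):
    """Build the cartesian product first, then apply the predicate as a filter."""
    return [(l, r) for l, r in product(left, right) if predicate(l, r)]


def key_greater(l, r):
    # Mirrors the non-equal predicate nt1.k > nt2.k (string comparison here).
    return l[0] > r[0]


joined = join_on(nt1, nt2, key_greater)
filtered = cross_join_then_filter(nt1, nt2, key_greater)
```

Both strategies visit every pair of rows, which is why a join with only a non-equal predicate costs as much as a cartesian product.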
[GitHub] spark issue #16690: [SPARK-19347] ReceiverSupervisorImpl can add block to Re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16690 **[Test build #72228 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72228/testReport)** for PR 16690 at commit [`42eb540`](https://github.com/apache/spark/commit/42eb540f8617ee3dc22c81014452896eedec97a3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON parsing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16750 **[Test build #72236 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72236/testReport)** for PR 16750 at commit [`d5ab37c`](https://github.com/apache/spark/commit/d5ab37c7e1bdcd79586b802f3450bfbc7a9a8f36).
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72227/ Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build finished. Test PASSed.
[GitHub] spark pull request #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON p...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/16750#discussion_r98834088 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala --- @@ -161,12 +163,3 @@ private[csv] class CSVOptions(@transient private val parameters: CaseInsensitive settings } } - -object CSVOptions { --- End diff -- `CSVOptions` (and also `JSONOptions`) will always have to take the `timeZone` option. I don't want callers to forget to specify it when using these convenience methods. Or should I add the default timezone id to these methods?
[GitHub] spark pull request #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON p...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/16750#discussion_r98834068 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -329,7 +332,17 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { * @since 1.4.0 */ def json(jsonRDD: RDD[String]): DataFrame = { -val parsedOptions: JSONOptions = new JSONOptions(extraOptions.toMap) +val optionsWithTimeZone = { --- End diff -- The `timeZone` option is used in `JSONOptions`/`CSVOptions`, so we can't handle it the same way as `columnNameOfCorruptRecord`. I'll modify this to pass the default timezone id to `JSONOptions` and `CSVOptions`.
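The idea of passing a default timezone id into the options object can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the Spark implementation: the class `ParserOptions`, its constructor arguments, the case-insensitive key lookup and the sample timezone ids are all hypothetical.

```python
class ParserOptions:
    """Hypothetical stand-in for an options holder such as JSONOptions or CSVOptions."""

    def __init__(self, parameters, default_time_zone_id):
        # Assumption for this sketch: option keys are matched case-insensitively.
        self._parameters = {key.lower(): value for key, value in parameters.items()}
        # The default is a required constructor argument, so a caller cannot
        # forget to supply it; an explicit "timeZone" option still wins.
        self.time_zone_id = self._parameters.get("timezone", default_time_zone_id)


session_default = "America/Los_Angeles"  # assumed session-level default
explicit = ParserOptions({"timeZone": "Asia/Tokyo"}, session_default)
implicit = ParserOptions({"header": "true"}, session_default)
```

Making the default a mandatory parameter moves the "did the caller remember the timezone?" question from runtime behaviour to the call signature.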
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72227/testReport)** for PR 16620 at commit [`76961c3`](https://github.com/apache/spark/commit/76961c3ba64e19c43ebfc0b18651d68c54949edb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON p...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/16750#discussion_r98834049 --- Diff: python/pyspark/sql/readwriter.py --- @@ -297,7 +300,7 @@ def text(self, paths): def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=None, comment=None, header=None, inferSchema=None, ignoreLeadingWhiteSpace=None, ignoreTrailingWhiteSpace=None, nullValue=None, nanValue=None, positiveInf=None, -negativeInf=None, dateFormat=None, timestampFormat=None, maxColumns=None, +negativeInf=None, dateFormat=None, timestampFormat=None, timeZone=None, maxColumns=None, --- End diff -- Ah, I see, I'll move them to the end.
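The thread does not spell out why the new parameter should go at the end, but one likely reason is that inserting a keyword parameter in the middle of a signature silently changes the meaning of existing positional calls. The sketch below illustrates this with simplified, hypothetical functions (`csv_inserted`, `csv_appended`) that are not the real `DataFrameReader.csv` signature.

```python
def csv_inserted(path, date_format=None, time_zone=None, max_columns=None):
    """New parameter inserted before an existing one."""
    return {"date_format": date_format, "time_zone": time_zone, "max_columns": max_columns}


def csv_appended(path, date_format=None, max_columns=None, time_zone=None):
    """New parameter appended after all existing ones."""
    return {"date_format": date_format, "time_zone": time_zone, "max_columns": max_columns}


# A call written against the old signature (path, date_format, max_columns),
# passing everything positionally.
legacy_args = ("data.csv", "yyyy-MM-dd", 100)

shifted = csv_inserted(*legacy_args)    # 100 is now bound to time_zone
preserved = csv_appended(*legacy_args)  # 100 is still bound to max_columns
```

Appending keeps every existing positional call bound to the same parameters, at the cost of a less logical grouping in the signature.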
[GitHub] spark pull request #16720: [SPARK-19387][SPARKR] Tests do not run with Spark...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/16720#discussion_r98833756 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -27,6 +27,9 @@ library(SparkR) We use default settings in which it runs in local mode. It auto downloads Spark package in the background if no previous installation is found. For more details about setup, see [Spark Session](#SetupSparkSession). +```{r, include=FALSE} +SparkR:::sparkCheckInstall() --- End diff -- Is it ok to include a `:::` function in the vignette ?
[GitHub] spark pull request #16720: [SPARK-19387][SPARKR] Tests do not run with Spark...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/16720#discussion_r98833957 --- Diff: R/pkg/inst/tests/testthat/test_utils.R --- @@ -17,6 +17,9 @@ context("functions in utils.R") +# Ensure Spark is installed +sparkCheckInstall() --- End diff -- What I had in mind was to combine `sparkR.session` and this `sparkCheckInstall` into one function so it's easy to remember for a new test file. Any thoughts on this?
[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16762 **[Test build #72235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72235/testReport)** for PR 16762 at commit [`e4e3c9b`](https://github.com/apache/spark/commit/e4e3c9b84993ca415a1d07bcb07f20393c6fd5b4).
[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16762#discussion_r98833830 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala --- @@ -584,24 +602,37 @@ class JoinSuite extends QueryTest with SharedSQLContext { val cartesianQueries = Seq( /** The following should error out since there is no explicit cross join */ "SELECT * FROM testData inner join testData2", - "SELECT * FROM testData left outer join testData2", - "SELECT * FROM testData right outer join testData2", - "SELECT * FROM testData full outer join testData2", --- End diff -- These three queries are examples of outer joins that cannot be replaced by a cross join.
[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16762#discussion_r98833748 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala --- @@ -215,18 +215,36 @@ class JoinSuite extends QueryTest with SharedSQLContext { Row(1, null, 2, 2) :: Row(2, 2, 1, null) :: Row(2, 2, 2, 2) :: Nil) + + checkAnswer( +testData3.as("x").join(testData3.as("y"), $"x.a" > $"y.a"), --- End diff -- This is a typical example of a cartesian product, but our current detection is unable to find it.
[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r98833772 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/GetStructField2.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.planning + +import org.apache.spark.sql.catalyst.expressions.{Expression, GetStructField} +import org.apache.spark.sql.types.StructField + +/** + * A Scala extractor that extracts the child expression and struct field from a [[GetStructField]]. + * This is in contrast to the [[GetStructField]] case class extractor which returns the field + * ordinal instead of the field itself. + */ +private[planning] object GetStructField2 { --- End diff -- But we can have a better name. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r98833717 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/GetStructField2.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.planning + +import org.apache.spark.sql.catalyst.expressions.{Expression, GetStructField} +import org.apache.spark.sql.types.StructField + +/** + * A Scala extractor that extracts the child expression and struct field from a [[GetStructField]]. + * This is in contrast to the [[GetStructField]] case class extractor which returns the field + * ordinal instead of the field itself. + */ +private[planning] object GetStructField2 { --- End diff -- oh, nvm, I thought `GetStructField` is another Scala extractor. Actually it is a case class. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
[GitHub] spark issue #15279: SPARK-12347 [ML][WIP] Add a script to test Spark ML exam...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/15279 What if we have a bunch of default values when arguments are not set, and those are the values we could test with? This way the same sample code can run with and without arguments?
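The suggestion above, an example script that falls back to default values when no arguments are given, could look roughly like the following Python sketch. The option names `--k` and `--max-iter`, their defaults and the function names are hypothetical and are not taken from any existing Spark ML example.

```python
import argparse


def parse_args(argv):
    """Parse example arguments; every option has a default usable for testing."""
    parser = argparse.ArgumentParser(description="Hypothetical ML example")
    parser.add_argument("--k", type=int, default=2, help="number of clusters")
    parser.add_argument("--max-iter", type=int, default=10, help="iteration cap")
    return parser.parse_args(argv)


def run_example(argv):
    """Return the effective settings; a real example would train a model here."""
    args = parse_args(argv)
    return {"k": args.k, "max_iter": args.max_iter}


# A test script can call the example with no arguments at all ...
default_run = run_example([])
# ... while a user can still override individual values.
custom_run = run_example(["--k", "5"])
```

With this layout a test harness only needs to invoke each example without arguments, and the defaults double as the tested configuration.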
[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/16762 [SPARK-19419] [SPARK-19420] Fix the cross join detection ### What changes were proposed in this pull request? There are two issues in the existing detection of cartesian products. 1) When users use outer joins where neither side can be broadcast, Spark will still select `BroadcastNestedLoopJoin`. The CROSS JOIN syntax cannot cover the outer join scenario, but we still issue the following error message: ``` Use the CROSS JOIN syntax to allow cartesian products between these relations ``` 2) The existing detection is unable to cover all the cartesian product cases. For example, - Case 1) having non-equal predicates in the join conditions of an inner join. - Case 2) the equi-join's key columns are not sortable and neither side is small enough for broadcasting. This PR moves the cross-join detection back to `BroadcastNestedLoopJoinExec` and `CartesianProductExec`. ### How was this patch tested? Added the extra test cases. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark crossJoin Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16762.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16762 commit e4e3c9b84993ca415a1d07bcb07f20393c6fd5b4 Author: gatorsmile Date: 2017-02-01T05:47:04Z fix.
[GitHub] spark issue #16603: [SPARK-19244][Core] Sort MemoryConsumers according to th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16603 **[Test build #72234 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72234/testReport)** for PR 16603 at commit [`e41c6bd`](https://github.com/apache/spark/commit/e41c6bdee18a4ab705bd8e1373ce09f421ca91c1).
[GitHub] spark issue #16603: [SPARK-19244][Core] Sort MemoryConsumers according to th...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16603 retest this please.
[GitHub] spark issue #16761: [BackPort-2.1][SPARK-19319][SparkR]:SparkR Kmeans summar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16761 **[Test build #72233 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72233/testReport)** for PR 16761 at commit [`5eb2835`](https://github.com/apache/spark/commit/5eb2835231892054573c560c57d0ebcbca9b988a).
[GitHub] spark pull request #16761: [BackPort-2.1][SPARK-19319][SparkR]:SparkR Kmeans...
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/16761 [BackPort-2.1][SPARK-19319][SparkR]:SparkR Kmeans summary returns error when the cluster size doesn't equal to k ## What changes were proposed in this pull request? Backport fix of #1 ## How was this patch tested? Backport unit tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangmiao1981/spark kmeansport Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16761.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16761 commit 5eb2835231892054573c560c57d0ebcbca9b988a Author: wm...@hotmail.com Date: 2017-02-01T05:58:19Z backport fix
[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16739 **[Test build #72232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72232/testReport)** for PR 16739 at commit [`1bd7163`](https://github.com/apache/spark/commit/1bd7163723641bfaa107c9a20974e163eaead0a4).
[GitHub] spark issue #16666: [SPARK-19319][SparkR]:SparkR Kmeans summary returns erro...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/1 I will backport it soon. Thanks!
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72231 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72231/testReport)** for PR 16620 at commit [`db354c7`](https://github.com/apache/spark/commit/db354c79eadbfe177291e14e9d020234b7cfd1c5).
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16722 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72225/ Test PASSed.
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16722 Merged build finished. Test PASSed.
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16722 **[Test build #72225 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72225/testReport)** for PR 16722 at commit [`1db8494`](https://github.com/apache/spark/commit/1db849417179b4cfc688cf9023ff225dac16ecfd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @sethah Thanks for the clarification and for providing an implementation. So, the pro is some speed improvement and the con is the increased complexity (now we have three cases: one for intercept only, one for Gaussian with identity and one for all the others). Let's get other committers' opinions. Yes, I will throw an error for the special case of no intercept and no features.
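For context on why an intercept-only special case can be faster: with no features and no offset, the maximum-likelihood fitted mean of a GLM is the weighted mean of the labels, so the intercept is simply the link function applied to that mean and no iterative solver is needed. The sketch below illustrates this in plain Python; the function `intercept_only_fit` and the sample data are hypothetical, and this is not the code proposed in the PR.

```python
import math


def intercept_only_fit(labels, weights, link):
    """Closed-form intercept for a GLM with no features and no offset.

    The fitted mean is the weighted average of the labels; the intercept
    is the link function evaluated at that mean.
    """
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    mean = sum(w * y for w, y in zip(weights, labels)) / total_weight
    return link(mean)


labels = [1.0, 2.0, 3.0, 6.0]
weights = [1.0, 1.0, 1.0, 1.0]

# Identity link (e.g. Gaussian): the intercept is the mean itself.
identity_intercept = intercept_only_fit(labels, weights, lambda mu: mu)
# Log link (e.g. Poisson): the intercept is log(mean).
log_intercept = intercept_only_fit(labels, weights, math.log)
```

The trade-off raised in the comment remains: the shortcut saves iterations but adds one more code path to maintain.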
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build finished. Test FAILed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72226/ Test FAILed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72226 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72226/testReport)** for PR 16620 at commit [`ed1791f`](https://github.com/apache/spark/commit/ed1791fd9b6e434ec69a6c118a433e2539fed7a4). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16666: [SPARK-19319][SparkR]:SparkR Kmeans summary retur...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1
[GitHub] spark issue #16666: [SPARK-19319][SparkR]:SparkR Kmeans summary returns erro...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/1 merged, thanks! I think it'll be good to have this in branch-2.1 - @wangmiao1981 would you by any chance like to backport this fix?
[GitHub] spark issue #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16758 **[Test build #72230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72230/testReport)** for PR 16758 at commit [`8b18fa1`](https://github.com/apache/spark/commit/8b18fa1c5a457198e3b99d41aaf770c8cb11106d).
[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16740 I agree having a special case is unsatisfying from an engineering perspective. In Spark it's a bit different than R since every iteration of IRLS will launch a Spark job, making a pass over the data, so the cost of the extra iterations is much higher. We have special-cased other algorithms for this reason. It's entirely possible I'm missing something since I do not know the GLM code quite so well, and I did not thoroughly check it, but this code seemed to do the trick:

```scala
if (numFeatures == 0 && getFitIntercept) {
  val agg = dataset.agg(sum(w * col(getLabelCol)), sum(w)).first()
  val mu = agg.getDouble(0) / agg.getDouble(1)
  val diagInvAtA = (familyAndLink.family.variance(mu) * familyAndLink.link.deriv(mu)) / agg.getDouble(0)
  val model = copyValues(new GeneralizedLinearRegressionModel(uid, Vectors.zeros(0), familyAndLink.link.link(mu)).setParent(this))
  val trainingSummary = new GeneralizedLinearRegressionTrainingSummary(dataset, model, Array(diagInvAtA), 1, getSolver)
  return model.setSummary(Some(trainingSummary))
}
```

The best answer here may depend on the use cases - do we expect users to be training "intercept-only" models often? If yes, then the savings on the iteration time may be worth it. If not, it _is_ a clunky solution. We can see what others think. Also, I got some strange failures when training with no features and `fitIntercept == false`. We should just throw an error in this case and add a test for it.
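For readers following the intercept-only discussion, here is a minimal sketch of the closed-form idea outside Spark: the fitted mean is the weighted mean of the labels, and the intercept is the link function applied to that mean. The object and method names are hypothetical, and plain Scala collections stand in for the `dataset.agg(...)` call; this is an illustration under those assumptions, not the GLM code in the PR.

```scala
// Hypothetical stand-alone sketch; names are illustrative, not Spark's API.
object InterceptOnlySketch {
  /** Weighted mean of the labels: sum(w * y) / sum(w). */
  def weightedMean(labels: Seq[Double], weights: Seq[Double]): Double = {
    require(labels.nonEmpty && labels.length == weights.length,
      "labels and weights must be non-empty and of equal length")
    val sumWy = labels.zip(weights).map { case (y, w) => w * y }.sum
    val sumW = weights.sum
    require(sumW > 0.0, "weights must sum to a positive value")
    sumWy / sumW
  }

  /** Intercept of an intercept-only GLM: the link function applied to the weighted mean. */
  def intercept(labels: Seq[Double], weights: Seq[Double], link: Double => Double): Double =
    link(weightedMean(labels, weights))
}
```

For a log link, `InterceptOnlySketch.intercept(labels, weights, x => math.log(x))` returns the log of the weighted mean after a single pass over the data.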
[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16043 **[Test build #72229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72229/testReport)** for PR 16043 at commit [`bce45ea`](https://github.com/apache/spark/commit/bce45ea2720ba11e8a70979afe027e7f93633e33).
[GitHub] spark issue #16725: [SPARK-19377] [WEBUI] [CORE] Killed tasks should have th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16725 Merged build finished. Test FAILed.
[GitHub] spark issue #16725: [SPARK-19377] [WEBUI] [CORE] Killed tasks should have th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16725 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72224/ Test FAILed.
[GitHub] spark issue #16725: [SPARK-19377] [WEBUI] [CORE] Killed tasks should have th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16725 **[Test build #72224 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72224/testReport)** for PR 16725 at commit [`6206d10`](https://github.com/apache/spark/commit/6206d109b646e55223a4b162a37e70f42f4570a1). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16690: [SPARK-19347] ReceiverSupervisorImpl can add block to Re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16690 **[Test build #72228 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72228/testReport)** for PR 16690 at commit [`42eb540`](https://github.com/apache/spark/commit/42eb540f8617ee3dc22c81014452896eedec97a3).
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72227/testReport)** for PR 16620 at commit [`76961c3`](https://github.com/apache/spark/commit/76961c3ba64e19c43ebfc0b18651d68c54949edb).
[GitHub] spark issue #16603: [SPARK-19244][Core] Sort MemoryConsumers according to th...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16603 Thanks for the review @vanzin, I will leave this open for a day in case someone else also wants to review, and will commit tomorrow.
[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @sethah Thanks for your input. I can add more tests, but they would not add much since the algorithm is already covered by other tests. The analytical approach does not integrate well with the summary method. One has to derive the general formula for the standard error of the intercept, and then change the code substantially to make it work with summary. This is not an optimal solution IMO. BTW, R also fits the intercept-only model using IWLS with multiple iterations. It is just weird to have a special implementation in this case which does not integrate with the current setup. @srowen @yanboliang Please advise. Thanks.
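To make the trade-off in this thread concrete, the following is a rough sketch of iteratively reweighted least squares for an intercept-only Poisson model with a log link, written against plain Scala collections. The object name, defaults, and starting value are assumptions made for illustration; this is neither Spark's nor R's implementation. Each loop iteration corresponds to one full pass over the data, which is the cost being weighed against the closed-form shortcut.

```scala
// Hypothetical sketch: IRLS for an intercept-only Poisson model with log link.
object InterceptOnlyIrlsSketch {
  /** Returns (intercept, iterations used). Each loop iteration is one pass over the data. */
  def fit(labels: Seq[Double], weights: Seq[Double],
          tol: Double = 1e-10, maxIter: Int = 25): (Double, Int) = {
    require(labels.nonEmpty && labels.length == weights.length,
      "labels and weights must be non-empty and of equal length")
    var eta = 0.0 // assumed starting value for the intercept on the link scale
    var iter = 0
    var converged = false
    while (!converged && iter < maxIter) {
      val mu = math.exp(eta)
      var sumWz = 0.0
      var sumW = 0.0
      labels.zip(weights).foreach { case (y, w) =>
        // Poisson with log link: working weight w * mu, working response eta + (y - mu) / mu.
        val irlsWeight = w * mu
        val z = eta + (y - mu) / mu
        sumWz += irlsWeight * z
        sumW += irlsWeight
      }
      // With no features, the weighted least squares solution is a weighted mean of z.
      val next = sumWz / sumW
      converged = math.abs(next - eta) < tol
      eta = next
      iter += 1
    }
    (eta, iter)
  }
}
```

Under these assumptions the iteration settles on the log of the weighted mean of the labels, the same value the closed-form approach computes in one pass, but it needs several passes to get there.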
[GitHub] spark issue #16603: [SPARK-19244][Core] Sort MemoryConsumers according to th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16603 Merged build finished. Test FAILed.
[GitHub] spark issue #16603: [SPARK-19244][Core] Sort MemoryConsumers according to th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16603 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72221/ Test FAILed.
[GitHub] spark issue #16603: [SPARK-19244][Core] Sort MemoryConsumers according to th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16603 **[Test build #72221 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72221/testReport)** for PR 16603 at commit [`e41c6bd`](https://github.com/apache/spark/commit/e41c6bdee18a4ab705bd8e1373ce09f421ca91c1). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72226 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72226/testReport)** for PR 16620 at commit [`ed1791f`](https://github.com/apache/spark/commit/ed1791fd9b6e434ec69a6c118a433e2539fed7a4).
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16722 **[Test build #72225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72225/testReport)** for PR 16722 at commit [`1db8494`](https://github.com/apache/spark/commit/1db849417179b4cfc688cf9023ff225dac16ecfd).
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16722 jenkins retest this please
[GitHub] spark issue #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16758 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72223/ Test FAILed.
[GitHub] spark issue #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16758 Merged build finished. Test FAILed.
[GitHub] spark issue #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16758 **[Test build #72223 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72223/testReport)** for PR 16758 at commit [`3dc0353`](https://github.com/apache/spark/commit/3dc035369be942ffa4c0630c3a23ea8e32a107e1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16722 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72220/ Test FAILed.
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16722 Merged build finished. Test FAILed.
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16722 **[Test build #72220 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72220/testReport)** for PR 16722 at commit [`1db8494`](https://github.com/apache/spark/commit/1db849417179b4cfc688cf9023ff225dac16ecfd). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16620: [SPARK-19263] DAGScheduler should avoid sending c...
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16620#discussion_r98819685 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1212,8 +1223,9 @@ class DAGScheduler( clearCacheLocs() - if (!shuffleStage.isAvailable) { -// Some tasks had failed; let's resubmit this shuffleStage + if (!shuffleStage.isAvailable && noActiveTaskSetManager) { --- End diff -- Hrmm... yes, @squito, we shouldn't go to the else branch when the shuffleStage is not available but an active `TaskSetManager` exists.
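The concern in this comment can be pictured as a three-way decision rather than a two-way `if`/`else`. The sketch below uses hypothetical names (`Action`, `decide`, and the three case objects are invented for illustration) and only models the branching on the two booleans from the diff; it is not the DAGScheduler code and makes no claim about what the real branches do.

```scala
// Hypothetical names; a sketch of the three-way decision, not DAGScheduler itself.
object ShuffleStageDecisionSketch {
  sealed trait Action
  case object Resubmit extends Action           // outputs missing and no task set is still active
  case object WaitForActiveTasks extends Action // outputs missing but a task set is still active
  case object TreatAsComplete extends Action    // all outputs are available

  def decide(isAvailable: Boolean, noActiveTaskSetManager: Boolean): Action =
    if (isAvailable) TreatAsComplete
    else if (noActiveTaskSetManager) Resubmit
    else WaitForActiveTasks
}
```

With a plain `if (!isAvailable && noActiveTaskSetManager) ... else ...`, the `WaitForActiveTasks` case would fall into the same branch as `TreatAsComplete`, which is the situation the comment says should be avoided.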
[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r98819150 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/GetStructField2.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.planning + +import org.apache.spark.sql.catalyst.expressions.{Expression, GetStructField} +import org.apache.spark.sql.types.StructField + +/** + * A Scala extractor that extracts the child expression and struct field from a [[GetStructField]]. + * This is in contrast to the [[GetStructField]] case class extractor which returns the field + * ordinal instead of the field itself. + */ +private[planning] object GetStructField2 { --- End diff -- What do you mean by combining it with the existing case class extractor?
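As background for the question above, the difference between a compiler-generated case class extractor and a hand-written extractor object can be sketched with toy types. `Field`, `Struct`, `GetField`, and `GetField2` below are simplified stand-ins invented for this illustration; they are not Spark's Catalyst classes and do not reproduce the body of `GetStructField2` from the PR.

```scala
// Toy stand-ins for illustration; these are not Spark's Catalyst classes.
object ExtractorSketch {
  case class Field(name: String)
  case class Struct(fields: Vector[Field])

  // The compiler-generated extractor for this case class yields (child, ordinal).
  case class GetField(child: Struct, ordinal: Int)

  // A hand-written extractor that yields (child, field) instead of (child, ordinal).
  object GetField2 {
    def unapply(g: GetField): Option[(Struct, Field)] =
      g.child.fields.lift(g.ordinal).map(f => (g.child, f))
  }

  def describe(g: GetField): String = g match {
    case GetField2(_, Field(name)) => s"field $name"
    case GetField(_, ordinal)      => s"ordinal $ordinal (out of range)"
  }
}
```

Both patterns match the same value; they differ only in what they hand back to the pattern, which is the contrast the Scaladoc in the diff describes.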
[GitHub] spark issue #16536: [SPARK-19163][PYTHON][SQL] Delay _judf initialization to...
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/16536 Thanks a bunch @holdenk
[GitHub] spark issue #16727: [SPARK-19336][FollowUp][ML][PySpark] Remove numClasses a...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/16727 The original PR was merged a few days ago; would you be OK with making a new JIRA just to avoid confusion? You can make it related to the previous JIRA. But otherwise looks good to me.
[GitHub] spark issue #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16758 Merged build finished. Test FAILed.
[GitHub] spark issue #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16758 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72217/ Test FAILed.
[GitHub] spark issue #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16758 **[Test build #72217 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72217/testReport)** for PR 16758 at commit [`59c229b`](https://github.com/apache/spark/commit/59c229b0be934b643950e40ac75051f22b756c93). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16536: [SPARK-19163][PYTHON][SQL] Delay _judf initializa...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16536
[GitHub] spark issue #16536: [SPARK-19163][PYTHON][SQL] Delay _judf initialization to...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/16536 Going to go ahead and merge. Still need to sort out the JIRA permissions, so it will take a bit for me to get that updated for you.
[GitHub] spark issue #16725: [SPARK-19377] [WEBUI] [CORE] Killed tasks should have th...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16725 LGTM pending tests
[GitHub] spark issue #16725: [SPARK-19377] [WEBUI] [CORE] Killed tasks should have th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16725 **[Test build #72224 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72224/testReport)** for PR 16725 at commit [`6206d10`](https://github.com/apache/spark/commit/6206d109b646e55223a4b162a37e70f42f4570a1).
[GitHub] spark issue #16725: [SPARK-19377] [WEBUI] [CORE] Killed tasks should have th...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16725 Jenkins test this please