[GitHub] spark issue #23049: [SPARK-26076][Build][Minor] Revise ambiguous error messa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23049 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98899/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23049: [SPARK-26076][Build][Minor] Revise ambiguous error messa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23049 Merged build finished. Test PASSed.
[GitHub] spark issue #23049: [SPARK-26076][Build][Minor] Revise ambiguous error messa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23049 **[Test build #98899 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98899/testReport)** for PR 23049 at commit [`daf5e33`](https://github.com/apache/spark/commit/daf5e33f14f28fa28e85a703fbd3acc08075fd1b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23045: [SPARK-26071][SQL] disallow map as map key
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23045 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98901/ Test FAILed.
[GitHub] spark issue #23045: [SPARK-26071][SQL] disallow map as map key
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23045 Merged build finished. Test FAILed.
[GitHub] spark issue #23045: [SPARK-26071][SQL] disallow map as map key
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23045 **[Test build #98901 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98901/testReport)** for PR 23045 at commit [`574308e`](https://github.com/apache/spark/commit/574308e8f4c23f9549c647178709c7c85d4d2fc7).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23056 **[Test build #98905 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98905/testReport)** for PR 23056 at commit [`2759521`](https://github.com/apache/spark/commit/2759521df7f2dffc9ddb9379e0b1dac6721da366).
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23056 Merged build finished. Test PASSed.
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23056 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5073/ Test PASSed.
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23056 **[Test build #98903 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98903/testReport)** for PR 23056 at commit [`2759521`](https://github.com/apache/spark/commit/2759521df7f2dffc9ddb9379e0b1dac6721da366).
[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22138 **[Test build #98904 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98904/testReport)** for PR 22138 at commit [`fd4ff83`](https://github.com/apache/spark/commit/fd4ff833b6c2b5889d55ee4053970b56ee2b273d).
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23056 retest this please
[GitHub] spark issue #23038: [SPARK-25451][CORE][WEBUI]Aggregated metrics table doesn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23038 Merged build finished. Test PASSed.
[GitHub] spark issue #23038: [SPARK-25451][CORE][WEBUI]Aggregated metrics table doesn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23038 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98894/ Test PASSed.
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23056 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98898/ Test FAILed.
[GitHub] spark issue #23038: [SPARK-25451][CORE][WEBUI]Aggregated metrics table doesn...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23038 **[Test build #98894 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98894/testReport)** for PR 23038 at commit [`805ebb8`](https://github.com/apache/spark/commit/805ebb8e6b103cbc0688da64ec27841a1491039f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23056 **[Test build #98898 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98898/testReport)** for PR 23056 at commit [`2759521`](https://github.com/apache/spark/commit/2759521df7f2dffc9ddb9379e0b1dac6721da366).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23056 Merged build finished. Test FAILed.
[GitHub] spark issue #23025: [SPARK-26024][SQL]: Update documentation for repartition...
Github user JulienPeloton commented on the issue: https://github.com/apache/spark/pull/23025 @viirya OK, all references to SPARK-26024 have been removed from the doc.
[GitHub] spark issue #23038: [SPARK-25451][CORE][WEBUI]Aggregated metrics table doesn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23038 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98895/ Test PASSed.
[GitHub] spark issue #23038: [SPARK-25451][CORE][WEBUI]Aggregated metrics table doesn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23038 Merged build finished. Test PASSed.
[GitHub] spark issue #23038: [SPARK-25451][CORE][WEBUI]Aggregated metrics table doesn...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23038 **[Test build #98895 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98895/testReport)** for PR 23038 at commit [`7c3a80b`](https://github.com/apache/spark/commit/7c3a80bce0a45131091ce11e80a939e9de6ebf50).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #23025: [SPARK-26024][SQL]: Update documentation for repa...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/23025#discussion_r234103200
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2789,6 +2789,12 @@ class Dataset[T] private[sql](
    * When no explicit sort order is specified, "ascending nulls first" is assumed.
    * Note, the rows are not sorted in each partition of the resulting Dataset.
    *
+   *
+   * [SPARK-26024] Note that due to performance reasons this method uses sampling to
--- End diff --
ditto.
[GitHub] spark issue #23025: [SPARK-26024][SQL]: Update documentation for repartition...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/23025 cc @cloud-fan
[GitHub] spark pull request #23025: [SPARK-26024][SQL]: Update documentation for repa...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/23025#discussion_r234103194
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -732,6 +732,11 @@ def repartitionByRange(self, numPartitions, *cols):
        At least one partition-by expression must be specified.
        When no explicit sort order is specified, "ascending nulls first" is assumed.
+
+       [SPARK-26024] Note that due to performance reasons this method uses sampling to
--- End diff --
"[SPARK-26024]" can be removed too.
[GitHub] spark issue #23038: [SPARK-25451][CORE][WEBUI]Aggregated metrics table doesn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23038 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98893/ Test PASSed.
[GitHub] spark issue #23038: [SPARK-25451][CORE][WEBUI]Aggregated metrics table doesn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23038 Merged build finished. Test PASSed.
[GitHub] spark issue #23038: [SPARK-25451][CORE][WEBUI]Aggregated metrics table doesn...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23038 **[Test build #98893 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98893/testReport)** for PR 23038 at commit [`0d92185`](https://github.com/apache/spark/commit/0d921852045fdca3a528fa807fbd229076b52746).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pyspark.me...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23055 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98892/ Test PASSed.
[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pyspark.me...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23055 Merged build finished. Test PASSed.
[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pyspark.me...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23055 **[Test build #98892 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98892/testReport)** for PR 23055 at commit [`2d3315a`](https://github.com/apache/spark/commit/2d3315a7dab429abc4d9ef5ed7f8f5484e8421f1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #23047: [BACKPORT][SPARK-25883][SQL][MINOR] Override meth...
Github user gengliangwang closed the pull request at: https://github.com/apache/spark/pull/23047
[GitHub] spark pull request #23025: [SPARK-26024][SQL]: Update documentation for repa...
Github user JulienPeloton commented on a diff in the pull request: https://github.com/apache/spark/pull/23025#discussion_r234099934
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2813,6 +2819,11 @@ class Dataset[T] private[sql](
    * When no explicit sort order is specified, "ascending nulls first" is assumed.
    * Note, the rows are not sorted in each partition of the resulting Dataset.
    *
+   * [SPARK-26024] Note that due to performance reasons this method uses sampling to
--- End diff --
Thanks. Done.
[GitHub] spark pull request #23025: [SPARK-26024][SQL]: Update documentation for repa...
Github user JulienPeloton commented on a diff in the pull request: https://github.com/apache/spark/pull/23025#discussion_r234099956
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2813,6 +2819,11 @@ class Dataset[T] private[sql](
    * When no explicit sort order is specified, "ascending nulls first" is assumed.
    * Note, the rows are not sorted in each partition of the resulting Dataset.
    *
+   * [SPARK-26024] Note that due to performance reasons this method uses sampling to
+   * estimate the ranges. Hence, the output may not be consistent, since sampling can return
+   * different values. The sample size can be controlled by setting the value of the parameter
+   * {{spark.sql.execution.rangeExchange.sampleSizePerPartition}}.
--- End diff --
Thanks. Done.
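The doc note quoted above warns that `repartitionByRange` estimates partition boundaries from a random sample, so repeated runs can place the same row in different partitions. A minimal, self-contained Python sketch of that idea (the helper name is hypothetical; Spark's real logic lives in its JVM-side `RangePartitioner` and the sample size is governed by `spark.sql.execution.rangeExchange.sampleSizePerPartition`):

```python
import random

def sample_range_bounds(rows, num_partitions, sample_size, seed):
    """Estimate range-partition cut points from a random sample.

    Illustrative only, not Spark's actual code: boundaries are derived
    from a sample, so they vary with the sample that happens to be drawn.
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(rows, min(sample_size, len(rows))))
    # Take num_partitions - 1 cut points at even quantiles of the sample.
    step = len(sample) / num_partitions
    return [sample[int(step * i)] for i in range(1, num_partitions)]

rows = list(range(10_000))
b1 = sample_range_bounds(rows, num_partitions=4, sample_size=100, seed=1)
b2 = sample_range_bounds(rows, num_partitions=4, sample_size=100, seed=2)
# Different seeds generally yield different cut points, hence
# potentially different row-to-partition assignments between runs.
print(b1, b2)
```

A larger sample size tightens the estimate (at some cost), which is why the doc change points readers at the sample-size knob rather than promising deterministic output.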
[GitHub] spark issue #23031: [SPARK-26060][SQL] Track SparkConf entries and make SET ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23031 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98896/ Test PASSed.
[GitHub] spark issue #23031: [SPARK-26060][SQL] Track SparkConf entries and make SET ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23031 Merged build finished. Test PASSed.
[GitHub] spark pull request #21838: [SPARK-24811][SQL]Avro: add new function from_avr...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/21838#discussion_r234099158
--- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroCatalystDataConversionSuite.scala ---
@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.avro
+
+import org.apache.avro.Schema
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.{AvroDataToCatalyst, CatalystDataToAvro, RandomDataGenerator}
+import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
+import org.apache.spark.sql.catalyst.expressions.{ExpressionEvalHelper, GenericInternalRow, Literal}
+import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, GenericArrayData, MapData}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+class AvroCatalystDataConversionSuite extends SparkFunSuite with ExpressionEvalHelper {
+
+  private def roundTripTest(data: Literal): Unit = {
+    val avroType = SchemaConverters.toAvroType(data.dataType, data.nullable)
+    checkResult(data, avroType.toString, data.eval())
+  }
+
+  private def checkResult(data: Literal, schema: String, expected: Any): Unit = {
+    checkEvaluation(
+      AvroDataToCatalyst(CatalystDataToAvro(data), schema),
+      prepareExpectedResult(expected))
+  }
+
+  private def assertFail(data: Literal, schema: String): Unit = {
+    intercept[java.io.EOFException] {
+      AvroDataToCatalyst(CatalystDataToAvro(data), schema).eval()
+    }
+  }
+
+  private val testingTypes = Seq(
+    BooleanType,
+    ByteType,
+    ShortType,
+    IntegerType,
+    LongType,
+    FloatType,
+    DoubleType,
+    DecimalType(8, 0),   // 32 bits decimal without fraction
+    DecimalType(8, 4),   // 32 bits decimal
+    DecimalType(16, 0),  // 64 bits decimal without fraction
+    DecimalType(16, 11), // 64 bits decimal
+    DecimalType(38, 0),
+    DecimalType(38, 38),
+    StringType,
+    BinaryType)
+
+  protected def prepareExpectedResult(expected: Any): Any = expected match {
+    // Spark decimal is converted to avro string
+    case d: Decimal => UTF8String.fromString(d.toString)
+    // Spark byte and short both map to avro int
+    case b: Byte => b.toInt
+    case s: Short => s.toInt
+    case row: GenericInternalRow => InternalRow.fromSeq(row.values.map(prepareExpectedResult))
+    case array: GenericArrayData => new GenericArrayData(array.array.map(prepareExpectedResult))
+    case map: MapData =>
+      val keys = new GenericArrayData(
+        map.keyArray().asInstanceOf[GenericArrayData].array.map(prepareExpectedResult))
+      val values = new GenericArrayData(
+        map.valueArray().asInstanceOf[GenericArrayData].array.map(prepareExpectedResult))
+      new ArrayBasedMapData(keys, values)
+    case other => other
+  }
+
+  testingTypes.foreach { dt =>
+    val seed = scala.util.Random.nextLong()
+    test(s"single $dt with seed $seed") {
+      val rand = new scala.util.Random(seed)
+      val data = RandomDataGenerator.forType(dt, rand = rand).get.apply()
+      val converter = CatalystTypeConverters.createToCatalystConverter(dt)
+      val input = Literal.create(converter(data), dt)
+      roundTripTest(input)
+    }
+  }
+
+  for (_ <- 1 to 5) {
--- End diff --
Why not `(1 to 5).foreach`?
[GitHub] spark issue #23031: [SPARK-26060][SQL] Track SparkConf entries and make SET ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23031 **[Test build #98896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98896/testReport)** for PR 23031 at commit [`336a331`](https://github.com/apache/spark/commit/336a331fdc817566c7fd09e5b36d5de24379c5b6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23054 **[Test build #98902 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98902/testReport)** for PR 23054 at commit [`42e32ad`](https://github.com/apache/spark/commit/42e32adda2da3717161fe5f8aa40febc1f32465e).
[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23054 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5072/ Test PASSed.
[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23054 Merged build finished. Test PASSed.
[GitHub] spark issue #23045: [SPARK-26071][SQL] disallow map as map key
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23045 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5071/ Test PASSed.
[GitHub] spark issue #23045: [SPARK-26071][SQL] disallow map as map key
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23045 Merged build finished. Test PASSed.
[GitHub] spark issue #23045: [SPARK-26071][SQL] disallow map as map key
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23045 **[Test build #98901 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98901/testReport)** for PR 23045 at commit [`574308e`](https://github.com/apache/spark/commit/574308e8f4c23f9549c647178709c7c85d4d2fc7).
[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23054 Merged build finished. Test PASSed.
[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23054 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98891/ Test PASSed.
[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/23054 Ok. Let me update the migration guide.
[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23054 **[Test build #98891 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98891/testReport)** for PR 23054 at commit [`c7bbe91`](https://github.com/apache/spark/commit/c7bbe91519aec116ae2c2f449f518f59cc49c7c0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/te...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/23056#discussion_r234093063
--- Diff: python/pyspark/testing/mllibutils.py ---
@@ -0,0 +1,44 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import sys
+
+if sys.version_info[:2] <= (2, 6):
+    try:
+        import unittest2 as unittest
+    except ImportError:
+        sys.stderr.write('Please install unittest2 to test with Python 2.6 or earlier')
+        sys.exit(1)
+else:
+    import unittest
--- End diff --
Yeah, I wondered about that but thought it might be better to do in a followup.
[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23054 makes sense to me. This is a behavior change, right? Shall we write a migration guide?
[GitHub] spark pull request #23044: [SPARK-26073][SQL][FOLLOW-UP] remove invalid comm...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23044
[GitHub] spark pull request #23042: [SPARK-26070][SQL] add rule for implicit type coe...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23042#discussion_r234091858
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -138,6 +138,11 @@ object TypeCoercion {
     case (DateType, TimestampType) =>
       if (conf.compareDateTimestampInTimestamp) Some(TimestampType) else Some(StringType)
+    // to support a popular use case of tables using Decimal(X, 0) for long IDs instead of strings
+    // see SPARK-26070 for more details
+    case (n: DecimalType, s: StringType) if n.scale == 0 => Some(DecimalType(n.precision, n.scale))
--- End diff --
CC @gatorsmile @mgaido91 I think it's time to look at the SQL standard and other mainstream databases, and see how we should update the type coercion rules with safe mode. What do you think?
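The rule under discussion coerces the string side of a comparison to the decimal side's type when the decimal is `Decimal(X, 0)`. A short Python illustration (using the stdlib `decimal` module, not Spark code) of why the common fallback of widening both sides to double would be unsafe for long numeric IDs:

```python
from decimal import Decimal

# A long ID stored as DECIMAL(p, 0); the comparison literal arrives as a string.
id_decimal = Decimal("9007199254740993")  # 2**53 + 1
id_string = "9007199254740993"

# Widening both sides to double loses precision: 2**53 + 1 is not
# representable as a float, so a *different* ID compares equal.
assert float(id_decimal) == float("9007199254740992")

# Coercing the string side to decimal keeps the comparison exact.
assert Decimal(id_string) == id_decimal
assert Decimal("9007199254740992") != id_decimal
```

This is the "Decimal(X, 0) for long IDs" use case the quoted code comment refers to: exact decimal-to-decimal comparison avoids the silent ID collisions a double-based comparison can produce.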
[GitHub] spark issue #23044: [SPARK-26073][SQL][FOLLOW-UP] remove invalid comment as ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23044 thanks, merging to master!
[GitHub] spark pull request #23046: [SPARK-23207][SQL][FOLLOW-UP] Use `SQLConf.get.en...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23046
[GitHub] spark issue #23046: [SPARK-23207][SQL][FOLLOW-UP] Use `SQLConf.get.enableRad...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23046 thanks, merging to master/2.4!
[GitHub] spark pull request #23046: [SPARK-23207][SQL][FOLLOW-UP] Use `SQLConf.get.en...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23046#discussion_r234088968
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala ---
@@ -280,7 +280,7 @@ object ShuffleExchangeExec {
     }
     // The comparator for comparing row hashcode, which should always be Integer.
     val prefixComparator = PrefixComparators.LONG
-    val canUseRadixSort = SparkEnv.get.conf.get(SQLConf.RADIX_SORT_ENABLED)
+    val canUseRadixSort = SQLConf.get.enableRadixSort
--- End diff --
It's a small bug fix, so no need to backport to all the branches. I think 2.4 is good enough.
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22309 adding @liancheng BTW. IIRC, he took a look at this one before and abandoned the change (correct me if I'm remembering this wrongly).
[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234086569 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true) // each python worker gets an equal part of the allocation. the worker pool will grow to the // number of concurrent tasks, which is determined by the number of cores in this executor. - private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY) + private val memoryMb = if (Utils.isWindows) { --- End diff -- > JVM could set the request This is handled in the JVM so it wouldn't break. `worker` itself is strongly coupled to the JVM. You mean the case when the client is on a Windows machine and it uses a Unix-based cluster, right? I think this is what the fix already does - the `PythonRunner`s are already created at the executor side and it wouldn't be affected when the client is on Windows.
[GitHub] spark issue #23049: [SPARK-26076][Build][Minor] Revise ambiguous error messa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23049 **[Test build #98900 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98900/testReport)** for PR 23049 at commit [`3269862`](https://github.com/apache/spark/commit/3269862c0b80bb7c546e9d45fd5fd4aa17aa1c7e).
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r234085471

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -373,6 +383,32 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), path :: Nil)

+      case t if isValueClass(t) =>
+        val (_, underlyingType) = getUnderlyingParameterOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = getUnerasedClassNameFromType(t)
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        // Nested value class is treated as its underlying type
+        // because the compiler will convert value class in the schema to
+        // its underlying type.
+        // However, for value class that is top-level or array element,
+        // if it is used as another type (e.g. as its parent trait or generic),
+        // the compiler keeps the class so we must provide an instance of the
+        // class too. In other cases, the compiler will handle wrapping/unwrapping
+        // for us automatically.
+        val arg = deserializerFor(underlyingType, path, newTypePath, Some(t))
+        val isCollectionElement = lastType.exists { lt =>
+          lt <:< localTypeOf[Array[_]] || lt <:< localTypeOf[Seq[_]]

--- End diff --

I added the support for Map
[GitHub] spark issue #23049: [SPARK-26076][Build][Minor] Revise ambiguous error messa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23049 Merged build finished. Test PASSed.
[GitHub] spark issue #23049: [SPARK-26076][Build][Minor] Revise ambiguous error messa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23049 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5070/ Test PASSed.
[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234084002 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true) // each python worker gets an equal part of the allocation. the worker pool will grow to the // number of concurrent tasks, which is determined by the number of cores in this executor. - private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY) + private val memoryMb = if (Utils.isWindows) { --- End diff -- I mean that it is brittle to try to use `resource` if the JVM has set the property. You handle the `ImportError`, but the JVM could set the request and Python would break again. I think that this should not be entirely disabled on Windows. Resource requests to YARN or other schedulers should include this memory. The only feature that should be disabled is the resource limiting on the python side.
[GitHub] spark issue #23049: [SPARK-26076][Build][Minor] Revise ambiguous error messa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23049 Merged build finished. Test PASSed.
[GitHub] spark issue #23049: [SPARK-26076][Build][Minor] Revise ambiguous error messa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23049 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5069/ Test PASSed.
[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234081475 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true) // each python worker gets an equal part of the allocation. the worker pool will grow to the // number of concurrent tasks, which is determined by the number of cores in this executor. - private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY) + private val memoryMb = if (Utils.isWindows) { --- End diff -- I see. I think the point of view is a bit different. What I was trying to do is this: we declare this configuration unsupported on Windows, meaning we disable this configuration on Windows from the start, on the JVM side - because it's the JVM that launches Python workers. So, I was trying to leave the control to the JVM. > It seems brittle to disable this on the JVM side and rely on it here. Can we also set a flag in the ImportError case and also check that here? However, in a way, it's a bit odd to say it's brittle because we're already relying on that.
[GitHub] spark issue #23049: [SPARK-26076][Build][Minor] Revise ambiguous error messa...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/23049 Hi @vanzin , thanks for pointing it out! I have updated the script and PR description.
[GitHub] spark issue #23049: [SPARK-26076][Build][Minor] Revise ambiguous error messa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23049 **[Test build #98899 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98899/testReport)** for PR 23049 at commit [`daf5e33`](https://github.com/apache/spark/commit/daf5e33f14f28fa28e85a703fbd3acc08075fd1b).
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23056 **[Test build #98898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98898/testReport)** for PR 23056 at commit [`2759521`](https://github.com/apache/spark/commit/2759521df7f2dffc9ddb9379e0b1dac6721da366).
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23056 Merged build finished. Test PASSed.
[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pyspark.me...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/23055 Thanks for fixing this so quickly, @HyukjinKwon! I'd like a couple of changes, but overall it is going in the right direction. We should also plan on porting this to the 2.4 branch when it is committed since it is a regression.
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23056 retest this please
[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234080578 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true) // each python worker gets an equal part of the allocation. the worker pool will grow to the // number of concurrent tasks, which is determined by the number of cores in this executor. - private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY) + private val memoryMb = if (Utils.isWindows) { --- End diff -- I don't think this is necessary. If `resource` can't be imported for any reason, then memory will not be limited in python. But the JVM side shouldn't be what determines whether that happens. The JVM should do everything the same way -- even requesting memory from schedulers like YARN because that space should still be allocated as python memory, even if python can't self-limit.
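[Editor's note] rdblue's point above — the scheduler should still be asked for the Python memory even when the Python-side limiting is disabled — can be sketched as follows. Helper and parameter names here are hypothetical illustrations, not Spark's actual resource-request code.

```python
def container_request_mb(executor_memory_mb, memory_overhead_mb, pyspark_memory_mb):
    """Total memory to request from YARN/Kubernetes for one executor.

    The pyspark memory is always included in the request: even if the
    Python worker cannot self-limit (no `resource` module, e.g. on
    Windows), the workers still consume that memory, so the scheduler
    must reserve it either way.
    """
    return executor_memory_mb + memory_overhead_mb + pyspark_memory_mb


# e.g. 4 GiB heap + 384 MiB overhead + 1 GiB pyspark memory -> 5504 MiB
print(container_request_mb(4096, 384, 1024))
```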
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23056 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5068/ Test PASSed.
[GitHub] spark pull request #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/te...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23056#discussion_r234080468

--- Diff: python/pyspark/mllib/tests/test_linalg.py ---
@@ -0,0 +1,642 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import sys
+import array as pyarray
+
+from numpy import array, array_equal, zeros, arange, tile, ones, inf
+from numpy import sum as array_sum
+
+if sys.version_info[:2] <= (2, 6):
+    try:
+        import unittest2 as unittest
+    except ImportError:
+        sys.stderr.write('Please install unittest2 to test with Python 2.6 or earlier')
+        sys.exit(1)
+else:
+    import unittest
+
+import pyspark.ml.linalg as newlinalg
+from pyspark.mllib.linalg import Vector, SparseVector, DenseVector, VectorUDT, _convert_to_vector, \
+    DenseMatrix, SparseMatrix, Vectors, Matrices, MatrixUDT
+from pyspark.mllib.regression import LabeledPoint
+from pyspark.testing.mllibutils import make_serializer, MLlibTestCase
+
+_have_scipy = False
+try:
+    import scipy.sparse
+    _have_scipy = True
+except:
+    # No SciPy, but that's okay, we'll skip those tests
+    pass
+
+
+ser = make_serializer()
+
+
+def _squared_distance(a, b):
+    if isinstance(a, Vector):
+        return a.squared_distance(b)
+    else:
+        return b.squared_distance(a)
+
+
+class VectorTests(MLlibTestCase):
+
+    def _test_serialize(self, v):
+        self.assertEqual(v, ser.loads(ser.dumps(v)))
+        jvec = self.sc._jvm.org.apache.spark.mllib.api.python.SerDe.loads(bytearray(ser.dumps(v)))
+        nv = ser.loads(bytes(self.sc._jvm.org.apache.spark.mllib.api.python.SerDe.dumps(jvec)))
+        self.assertEqual(v, nv)
+        vs = [v] * 100
+        jvecs = self.sc._jvm.org.apache.spark.mllib.api.python.SerDe.loads(bytearray(ser.dumps(vs)))
+        nvs = ser.loads(bytes(self.sc._jvm.org.apache.spark.mllib.api.python.SerDe.dumps(jvecs)))
+        self.assertEqual(vs, nvs)
+
+    def test_serialize(self):
+        self._test_serialize(DenseVector(range(10)))
+        self._test_serialize(DenseVector(array([1., 2., 3., 4.])))
+        self._test_serialize(DenseVector(pyarray.array('d', range(10))))
+        self._test_serialize(SparseVector(4, {1: 1, 3: 2}))
+        self._test_serialize(SparseVector(3, {}))
+        self._test_serialize(DenseMatrix(2, 3, range(6)))
+        sm1 = SparseMatrix(
+            3, 4, [0, 2, 2, 4, 4], [1, 2, 1, 2], [1.0, 2.0, 4.0, 5.0])
+        self._test_serialize(sm1)
+
+    def test_dot(self):
+        sv = SparseVector(4, {1: 1, 3: 2})
+        dv = DenseVector(array([1., 2., 3., 4.]))
+        lst = DenseVector([1, 2, 3, 4])
+        mat = array([[1., 2., 3., 4.],
+                     [1., 2., 3., 4.],
+                     [1., 2., 3., 4.],
+                     [1., 2., 3., 4.]])
+        arr = pyarray.array('d', [0, 1, 2, 3])
+        self.assertEqual(10.0, sv.dot(dv))
+        self.assertTrue(array_equal(array([3., 6., 9., 12.]), sv.dot(mat)))
+        self.assertEqual(30.0, dv.dot(dv))
+        self.assertTrue(array_equal(array([10., 20., 30., 40.]), dv.dot(mat)))
+        self.assertEqual(30.0, lst.dot(dv))
+        self.assertTrue(array_equal(array([10., 20., 30., 40.]), lst.dot(mat)))
+        self.assertEqual(7.0, sv.dot(arr))
+
+    def test_squared_distance(self):
+        sv = SparseVector(4, {1: 1, 3: 2})
+        dv = DenseVector(array([1., 2., 3., 4.]))
+        lst = DenseVector([4, 3, 2, 1])
+        lst1 = [4, 3, 2, 1]
+        arr = pyarray.array('d', [0, 2, 1, 3])
+        narr = array([0, 2, 1, 3])
+        self.assertEqual(15.0, _squared_distance(sv, dv))
+        self.assertEqual(25.0, _squared_distance(sv, lst))
+        self.assertEqual(20.0, _squared_distance(dv, lst))
+        self.assertEqual(15.0, _squared_distance(dv, sv))
+
[GitHub] spark pull request #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/te...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23056#discussion_r234080249

--- Diff: python/pyspark/testing/mllibutils.py ---
@@ -0,0 +1,44 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import sys
+
+if sys.version_info[:2] <= (2, 6):
+    try:
+        import unittest2 as unittest
+    except ImportError:
+        sys.stderr.write('Please install unittest2 to test with Python 2.6 or earlier')
+        sys.exit(1)
+else:
+    import unittest

--- End diff --

@BryanCutler, actually I'd remove this because we dropped 2.6 support while we are here. I'm pretty sure we can just import unittest.
[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234080290

--- Diff: python/pyspark/worker.py ---
@@ -268,9 +272,11 @@ def main(infile, outfile):
         # set up memory limits
         memory_limit_mb = int(os.environ.get('PYSPARK_EXECUTOR_MEMORY_MB', "-1"))
-        total_memory = resource.RLIMIT_AS
-        try:
-            if memory_limit_mb > 0:
+        # 'PYSPARK_EXECUTOR_MEMORY_MB' should be undefined on Windows because it depends on
+        # resource package which is a Unix specific package.
+        if memory_limit_mb > 0:

--- End diff --

It seems brittle to disable this on the JVM side and rely on it here. Can we also set a flag in the ImportError case and also check that here?
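[Editor's note] The guard pattern being debated in this thread can be sketched as a standalone function: read the limit from the environment variable named in the diff, and skip the rlimit when the Unix-only `resource` module is unavailable. This is a hypothetical helper for illustration, not the actual worker.py change.

```python
import os


def try_set_pyspark_memory_limit(environ=os.environ):
    """Return True if an address-space limit was applied, False if skipped."""
    memory_limit_mb = int(environ.get('PYSPARK_EXECUTOR_MEMORY_MB', '-1'))
    if memory_limit_mb <= 0:
        return False  # unset, or disabled on the JVM side
    try:
        import resource  # Unix-only module
    except ImportError:
        return False  # e.g. Windows: cannot self-limit, run without a cap
    # Cap the process's virtual address space at the requested size.
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (memory_limit_mb * 1024 * 1024, hard))
    return True
```

Returning a flag instead of crashing is what makes the two review positions compatible: the JVM can keep requesting the memory from the scheduler while the Python side independently decides whether it can self-limit.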
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23056 **[Test build #98897 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98897/testReport)** for PR 23056 at commit [`2759521`](https://github.com/apache/spark/commit/2759521df7f2dffc9ddb9379e0b1dac6721da366). * This patch **fails build dependency tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23056 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98897/ Test FAILed.
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23056 Merged build finished. Test FAILed.
[GitHub] spark issue #23037: [SPARK-26083][k8s] Add Copy pyspark into corresponding d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23037 Merged build finished. Test FAILed.
[GitHub] spark issue #23037: [SPARK-26083][k8s] Add Copy pyspark into corresponding d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23037 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5063/ Test FAILed.
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/23056 cc @HyukjinKwon @squito
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23056 **[Test build #98897 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98897/testReport)** for PR 23056 at commit [`2759521`](https://github.com/apache/spark/commit/2759521df7f2dffc9ddb9379e0b1dac6721da366).
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23056 Merged build finished. Test PASSed.
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23056 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5067/ Test PASSed.
[GitHub] spark issue #23026: [SPARK-25960][k8s] Support subpath mounting with Kuberne...
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/23026

> > if such a list exists it should be the same list that triggers regular tests.
>
> I defer that to @shaneknapp

no, @vanzin is right. i'll update that tomorrow.

@vanzin for historical knowledge: once i get spark ported to ubuntu (literally down to one or two troublesome builds! such closeness!), the k8s prb will be merged in to the regular spark prb.
[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/23056

Dist by line count:
```
313 ./test_algorithms.py
201 ./test_feature.py
642 ./test_linalg.py
197 ./test_stat.py
523 ./test_streaming_algorithms.py
115 ./test_util.py
```
[GitHub] spark pull request #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/te...
GitHub user BryanCutler opened a pull request: https://github.com/apache/spark/pull/23056

[SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py file into smaller files

## What changes were proposed in this pull request?

This PR breaks down the large mllib/tests.py file that contains all Python MLlib unit tests into several smaller test files to be easier to read and maintain. The tests are broken down as follows:

```
pyspark
├── __init__.py
...
├── mllib
│   ├── __init__.py
...
│   ├── tests
│   │   ├── __init__.py
│   │   ├── test_algorithms.py
│   │   ├── test_feature.py
│   │   ├── test_linalg.py
│   │   ├── test_stat.py
│   │   ├── test_streaming_algorithms.py
│   │   └── test_util.py
...
├── testing
...
│   └── mllibutils.py
...
```

## How was this patch tested?

Ran tests manually by module to ensure test count was the same, and ran `python/run-tests --modules=pyspark-mllib` to verify all passing with Python 2.7 and Python 3.6. Also installed scipy to include optional tests in test_linalg.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BryanCutler/spark python-test-breakup-mllib-SPARK-26034

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23056.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23056

commit 2759521df7f2dffc9ddb9379e0b1dac6721da366
Author: Bryan Cutler
Date: 2018-11-16T03:01:22Z

    separated mllib tests
[GitHub] spark issue #23041: [SPARK-26069][TESTS]Fix flaky test: RpcIntegrationSuite....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23041 **[Test build #4427 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4427/testReport)** for PR 23041 at commit [`6bebcb5`](https://github.com/apache/spark/commit/6bebcb5e004ed4b434c550d26ed1a922d13e0446). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23026: [SPARK-25960][k8s] Support subpath mounting with Kuberne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23026 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5062/ Test FAILed.
[GitHub] spark issue #23026: [SPARK-25960][k8s] Support subpath mounting with Kuberne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23026 Merged build finished. Test FAILed.
[GitHub] spark issue #23031: [SPARK-26060][SQL] Track SparkConf entries and make SET ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23031 **[Test build #98896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98896/testReport)** for PR 23031 at commit [`336a331`](https://github.com/apache/spark/commit/336a331fdc817566c7fd09e5b36d5de24379c5b6).
[GitHub] spark issue #23031: [SPARK-26060][SQL] Track SparkConf entries and make SET ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23031 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5066/ Test PASSed.
[GitHub] spark issue #23031: [SPARK-26060][SQL] Track SparkConf entries and make SET ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23031 Merged build finished. Test PASSed.
[GitHub] spark issue #23038: [SPARK-25451][CORE][WEBUI]Aggregated metrics table doesn...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23038 **[Test build #98895 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98895/testReport)** for PR 23038 at commit [`7c3a80b`](https://github.com/apache/spark/commit/7c3a80bce0a45131091ce11e80a939e9de6ebf50).
[GitHub] spark pull request #23046: [SPARK-23207][SQL][FOLLOW-UP] Use `SQLConf.get.en...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23046#discussion_r234073703 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -280,7 +280,7 @@ object ShuffleExchangeExec { } // The comparator for comparing row hashcode, which should always be Integer. val prefixComparator = PrefixComparators.LONG - val canUseRadixSort = SparkEnv.get.conf.get(SQLConf.RADIX_SORT_ENABLED) + val canUseRadixSort = SQLConf.get.enableRadixSort --- End diff -- Yes .. I don't mind it but was just thinking that we don't necessarily backport to all the branches if there's any concern. I will leave it to you guys as well.
[GitHub] spark issue #23038: [SPARK-25451][CORE][WEBUI]Aggregated metrics table doesn...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23038 **[Test build #98894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98894/testReport)** for PR 23038 at commit [`805ebb8`](https://github.com/apache/spark/commit/805ebb8e6b103cbc0688da64ec27841a1491039f).
[GitHub] spark pull request #23046: [SPARK-23207][SQL][FOLLOW-UP] Use `SQLConf.get.en...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/23046#discussion_r234073072 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -280,7 +280,7 @@ object ShuffleExchangeExec { } // The comparator for comparing row hashcode, which should always be Integer. val prefixComparator = PrefixComparators.LONG - val canUseRadixSort = SparkEnv.get.conf.get(SQLConf.RADIX_SORT_ENABLED) + val canUseRadixSort = SQLConf.get.enableRadixSort --- End diff -- Ah, yes, to be exact, if users specified the config to `SparkConf` before Spark ran, it could be read. I'd leave which branch we should backport to to you and other reviewers. @jiangxb1987 @cloud-fan
[GitHub] spark pull request #23038: [SPARK-25451][CORE][WEBUI]Aggregated metrics tabl...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23038#discussion_r234072070 --- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala --- @@ -63,6 +63,7 @@ case class ApplicationAttemptInfo private[spark]( class ExecutorStageSummary private[spark]( val taskTime : Long, + val activeTasks: Int, --- End diff -- Hi @vanzin , I have modified based on your comment. Kindly review