[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19186 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81762/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19186 Merged build finished. Test FAILed.
[GitHub] spark issue #19227: [SPARK-20060][CORE] Support accessing secure Hadoop clus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19227 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81750/ Test PASSed.
[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19186 **[Test build #81762 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81762/testReport)** for PR 19186 at commit [`3f11c67`](https://github.com/apache/spark/commit/3f11c67630dfc5402e49d7bf43d1ce9a31b400da). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19227: [SPARK-20060][CORE] Support accessing secure Hadoop clus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19227 Merged build finished. Test PASSed.
[GitHub] spark issue #19227: [SPARK-20060][CORE] Support accessing secure Hadoop clus...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19227 **[Test build #81750 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81750/testReport)** for PR 19227 at commit [`2b3d2f2`](https://github.com/apache/spark/commit/2b3d2f24f94a1cee63fff9733b27f479673d7a90). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19185: [Spark-21854] Added LogisticRegressionTrainingSum...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19185
[GitHub] spark pull request #19215: [MINOR][SQL] Only populate type metadata for requ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19215
[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19186 **[Test build #81762 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81762/testReport)** for PR 19186 at commit [`3f11c67`](https://github.com/apache/spark/commit/3f11c67630dfc5402e49d7bf43d1ce9a31b400da).
[GitHub] spark issue #19215: [MINOR][SQL] Only populate type metadata for required ty...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/19215 many thanks @gatorsmile
[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/19130#discussion_r138801682 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -897,6 +897,80 @@ class SparkSubmitSuite sysProps("spark.submit.pyFiles") should (startWith("/")) } + test("handle remote http(s) resources in yarn mode") { +val hadoopConf = new Configuration() +updateConfWithFakeS3Fs(hadoopConf) + +val tmpDir = Utils.createTempDir() +val mainResource = File.createTempFile("tmpPy", ".py", tmpDir) +val tmpJar = TestUtils.createJarWithFiles(Map("test.resource" -> "USER"), tmpDir) +val tmpJarPath = s"s3a://${new File(tmpJar.toURI).getAbsolutePath}" +// This assumes UT environment could access external network. --- End diff -- Yes, that's my concern; let me think of another way to handle this.
[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/19130#discussion_r138801550 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -367,6 +368,53 @@ object SparkSubmit extends CommandLineUtils with Logging { }.orNull } +// When running in YARN cluster manager, +if (clusterManager == YARN) { + sparkConf.setIfMissing(SecurityManager.SPARK_AUTH_SECRET_CONF, "unused") + val secMgr = new SecurityManager(sparkConf) + val forceDownloadSchemes = sparkConf.get(FORCE_DOWNLOAD_SCHEMES) + + // Check the scheme list provided by "spark.yarn.dist.forceDownloadSchemes" to see if current + // resource's scheme is included in this list, or Hadoop FileSystem doesn't support current + // scheme, if so Spark will download the resources to local disk and upload to Hadoop FS. + def shouldDownload(scheme: String): Boolean = { +val isFsAvailable = Try { FileSystem.getFileSystemClass(scheme, hadoopConf) } + .map(_ => true).getOrElse(false) +forceDownloadSchemes.contains(scheme) || !isFsAvailable + } + + def downloadResource(resource: String): String = { +val uri = Utils.resolveURI(resource) +uri.getScheme match { + case "local" | "file" => resource + case e if shouldDownload(e) => +if (deployMode == CLIENT) { + // In client mode, we already download the resources, so figuring out the local one + // should be enough.
+ val fileName = new Path(uri).getName + new File(targetDir, fileName).toURI.toString +} else { + downloadFile(resource, targetDir, sparkConf, hadoopConf, secMgr) +} + case _ => uri.toString +} + } + + args.primaryResource = Option(args.primaryResource).map { downloadResource }.orNull + args.files = Option(args.files).map { files => +files.split(",").map(_.trim).filter(_.nonEmpty).map { downloadResource }.mkString(",") + }.orNull + args.pyFiles = Option(args.pyFiles).map { files => +files.split(",").map(_.trim).filter(_.nonEmpty).map { downloadResource }.mkString(",") + }.orNull + args.jars = Option(args.jars).map { files => +files.split(",").map(_.trim).filter(_.nonEmpty).map { downloadResource }.mkString(",") + }.orNull + args.archives = Option(args.archives).map { files => +files.split(",").map(_.trim).filter(_.nonEmpty).map { downloadResource }.mkString(",") + }.orNull --- End diff -- From the code, `--files` and `--jars` have overwritten `spark.yarn.*` for a long time, AFAIK. I think we should make `spark.yarn.*` internal configurations to reduce the discrepancy.
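The download decision quoted in the diff above can be sketched outside Spark. The following is an illustrative Python sketch, not Spark's implementation (function names, the staging directory, and the scheme sets are ours): a resource is staged to local disk when its scheme is on the force-download list or when no filesystem implementation handles it, while `local`/`file` resources pass through untouched.

```python
from urllib.parse import urlparse

def should_download(scheme, force_download_schemes, available_fs_schemes):
    # Download when the scheme is force-listed, or when no filesystem
    # implementation is available for it (mirrors shouldDownload above).
    return scheme in force_download_schemes or scheme not in available_fs_schemes

def resolve_resource(resource, force_download_schemes, available_fs_schemes,
                     target_dir="/tmp/spark-staging"):
    scheme = urlparse(resource).scheme
    if scheme in ("local", "file"):
        return resource  # already local, nothing to do
    if should_download(scheme, force_download_schemes, available_fs_schemes):
        # Stand-in for the real download step; keep only the file name.
        name = resource.rsplit("/", 1)[-1]
        return f"{target_dir}/{name}"
    return resource  # the cluster filesystem can read it directly
```

For example, with `force_download_schemes={"http", "https"}` and `available_fs_schemes={"hdfs", "s3a"}`, an `http://` jar would be staged under the target directory while an `s3a://` path is returned unchanged.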
[GitHub] spark issue #19215: [MINOR][SQL] Only populate type metadata for required ty...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19215 Thanks! Merged to master.
[GitHub] spark issue #19215: [MINOR][SQL] Only populate type metadata for required ty...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19215 LGTM
[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19185 **[Test build #81757 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81757/testReport)** for PR 19185 at commit [`6529fa6`](https://github.com/apache/spark/commit/6529fa6ecb7d607d3b38e68c8007bc22d9e27907). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19185 Merged build finished. Test PASSed.
[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19185 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81757/ Test PASSed.
[GitHub] spark issue #19223: [SPARK-21513][SQL][FOLLOWUP] Allow UDF to_json support c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19223 **[Test build #81761 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81761/testReport)** for PR 19223 at commit [`158140e`](https://github.com/apache/spark/commit/158140e2b9c4adc8906dd25d9ec9fe37306b8436).
[GitHub] spark issue #19135: [SPARK-21923][CORE]Avoid calling reserveUnrollMemoryForT...
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/19135 Hi @jerryshao, thanks for your review. >So it somehow reflects that CPU core contention is the main issue for memory pre-occupation I have modified the code; it no longer requests more memory, it just reduces the number of calls to `reserveUnrollMemoryForThisTask`, following @cloud-fan's comments. The method is also the same as `putIteratorAsValues`. Yeah, its impact will be small with few cores. In the above test results it doesn't bring any regressions, and it is also better with many cores. For machine learning we need to cache the source data OFF_HEAP in order to reduce GC problems. As for the configuration, I think different application scenarios may need different values.
[GitHub] spark pull request #19231: [SPARK-22002][SQL] Read JDBC table use custom sch...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19231#discussion_r138800677 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -993,7 +996,10 @@ class JDBCSuite extends SparkFunSuite Seq(StructField("NAME", StringType, true), StructField("THEID", IntegerType, true))) val df = sql("select * from people_view") assert(df.schema.size === 2) - assert(df.schema === schema) --- End diff -- revert it back. Change the following line https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L309 to ``` fields(i) = StructField(columnName, columnType, nullable) ``` You also need to update some test cases due to the above change, I think.
[GitHub] spark pull request #19223: [SPARK-21513][SQL][FOLLOWUP] Allow UDF to_json su...
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/19223#discussion_r138800321 --- Diff: python/pyspark/sql/functions.py --- @@ -1921,10 +1921,12 @@ def from_json(col, schema, options={}): @since(2.1) def to_json(col, options={}): """ -Converts a column containing a [[StructType]] or [[ArrayType]] of [[StructType]]s into a -JSON string. Throws an exception, in the case of an unsupported type. +Converts a column containing a [[StructType]], [[ArrayType]] of [[StructType]]s, +a [[MapType]] or [[ArrayType]] of [[MapType]] into a JSON string. +Throws an exception, in the case of an unsupported type. --- End diff -- ok Thanks.
[GitHub] spark pull request #19223: [SPARK-21513][SQL][FOLLOWUP] Allow UDF to_json su...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19223#discussion_r13870 --- Diff: sql/core/src/test/resources/sql-tests/results/json-functions.sql.out --- @@ -26,13 +26,13 @@ Extended Usage: {"time":"26/08/2015"} > SELECT to_json(array(named_struct('a', 1, 'b', 2)); [{"a":1,"b":2}] - > SELECT to_json(map('a',named_struct('b',1))); + > SELECT to_json(map('a', named_struct('b', 1))); --- End diff -- Oh. I see.
[GitHub] spark pull request #19223: [SPARK-21513][SQL][FOLLOWUP] Allow UDF to_json su...
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/19223#discussion_r138799591 --- Diff: R/pkg/R/functions.R --- @@ -1715,7 +1717,15 @@ setMethod("to_date", #' #' # Converts an array of structs into a JSON array #' df2 <- sql("SELECT array(named_struct('name', 'Bob'), named_struct('name', 'Alice')) as people") -#' df2 <- mutate(df2, people_json = to_json(df2$people))} +#' df2 <- mutate(df2, people_json = to_json(df2$people)) +#' +#' # Converts a map into a JSON object +#' df2 <- sql("SELECT map('name', 'Bob')) as people") +#' df2 <- mutate(df2, people_json = to_json(df2$people)) +#' +#' # Converts an array of maps into a JSON array +#' df2 <- sql("SELECT array(map('name', 'Bob'), map('name', 'Alice')) as people") +#' df2 <- mutate(df2, people_json = to_json(df2$people)) --- End diff -- ok, thanks for the careful review :)
[GitHub] spark pull request #19223: [SPARK-21513][SQL][FOLLOWUP] Allow UDF to_json su...
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/19223#discussion_r138799483 --- Diff: sql/core/src/test/resources/sql-tests/results/json-functions.sql.out --- @@ -26,13 +26,13 @@ Extended Usage: {"time":"26/08/2015"} > SELECT to_json(array(named_struct('a', 1, 'b', 2)); [{"a":1,"b":2}] - > SELECT to_json(map('a',named_struct('b',1))); + > SELECT to_json(map('a', named_struct('b', 1))); --- End diff -- umm. I modified the `ExpressionDescription` of `StructsToJson` following @HyukjinKwon's suggestions, which weren't merged in the last PR. This is the test output for `describe function extended to_json`, so I needed to regenerate the golden file for it. So this change isn't from `json-functions.sql`.
[GitHub] spark issue #19226: [SPARK-21985][PySpark] PairDeserializer is broken for do...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19226 **[Test build #81760 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81760/testReport)** for PR 19226 at commit [`e99ed23`](https://github.com/apache/spark/commit/e99ed23ffa887311b8c77d57733ff005d6987bdb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19226: [SPARK-21985][PySpark] PairDeserializer is broken for do...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19226 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81760/ Test PASSed.
[GitHub] spark issue #19226: [SPARK-21985][PySpark] PairDeserializer is broken for do...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19226 Merged build finished. Test PASSed.
[GitHub] spark pull request #19230: [SPARK-22003][SQL] support array column in vector...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19230#discussion_r138799219 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java --- @@ -99,73 +100,18 @@ public ArrayData copy() { @Override public Object[] array() { DataType dt = data.dataType(); + Function getAtMethod = (Function) i -> get(i, dt); Object[] list = new Object[length]; - - if (dt instanceof BooleanType) { -for (int i = 0; i < length; i++) { - if (!data.isNullAt(offset + i)) { -list[i] = data.getBoolean(offset + i); - } -} - } else if (dt instanceof ByteType) { -for (int i = 0; i < length; i++) { - if (!data.isNullAt(offset + i)) { -list[i] = data.getByte(offset + i); - } -} - } else if (dt instanceof ShortType) { -for (int i = 0; i < length; i++) { - if (!data.isNullAt(offset + i)) { -list[i] = data.getShort(offset + i); - } -} - } else if (dt instanceof IntegerType) { -for (int i = 0; i < length; i++) { - if (!data.isNullAt(offset + i)) { -list[i] = data.getInt(offset + i); - } -} - } else if (dt instanceof FloatType) { -for (int i = 0; i < length; i++) { - if (!data.isNullAt(offset + i)) { -list[i] = data.getFloat(offset + i); - } -} - } else if (dt instanceof DoubleType) { + try { for (int i = 0; i < length; i++) { if (!data.isNullAt(offset + i)) { -list[i] = data.getDouble(offset + i); +list[i] = getAtMethod.call(i); } } - } else if (dt instanceof LongType) { -for (int i = 0; i < length; i++) { - if (!data.isNullAt(offset + i)) { -list[i] = data.getLong(offset + i); - } -} - } else if (dt instanceof DecimalType) { -DecimalType decType = (DecimalType)dt; -for (int i = 0; i < length; i++) { - if (!data.isNullAt(offset + i)) { -list[i] = getDecimal(i, decType.precision(), decType.scale()); - } -} - } else if (dt instanceof StringType) { -for (int i = 0; i < length; i++) { - if (!data.isNullAt(offset + i)) { -list[i] = getUTF8String(i).toString(); --- End diff -- This looks suspicious. Why did we get `String` before? Seems we should get `UTF8String`.
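The refactor under review replaces one null-checking loop per data type with a single loop over an element getter chosen once. A language-agnostic sketch of that pattern in Python (illustrative only; the names below are ours, not the Java code above):

```python
def copy_array(values, nulls, getters, dtype):
    # Pick the per-type element getter once, outside the loop, instead of
    # duplicating the null-aware loop body for every supported type.
    get = getters[dtype]
    return [None if nulls[i] else get(values[i]) for i in range(len(values))]

# A toy getter table standing in for the per-type branches.
getters = {
    "int": int,
    "double": float,
    "string": str,
}
```

The design trade-off discussed in the thread applies here too: a single dispatch point is easier to maintain, but the getter must return the right element type for every case (e.g. `UTF8String` vs `String`), since the type-specific branches that previously made that explicit are gone.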
[GitHub] spark pull request #19231: [SPARK-22002][SQL] Read JDBC table use custom sch...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19231#discussion_r138797882 --- Diff: docs/sql-programming-guide.md --- @@ -1333,7 +1333,7 @@ the following case-insensitive options: customSchema - The custom schema to use for reading data from JDBC connectors. For example, "id DECIMAL(38, 0), name STRING"). The column names should be identical to the corresponding column names of JDBC table. Users can specify the corresponding data types of Spark SQL instead of using the defaults. This option applies only to reading. + The custom schema to use for reading data from JDBC connectors. For example, "id DECIMAL(38, 0), name STRING". You can also specify partial fields, others use default values. For example, "id DECIMAL(38, 0)". The column names should be identical to the corresponding column names of JDBC table. Users can specify the corresponding data types of Spark SQL instead of using the defaults. This option applies only to reading. --- End diff -- `others` -> `and the others use the default type mapping`
[GitHub] spark pull request #19188: [SPARK-21973][SQL] Add an new option to filter qu...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19188
[GitHub] spark pull request #19226: [SPARK-21985][PySpark] PairDeserializer is broken...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19226#discussion_r138797273 --- Diff: python/pyspark/serializers.py --- @@ -343,6 +343,8 @@ def _load_stream_without_unbatching(self, stream): key_batch_stream = self.key_ser._load_stream_without_unbatching(stream) val_batch_stream = self.val_ser._load_stream_without_unbatching(stream) for (key_batch, val_batch) in zip(key_batch_stream, val_batch_stream): +key_batch = list(key_batch) +val_batch = list(val_batch) --- End diff -- Should we fix the doc in `Serializer._load_stream_without_unbatching` to say it returns an iterator of iterables?
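A small, self-contained illustration of why the two `list(...)` calls in the diff above matter (our sketch, not the pyspark code): nested deserializers can yield each batch as a generator, which supports neither `len()` nor re-iteration, so the batches are materialized before being size-checked and zipped.

```python
def pair_stream(key_batches, val_batches):
    # Pair up corresponding key/value batches, mirroring the shape of
    # PairDeserializer._load_stream_without_unbatching (names are ours).
    for key_batch, val_batch in zip(key_batches, val_batches):
        key_batch = list(key_batch)  # batch may be a generator: materialize it
        val_batch = list(val_batch)  # so len() and zip() behave predictably
        if len(key_batch) != len(val_batch):
            raise ValueError("mismatched batch sizes")
        yield from zip(key_batch, val_batch)

# Batches arriving as iterators, as a nested deserializer might produce them:
keys = (iter(b) for b in [[1, 2], [3]])
vals = (iter(b) for b in [["a", "b"], ["c"]])
pairs = list(pair_stream(keys, vals))
```

Without the materialization, calling `len()` on a generator batch would raise `TypeError`, which is the failure mode for doubly-nested serializers described in SPARK-21985.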
[GitHub] spark pull request #19188: [SPARK-21973][SQL] Add an new option to filter qu...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19188#discussion_r138797271 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmarkArguments.scala --- @@ -29,7 +33,11 @@ class TPCDSQueryBenchmarkArguments(val args: Array[String]) { while(args.nonEmpty) { args match { case ("--data-location") :: value :: tail => - dataLocation = value + dataLocation = value.toLowerCase(Locale.ROOT) --- End diff -- ok
[GitHub] spark pull request #19226: [SPARK-21985][PySpark] PairDeserializer is broken...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19226#discussion_r138797113 --- Diff: python/pyspark/tests.py --- @@ -644,6 +644,18 @@ def test_cartesian_chaining(self): set([(x, (y, y)) for x in range(10) for y in range(10)]) ) +def test_zip_chaining(self): +# Tests for SPARK-21985 +rdd = self.sc.parallelize(range(10), 2) --- End diff -- This test case already passes, doesn't it?
[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 @zhengruifeng Yeah, that is better. Actually the difference between running multiple `withColumn` calls and one `withColumns` is mainly the cost of query analysis and plan/dataset initialization. I will re-run the benchmark.
[GitHub] spark issue #19188: [SPARK-21973][SQL] Add an new option to filter queries i...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19188 LGTM
[GitHub] spark issue #19188: [SPARK-21973][SQL] Add an new option to filter queries i...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19188 Merging to master.
[GitHub] spark pull request #19188: [SPARK-21973][SQL] Add an new option to filter qu...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19188#discussion_r138796870 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmarkArguments.scala --- @@ -29,7 +33,11 @@ class TPCDSQueryBenchmarkArguments(val args: Array[String]) { while(args.nonEmpty) { args match { case ("--data-location") :: value :: tail => - dataLocation = value + dataLocation = value.toLowerCase(Locale.ROOT) --- End diff -- I am not sure about that one.
[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/19229 In the test code, should we use `model.transform(df).count` instead?
[GitHub] spark issue #19230: [SPARK-22003][SQL] support array column in vectorized re...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19230 Add a test for it?
[GitHub] spark issue #19226: [SPARK-21985][PySpark] PairDeserializer is broken for do...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19226 **[Test build #81760 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81760/testReport)** for PR 19226 at commit [`e99ed23`](https://github.com/apache/spark/commit/e99ed23ffa887311b8c77d57733ff005d6987bdb).
[GitHub] spark issue #19231: [SPARK-22002][SQL] Read JDBC table use custom schema sup...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19231 **[Test build #81758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81758/testReport)** for PR 19231 at commit [`9e7a8a4`](https://github.com/apache/spark/commit/9e7a8a471835d5e93a729c15d166451e79567447).
[GitHub] spark issue #19230: [SPARK-22003][SQL] support array column in vectorized re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19230 **[Test build #81759 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81759/testReport)** for PR 19230 at commit [`adbaeab`](https://github.com/apache/spark/commit/adbaeabf18ee1f96611ecbd6ee627bc0a457289d).
[GitHub] spark pull request #19230: [SPARK-22003][SQL] support array column in vector...
GitHub user liufengdb opened a pull request: https://github.com/apache/spark/pull/19230 [SPARK-22003][SQL] support array column in vectorized reader with UDF ## What changes were proposed in this pull request? The UDF needs to deserialize the `UnsafeRow`. When the column type is Array, the `get` method from the `ColumnVector`, which is used by the vectorized reader, is called, but this method is not implemented. ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) You can merge this pull request into a Git repository by running: $ git pull https://github.com/liufengdb/spark fix_array_open Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19230.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19230 commit adbaeabf18ee1f96611ecbd6ee627bc0a457289d Author: Feng Liu Date: 2017-09-12T21:56:55Z init
[GitHub] spark pull request #19231: [SPARK-22002][SQL] Read JDBC table use custom sch...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/19231 [SPARK-22002][SQL] Read JDBC table use custom schema support specify partial fields. ## What changes were proposed in this pull request? https://github.com/apache/spark/pull/18266 added a new feature to support reading a JDBC table with a custom schema, but all the fields had to be specified. For simplicity, this PR supports specifying partial fields. ## How was this patch tested? unit tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark SPARK-22002 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19231.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19231 commit 9e7a8a471835d5e93a729c15d166451e79567447 Author: Yuming Wang Date: 2017-09-14T04:26:46Z Read JDBC table use custom schema support specify partial fields.
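The merge semantics this PR describes can be sketched independently of Spark. The following is an illustrative Python sketch under our own names (not the `JdbcUtils` code): columns named in the custom schema override the default JDBC type mapping, and all other columns keep their defaults.

```python
def merge_custom_schema(default_schema, custom_schema):
    # default_schema: column -> type from the JDBC driver's default mapping
    # custom_schema:  column -> user-specified Spark SQL type (may be partial)
    unknown = set(custom_schema) - set(default_schema)
    if unknown:
        raise ValueError(f"Columns not found in table: {sorted(unknown)}")
    # Named columns take the custom type; the rest keep the default mapping.
    return {col: custom_schema.get(col, typ) for col, typ in default_schema.items()}

defaults = {"id": "DECIMAL(10,0)", "name": "VARCHAR(255)"}
merged = merge_custom_schema(defaults, {"id": "DECIMAL(38,0)"})
```

This mirrors the documented `customSchema` behavior after the change: specifying only `"id DECIMAL(38, 0)"` overrides `id` while `name` falls back to the default type mapping.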
[GitHub] spark issue #19216: [SPARK-21990][SQL] QueryPlanConstraints misses some cons...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19216 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81749/ Test PASSed.
[GitHub] spark issue #19216: [SPARK-21990][SQL] QueryPlanConstraints misses some cons...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19216 Merged build finished. Test PASSed.
[GitHub] spark issue #19216: [SPARK-21990][SQL] QueryPlanConstraints misses some cons...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19216 **[Test build #81749 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81749/testReport)** for PR 19216 at commit [`e4cffda`](https://github.com/apache/spark/commit/e4cffda91cf9ab3673e12f1427ad1d02c5e5b71e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19185 **[Test build #81757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81757/testReport)** for PR 19185 at commit [`6529fa6`](https://github.com/apache/spark/commit/6529fa6ecb7d607d3b38e68c8007bc22d9e27907). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19223: [SPARK-21513][SQL][FOLLOWUP] Allow UDF to_json su...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19223#discussion_r138795133 --- Diff: sql/core/src/test/resources/sql-tests/results/json-functions.sql.out --- @@ -26,13 +26,13 @@ Extended Usage: {"time":"26/08/2015"} > SELECT to_json(array(named_struct('a', 1, 'b', 2)); [{"a":1,"b":2}] - > SELECT to_json(map('a',named_struct('b',1))); + > SELECT to_json(map('a', named_struct('b', 1))); --- End diff -- Or did you forget to commit `json-functions.sql`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19223: [SPARK-21513][SQL][FOLLOWUP] Allow UDF to_json su...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19223#discussion_r138795006 --- Diff: sql/core/src/test/resources/sql-tests/results/json-functions.sql.out --- @@ -26,13 +26,13 @@ Extended Usage: {"time":"26/08/2015"} > SELECT to_json(array(named_struct('a', 1, 'b', 2)); [{"a":1,"b":2}] - > SELECT to_json(map('a',named_struct('b',1))); + > SELECT to_json(map('a', named_struct('b', 1))); --- End diff -- I think you committed an unrelated change? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19213: [SPARK-17642] [SQL] [FOLLOWUP] drop test tables and impr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19213 **[Test build #81747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81747/testReport)** for PR 19213 at commit [`d922c85`](https://github.com/apache/spark/commit/d922c85fe6e462df122450ed015c0a7e722d2e2c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 FYI, the `withColumns` API was proposed in #17819. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 Ran a similar benchmark to the one in https://github.com/apache/spark/pull/18902#issuecomment-321727416:

numColumns | Old Mean | Old Median | New Mean | New Median
-- | -- | -- | -- | --
1 | 0.1290674059002 | 0.087246649 | 0.1263591766 | 0.05826856929996
10 | 0.4222436709003 | 0.2957120874 | 0.1382999133002 | 0.0752307166
100 | 6.93127441728 | 7.2270134943 | 0.3018686074 | 0.2554692345

The test code is basically the same, but now measures the transform time:

    import org.apache.spark.ml.feature._
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._
    import spark.implicits._
    import scala.util.Random

    val seed = 123l
    val random = new Random(seed)
    val n = 1
    val m = 100
    val rows = sc.parallelize(1 to n).map(i => Row(Array.fill(m)(random.nextDouble): _*))
    val struct = new StructType(Array.range(0, m, 1).map(i => StructField(s"c$i", DoubleType, true)))
    val df = spark.createDataFrame(rows, struct)
    df.persist()
    df.count()

    for (strategy <- Seq("mean", "median"); k <- Seq(1, 10, 100)) {
      val imputer = new Imputer().setStrategy(strategy)
        .setInputCols(Array.range(0, k, 1).map(i => s"c$i"))
        .setOutputCols(Array.range(0, k, 1).map(i => s"o$i"))
      var duration = 0.0
      for (i <- 0 until 10) {
        val model = imputer.fit(df)
        val start = System.nanoTime()
        model.transform(df)
        val end = System.nanoTime()
        duration += (end - start) / 1e9
      }
      println((strategy, k, duration / 10))
    }

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r138794370 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java --- @@ -59,6 +60,18 @@ public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, i return fmix(h1, lengthInBytes); } + public static int hashUnsafeBytes(MemoryBlock base, long offset, int lengthInBytes, int seed) { --- End diff -- It makes sense. Would it be better to add an `MB` suffix to the other version of the method? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19229 **[Test build #81756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81756/testReport)** for PR 19229 at commit [`4b47709`](https://github.com/apache/spark/commit/4b477093737e9d9fae16c82836e421b5e0e7c63e). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 cc @MLnick @zhengruifeng @yanboliang --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19213: [SPARK-17642] [SQL] [FOLLOWUP] drop test tables and impr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19213 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81747/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19211 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r138794058 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java --- @@ -75,67 +76,131 @@ public static boolean unaligned() { return unaligned; } + public static int getInt(MemoryBlock object, long offset) { --- End diff -- Do you want to move them (i.e. methods with `MemoryBlock` argument) into `unsafe/memory/MemoryBlock.java`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19213: [SPARK-17642] [SQL] [FOLLOWUP] drop test tables and impr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19213 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19223: [SPARK-21513][SQL][FOLLOWUP] Allow UDF to_json su...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/19223#discussion_r138794052 --- Diff: R/pkg/R/functions.R --- @@ -1715,7 +1717,15 @@ setMethod("to_date", #' #' # Converts an array of structs into a JSON array #' df2 <- sql("SELECT array(named_struct('name', 'Bob'), named_struct('name', 'Alice')) as people") -#' df2 <- mutate(df2, people_json = to_json(df2$people))} +#' df2 <- mutate(df2, people_json = to_json(df2$people)) +#' +#' # Converts a map into a JSON object +#' df2 <- sql("SELECT map('name', 'Bob')) as people") +#' df2 <- mutate(df2, people_json = to_json(df2$people)) +#' +#' # Converts an array of maps into a JSON array +#' df2 <- sql("SELECT array(map('name', 'Bob'), map('name', 'Alice')) as people") +#' df2 <- mutate(df2, people_json = to_json(df2$people)) --- End diff -- ... meaning `}` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19211 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81744/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19211 **[Test build #81744 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81744/testReport)** for PR 19211 at commit [`20b8382`](https://github.com/apache/spark/commit/20b83826a70ac8574e289db9fdcae37c305c01bd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19229 **[Test build #81755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81755/testReport)** for PR 19229 at commit [`4efb643`](https://github.com/apache/spark/commit/4efb64374b7c93bae3e9b0d2fc0ebc4f5ad1e1d5). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19228: [SPARK-21985][PYTHON] Fix zip-chained RDD to work
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19228 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81753/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19228: [SPARK-21985][PYTHON] Fix zip-chained RDD to work
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19228 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19228: [SPARK-21985][PYTHON] Fix zip-chained RDD to work
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19228 **[Test build #81753 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81753/testReport)** for PR 19228 at commit [`0703b67`](https://github.com/apache/spark/commit/0703b67405fa721230af80421509a55eb88c5763). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19229: [SPARK-22001][ML][SQL] ImputerModel can do withCo...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/19229 [SPARK-22001][ML][SQL] ImputerModel can do withColumn for all input columns at one pass ## What changes were proposed in this pull request? SPARK-21690 makes `Imputer` one-pass by parallelizing the computation over all input columns. When we transform a dataset with `ImputerModel`, we do `withColumn` on all input columns sequentially. We can also do this on all input columns at once by adding a `withColumns` API to `Dataset`. The new `withColumns` API is for internal use only for now. ## How was this patch tested? Existing tests for `ImputerModel`'s change. Added tests for the `withColumns` API. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 SPARK-22001 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19229.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19229 commit 4efb64374b7c93bae3e9b0d2fc0ebc4f5ad1e1d5 Author: Liang-Chi Hsieh Date: 2017-09-14T03:49:16Z Do withColumn on all input columns at once. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
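As a rough illustration of the API shape the PR above proposes (this is a toy dict-of-columns model in plain Python, not the real Dataset API; `with_column` and `with_columns` are made-up stand-ins for the Spark methods), the point is that one batched call can replace a chain of single-column calls while producing the same result:

```python
# Toy sketch: chained single-column adds vs. one batched add.
# A "frame" here is just a dict mapping column names to value lists.

def with_column(frame, name, values):
    # Each call copies the whole frame, analogous to one projection per call.
    out = dict(frame)
    out[name] = values
    return out

def with_columns(frame, new_cols):
    # One copy, all output columns added in a single pass.
    out = dict(frame)
    out.update(new_cols)
    return out

base = {"c0": [1.0, 2.0]}
# N sequential single-column calls...
seq = with_column(with_column(base, "o0", [1.5, 1.5]), "o1", [2.5, 2.5])
# ...vs. one batched call with the same output columns.
batched = with_columns(base, {"o0": [1.5, 1.5], "o1": [2.5, 2.5]})
```

In the real `Dataset`, each `withColumn` grows the logical plan by one projection, so collapsing N of them into one operation is what drives the benchmark improvement reported in this thread.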
[GitHub] spark pull request #19185: [Spark-21854] Added LogisticRegressionTrainingSum...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/19185#discussion_r138792851
--- Diff: python/pyspark/ml/tests.py ---
@@ -1464,20 +1464,79 @@ def test_logistic_regression_summary(self):
         self.assertEqual(s.probabilityCol, "probability")
         self.assertEqual(s.labelCol, "label")
         self.assertEqual(s.featuresCol, "features")
+        self.assertEqual(s.predictionCol, "prediction")
         objHist = s.objectiveHistory
         self.assertTrue(isinstance(objHist, list) and isinstance(objHist[0], float))
         self.assertGreater(s.totalIterations, 0)
+        self.assertTrue(isinstance(s.labels, list))
+        self.assertTrue(isinstance(s.truePositiveRateByLabel, list))
+        self.assertTrue(isinstance(s.falsePositiveRateByLabel, list))
+        self.assertTrue(isinstance(s.precisionByLabel, list))
+        self.assertTrue(isinstance(s.recallByLabel, list))
+        self.assertTrue(isinstance(s.fMeasureByLabel(), list))
+        self.assertTrue(isinstance(s.fMeasureByLabel(1.0), list))
         self.assertTrue(isinstance(s.roc, DataFrame))
         self.assertAlmostEqual(s.areaUnderROC, 1.0, 2)
         self.assertTrue(isinstance(s.pr, DataFrame))
         self.assertTrue(isinstance(s.fMeasureByThreshold, DataFrame))
         self.assertTrue(isinstance(s.precisionByThreshold, DataFrame))
         self.assertTrue(isinstance(s.recallByThreshold, DataFrame))
+        self.assertAlmostEqual(s.accuracy, 1.0, 2)
+        self.assertAlmostEqual(s.weightedTruePositiveRate, 1.0, 2)
+        self.assertAlmostEqual(s.weightedFalsePositiveRate, 0.0, 2)
+        self.assertAlmostEqual(s.weightedRecall, 1.0, 2)
+        self.assertAlmostEqual(s.weightedPrecision, 1.0, 2)
+        self.assertAlmostEqual(s.weightedFMeasure(), 1.0, 2)
+        self.assertAlmostEqual(s.weightedFMeasure(1.0), 1.0, 2)
         # test evaluation (with training dataset) produces a summary with same values
         # one check is enough to verify a summary is returned, Scala version runs full test
         sameSummary = model.evaluate(df)
         self.assertAlmostEqual(sameSummary.areaUnderROC, s.areaUnderROC)
+
+    def test_multiclass_logistic_regression_summary(self):
+        df = self.spark.createDataFrame([(1.0, 2.0, Vectors.dense(1.0)),
+                                         (0.0, 2.0, Vectors.sparse(1, [], [])),
+                                         (2.0, 2.0, Vectors.dense(2.0)),
+                                         (2.0, 2.0, Vectors.dense(1.9))],
+                                        ["label", "weight", "features"])
+        lr = LogisticRegression(maxIter=5, regParam=0.01, weightCol="weight", fitIntercept=False)
+        model = lr.fit(df)
+        self.assertTrue(model.hasSummary)
+        s = model.summary
+        # test that api is callable and returns expected types
+        self.assertTrue(isinstance(s.predictions, DataFrame))
+        self.assertEqual(s.probabilityCol, "probability")
+        self.assertEqual(s.labelCol, "label")
+        self.assertEqual(s.featuresCol, "features")
+        self.assertEqual(s.predictionCol, "prediction")
+        objHist = s.objectiveHistory
+        self.assertTrue(isinstance(objHist, list) and isinstance(objHist[0], float))
+        self.assertGreater(s.totalIterations, 0)
+        self.assertTrue(isinstance(s.labels, list))
+        self.assertTrue(isinstance(s.truePositiveRateByLabel, list))
+        self.assertTrue(isinstance(s.falsePositiveRateByLabel, list))
+        self.assertTrue(isinstance(s.precisionByLabel, list))
+        self.assertTrue(isinstance(s.recallByLabel, list))
+        self.assertTrue(isinstance(s.fMeasureByLabel(), list))
+        self.assertTrue(isinstance(s.fMeasureByLabel(1.0), list))
+        self.assertAlmostEqual(s.accuracy, 0.75, 2)
+        self.assertAlmostEqual(s.weightedTruePositiveRate, 0.75, 2)
+        self.assertAlmostEqual(s.weightedFalsePositiveRate, 0.25, 2)
+        self.assertAlmostEqual(s.weightedRecall, 0.75, 2)
+        self.assertAlmostEqual(s.weightedPrecision, 0.583, 2)
+        self.assertAlmostEqual(s.weightedFMeasure(), 0.65, 2)
+        self.assertAlmostEqual(s.weightedFMeasure(1.0), 0.65, 2)
+        # test evaluation (with training dataset) produces a summary with same values
+        # one check is enough to verify a summary is returned, Scala version runs full test
+        sameSummary = model.evaluate(df)
+        self.assertAlmostEqual(sameSummary.accuracy, s.accuracy)
--- End diff --
Nit: As mentioned in the comment above, one check is enough to verify a summary is returned; let's remove the others to simplify the test. Thanks.
---
[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19186 **[Test build #81752 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81752/testReport)** for PR 19186 at commit [`74445cd`](https://github.com/apache/spark/commit/74445cdfec15bef2413ea88b712b9490a2997874). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19186 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81752/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19186 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19210: Fix Graphite re-connects for Graphite instances behind E...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19210 **[Test build #81754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81754/testReport)** for PR 19210 at commit [`8e982c7`](https://github.com/apache/spark/commit/8e982c7d450498580ab857baeed2650488ea1837). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19210: Fix Graphite re-connects for Graphite instances behind E...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19210 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/19130#discussion_r138791246 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -367,6 +368,53 @@ object SparkSubmit extends CommandLineUtils with Logging { }.orNull } +// When running in YARN cluster manager, --- End diff -- Sorry for the broken comment, my bad, I will fix it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19188: [SPARK-21973][SQL] Add a new option to filter queries i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19188 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81745/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19188: [SPARK-21973][SQL] Add a new option to filter queries i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19188 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19210: Fix Graphite re-connects for Graphite instances behind E...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19210 Sure. ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19188: [SPARK-21973][SQL] Add a new option to filter queries i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19188 **[Test build #81745 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81745/testReport)** for PR 19188 at commit [`b543e71`](https://github.com/apache/spark/commit/b543e710dae79da33a9334d5bbe4bb474a44b39c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19226: [SPARK-21985][PySpark] PairDeserializer is broken...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19226#discussion_r138790747
--- Diff: python/pyspark/serializers.py ---
@@ -343,9 +346,6 @@ def _load_stream_without_unbatching(self, stream):
         key_batch_stream = self.key_ser._load_stream_without_unbatching(stream)
         val_batch_stream = self.val_ser._load_stream_without_unbatching(stream)
         for (key_batch, val_batch) in zip(key_batch_stream, val_batch_stream):
-            if len(key_batch) != len(val_batch):
-                raise ValueError("Can not deserialize PairRDD with different number of items"
-                                 " in batches: (%d, %d)" % (len(key_batch), len(val_batch)))
             # for correctness with repeated cartesian/zip this must be returned as one batch
             yield zip(key_batch, val_batch)
--- End diff --
How about returning this batch as a list (and as described in the doc)?
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19228: [SPARK-21985][PYTHON] Fix zip-chained RDD to work
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/19228 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19228: [SPARK-21985][PYTHON] Fix zip-chained RDD to work
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19228 Doh, sorry @holdenk and @aray, I didn't know the other PR was already open and in progress. Although the approach looks different from https://github.com/apache/spark/pull/19226, let me close mine and discuss there first. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19228: [SPARK-21985][PYTHON] Fix zip-chained RDD to work
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19228 **[Test build #81753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81753/testReport)** for PR 19228 at commit [`0703b67`](https://github.com/apache/spark/commit/0703b67405fa721230af80421509a55eb88c5763). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19228: [SPARK-21985][PYTHON] Fix zip-chained RDD to work
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/19228 [SPARK-21985][PYTHON] Fix zip-chained RDD to work ## What changes were proposed in this pull request? This PR proposes to return an iterator of lists (batches) of objects in `CartesianDeserializer` and `PairDeserializer` rather than an iterator of iterators (batches) of objects so that `zip` chaining works. ## How was this patch tested? Unit tests added. You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark SPARK-21985 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19228.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19228 commit 0703b67405fa721230af80421509a55eb88c5763 Author: hyukjinkwon Date: 2017-09-14T03:29:39Z Returns an iterator of lists --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
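The fix described above hinges on the difference between one-shot iterator batches and re-iterable list batches. A toy sketch in plain Python (no Spark involved; the helper names are illustrative, not PySpark internals) shows why materializing each batch as a list matters when the same batch may be traversed more than once:

```python
# Toy sketch: a batch that is a lazy zip iterator is exhausted after one
# pass, while a batch materialized into a list can be iterated repeatedly.

def batches_as_iterators(keys, vals):
    # Mimics the old behaviour: each yielded batch is a one-shot iterator.
    yield zip(keys, vals)

def batches_as_lists(keys, vals):
    # Mimics the fix: each yielded batch is a materialized list.
    yield list(zip(keys, vals))

def first_batch_twice(batch_stream):
    batch = next(batch_stream)
    return list(batch), list(batch)  # traverse the same batch twice

# The iterator batch is empty on the second traversal...
a, b = first_batch_twice(batches_as_iterators([1, 2], ["x", "y"]))
# ...while the list batch yields the same pairs both times.
c, d = first_batch_twice(batches_as_lists([1, 2], ["x", "y"]))
```

Here `a` holds both pairs but `b` is empty, whereas `c` and `d` are identical, which is the property a chained `zip` over deserialized batches relies on.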
[GitHub] spark issue #19210: Fix Graphite re-connects for Graphite instances behind E...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19210 @HyukjinKwon would you please help to trigger the Jenkins? Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19186 **[Test build #81752 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81752/testReport)** for PR 19186 at commit [`74445cd`](https://github.com/apache/spark/commit/74445cdfec15bef2413ea88b712b9490a2997874). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19132: [SPARK-21922] Fix duration always updating when t...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/19132#discussion_r138789492
--- Diff: core/src/main/scala/org/apache/spark/status/api/v1/OneStageResource.scala ---
@@ -81,7 +83,8 @@ private[v1] class OneStageResource(ui: SparkUI) { @DefaultValue("20") @QueryParam("length") length: Int, @DefaultValue("ID") @QueryParam("sortBy") sortBy: TaskSorting): Seq[TaskData] = { withStageAttempt(stageId, stageAttemptId) { stage =>
-      val tasks = stage.ui.taskData.values.map{AllStagesResource.convertTaskData}.toIndexedSeq
+      val tasks = stage.ui.taskData.values.map{
--- End diff --
The style should be changed to `map { AllStagesResource.convertTaskData(_, ui.lastUpdateTime) }`, which requires whitespace between `{` and `}`. You can check other similar code for the style. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19226: [SPARK-21985][PySpark] PairDeserializer is broken for do...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/19226 Sure, no worries. I think we should keep the test for now and we can hope this goes into RC2 (I assume something will be missing from RC1 or I'll screw up its packaging in some way). Otherwise the fix can go out into 2.2.1 if somehow RC1 magically passes :) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19132: [SPARK-21922] Fix duration always updating when t...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/19132#discussion_r138789213 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/UIData.scala --- @@ -97,6 +97,7 @@ private[spark] object UIData { var memoryBytesSpilled: Long = _ var diskBytesSpilled: Long = _ var isBlacklisted: Int = _ +var jobLastUpdateTime: Option[Long] = None --- End diff -- Is it better to rename this to `stageLastUpdateTime` or just `lastUpdateTime`? Since this structure is unrelated to jobs, it would be better not to involve "job" in the name. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19186 **[Test build #81751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81751/testReport)** for PR 19186 at commit [`aa04d4b`](https://github.com/apache/spark/commit/aa04d4bb5124ccd570775076047849b49025735f). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19186 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81751/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19186 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19220: [SPARK-18608][ML][FOLLOWUP] Fix double caching for PySpa...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/19220 LGTM Thanks for this catch! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19186 **[Test build #81751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81751/testReport)** for PR 19186 at commit [`aa04d4b`](https://github.com/apache/spark/commit/aa04d4bb5124ccd570775076047849b49025735f). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18902 @MLnick Thanks for pinging me. I went through this quickly. The basic idea is the same: performing the operations on multiple input columns in one single Dataset/DataFrame operation. Unlike `Bucketizer`, `Imputer` has no compatibility concern because it already supports multiple input columns (`HasInputCols`). In `Bucketizer`, we don't want to break its current API, so it makes things a bit more complicated. Actually, I noticed that `ImputerModel` also applies `withColumn` sequentially on each input column. I'd like to address this part with the `withColumns` API proposed in #17819. What do you think, @MLnick? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19226: [SPARK-21985][PySpark] PairDeserializer is broken for do...
Github user aray commented on the issue: https://github.com/apache/spark/pull/19226 @holdenk I'm not going to be able to solve this tonight (short of just removing the failing test).
[GitHub] spark issue #19152: [SPARK-21915][ML][PySpark] Model 1 and Model 2 ParamMaps...
Github user marktab commented on the issue: https://github.com/apache/spark/pull/19152 @srowen -- may I close this pull request?
[GitHub] spark pull request #19152: [SPARK-21915][ML][PySpark] Model 1 and Model 2 Pa...
GitHub user marktab reopened a pull request: https://github.com/apache/spark/pull/19152 [SPARK-21915][ML][PySpark] Model 1 and Model 2 ParamMaps Missing

@dongjoon-hyun @HyukjinKwon There is an error in the PySpark example code /examples/src/main/python/ml/estimator_transformer_param_example.py. The original Scala code says println("Model 2 was fit using parameters: " + model2.parent.extractParamMap). The parent is lr, and there is no method in PySpark for accessing the parent as is done in Scala. This code has been tested in Python and returns values consistent with Scala.

## What changes were proposed in this pull request?

Proposing to call the lr variable instead of model1 or model2.

## How was this patch tested?

This patch was tested with Spark 2.1.0 by comparing the Scala and PySpark results. At present, PySpark prints nothing for those two print lines. The output for model2 in PySpark should be:

{Param(parent='LogisticRegression_4187be538f744d5a9090', name='tol', doc='the convergence tolerance for iterative algorithms (>= 0).'): 1e-06, Param(parent='LogisticRegression_4187be538f744d5a9090', name='elasticNetParam', doc='the ElasticNet mixing parameter, in range [0, 1]. For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.'): 0.0, Param(parent='LogisticRegression_4187be538f744d5a9090', name='predictionCol', doc='prediction column name.'): 'prediction', Param(parent='LogisticRegression_4187be538f744d5a9090', name='featuresCol', doc='features column name.'): 'features', Param(parent='LogisticRegression_4187be538f744d5a9090', name='labelCol', doc='label column name.'): 'label', Param(parent='LogisticRegression_4187be538f744d5a9090', name='probabilityCol', doc='Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities.'): 'myProbability', Param(parent='LogisticRegression_4187be538f744d5a9090', name='rawPredictionCol', doc='raw prediction (a.k.a. confidence) column name.'): 'rawPrediction', Param(parent='LogisticRegression_4187be538f744d5a9090', name='family', doc='The name of family which is a description of the label distribution to be used in the model. Supported options: auto, binomial, multinomial'): 'auto', Param(parent='LogisticRegression_4187be538f744d5a9090', name='fitIntercept', doc='whether to fit an intercept term.'): True, Param(parent='LogisticRegression_4187be538f744d5a9090', name='threshold', doc='Threshold in binary classification prediction, in range [0, 1]. If threshold and thresholds are both set, they must match, e.g. if threshold is p, then thresholds must be equal to [1-p, p].'): 0.55, Param(parent='LogisticRegression_4187be538f744d5a9090', name='aggregationDepth', doc='suggested depth for treeAggregate (>= 2).'): 2, Param(parent='LogisticRegression_4187be538f744d5a9090', name='maxIter', doc='max number of iterations (>= 0).'): 30, Param(parent='LogisticRegression_4187be538f744d5a9090', name='regParam', doc='regularization parameter (>= 0).'): 0.1, Param(parent='LogisticRegression_4187be538f744d5a9090', name='standardization', doc='whether to standardize the training features before fitting the model.'): True}

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marktab/spark branch-2.2

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19152.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19152

commit a2ccb8a83d13d39c95f0ac1cac1c74dca064
Author: MarkTab marktab.net
Date: 2017-09-07T02:20:59Z

Model 1 and Model 2 ParamMaps Missing @dongjoon-hyun @HyukjinKwon Error in PySpark example code: [https://github.com/apache/spark/blob/master/examples/src/main/python/ml/estimator_transformer_param_example.py] The original Scala code says println("Model 2 was fit using parameters: " + model2.parent.extractParamMap). The parent is lr. There is no method for accessing the parent as is done in Scala. This code has been tested in Python and returns values consistent with Scala.
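The behavior the expected output above reflects can be mimicked without Spark: extra params supplied in the ParamMap at fit() time override the estimator's defaults, which is why the dump shows maxIter 30, regParam 0.1, and threshold 0.55 rather than the defaults. A minimal pure-Python sketch of that override semantics follows (the dictionaries are illustrative stand-ins, not the real Param objects; the default values shown are assumptions for the sketch):

```python
# Sketch of ParamMap override semantics: the estimator's defaults are
# merged with the explicit param map passed to fit(), and the explicit
# values take precedence in the resulting effective parameters.
defaults = {"maxIter": 100, "regParam": 0.0, "threshold": 0.5}
fit_param_map = {"maxIter": 30, "regParam": 0.1, "threshold": 0.55}

effective = {**defaults, **fit_param_map}  # later dict wins on key clashes
print(effective)  # {'maxIter': 30, 'regParam': 0.1, 'threshold': 0.55}
```

Printing lr.extractParamMap() after fitting with such a map, as this PR proposes for the Python example, shows the merged values, matching what the Scala example prints via model2.parent.extractParamMap.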
[GitHub] spark pull request #19152: [SPARK-21915][ML][PySpark] Model 1 and Model 2 Pa...
Github user marktab closed the pull request at: https://github.com/apache/spark/pull/19152
[GitHub] spark issue #19219: [SPARK-21993][SQL] Close sessionState in shutdown hook.
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19219 cc @cloud-fan @jiangxb1987 Could you please take a look at this?