[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19594 **[Test build #84679 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84679/testReport)** for PR 19594 at commit [`e69e213`](https://github.com/apache/spark/commit/e69e21348b4cde2abaec9dbb46381caf1ed3a1a4). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19594 retest this please
[GitHub] spark issue #19591: [SPARK-11035][core] Add in-process Spark app launcher.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19591 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84677/ Test FAILed.
[GitHub] spark issue #19591: [SPARK-11035][core] Add in-process Spark app launcher.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19591 Merged build finished. Test FAILed.
[GitHub] spark issue #19591: [SPARK-11035][core] Add in-process Spark app launcher.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19591 **[Test build #84677 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84677/testReport)** for PR 19591 at commit [`ee4098b`](https://github.com/apache/spark/commit/ee4098bf108c8e919b41e392c7316271173e6dc2).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19931: [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19931 Merged build finished. Test PASSed.
[GitHub] spark issue #19931: [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19931 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84678/ Test PASSed.
[GitHub] spark issue #19931: [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19931 **[Test build #84678 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84678/testReport)** for PR 19931 at commit [`b4b1122`](https://github.com/apache/spark/commit/b4b1122b859f7fe8bf8b5ecd9bacbe0a3de0b9ea).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19843: [SPARK-22644][ML][TEST] Make ML testsuite support Struct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19843 Merged build finished. Test PASSed.
[GitHub] spark issue #19843: [SPARK-22644][ML][TEST] Make ML testsuite support Struct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19843 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84675/ Test PASSed.
[GitHub] spark issue #19843: [SPARK-22644][ML][TEST] Make ML testsuite support Struct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19843 **[Test build #84675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84675/testReport)** for PR 19843 at commit [`930c113`](https://github.com/apache/spark/commit/930c113886dd27e784b8d2c6844dd92d8cdaa5a2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19717 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84672/ Test PASSed.
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19717 Merged build finished. Test PASSed.
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19717 **[Test build #84672 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84672/testReport)** for PR 19717 at commit [`caf2206`](https://github.com/apache/spark/commit/caf22060f600b3b382e2e98b7ee5f0aacc165f2d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluato...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/19676#discussion_r155913190

--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -51,9 +52,17 @@ public static void main(String[] args) {
     KMeans kmeans = new KMeans().setK(2).setSeed(1L);
     KMeansModel model = kmeans.fit(dataset);

-    // Evaluate clustering by computing Within Set Sum of Squared Errors.
-    double WSSSE = model.computeCost(dataset);
-    System.out.println("Within Set Sum of Squared Errors = " + WSSSE);
+    // Make predictions
+    Dataset<Row> predictions = model.transform(dataset);
+
+    // Evaluate clustering by computing Silhouette score
+    ClusteringEvaluator evaluator = new ClusteringEvaluator()
+      .setFeaturesCol("features")
+      .setPredictionCol("prediction")

--- End diff --

We use default values here, so it's not necessary to set them explicitly. We should keep examples as simple as possible. Thanks.
[GitHub] spark issue #14129: [SPARK-16280][SQL] Implement histogram_numeric SQL funct...
Github user cenyuhai commented on the issue: https://github.com/apache/spark/pull/14129 Is this pr available?
[GitHub] spark issue #19931: [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.con...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19931 Retest this please
[GitHub] spark issue #19931: [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19931 **[Test build #84678 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84678/testReport)** for PR 19931 at commit [`b4b1122`](https://github.com/apache/spark/commit/b4b1122b859f7fe8bf8b5ecd9bacbe0a3de0b9ea).
[GitHub] spark issue #19591: [SPARK-11035][core] Add in-process Spark app launcher.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19591 **[Test build #84677 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84677/testReport)** for PR 19591 at commit [`ee4098b`](https://github.com/apache/spark/commit/ee4098bf108c8e919b41e392c7316271173e6dc2).
[GitHub] spark issue #19931: [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19931 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84673/ Test FAILed.
[GitHub] spark issue #19931: [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19931 Merged build finished. Test FAILed.
[GitHub] spark issue #19931: [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19931 **[Test build #84673 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84673/testReport)** for PR 19931 at commit [`b4b1122`](https://github.com/apache/spark/commit/b4b1122b859f7fe8bf8b5ecd9bacbe0a3de0b9ea).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19594 Merged build finished. Test FAILed.
[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19594 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84676/ Test FAILed.
[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19594 **[Test build #84676 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84676/testReport)** for PR 19594 at commit [`e69e213`](https://github.com/apache/spark/commit/e69e21348b4cde2abaec9dbb46381caf1ed3a1a4).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19591: [SPARK-11035][core] Add in-process Spark app launcher.
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19591 Looks like a legitimate flaky test. Will take a look.
[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19829 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84671/ Test FAILed.
[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19829 Merged build finished. Test FAILed.
[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19829 **[Test build #84671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84671/testReport)** for PR 19829 at commit [`96df5f2`](https://github.com/apache/spark/commit/96df5f26d163a4a17d8ab824995b57992afa6b8b).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class MessageWithHeader extends AbstractFileRegion`
  * `static class EncryptedMessage extends AbstractFileRegion`
  * `public abstract class AbstractFileRegion extends AbstractReferenceCounted implements FileRegion`
[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19715 Merged build finished. Test PASSed.
[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19715 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84674/ Test PASSed.
[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19715 **[Test build #84674 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84674/testReport)** for PR 19715 at commit [`445bd84`](https://github.com/apache/spark/commit/445bd84a6e5e81896d5c94ada7035b00e2c22337).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19769: [SPARK-12297][SQL] Adjust timezone for int96 data...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19769
[GitHub] spark issue #19769: [SPARK-12297][SQL] Adjust timezone for int96 data from i...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19769 Merged to master
[GitHub] spark issue #19769: [SPARK-12297][SQL] Adjust timezone for int96 data from i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19769 Merged build finished. Test PASSed.
[GitHub] spark issue #19769: [SPARK-12297][SQL] Adjust timezone for int96 data from i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19769 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84667/ Test PASSed.
[GitHub] spark issue #19769: [SPARK-12297][SQL] Adjust timezone for int96 data from i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19769 **[Test build #84667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84667/testReport)** for PR 19769 at commit [`1ea75c0`](https://github.com/apache/spark/commit/1ea75c0a8f2c5fed33b2a6d6102ad1d8bdf73906).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19751: [SPARK-20653][core] Add cleaning of old elements from th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19751 Merged build finished. Test PASSed.
[GitHub] spark issue #19751: [SPARK-20653][core] Add cleaning of old elements from th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19751 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84665/ Test PASSed.
[GitHub] spark issue #19751: [SPARK-20653][core] Add cleaning of old elements from th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19751 **[Test build #84665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84665/testReport)** for PR 19751 at commit [`2606fcd`](https://github.com/apache/spark/commit/2606fcd6493ce7a57f3555c2613d43f1a0391bf7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19594#discussion_r155910267

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/JoinEstimationSuite.scala ---
@@ -67,6 +68,205 @@ class JoinEstimationSuite extends StatsEstimationTestBase {
     rowCount = 2,
     attributeStats = AttributeMap(Seq("key-1-2", "key-2-3").map(nameToColInfo)))

+  private def estimateByHistogram(
+      histogram1: Histogram,
+      histogram2: Histogram,
+      expectedMin: Double,
+      expectedMax: Double,
+      expectedNdv: Long,
+      expectedRows: Long): Unit = {
+    val col1 = attr("key1")
+    val col2 = attr("key2")
+    val c1 = generateJoinChild(col1, histogram1, expectedMin, expectedMax)
+    val c2 = generateJoinChild(col2, histogram2, expectedMin, expectedMax)
+
+    val c1JoinC2 = Join(c1, c2, Inner, Some(EqualTo(col1, col2)))
+    val c2JoinC1 = Join(c2, c1, Inner, Some(EqualTo(col2, col1)))
+    val expectedStatsAfterJoin = Statistics(
+      sizeInBytes = expectedRows * (8 + 2 * 4),
+      rowCount = Some(expectedRows),
+      attributeStats = AttributeMap(Seq(
+        col1 -> c1.stats.attributeStats(col1).copy(
+          distinctCount = expectedNdv, min = Some(expectedMin), max = Some(expectedMax)),
+        col2 -> c2.stats.attributeStats(col2).copy(
+          distinctCount = expectedNdv, min = Some(expectedMin), max = Some(expectedMax)))))
+
+    // Join order should not affect estimation result.
+    Seq(c1JoinC2, c2JoinC1).foreach { join =>
+      assert(join.stats == expectedStatsAfterJoin)
+    }
+  }
+
+  private def generateJoinChild(
+      col: Attribute,
+      histogram: Histogram,
+      expectedMin: Double,
+      expectedMax: Double): LogicalPlan = {
+    val colStat = inferColumnStat(histogram)
+    val t = StatsTestPlan(
+      outputList = Seq(col),
+      rowCount = (histogram.height * histogram.bins.length).toLong,
+      attributeStats = AttributeMap(Seq(col -> colStat)))
+
+    val filterCondition = new ArrayBuffer[Expression]()
+    if (expectedMin > colStat.min.get.toString.toDouble) {
+      filterCondition += GreaterThanOrEqual(col, Literal(expectedMin))
+    }
+    if (expectedMax < colStat.max.get.toString.toDouble) {
+      filterCondition += LessThanOrEqual(col, Literal(expectedMax))
+    }
+    if (filterCondition.isEmpty) t else Filter(filterCondition.reduce(And), t)
+  }
+
+  private def inferColumnStat(histogram: Histogram): ColumnStat = {
+    var ndv = 0L
+    for (i <- histogram.bins.indices) {
+      val bin = histogram.bins(i)
+      if (i == 0 || bin.hi != histogram.bins(i - 1).hi) {
+        ndv += bin.ndv
+      }
+    }
+    ColumnStat(distinctCount = ndv, min = Some(histogram.bins.head.lo),
+      max = Some(histogram.bins.last.hi), nullCount = 0, avgLen = 4, maxLen = 4,
+      histogram = Some(histogram))
+  }
+
+  test("equi-height histograms: a bin is contained by another one") {
+    val histogram1 = Histogram(height = 300, Array(
+      HistogramBin(lo = 10, hi = 30, ndv = 10), HistogramBin(lo = 30, hi = 60, ndv = 30)))
+    val histogram2 = Histogram(height = 100, Array(
+      HistogramBin(lo = 0, hi = 50, ndv = 50), HistogramBin(lo = 50, hi = 100, ndv = 40)))
+    // test bin trimming
+    val (t1, h1) = trimBin(histogram2.bins(0), height = 100, min = 10, max = 60)
+    assert(t1 == HistogramBin(lo = 10, hi = 50, ndv = 40) && h1 == 80)
+    val (t2, h2) = trimBin(histogram2.bins(1), height = 100, min = 10, max = 60)
+    assert(t2 == HistogramBin(lo = 50, hi = 60, ndv = 8) && h2 == 20)
+
+    val expectedRanges = Seq(
+      OverlappedRange(10, 30, math.min(10, 40*1/2), math.max(10, 40*1/2), 300, 80*1/2),
+      OverlappedRange(30, 50, math.min(30*2/3, 40*1/2), math.max(30*2/3, 40*1/2), 300*2/3, 80*1/2),
+      OverlappedRange(50, 60, math.min(30*1/3, 8), math.max(30*1/3, 8), 300*1/3, 20)
+    )
+    assert(expectedRanges.equals(
+      getOverlappedRanges(histogram1, histogram2, newMin = 10D, newMax = 60D)))
+
+    estimateByHistogram(
+      histogram1 = histogram1,
+      histogram2 = histogram2,
+      expectedMin = 10D,
+      expectedMax = 60D,
+      // 10 + 20 + 8
+      expectedNdv = 38L,
+      // 300*40/20 + 200*40/20 + 100*20/10
+      expectedRows = 1200L)
+  }
+
+  test("equi-height histograms: a bin has only one value") {
+    val histogram1 = Histogram(height = 300, Array(
+      HistogramBin(lo = 30, hi = 30, ndv = 1), HistogramBin(lo = 30, hi = 60, ndv = 30)))
+    val histogram2 = Histogram(height = 100, Array(
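The bin-trimming assertions in the quoted test can be reproduced outside Spark. Below is a minimal Python sketch (Python rather than Scala, for portability); the names and the linear-interpolation rule are inferred from the asserted values in the test, not quoted from the PR, so treat it as an illustration of the arithmetic only:

```python
from dataclasses import dataclass

@dataclass
class HistogramBin:
    lo: float
    hi: float
    ndv: int

def trim_bin(b: HistogramBin, height: float, lo: float, hi: float):
    """Clip a bin to [lo, hi]; scale its ndv and height by the surviving fraction of its range."""
    new_lo, new_hi = max(b.lo, lo), min(b.hi, hi)
    # Fraction of the bin's value range that survives the trim.
    ratio = (new_hi - new_lo) / (b.hi - b.lo) if b.hi != b.lo else 1.0
    return HistogramBin(new_lo, new_hi, round(b.ndv * ratio)), height * ratio

# Reproduces the values asserted in the quoted test:
# trimming histogram2's bins to [10, 60] with height 100.
t1, h1 = trim_bin(HistogramBin(0, 50, 50), height=100, lo=10, hi=60)
t2, h2 = trim_bin(HistogramBin(50, 100, 40), height=100, lo=10, hi=60)
```

With these inputs the sketch yields `(HistogramBin(10, 50, 40), 80)` and `(HistogramBin(50, 60, 8), 20)`, matching the test's `trimBin` assertions.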
[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19594#discussion_r155910232

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/JoinEstimationSuite.scala ---
@@ -67,6 +68,205 @@ class JoinEstimationSuite extends StatsEstimationTestBase {
     rowCount = 2,
     attributeStats = AttributeMap(Seq("key-1-2", "key-2-3").map(nameToColInfo)))

+  private def estimateByHistogram(
+      histogram1: Histogram,
+      histogram2: Histogram,
+      expectedMin: Double,
+      expectedMax: Double,
+      expectedNdv: Long,
+      expectedRows: Long): Unit = {
+    val col1 = attr("key1")
+    val col2 = attr("key2")
+    val c1 = generateJoinChild(col1, histogram1, expectedMin, expectedMax)
+    val c2 = generateJoinChild(col2, histogram2, expectedMin, expectedMax)
+
+    val c1JoinC2 = Join(c1, c2, Inner, Some(EqualTo(col1, col2)))
+    val c2JoinC1 = Join(c2, c1, Inner, Some(EqualTo(col2, col1)))
+    val expectedStatsAfterJoin = Statistics(
+      sizeInBytes = expectedRows * (8 + 2 * 4),
+      rowCount = Some(expectedRows),
+      attributeStats = AttributeMap(Seq(
+        col1 -> c1.stats.attributeStats(col1).copy(
+          distinctCount = expectedNdv, min = Some(expectedMin), max = Some(expectedMax)),
+        col2 -> c2.stats.attributeStats(col2).copy(
+          distinctCount = expectedNdv, min = Some(expectedMin), max = Some(expectedMax)))))
+
+    // Join order should not affect estimation result.
+    Seq(c1JoinC2, c2JoinC1).foreach { join =>
+      assert(join.stats == expectedStatsAfterJoin)
+    }
+  }
+
+  private def generateJoinChild(
+      col: Attribute,
+      histogram: Histogram,
+      expectedMin: Double,
+      expectedMax: Double): LogicalPlan = {
+    val colStat = inferColumnStat(histogram)
+    val t = StatsTestPlan(
+      outputList = Seq(col),
+      rowCount = (histogram.height * histogram.bins.length).toLong,
+      attributeStats = AttributeMap(Seq(col -> colStat)))
+
+    val filterCondition = new ArrayBuffer[Expression]()
+    if (expectedMin > colStat.min.get.toString.toDouble) {
+      filterCondition += GreaterThanOrEqual(col, Literal(expectedMin))
+    }
+    if (expectedMax < colStat.max.get.toString.toDouble) {
+      filterCondition += LessThanOrEqual(col, Literal(expectedMax))
+    }
+    if (filterCondition.isEmpty) t else Filter(filterCondition.reduce(And), t)
+  }
+
+  private def inferColumnStat(histogram: Histogram): ColumnStat = {
+    var ndv = 0L
+    for (i <- histogram.bins.indices) {
+      val bin = histogram.bins(i)
+      if (i == 0 || bin.hi != histogram.bins(i - 1).hi) {
+        ndv += bin.ndv
+      }
+    }
+    ColumnStat(distinctCount = ndv, min = Some(histogram.bins.head.lo),
+      max = Some(histogram.bins.last.hi), nullCount = 0, avgLen = 4, maxLen = 4,
+      histogram = Some(histogram))
+  }
+
+  test("equi-height histograms: a bin is contained by another one") {
+    val histogram1 = Histogram(height = 300, Array(
+      HistogramBin(lo = 10, hi = 30, ndv = 10), HistogramBin(lo = 30, hi = 60, ndv = 30)))
+    val histogram2 = Histogram(height = 100, Array(
+      HistogramBin(lo = 0, hi = 50, ndv = 50), HistogramBin(lo = 50, hi = 100, ndv = 40)))
+    // test bin trimming
+    val (t1, h1) = trimBin(histogram2.bins(0), height = 100, min = 10, max = 60)
+    assert(t1 == HistogramBin(lo = 10, hi = 50, ndv = 40) && h1 == 80)
+    val (t2, h2) = trimBin(histogram2.bins(1), height = 100, min = 10, max = 60)
+    assert(t2 == HistogramBin(lo = 50, hi = 60, ndv = 8) && h2 == 20)
+
+    val expectedRanges = Seq(
+      OverlappedRange(10, 30, math.min(10, 40*1/2), math.max(10, 40*1/2), 300, 80*1/2),
+      OverlappedRange(30, 50, math.min(30*2/3, 40*1/2), math.max(30*2/3, 40*1/2), 300*2/3, 80*1/2),
+      OverlappedRange(50, 60, math.min(30*1/3, 8), math.max(30*1/3, 8), 300*1/3, 20)
+    )
+    assert(expectedRanges.equals(
+      getOverlappedRanges(histogram1, histogram2, newMin = 10D, newMax = 60D)))
+
+    estimateByHistogram(
+      histogram1 = histogram1,
+      histogram2 = histogram2,
+      expectedMin = 10D,
+      expectedMax = 60D,
+      // 10 + 20 + 8
+      expectedNdv = 38L,
+      // 300*40/20 + 200*40/20 + 100*20/10
+      expectedRows = 1200L)
+  }
+
+  test("equi-height histograms: a bin has only one value") {
+    val histogram1 = Histogram(height = 300, Array(
+      HistogramBin(lo = 30, hi = 30, ndv = 1), HistogramBin(lo = 30, hi = 60, ndv = 30)))
+    val histogram2 = Histogram(height = 100, Array(
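The row-count comment in the quoted test (`300*40/20 + 200*40/20 + 100*20/10`) suggests how the per-range estimates combine. The Python sketch below is a hedged reconstruction (not quoted from the PR): it assumes each overlapped range contributes `left_rows * right_rows / max(left_ndv, right_ndv)` rows, and the field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class OverlappedRange:
    lo: float
    hi: float
    left_ndv: float
    right_ndv: float
    left_rows: float
    right_rows: float

def estimate_join_rows(ranges):
    # Assumed rule: within a range, matching rows are limited by the larger ndv side.
    return sum(r.left_rows * r.right_rows / max(r.left_ndv, r.right_ndv)
               for r in ranges)

# The three ranges from the quoted test, after bin trimming:
# ndv/row figures correspond to 10 vs 40*1/2, 30*2/3 vs 40*1/2, 30*1/3 vs 8.
ranges = [
    OverlappedRange(10, 30, 10, 20, 300, 40),
    OverlappedRange(30, 50, 20, 20, 200, 40),
    OverlappedRange(50, 60, 10, 8, 100, 20),
]
rows = estimate_join_rows(ranges)  # 600 + 400 + 200
```

Under this assumed rule the three ranges sum to 1200 rows, matching the test's `expectedRows = 1200L`.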
[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19594 **[Test build #84676 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84676/testReport)** for PR 19594 at commit [`e69e213`](https://github.com/apache/spark/commit/e69e21348b4cde2abaec9dbb46381caf1ed3a1a4).
[GitHub] spark issue #19843: [SPARK-22644][ML][TEST] Make ML testsuite support Struct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19843 **[Test build #84675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84675/testReport)** for PR 19843 at commit [`930c113`](https://github.com/apache/spark/commit/930c113886dd27e784b8d2c6844dd92d8cdaa5a2).
[GitHub] spark issue #19931: [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19931 **[Test build #84673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84673/testReport)** for PR 19931 at commit [`b4b1122`](https://github.com/apache/spark/commit/b4b1122b859f7fe8bf8b5ecd9bacbe0a3de0b9ea).
[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19715 **[Test build #84674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84674/testReport)** for PR 19715 at commit [`445bd84`](https://github.com/apache/spark/commit/445bd84a6e5e81896d5c94ada7035b00e2c22337).
[GitHub] spark pull request #19931: [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `sp...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/19931 [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.conf`

## What changes were proposed in this pull request?

During https://github.com/apache/spark/pull/19882, `conf` is mistakenly used to switch ORC implementation between `native` and `hive`. To affect `OrcTest`, `spark.conf` should be used.

## How was this patch tested?

Pass the tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-22672-2

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19931.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19931

commit b4b1122b859f7fe8bf8b5ecd9bacbe0a3de0b9ea
Author: Dongjoon Hyun
Date: 2017-12-08T22:17:48Z

    [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.conf`
[GitHub] spark issue #19925: [SPARK-22732] Add Structured Streaming APIs to DataSourc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19925 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84664/ Test PASSed.
[GitHub] spark issue #19925: [SPARK-22732] Add Structured Streaming APIs to DataSourc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19925 Merged build finished. Test PASSed.
[GitHub] spark issue #19591: [SPARK-11035][core] Add in-process Spark app launcher.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19591 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84666/ Test FAILed.
[GitHub] spark issue #19591: [SPARK-11035][core] Add in-process Spark app launcher.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19591 Merged build finished. Test FAILed.
[GitHub] spark issue #19925: [SPARK-22732] Add Structured Streaming APIs to DataSourc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19925 **[Test build #84664 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84664/testReport)** for PR 19925 at commit [`4d166de`](https://github.com/apache/spark/commit/4d166ded90b071332c42704070e98e581fa92042). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19591: [SPARK-11035][core] Add in-process Spark app launcher.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19591 **[Test build #84666 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84666/testReport)** for PR 19591 at commit [`8013766`](https://github.com/apache/spark/commit/8013766d730b9fa14b9d0c71d527dfcfcead8af1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19930: [SPARK-22279][SQL][FOLLOWUP] Preserve a test case...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/19930
[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/19715 @MLnick Thank you very much for your comments! I will change these.
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user liyinan926 commented on the issue: https://github.com/apache/spark/pull/19717 @vanzin Created https://issues.apache.org/jira/browse/SPARK-22743 to track the work on consolidating the common logic for handling driver and executor memory overhead. Addressed other comments in https://github.com/apache/spark/pull/19717/commits/caf22060f600b3b382e2e98b7ee5f0aacc165f2d. Thanks!
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19717 **[Test build #84672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84672/testReport)** for PR 19717 at commit [`caf2206`](https://github.com/apache/spark/commit/caf22060f600b3b382e2e98b7ee5f0aacc165f2d).
[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19717#discussion_r155908121 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala --- @@ -119,5 +117,46 @@ private[spark] object Config extends Logging { "must be a positive integer") .createWithDefault(10) + val WAIT_FOR_APP_COMPLETION = +ConfigBuilder("spark.kubernetes.submission.waitAppCompletion") + .doc("In cluster mode, whether to wait for the application to finish before exiting the " + +"launcher process.") + .booleanConf + .createWithDefault(true) + + val REPORT_INTERVAL = +ConfigBuilder("spark.kubernetes.report.interval") + .doc("Interval between reports of the current app status in cluster mode.") + .timeConf(TimeUnit.MILLISECONDS) + .checkValue(interval => interval > 0, s"Logging interval must be a positive time value.") + .createWithDefaultString("1s") + + private[spark] val JARS_DOWNLOAD_LOCATION = --- End diff -- Done.
[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19717#discussion_r155908117 --- Diff: docs/configuration.md --- @@ -157,13 +157,31 @@ of the most common options to set are: or in your default properties file. + + spark.driver.memoryOverhead + driverMemory * 0.10, with minimum of 384 + +The amount of off-heap memory (in megabytes) to be allocated per driver in cluster mode. This is +memory that accounts for things like VM overheads, interned strings, other native overheads, etc. +This tends to grow with the container size (typically 6-10%). --- End diff -- Done.
[GitHub] spark pull request #17702: [SPARK-20408][SQL] Get the glob path in parallel ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/17702#discussion_r155906572 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -668,4 +672,31 @@ object DataSource extends Logging { } globPath } + + /** + * Return all paths represented by the wildcard string. + * Follow [[InMemoryFileIndex]].bulkListLeafFile and reuse the conf. + */ + private def getGlobbedPaths( + sparkSession: SparkSession, + fs: FileSystem, + hadoopConf: SerializableConfiguration, + qualified: Path): Seq[Path] = { +val paths = SparkHadoopUtil.get.expandGlobPath(fs, qualified) +if (paths.size <= sparkSession.sessionState.conf.parallelPartitionDiscoveryThreshold) { + SparkHadoopUtil.get.globPathIfNecessary(fs, qualified) +} else { + val parallelPartitionDiscoveryParallelism = + sparkSession.sessionState.conf.parallelPartitionDiscoveryParallelism + val numParallelism = Math.min(paths.size, parallelPartitionDiscoveryParallelism) + val expanded = sparkSession.sparkContext --- End diff -- Why do this using a Spark job, instead of just a local thread pool? I see this is the same thing done by `InMemoryFileIndex`, but it feels unnecessarily expensive.
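The alternative the reviewer suggests — expanding the glob candidates on a local thread pool instead of submitting a Spark job — could be sketched roughly as below. This is an illustrative sketch only: `globLocally` and its `glob` parameter are hypothetical names standing in for the `SparkHadoopUtil.get.globPathIfNecessary` call in the quoted diff.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hypothetical local-thread-pool variant of the Spark-job-based expansion:
// run the per-path glob calls concurrently on a bounded pool, then flatten.
def globLocally[A](paths: Seq[A], parallelism: Int)(glob: A => Seq[A]): Seq[A] = {
  val pool = Executors.newFixedThreadPool(math.max(1, math.min(paths.size, parallelism)))
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
  try {
    // One future per expanded candidate path.
    Await.result(Future.sequence(paths.map(p => Future(glob(p)))), Duration.Inf).flatten
  } finally {
    pool.shutdown()
  }
}
```

Compared with a Spark job, this avoids scheduling overhead and works before executors are available, at the cost of doing all listing on the driver.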
[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19829 **[Test build #84671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84671/testReport)** for PR 19829 at commit [`96df5f2`](https://github.com/apache/spark/commit/96df5f26d163a4a17d8ab824995b57992afa6b8b).
[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19829 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84670/ Test FAILed.
[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19829 Merged build finished. Test FAILed.
[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19829 **[Test build #84670 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84670/testReport)** for PR 19829 at commit [`c7cffe9`](https://github.com/apache/spark/commit/c7cffe9f284762432d1d320845c60f8586f434af). * This patch **fails to build**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class MessageWithHeader extends AbstractFileRegion ` * ` static class EncryptedMessage extends AbstractFileRegion ` * `public abstract class AbstractFileRegion extends AbstractReferenceCounted implements FileRegion `
[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19829 **[Test build #84670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84670/testReport)** for PR 19829 at commit [`c7cffe9`](https://github.com/apache/spark/commit/c7cffe9f284762432d1d320845c60f8586f434af).
[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19829 **[Test build #84669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84669/testReport)** for PR 19829 at commit [`9a07d2b`](https://github.com/apache/spark/commit/9a07d2b64c506a6d822dea27ca76255c084569f4). * This patch **fails build dependency tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19829 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84669/ Test FAILed.
[GitHub] spark pull request #19829: [WIP]Upgrade Netty to 4.1.17
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/19829#discussion_r155904309 --- Diff: common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java --- @@ -203,6 +203,63 @@ public long transfered() { return transferred; } +@Override --- End diff -- Good point! Updated!
[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19829 Merged build finished. Test FAILed.
[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19829 **[Test build #84669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84669/testReport)** for PR 19829 at commit [`9a07d2b`](https://github.com/apache/spark/commit/9a07d2b64c506a6d822dea27ca76255c084569f4).
[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19829 **[Test build #84668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84668/testReport)** for PR 19829 at commit [`94df9ae`](https://github.com/apache/spark/commit/94df9ae0bf5216f70c17b8aed297cda22f9566f4). * This patch **fails build dependency tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class MessageWithHeader extends AbstractFileRegion ` * ` static class EncryptedMessage extends AbstractFileRegion ` * `public abstract class AbstractFileRegion extends AbstractReferenceCounted implements FileRegion `
[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19829 Merged build finished. Test FAILed.
[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19829 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84668/ Test FAILed.
[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19829 **[Test build #84668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84668/testReport)** for PR 19829 at commit [`94df9ae`](https://github.com/apache/spark/commit/94df9ae0bf5216f70c17b8aed297cda22f9566f4).
[GitHub] spark issue #19930: [SPARK-22279][SQL][FOLLOWUP] Preserve a test case assump...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19930 Merged build finished. Test FAILed.
[GitHub] spark issue #19930: [SPARK-22279][SQL][FOLLOWUP] Preserve a test case assump...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19930 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84662/ Test FAILed.
[GitHub] spark issue #19930: [SPARK-22279][SQL][FOLLOWUP] Preserve a test case assump...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19930 **[Test build #84662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84662/testReport)** for PR 19930 at commit [`2449812`](https://github.com/apache/spark/commit/244981250ecfed35c53db96740284bcdb83fa0db). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19717#discussion_r155900585 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala --- @@ -119,5 +117,46 @@ private[spark] object Config extends Logging { "must be a positive integer") .createWithDefault(10) + val WAIT_FOR_APP_COMPLETION = +ConfigBuilder("spark.kubernetes.submission.waitAppCompletion") + .doc("In cluster mode, whether to wait for the application to finish before exiting the " + +"launcher process.") + .booleanConf + .createWithDefault(true) + + val REPORT_INTERVAL = +ConfigBuilder("spark.kubernetes.report.interval") + .doc("Interval between reports of the current app status in cluster mode.") + .timeConf(TimeUnit.MILLISECONDS) + .checkValue(interval => interval > 0, s"Logging interval must be a positive time value.") + .createWithDefaultString("1s") + + private[spark] val JARS_DOWNLOAD_LOCATION = --- End diff -- nit: `private[spark]` is redundant in this object.
[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19717#discussion_r155900650 --- Diff: docs/configuration.md --- @@ -157,13 +157,31 @@ of the most common options to set are: or in your default properties file. + + spark.driver.memoryOverhead + driverMemory * 0.10, with minimum of 384 + +The amount of off-heap memory (in megabytes) to be allocated per driver in cluster mode. This is +memory that accounts for things like VM overheads, interned strings, other native overheads, etc. +This tends to grow with the container size (typically 6-10%). --- End diff -- Should mention that not all cluster managers support this option, since this is now in the common configuration doc. Same below.
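The default quoted in the doc row above ("driverMemory * 0.10, with minimum of 384") can be sketched as a small function. This is only an illustration of the documented rule, not Spark's actual implementation:

```scala
// Sketch of the documented default: 10% of driver memory, floored at 384 MiB.
def defaultMemoryOverheadMb(driverMemoryMb: Long): Long =
  math.max((driverMemoryMb * 0.10).toLong, 384L)

// A 4096 MiB driver gets 409 MiB of overhead (10%, above the floor),
// while a 1024 MiB driver gets the 384 MiB minimum instead of 102 MiB.
```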
[GitHub] spark issue #19769: [SPARK-12297][SQL] Adjust timezone for int96 data from i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19769 **[Test build #84667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84667/testReport)** for PR 19769 at commit [`1ea75c0`](https://github.com/apache/spark/commit/1ea75c0a8f2c5fed33b2a6d6102ad1d8bdf73906).
[GitHub] spark pull request #19769: [SPARK-12297][SQL] Adjust timezone for int96 data...
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/19769#discussion_r155870601 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -363,9 +370,25 @@ class ParquetFileFormat fileSplit.getLocations, null) + val sharedConf = broadcastedHadoopConf.value.value + // PARQUET_INT96_TIMESTAMP_CONVERSION says to apply timezone conversions to int96 timestamps' + // *only* if the file was created by something other than "parquet-mr", so check the actual + // writer here for this file. We have to do this per-file, as each file in the table may + // have different writers. + def isCreatedByParquetMr(): Boolean = { +val footer = ParquetFileReader.readFooter(sharedConf, fileSplit.getPath, SKIP_ROW_GROUPS) +footer.getFileMetaData().getCreatedBy().startsWith("parquet-mr") + } + val convertTz = +if (timestampConversion && !isCreatedByParquetMr()) { + Some(DateTimeUtils.getTimeZone(sharedConf.get(SQLConf.SESSION_LOCAL_TIMEZONE.key))) +} else { + None +} + val attemptId = new TaskAttemptID(new TaskID(new JobID(), TaskType.MAP, 0), 0) val hadoopAttemptContext = -new TaskAttemptContextImpl(broadcastedHadoopConf.value.value, attemptId) +new TaskAttemptContextImpl(broadcastedHadoopConf.value.value, attemptId); --- End diff -- Done
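The decision in the quoted diff reduces to a small predicate on the Parquet footer's `createdBy` string; the sketch below mirrors that logic in isolation (the actual footer read via `ParquetFileReader.readFooter` is omitted, and the function name is hypothetical):

```scala
// Mirrors the convertTz decision in the diff: convert int96 timestamps
// only when conversion is enabled AND the file was written by something
// other than parquet-mr (e.g. Impala or Hive).
def needsInt96Conversion(conversionEnabled: Boolean, createdBy: String): Boolean =
  conversionEnabled && !createdBy.startsWith("parquet-mr")

// needsInt96Conversion(true, "parquet-mr version 1.8.2") == false
// needsInt96Conversion(true, "impala version 2.9.0")     == true
```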
[GitHub] spark pull request #19769: [SPARK-12297][SQL] Adjust timezone for int96 data...
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/19769#discussion_r155870513 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java --- @@ -105,10 +112,19 @@ */ private final MemoryMode MEMORY_MODE; - public VectorizedParquetRecordReader(boolean useOffHeap) { + public VectorizedParquetRecordReader(TimeZone convertTz, boolean useOffHeap) { +this.convertTz = convertTz; MEMORY_MODE = useOffHeap ? MemoryMode.OFF_HEAP : MemoryMode.ON_HEAP; } + public VectorizedParquetRecordReader(boolean useOffHeap) { +this(null, useOffHeap); + } + + VectorizedParquetRecordReader() { --- End diff -- Removed
[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19717#discussion_r155898310 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/DriverConfigurationStepsOrchestrator.scala --- @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.submit + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.ConfigurationUtils +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.submit.steps._ +import org.apache.spark.launcher.SparkLauncher +import org.apache.spark.util.SystemClock + +/** + * Constructs the complete list of driver configuration steps to run to deploy the Spark driver. + */ +private[spark] class DriverConfigurationStepsOrchestrator( +namespace: String, +kubernetesAppId: String, +launchTime: Long, +mainAppResource: Option[MainAppResource], +appName: String, +mainClass: String, +appArgs: Array[String], +submissionSparkConf: SparkConf) { + + // The resource name prefix is derived from the application name, making it easy to connect the + // names of the Kubernetes resources from e.g. kubectl or the Kubernetes dashboard to the + // application the user submitted. However, we can't use the application name in the label, as + // label values are considerably restrictive, e.g. must be no longer than 63 characters in + // length. So we generate a separate identifier for the app ID itself, and bookkeeping that + // requires finding "all pods for this application" should use the kubernetesAppId. + private val kubernetesResourceNamePrefix = + s"$appName-$launchTime".toLowerCase.replaceAll("\\.", "-") + private val dockerImagePullPolicy = submissionSparkConf.get(DOCKER_IMAGE_PULL_POLICY) + private val jarsDownloadPath = submissionSparkConf.get(JARS_DOWNLOAD_LOCATION) + private val filesDownloadPath = submissionSparkConf.get(FILES_DOWNLOAD_LOCATION) + + def getAllConfigurationSteps(): Seq[DriverConfigurationStep] = { +val driverCustomLabels = ConfigurationUtils.parsePrefixedKeyValuePairs( + submissionSparkConf, + KUBERNETES_DRIVER_LABEL_PREFIX) +require(!driverCustomLabels.contains(SPARK_APP_ID_LABEL), "Label with key " + + s"$SPARK_APP_ID_LABEL is not allowed as it is reserved for Spark bookkeeping " + + "operations.") +require(!driverCustomLabels.contains(SPARK_ROLE_LABEL), "Label with key " + + s"$SPARK_ROLE_LABEL is not allowed as it is reserved for Spark bookkeeping " + + "operations.") + +val allDriverLabels = driverCustomLabels ++ Map( + SPARK_APP_ID_LABEL -> kubernetesAppId, + SPARK_ROLE_LABEL -> SPARK_POD_DRIVER_ROLE) + +val initialSubmissionStep = new BaseDriverConfigurationStep( + kubernetesAppId, + kubernetesResourceNamePrefix, + allDriverLabels, + dockerImagePullPolicy, + appName, + mainClass, + appArgs, + submissionSparkConf) + +val driverAddressStep = new DriverServiceBootstrapStep( + kubernetesResourceNamePrefix, + allDriverLabels, + submissionSparkConf, + new SystemClock) + +val kubernetesCredentialsStep = new DriverKubernetesCredentialsStep( + submissionSparkConf, kubernetesResourceNamePrefix) + +val additionalMainAppJar = if (mainAppResource.nonEmpty) { + val mayBeResource = mainAppResource.get match { +case JavaMainAppResource(resource) if resource != SparkLauncher.NO_RESOURCE => + Some(resource) +case _ => Option.empty --- End diff -- Ah, I may have forgotten to actually make the change. Anyway, it's done now https://github.com/apache/spark/pull/19717/commits/7d2b30373b2e4d8d5311e10c3f9a62a2d900d568#dif
[GitHub] spark issue #19929: [SPARK-22629][PYTHON] Add deterministic flag to pyspark ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19929 Merged build finished. Test PASSed.
[GitHub] spark issue #19929: [SPARK-22629][PYTHON] Add deterministic flag to pyspark ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19929 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84660/ Test PASSed.
[GitHub] spark issue #19929: [SPARK-22629][PYTHON] Add deterministic flag to pyspark ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19929 **[Test build #84660 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84660/testReport)** for PR 19929 at commit [`6187d5a`](https://github.com/apache/spark/commit/6187d5a0df7c409a49cd636eb74dea9323044c6b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19717 Merged build finished. Test PASSed.
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19717 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84658/ Test PASSed.
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19717 **[Test build #84658 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84658/testReport)** for PR 19717 at commit [`7d2b303`](https://github.com/apache/spark/commit/7d2b30373b2e4d8d5311e10c3f9a62a2d900d568). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19717#discussion_r155896644

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/DriverConfigurationStepsOrchestrator.scala ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.ConfigurationUtils
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.steps._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.util.SystemClock
+
+/**
+ * Constructs the complete list of driver configuration steps to run to deploy the Spark driver.
+ */
+private[spark] class DriverConfigurationStepsOrchestrator(
+    namespace: String,
+    kubernetesAppId: String,
+    launchTime: Long,
+    mainAppResource: Option[MainAppResource],
+    appName: String,
+    mainClass: String,
+    appArgs: Array[String],
+    submissionSparkConf: SparkConf) {
+
+  // The resource name prefix is derived from the application name, making it easy to connect the
+  // names of the Kubernetes resources from e.g. kubectl or the Kubernetes dashboard to the
+  // application the user submitted. However, we can't use the application name in the label, as
+  // label values are considerably restrictive, e.g. must be no longer than 63 characters in
+  // length. So we generate a separate identifier for the app ID itself, and bookkeeping that
+  // requires finding "all pods for this application" should use the kubernetesAppId.
+  private val kubernetesResourceNamePrefix =
+    s"$appName-$launchTime".toLowerCase.replaceAll("\\.", "-")
+  private val dockerImagePullPolicy = submissionSparkConf.get(DOCKER_IMAGE_PULL_POLICY)
+  private val jarsDownloadPath = submissionSparkConf.get(JARS_DOWNLOAD_LOCATION)
+  private val filesDownloadPath = submissionSparkConf.get(FILES_DOWNLOAD_LOCATION)
+
+  def getAllConfigurationSteps(): Seq[DriverConfigurationStep] = {
+    val driverCustomLabels = ConfigurationUtils.parsePrefixedKeyValuePairs(
+      submissionSparkConf,
+      KUBERNETES_DRIVER_LABEL_PREFIX)
+    require(!driverCustomLabels.contains(SPARK_APP_ID_LABEL), "Label with key " +
+      s"$SPARK_APP_ID_LABEL is not allowed as it is reserved for Spark bookkeeping " +
+      "operations.")
+    require(!driverCustomLabels.contains(SPARK_ROLE_LABEL), "Label with key " +
+      s"$SPARK_ROLE_LABEL is not allowed as it is reserved for Spark bookkeeping " +
+      "operations.")
+
+    val allDriverLabels = driverCustomLabels ++ Map(
+      SPARK_APP_ID_LABEL -> kubernetesAppId,
+      SPARK_ROLE_LABEL -> SPARK_POD_DRIVER_ROLE)
+
+    val initialSubmissionStep = new BaseDriverConfigurationStep(
+      kubernetesAppId,
+      kubernetesResourceNamePrefix,
+      allDriverLabels,
+      dockerImagePullPolicy,
+      appName,
+      mainClass,
+      appArgs,
+      submissionSparkConf)
+
+    val driverAddressStep = new DriverServiceBootstrapStep(
+      kubernetesResourceNamePrefix,
+      allDriverLabels,
+      submissionSparkConf,
+      new SystemClock)
+
+    val kubernetesCredentialsStep = new DriverKubernetesCredentialsStep(
+      submissionSparkConf, kubernetesResourceNamePrefix)
+
+    val additionalMainAppJar = if (mainAppResource.nonEmpty) {
+      val mayBeResource = mainAppResource.get match {
+        case JavaMainAppResource(resource) if resource != SparkLauncher.NO_RESOURCE =>
+          Some(resource)
+        case _ => Option.empty
--- End diff --

I don't follow. The code still says `Option.empty` when Matt asked for it to be replaced with `None`.

---
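The `Option.empty` vs. `None` point in the review is easy to demonstrate outside Spark. A minimal sketch (the `mainResource` helper and the sentinel string are hypothetical, not from the PR): both spellings produce the same value, and `None` is the conventional, terser choice in a match arm.

```scala
object OptionStyle {
  // Hypothetical stand-in for the PR's match on the main app resource:
  // return the resource unless it equals the "no resource" sentinel.
  def mainResource(resource: String, noResource: String): Option[String] =
    resource match {
      case r if r != noResource => Some(r)
      case _                    => None // reviewer-preferred over Option.empty
    }

  def main(args: Array[String]): Unit = {
    // None and Option.empty are the same value, so the choice is purely stylistic.
    println(None == Option.empty)                             // true
    println(mainResource("app.jar", "spark-internal"))        // Some(app.jar)
    println(mainResource("spark-internal", "spark-internal")) // None
  }
}
```

`Option.empty` does let you pin a type (`Option.empty[String]`), which can help inference in some contexts, but in a match whose other arm already returns `Some(resource)` that benefit does not apply.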
[GitHub] spark issue #19751: [SPARK-20653][core] Add cleaning of old elements from th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19751 **[Test build #84665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84665/testReport)** for PR 19751 at commit [`2606fcd`](https://github.com/apache/spark/commit/2606fcd6493ce7a57f3555c2613d43f1a0391bf7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19925: [SPARK-22732] Add Structured Streaming APIs to DataSourc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19925 **[Test build #84664 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84664/testReport)** for PR 19925 at commit [`4d166de`](https://github.com/apache/spark/commit/4d166ded90b071332c42704070e98e581fa92042). ---
[GitHub] spark issue #19591: [SPARK-11035][core] Add in-process Spark app launcher.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19591 **[Test build #84666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84666/testReport)** for PR 19591 at commit [`8013766`](https://github.com/apache/spark/commit/8013766d730b9fa14b9d0c71d527dfcfcead8af1). ---
[GitHub] spark issue #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite support S...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19843 LGTM, but I'll wait for the PR title & description updates to merge this. Thanks! ---
[GitHub] spark pull request #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite su...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19843#discussion_r155896038

--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala ---
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.util
+
+import java.io.File
+
+import org.scalatest.Suite
+
+import org.apache.spark.SparkContext
+import org.apache.spark.ml.Transformer
+import org.apache.spark.sql.{DataFrame, Encoder, Row}
+import org.apache.spark.sql.execution.streaming.MemoryStream
+import org.apache.spark.sql.streaming.StreamTest
+import org.apache.spark.sql.test.TestSparkSession
+import org.apache.spark.util.Utils
+
+trait MLTest extends StreamTest with TempDirectory { self: Suite =>
+
+  @transient var sc: SparkContext = _
+  @transient var checkpointDir: String = _
+
+  protected override def createSparkSession: TestSparkSession = {
+    new TestSparkSession(new SparkContext("local[2]", "MLlibUnitTest", sparkConf))
+  }
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    sc = spark.sparkContext
+    checkpointDir = Utils.createDirectory(tempDir.getCanonicalPath, "checkpoints").toString
+    sc.setCheckpointDir(checkpointDir)
+  }
+
+  override def afterAll() {
--- End diff --

Well, we'll find out in a few weeks : )

---
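The `beforeAll`/`afterAll` layering this trait relies on can be sketched without Spark or ScalaTest. A minimal illustration (all names and the checkpoint path here are hypothetical) of why each override delegates to `super`: setup runs top-down through the trait stack, teardown runs bottom-up.

```scala
import scala.collection.mutable.ListBuffer

object LifecycleDemo {
  val log = ListBuffer[String]()

  trait BaseSuite {
    def beforeAll(): Unit = log += "base setup"
    def afterAll(): Unit = log += "base teardown"
  }

  // Mirrors the MLTest pattern: extend the base hooks, always calling super.
  trait CheckpointSetup extends BaseSuite {
    var checkpointDir: String = _
    override def beforeAll(): Unit = {
      super.beforeAll()                  // parent setup runs first
      checkpointDir = "/tmp/checkpoints" // hypothetical path for illustration
      log += s"set checkpoint dir: $checkpointDir"
    }
    override def afterAll(): Unit = {
      log += "clear checkpoint dir"
      super.afterAll()                   // parent teardown runs last
    }
  }

  def main(args: Array[String]): Unit = {
    val suite = new CheckpointSetup {}
    suite.beforeAll()
    suite.afterAll()
    log.foreach(println)
  }
}
```

Skipping the `super` call in an override silently drops the parent's setup or teardown, which is why review comments on such traits tend to scrutinize these hooks.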
[GitHub] spark issue #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite support S...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19843 Also, can you please remove "WIP" from the PR title and update the Testing part of the PR description? ---
[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19884 **[Test build #84663 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84663/testReport)** for PR 19884 at commit [`fdba406`](https://github.com/apache/spark/commit/fdba406f29216b8ef592de45dc36461217113410). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. ---
[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19884 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84663/ Test FAILed. ---
[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19884 Merged build finished. Test FAILed. ---
[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19884 **[Test build #84663 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84663/testReport)** for PR 19884 at commit [`fdba406`](https://github.com/apache/spark/commit/fdba406f29216b8ef592de45dc36461217113410). ---