[GitHub] spark issue #21265: [SPARK-24146][PySpark][ML] spark.ml parity for sequentia...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21265 **[Test build #91325 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91325/testReport)** for PR 21265 at commit [`0be3a94`](https://github.com/apache/spark/commit/0be3a94d27f4203608ef82d2ef197b37606c53b3). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21265: [SPARK-24146][PySpark][ML] spark.ml parity for se...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21265#discussion_r191995667 --- Diff: python/pyspark/ml/fpm.py --- @@ -243,3 +244,75 @@ def setParams(self, minSupport=0.3, minConfidence=0.8, itemsCol="items", def _create_model(self, java_model): return FPGrowthModel(java_model) + + +class PrefixSpan(object): +""" +.. note:: Experimental + +A parallel PrefixSpan algorithm to mine frequent sequential patterns. +The PrefixSpan algorithm is described in J. Pei, et al., PrefixSpan: Mining Sequential Patterns +Efficiently by Prefix-Projected Pattern Growth +(see http://doi.org/10.1109/ICDE.2001.914830;>here). + +.. versionadded:: 2.4.0 + +""" +@staticmethod +@since("2.4.0") +def findFrequentSequentialPatterns(dataset, + sequenceCol, + minSupport, + maxPatternLength, + maxLocalProjDBSize): +""" +.. note:: Experimental +Finds the complete set of frequent sequential patterns in the input sequences of itemsets. + +:param dataset: A dataset or a dataframe containing a sequence column which is +`Seq[Seq[_]]` type. +:param sequenceCol: The name of the sequence column in dataset, rows with nulls in this +column are ignored. +:param minSupport: The minimal support level of the sequential pattern, any pattern that + appears more than (minSupport * size-of-the-dataset) times will be + output (recommended value: `0.1`). +:param maxPatternLength: The maximal length of the sequential pattern + (recommended value: `10`). +:param maxLocalProjDBSize: The maximum number of items (including delimiters used in the + internal storage format) allowed in a projected database before + local processing. If a projected database exceeds this size, + another iteration of distributed prefix growth is run + (recommended value: `3200`). +:return: A `DataFrame` that contains columns of sequence and corresponding frequency. + The schema of it will be: + - `sequence: Seq[Seq[T]]` (T is the item type) + - `freq: Long` + +>>> from pyspark.ml.fpm import PrefixSpan +>>> from pyspark.sql import Row +>>> df = sc.parallelize([Row(sequence=[[1, 2], [3]]), --- End diff -- I think it is better to be put in a example. @mengxr What do you think ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21362: [SPARK-24197][SparkR][FOLLOWUP] Fixing failing te...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21362#discussion_r191994811 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R --- @@ -1504,15 +1504,16 @@ test_that("column functions", { expect_equal(result, "cba") # Test array_sort() and sort_array() - df <- createDataFrame(list(list(list(2L, 1L, 3L, NA)), list(list(NA, 6L, 5L, NA, 4L + df <- createDataFrame(list(list(list(2L, 1L, 3L, NA)), list(list(NA, 5L, NA, 4L + as_integer_lists <- function(x) lapply(x, lapply, as.integer) - result <- collect(select(df, array_sort(df[[1]])))[[1]] - expect_equal(result, list(list(1L, 2L, 3L, NA), list(4L, 5L, 6L, NA, NA))) + result <- as_integer_lists(collect(select(df, array_sort(df[[1]])))[[1]]) --- End diff -- +1 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21454: [SPARK-24337][Core] Improve error messages for Sp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21454 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...
Github user ifilonenko commented on the issue: https://github.com/apache/spark/pull/21092 Will resolve comments today @mccheah --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20276 Hi, @cloud-fan . If this PR is still valid, could you resolve the conflicts? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21454 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91318/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21454 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21454 **[Test build #91318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91318/testReport)** for PR 21454 at commit [`38ffa3e`](https://github.com/apache/spark/commit/38ffa3e551c7eee69d12cc736d33d137abd333b7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21362: [SPARK-24197][SparkR][FOLLOWUP] Fixing failing te...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21362#discussion_r191986143 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R --- @@ -1504,15 +1504,16 @@ test_that("column functions", { expect_equal(result, "cba") # Test array_sort() and sort_array() - df <- createDataFrame(list(list(list(2L, 1L, 3L, NA)), list(list(NA, 6L, 5L, NA, 4L + df <- createDataFrame(list(list(list(2L, 1L, 3L, NA)), list(list(NA, 5L, NA, 4L + as_integer_lists <- function(x) lapply(x, lapply, as.integer) - result <- collect(select(df, array_sort(df[[1]])))[[1]] - expect_equal(result, list(list(1L, 2L, 3L, NA), list(4L, 5L, 6L, NA, NA))) + result <- as_integer_lists(collect(select(df, array_sort(df[[1]])))[[1]]) --- End diff -- `as_integer_lists` doesn't seem like the right approach - that just basically changes everything returned into integer (even though it might not be returned that way from JVM) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91319/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 **[Test build #91319 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91319/testReport)** for PR 21366 at commit [`260d82c`](https://github.com/apache/spark/commit/260d82ca9fbbd16ad8174d0dafa2f95bc177a219). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21362: [SPARK-24197][SparkR][FOLLOWUP] Fixing failing tests for...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21362 ok, let us know if you have more information on this --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21453 I filed an issue in Scala community about the interface changes, and they said those REPL apis are intended to be private. https://github.com/scala/bug/issues/10913 Being said that, they gave us couple ways to walk around it, and I'm testing it now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21455: [SPARK-24093][DStream][Minor]Make some fields of KafkaSt...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21455 Simply making these fields publicly accessible seems a little weird from Spark's side. Maybe we can use reflection instead. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21454 LGTM Thanks! Merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21346 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3716/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21346 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21453 Scala 2.11.12 cannot be built against with current Spark, due to some method changes in REPL. We have tried internally. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21346 **[Test build #91324 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91324/testReport)** for PR 21346 at commit [`83c3271`](https://github.com/apache/spark/commit/83c3271d2f45bbef18d865bddbc6807e9fbd2503). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21454 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21454 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91316/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21456: [SPARK-24356] [CORE] Duplicate strings in File.path mana...
Github user squito commented on the issue: https://github.com/apache/spark/pull/21456 From my understanding, that option is only available with G1GC, which is not really a good fit for spark (forget the exact details but something about humongous allocations which are common with all the large byte buffers typical for spark). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21454 **[Test build #91316 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91316/testReport)** for PR 21454 at commit [`b7ff38f`](https://github.com/apache/spark/commit/b7ff38f16c91b7df326f49d1f821b14a6dc82e8d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...
Github user squito commented on the issue: https://github.com/apache/spark/pull/21346 > is this effectively dead code at this point? yes, thats right. this PR by itself is not useful. Its a step towards https://github.com/apache/spark/pull/21451 This is a good point to put in the PR summary -- I'll do that, and also your summary notes above, if you don't mind. > what are the major risks of this change in terms of introducing performance or correctness issues? If we identify risks (e.g. "this is a historically tricky area of code?"), can we mitigate those risks through correctness testing / load testing? I've made an effort to make minimal modifications to all existing code paths, to minimize the risk of introducing bugs in current functionality. My intention is to only turn it on by default initially for cases we know would fail with the old code -- when the data is > 2gb ([SPARK-24297](https://issues.apache.org/jira/browse/SPARK-24297)). I've added unit tests and shared the test I'm doing on a cluster just to find holes in functionality (posted on the parent jira here: https://issues.apache.org/jira/browse/SPARK-6235?focusedCommentId=16484069=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16484069). I have not done load testing yet but plan to. Extra testing, of course, would certainly be good. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21346#discussion_r191981552 --- Diff: common/network-common/src/main/java/org/apache/spark/network/server/RpcHandler.java --- @@ -38,15 +38,24 @@ * * This method will not be called in parallel for a single TransportClient (i.e., channel). * + * The rpc *might* included a data stream in streamData (eg. for uploading a large + * amount of data which should not be buffered in memory here). Any errors while handling the + * streamData will lead to failing this entire connection -- all other in-flight rpcs will fail. --- End diff -- pretty good question actually :) I will take a closer look at this myself but I believe this connection is shared by other tasks running on the same executor which are trying to talk to the same destination. So that might mean another task which is replicating to the same destination, or reading data from that same remote executor. those don't have specific retry behavior for connection closed -- that might result in the data just not getting replicated, fetching data from elsewhere, or the task getting retried. I think this is actually OK -- the existing code could cause an OOM on the remote end anyway, which obviously would fail a lot more. This failure behavior seems reasonable. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21010 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21010 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3715/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21400: [SPARK-24351][SS]offsetLog/commitLog purge thresh...
Github user ivoson commented on a diff in the pull request: https://github.com/apache/spark/pull/21400#discussion_r191980682 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala --- @@ -34,7 +34,8 @@ class ContinuousSuiteBase extends StreamTest { new SparkContext( "local[10]", "continuous-stream-test-sql-context", - sparkConf.set("spark.sql.testkey", "true"))) + sparkConf.set("spark.sql.testkey", "true") +.set("spark.sql.streaming.minBatchesToRetain", "2"))) --- End diff -- @jose-torres Thanks for pointing this out. I will make a separate suite for clarity and isolation. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21445: [SPARK-24404][SS] Increase currentEpoch when meet a Epoc...
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/21445 > I think the best way to do it is to make the shuffle writer responsible for incrementing the epoch within its task, the same way the data source writer does currently. Yeah, @LiangchangZ please consider this way. The writer part of a task is responsible to pull data from upstream. It's more consistent and wouldn't break existing logic. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21010 **[Test build #91323 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91323/testReport)** for PR 21010 at commit [`9ccb648`](https://github.com/apache/spark/commit/9ccb6488f6f8309e0cfa71c4b332e6d680f24ffa). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21010 LGTM pending Jenkins. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21010 Jenkins, retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21010 Thanks, sounds good. Let me retrigger the build just for checking again. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21346#discussion_r191979425 --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/UploadStream.java --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.protocol; + +import java.io.IOException; +import java.nio.ByteBuffer; + +import com.google.common.base.Objects; +import io.netty.buffer.ByteBuf; + +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.buffer.NettyManagedBuffer; + +/** + * An RPC with data that is sent outside of the frame, so it can be read as a stream. + */ +public final class UploadStream extends AbstractMessage implements RequestMessage { + /** Used to link an RPC request with its response. */ + public final long requestId; + public final ManagedBuffer meta; + public final long bodyByteCount; + + public UploadStream(long requestId, ManagedBuffer meta, ManagedBuffer body) { +super(body, false); // body is *not* included in the frame +this.requestId = requestId; +this.meta = meta; +bodyByteCount = body.size(); + } + + // this version is called when decoding the bytes on the receiving end. The body is handled + // separately. + private UploadStream(long requestId, ManagedBuffer meta, long bodyByteCount) { +super(null, false); +this.requestId = requestId; +this.meta = meta; +this.bodyByteCount = bodyByteCount; + } + + @Override + public Type type() { return Type.UploadStream; } + + @Override + public int encodedLength() { +// the requestId, meta size, meta and bodyByteCount (body is not included) +return 8 + 4 + ((int) meta.size()) + 8; + } + + @Override + public void encode(ByteBuf buf) { +buf.writeLong(requestId); +try { + ByteBuffer metaBuf = meta.nioByteBuffer(); + buf.writeInt(metaBuf.remaining()); + buf.writeBytes(metaBuf); +} catch (IOException io) { + throw new RuntimeException(io); +} +buf.writeLong(bodyByteCount); + } + + public static UploadStream decode(ByteBuf buf) { +long requestId = buf.readLong(); +int metaSize = buf.readInt(); +ManagedBuffer meta = new NettyManagedBuffer(buf.readRetainedSlice(metaSize)); +long bodyByteCount = buf.readLong(); +// This is called by the frame decoder, so the data is still null. We need a StreamInterceptor +// to read the data. +return new UploadStream(requestId, meta, bodyByteCount); + } + + @Override + public int hashCode() { +return Objects.hashCode(requestId, body()); + } + + @Override + public boolean equals(Object other) { +if (other instanceof UploadStream) { + UploadStream o = (UploadStream) other; + return requestId == o.requestId && super.equals(o); +} +return false; + } + + @Override + public String toString() { +return Objects.toStringHelper(this) + .add("requestId", requestId) + .add("body", body()) --- End diff -- to be honest, this was also just parroted from other classes -- looking now at implementations of ManagedBuffer, if they have a `toString()` it does something reasonable. Is that actually useful for debugging? maybe not, don't think I ever actually looked at this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21346#discussion_r191979019 --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/UploadStream.java --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.protocol; + +import java.io.IOException; +import java.nio.ByteBuffer; + +import com.google.common.base.Objects; +import io.netty.buffer.ByteBuf; + +import org.apache.spark.network.buffer.ManagedBuffer; +import org.apache.spark.network.buffer.NettyManagedBuffer; + +/** + * An RPC with data that is sent outside of the frame, so it can be read as a stream. + */ +public final class UploadStream extends AbstractMessage implements RequestMessage { + /** Used to link an RPC request with its response. */ + public final long requestId; + public final ManagedBuffer meta; + public final long bodyByteCount; + + public UploadStream(long requestId, ManagedBuffer meta, ManagedBuffer body) { +super(body, false); // body is *not* included in the frame +this.requestId = requestId; +this.meta = meta; +bodyByteCount = body.size(); + } + + // this version is called when decoding the bytes on the receiving end. The body is handled + // separately. + private UploadStream(long requestId, ManagedBuffer meta, long bodyByteCount) { +super(null, false); +this.requestId = requestId; +this.meta = meta; +this.bodyByteCount = bodyByteCount; + } + + @Override + public Type type() { return Type.UploadStream; } + + @Override + public int encodedLength() { +// the requestId, meta size, meta and bodyByteCount (body is not included) +return 8 + 4 + ((int) meta.size()) + 8; + } + + @Override + public void encode(ByteBuf buf) { +buf.writeLong(requestId); +try { + ByteBuffer metaBuf = meta.nioByteBuffer(); + buf.writeInt(metaBuf.remaining()); + buf.writeBytes(metaBuf); +} catch (IOException io) { + throw new RuntimeException(io); +} +buf.writeLong(bodyByteCount); + } + + public static UploadStream decode(ByteBuf buf) { +long requestId = buf.readLong(); +int metaSize = buf.readInt(); +ManagedBuffer meta = new NettyManagedBuffer(buf.readRetainedSlice(metaSize)); +long bodyByteCount = buf.readLong(); +// This is called by the frame decoder, so the data is still null. We need a StreamInterceptor +// to read the data. +return new UploadStream(requestId, meta, bodyByteCount); + } + + @Override + public int hashCode() { +return Objects.hashCode(requestId, body()); --- End diff -- this is a good point. Admittedly I just copied this from `StreamResponse` without thinking about it too much -- that class exhibits the same issue. I'll remove `body` from both. (In practice, we're not using sticking them in hashmaps now so there wouldn't be any bugs in behavior because of this.) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21346#discussion_r191978545 --- Diff: common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java --- @@ -141,26 +141,14 @@ public void fetchChunk( StreamChunkId streamChunkId = new StreamChunkId(streamId, chunkIndex); handler.addFetchRequest(streamChunkId, callback); -channel.writeAndFlush(new ChunkFetchRequest(streamChunkId)).addListener(future -> { --- End diff -- yes exactly. Marcelo asked for this refactoring in his review -- there was already a ton of copy-paste, and instead of adding more made sense to refactor. Shouldn't be any behavior change (there are minor changes that shouldn't matter ... `channel.close()` happens before the more specific cleanup operations whereas it was in the middle previously, the `try` encompasses a bit more than before.) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21346#discussion_r191978140 --- Diff: common/network-common/src/main/java/org/apache/spark/network/client/StreamInterceptor.java --- @@ -50,16 +52,22 @@ @Override public void exceptionCaught(Throwable cause) throws Exception { -handler.deactivateStream(); +deactivateStream(); callback.onFailure(streamId, cause); } @Override public void channelInactive() throws Exception { -handler.deactivateStream(); +deactivateStream(); callback.onFailure(streamId, new ClosedChannelException()); } + private void deactivateStream() { +if (handler instanceof TransportResponseHandler) { --- End diff -- the only purpose of `TransportResponseHandler.deactivateStream()` is to include the stream request in the count for `numOutstandingRequests` (its not doing any critical cleanup). I will include a comment here explaining that. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21346#discussion_r191976952 --- Diff: common/network-common/src/main/java/org/apache/spark/network/server/StreamData.java --- @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.server; + +import java.io.IOException; +import java.nio.ByteBuffer; + +import org.apache.spark.network.client.RpcResponseCallback; +import org.apache.spark.network.client.StreamCallback; +import org.apache.spark.network.client.StreamInterceptor; +import org.apache.spark.network.util.TransportFrameDecoder; + +/** + * A holder for streamed data sent along with an RPC message. + */ +public class StreamData { + + private final TransportRequestHandler handler; + private final TransportFrameDecoder frameDecoder; + private final RpcResponseCallback rpcCallback; + private final ByteBuffer meta; --- End diff -- whoops, you're right. I was using this at one point in the follow-on patch, then changed it and didn't fully clean this up. thanks --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21464: [WEBUI] Avoid possibility of script in query param keys
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21464 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91314/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21464: [WEBUI] Avoid possibility of script in query param keys
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21464 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21464: [WEBUI] Avoid possibility of script in query param keys
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21464 **[Test build #91314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91314/testReport)** for PR 21464 at commit [`90c9ddc`](https://github.com/apache/spark/commit/90c9ddca2ecb458ccde2945ab67548403c3b4256). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21464: [WEBUI] Avoid possibility of script in query param keys
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21464 **[Test build #4191 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4191/testReport)** for PR 21464 at commit [`90c9ddc`](https://github.com/apache/spark/commit/90c9ddca2ecb458ccde2945ab67548403c3b4256). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21447: [SPARK-24339][SQL]Add project for transform/map/reduce s...
Github user xdcjie commented on the issue: https://github.com/apache/spark/pull/21447 @maropu @gatorsmile Do you have any comment/suggestion for this PR? Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21454 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91315/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21454 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21454 **[Test build #91315 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91315/testReport)** for PR 21454 at commit [`c1b4d16`](https://github.com/apache/spark/commit/c1b4d1670fe1bb5c2b9b0d66c14e3da26627d29e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21453 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21453 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91322/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21453 **[Test build #91322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91322/testReport)** for PR 21453 at commit [`90d3842`](https://github.com/apache/spark/commit/90d3842616aec94d603a68d44463eb043c5a66f9). * This patch **fails build dependency tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21456: [SPARK-24356] [CORE] Duplicate strings in File.path mana...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21456 **[Test build #91321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91321/testReport)** for PR 21456 at commit [`88bb478`](https://github.com/apache/spark/commit/88bb4780d20ad952aa1936f4e78a420d9baf0f2c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21453 **[Test build #91322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91322/testReport)** for PR 21453 at commit [`90d3842`](https://github.com/apache/spark/commit/90d3842616aec94d603a68d44463eb043c5a66f9). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21453 Jenkins, test this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21456: [SPARK-24356] [CORE] Duplicate strings in File.path mana...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21456 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21453 Jenkins, add to whitelist --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/21010 > Basically LGTM, but I'm wondering what if the expr2 is not like a format string? The same as Hive: ```sql spark-sql> SELECT format_number(12332.123456, 'abc'); abc12332 ``` ```sql hive> SELECT format_number(12332.123456, 'abc'); OK abc12332 Time taken: 0.218 seconds, Fetched: 1 row(s) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r191969840 --- Diff: python/pyspark/taskcontext.py --- @@ -88,3 +89,9 @@ def taskAttemptId(self): TaskAttemptID. """ return self._taskAttemptId + +def getLocalProperty(self, key): +""" +Get a local property set upstream in the driver, or None if it is missing. --- End diff -- Makes sense but +1 for leaving it out since I either don't know how commonly it will be used for now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21378: [SPARK-24326][Mesos] add support for local:// sch...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21378#discussion_r191969654 --- Diff: docs/running-on-mesos.md --- @@ -753,6 +753,16 @@ See the [configuration page](configuration.html) for information on Spark config spark.cores.max is reached ++ --- End diff -- +1 for reverting whitespaces. Let's leave this minimised and targeted. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21378: [SPARK-24326][Mesos] add support for local:// scheme for...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21378 @felixcheung yup. I was just looking at this PR out of my curiosity. I don't currently have an env to test Mesos. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21426 @vanzin and @jerryshao, thanks you so much. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21464: [WEBUI] Avoid possibility of script in query param keys
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21464 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21464: [WEBUI] Avoid possibility of script in query param keys
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21464 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91313/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21464: [WEBUI] Avoid possibility of script in query param keys
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21464 **[Test build #91313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91313/testReport)** for PR 21464 at commit [`aad159c`](https://github.com/apache/spark/commit/aad159c561094b53a719c8950fa087dacd1d9d8d). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3581/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...
Github user ssuchter commented on a diff in the pull request: https://github.com/apache/spark/pull/20697#discussion_r191966223 --- Diff: resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh --- @@ -0,0 +1,91 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +TEST_ROOT_DIR=$(git rev-parse --show-toplevel) +UNPACKED_SPARK_TGZ="$TEST_ROOT_DIR/target/spark-dist-unpacked" +IMAGE_TAG_OUTPUT_FILE="$TEST_ROOT_DIR/target/image-tag.txt" +DEPLOY_MODE="minikube" +IMAGE_REPO="docker.io/kubespark" +IMAGE_TAG="N/A" +SPARK_TGZ="N/A" + +# Parse arguments +while (( "$#" )); do + case $1 in +--unpacked-spark-tgz) + UNPACKED_SPARK_TGZ="$2" + shift + ;; +--image-repo) + IMAGE_REPO="$2" + shift + ;; +--image-tag) + IMAGE_TAG="$2" + shift + ;; +--image-tag-output-file) + IMAGE_TAG_OUTPUT_FILE="$2" + shift + ;; +--deploy-mode) + DEPLOY_MODE="$2" + shift + ;; +--spark-tgz) + SPARK_TGZ="$2" + shift + ;; +*) + break + ;; + esac + shift +done + +if [[ $SPARK_TGZ == "N/A" ]]; +then + echo "Must specify a Spark tarball to build Docker images against with --spark-tgz." && exit 1; --- End diff -- Ok, it's important for me to be clear here. There are currently two PRBs. This will continue in the immediate future. 1. General Spark PRB, mainly for unit tests. This can run on all hosts. 2. K8s integration-specific PRB. This early-outs on many PRs that don't seem relevant. This is specifically for running K8s integration tests, and can only run on some hosts. Because of the host restriction issue, these are two separate PRBs. It is definitely true that each one of these will build the main Spark jars separately, so that 11 minute time will be spent twice. Since the K8s-integration PRB is only doing this on a small set of PRs, it's not a significant cost to the Jenkins infrastructure. Within the K8s-integration PRB, the entire maven reactor is only built once, during the make distribution step. The integration test step doesn't rebuild it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3714/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3581/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/20697#discussion_r191965147 --- Diff: resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh --- @@ -0,0 +1,91 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +TEST_ROOT_DIR=$(git rev-parse --show-toplevel) +UNPACKED_SPARK_TGZ="$TEST_ROOT_DIR/target/spark-dist-unpacked" +IMAGE_TAG_OUTPUT_FILE="$TEST_ROOT_DIR/target/image-tag.txt" +DEPLOY_MODE="minikube" +IMAGE_REPO="docker.io/kubespark" +IMAGE_TAG="N/A" +SPARK_TGZ="N/A" + +# Parse arguments +while (( "$#" )); do + case $1 in +--unpacked-spark-tgz) + UNPACKED_SPARK_TGZ="$2" + shift + ;; +--image-repo) + IMAGE_REPO="$2" + shift + ;; +--image-tag) + IMAGE_TAG="$2" + shift + ;; +--image-tag-output-file) + IMAGE_TAG_OUTPUT_FILE="$2" + shift + ;; +--deploy-mode) + DEPLOY_MODE="$2" + shift + ;; +--spark-tgz) + SPARK_TGZ="$2" + shift + ;; +*) + break + ;; + esac + shift +done + +if [[ $SPARK_TGZ == "N/A" ]]; +then + echo "Must specify a Spark tarball to build Docker images against with --spark-tgz." && exit 1; --- End diff -- Sorry, I mean if the Spark PRB will try to build the entire maven reactor twice - once for unit tests and once for integration tests. The TGZ bundling in and of itself I agree should be fast if the jars are already built by the maven reactor. But it's unclear to me if we'll end up building jars redundantly here. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21346 Summary of key changes (WIP; notes to self): > Summary of changes: > > - Introduce a new `UploadStream` RPC which is sent to push a large payload as a stream (in contrast, the pre-existing `StreamRequest` and `StreamResponse` RPCs are used for pull-based streaming). > - Generalize `RpcHandler.receive()` to support requests which contain streams. > - Generalize `StreamInterceptor` to handle both request and response messages (previously it only handled responses). > - Introduce `StdChannelListener` to abstract away common logging logic in `ChannelFuture` listeners. Question: is this effectively dead code at this point? In other words, this PR just adds the lower-level pieces but there's nothing currently using the new API? So this patch as of now has no behavior change and actual functional changes impacting queries / actual usage will come later when we wire this up to the block replicator? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21346#discussion_r191964329 --- Diff: project/MimaExcludes.scala --- @@ -36,6 +36,9 @@ object MimaExcludes { // Exclude rules for 2.4.x lazy val v24excludes = v23excludes ++ Seq( +// [SPARK-6237][NETWORK] Network-layer changes to allow stream upload + ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.network.netty.NettyBlockRpcServer.receive"), --- End diff -- I suspect that it's because we might want to access these across Java package boundaries and Java doesn't have the equivalent of Scala's nested package scoped `private[package]`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91320/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 **[Test build #91320 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91320/testReport)** for PR 20697 at commit [`4102b25`](https://github.com/apache/spark/commit/4102b25537288fd4d6cf8267f07c52a79f49dd72). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 **[Test build #91320 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91320/testReport)** for PR 20697 at commit [`4102b25`](https://github.com/apache/spark/commit/4102b25537288fd4d6cf8267f07c52a79f49dd72). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...
Github user ssuchter commented on a diff in the pull request: https://github.com/apache/spark/pull/20697#discussion_r191962980 --- Diff: resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh --- @@ -0,0 +1,91 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +TEST_ROOT_DIR=$(git rev-parse --show-toplevel) +UNPACKED_SPARK_TGZ="$TEST_ROOT_DIR/target/spark-dist-unpacked" +IMAGE_TAG_OUTPUT_FILE="$TEST_ROOT_DIR/target/image-tag.txt" +DEPLOY_MODE="minikube" +IMAGE_REPO="docker.io/kubespark" +IMAGE_TAG="N/A" +SPARK_TGZ="N/A" + +# Parse arguments +while (( "$#" )); do + case $1 in +--unpacked-spark-tgz) + UNPACKED_SPARK_TGZ="$2" + shift + ;; +--image-repo) + IMAGE_REPO="$2" + shift + ;; +--image-tag) + IMAGE_TAG="$2" + shift + ;; +--image-tag-output-file) + IMAGE_TAG_OUTPUT_FILE="$2" + shift + ;; +--deploy-mode) + DEPLOY_MODE="$2" + shift + ;; +--spark-tgz) + SPARK_TGZ="$2" + shift + ;; +*) + break + ;; + esac + shift +done + +if [[ $SPARK_TGZ == "N/A" ]]; +then + echo "Must specify a Spark tarball to build Docker images against with --spark-tgz." && exit 1; --- End diff -- I timed the build on my laptop. To build the Spark jars took just over 11 minutes. To build the .tgz took about 7 seconds. So this extra step adds ~1% overhead. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21010 Basically LGTM, but I'm wondering what if the `expr2` is not like a format string? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...
Github user ssuchter commented on a diff in the pull request: https://github.com/apache/spark/pull/20697#discussion_r191961749 --- Diff: resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh --- @@ -0,0 +1,91 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +TEST_ROOT_DIR=$(git rev-parse --show-toplevel) +UNPACKED_SPARK_TGZ="$TEST_ROOT_DIR/target/spark-dist-unpacked" +IMAGE_TAG_OUTPUT_FILE="$TEST_ROOT_DIR/target/image-tag.txt" +DEPLOY_MODE="minikube" +IMAGE_REPO="docker.io/kubespark" +IMAGE_TAG="N/A" +SPARK_TGZ="N/A" + +# Parse arguments +while (( "$#" )); do + case $1 in +--unpacked-spark-tgz) + UNPACKED_SPARK_TGZ="$2" + shift + ;; +--image-repo) + IMAGE_REPO="$2" + shift + ;; +--image-tag) + IMAGE_TAG="$2" + shift + ;; +--image-tag-output-file) + IMAGE_TAG_OUTPUT_FILE="$2" + shift + ;; +--deploy-mode) + DEPLOY_MODE="$2" + shift + ;; +--spark-tgz) + SPARK_TGZ="$2" + shift + ;; +*) + break + ;; + esac + shift +done + +if [[ $SPARK_TGZ == "N/A" ]]; +then + echo "Must specify a Spark tarball to build Docker images against with --spark-tgz." && exit 1; --- End diff -- Also - Jenkins has *already* been doing that - building the distribution .tgz for each Kubernetes related PRB invokation. Relevant is the fact that the filtering of what is a Kubernetes-related change happens *before* the distribution .tgz is done. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3580/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...
Github user ssuchter commented on a diff in the pull request: https://github.com/apache/spark/pull/20697#discussion_r191961317 --- Diff: resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh --- @@ -0,0 +1,91 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +TEST_ROOT_DIR=$(git rev-parse --show-toplevel) +UNPACKED_SPARK_TGZ="$TEST_ROOT_DIR/target/spark-dist-unpacked" +IMAGE_TAG_OUTPUT_FILE="$TEST_ROOT_DIR/target/image-tag.txt" +DEPLOY_MODE="minikube" +IMAGE_REPO="docker.io/kubespark" +IMAGE_TAG="N/A" +SPARK_TGZ="N/A" + +# Parse arguments +while (( "$#" )); do + case $1 in +--unpacked-spark-tgz) + UNPACKED_SPARK_TGZ="$2" + shift + ;; +--image-repo) + IMAGE_REPO="$2" + shift + ;; +--image-tag) + IMAGE_TAG="$2" + shift + ;; +--image-tag-output-file) + IMAGE_TAG_OUTPUT_FILE="$2" + shift + ;; +--deploy-mode) + DEPLOY_MODE="$2" + shift + ;; +--spark-tgz) + SPARK_TGZ="$2" + shift + ;; +*) + break + ;; + esac + shift +done + +if [[ $SPARK_TGZ == "N/A" ]]; +then + echo "Must specify a Spark tarball to build Docker images against with --spark-tgz." && exit 1; --- End diff -- The act of building the .tgz is much cheaper (faster) than doing the java build or the integration test. I wouldn't worry about that. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3713/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3580/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21465 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91317/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21465 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21465 **[Test build #91317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91317/testReport)** for PR 21465 at commit [`79fc83b`](https://github.com/apache/spark/commit/79fc83bae376c430a23fd7c9f502690a1c4d321e). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class HasValidationIndicatorCol(Params):` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21069: [SPARK-23920][SQL]add array_remove to remove all ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21069#discussion_r191956713 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1882,3 +1882,117 @@ case class ArrayRepeat(left: Expression, right: Expression) } } + +/** + * Remove all elements that equal to element from the given array + */ +@ExpressionDescription( + usage = "_FUNC_(array, element) - Remove all elements that equal to element from array.", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3, null, 3), 3); + [1,2,null] + """, since = "2.4.0") +case class ArrayRemove(left: Expression, right: Expression) + extends BinaryExpression with ImplicitCastInputTypes { + + override def dataType: DataType = left.dataType + + override def inputTypes: Seq[AbstractDataType] = +Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType) + + lazy val elementType: DataType = left.dataType.asInstanceOf[ArrayType].elementType + + @transient private lazy val ordering: Ordering[Any] = +TypeUtils.getInterpretedOrdering(right.dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (!left.dataType.isInstanceOf[ArrayType] + || left.dataType.asInstanceOf[ArrayType].elementType != right.dataType) { + TypeCheckResult.TypeCheckFailure( +"Arguments must be an array followed by a value of same type as the array members") +} else { + TypeUtils.checkForOrderingExpr(right.dataType, s"function $prettyName") +} + } + + override def nullSafeEval(arr: Any, value: Any): Any = { +val newArray = new Array[Any](arr.asInstanceOf[ArrayData].numElements()) +var pos = 0 +arr.asInstanceOf[ArrayData].foreach(right.dataType, (i, v) => + if (v == null || !ordering.equiv(v, value)) { +newArray(pos) = v +pos += 1 + } +) +new GenericArrayData(newArray.slice(0, pos)) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +nullSafeCodeGen(ctx, ev, (arr, value) => { + val numsToRemove = ctx.freshName("numsToRemove") + val newArraySize = ctx.freshName("newArraySize") + val i = ctx.freshName("i") + val getValue = CodeGenerator.getValue(arr, elementType, i) + val isEqual = ctx.genEqual(elementType, value, getValue) + s""" + |int $numsToRemove = 0; + |for (int $i = 0; $i < $arr.numElements(); $i ++) { + | if (!$arr.isNullAt($i) && $isEqual) { + |$numsToRemove = $numsToRemove + 1; + | } + |} + |int $newArraySize = $arr.numElements() - $numsToRemove; + |${genCodeForResult(ctx, ev, arr, value, newArraySize)} + """.stripMargin +}) + } + + def genCodeForResult( + ctx: CodegenContext, + ev: ExprCode, + inputArray: String, + value: String, + newArraySize: String): String = { +val values = ctx.freshName("values") +val i = ctx.freshName("i") +val pos = ctx.freshName("pos") +val getValue = CodeGenerator.getValue(inputArray, elementType, i) +val isEqual = ctx.genEqual(elementType, value, getValue) +if (!CodeGenerator.isPrimitiveType(elementType)) { + val arrayClass = classOf[GenericArrayData].getName + s""" + |int $pos = 0; + |Object[] $values = new Object[$newArraySize]; + |for (int $i = 0; $i < $inputArray.numElements(); $i ++) { + | if (!($isEqual)) { --- End diff -- Don't we need to check null? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21069: [SPARK-23920][SQL]add array_remove to remove all ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21069#discussion_r191955752 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1882,3 +1882,117 @@ case class ArrayRepeat(left: Expression, right: Expression) } } + +/** + * Remove all elements that equal to element from the given array + */ +@ExpressionDescription( + usage = "_FUNC_(array, element) - Remove all elements that equal to element from array.", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3, null, 3), 3); + [1,2,null] + """, since = "2.4.0") +case class ArrayRemove(left: Expression, right: Expression) + extends BinaryExpression with ImplicitCastInputTypes { + + override def dataType: DataType = left.dataType + + override def inputTypes: Seq[AbstractDataType] = +Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType) + + lazy val elementType: DataType = left.dataType.asInstanceOf[ArrayType].elementType + + @transient private lazy val ordering: Ordering[Any] = +TypeUtils.getInterpretedOrdering(right.dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (!left.dataType.isInstanceOf[ArrayType] + || left.dataType.asInstanceOf[ArrayType].elementType != right.dataType) { --- End diff -- Maybe we need to change here as well. We can follow the implementation of `ArrayPosition`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21069: [SPARK-23920][SQL]add array_remove to remove all ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21069#discussion_r191955459 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1882,3 +1882,117 @@ case class ArrayRepeat(left: Expression, right: Expression) } } + +/** + * Remove all elements that equal to element from the given array + */ +@ExpressionDescription( + usage = "_FUNC_(array, element) - Remove all elements that equal to element from array.", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3, null, 3), 3); + [1,2,null] + """, since = "2.4.0") +case class ArrayRemove(left: Expression, right: Expression) + extends BinaryExpression with ImplicitCastInputTypes { + + override def dataType: DataType = left.dataType + + override def inputTypes: Seq[AbstractDataType] = +Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType) --- End diff -- This will cause `ClassCastException`. See #21401. Also could you add tests similar to tests added in #21401? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21069: [SPARK-23920][SQL]add array_remove to remove all ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21069#discussion_r191958470 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala --- @@ -552,4 +552,60 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(ArrayRepeat(strArray, Literal(2)), Seq(Seq("hi", "hola"), Seq("hi", "hola"))) checkEvaluation(ArrayRepeat(Literal("hi"), Literal(null, IntegerType)), null) } + + test("Array remove") { +val a0 = Literal.create(Seq(1, 2, 3, 2, 2, 5), ArrayType(IntegerType)) +val a1 = Literal.create(Seq("b", "a", "a", "c", "b"), ArrayType(StringType)) +val a2 = Literal.create(Seq[String](null, "", null, ""), ArrayType(StringType)) +val a3 = Literal.create(Seq.empty[Integer], ArrayType(IntegerType)) +val a4 = Literal.create(null, ArrayType(StringType)) +val a5 = Literal.create(Seq(1, null, 8, 9, null), ArrayType(IntegerType)) +val a6 = Literal.create(Seq(true, false, false, true), ArrayType(BooleanType)) + +checkEvaluation(ArrayRemove(a0, Literal(0)), Seq(1, 2, 3, 2, 2, 5)) +checkEvaluation(ArrayRemove(a0, Literal(1)), Seq(2, 3, 2, 2, 5)) +checkEvaluation(ArrayRemove(a0, Literal(2)), Seq(1, 3, 5)) +checkEvaluation(ArrayRemove(a0, Literal(3)), Seq(1, 2, 2, 2, 5)) +checkEvaluation(ArrayRemove(a0, Literal(5)), Seq(1, 2, 3, 2, 2)) +checkEvaluation(ArrayRemove(a0, Literal(null, IntegerType)), null) + +checkEvaluation(ArrayRemove(a1, Literal("")), Seq("b", "a", "a", "c", "b")) +checkEvaluation(ArrayRemove(a1, Literal("a")), Seq("b", "c", "b")) +checkEvaluation(ArrayRemove(a1, Literal("b")), Seq("a", "a", "c")) +checkEvaluation(ArrayRemove(a1, Literal("c")), Seq("b", "a", "a", "b")) + +checkEvaluation(ArrayRemove(a2, Literal("")), Seq(null, null)) +checkEvaluation(ArrayRemove(a2, Literal(null, StringType)), null) + +checkEvaluation(ArrayRemove(a3, Literal(1)), Seq.empty[Integer]) + +checkEvaluation(ArrayRemove(a4, Literal("a")), null) + +checkEvaluation(ArrayRemove(a5, Literal(9)), Seq(1, null, 8, null)) +checkEvaluation(ArrayRemove(a6, Literal(false)), Seq(true, true)) + +// complex data types +val b0 = Literal.create(Seq[Array[Byte]](Array[Byte](5, 6), Array[Byte](1, 2), + Array[Byte](1, 2), Array[Byte](5, 6)), ArrayType(BinaryType)) +val b1 = Literal.create(Seq[Array[Byte]](Array[Byte](2, 1), null), + ArrayType(BinaryType)) +val b2 = Literal.create(Seq[Array[Byte]](null, Array[Byte](1, 2)), + ArrayType(BinaryType)) +val nullBinary = Literal.create(null, BinaryType) + +val dataToRemoved1 = Literal.create(Array[Byte](5, 6), BinaryType) +checkEvaluation(ArrayRemove(b0, dataToRemoved1), + Seq[Array[Byte]](Array[Byte](1, 2), Array[Byte](1, 2))) +checkEvaluation(ArrayRemove(b0, nullBinary), null) +checkEvaluation(ArrayRemove(b1, dataToRemoved1), Seq[Array[Byte]](Array[Byte](2, 1), null)) +checkEvaluation(ArrayRemove(b2, dataToRemoved1), Seq[Array[Byte]](null, Array[Byte](1, 2))) + +val c0 = Literal.create(Seq[Seq[Int]](Seq[Int](1, 2), Seq[Int](3, 4)), + ArrayType(ArrayType(IntegerType))) +val c1 = Literal.create(Seq[Seq[Int]](Seq[Int](5, 6), Seq[Int](2, 1)), + ArrayType(ArrayType(IntegerType))) --- End diff -- What if for `val c2 = Literal.create(Seq[Seq[Int]](null, Seq[Int](2, 1)), ArrayType(ArrayType(IntegerType)))`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21465 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21465 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3712/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21454 **[Test build #91318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91318/testReport)** for PR 21454 at commit [`38ffa3e`](https://github.com/apache/spark/commit/38ffa3e551c7eee69d12cc736d33d137abd333b7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 **[Test build #91319 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91319/testReport)** for PR 21366 at commit [`260d82c`](https://github.com/apache/spark/commit/260d82ca9fbbd16ad8174d0dafa2f95bc177a219). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21465 **[Test build #91317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91317/testReport)** for PR 21465 at commit [`79fc83b`](https://github.com/apache/spark/commit/79fc83bae376c430a23fd7c9f502690a1c4d321e). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21463: [SPARK-23754][BRANCH-2.3][PYTHON] Re-raising StopIterati...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21463 oh, I meant I was wondering if you could have some time to give another try by fixing it in worker side. You could revert the current approach in your PR and try the fix in worker side. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21092 @ifilonenko ping on this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/20697#discussion_r191957294 --- Diff: resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh --- @@ -0,0 +1,91 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +TEST_ROOT_DIR=$(git rev-parse --show-toplevel) +UNPACKED_SPARK_TGZ="$TEST_ROOT_DIR/target/spark-dist-unpacked" +IMAGE_TAG_OUTPUT_FILE="$TEST_ROOT_DIR/target/image-tag.txt" +DEPLOY_MODE="minikube" +IMAGE_REPO="docker.io/kubespark" +IMAGE_TAG="N/A" +SPARK_TGZ="N/A" + +# Parse arguments +while (( "$#" )); do + case $1 in +--unpacked-spark-tgz) + UNPACKED_SPARK_TGZ="$2" + shift + ;; +--image-repo) + IMAGE_REPO="$2" + shift + ;; +--image-tag) + IMAGE_TAG="$2" + shift + ;; +--image-tag-output-file) + IMAGE_TAG_OUTPUT_FILE="$2" + shift + ;; +--deploy-mode) + DEPLOY_MODE="$2" + shift + ;; +--spark-tgz) + SPARK_TGZ="$2" + shift + ;; +*) + break + ;; + esac + shift +done + +if [[ $SPARK_TGZ == "N/A" ]]; +then + echo "Must specify a Spark tarball to build Docker images against with --spark-tgz." && exit 1; --- End diff -- Would the Jenkins framework tied into Spark pull requests have to build the tgz then? I expect that would create a non-trivial amount of overhead on each build. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21465 [SPARK-24333][ML][PYTHON]Add fit with validation set to spark.ml GBT: Python API ## What changes were proposed in this pull request? Add validationIndicatorCol and validationTol to GBT Python. ## How was this patch tested? Add test in doctest to test the new API. You can merge this pull request into a Git repository by running: $ git pull https://github.com/huaxingao/spark spark-24333 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21465.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21465 commit 79fc83bae376c430a23fd7c9f502690a1c4d321e Author: Huaxin Gao Date: 2018-05-30T23:49:31Z [SPARK-24333][ML][PYTHON]Add fit with validation set to spark.ml GBT: Python API --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org