date:20180530

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21265
  
**[Test build #91325 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91325/testReport)**
 for PR 21265 at commit 
[`0be3a94`](https://github.com/apache/spark/commit/0be3a94d27f4203608ef82d2ef197b37606c53b3).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21265: [SPARK-24146][PySpark][ML] spark.ml parity for se...

2018-05-30 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21265#discussion_r191995667
  
--- Diff: python/pyspark/ml/fpm.py ---
@@ -243,3 +244,75 @@ def setParams(self, minSupport=0.3, minConfidence=0.8, 
itemsCol="items",
 
 def _create_model(self, java_model):
 return FPGrowthModel(java_model)
+
+
+class PrefixSpan(object):
+"""
+.. note:: Experimental
+
+A parallel PrefixSpan algorithm to mine frequent sequential patterns.
+The PrefixSpan algorithm is described in J. Pei, et al., PrefixSpan: 
Mining Sequential Patterns
+Efficiently by Prefix-Projected Pattern Growth
+(see http://doi.org/10.1109/ICDE.2001.914830;>here).
+
+.. versionadded:: 2.4.0
+
+"""
+@staticmethod
+@since("2.4.0")
+def findFrequentSequentialPatterns(dataset,
+   sequenceCol,
+   minSupport,
+   maxPatternLength,
+   maxLocalProjDBSize):
+"""
+.. note:: Experimental
+Finds the complete set of frequent sequential patterns in the 
input sequences of itemsets.
+
+:param dataset: A dataset or a dataframe containing a sequence 
column which is
+`Seq[Seq[_]]` type.
+:param sequenceCol: The name of the sequence column in dataset, 
rows with nulls in this
+column are ignored.
+:param minSupport: The minimal support level of the sequential 
pattern, any pattern that
+   appears more than (minSupport * 
size-of-the-dataset) times will be
+   output (recommended value: `0.1`).
+:param maxPatternLength: The maximal length of the sequential 
pattern
+ (recommended value: `10`).
+:param maxLocalProjDBSize: The maximum number of items (including 
delimiters used in the
+   internal storage format) allowed in a 
projected database before
+   local processing. If a projected 
database exceeds this size,
+   another iteration of distributed prefix 
growth is run
+   (recommended value: `3200`).
+:return: A `DataFrame` that contains columns of sequence and 
corresponding frequency.
+ The schema of it will be:
+  - `sequence: Seq[Seq[T]]` (T is the item type)
+  - `freq: Long`
+
+>>> from pyspark.ml.fpm import PrefixSpan
+>>> from pyspark.sql import Row
+>>> df = sc.parallelize([Row(sequence=[[1, 2], [3]]),
--- End diff --

I think it is better to be put in a example. @mengxr What do you think ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21362: [SPARK-24197][SparkR][FOLLOWUP] Fixing failing te...

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21362#discussion_r191994811
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -1504,15 +1504,16 @@ test_that("column functions", {
   expect_equal(result, "cba")
 
   # Test array_sort() and sort_array()
-  df <- createDataFrame(list(list(list(2L, 1L, 3L, NA)), list(list(NA, 6L, 
5L, NA, 4L
+  df <- createDataFrame(list(list(list(2L, 1L, 3L, NA)), list(list(NA, 5L, 
NA, 4L
+  as_integer_lists <- function(x) lapply(x, lapply, as.integer)
 
-  result <- collect(select(df, array_sort(df[[1]])))[[1]]
-  expect_equal(result, list(list(1L, 2L, 3L, NA), list(4L, 5L, 6L, NA, 
NA)))
+  result <- as_integer_lists(collect(select(df, array_sort(df[[1]])))[[1]])
--- End diff --

+1


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21454: [SPARK-24337][Core] Improve error messages for Sp...

2018-05-30 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21454


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

2018-05-30 Thread ifilonenko

Github user ifilonenko commented on the issue:

https://github.com/apache/spark/pull/21092
  
Will resolve comments today @mccheah


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

2018-05-30 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20276
  
Hi, @cloud-fan . If this PR is still valid, could you resolve the conflicts?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21454
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91318/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21454
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21454
  
**[Test build #91318 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91318/testReport)**
 for PR 21454 at commit 
[`38ffa3e`](https://github.com/apache/spark/commit/38ffa3e551c7eee69d12cc736d33d137abd333b7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21362: [SPARK-24197][SparkR][FOLLOWUP] Fixing failing te...

2018-05-30 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21362#discussion_r191986143
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -1504,15 +1504,16 @@ test_that("column functions", {
   expect_equal(result, "cba")
 
   # Test array_sort() and sort_array()
-  df <- createDataFrame(list(list(list(2L, 1L, 3L, NA)), list(list(NA, 6L, 
5L, NA, 4L
+  df <- createDataFrame(list(list(list(2L, 1L, 3L, NA)), list(list(NA, 5L, 
NA, 4L
+  as_integer_lists <- function(x) lapply(x, lapply, as.integer)
 
-  result <- collect(select(df, array_sort(df[[1]])))[[1]]
-  expect_equal(result, list(list(1L, 2L, 3L, NA), list(4L, 5L, 6L, NA, 
NA)))
+  result <- as_integer_lists(collect(select(df, array_sort(df[[1]])))[[1]])
--- End diff --

`as_integer_lists` doesn't seem like the right approach - that just 
basically changes everything returned into integer (even though it might not be 
returned that way from JVM)



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91319/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
**[Test build #91319 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91319/testReport)**
 for PR 21366 at commit 
[`260d82c`](https://github.com/apache/spark/commit/260d82ca9fbbd16ad8174d0dafa2f95bc177a219).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21362: [SPARK-24197][SparkR][FOLLOWUP] Fixing failing tests for...

2018-05-30 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21362
  
ok, let us know if you have more information on this


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs

2018-05-30 Thread dbtsai

Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/21453
  
I filed an issue in Scala community about the interface changes, and they 
said those REPL apis are intended to be private. 
https://github.com/scala/bug/issues/10913

Being said that, they gave us couple ways to walk around it, and I'm 
testing it now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21455: [SPARK-24093][DStream][Minor]Make some fields of KafkaSt...

2018-05-30 Thread jerryshao

Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21455
  
Simply making these fields publicly accessible seems a little weird from 
Spark's side. Maybe we can use reflection instead.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...

2018-05-30 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21454
  
LGTM

Thanks! Merged to master


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21346
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3716/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21346
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs

2018-05-30 Thread jerryshao

Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21453
  
Scala 2.11.12 cannot be built against with current Spark, due to some 
method changes in REPL. We have tried internally.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21346
  
**[Test build #91324 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91324/testReport)**
 for PR 21346 at commit 
[`83c3271`](https://github.com/apache/spark/commit/83c3271d2f45bbef18d865bddbc6807e9fbd2503).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21454
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21454
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91316/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21456: [SPARK-24356] [CORE] Duplicate strings in File.path mana...

Github user squito commented on the issue:

https://github.com/apache/spark/pull/21456
  
From my understanding, that option is only available with G1GC, which is 
not really a good fit for spark (forget the exact details but something about 
humongous allocations which are common with all the large byte buffers typical 
for spark).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21454
  
**[Test build #91316 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91316/testReport)**
 for PR 21454 at commit 
[`b7ff38f`](https://github.com/apache/spark/commit/b7ff38f16c91b7df326f49d1f821b14a6dc82e8d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...

Github user squito commented on the issue:

https://github.com/apache/spark/pull/21346

> is this effectively dead code at this point?

yes, thats right. this PR by itself is not useful. Its a step towards
https://github.com/apache/spark/pull/21451

This is a good point to put in the PR summary -- I'll do that, and also
your summary notes above, if you don't mind.

> what are the major risks of this change in terms of introducing
performance or correctness issues? If we identify risks (e.g. "this is a
historically tricky area of code?"), can we mitigate those risks through
correctness testing / load testing?

I've made an effort to make minimal modifications to all existing code
paths, to minimize the risk of introducing bugs in current functionality. My
intention is to only turn it on by default initially for cases we know would
fail with the old code -- when the data is > 2gb
([SPARK-24297](https://issues.apache.org/jira/browse/SPARK-24297)). I've added
unit tests and shared the test I'm doing on a cluster just to find holes in
functionality (posted on the parent jira here:
https://issues.apache.org/jira/browse/SPARK-6235?focusedCommentId=16484069=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16484069).
I have not done load testing yet but plan to. Extra testing, of course,
would certainly be good.

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21346#discussion_r191981552
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/server/RpcHandler.java
 ---
@@ -38,15 +38,24 @@
*
* This method will not be called in parallel for a single 
TransportClient (i.e., channel).
*
+   * The rpc *might* included a data stream in streamData 
(eg. for uploading a large
+   * amount of data which should not be buffered in memory here).  Any 
errors while handling the
+   * streamData will lead to failing this entire connection -- all other 
in-flight rpcs will fail.
--- End diff --

pretty good question actually :)

I will take a closer look at this myself but I believe this connection is 
shared by other tasks running on the same executor which are trying to talk to 
the same destination.  So that might mean another task which is replicating to 
the same destination, or reading data from that same remote executor.  those 
don't have specific retry behavior for connection closed -- that might result 
in the data just not getting replicated, fetching data from elsewhere, or the 
task getting retried.

I think this is actually OK -- the existing code could cause an OOM on the 
remote end anyway, which obviously would fail a lot more.   This failure 
behavior seems reasonable.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21010
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21010
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3715/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21400: [SPARK-24351][SS]offsetLog/commitLog purge thresh...

2018-05-30 Thread ivoson

Github user ivoson commented on a diff in the pull request:

https://github.com/apache/spark/pull/21400#discussion_r191980682
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala
 ---
@@ -34,7 +34,8 @@ class ContinuousSuiteBase extends StreamTest {
 new SparkContext(
   "local[10]",
   "continuous-stream-test-sql-context",
-  sparkConf.set("spark.sql.testkey", "true")))
+  sparkConf.set("spark.sql.testkey", "true")
+.set("spark.sql.streaming.minBatchesToRetain", "2")))
--- End diff --

@jose-torres Thanks for pointing this out. I will make a separate suite for 
clarity and isolation.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21445: [SPARK-24404][SS] Increase currentEpoch when meet a Epoc...

2018-05-30 Thread advancedxy

Github user advancedxy commented on the issue:

https://github.com/apache/spark/pull/21445
  
> I think the best way to do it is to make the shuffle writer responsible 
for incrementing the epoch within its task, the same way the data source writer 
does currently.

Yeah, @LiangchangZ please consider this way. The writer part of a task is 
responsible to pull data from upstream. It's more consistent and wouldn't break 
existing logic.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21010
  
**[Test build #91323 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91323/testReport)**
 for PR 21010 at commit 
[`9ccb648`](https://github.com/apache/spark/commit/9ccb6488f6f8309e0cfa71c4b332e6d680f24ffa).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...

Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21010
  
LGTM pending Jenkins.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...

Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21010
  
Jenkins, retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...

Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21010
  
Thanks, sounds good.
Let me retrigger the build just for checking again.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21346#discussion_r191979425
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/protocol/UploadStream.java
 ---
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.protocol;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+
+import com.google.common.base.Objects;
+import io.netty.buffer.ByteBuf;
+
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.buffer.NettyManagedBuffer;
+
+/**
+ * An RPC with data that is sent outside of the frame, so it can be read 
as a stream.
+ */
+public final class UploadStream extends AbstractMessage implements 
RequestMessage {
+  /** Used to link an RPC request with its response. */
+  public final long requestId;
+  public final ManagedBuffer meta;
+  public final long bodyByteCount;
+
+  public UploadStream(long requestId, ManagedBuffer meta, ManagedBuffer 
body) {
+super(body, false); // body is *not* included in the frame
+this.requestId = requestId;
+this.meta = meta;
+bodyByteCount = body.size();
+  }
+
+  // this version is called when decoding the bytes on the receiving end.  
The body is handled
+  // separately.
+  private UploadStream(long requestId, ManagedBuffer meta, long 
bodyByteCount) {
+super(null, false);
+this.requestId = requestId;
+this.meta = meta;
+this.bodyByteCount = bodyByteCount;
+  }
+
+  @Override
+  public Type type() { return Type.UploadStream; }
+
+  @Override
+  public int encodedLength() {
+// the requestId, meta size, meta and bodyByteCount (body is not 
included)
+return 8 + 4 + ((int) meta.size()) + 8;
+  }
+
+  @Override
+  public void encode(ByteBuf buf) {
+buf.writeLong(requestId);
+try {
+  ByteBuffer metaBuf = meta.nioByteBuffer();
+  buf.writeInt(metaBuf.remaining());
+  buf.writeBytes(metaBuf);
+} catch (IOException io) {
+  throw new RuntimeException(io);
+}
+buf.writeLong(bodyByteCount);
+  }
+
+  public static UploadStream decode(ByteBuf buf) {
+long requestId = buf.readLong();
+int metaSize = buf.readInt();
+ManagedBuffer meta = new 
NettyManagedBuffer(buf.readRetainedSlice(metaSize));
+long bodyByteCount = buf.readLong();
+// This is called by the frame decoder, so the data is still null.  We 
need a StreamInterceptor
+// to read the data.
+return new UploadStream(requestId, meta, bodyByteCount);
+  }
+
+  @Override
+  public int hashCode() {
+return Objects.hashCode(requestId, body());
+  }
+
+  @Override
+  public boolean equals(Object other) {
+if (other instanceof UploadStream) {
+  UploadStream o = (UploadStream) other;
+  return requestId == o.requestId && super.equals(o);
+}
+return false;
+  }
+
+  @Override
+  public String toString() {
+return Objects.toStringHelper(this)
+  .add("requestId", requestId)
+  .add("body", body())
--- End diff --

to be honest, this was also just parroted from other classes -- looking now 
at implementations of ManagedBuffer, if they have a `toString()` it does 
something reasonable.

Is that actually useful for debugging?  maybe not, don't think I ever 
actually looked at this.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21346#discussion_r191979019
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/protocol/UploadStream.java
 ---
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.protocol;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+
+import com.google.common.base.Objects;
+import io.netty.buffer.ByteBuf;
+
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.buffer.NettyManagedBuffer;
+
+/**
+ * An RPC with data that is sent outside of the frame, so it can be read 
as a stream.
+ */
+public final class UploadStream extends AbstractMessage implements 
RequestMessage {
+  /** Used to link an RPC request with its response. */
+  public final long requestId;
+  public final ManagedBuffer meta;
+  public final long bodyByteCount;
+
+  public UploadStream(long requestId, ManagedBuffer meta, ManagedBuffer 
body) {
+super(body, false); // body is *not* included in the frame
+this.requestId = requestId;
+this.meta = meta;
+bodyByteCount = body.size();
+  }
+
+  // this version is called when decoding the bytes on the receiving end.  
The body is handled
+  // separately.
+  private UploadStream(long requestId, ManagedBuffer meta, long 
bodyByteCount) {
+super(null, false);
+this.requestId = requestId;
+this.meta = meta;
+this.bodyByteCount = bodyByteCount;
+  }
+
+  @Override
+  public Type type() { return Type.UploadStream; }
+
+  @Override
+  public int encodedLength() {
+// the requestId, meta size, meta and bodyByteCount (body is not 
included)
+return 8 + 4 + ((int) meta.size()) + 8;
+  }
+
+  @Override
+  public void encode(ByteBuf buf) {
+buf.writeLong(requestId);
+try {
+  ByteBuffer metaBuf = meta.nioByteBuffer();
+  buf.writeInt(metaBuf.remaining());
+  buf.writeBytes(metaBuf);
+} catch (IOException io) {
+  throw new RuntimeException(io);
+}
+buf.writeLong(bodyByteCount);
+  }
+
+  public static UploadStream decode(ByteBuf buf) {
+long requestId = buf.readLong();
+int metaSize = buf.readInt();
+ManagedBuffer meta = new 
NettyManagedBuffer(buf.readRetainedSlice(metaSize));
+long bodyByteCount = buf.readLong();
+// This is called by the frame decoder, so the data is still null.  We 
need a StreamInterceptor
+// to read the data.
+return new UploadStream(requestId, meta, bodyByteCount);
+  }
+
+  @Override
+  public int hashCode() {
+return Objects.hashCode(requestId, body());
--- End diff --

this is a good point.  Admittedly I just copied this from `StreamResponse` 
without thinking about it too much -- that class exhibits the same issue.  I'll 
remove `body` from both.

(In practice, we're not using sticking them in hashmaps now so there 
wouldn't be any bugs in behavior because of this.)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21346#discussion_r191978545
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java
 ---
@@ -141,26 +141,14 @@ public void fetchChunk(
 StreamChunkId streamChunkId = new StreamChunkId(streamId, chunkIndex);
 handler.addFetchRequest(streamChunkId, callback);
 
-channel.writeAndFlush(new 
ChunkFetchRequest(streamChunkId)).addListener(future -> {
--- End diff --

yes exactly.  Marcelo asked for this refactoring in his review -- there was 
already a ton of copy-paste, and instead of adding more made sense to refactor. 
 Shouldn't be any behavior change (there are minor changes that shouldn't 
matter ...  `channel.close()` happens before the more specific cleanup 
operations whereas it was in the middle previously, the `try` encompasses a bit 
more than before.)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21346#discussion_r191978140
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/client/StreamInterceptor.java
 ---
@@ -50,16 +52,22 @@
 
   @Override
   public void exceptionCaught(Throwable cause) throws Exception {
-handler.deactivateStream();
+deactivateStream();
 callback.onFailure(streamId, cause);
   }
 
   @Override
   public void channelInactive() throws Exception {
-handler.deactivateStream();
+deactivateStream();
 callback.onFailure(streamId, new ClosedChannelException());
   }
 
+  private void deactivateStream() {
+if (handler instanceof TransportResponseHandler) {
--- End diff --

the only purpose of `TransportResponseHandler.deactivateStream()` is to 
include the stream request in the count for `numOutstandingRequests` (its not 
doing any critical cleanup).  I will include a comment here explaining that.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21346#discussion_r191976952
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/server/StreamData.java
 ---
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.server;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+
+import org.apache.spark.network.client.RpcResponseCallback;
+import org.apache.spark.network.client.StreamCallback;
+import org.apache.spark.network.client.StreamInterceptor;
+import org.apache.spark.network.util.TransportFrameDecoder;
+
+/**
+ * A holder for streamed data sent along with an RPC message.
+ */
+public class StreamData {
+
+  private final TransportRequestHandler handler;
+  private final TransportFrameDecoder frameDecoder;
+  private final RpcResponseCallback rpcCallback;
+  private final ByteBuffer meta;
--- End diff --

whoops, you're right.  I was using this at one point in the follow-on 
patch, then changed it and didn't fully clean this up.  thanks


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21464: [WEBUI] Avoid possibility of script in query param keys

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21464
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91314/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21464: [WEBUI] Avoid possibility of script in query param keys

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21464
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21464: [WEBUI] Avoid possibility of script in query param keys

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21464
  
**[Test build #91314 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91314/testReport)**
 for PR 21464 at commit 
[`90c9ddc`](https://github.com/apache/spark/commit/90c9ddca2ecb458ccde2945ab67548403c3b4256).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21464: [WEBUI] Avoid possibility of script in query param keys

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21464
  
**[Test build #4191 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4191/testReport)**
 for PR 21464 at commit 
[`90c9ddc`](https://github.com/apache/spark/commit/90c9ddca2ecb458ccde2945ab67548403c3b4256).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21447: [SPARK-24339][SQL]Add project for transform/map/reduce s...

2018-05-30 Thread xdcjie

Github user xdcjie commented on the issue:

https://github.com/apache/spark/pull/21447
  
@maropu @gatorsmile Do you have any comment/suggestion for this PR? Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21454
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91315/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21454
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21454
  
**[Test build #91315 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91315/testReport)**
 for PR 21454 at commit 
[`c1b4d16`](https://github.com/apache/spark/commit/c1b4d1670fe1bb5c2b9b0d66c14e3da26627d29e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21453
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21453
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91322/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21453
  
**[Test build #91322 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91322/testReport)**
 for PR 21453 at commit 
[`90d3842`](https://github.com/apache/spark/commit/90d3842616aec94d603a68d44463eb043c5a66f9).
 * This patch **fails build dependency tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21456: [SPARK-24356] [CORE] Duplicate strings in File.path mana...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21456
  
**[Test build #91321 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91321/testReport)**
 for PR 21456 at commit 
[`88bb478`](https://github.com/apache/spark/commit/88bb4780d20ad952aa1936f4e78a420d9baf0f2c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21453
  
**[Test build #91322 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91322/testReport)**
 for PR 21453 at commit 
[`90d3842`](https://github.com/apache/spark/commit/90d3842616aec94d603a68d44463eb043c5a66f9).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21453
  
Jenkins, test this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21456: [SPARK-24356] [CORE] Duplicate strings in File.path mana...

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21456
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21453
  
Jenkins, add to whitelist


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...

2018-05-30 Thread wangyum

Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/21010
  
> Basically LGTM, but I'm wondering what if the expr2 is not like a format 
string?

The same as Hive:
```sql
spark-sql> SELECT format_number(12332.123456, 'abc');
abc12332
```
```sql
hive> SELECT format_number(12332.123456, 'abc');
OK
abc12332
Time taken: 0.218 seconds, Fetched: 1 row(s)
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21437#discussion_r191969840
  
--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
 TaskAttemptID.
 """
 return self._taskAttemptId
+
+def getLocalProperty(self, key):
+"""
+Get a local property set upstream in the driver, or None if it is 
missing.
--- End diff --

Makes sense but +1 for leaving it out since I either don't know how 
commonly it will be used for now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21378: [SPARK-24326][Mesos] add support for local:// sch...

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21378#discussion_r191969654
  
--- Diff: docs/running-on-mesos.md ---
@@ -753,6 +753,16 @@ See the [configuration page](configuration.html) for 
information on Spark config
 spark.cores.max is reached
   
 
++
--- End diff --

+1 for reverting whitespaces. Let's leave this minimised and targeted.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21378: [SPARK-24326][Mesos] add support for local:// scheme for...

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21378
  
@felixcheung yup. I was just looking at this PR out of my curiosity. I 
don't currently have an env to test Mesos.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21426
  
@vanzin and @jerryshao, thanks you so much.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21464: [WEBUI] Avoid possibility of script in query param keys

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21464
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21464: [WEBUI] Avoid possibility of script in query param keys

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21464
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91313/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21464: [WEBUI] Avoid possibility of script in query param keys

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21464
  
**[Test build #91313 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91313/testReport)**
 for PR 21464 at commit 
[`aad159c`](https://github.com/apache/spark/commit/aad159c561094b53a719c8950fa087dacd1d9d8d).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3581/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...

Github user ssuchter commented on a diff in the pull request:

https://github.com/apache/spark/pull/20697#discussion_r191966223
  
--- Diff: 
resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh
 ---
@@ -0,0 +1,91 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+TEST_ROOT_DIR=$(git rev-parse --show-toplevel)
+UNPACKED_SPARK_TGZ="$TEST_ROOT_DIR/target/spark-dist-unpacked"
+IMAGE_TAG_OUTPUT_FILE="$TEST_ROOT_DIR/target/image-tag.txt"
+DEPLOY_MODE="minikube"
+IMAGE_REPO="docker.io/kubespark"
+IMAGE_TAG="N/A"
+SPARK_TGZ="N/A"
+
+# Parse arguments
+while (( "$#" )); do
+  case $1 in
+--unpacked-spark-tgz)
+  UNPACKED_SPARK_TGZ="$2"
+  shift
+  ;;
+--image-repo)
+  IMAGE_REPO="$2"
+  shift
+  ;;
+--image-tag)
+  IMAGE_TAG="$2"
+  shift
+  ;;
+--image-tag-output-file)
+  IMAGE_TAG_OUTPUT_FILE="$2"
+  shift
+  ;;
+--deploy-mode)
+  DEPLOY_MODE="$2"
+  shift
+  ;;
+--spark-tgz)
+  SPARK_TGZ="$2"
+  shift
+  ;;
+*)
+  break
+  ;;
+  esac
+  shift
+done
+
+if [[ $SPARK_TGZ == "N/A" ]];
+then
+  echo "Must specify a Spark tarball to build Docker images against with 
--spark-tgz." && exit 1;
--- End diff --

Ok, it's important for me to be clear here. There are currently two PRBs. 
This will continue in the immediate future.

1. General Spark PRB, mainly for unit tests. This can run on all hosts.

2. K8s integration-specific PRB. This early-outs on many PRs that don't 
seem relevant. This is specifically for running K8s integration tests, and can 
only run on some hosts.

Because of the host restriction issue, these are two separate PRBs.

It is definitely true that each one of these will build the main Spark jars 
separately, so that 11 minute time will be spent twice. Since the 
K8s-integration PRB is only doing this on a small set of PRs, it's not a 
significant cost to the Jenkins infrastructure.

Within the K8s-integration PRB, the entire maven reactor is only built 
once, during the make distribution step. The integration test step doesn't 
rebuild it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3714/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3581/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...

2018-05-30 Thread mccheah

Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/20697#discussion_r191965147
  
--- Diff: 
resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh
 ---
@@ -0,0 +1,91 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+TEST_ROOT_DIR=$(git rev-parse --show-toplevel)
+UNPACKED_SPARK_TGZ="$TEST_ROOT_DIR/target/spark-dist-unpacked"
+IMAGE_TAG_OUTPUT_FILE="$TEST_ROOT_DIR/target/image-tag.txt"
+DEPLOY_MODE="minikube"
+IMAGE_REPO="docker.io/kubespark"
+IMAGE_TAG="N/A"
+SPARK_TGZ="N/A"
+
+# Parse arguments
+while (( "$#" )); do
+  case $1 in
+--unpacked-spark-tgz)
+  UNPACKED_SPARK_TGZ="$2"
+  shift
+  ;;
+--image-repo)
+  IMAGE_REPO="$2"
+  shift
+  ;;
+--image-tag)
+  IMAGE_TAG="$2"
+  shift
+  ;;
+--image-tag-output-file)
+  IMAGE_TAG_OUTPUT_FILE="$2"
+  shift
+  ;;
+--deploy-mode)
+  DEPLOY_MODE="$2"
+  shift
+  ;;
+--spark-tgz)
+  SPARK_TGZ="$2"
+  shift
+  ;;
+*)
+  break
+  ;;
+  esac
+  shift
+done
+
+if [[ $SPARK_TGZ == "N/A" ]];
+then
+  echo "Must specify a Spark tarball to build Docker images against with 
--spark-tgz." && exit 1;
--- End diff --

Sorry, I mean if the Spark PRB will try to build the entire maven reactor 
twice - once for unit tests and once for integration tests. The TGZ bundling in 
and of itself I agree should be fast if the jars are already built by the maven 
reactor. But it's unclear to me if we'll end up building jars redundantly here.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...

2018-05-30 Thread JoshRosen

Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/21346
  
Summary of key changes (WIP; notes to self):


> Summary of changes:
> 
> - Introduce a new `UploadStream` RPC which is sent to push a large 
payload as a stream (in contrast, the pre-existing `StreamRequest` and 
`StreamResponse` RPCs are used for pull-based  streaming).
> - Generalize `RpcHandler.receive()` to support requests which contain 
streams. 
> - Generalize `StreamInterceptor` to handle both request and response 
messages (previously it only handled responses).
> - Introduce `StdChannelListener` to abstract away common logging logic in 
`ChannelFuture` listeners.

Question: is this effectively dead code at this point? In other words, this 
PR just adds the lower-level pieces but there's nothing currently using the new 
API? So this patch as of now has no behavior change and actual functional 
changes impacting queries / actual usage will come later when we wire this up 
to the block replicator?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21346: [SPARK-6237][NETWORK] Network-layer changes to al...

2018-05-30 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21346#discussion_r191964329
  
--- Diff: project/MimaExcludes.scala ---
@@ -36,6 +36,9 @@ object MimaExcludes {
 
   // Exclude rules for 2.4.x
   lazy val v24excludes = v23excludes ++ Seq(
+// [SPARK-6237][NETWORK] Network-layer changes to allow stream upload
+
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.network.netty.NettyBlockRpcServer.receive"),
--- End diff --

I suspect that it's because we might want to access these across Java 
package boundaries and Java doesn't have the equivalent of Scala's nested 
package scoped `private[package]`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91320/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
**[Test build #91320 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91320/testReport)**
 for PR 20697 at commit 
[`4102b25`](https://github.com/apache/spark/commit/4102b25537288fd4d6cf8267f07c52a79f49dd72).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
**[Test build #91320 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91320/testReport)**
 for PR 20697 at commit 
[`4102b25`](https://github.com/apache/spark/commit/4102b25537288fd4d6cf8267f07c52a79f49dd72).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...

Github user ssuchter commented on a diff in the pull request:

https://github.com/apache/spark/pull/20697#discussion_r191962980
  
--- Diff: 
resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh
 ---
@@ -0,0 +1,91 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+TEST_ROOT_DIR=$(git rev-parse --show-toplevel)
+UNPACKED_SPARK_TGZ="$TEST_ROOT_DIR/target/spark-dist-unpacked"
+IMAGE_TAG_OUTPUT_FILE="$TEST_ROOT_DIR/target/image-tag.txt"
+DEPLOY_MODE="minikube"
+IMAGE_REPO="docker.io/kubespark"
+IMAGE_TAG="N/A"
+SPARK_TGZ="N/A"
+
+# Parse arguments
+while (( "$#" )); do
+  case $1 in
+--unpacked-spark-tgz)
+  UNPACKED_SPARK_TGZ="$2"
+  shift
+  ;;
+--image-repo)
+  IMAGE_REPO="$2"
+  shift
+  ;;
+--image-tag)
+  IMAGE_TAG="$2"
+  shift
+  ;;
+--image-tag-output-file)
+  IMAGE_TAG_OUTPUT_FILE="$2"
+  shift
+  ;;
+--deploy-mode)
+  DEPLOY_MODE="$2"
+  shift
+  ;;
+--spark-tgz)
+  SPARK_TGZ="$2"
+  shift
+  ;;
+*)
+  break
+  ;;
+  esac
+  shift
+done
+
+if [[ $SPARK_TGZ == "N/A" ]];
+then
+  echo "Must specify a Spark tarball to build Docker images against with 
--spark-tgz." && exit 1;
--- End diff --

I timed the build on my laptop. To build the Spark jars took just over 11 
minutes. To build the .tgz took about 7 seconds. So this extra step adds ~1% 
overhead.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...

Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21010
  
Basically LGTM, but I'm wondering what if the `expr2` is not like a format 
string?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...

Github user ssuchter commented on a diff in the pull request:

https://github.com/apache/spark/pull/20697#discussion_r191961749
  
--- Diff: 
resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh
 ---
@@ -0,0 +1,91 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+TEST_ROOT_DIR=$(git rev-parse --show-toplevel)
+UNPACKED_SPARK_TGZ="$TEST_ROOT_DIR/target/spark-dist-unpacked"
+IMAGE_TAG_OUTPUT_FILE="$TEST_ROOT_DIR/target/image-tag.txt"
+DEPLOY_MODE="minikube"
+IMAGE_REPO="docker.io/kubespark"
+IMAGE_TAG="N/A"
+SPARK_TGZ="N/A"
+
+# Parse arguments
+while (( "$#" )); do
+  case $1 in
+--unpacked-spark-tgz)
+  UNPACKED_SPARK_TGZ="$2"
+  shift
+  ;;
+--image-repo)
+  IMAGE_REPO="$2"
+  shift
+  ;;
+--image-tag)
+  IMAGE_TAG="$2"
+  shift
+  ;;
+--image-tag-output-file)
+  IMAGE_TAG_OUTPUT_FILE="$2"
+  shift
+  ;;
+--deploy-mode)
+  DEPLOY_MODE="$2"
+  shift
+  ;;
+--spark-tgz)
+  SPARK_TGZ="$2"
+  shift
+  ;;
+*)
+  break
+  ;;
+  esac
+  shift
+done
+
+if [[ $SPARK_TGZ == "N/A" ]];
+then
+  echo "Must specify a Spark tarball to build Docker images against with 
--spark-tgz." && exit 1;
--- End diff --

Also - Jenkins has *already* been doing that - building the distribution 
.tgz for each Kubernetes related PRB invokation. Relevant is the fact that the 
filtering of what is a Kubernetes-related change happens *before* the 
distribution .tgz is done.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3580/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...

Github user ssuchter commented on a diff in the pull request:

https://github.com/apache/spark/pull/20697#discussion_r191961317
  
--- Diff: 
resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh
 ---
@@ -0,0 +1,91 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+TEST_ROOT_DIR=$(git rev-parse --show-toplevel)
+UNPACKED_SPARK_TGZ="$TEST_ROOT_DIR/target/spark-dist-unpacked"
+IMAGE_TAG_OUTPUT_FILE="$TEST_ROOT_DIR/target/image-tag.txt"
+DEPLOY_MODE="minikube"
+IMAGE_REPO="docker.io/kubespark"
+IMAGE_TAG="N/A"
+SPARK_TGZ="N/A"
+
+# Parse arguments
+while (( "$#" )); do
+  case $1 in
+--unpacked-spark-tgz)
+  UNPACKED_SPARK_TGZ="$2"
+  shift
+  ;;
+--image-repo)
+  IMAGE_REPO="$2"
+  shift
+  ;;
+--image-tag)
+  IMAGE_TAG="$2"
+  shift
+  ;;
+--image-tag-output-file)
+  IMAGE_TAG_OUTPUT_FILE="$2"
+  shift
+  ;;
+--deploy-mode)
+  DEPLOY_MODE="$2"
+  shift
+  ;;
+--spark-tgz)
+  SPARK_TGZ="$2"
+  shift
+  ;;
+*)
+  break
+  ;;
+  esac
+  shift
+done
+
+if [[ $SPARK_TGZ == "N/A" ]];
+then
+  echo "Must specify a Spark tarball to build Docker images against with 
--spark-tgz." && exit 1;
--- End diff --

The act of building the .tgz is much cheaper (faster) than doing the java 
build or the integration test. I wouldn't worry about that.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3713/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3580/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21465
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91317/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21465
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21465
  
**[Test build #91317 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91317/testReport)**
 for PR 21465 at commit 
[`79fc83b`](https://github.com/apache/spark/commit/79fc83bae376c430a23fd7c9f502690a1c4d321e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class HasValidationIndicatorCol(Params):`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21069: [SPARK-23920][SQL]add array_remove to remove all ...

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21069#discussion_r191956713
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1882,3 +1882,117 @@ case class ArrayRepeat(left: Expression, right: 
Expression)
   }
 
 }
+
+/**
+ * Remove all elements that equal to element from the given array
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(array, element) - Remove all elements that equal to 
element from array.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3, null, 3), 3);
+   [1,2,null]
+  """, since = "2.4.0")
+case class ArrayRemove(left: Expression, right: Expression)
+  extends BinaryExpression with ImplicitCastInputTypes {
+
+  override def dataType: DataType = left.dataType
+
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType)
+
+  lazy val elementType: DataType = 
left.dataType.asInstanceOf[ArrayType].elementType
+
+  @transient private lazy val ordering: Ordering[Any] =
+TypeUtils.getInterpretedOrdering(right.dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (!left.dataType.isInstanceOf[ArrayType]
+  || left.dataType.asInstanceOf[ArrayType].elementType != 
right.dataType) {
+  TypeCheckResult.TypeCheckFailure(
+"Arguments must be an array followed by a value of same type as 
the array members")
+} else {
+  TypeUtils.checkForOrderingExpr(right.dataType, s"function 
$prettyName")
+}
+  }
+
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+val newArray = new 
Array[Any](arr.asInstanceOf[ArrayData].numElements())
+var pos = 0
+arr.asInstanceOf[ArrayData].foreach(right.dataType, (i, v) =>
+  if (v == null || !ordering.equiv(v, value)) {
+newArray(pos) = v
+pos += 1
+  }
+)
+new GenericArrayData(newArray.slice(0, pos))
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+nullSafeCodeGen(ctx, ev, (arr, value) => {
+  val numsToRemove = ctx.freshName("numsToRemove")
+  val newArraySize = ctx.freshName("newArraySize")
+  val i = ctx.freshName("i")
+  val getValue = CodeGenerator.getValue(arr, elementType, i)
+  val isEqual = ctx.genEqual(elementType, value, getValue)
+  s"""
+ |int $numsToRemove = 0;
+ |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+ |  if (!$arr.isNullAt($i) && $isEqual) {
+ |$numsToRemove = $numsToRemove + 1;
+ |  }
+ |}
+ |int $newArraySize = $arr.numElements() - $numsToRemove;
+ |${genCodeForResult(ctx, ev, arr, value, newArraySize)}
+   """.stripMargin
+})
+  }
+
+  def genCodeForResult(
+  ctx: CodegenContext,
+  ev: ExprCode,
+  inputArray: String,
+  value: String,
+  newArraySize: String): String = {
+val values = ctx.freshName("values")
+val i = ctx.freshName("i")
+val pos = ctx.freshName("pos")
+val getValue = CodeGenerator.getValue(inputArray, elementType, i)
+val isEqual = ctx.genEqual(elementType, value, getValue)
+if (!CodeGenerator.isPrimitiveType(elementType)) {
+  val arrayClass = classOf[GenericArrayData].getName
+  s"""
+ |int $pos = 0;
+ |Object[] $values = new Object[$newArraySize];
+ |for (int $i = 0; $i < $inputArray.numElements(); $i ++) {
+ |  if (!($isEqual)) {
--- End diff --

Don't we need to check null?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21069: [SPARK-23920][SQL]add array_remove to remove all ...

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21069#discussion_r191955752
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1882,3 +1882,117 @@ case class ArrayRepeat(left: Expression, right: 
Expression)
   }
 
 }
+
+/**
+ * Remove all elements that equal to element from the given array
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(array, element) - Remove all elements that equal to 
element from array.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3, null, 3), 3);
+   [1,2,null]
+  """, since = "2.4.0")
+case class ArrayRemove(left: Expression, right: Expression)
+  extends BinaryExpression with ImplicitCastInputTypes {
+
+  override def dataType: DataType = left.dataType
+
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType)
+
+  lazy val elementType: DataType = 
left.dataType.asInstanceOf[ArrayType].elementType
+
+  @transient private lazy val ordering: Ordering[Any] =
+TypeUtils.getInterpretedOrdering(right.dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (!left.dataType.isInstanceOf[ArrayType]
+  || left.dataType.asInstanceOf[ArrayType].elementType != 
right.dataType) {
--- End diff --

Maybe we need to change here as well.
We can follow the implementation of `ArrayPosition`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21069: [SPARK-23920][SQL]add array_remove to remove all ...

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21069#discussion_r191955459
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1882,3 +1882,117 @@ case class ArrayRepeat(left: Expression, right: 
Expression)
   }
 
 }
+
+/**
+ * Remove all elements that equal to element from the given array
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(array, element) - Remove all elements that equal to 
element from array.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3, null, 3), 3);
+   [1,2,null]
+  """, since = "2.4.0")
+case class ArrayRemove(left: Expression, right: Expression)
+  extends BinaryExpression with ImplicitCastInputTypes {
+
+  override def dataType: DataType = left.dataType
+
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType)
--- End diff --

This will cause `ClassCastException`. See #21401.
Also could you add tests similar to tests added in #21401?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21069: [SPARK-23920][SQL]add array_remove to remove all ...

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21069#discussion_r191958470
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
 ---
@@ -552,4 +552,60 @@ class CollectionExpressionsSuite extends SparkFunSuite 
with ExpressionEvalHelper
 checkEvaluation(ArrayRepeat(strArray, Literal(2)), Seq(Seq("hi", 
"hola"), Seq("hi", "hola")))
 checkEvaluation(ArrayRepeat(Literal("hi"), Literal(null, 
IntegerType)), null)
   }
+
+  test("Array remove") {
+val a0 = Literal.create(Seq(1, 2, 3, 2, 2, 5), ArrayType(IntegerType))
+val a1 = Literal.create(Seq("b", "a", "a", "c", "b"), 
ArrayType(StringType))
+val a2 = Literal.create(Seq[String](null, "", null, ""), 
ArrayType(StringType))
+val a3 = Literal.create(Seq.empty[Integer], ArrayType(IntegerType))
+val a4 = Literal.create(null, ArrayType(StringType))
+val a5 = Literal.create(Seq(1, null, 8, 9, null), 
ArrayType(IntegerType))
+val a6 = Literal.create(Seq(true, false, false, true), 
ArrayType(BooleanType))
+
+checkEvaluation(ArrayRemove(a0, Literal(0)), Seq(1, 2, 3, 2, 2, 5))
+checkEvaluation(ArrayRemove(a0, Literal(1)), Seq(2, 3, 2, 2, 5))
+checkEvaluation(ArrayRemove(a0, Literal(2)), Seq(1, 3, 5))
+checkEvaluation(ArrayRemove(a0, Literal(3)), Seq(1, 2, 2, 2, 5))
+checkEvaluation(ArrayRemove(a0, Literal(5)), Seq(1, 2, 3, 2, 2))
+checkEvaluation(ArrayRemove(a0, Literal(null, IntegerType)), null)
+
+checkEvaluation(ArrayRemove(a1, Literal("")), Seq("b", "a", "a", "c", 
"b"))
+checkEvaluation(ArrayRemove(a1, Literal("a")), Seq("b", "c", "b"))
+checkEvaluation(ArrayRemove(a1, Literal("b")), Seq("a", "a", "c"))
+checkEvaluation(ArrayRemove(a1, Literal("c")), Seq("b", "a", "a", "b"))
+
+checkEvaluation(ArrayRemove(a2, Literal("")), Seq(null, null))
+checkEvaluation(ArrayRemove(a2, Literal(null, StringType)), null)
+
+checkEvaluation(ArrayRemove(a3, Literal(1)), Seq.empty[Integer])
+
+checkEvaluation(ArrayRemove(a4, Literal("a")), null)
+
+checkEvaluation(ArrayRemove(a5, Literal(9)), Seq(1, null, 8, null))
+checkEvaluation(ArrayRemove(a6, Literal(false)), Seq(true, true))
+
+// complex data types
+val b0 = Literal.create(Seq[Array[Byte]](Array[Byte](5, 6), 
Array[Byte](1, 2),
+  Array[Byte](1, 2), Array[Byte](5, 6)), ArrayType(BinaryType))
+val b1 = Literal.create(Seq[Array[Byte]](Array[Byte](2, 1), null),
+  ArrayType(BinaryType))
+val b2 = Literal.create(Seq[Array[Byte]](null, Array[Byte](1, 2)),
+  ArrayType(BinaryType))
+val nullBinary = Literal.create(null, BinaryType)
+
+val dataToRemoved1 = Literal.create(Array[Byte](5, 6), BinaryType)
+checkEvaluation(ArrayRemove(b0, dataToRemoved1),
+  Seq[Array[Byte]](Array[Byte](1, 2), Array[Byte](1, 2)))
+checkEvaluation(ArrayRemove(b0, nullBinary), null)
+checkEvaluation(ArrayRemove(b1, dataToRemoved1), 
Seq[Array[Byte]](Array[Byte](2, 1), null))
+checkEvaluation(ArrayRemove(b2, dataToRemoved1), 
Seq[Array[Byte]](null, Array[Byte](1, 2)))
+
+val c0 = Literal.create(Seq[Seq[Int]](Seq[Int](1, 2), Seq[Int](3, 4)),
+  ArrayType(ArrayType(IntegerType)))
+val c1 = Literal.create(Seq[Seq[Int]](Seq[Int](5, 6), Seq[Int](2, 1)),
+  ArrayType(ArrayType(IntegerType)))
--- End diff --

What if for `val c2 = Literal.create(Seq[Seq[Int]](null, Seq[Int](2, 1)), 
ArrayType(ArrayType(IntegerType)))`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21465
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21465
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3712/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21454
  
**[Test build #91318 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91318/testReport)**
 for PR 21454 at commit 
[`38ffa3e`](https://github.com/apache/spark/commit/38ffa3e551c7eee69d12cc736d33d137abd333b7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
**[Test build #91319 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91319/testReport)**
 for PR 21366 at commit 
[`260d82c`](https://github.com/apache/spark/commit/260d82ca9fbbd16ad8174d0dafa2f95bc177a219).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21465
  
**[Test build #91317 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91317/testReport)**
 for PR 21465 at commit 
[`79fc83b`](https://github.com/apache/spark/commit/79fc83bae376c430a23fd7c9f502690a1c4d321e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21463: [SPARK-23754][BRANCH-2.3][PYTHON] Re-raising StopIterati...