date:20180131

[GitHub] spark pull request #20462: [SPARK-23020][core] Fix another race in the in-pr...

2018-01-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20462#discussion_r165248266
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/InProcessAppHandle.java ---
@@ -66,7 +66,7 @@ synchronized void start(String appName, Method main, 
String[] args) {
 setState(State.FAILED);
   }
 
-  disconnect();
+  dispose();
--- End diff --

If we call `disconnect` here, we would close the connection, and then wait 
the close to finish in `dispose`. If we call `dispose` directly, we also close 
and wait the connection(in `waitForClose`). What the actual difference here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20462: [SPARK-23020][core] Fix another race in the in-pr...

2018-01-31 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20462#discussion_r165248236
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java ---
@@ -363,17 +362,28 @@ public void close() throws IOException {
 /**
  * Close the connection and wait for any buffered data to be processed 
before returning.
--- End diff --

That wouldn't be accurate anymore, because the wait happens first now. 
`waitAndClose()` is an option but also not totally accurate. Open to 
suggestions.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20462: [SPARK-23020][core] Fix another race in the in-pr...

2018-01-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20462#discussion_r165247921
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java ---
@@ -363,17 +362,28 @@ public void close() throws IOException {
 /**
  * Close the connection and wait for any buffered data to be processed 
before returning.
--- End diff --

according to the document, shall we still call it `closeAndWait`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20437: [SPARK-23270][Streaming][WEB-UI]FileInputDStream ...

2018-01-31 Thread guoxiaolongzte

Github user guoxiaolongzte commented on a diff in the pull request:

https://github.com/apache/spark/pull/20437#discussion_r165247567
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
 ---
@@ -157,7 +157,7 @@ class FileInputDStream[K, V, F <: NewInputFormat[K, V]](
 val metadata = Map(
   "files" -> newFiles.toList,
   StreamInputInfo.METADATA_KEY_DESCRIPTION -> newFiles.mkString("\n"))
-val inputInfo = StreamInputInfo(id, 0, metadata)
+val inputInfo = StreamInputInfo(id, rdds.map(_.count).sum, metadata)
--- End diff --

Asynchronous processing, does not affect the backbone of the Streaming job, 
also can get the number of records.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20459: [SPARK-23107][ML] ML 2.3 QA: New Scala APIs, docs.

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20459
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86903/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20459: [SPARK-23107][ML] ML 2.3 QA: New Scala APIs, docs.

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20459
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20459: [SPARK-23107][ML] ML 2.3 QA: New Scala APIs, docs.

2018-01-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20459
  
**[Test build #86903 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86903/testReport)**
 for PR 20459 at commit 
[`b8c6c19`](https://github.com/apache/spark/commit/b8c6c1922b006373592cb7c7687ee970d2c8f519).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-01-31 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20455
  
LGTM so far, one thing I wanna add is to also document the behavior of 
accessing null primitive values, e.g. `getInt`. We can say that the return 
value is undefined and can be anything, if that slot is null. Same to the 
batched version like `getInts`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20462: [SPARK-23020][core] Fix another race in the in-process l...

2018-01-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20462
  
**[Test build #86904 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86904/testReport)**
 for PR 20462 at commit 
[`b967775`](https://github.com/apache/spark/commit/b96777573bdc9dc92b3419fb44bbd790117ee00e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20462: [SPARK-23020][core] Fix another race in the in-process l...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20462
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20462: [SPARK-23020][core] Fix another race in the in-process l...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20462
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/455/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20461: [SPARK-23289][CORE]OneForOneBlockFetcher.DownloadCallbac...

2018-01-31 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20461
  
LGTM, also cc @jinxing64 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20461: [SPARK-23289][CORE]OneForOneBlockFetcher.Download...

2018-01-31 Thread ConeyLiu

Github user ConeyLiu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20461#discussion_r165246022
  
--- Diff: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java
 ---
@@ -171,7 +171,9 @@ private void failRemainingBlocks(String[] 
failedBlockIds, Throwable e) {
 
 @Override
 public void onData(String streamId, ByteBuffer buf) throws IOException 
{
-  channel.write(buf);
+  while (buf.hasRemaining()) {
+channel.write(buf);
--- End diff --


[FileSuite.writeBinaryData](https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/FileSuite.scala#L247)
 this also should be fixed?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20462: [SPARK-23020][core] Fix another race in the in-pr...

2018-01-31 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20462#discussion_r165245941
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/ChildProcAppHandle.java ---
@@ -112,7 +112,7 @@ void monitorChild() {
 }
   }
 
-  disconnect();
+  dispose();
--- End diff --

I updated the documentation for dispose.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20462: [SPARK-23020][core] Fix another race in the in-pr...

2018-01-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20462#discussion_r165245861
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/ChildProcAppHandle.java ---
@@ -112,7 +112,7 @@ void monitorChild() {
 }
   }
 
-  disconnect();
+  dispose();
--- End diff --

Sorry I'm still not able to figure out what's the difference between them 
after reading the doc, do you mind leave a short description here? thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20454: [SPARK-23202][SQL] Add new API in DataSourceWrite...

2018-01-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20454#discussion_r165245483
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java
 ---
@@ -62,6 +62,15 @@
*/
   DataWriterFactory createWriterFactory();
 
+  /**
+   * Handles a commit message on receiving from a successful data writer.
+   *
+   * If this method fails (by throwing an exception), this writing job is 
considered to to have been
+   * failed, and {@link #abort(WriterCommitMessage[])} would be called. 
The state of the destination
+   * is undefined and {@link #abort(WriterCommitMessage[])} may not be 
able to deal with it.
--- End diff --

makes sense, let's remove the last sentence.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20462: [SPARK-23020][core] Fix another race in the in-pr...

2018-01-31 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20462#discussion_r165245465
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/ChildProcAppHandle.java ---
@@ -112,7 +112,7 @@ void monitorChild() {
 }
   }
 
-  disconnect();
+  dispose();
--- End diff --

`disconnect()` is actually a public method and already documented in the 
SparkAppHandle interface.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20462: [SPARK-23020][core] Fix another race in the in-pr...

2018-01-31 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20462#discussion_r165245388
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/AbstractAppHandle.java ---
@@ -42,9 +42,9 @@ protected AbstractAppHandle(LauncherServer server) {
   }
 
   @Override
-  public synchronized void addListener(Listener l) {
+  public void addListener(Listener l) {
--- End diff --

I should add this one back.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

2018-01-31 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20450#discussion_r165245334
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
@@ -215,10 +215,18 @@ public final ColumnarRow getStruct(int rowId) {
 
   /**
* Returns the map type value for rowId.
+   *
+   * In Spark, map type value is basically a key data array and a value 
data array. A key from the
+   * key array with a index and a value from the value array with the same 
index contribute to
+   * an entry of this map type value.
+   *
+   * To support map type, implementations must construct an {@link 
ColumnarMap} and return it in
--- End diff --

Sure. LGTM.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20461: [SPARK-23289][CORE]OneForOneBlockFetcher.DownloadCallbac...

2018-01-31 Thread squito

Github user squito commented on the issue:

https://github.com/apache/spark/pull/20461
  
lgtm


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20443: [SPARK-23157][SQL][FOLLOW-UP] DataFrame -> SparkD...

2018-01-31 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20443


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20462: [SPARK-23020][core] Fix another race in the in-pr...

2018-01-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20462#discussion_r165244212
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/ChildProcAppHandle.java ---
@@ -112,7 +112,7 @@ void monitorChild() {
 }
   }
 
-  disconnect();
+  dispose();
--- End diff --

Can we add more document to `disconnect` and `dispose`? So that people can 
understand the difference between them clearly and have a better understanding 
of changes like this.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20443: [SPARK-23157][SQL][FOLLOW-UP] DataFrame -> SparkDataFram...

2018-01-31 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20443
  
Merged to master and branch-2.3.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20462: [SPARK-23020][core] Fix another race in the in-pr...

2018-01-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20462#discussion_r165243779
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/AbstractAppHandle.java ---
@@ -42,9 +42,9 @@ protected AbstractAppHandle(LauncherServer server) {
   }
 
   @Override
-  public synchronized void addListener(Listener l) {
+  public void addListener(Listener l) {
--- End diff --

why remove `synchronized` here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20443: [SPARK-23157][SQL][FOLLOW-UP] DataFrame -> SparkDataFram...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20443
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20443: [SPARK-23157][SQL][FOLLOW-UP] DataFrame -> SparkDataFram...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20443
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86900/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20361: [SPARK-23188][SQL] Make vectorized columar reader batch ...

2018-01-31 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20361
  
LGTM, pending jenkins


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20443: [SPARK-23157][SQL][FOLLOW-UP] DataFrame -> SparkDataFram...

2018-01-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20443
  
**[Test build #86900 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86900/testReport)**
 for PR 20443 at commit 
[`96f9284`](https://github.com/apache/spark/commit/96f928481c5e2b0af0d4dd9151978fd59773fb97).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20361: [SPARK-23188][SQL] Make vectorized columar reader...

2018-01-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20361#discussion_r165242969
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetEncodingSuite.scala
 ---
@@ -40,7 +40,9 @@ class ParquetEncodingSuite extends 
ParquetCompatibilityTest with SharedSQLContex
 
List.fill(n)(ROW).toDF.repartition(1).write.parquet(dir.getCanonicalPath)
 val file = 
SpecificParquetRecordReaderBase.listDirectory(dir).toArray.head
 
-val reader = new 
VectorizedParquetRecordReader(sqlContext.conf.offHeapColumnVectorEnabled)
+val conf = sqlContext.conf
--- End diff --

nit: `val capacity = sqlContext.conf. parquetVectorizedReaderBatchSize `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

2018-01-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20450#discussion_r165242689
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
@@ -215,10 +215,18 @@ public final ColumnarRow getStruct(int rowId) {
 
   /**
* Returns the map type value for rowId.
+   *
+   * In Spark, map type value is basically a key data array and a value 
data array. A key from the
+   * key array with a index and a value from the value array with the same 
index contribute to
+   * an entry of this map type value.
+   *
+   * To support map type, implementations must construct an {@link 
ColumnarMap} and return it in
--- End diff --

This is very minor, may not worth to wait for another QA round. Maybe we 
can fix it in your "return null" PR?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

2018-01-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20450#discussion_r165242471
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
@@ -215,10 +215,18 @@ public final ColumnarRow getStruct(int rowId) {
 
   /**
* Returns the map type value for rowId.
+   *
+   * In Spark, map type value is basically a key data array and a value 
data array. A key from the
+   * key array with a index and a value from the value array with the same 
index contribute to
+   * an entry of this map type value.
+   *
+   * To support map type, implementations must construct an {@link 
ColumnarMap} and return it in
+   * this method. {@link ColumnarMap} requires a {@link ColumnVector} that 
stores the data of all
+   * the keys of all the maps in this vector, and another {@link 
ColumnVector} that stores the data
--- End diff --

`keys of map entries` sounds weird...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

2018-01-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20450#discussion_r165242508
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnarMap.java ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.vectorized;
+
+import org.apache.spark.sql.catalyst.util.MapData;
+
+/**
+ * Map abstraction in {@link ColumnVector}.
+ */
+public final class ColumnarMap extends MapData {
+  private final ColumnarArray keys;
+  private final ColumnarArray values;
+  private final int length;
+
+  public ColumnarMap(ColumnVector keys, ColumnVector values, int offset, 
int length) {
+this.length = length;
+this.keys = new ColumnarArray(keys, offset, length);
+this.values = new ColumnarArray(values, offset, length);
+  }
+
+  @Override
+  public int numElements() { return length; }
--- End diff --

This is a API from parent, we can change it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

2018-01-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20450#discussion_r165242401
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java
 ---
@@ -530,7 +530,7 @@ public int putByteArray(int rowId, byte[] value, int 
offset, int length) {
   @Override
   protected void reserveInternal(int newCapacity) {
 int oldCapacity = (nulls == 0L) ? 0 : capacity;
-if (isArray()) {
+if (isArray() || type instanceof MapType) {
--- End diff --

This might be an overkill, `isArray` needs to take care of many types, but 
`isMap` we only accept one type: MapType.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-01-31 Thread ueshin

Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20455
  
LGTM so far. Let's wait for map type support.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20462: [SPARK-23020][core] Fix another race in the in-process l...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20462
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86893/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20462: [SPARK-23020][core] Fix another race in the in-process l...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20462
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20462: [SPARK-23020][core] Fix another race in the in-process l...

2018-01-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20462
  
**[Test build #86893 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86893/testReport)**
 for PR 20462 at commit 
[`daa5b70`](https://github.com/apache/spark/commit/daa5b70d66b32d582dc7c2cdba79ce748ca5cc66).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20461: [SPARK-23289][CORE]OneForOneBlockFetcher.DownloadCallbac...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20461
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86892/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20461: [SPARK-23289][CORE]OneForOneBlockFetcher.DownloadCallbac...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20461
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-01-31 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20455
  
cc @cloud-fan @ueshin @kiszk 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20461: [SPARK-23289][CORE]OneForOneBlockFetcher.DownloadCallbac...

2018-01-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20461
  
**[Test build #86892 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86892/testReport)**
 for PR 20461 at commit 
[`c186a6b`](https://github.com/apache/spark/commit/c186a6b489fd0233449692dd80d0cae6452905bb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20437: [SPARK-23270][Streaming][WEB-UI]FileInputDStream ...

2018-01-31 Thread jerryshao

Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/20437#discussion_r165240426
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
 ---
@@ -157,7 +157,7 @@ class FileInputDStream[K, V, F <: NewInputFormat[K, V]](
 val metadata = Map(
   "files" -> newFiles.toList,
   StreamInputInfo.METADATA_KEY_DESCRIPTION -> newFiles.mkString("\n"))
-val inputInfo = StreamInputInfo(id, 0, metadata)
+val inputInfo = StreamInputInfo(id, rdds.map(_.count).sum, metadata)
--- End diff --

What's the difference between using HDFS API to count in another thread 
compared to current solution? You still cannot avoid reading files twice.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20459: [SPARK-23107][ML] ML 2.3 QA: New Scala APIs, docs.

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20459
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/454/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20459: [SPARK-23107][ML] ML 2.3 QA: New Scala APIs, docs.

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20459
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20437: [SPARK-23270][Streaming][WEB-UI]FileInputDStream ...

2018-01-31 Thread guoxiaolongzte

Github user guoxiaolongzte commented on a diff in the pull request:

https://github.com/apache/spark/pull/20437#discussion_r165239860
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
 ---
@@ -157,7 +157,7 @@ class FileInputDStream[K, V, F <: NewInputFormat[K, V]](
 val metadata = Map(
   "files" -> newFiles.toList,
   StreamInputInfo.METADATA_KEY_DESCRIPTION -> newFiles.mkString("\n"))
-val inputInfo = StreamInputInfo(id, 0, metadata)
+val inputInfo = StreamInputInfo(id, rdds.map(_.count).sum, metadata)
--- End diff --

The number of rows in a file. Is this solution possible?



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

2018-01-31 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20450#discussion_r165067849
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
@@ -215,10 +215,18 @@ public final ColumnarRow getStruct(int rowId) {
 
   /**
* Returns the map type value for rowId.
+   *
+   * In Spark, map type value is basically a key data array and a value 
data array. A key from the
+   * key array with a index and a value from the value array with the same 
index contribute to
+   * an entry of this map type value.
+   *
+   * To support map type, implementations must construct an {@link 
ColumnarMap} and return it in
--- End diff --

construct an -> construct a.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20459: [SPARK-23107][ML] ML 2.3 QA: New Scala APIs, docs.

2018-01-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20459
  
**[Test build #86903 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86903/testReport)**
 for PR 20459 at commit 
[`b8c6c19`](https://github.com/apache/spark/commit/b8c6c1922b006373592cb7c7687ee970d2c8f519).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20463: [SQL][MINOR] Inline SpecifiedWindowFrame.defaultWindowFr...

2018-01-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20463
  
**[Test build #86902 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86902/testReport)**
 for PR 20463 at commit 
[`895e17c`](https://github.com/apache/spark/commit/895e17c067dffa69ce97368edbf9289239574431).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20459: [SPARK-23107][ML] ML 2.3 QA: New Scala APIs, docs...

2018-01-31 Thread yanboliang

Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/20459#discussion_r165239367
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala ---
@@ -93,7 +93,7 @@ private[feature] trait QuantileDiscretizerBase extends 
Params
* @group param
*/
   @Since("2.1.0")
-  override val handleInvalid: Param[String] = new Param[String](this, 
"handleInvalid",
+  final override val handleInvalid: Param[String] = new 
Param[String](this, "handleInvalid",
--- End diff --

Fair enough. I will leave ```handleInvalid``` in all estimators 
```non-final```.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20463: [SQL][MINOR] Inline SpecifiedWindowFrame.defaultWindowFr...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20463
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/453/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20463: [SQL][MINOR] Inline SpecifiedWindowFrame.defaultWindowFr...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20463
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20463: [SQL][MINOR] Inline SpecifiedWindowFrame.defaultW...

2018-01-31 Thread jiangxb1987

GitHub user jiangxb1987 opened a pull request:

https://github.com/apache/spark/pull/20463

[SQL][MINOR] Inline SpecifiedWindowFrame.defaultWindowFrame().

## What changes were proposed in this pull request?

SpecifiedWindowFrame.defaultWindowFrame(hasOrderSpecification, 
acceptWindowFrame) was designed to handle the cases when some Window functions 
don't support setting a window frame (e.g. rank). However this param is never 
used.

We may inline the whole of this function to simplify the code.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jiangxb1987/spark defaultWindowFrame

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20463.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20463


commit 895e17c067dffa69ce97368edbf9289239574431
Author: Xingbo Jiang 
Date:   2018-01-31T23:02:00Z

inline SpecifiedWindowFrame.defaultWindowFrame().




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20361: [SPARK-23188][SQL] Make vectorized columar reader batch ...

2018-01-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20361
  
**[Test build #86901 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86901/testReport)**
 for PR 20361 at commit 
[`5ad935f`](https://github.com/apache/spark/commit/5ad935f28dae3d8879c0f65711f9b0861c993a24).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20361: [SPARK-23188][SQL] Make vectorized columar reader batch ...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20361
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20361: [SPARK-23188][SQL] Make vectorized columar reader batch ...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20361
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/452/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20383: [SPARK-23200] Reset Kubernetes-specific config on Checkp...

2018-01-31 Thread jerryshao

Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20383
  
@ssaavedra Do you mean the your current patch doesn't work even for master 
branch? In case of that do we need to revert the current patch? CC @felixcheung 
@foxish .


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20459: [SPARK-23107][ML] ML 2.3 QA: New Scala APIs, docs.

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20459
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20459: [SPARK-23107][ML] ML 2.3 QA: New Scala APIs, docs.

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20459
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86891/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20459: [SPARK-23107][ML] ML 2.3 QA: New Scala APIs, docs.

2018-01-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20459
  
**[Test build #86891 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86891/testReport)**
 for PR 20459 at commit 
[`407cd80`](https://github.com/apache/spark/commit/407cd808984158b502d4e9f6dca7e95406d97277).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20424: [Spark-23240][python] Better error message when e...

2018-01-31 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20424#discussion_r165236983
  
--- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -191,7 +191,20 @@ private[spark] class PythonWorkerFactory(pythonExec: 
String, envVars: Map[String
 daemon = pb.start()
 
 val in = new DataInputStream(daemon.getInputStream)
-daemonPort = in.readInt()
+try {
+  daemonPort = in.readInt()
+} catch {
+  case exc: EOFException =>
+throw new IOException(s"No port number in $daemonModule's 
stdout")
+}
+
+// test that the returned port number is within a valid range.
+// note: this does not cover the case where the port number
+// is arbitrary data but is also coincidentally within range
+if (daemonPort < 1 || daemonPort > 0x) {
--- End diff --

Thanks, @squito. My only worry is, what is the error message like befer / 
after this change when `daemonPort` happens to be 0 because it was filled in 
pyspark.daemon's stdout unexpectedly by the sitecustomize or usercustomize 
whatever?

I assume the error message becomes different as you described above because 
`java.net.InetSocketAddress.checkPort` passes `0` case but your proposed change 
also catches `0` case too for the reason you described above.

This case `0` is different because the previous codes probably try to 
attempt a connection whether it's a valid port or not, after passing 
`java.net.InetSocketAddress.checkPort`, if I didn't misunderstand.

I want to be very sure on this. Is it guranteed to be always failed without 
any hole and the error message is always better in this case? How about other 
ports reserved or "well-known" ? Should we really take this into account here?

Please let me know if I missed something.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20460: [SPARK-23285][K8S] Allow fractional values for spark.exe...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20460
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86890/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20460: [SPARK-23285][K8S] Allow fractional values for spark.exe...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20460
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20460: [SPARK-23285][K8S] Allow fractional values for spark.exe...

2018-01-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20460
  
**[Test build #86890 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86890/testReport)**
 for PR 20460 at commit 
[`a971870`](https://github.com/apache/spark/commit/a97187016c7f0711f71da6cb5b1a66ac791e843e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-31 Thread heary-cao

Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/20415
  
@mgaido91 thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

2018-01-31 Thread rednaxelafx

Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20419
  
I gave it a bit more thought. Here's an alternative proposal: instead of 
using a "force comment" mechanism in the current form (which still gets a 
`ctx.freshName("c")` that the caller has no control over), why don't we provide 
an alternative API that also forces comment, but allows the caller to specify a 
(within the `CodegenContext`) unique ID for the placeholder comment?

In this case, instead of using `/*wholestagecodegen_c*/` as the 
placeholder, which can be brittle if someone adds new calls to 
`ctx.registerComment()` in `WholeStageCodegenExec`, we can call 
`ctx.registerComment(comment, placeholderId = "wsc_codegenStageId")`, which we 
can be sure that it'll end up creating a placeholder comment of 
`/*wsc_codegenStageId*/`, stable across all whole-stage generated main classes, 
that would never affect the codegen cache behavior.

@kiszk WDYT?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20443: [SPARK-23157][SQL][FOLLOW-UP] DataFrame -> SparkDataFram...

2018-01-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20443
  
**[Test build #86900 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86900/testReport)**
 for PR 20443 at commit 
[`96f9284`](https://github.com/apache/spark/commit/96f928481c5e2b0af0d4dd9151978fd59773fb97).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20443: [SPARK-23157][SQL][FOLLOW-UP] DataFrame -> SparkDataFram...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20443
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20443: [SPARK-23157][SQL][FOLLOW-UP] DataFrame -> SparkDataFram...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20443
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/451/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20400
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20400
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86897/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

2018-01-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20400
  
**[Test build #86897 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86897/testReport)**
 for PR 20400 at commit 
[`45545d6`](https://github.com/apache/spark/commit/45545d65ce9ecc077bf842602ecc465ceeeda061).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20361: [SPARK-23188][SQL] Make vectorized columar reader...

2018-01-31 Thread jiangxb1987

Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20361#discussion_r165234841
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java
 ---
@@ -49,8 +49,9 @@
  * After creating, `initialize` and `initBatch` should be called 
sequentially.
  */
 public class OrcColumnarBatchReader extends RecordReader {
-  // TODO: make this configurable.
-  private static final int CAPACITY = 4 * 1024;
+
+  // The default size of vectorized batch.
--- End diff --

How about rephrase to `The capacity of vectorized batch` ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20415
  
**[Test build #86898 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86898/testReport)**
 for PR 20415 at commit 
[`5d70fd1`](https://github.com/apache/spark/commit/5d70fd1f939a67707f16c1afdefea6d4342c019e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17185: [SPARK-19602][SQL] Support column resolution of fully qu...

2018-01-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17185
  
**[Test build #86899 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86899/testReport)**
 for PR 17185 at commit 
[`8206dc3`](https://github.com/apache/spark/commit/8206dc3bc2595507ba71e7f50fddeed0c3b16479).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17185: [SPARK-19602][SQL] Support column resolution of fully qu...

2018-01-31 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17185
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20433: [SPARK-23264][SQL] Support interval values withou...

2018-01-31 Thread jiangxb1987

Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20433#discussion_r165233944
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala
 ---
@@ -561,8 +561,11 @@ class ExpressionParserSuite extends PlanTest {
   Literal(CalendarInterval.fromSingleUnitString(u, s))
 }
 
-// Empty interval statement
-intercept("interval", "at least one time unit should be given for 
interval literal")
--- End diff --

we shall still check the empty interval statement, now it shall produce a 
different error message?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

2018-01-31 Thread jiangxb1987

Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/20450
  
LGTM only some nits and naming issues.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

2018-01-31 Thread jiangxb1987

Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20450#discussion_r165231647
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnarMap.java ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.vectorized;
+
+import org.apache.spark.sql.catalyst.util.MapData;
+
+/**
+ * Map abstraction in {@link ColumnVector}.
+ */
+public final class ColumnarMap extends MapData {
+  private final ColumnarArray keys;
+  private final ColumnarArray values;
+  private final int length;
+
+  public ColumnarMap(ColumnVector keys, ColumnVector values, int offset, 
int length) {
+this.length = length;
+this.keys = new ColumnarArray(keys, offset, length);
+this.values = new ColumnarArray(values, offset, length);
+  }
+
+  @Override
+  public int numElements() { return length; }
--- End diff --

`numElements` or `length`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

2018-01-31 Thread jiangxb1987

Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20450#discussion_r165231132
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
@@ -215,10 +215,18 @@ public final ColumnarRow getStruct(int rowId) {
 
   /**
* Returns the map type value for rowId.
+   *
+   * In Spark, map type value is basically a key data array and a value 
data array. A key from the
+   * key array with a index and a value from the value array with the same 
index contribute to
+   * an entry of this map type value.
+   *
+   * To support map type, implementations must construct an {@link 
ColumnarMap} and return it in
+   * this method. {@link ColumnarMap} requires a {@link ColumnVector} that 
stores the data of all
+   * the keys of all the maps in this vector, and another {@link 
ColumnVector} that stores the data
--- End diff --

nit: `maps` -> `map entries` ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

2018-01-31 Thread jiangxb1987

Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20450#discussion_r165230542
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java
 ---
@@ -530,7 +530,7 @@ public int putByteArray(int rowId, byte[] value, int 
offset, int length) {
   @Override
   protected void reserveInternal(int newCapacity) {
 int oldCapacity = (nulls == 0L) ? 0 : capacity;
-if (isArray()) {
+if (isArray() || type instanceof MapType) {
--- End diff --

nit: we may also have a method `isMap()`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17185: [SPARK-19602][SQL] Support column resolution of f...

2018-01-31 Thread skambha

Github user skambha commented on a diff in the pull request:

https://github.com/apache/spark/pull/17185#discussion_r165230391
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
 ---
@@ -88,12 +88,12 @@ case class UnresolvedAttribute(nameParts: Seq[String]) 
extends Attribute with Un
   override def exprId: ExprId = throw new UnresolvedException(this, 
"exprId")
   override def dataType: DataType = throw new UnresolvedException(this, 
"dataType")
   override def nullable: Boolean = throw new UnresolvedException(this, 
"nullable")
-  override def qualifier: Option[String] = throw new 
UnresolvedException(this, "qualifier")
+  override def qualifier: Option[Seq[String]] = throw new 
UnresolvedException(this, "qualifier")
--- End diff --

I followed the style that exists today. If we change here, then there are 
other places where it needs to change as well.  For e.g, in AttributeReference 
we initialize it with None today and now it will be a Seq.empty.  I am not sure 
if we want to create these objects or just leave it using  None by keeping the 
Option[]  

If you have a strong preference, I can update it.   I do have the changes 
locally but have not pushed that version.   Thanks. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20400
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/450/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

2018-01-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20450
  
**[Test build #86896 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86896/testReport)**
 for PR 20450 at commit 
[`182b404`](https://github.com/apache/spark/commit/182b4041e31b55c2c07c14813a66d6904ae61c19).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20450
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/449/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20400
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...

2018-01-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20400
  
**[Test build #86897 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86897/testReport)**
 for PR 20400 at commit 
[`45545d6`](https://github.com/apache/spark/commit/45545d65ce9ecc077bf842602ecc465ceeeda061).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20450
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

2018-01-31 Thread jiangxb1987

Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/20450
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

2018-01-31 Thread huaxingao

Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/20400#discussion_r165229664
  
--- Diff: python/pyspark/sql/window.py ---
@@ -124,16 +126,20 @@ def rangeBetween(start, end):
 values directly.
 
 :param start: boundary start, inclusive.
-  The frame is unbounded if this is 
``Window.unboundedPreceding``, or
+  The frame is unbounded if this is 
``Window.unboundedPreceding``,
+  
``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
   any value less than or equal to max(-sys.maxsize, 
-9223372036854775808).
 :param end: boundary end, inclusive.
-The frame is unbounded if this is 
``Window.unboundedFollowing``, or
+The frame is unbounded if this is 
``Window.unboundedFollowing``,
+
``org.apache.spark.sql.catalyst.expressions.UnboundedFollowing``, or
 any value greater than or equal to min(sys.maxsize, 
9223372036854775807).
+any value greater than or equal to 9223372036854775807.
 """
-if start <= Window._PRECEDING_THRESHOLD:
-start = Window.unboundedPreceding
-if end >= Window._FOLLOWING_THRESHOLD:
-end = Window.unboundedFollowing
+if isinstance(start, (int, long)) and isinstance(end, (int, long)):
+if start <= Window._PRECEDING_THRESHOLD:
+start = Window.unboundedPreceding
+if end >= Window._FOLLOWING_THRESHOLD:
+end = Window.unboundedFollowing
--- End diff --

will change. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

2018-01-31 Thread huaxingao

Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/20400#discussion_r165229511
  
--- Diff: python/pyspark/sql/window.py ---
@@ -212,16 +218,20 @@ def rangeBetween(self, start, end):
 values directly.
 
 :param start: boundary start, inclusive.
-  The frame is unbounded if this is 
``Window.unboundedPreceding``, or
+  The frame is unbounded if this is 
``Window.unboundedPreceding``,
+  
``org.apache.spark.sql.catalyst.expressions.UnboundedPreceding``, or
   any value less than or equal to max(-sys.maxsize, 
-9223372036854775808).
 :param end: boundary end, inclusive.
-The frame is unbounded if this is 
``Window.unboundedFollowing``, or
+The frame is unbounded if this is 
``Window.unboundedFollowing``,
+
``org.apache.spark.sql.catalyst.expressions.UnboundedFollowing``, or
 any value greater than or equal to min(sys.maxsize, 
9223372036854775807).
+
--- End diff --

@HyukjinKwon Thank you very much for your comments. I will make the changes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17185: [SPARK-19602][SQL] Support column resolution of fully qu...

2018-01-31 Thread skambha

Github user skambha commented on the issue:

https://github.com/apache/spark/pull/17185
  
I have rebased and pushed the changes.  I ran the unit tests ( sql, 
catalyst and hive). 
Earlier, I was having issues running the hive test suite locally but that 
is resolved with the fix from SPARK-23275.  

Please take a look.   Thanks. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20459: [SPARK-23107][ML] ML 2.3 QA: New Scala APIs, docs...

2018-01-31 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20459#discussion_r165229102
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala ---
@@ -93,7 +93,7 @@ private[feature] trait QuantileDiscretizerBase extends 
Params
* @group param
*/
   @Since("2.1.0")
-  override val handleInvalid: Param[String] = new Param[String](this, 
"handleInvalid",
+  final override val handleInvalid: Param[String] = new 
Param[String](this, "handleInvalid",
--- End diff --

I'm more prefer to mark them as non final, since if user want to extend 
these estimators they would be able to override `handleInvalid` to define their 
own `handleInvalid` valid values (through `doc` parameter of `Param` class 
constructor) , this is different from other classes. What do you think of it?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20458: changed scala example from java "style" to scala

2018-01-31 Thread mantovani

Github user mantovani commented on the issue:

https://github.com/apache/spark/pull/20458
  
Sean, I already tested. Can you please be more specific how should be the
tests ? I just tested on Spark 2.1.1 and Scala 2.11 and 2.10.

On Wed, Jan 31, 2018 at 19:19 Sean Owen  wrote:

> The point is that it won't always work in Spark, so, certainly not unless
> you test them. I just don't think it's worth it, especially as it's 
setting
> an expectation as an example that it is the best way to do it.
>
> â
> You are receiving this because you authored the thread.
> Reply to this email directly, view it on GitHub
> , or 
mute
> the thread
> 

> .
>
-- 

--
Daniel de Oliveira Mantovani
Perl Evangelist/Data Hacker
+1 786 459 1341



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20458: changed scala example from java "style" to scala

2018-01-31 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/20458
  
The point is that it won't always work in Spark, so, certainly not unless 
you test them. I just don't think it's worth it, especially as it's setting an 
expectation as an example that it is the best way to do it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20460: [SPARK-23285][K8S] Allow fractional values for spark.exe...

2018-01-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20460
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/446/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20460: [SPARK-23285][K8S] Allow fractional values for spark.exe...

2018-01-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20460
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/446/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20460: [SPARK-23285][K8S] Allow fractional values for spark.exe...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20460
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/448/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20460: [SPARK-23285][K8S] Allow fractional values for spark.exe...

2018-01-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20460
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20458: changed scala example from java "style" to scala

2018-01-31 Thread mantovani

Github user mantovani commented on the issue:

https://github.com/apache/spark/pull/20458
  
If I change all examples of the documentation will be accept ?

On Wed, Jan 31, 2018 at 6:41 PM, Sean Owen  wrote:

> Yeah, that's what I was thinking of. Subclasses of App don't necessarily
> work in Spark (not just Scala). I don't know if that has changed as Spark
> and Scala moved forward. While that's an interesting item to investigate,
> here the question is changing one example to use App, while none of the
> others do and there's a reason not to. We don't want to do that.
>
> â
> You are receiving this because you authored the thread.
> Reply to this email directly, view it on GitHub
> , or 
mute
> the thread
> 

> .
>



-- 

--
Daniel de Oliveira Mantovani
Perl Evangelist/Data Hacker
+1 786 459 1341



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20422: [SPARK-23253][Core][Shuffle]Only write shuffle te...

2018-01-31 Thread jiangxb1987

Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20422#discussion_r165226127
  
--- Diff: 
core/src/test/scala/org/apache/spark/shuffle/sort/IndexShuffleBlockResolverSuite.scala
 ---
@@ -133,4 +133,65 @@ class IndexShuffleBlockResolverSuite extends 
SparkFunSuite with BeforeAndAfterEa
 }
 assert(firstByte2(0) === 2)
   }
+
+  test("SPARK-23253: index files should be created properly") {
--- End diff --

+1 we can add check the index file in the original test case.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

< 1 2 3 4 5 6 7 >

201 - 300 of 624 matches

Mail list logo