[GitHub] spark pull request #20730: [SQL] [minor] XPathDouble prettyPrint should say ...

2018-03-03 Thread ericl
GitHub user ericl opened a pull request:

https://github.com/apache/spark/pull/20730

[SQL] [minor] XPathDouble prettyPrint should say 'double' not 'float'

## What changes were proposed in this pull request?

It looks like this was incorrectly copied from `XPathFloat` in the class 
above.

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ericl/spark fix-typo-xpath

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20730.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20730


commit c2f23350750019e49525eb5c133a9472cee0d08d
Author: Eric Liang 
Date:   2018-03-04T07:55:47Z

s/float/double




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1262/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19222
  
**[Test build #87942 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87942/testReport)**
 for PR 19222 at commit 
[`291203c`](https://github.com/apache/spark/commit/291203cd9f473cd6a5a90858eeb6a19bb053fd5d).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r172039217
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/UnsafeMemoryAllocator.java
 ---
@@ -36,22 +42,35 @@ public MemoryBlock allocate(long size) throws 
OutOfMemoryError {
 
   @Override
   public void free(MemoryBlock memory) {
-assert (memory.obj == null) :
+assert(memory instanceof OffHeapMemoryBlock);
+if (memory == OffHeapMemoryBlock.NULL) return;
+assert (memory.getBaseObject() == null) :
--- End diff --

I see. To remove this assertion would not change the behavior. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r172039153
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OnHeapMemoryBlock.java
 ---
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.memory;
+
+import org.apache.spark.unsafe.Platform;
+
+/**
+ * A consecutive block of memory with a long array on Java heap.
+ */
+public final class OnHeapMemoryBlock extends MemoryBlock {
--- End diff --

Thank you for your suggestion. I would like to hear other opinions, too.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19472: [WIP][SPARK-22246][SQL] Improve performance of UnsafeRow...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19472
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1261/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19472: [WIP][SPARK-22246][SQL] Improve performance of UnsafeRow...

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19472
  
**[Test build #87941 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87941/testReport)**
 for PR 19472 at commit 
[`158e37c`](https://github.com/apache/spark/commit/158e37c703a6b4c401178b12ba0adaa0cc6baeb1).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19472: [WIP][SPARK-22246][SQL] Improve performance of UnsafeRow...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19472
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19222
  
**[Test build #87940 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87940/testReport)**
 for PR 19222 at commit 
[`9cbb876`](https://github.com/apache/spark/commit/9cbb8762e0cc3791580e0e316930479cc83b55d7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20729: [SPARK-23578][ML]Add multicolumn support for Bina...

2018-03-03 Thread tengpeng
Github user tengpeng closed the pull request at:

https://github.com/apache/spark/pull/20729


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1260/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20729: [SPARK-23578][ML]Add multicolumn support for Bina...

2018-03-03 Thread tengpeng
GitHub user tengpeng opened a pull request:

https://github.com/apache/spark/pull/20729

[SPARK-23578][ML]Add multicolumn support for Binarizer

[Spark-20542] added an API that Bucketizer that can bin multiple columns. 
Based on this change, a multicolumn support is added for Binarizer.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tengpeng/spark Binarizer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20729.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20729


commit 9ca0f6eaf6744c090cab4ac6720cf11c9b83915e
Author: gatorsmile 
Date:   2018-01-11T13:32:36Z

[SPARK-23000][TEST-HADOOP2.6] Fix Flaky test suite 
DataSourceWithHiveMetastoreCatalogSuite

## What changes were proposed in this pull request?
The Spark 2.3 branch still failed due to the flaky test suite 
`DataSourceWithHiveMetastoreCatalogSuite `. 
https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.3-test-sbt-hadoop-2.6/

Although https://github.com/apache/spark/pull/20207 is unable to reproduce 
it in Spark 2.3, it sounds like the current DB of Spark's Catalog is changed 
based on the following stacktrace. Thus, we just need to reset it.

```
[info] DataSourceWithHiveMetastoreCatalogSuite:
02:40:39.486 ERROR org.apache.hadoop.hive.ql.parse.CalcitePlanner: 
org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:14 Table not found 't'
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1594)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1545)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10077)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10128)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:209)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:694)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:683)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:272)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:210)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:209)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:255)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.runHive(HiveClientImpl.scala:683)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.runSqlHive(HiveClientImpl.scala:673)
at 
org.apache.spark.sql.hive.DataSourceWithHiveMetastoreCatalogSuite$$anonfun$9$$anonfun$apply$1$$anonfun$apply$mcV$sp$3.apply$mcV$sp(HiveMetastoreCatalogSuite.scala:185)
at 
org.apache.spark.sql.test.SQLTestUtilsBase$class.withTable(SQLTestUtils.scala:273)
at 
org.apache.spark.sql.hive.DataSourceWithHiveMetastoreCatalogSuite.withTable(HiveMetastoreCatalogSuite.scala:139)
at 
org.apache.spark.sql.hive.DataSourceWithHiveMetastoreCatalogSuite$$anonfun$9$$anonfun$apply$1.apply$mcV$sp(HiveMetastoreCatalogSuite.scala:163)
at 
org.apache.spark.sql.hive.DataSourceWithHiveMetastoreCatalogSuite$$anonfun$9$$anonfun$apply$1.apply(HiveMetastoreCatalogSuite.scala:163)
at 
org.apache.spark.sql.hive.DataSourceWithHiveMetastoreCatalogSuite$$anonfun$9$$anonfun$apply$1.apply(HiveMetastoreCatalogSuite.scala:163)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
at 
org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
at 

[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...

2018-03-03 Thread advancedxy
Github user advancedxy commented on the issue:

https://github.com/apache/spark/pull/20449
  
ping @cloud-fan 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r172037938
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OnHeapMemoryBlock.java
 ---
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.memory;
+
+import org.apache.spark.unsafe.Platform;
+
+/**
+ * A consecutive block of memory with a long array on Java heap.
+ */
+public final class OnHeapMemoryBlock extends MemoryBlock {
+
+  private final long[] array;
+
+  public OnHeapMemoryBlock(long[] obj, long offset, long size) {
+super(obj, offset, size);
+this.array = obj;
+assert(offset - Platform.LONG_ARRAY_OFFSET + size <= obj.length * 8L) :
--- End diff --

Thank you for your explanation. Now, we focus on the path 3. I understand 
what you pointed out. How the following code should work?
```
val mb = new OnHeapMemoryBlock(new long[16]), Platform.LONG_ARRAY_OFFSET, 
8);
val mb1 = mb.subBlock(0, 96);
```

According to the name `subBlock()`, I understand that you would like to 
ensure `sub block's size <= super block's size` (`not (96 <= 8)`)` logically.
I would like to allow us to the above code since the array actually 
reserved enough space (`96 <= 128`). This is because I have already seen such a 
code when we would use `MemoryBlock` for `

I. Add additional method `block()` (or different name) with this assertion. 
Keep `subBlock()` with a new assert that you proposed.
2. Use another name `block()` (or different name) instead of `subBlock()` 
with this assertion.

WDYT? cc: @cloud-fan 



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-03 Thread Ngone51
Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r172037418
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OnHeapMemoryBlock.java
 ---
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.memory;
+
+import org.apache.spark.unsafe.Platform;
+
+/**
+ * A consecutive block of memory with a long array on Java heap.
+ */
+public final class OnHeapMemoryBlock extends MemoryBlock {
+
+  private final long[] array;
+
+  public OnHeapMemoryBlock(long[] obj, long offset, long size) {
+super(obj, offset, size);
+this.array = obj;
+assert(offset - Platform.LONG_ARRAY_OFFSET + size <= obj.length * 8L) :
--- End diff --

Hi, @kiszk. I still doubt about this ```assert```'s correctness. Below is 
my understanding, please feel free to point my wrong spot out. 

This method ```public OnHeapMemoryBlock(long[] obj, long offset, long 
size)``` could be called from three paths:
1. called from ```public OnHeapMemoryBlock(long size)```
In this case, this ```assert``` certainly to be true, and ```offset``` is 
equal to ```Platform.LONG_ARRAY_OFFSET```.

2. called directly from other methods 
In this case, I think the caller will mostly set ```offset``` to be 
```Platform.LONG_ARRAY_OFFSET```. And the ```assert``` is also right, but the 
result due to caller's settings.

3. called from ``` public MemoryBlock subBlock(long offset, long size)```
In this case, a sub block can only use the memory which a super block could 
use, which is super block's ```size```, rather than ```obj.length * 8L``` (as 
we reach agreement before).
So, I think, here, the ```assert``` should be ```(offset - 
Platform.LONG_ARRAY_OFFSET + sub block's size <= super block's size)```. More 
precisely, we should ```assert(offset >=  Platform.LONG_ARRAY_OFFSET)```.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20373: [SPARK-23159][PYTHON] Update cloudpickle to v0.4.3

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20373
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20373: [SPARK-23159][PYTHON] Update cloudpickle to v0.4.3

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20373
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87939/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20373: [SPARK-23159][PYTHON] Update cloudpickle to v0.4.3

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20373
  
**[Test build #87939 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87939/testReport)**
 for PR 20373 at commit 
[`2d19f0a`](https://github.com/apache/spark/commit/2d19f0abd35d064e26c47d13a4719a7354845d87).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20373: [SPARK-23159][PYTHON] Update cloudpickle to v0.4.3

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20373
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1259/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20373: [SPARK-23159][PYTHON] Update cloudpickle to v0.4.3

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20373
  
**[Test build #87939 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87939/testReport)**
 for PR 20373 at commit 
[`2d19f0a`](https://github.com/apache/spark/commit/2d19f0abd35d064e26c47d13a4719a7354845d87).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20373: [SPARK-23159][PYTHON] Update cloudpickle to v0.4.3

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20373
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20373: [SPARK-23159][PYTHON] Update cloudpickle to v0.4.3

2018-03-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20373
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20728: [SPARK-23569][PYTHON] Allow pandas_udf to work with pyth...

2018-03-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20728
  
Usually I leave it open for few days so that I or other reviewers can check 
this change. I or other reviewers will leave some review comments, or leave an 
approval on this PR if it looks good without additional changes. Will try to 
guide you explicitly here.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20728: [SPARK-23569][PYTHON] Allow pandas_udf to work with pyth...

2018-03-03 Thread mstewart141
Github user mstewart141 commented on the issue:

https://github.com/apache/spark/pull/20728
  
what should next step be here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20464: [SPARK-23291][SQL][R] R's substr should not reduc...

2018-03-03 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20464#discussion_r172033021
  
--- Diff: R/pkg/R/column.R ---
@@ -169,7 +169,7 @@ setMethod("alias",
 #' @note substr since 1.4.0
 setMethod("substr", signature(x = "Column"),
   function(x, start, stop) {
-jc <- callJMethod(x@jc, "substr", as.integer(start - 1), 
as.integer(stop - start + 1))
+jc <- callJMethod(x@jc, "substr", as.integer(start), 
as.integer(stop - start + 1))
--- End diff --

I think we should do two things:
1. add to the func doc that the `start` param should be 0-base and to add 
to the example with the result
`collect(select(df, substr(df$a, 0, 5))) # this should give you...`



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20464: [SPARK-23291][SQL][R] R's substr should not reduc...

2018-03-03 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20464#discussion_r172033037
  
--- Diff: docs/sparkr.md ---
@@ -663,3 +663,7 @@ You can inspect the search path in R with 
[`search()`](https://stat.ethz.ch/R-ma
  - The `stringsAsFactors` parameter was previously ignored with `collect`, 
for example, in `collect(createDataFrame(iris), stringsAsFactors = TRUE))`. It 
has been corrected.
  - For `summary`, option for statistics to compute has been added. Its 
output is changed from that from `describe`.
  - A warning can be raised if versions of SparkR package and the Spark JVM 
do not match.
+
+## Upgrading to Spark 2.4.0
+
+ - The `start` parameter of `substr` method was wrongly subtracted by one, 
previously. This can lead to inconsistent substring results and also does not 
match with the behaviour with `substr` in R. It has been corrected.
--- End diff --

2. in the migration guide we should give a concrete example with non-0 
start index, eg.
`substr(df$a, 1, 6)` should be changed to `substr(df$a, 0, 5)`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20728: [SPARK-23569][PYTHON] Allow pandas_udf to work with pyth...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20728
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87938/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20728: [SPARK-23569][PYTHON] Allow pandas_udf to work with pyth...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20728
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20728: [SPARK-23569][PYTHON] Allow pandas_udf to work with pyth...

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20728
  
**[Test build #87938 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87938/testReport)**
 for PR 20728 at commit 
[`3cd53f3`](https://github.com/apache/spark/commit/3cd53f39f23ebd1b9b4134a9ac22348b301f8bd4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18736: [SPARK-21481][ML] Add indexOf method for ml.feature.Hash...

2018-03-03 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/18736
  
Closed as #18998 takes too long to wait.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18736: [SPARK-21481][ML] Add indexOf method for ml.featu...

2018-03-03 Thread facaiy
Github user facaiy closed the pull request at:

https://github.com/apache/spark/pull/18736


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17503: [SPARK-3159][MLlib] Check for reducible DecisionTree

2018-03-03 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/17503
  
Colsed since its duplicate PR #20632 has been merged.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17503: [SPARK-3159][MLlib] Check for reducible DecisionT...

2018-03-03 Thread facaiy
Github user facaiy closed the pull request at:

https://github.com/apache/spark/pull/17503


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20728: [SPARK-23569][PYTHON] Allow pandas_udf to work with pyth...

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20728
  
**[Test build #87938 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87938/testReport)**
 for PR 20728 at commit 
[`3cd53f3`](https://github.com/apache/spark/commit/3cd53f39f23ebd1b9b4134a9ac22348b301f8bd4).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20728: [SPARK-23569][PYTHON] Allow pandas_udf to work with pyth...

2018-03-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20728
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20728: [SPARK-23569][PYTHON] Allow pandas_udf to work with pyth...

2018-03-03 Thread mstewart141
Github user mstewart141 commented on the issue:

https://github.com/apache/spark/pull/20728
  
cc @HyukjinKwon 
👍 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20728: [SPARK-23569][PYTHON] Allow pandas_udf to work with pyth...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20728
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20728: [SPARK-23569][PYTHON] Allow pandas_udf to work with pyth...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20728
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20728: [SPARK-23569][PYTHON] Allow pandas_udf to work wi...

2018-03-03 Thread mstewart141
GitHub user mstewart141 opened a pull request:

https://github.com/apache/spark/pull/20728

[SPARK-23569][PYTHON] Allow pandas_udf to work with python3 style 
type-annotated functions

## What changes were proposed in this pull request?

Check python version to determine whether to use `inspect.getargspec` or 
inspect.getfullargspec` before applying `pandas_udf` core logic to a function. 
The former is python2.7 (deprecated in python3) and the latter is python3.x. 
The latter correctly accounts for type annotations, which are syntax errors in 
python2.x.

## How was this patch tested?

Locally, on python 2.7 and 3.6.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mstewart141/spark pandas_udf_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20728.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20728


commit 3cd53f39f23ebd1b9b4134a9ac22348b301f8bd4
Author: Michael (Stu) Stewart 
Date:   2018-03-03T21:54:53Z

[SPARK-23569][PYTHON] Allow pandas_udf to work with python3 style 
type-annotated functions




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20705
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20705
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87937/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20705
  
**[Test build #87937 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87937/testReport)**
 for PR 20705 at commit 
[`d9d2564`](https://github.com/apache/spark/commit/d9d2564923afad880374d355f2de92c2973e5514).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87933/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19222
  
**[Test build #87933 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87933/testReport)**
 for PR 19222 at commit 
[`95ffd0f`](https://github.com/apache/spark/commit/95ffd0f4385423b87810977ba8d531eba644f2f5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20716
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87936/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20716
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20716
  
**[Test build #87936 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87936/testReport)**
 for PR 20716 at commit 
[`258bae4`](https://github.com/apache/spark/commit/258bae403b0b6e12fdfcad9ab1b805a8e61a8ac6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20705
  
**[Test build #87937 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87937/testReport)**
 for PR 20705 at commit 
[`d9d2564`](https://github.com/apache/spark/commit/d9d2564923afad880374d355f2de92c2973e5514).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20705
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20705
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1258/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20716
  
**[Test build #87936 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87936/testReport)**
 for PR 20716 at commit 
[`258bae4`](https://github.com/apache/spark/commit/258bae403b0b6e12fdfcad9ab1b805a8e61a8ac6).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20716: [SPARK-23566][Minor][Doc] Argument name mismatch ...

2018-03-03 Thread animenon
Github user animenon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20716#discussion_r172024390
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -612,9 +614,10 @@ def repartition(self, numPartitions, *cols):
 Returns a new :class:`DataFrame` partitioned by the given 
partitioning expressions. The
 resulting DataFrame is hash partitioned.
 
-``numPartitions`` can be an int to specify the target number of 
partitions or a Column.
-If it is a Column, it will be used as the first partitioning 
column. If not specified,
-the default number of partitions is used.
+:param numPartitions:
+can be an int to specify the target number of partitions or a 
Column.
--- End diff --

Cool, was maintaining consistency with other `:param` on page.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...

2018-03-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20705
  
For the above two JSON failures, they are `MetastoreDataSourcesSuite` and 
`PartitionedTablePerfStatsSuite`. They are not designed to `parquet`. They are 
general test cases. If we explicitly set it to `parquet`, we should change it 
when we test it to new data source again and again.
> How about we explicitly set spark.sql.sources.default to parquet for both 
test cases in #20705 (comment) too?

Instead, I think we had better file two JIRA issues as JSON improvement if 
that is feasible.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r172024213
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2150,7 +2150,8 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
 
   test("data source table created in InMemoryCatalog should be able to 
read/write") {
 withTable("tbl") {
-  sql("CREATE TABLE tbl(i INT, j STRING) USING parquet")
+  val provider = spark.sessionState.conf.defaultDataSourceName
--- End diff --

So far, the purpose of this PR is **setting once in `SQLConf.scala`** to 
order to test a new data source to find out the limitation instead of 
**touching every data suite**.

BTW, `spark.sql.sources.default=parquet` doesn't help this existing code 
because the SQL has a fixed string `USING parquet`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r172024043
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2150,7 +2150,8 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
 
   test("data source table created in InMemoryCatalog should be able to 
read/write") {
 withTable("tbl") {
-  sql("CREATE TABLE tbl(i INT, j STRING) USING parquet")
+  val provider = spark.sessionState.conf.defaultDataSourceName
--- End diff --

This is `SQLQuerySuite`. The test case is correctly testing its 
**purpose**. Every data source have its own capability and limitation. Your 
example is only `text` data source's limitation supporting `a single column 
schema`, isn't it? For the other `csv/json/orc/parquet` will pass this specific 
test.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r172023909
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -147,6 +147,7 @@ def load(self, path=None, format=None, schema=None, 
**options):
or a DDL-formatted string (For example ``col0 INT, 
col1 DOUBLE``).
 :param options: all other string options
 
+>>> spark.conf.set("spark.sql.sources.default", "parquet")
--- End diff --

Yep. That was my first commit 
[here](https://github.com/apache/spark/pull/20705#discussion-diff-171729865R150).
 I'll rollback this.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20716
  
BTW, it doesn't need a JIRA for minor fixes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20716: [SPARK-23566][Minor][Doc] Argument name mismatch ...

2018-03-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20716#discussion_r172023219
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -612,9 +614,10 @@ def repartition(self, numPartitions, *cols):
 Returns a new :class:`DataFrame` partitioned by the given 
partitioning expressions. The
 resulting DataFrame is hash partitioned.
 
-``numPartitions`` can be an int to specify the target number of 
partitions or a Column.
-If it is a Column, it will be used as the first partitioning 
column. If not specified,
-the default number of partitions is used.
+:param numPartitions:
+can be an int to specify the target number of partitions or a 
Column.
--- End diff --

I think it's fine to:

```
:param numPartitions: can be an int to specify the target number of 
partitions or a Column.
If it is a Column, it will be used as the first partitioning 
column. If not specified,
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20716: [SPARK-23566][Minor][Doc] Argument name mismatch ...

2018-03-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20716#discussion_r172023339
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -892,6 +896,8 @@ def colRegex(self, colName):
 def alias(self, alias):
 """Returns a new :class:`DataFrame` with an alias set.
 
+:param alias: string, alias names to be set for the DataFrame.
--- End diff --

`alias names` -> `an alias name`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r172022980
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2150,7 +2150,8 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
 
   test("data source table created in InMemoryCatalog should be able to 
read/write") {
 withTable("tbl") {
-  sql("CREATE TABLE tbl(i INT, j STRING) USING parquet")
+  val provider = spark.sessionState.conf.defaultDataSourceName
--- End diff --

Hm .. how about just explicitly setting `spark.sql.sources.default` to 
`parquet` in all places rather than using the default? If it's set to, for 
example, `text`, this test becomes failed. I thought it's a bit odd a test it 
is dependent on a default value.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...

2018-03-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20705#discussion_r172022384
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -147,6 +147,7 @@ def load(self, path=None, format=None, schema=None, 
**options):
or a DDL-formatted string (For example ``col0 INT, 
col1 DOUBLE``).
 :param options: all other string options
 
+>>> spark.conf.set("spark.sql.sources.default", "parquet")
--- End diff --

Can we just call `format('parquet')` like the doctest for JSON below?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-03 Thread animenon
Github user animenon commented on the issue:

https://github.com/apache/spark/pull/20716
  
@HyukjinKwon Found couple more, have updated the same.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20716
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20716
  
**[Test build #87935 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87935/testReport)**
 for PR 20716 at commit 
[`91d8473`](https://github.com/apache/spark/commit/91d8473143bda28f073e21d0f54dbfa6f65f84b9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20716
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87935/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...

2018-03-03 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20700
  
@rednaxelafx thanks, I integrated some of your changes into PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...

2018-03-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20705
  
For https://github.com/apache/spark/pull/20705#issuecomment-370119879, yup. 
JSON uses string for keys in MapType.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20716
  
**[Test build #87935 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87935/testReport)**
 for PR 20716 at commit 
[`91d8473`](https://github.com/apache/spark/commit/91d8473143bda28f073e21d0f54dbfa6f65f84b9).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19788
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87930/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19788
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20716
  
**[Test build #87934 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87934/testReport)**
 for PR 20716 at commit 
[`98ef4a1`](https://github.com/apache/spark/commit/98ef4a1cee03798e73b2214f3f99e5daa62a5675).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20716
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87934/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19788
  
**[Test build #87930 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87930/testReport)**
 for PR 19788 at commit 
[`7009a5e`](https://github.com/apache/spark/commit/7009a5e3440f83c0ca663d70cd33fd1720db3e95).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20716
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20716
  
**[Test build #87934 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87934/testReport)**
 for PR 20716 at commit 
[`98ef4a1`](https://github.com/apache/spark/commit/98ef4a1cee03798e73b2214f3f99e5daa62a5675).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1257/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19222
  
**[Test build #87933 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87933/testReport)**
 for PR 19222 at commit 
[`95ffd0f`](https://github.com/apache/spark/commit/95ffd0f4385423b87810977ba8d531eba644f2f5).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r172021036
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OnHeapMemoryBlock.java
 ---
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.memory;
+
+import org.apache.spark.unsafe.Platform;
+
+/**
+ * A consecutive block of memory with a long array on Java heap.
+ */
+public final class OnHeapMemoryBlock extends MemoryBlock {
+
+  private final long[] array;
+
+  public OnHeapMemoryBlock(long[] obj, long offset, long size) {
+super(obj, offset, size);
+this.array = obj;
+assert(offset - Platform.LONG_ARRAY_OFFSET + size <= obj.length * 8L) :
+ "The size " + size + " and size " + size + " are larger than the 
array size " +
+   ((obj.length * 8L) - Platform.LONG_ARRAY_OFFSET);
+  }
+
+  public OnHeapMemoryBlock(long size) {
+this(new long[(int)((size + 7) / 8)], Platform.LONG_ARRAY_OFFSET, 
size);
+  }
+
+  @Override
+  public MemoryBlock subBlock(long offset, long size) {
+return new OnHeapMemoryBlock(array, this.offset + offset, size);
+  }
+
+  public long[] getLongArray() { return array; }
+
+  /**
+   * Creates a memory block pointing to the memory used by the long array.
+   */
+  public static OnHeapMemoryBlock fromArray(final long[] array) {
+return new OnHeapMemoryBlock(array, Platform.LONG_ARRAY_OFFSET, 
array.length * 8L);
+  }
+
+  public static OnHeapMemoryBlock fromArray(final long[] array, long size) 
{
+return new OnHeapMemoryBlock(array, Platform.LONG_ARRAY_OFFSET, size);
+  }
+
+  @Override
+  public final int getInt(long offset) {
+assert(offset + 4 - Platform.LONG_ARRAY_OFFSET <= array.length * 8L);
--- End diff --

Thank you very much for your comment. I should have to remove these 
assertions due to performance reason as other classes did.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r172021011
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OnHeapMemoryBlock.java
 ---
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.memory;
+
+import org.apache.spark.unsafe.Platform;
+
+/**
+ * A consecutive block of memory with a long array on Java heap.
+ */
+public final class OnHeapMemoryBlock extends MemoryBlock {
+
+  private final long[] array;
+
+  public OnHeapMemoryBlock(long[] obj, long offset, long size) {
+super(obj, offset, size);
+this.array = obj;
+assert(offset - Platform.LONG_ARRAY_OFFSET + size <= obj.length * 8L) :
--- End diff --

This condition is correct, but the message is not correct. Good catch.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20716
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87932/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20716
  
**[Test build #87932 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87932/testReport)**
 for PR 20716 at commit 
[`84d08e9`](https://github.com/apache/spark/commit/84d08e9bc9762b6d8c12a8779697c0f54d958156).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20716
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87931/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20716
  
**[Test build #87932 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87932/testReport)**
 for PR 20716 at commit 
[`84d08e9`](https://github.com/apache/spark/commit/84d08e9bc9762b6d8c12a8779697c0f54d958156).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19222
  
**[Test build #87931 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87931/testReport)**
 for PR 19222 at commit 
[`4567781`](https://github.com/apache/spark/commit/4567781000618be81121140404df2a01430582ea).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20727
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...

2018-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20727
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87929/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20727
  
**[Test build #87929 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87929/testReport)**
 for PR 20727 at commit 
[`93652b3`](https://github.com/apache/spark/commit/93652b349ca8107035465096370464053b558ef0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-03 Thread Ngone51
Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r172019362
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OnHeapMemoryBlock.java
 ---
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.memory;
+
+import org.apache.spark.unsafe.Platform;
+
+/**
+ * A consecutive block of memory with a long array on Java heap.
+ */
+public final class OnHeapMemoryBlock extends MemoryBlock {
+
+  private final long[] array;
+
+  public OnHeapMemoryBlock(long[] obj, long offset, long size) {
+super(obj, offset, size);
+this.array = obj;
+assert(offset - Platform.LONG_ARRAY_OFFSET + size <= obj.length * 8L) :
--- End diff --

So, ```obj.length * 8L``` should also change to ```size```, right?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-03 Thread Ngone51
Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r172019312
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OnHeapMemoryBlock.java
 ---
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.memory;
+
+import org.apache.spark.unsafe.Platform;
+
+/**
+ * A consecutive block of memory with a long array on Java heap.
+ */
+public final class OnHeapMemoryBlock extends MemoryBlock {
+
+  private final long[] array;
+
+  public OnHeapMemoryBlock(long[] obj, long offset, long size) {
+super(obj, offset, size);
+this.array = obj;
+assert(offset - Platform.LONG_ARRAY_OFFSET + size <= obj.length * 8L) :
+ "The size " + size + " and size " + size + " are larger than the 
array size " +
+   ((obj.length * 8L) - Platform.LONG_ARRAY_OFFSET);
+  }
+
+  public OnHeapMemoryBlock(long size) {
+this(new long[(int)((size + 7) / 8)], Platform.LONG_ARRAY_OFFSET, 
size);
+  }
+
+  @Override
+  public MemoryBlock subBlock(long offset, long size) {
+return new OnHeapMemoryBlock(array, this.offset + offset, size);
+  }
+
+  public long[] getLongArray() { return array; }
+
+  /**
+   * Creates a memory block pointing to the memory used by the long array.
+   */
+  public static OnHeapMemoryBlock fromArray(final long[] array) {
+return new OnHeapMemoryBlock(array, Platform.LONG_ARRAY_OFFSET, 
array.length * 8L);
+  }
+
+  public static OnHeapMemoryBlock fromArray(final long[] array, long size) 
{
+return new OnHeapMemoryBlock(array, Platform.LONG_ARRAY_OFFSET, size);
+  }
+
+  @Override
+  public final int getInt(long offset) {
+assert(offset + 4 - Platform.LONG_ARRAY_OFFSET <= array.length * 8L);
--- End diff --

So, in asserts in getXXX/putXXX methods, we should change ```array.length * 
8L``` to ```size()```, right?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-03 Thread animenon
Github user animenon commented on the issue:

https://github.com/apache/spark/pull/20716
  
@HyukjinKwon Sure, shall do.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-03 Thread Ngone51
Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r172018886
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -17,47 +17,168 @@
 
 package org.apache.spark.unsafe.memory;
 
-import javax.annotation.Nullable;
-
 import org.apache.spark.unsafe.Platform;
 
+import javax.annotation.Nullable;
+
 /**
- * A consecutive block of memory, starting at a {@link MemoryLocation} 
with a fixed size.
+ * A declaration of interfaces of MemoryBlock classes .
  */
-public class MemoryBlock extends MemoryLocation {
+public abstract class MemoryBlock {
+  @Nullable
+  protected final Object obj;
 
-  private final long length;
+  protected final long offset;
+
+  protected final long length;
 
   /**
* Optional page number; used when this MemoryBlock represents a page 
allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by 
the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber 
method so that
+   * this can be modified by the TaskMemoryManager, which lives in a 
different package.
*/
-  public int pageNumber = -1;
+  private int pageNumber = -1;
 
   public MemoryBlock(@Nullable Object obj, long offset, long length) {
-super(obj, offset);
+this.obj = obj;
+this.offset = offset;
 this.length = length;
   }
 
+  public MemoryBlock() {
+this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {
--- End diff --

>@Ngone51 could you please elaborate on your idea about MemoryLocation 
while we do not have MemoryLocation now? I think that it would be good to have 
a final method for getBaseObject for performance.

Hi, @kiszk. Actually, this is a comment follow the discussion between 
@cloud-fan & @viirya :

> @cloud-fan : we need some document about it. Now we have the general 
MemoryBlock abstraction, do we still need to expose the baseObj and baseOffset 
concept?
@cloud-fan : BTW if we really need to expose them, shall we expose 
MemoryLocation instead?
@viirya Sounds reasonable. Should we still need MemoryLocation to expose a 
logical memory location?

(I have no idea why those comments hide)

And my suggestion is something like:
```
public abstract class MemoryBlock {
  MemoryLocation location;
  ...
}
```
rather than the original path:
```
public class MemoryBlock extends MemoryLocation {
  ...
}
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-03 Thread Ngone51
Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r172019234
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/UnsafeMemoryAllocator.java
 ---
@@ -36,22 +42,35 @@ public MemoryBlock allocate(long size) throws 
OutOfMemoryError {
 
   @Override
   public void free(MemoryBlock memory) {
-assert (memory.obj == null) :
+assert(memory instanceof OffHeapMemoryBlock);
+if (memory == OffHeapMemoryBlock.NULL) return;
+assert (memory.getBaseObject() == null) :
--- End diff --

> its object may not be``` null```

Hi, @kiszk. Actually, I think, here, object must be ```null```. Because I 
only see one constructor of ```OffHeapMemoryBlock ```:
```
public OffHeapMemoryBlock(long address, long size) {
   super(null, address, size);
 }
```
WDYT?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-03 Thread Ngone51
Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r172019050
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OnHeapMemoryBlock.java
 ---
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.memory;
+
+import org.apache.spark.unsafe.Platform;
+
+/**
+ * A consecutive block of memory with a long array on Java heap.
+ */
+public final class OnHeapMemoryBlock extends MemoryBlock {
--- End diff --

I still suggest ```LongArrayMemoryBlock```.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19222
  
**[Test build #87931 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87931/testReport)**
 for PR 19222 at commit 
[`4567781`](https://github.com/apache/spark/commit/4567781000618be81121140404df2a01430582ea).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   >