[GitHub] spark pull request #20730: [SQL] [minor] XPathDouble prettyPrint should say ...
GitHub user ericl opened a pull request: https://github.com/apache/spark/pull/20730 [SQL] [minor] XPathDouble prettyPrint should say 'double' not 'float' ## What changes were proposed in this pull request? It looks like this was incorrectly copied from `XPathFloat` in the class above. ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ericl/spark fix-typo-xpath Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20730.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20730 commit c2f23350750019e49525eb5c133a9472cee0d08d Author: Eric Liang Date: 2018-03-04T07:55:47Z s/float/double --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Merged build finished. Test PASSed.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1262/ Test PASSed.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19222 **[Test build #87942 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87942/testReport)** for PR 19222 at commit [`291203c`](https://github.com/apache/spark/commit/291203cd9f473cd6a5a90858eeb6a19bb053fd5d).
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r172039217 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/UnsafeMemoryAllocator.java --- @@ -36,22 +42,35 @@ public MemoryBlock allocate(long size) throws OutOfMemoryError { @Override public void free(MemoryBlock memory) { -assert (memory.obj == null) : +assert(memory instanceof OffHeapMemoryBlock); +if (memory == OffHeapMemoryBlock.NULL) return; +assert (memory.getBaseObject() == null) : --- End diff -- I see. Removing this assertion would not change the behavior.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r172039153 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OnHeapMemoryBlock.java --- @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.unsafe.memory; + +import org.apache.spark.unsafe.Platform; + +/** + * A consecutive block of memory with a long array on Java heap. + */ +public final class OnHeapMemoryBlock extends MemoryBlock { --- End diff -- Thank you for your suggestion. I would like to hear other opinions, too.
[GitHub] spark issue #19472: [WIP][SPARK-22246][SQL] Improve performance of UnsafeRow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19472 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1261/ Test PASSed.
[GitHub] spark issue #19472: [WIP][SPARK-22246][SQL] Improve performance of UnsafeRow...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19472 **[Test build #87941 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87941/testReport)** for PR 19472 at commit [`158e37c`](https://github.com/apache/spark/commit/158e37c703a6b4c401178b12ba0adaa0cc6baeb1).
[GitHub] spark issue #19472: [WIP][SPARK-22246][SQL] Improve performance of UnsafeRow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19472 Merged build finished. Test PASSed.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Merged build finished. Test PASSed.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19222 **[Test build #87940 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87940/testReport)** for PR 19222 at commit [`9cbb876`](https://github.com/apache/spark/commit/9cbb8762e0cc3791580e0e316930479cc83b55d7).
[GitHub] spark pull request #20729: [SPARK-23578][ML]Add multicolumn support for Bina...
Github user tengpeng closed the pull request at: https://github.com/apache/spark/pull/20729
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1260/ Test PASSed.
[GitHub] spark pull request #20729: [SPARK-23578][ML]Add multicolumn support for Bina...
GitHub user tengpeng opened a pull request: https://github.com/apache/spark/pull/20729 [SPARK-23578][ML] Add multicolumn support for Binarizer [SPARK-20542] added an API so that Bucketizer can bin multiple columns. Based on this change, multicolumn support is added for Binarizer. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tengpeng/spark Binarizer Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20729.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20729 commit 9ca0f6eaf6744c090cab4ac6720cf11c9b83915e Author: gatorsmile Date: 2018-01-11T13:32:36Z [SPARK-23000][TEST-HADOOP2.6] Fix Flaky test suite DataSourceWithHiveMetastoreCatalogSuite ## What changes were proposed in this pull request? The Spark 2.3 branch still failed due to the flaky test suite `DataSourceWithHiveMetastoreCatalogSuite`. https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.3-test-sbt-hadoop-2.6/ Although https://github.com/apache/spark/pull/20207 is unable to reproduce it in Spark 2.3, it sounds like the current DB of Spark's Catalog is changed based on the following stacktrace. Thus, we just need to reset it.
```
[info] DataSourceWithHiveMetastoreCatalogSuite:
02:40:39.486 ERROR org.apache.hadoop.hive.ql.parse.CalcitePlanner: org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:14 Table not found 't'
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1594)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1545)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10077)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10128)
  at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:209)
  at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
  at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:694)
  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:683)
  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:272)
  at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:210)
  at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:209)
  at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:255)
  at org.apache.spark.sql.hive.client.HiveClientImpl.runHive(HiveClientImpl.scala:683)
  at org.apache.spark.sql.hive.client.HiveClientImpl.runSqlHive(HiveClientImpl.scala:673)
  at org.apache.spark.sql.hive.DataSourceWithHiveMetastoreCatalogSuite$$anonfun$9$$anonfun$apply$1$$anonfun$apply$mcV$sp$3.apply$mcV$sp(HiveMetastoreCatalogSuite.scala:185)
  at org.apache.spark.sql.test.SQLTestUtilsBase$class.withTable(SQLTestUtils.scala:273)
  at org.apache.spark.sql.hive.DataSourceWithHiveMetastoreCatalogSuite.withTable(HiveMetastoreCatalogSuite.scala:139)
  at org.apache.spark.sql.hive.DataSourceWithHiveMetastoreCatalogSuite$$anonfun$9$$anonfun$apply$1.apply$mcV$sp(HiveMetastoreCatalogSuite.scala:163)
  at org.apache.spark.sql.hive.DataSourceWithHiveMetastoreCatalogSuite$$anonfun$9$$anonfun$apply$1.apply(HiveMetastoreCatalogSuite.scala:163)
  at org.apache.spark.sql.hive.DataSourceWithHiveMetastoreCatalogSuite$$anonfun$9$$anonfun$apply$1.apply(HiveMetastoreCatalogSuite.scala:163)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
  at org.scalatest.Transformer.apply(Transformer.scala:20)
  at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
  at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
  at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
  at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) at
```
[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/20449 ping @cloud-fan
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r172037938 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OnHeapMemoryBlock.java --- @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.unsafe.memory; + +import org.apache.spark.unsafe.Platform; + +/** + * A consecutive block of memory with a long array on Java heap. + */ +public final class OnHeapMemoryBlock extends MemoryBlock { + + private final long[] array; + + public OnHeapMemoryBlock(long[] obj, long offset, long size) { +super(obj, offset, size); +this.array = obj; +assert(offset - Platform.LONG_ARRAY_OFFSET + size <= obj.length * 8L) : --- End diff -- Thank you for your explanation. Now we focus on path 3. I understand what you pointed out. How should the following code work? ``` val mb = new OnHeapMemoryBlock(new long[16], Platform.LONG_ARRAY_OFFSET, 8) val mb1 = mb.subBlock(0, 96) ``` According to the name `subBlock()`, I understand that you would like to logically ensure `sub block's size <= super block's size` (here `96 <= 8` does not hold). 
I would like to allow the above code, since the array actually reserved enough space (`96 <= 128`). This is because I have already seen such code when we would use `MemoryBlock` for `...`. Two options: 1. Add an additional method `block()` (or a different name) with this assertion, and keep `subBlock()` with the new assert that you proposed. 2. Use another name `block()` (or a different name) instead of `subBlock()` with this assertion. WDYT? cc: @cloud-fan
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user Ngone51 commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r172037418 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OnHeapMemoryBlock.java --- @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.unsafe.memory; + +import org.apache.spark.unsafe.Platform; + +/** + * A consecutive block of memory with a long array on Java heap. + */ +public final class OnHeapMemoryBlock extends MemoryBlock { + + private final long[] array; + + public OnHeapMemoryBlock(long[] obj, long offset, long size) { +super(obj, offset, size); +this.array = obj; +assert(offset - Platform.LONG_ARRAY_OFFSET + size <= obj.length * 8L) : --- End diff -- Hi, @kiszk. I still have doubts about this `assert`'s correctness. Below is my understanding; please feel free to point out where I am wrong. The method `public OnHeapMemoryBlock(long[] obj, long offset, long size)` can be called from three paths: 1. called from `public OnHeapMemoryBlock(long size)`. In this case the `assert` is certainly true, and `offset` is equal to `Platform.LONG_ARRAY_OFFSET`. 2. 
called directly from other methods. In this case the caller will mostly set `offset` to `Platform.LONG_ARRAY_OFFSET`, and the `assert` also holds, but that is due to the caller's settings. 3. called from `public MemoryBlock subBlock(long offset, long size)`. In this case, a sub block can only use the memory its super block could use, which is the super block's `size`, rather than `obj.length * 8L` (as we reached agreement before). So I think the `assert` here should be `(offset - Platform.LONG_ARRAY_OFFSET + sub block's size <= super block's size)`. More precisely, we should also `assert(offset >= Platform.LONG_ARRAY_OFFSET)`.
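The two competing bounds checks in this thread can be sketched side by side. This is a language-neutral Python stand-in for the Java asserts, not the actual Spark code; the function names and the value of the offset constant are illustrative only:

```python
LONG_ARRAY_OFFSET = 16  # illustrative stand-in for Platform.LONG_ARRAY_OFFSET


def fits_backing_array(offset, size, array_len):
    """The assert currently in the diff: the block must fit within
    the backing long[] array (array_len elements, 8 bytes each)."""
    return offset - LONG_ARRAY_OFFSET + size <= array_len * 8


def fits_parent_block(offset, size, parent_size):
    """The stricter check proposed for subBlock(): a sub-block must fit
    inside the parent block's own size, not merely the backing array,
    and its offset may not point before the array data."""
    return (offset >= LONG_ARRAY_OFFSET
            and offset - LONG_ARRAY_OFFSET + size <= parent_size)
```

With a 16-element array (128 bytes) and a parent block of size 8, a sub-block of size 96 passes the first check (96 <= 128) but fails the second (96 > 8), which is exactly the case the two comments above disagree about.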
[GitHub] spark issue #20373: [SPARK-23159][PYTHON] Update cloudpickle to v0.4.3
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20373 Merged build finished. Test PASSed.
[GitHub] spark issue #20373: [SPARK-23159][PYTHON] Update cloudpickle to v0.4.3
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20373 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87939/ Test PASSed.
[GitHub] spark issue #20373: [SPARK-23159][PYTHON] Update cloudpickle to v0.4.3
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20373 **[Test build #87939 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87939/testReport)** for PR 20373 at commit [`2d19f0a`](https://github.com/apache/spark/commit/2d19f0abd35d064e26c47d13a4719a7354845d87). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20373: [SPARK-23159][PYTHON] Update cloudpickle to v0.4.3
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20373 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1259/ Test PASSed.
[GitHub] spark issue #20373: [SPARK-23159][PYTHON] Update cloudpickle to v0.4.3
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20373 **[Test build #87939 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87939/testReport)** for PR 20373 at commit [`2d19f0a`](https://github.com/apache/spark/commit/2d19f0abd35d064e26c47d13a4719a7354845d87).
[GitHub] spark issue #20373: [SPARK-23159][PYTHON] Update cloudpickle to v0.4.3
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20373 Merged build finished. Test PASSed.
[GitHub] spark issue #20373: [SPARK-23159][PYTHON] Update cloudpickle to v0.4.3
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20373 retest this please
[GitHub] spark issue #20728: [SPARK-23569][PYTHON] Allow pandas_udf to work with pyth...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20728 Usually I leave it open for a few days so that I or other reviewers can check this change. I or other reviewers will leave some review comments, or leave an approval on this PR if it looks good without additional changes. I will try to guide you explicitly here.
[GitHub] spark issue #20728: [SPARK-23569][PYTHON] Allow pandas_udf to work with pyth...
Github user mstewart141 commented on the issue: https://github.com/apache/spark/pull/20728 What should the next step be here?
[GitHub] spark pull request #20464: [SPARK-23291][SQL][R] R's substr should not reduc...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20464#discussion_r172033021 --- Diff: R/pkg/R/column.R --- @@ -169,7 +169,7 @@ setMethod("alias", #' @note substr since 1.4.0 setMethod("substr", signature(x = "Column"), function(x, start, stop) { -jc <- callJMethod(x@jc, "substr", as.integer(start - 1), as.integer(stop - start + 1)) +jc <- callJMethod(x@jc, "substr", as.integer(start), as.integer(stop - start + 1)) --- End diff -- I think we should do two things: 1. add to the function doc that the `start` param is 0-based, and add to the example the result: `collect(select(df, substr(df$a, 0, 5))) # this should give you...`
[GitHub] spark pull request #20464: [SPARK-23291][SQL][R] R's substr should not reduc...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20464#discussion_r172033037 --- Diff: docs/sparkr.md --- @@ -663,3 +663,7 @@ You can inspect the search path in R with [`search()`](https://stat.ethz.ch/R-ma - The `stringsAsFactors` parameter was previously ignored with `collect`, for example, in `collect(createDataFrame(iris), stringsAsFactors = TRUE))`. It has been corrected. - For `summary`, option for statistics to compute has been added. Its output is changed from that from `describe`. - A warning can be raised if versions of SparkR package and the Spark JVM do not match. + +## Upgrading to Spark 2.4.0 + + - The `start` parameter of `substr` method was wrongly subtracted by one, previously. This can lead to inconsistent substring results and also does not match with the behaviour with `substr` in R. It has been corrected. --- End diff -- 2. in the migration guide we should give a concrete example with a non-0 start index, e.g. `substr(df$a, 1, 6)` should be changed to `substr(df$a, 0, 5)`
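The off-by-one being fixed in this thread can be modeled in a few lines. This is a sketch in Python, not the SparkR or JVM code; `jvm_substr` stands in for the JVM `Column.substr` call, which takes a 1-based start position and a length, and the wrapper names are hypothetical:

```python
def jvm_substr(s, pos, length):
    # Model of the JVM side: 1-based start position, fixed length.
    return s[pos - 1 : pos - 1 + length]


def sparkr_substr_old(s, start, stop):
    # Previous SparkR wrapper: `start` was wrongly decremented by one
    # before being passed to the JVM.
    return jvm_substr(s, start - 1, stop - start + 1)


def sparkr_substr_new(s, start, stop):
    # Fixed wrapper: pass `start` through unchanged, matching the
    # behaviour of base R substr(x, start, stop).
    return jvm_substr(s, start, stop - start + 1)
```

For `substr("abcdef", 2, 4)`, base R returns `"bcd"`; the old wrapper returned `"abc"`, shifted one character to the left.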
[GitHub] spark issue #20728: [SPARK-23569][PYTHON] Allow pandas_udf to work with pyth...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20728 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87938/ Test PASSed.
[GitHub] spark issue #20728: [SPARK-23569][PYTHON] Allow pandas_udf to work with pyth...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20728 Merged build finished. Test PASSed.
[GitHub] spark issue #20728: [SPARK-23569][PYTHON] Allow pandas_udf to work with pyth...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20728 **[Test build #87938 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87938/testReport)** for PR 20728 at commit [`3cd53f3`](https://github.com/apache/spark/commit/3cd53f39f23ebd1b9b4134a9ac22348b301f8bd4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18736: [SPARK-21481][ML] Add indexOf method for ml.feature.Hash...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18736 Closed, as waiting for #18998 has taken too long.
[GitHub] spark pull request #18736: [SPARK-21481][ML] Add indexOf method for ml.featu...
Github user facaiy closed the pull request at: https://github.com/apache/spark/pull/18736
[GitHub] spark issue #17503: [SPARK-3159][MLlib] Check for reducible DecisionTree
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17503 Closed since its duplicate PR #20632 has been merged.
[GitHub] spark pull request #17503: [SPARK-3159][MLlib] Check for reducible DecisionT...
Github user facaiy closed the pull request at: https://github.com/apache/spark/pull/17503
[GitHub] spark issue #20728: [SPARK-23569][PYTHON] Allow pandas_udf to work with pyth...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20728 **[Test build #87938 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87938/testReport)** for PR 20728 at commit [`3cd53f3`](https://github.com/apache/spark/commit/3cd53f39f23ebd1b9b4134a9ac22348b301f8bd4).
[GitHub] spark issue #20728: [SPARK-23569][PYTHON] Allow pandas_udf to work with pyth...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20728 ok to test
[GitHub] spark issue #20728: [SPARK-23569][PYTHON] Allow pandas_udf to work with pyth...
Github user mstewart141 commented on the issue: https://github.com/apache/spark/pull/20728 cc @HyukjinKwon
[GitHub] spark issue #20728: [SPARK-23569][PYTHON] Allow pandas_udf to work with pyth...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20728 Can one of the admins verify this patch?
[GitHub] spark pull request #20728: [SPARK-23569][PYTHON] Allow pandas_udf to work wi...
GitHub user mstewart141 opened a pull request: https://github.com/apache/spark/pull/20728 [SPARK-23569][PYTHON] Allow pandas_udf to work with python3 style type-annotated functions ## What changes were proposed in this pull request? Check the Python version to determine whether to use `inspect.getargspec` or `inspect.getfullargspec` before applying `pandas_udf` core logic to a function. The former is for Python 2.7 (deprecated in Python 3) and the latter is for Python 3.x. The latter correctly accounts for type annotations, which are syntax errors in Python 2.x. ## How was this patch tested? Locally, on Python 2.7 and 3.6. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mstewart141/spark pandas_udf_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20728.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20728 commit 3cd53f39f23ebd1b9b4134a9ac22348b301f8bd4 Author: Michael (Stu) Stewart Date: 2018-03-03T21:54:53Z [SPARK-23569][PYTHON] Allow pandas_udf to work with python3 style type-annotated functions
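The version check this PR describes can be sketched as follows. This is a minimal illustration of the approach, not the actual PySpark patch, and the helper name is hypothetical:

```python
import inspect
import sys


def function_arg_names(f):
    """Return the positional argument names of f.

    inspect.getargspec rejects functions carrying Python 3 type
    annotations (and is deprecated/removed on modern Python 3), so
    on Python 3 we must use inspect.getfullargspec, which handles
    annotated signatures; on Python 2 only getargspec exists.
    """
    if sys.version_info[0] < 3:
        return inspect.getargspec(f).args  # Python 2 path
    return inspect.getfullargspec(f).args  # Python 3 path
```

Under Python 3 this accepts an annotated function such as `def f(x: int, y: str = "a") -> str`, whose annotations would trip up `getargspec`.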
[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20705 Merged build finished. Test PASSed.
[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20705 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87937/ Test PASSed.
[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20705 **[Test build #87937 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87937/testReport)** for PR 20705 at commit [`d9d2564`](https://github.com/apache/spark/commit/d9d2564923afad880374d355f2de92c2973e5514). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87933/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19222 **[Test build #87933 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87933/testReport)** for PR 19222 at commit [`95ffd0f`](https://github.com/apache/spark/commit/95ffd0f4385423b87810977ba8d531eba644f2f5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20716 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87936/ Test PASSed.
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20716 Merged build finished. Test PASSed.
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20716 **[Test build #87936 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87936/testReport)** for PR 20716 at commit [`258bae4`](https://github.com/apache/spark/commit/258bae403b0b6e12fdfcad9ab1b805a8e61a8ac6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20705 **[Test build #87937 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87937/testReport)** for PR 20705 at commit [`d9d2564`](https://github.com/apache/spark/commit/d9d2564923afad880374d355f2de92c2973e5514).
[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20705 Merged build finished. Test PASSed.
[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20705 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1258/ Test PASSed.
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20716 **[Test build #87936 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87936/testReport)** for PR 20716 at commit [`258bae4`](https://github.com/apache/spark/commit/258bae403b0b6e12fdfcad9ab1b805a8e61a8ac6).
[GitHub] spark pull request #20716: [SPARK-23566][Minor][Doc] Argument name mismatch ...
Github user animenon commented on a diff in the pull request: https://github.com/apache/spark/pull/20716#discussion_r172024390 --- Diff: python/pyspark/sql/dataframe.py --- @@ -612,9 +614,10 @@ def repartition(self, numPartitions, *cols): Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The resulting DataFrame is hash partitioned. -``numPartitions`` can be an int to specify the target number of partitions or a Column. -If it is a Column, it will be used as the first partitioning column. If not specified, -the default number of partitions is used. +:param numPartitions: +can be an int to specify the target number of partitions or a Column. --- End diff -- Cool, was maintaining consistency with other `:param` on page.
[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20705 For the above two JSON failures, they are `MetastoreDataSourcesSuite` and `PartitionedTablePerfStatsSuite`. They are general test cases, not designed specifically for `parquet`. If we explicitly set it to `parquet`, we would have to change it again each time we test against a new data source. > How about we explicitly set spark.sql.sources.default to parquet for both test cases in #20705 (comment) too? Instead, I think we had better file two JIRA issues as JSON improvements, if that is feasible.
[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r172024213 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2150,7 +2150,8 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { test("data source table created in InMemoryCatalog should be able to read/write") { withTable("tbl") { - sql("CREATE TABLE tbl(i INT, j STRING) USING parquet") + val provider = spark.sessionState.conf.defaultDataSourceName --- End diff -- So far, the purpose of this PR is **setting once in `SQLConf.scala`** in order to test a new data source and find out its limitations, instead of **touching every data suite**. BTW, `spark.sql.sources.default=parquet` doesn't help this existing code because the SQL has the fixed string `USING parquet`.
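The pattern under discussion — reading the configured default provider once and substituting it into the DDL instead of hard-coding `USING parquet` — can be sketched in plain Python. The `conf` dict below is a stand-in for Spark's session configuration (in real PySpark this would be `spark.conf.get("spark.sql.sources.default")`), not a real API:

```python
# Hypothetical stand-in for the session configuration.
conf = {"spark.sql.sources.default": "json"}

# Read the default provider once...
provider = conf["spark.sql.sources.default"]

# ...and build the DDL from it, so changing the config changes every
# test that uses this template, with no per-suite edits.
ddl = f"CREATE TABLE tbl(i INT, j STRING) USING {provider}"
print(ddl)
```

With a fixed `USING parquet` string, by contrast, flipping the config would have no effect on the statement at all.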
[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r172024043 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2150,7 +2150,8 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { test("data source table created in InMemoryCatalog should be able to read/write") { withTable("tbl") { - sql("CREATE TABLE tbl(i INT, j STRING) USING parquet") + val provider = spark.sessionState.conf.defaultDataSourceName --- End diff -- This is `SQLQuerySuite`. The test case is correctly testing its **purpose**. Every data source has its own capabilities and limitations. Your example is only the `text` data source's limitation of supporting a single-column schema, isn't it? The other sources (`csv`/`json`/`orc`/`parquet`) will pass this specific test.
[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r172023909 --- Diff: python/pyspark/sql/readwriter.py --- @@ -147,6 +147,7 @@ def load(self, path=None, format=None, schema=None, **options): or a DDL-formatted string (For example ``col0 INT, col1 DOUBLE``). :param options: all other string options +>>> spark.conf.set("spark.sql.sources.default", "parquet") --- End diff -- Yep. That was my first commit [here](https://github.com/apache/spark/pull/20705#discussion-diff-171729865R150). I'll rollback this.
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20716 BTW, it doesn't need a JIRA for minor fixes.
[GitHub] spark pull request #20716: [SPARK-23566][Minor][Doc] Argument name mismatch ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20716#discussion_r172023219 --- Diff: python/pyspark/sql/dataframe.py --- @@ -612,9 +614,10 @@ def repartition(self, numPartitions, *cols): Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The resulting DataFrame is hash partitioned. -``numPartitions`` can be an int to specify the target number of partitions or a Column. -If it is a Column, it will be used as the first partitioning column. If not specified, -the default number of partitions is used. +:param numPartitions: +can be an int to specify the target number of partitions or a Column. --- End diff -- I think it's fine to: ``` :param numPartitions: can be an int to specify the target number of partitions or a Column. If it is a Column, it will be used as the first partitioning column. If not specified, ```
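The reStructuredText `:param` field style being settled on above can be illustrated with a small stand-alone function (the body is a placeholder; only the docstring layout matters here):

```python
def repartition(numPartitions, *cols):
    """Returns a new DataFrame partitioned by the given partitioning expressions.

    :param numPartitions:
        can be an int to specify the target number of partitions or a Column.
        If it is a Column, it will be used as the first partitioning column.
    :param cols: partitioning columns.
    """
    raise NotImplementedError  # illustration of the docstring style only


# Sphinx picks the parameter description up from the field list.
print(":param numPartitions:" in repartition.__doc__)
```

Continuation lines of a `:param` field are indented under the field name, which is what keeps multi-sentence descriptions attached to the right parameter when Sphinx renders the docstring.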
[GitHub] spark pull request #20716: [SPARK-23566][Minor][Doc] Argument name mismatch ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20716#discussion_r172023339 --- Diff: python/pyspark/sql/dataframe.py --- @@ -892,6 +896,8 @@ def colRegex(self, colName): def alias(self, alias): """Returns a new :class:`DataFrame` with an alias set. +:param alias: string, alias names to be set for the DataFrame. --- End diff -- `alias names` -> `an alias name`
[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r172022980 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2150,7 +2150,8 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { test("data source table created in InMemoryCatalog should be able to read/write") { withTable("tbl") { - sql("CREATE TABLE tbl(i INT, j STRING) USING parquet") + val provider = spark.sessionState.conf.defaultDataSourceName --- End diff -- Hm .. how about just explicitly setting `spark.sql.sources.default` to `parquet` in all these places rather than relying on the default? If it's set to, for example, `text`, this test fails. It seems a bit odd for a test to depend on a default value.
[GitHub] spark pull request #20705: [SPARK-23553][TESTS] Tests should not assume the ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20705#discussion_r172022384 --- Diff: python/pyspark/sql/readwriter.py --- @@ -147,6 +147,7 @@ def load(self, path=None, format=None, schema=None, **options): or a DDL-formatted string (For example ``col0 INT, col1 DOUBLE``). :param options: all other string options +>>> spark.conf.set("spark.sql.sources.default", "parquet") --- End diff -- Can we just call `format('parquet')` like the doctest for JSON below?
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user animenon commented on the issue: https://github.com/apache/spark/pull/20716 @HyukjinKwon Found a couple more; have updated the same.
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20716 Merged build finished. Test PASSed.
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20716 **[Test build #87935 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87935/testReport)** for PR 20716 at commit [`91d8473`](https://github.com/apache/spark/commit/91d8473143bda28f073e21d0f54dbfa6f65f84b9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20716 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87935/ Test PASSed.
[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20700 @rednaxelafx thanks, I integrated some of your changes into the PR.
[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20705 For https://github.com/apache/spark/pull/20705#issuecomment-370119879, yup. JSON uses strings for keys in MapType.
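The point about JSON map keys can be seen with the Python standard library alone: the JSON format only permits string object keys, so non-string keys are coerced on write and stay strings on read — which is why a JSON-backed MapType cannot round-trip, say, integer keys:

```python
import json

# An integer key is coerced to a string when serialized, because a JSON
# object key must be a string.
encoded = json.dumps({1: "a"})
print(encoded)  # {"1": "a"}

# Reading it back yields a string key, not the original int.
decoded = json.loads(encoded)
print(decoded)  # {'1': 'a'}
```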
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20716 **[Test build #87935 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87935/testReport)** for PR 20716 at commit [`91d8473`](https://github.com/apache/spark/commit/91d8473143bda28f073e21d0f54dbfa6f65f84b9).
[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19788 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87930/ Test PASSed.
[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19788 Merged build finished. Test PASSed.
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20716 **[Test build #87934 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87934/testReport)** for PR 20716 at commit [`98ef4a1`](https://github.com/apache/spark/commit/98ef4a1cee03798e73b2214f3f99e5daa62a5675). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20716 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87934/ Test FAILed.
[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19788 **[Test build #87930 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87930/testReport)** for PR 19788 at commit [`7009a5e`](https://github.com/apache/spark/commit/7009a5e3440f83c0ca663d70cd33fd1720db3e95). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20716 Merged build finished. Test FAILed.
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20716 **[Test build #87934 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87934/testReport)** for PR 20716 at commit [`98ef4a1`](https://github.com/apache/spark/commit/98ef4a1cee03798e73b2214f3f99e5daa62a5675).
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Merged build finished. Test PASSed.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1257/ Test PASSed.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19222 **[Test build #87933 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87933/testReport)** for PR 19222 at commit [`95ffd0f`](https://github.com/apache/spark/commit/95ffd0f4385423b87810977ba8d531eba644f2f5).
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r172021036 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OnHeapMemoryBlock.java --- @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.unsafe.memory; + +import org.apache.spark.unsafe.Platform; + +/** + * A consecutive block of memory with a long array on Java heap.
+ */ +public final class OnHeapMemoryBlock extends MemoryBlock { + + private final long[] array; + + public OnHeapMemoryBlock(long[] obj, long offset, long size) { +super(obj, offset, size); +this.array = obj; +assert(offset - Platform.LONG_ARRAY_OFFSET + size <= obj.length * 8L) : + "The size " + size + " and size " + size + " are larger than the array size " + + ((obj.length * 8L) - Platform.LONG_ARRAY_OFFSET); + } + + public OnHeapMemoryBlock(long size) { +this(new long[(int)((size + 7) / 8)], Platform.LONG_ARRAY_OFFSET, size); + } + + @Override + public MemoryBlock subBlock(long offset, long size) { +return new OnHeapMemoryBlock(array, this.offset + offset, size); + } + + public long[] getLongArray() { return array; } + + /** + * Creates a memory block pointing to the memory used by the long array. + */ + public static OnHeapMemoryBlock fromArray(final long[] array) { +return new OnHeapMemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length * 8L); + } + + public static OnHeapMemoryBlock fromArray(final long[] array, long size) { +return new OnHeapMemoryBlock(array, Platform.LONG_ARRAY_OFFSET, size); + } + + @Override + public final int getInt(long offset) { +assert(offset + 4 - Platform.LONG_ARRAY_OFFSET <= array.length * 8L); --- End diff -- Thank you very much for your comment. I should remove these assertions for performance reasons, as other classes did.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r172021011 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OnHeapMemoryBlock.java --- @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.unsafe.memory; + +import org.apache.spark.unsafe.Platform; + +/** + * A consecutive block of memory with a long array on Java heap. + */ +public final class OnHeapMemoryBlock extends MemoryBlock { + + private final long[] array; + + public OnHeapMemoryBlock(long[] obj, long offset, long size) { +super(obj, offset, size); +this.array = obj; +assert(offset - Platform.LONG_ARRAY_OFFSET + size <= obj.length * 8L) : --- End diff -- This condition is correct, but the message is not correct. Good catch.
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20716 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87932/ Test FAILed.
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20716 **[Test build #87932 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87932/testReport)** for PR 20716 at commit [`84d08e9`](https://github.com/apache/spark/commit/84d08e9bc9762b6d8c12a8779697c0f54d958156). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20716 Merged build finished. Test FAILed.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87931/ Test FAILed.
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20716 **[Test build #87932 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87932/testReport)** for PR 20716 at commit [`84d08e9`](https://github.com/apache/spark/commit/84d08e9bc9762b6d8c12a8779697c0f54d958156).
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Merged build finished. Test FAILed.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19222 **[Test build #87931 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87931/testReport)** for PR 19222 at commit [`4567781`](https://github.com/apache/spark/commit/4567781000618be81121140404df2a01430582ea). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20727 Merged build finished. Test PASSed.
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20727 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87929/ Test PASSed.
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20727 **[Test build #87929 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87929/testReport)** for PR 20727 at commit [`93652b3`](https://github.com/apache/spark/commit/93652b349ca8107035465096370464053b558ef0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user Ngone51 commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r172019362 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OnHeapMemoryBlock.java --- @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.unsafe.memory; + +import org.apache.spark.unsafe.Platform; + +/** + * A consecutive block of memory with a long array on Java heap. + */ +public final class OnHeapMemoryBlock extends MemoryBlock { + + private final long[] array; + + public OnHeapMemoryBlock(long[] obj, long offset, long size) { +super(obj, offset, size); +this.array = obj; +assert(offset - Platform.LONG_ARRAY_OFFSET + size <= obj.length * 8L) : --- End diff -- So, ```obj.length * 8L``` should also change to ```size```, right?
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user Ngone51 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19222#discussion_r172019312

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OnHeapMemoryBlock.java ---
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.memory;
+
+import org.apache.spark.unsafe.Platform;
+
+/**
+ * A consecutive block of memory with a long array on Java heap.
+ */
+public final class OnHeapMemoryBlock extends MemoryBlock {
+
+  private final long[] array;
+
+  public OnHeapMemoryBlock(long[] obj, long offset, long size) {
+    super(obj, offset, size);
+    this.array = obj;
+    assert(offset - Platform.LONG_ARRAY_OFFSET + size <= obj.length * 8L) :
+      "The size " + size + " and size " + size + " are larger than the array size " +
+      ((obj.length * 8L) - Platform.LONG_ARRAY_OFFSET);
+  }
+
+  public OnHeapMemoryBlock(long size) {
+    this(new long[(int)((size + 7) / 8)], Platform.LONG_ARRAY_OFFSET, size);
+  }
+
+  @Override
+  public MemoryBlock subBlock(long offset, long size) {
+    return new OnHeapMemoryBlock(array, this.offset + offset, size);
+  }
+
+  public long[] getLongArray() { return array; }
+
+  /**
+   * Creates a memory block pointing to the memory used by the long array.
+   */
+  public static OnHeapMemoryBlock fromArray(final long[] array) {
+    return new OnHeapMemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length * 8L);
+  }
+
+  public static OnHeapMemoryBlock fromArray(final long[] array, long size) {
+    return new OnHeapMemoryBlock(array, Platform.LONG_ARRAY_OFFSET, size);
+  }
+
+  @Override
+  public final int getInt(long offset) {
+    assert(offset + 4 - Platform.LONG_ARRAY_OFFSET <= array.length * 8L);
--- End diff --

So, in the asserts in the getXXX/putXXX methods, we should change ```array.length * 8L``` to ```size()```, right?
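The change the reviewer is suggesting — bounds-checking reads against the block's logical `size()` rather than the backing array's capacity (`array.length * 8L`) — can be illustrated with a standalone sketch. This is a simplified toy model written for this digest, not Spark's actual `OnHeapMemoryBlock`; in particular, `LONG_ARRAY_OFFSET` is stubbed to a constant instead of being read from `Platform`.

```java
// Simplified standalone sketch of the suggested bounds check; NOT Spark's
// actual class. LONG_ARRAY_OFFSET is a stub value for illustration only.
class BoundsCheckSketch {
    static final long LONG_ARRAY_OFFSET = 16; // stub; the real value comes from Platform

    final long[] array;
    final long offset; // absolute offset; includes LONG_ARRAY_OFFSET
    final long length; // logical size of this block, in bytes

    BoundsCheckSketch(long[] array, long offset, long length) {
        this.array = array;
        this.offset = offset;
        this.length = length;
    }

    long size() { return length; }

    // Checking against size() (the block's logical extent) rather than
    // array.length * 8L (the backing array's capacity) also rejects reads
    // past the end of a sub-block that shares a larger backing array.
    boolean getIntInBounds(long off) {
        return off + 4 - offset <= size();
    }

    public static void main(String[] args) {
        long[] backing = new long[8]; // 64 bytes of capacity
        // A 16-byte block starting at the stubbed array base:
        BoundsCheckSketch block =
            new BoundsCheckSketch(backing, LONG_ARRAY_OFFSET, 16);
        System.out.println(block.getIntInBounds(LONG_ARRAY_OFFSET + 12)); // true: last int in block
        System.out.println(block.getIntInBounds(LONG_ARRAY_OFFSET + 16)); // false: past size(), though
                                                                          // still within array capacity
    }
}
```

The second check is exactly the case a capacity-based assert would miss: the offset is still inside the backing array, but outside the block's declared size.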
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user animenon commented on the issue:

    https://github.com/apache/spark/pull/20716

    @HyukjinKwon Sure, shall do.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user Ngone51 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19222#discussion_r172018886

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -17,47 +17,168 @@

 package org.apache.spark.unsafe.memory;

-import javax.annotation.Nullable;
-
 import org.apache.spark.unsafe.Platform;

+import javax.annotation.Nullable;
+
 /**
- * A consecutive block of memory, starting at a {@link MemoryLocation} with a fixed size.
+ * A declaration of interfaces of MemoryBlock classes.
  */
-public class MemoryBlock extends MemoryLocation {
+public abstract class MemoryBlock {
+  @Nullable
+  protected final Object obj;

-  private final long length;
+  protected final long offset;
+
+  protected final long length;

   /**
    * Optional page number; used when this MemoryBlock represents a page allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber method so that
+   * this can be modified by the TaskMemoryManager, which lives in a different package.
    */
-  public int pageNumber = -1;
+  private int pageNumber = -1;

   public MemoryBlock(@Nullable Object obj, long offset, long length) {
-    super(obj, offset);
+    this.obj = obj;
+    this.offset = offset;
     this.length = length;
   }

+  public MemoryBlock() {
+    this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {
--- End diff --

> @Ngone51 could you please elaborate on your idea about MemoryLocation while we do not have MemoryLocation now? I think that it would be good to have a final method for getBaseObject for performance.

Hi, @kiszk. Actually, this comment follows the discussion between @cloud-fan & @viirya:

> @cloud-fan: we need some document about it. Now we have the general MemoryBlock abstraction, do we still need to expose the baseObj and baseOffset concept?
> @cloud-fan: BTW if we really need to expose them, shall we expose MemoryLocation instead?
> @viirya: Sounds reasonable. Do we still need MemoryLocation to expose a logical memory location?

(I have no idea why those comments are hidden.)

And my suggestion is something like:
```
public abstract class MemoryBlock {
  MemoryLocation location;
  ...
}
```
rather than the original design:
```
public class MemoryBlock extends MemoryLocation {
  ...
}
```
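The composition-over-inheritance idea sketched in the comment above can be expanded into a compilable toy version. These classes are trimmed stand-ins written for this illustration, not code from the PR; `MemoryLocation` here is reduced to a minimal two-field holder.

```java
// Toy stand-in for the suggested design: MemoryBlock *has a* MemoryLocation
// instead of *being* one, keeping the (base object, offset) pair encapsulated.
class MemoryLocation {
    final Object obj;  // base object; null for off-heap memory
    final long offset; // absolute address (off-heap) or offset into obj (on-heap)

    MemoryLocation(Object obj, long offset) {
        this.obj = obj;
        this.offset = offset;
    }
}

abstract class ComposedMemoryBlock {
    private final MemoryLocation location;
    private final long length;

    ComposedMemoryBlock(Object obj, long offset, long length) {
        this.location = new MemoryLocation(obj, offset);
        this.length = length;
    }

    // final accessors (as suggested, final helps the JIT inline them);
    // callers never touch the MemoryLocation directly
    final Object getBaseObject() { return location.obj; }
    final long getBaseOffset() { return location.offset; }
    final long size() { return length; }
}
```

Compared with `MemoryBlock extends MemoryLocation`, this shape keeps `MemoryLocation` out of the public type hierarchy, so it could later be changed or removed without touching `MemoryBlock` subclasses.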
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user Ngone51 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19222#discussion_r172019234

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/UnsafeMemoryAllocator.java ---
@@ -36,22 +42,35 @@ public MemoryBlock allocate(long size) throws OutOfMemoryError {

   @Override
   public void free(MemoryBlock memory) {
-    assert (memory.obj == null) :
+    assert(memory instanceof OffHeapMemoryBlock);
+    if (memory == OffHeapMemoryBlock.NULL) return;
+    assert (memory.getBaseObject() == null) :
--- End diff --

> its object may not be ```null```

Hi, @kiszk. Actually, I think the object must be ```null``` here, because I only see one constructor of ```OffHeapMemoryBlock```:
```
public OffHeapMemoryBlock(long address, long size) {
  super(null, address, size);
}
```
WDYT?
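The reviewer's point — that an off-heap block's base object can only ever be `null` because its sole constructor hard-codes `null` — can be checked with a small stand-in. These classes are simplified models for illustration, not the PR's actual `MemoryBlock`/`OffHeapMemoryBlock`.

```java
// Minimal stand-ins mirroring the constructor chain under discussion.
class MemoryBlockSketch {
    private final Object obj;
    private final long offset;
    private final long length;

    MemoryBlockSketch(Object obj, long offset, long length) {
        this.obj = obj;
        this.offset = offset;
        this.length = length;
    }

    Object getBaseObject() { return obj; }
}

class OffHeapMemoryBlockSketch extends MemoryBlockSketch {
    // The only constructor passes null as the base object unconditionally,
    // so getBaseObject() == null holds for every instance and the assert
    // in free() can never fire for a genuine off-heap block.
    OffHeapMemoryBlockSketch(long address, long size) {
        super(null, address, size);
    }
}
```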
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user Ngone51 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19222#discussion_r172019050

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OnHeapMemoryBlock.java ---
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.memory;
+
+import org.apache.spark.unsafe.Platform;
+
+/**
+ * A consecutive block of memory with a long array on Java heap.
+ */
+public final class OnHeapMemoryBlock extends MemoryBlock {
--- End diff --

I still suggest ```LongArrayMemoryBlock```.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19222

    **[Test build #87931 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87931/testReport)** for PR 19222 at commit [`4567781`](https://github.com/apache/spark/commit/4567781000618be81121140404df2a01430582ea).