[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r147571183 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java --- @@ -17,47 +17,168 @@ package org.apache.spark.unsafe.memory; -import javax.annotation.Nullable; - import org.apache.spark.unsafe.Platform; +import javax.annotation.Nullable; + /** - * A consecutive block of memory, starting at a {@link MemoryLocation} with a fixed size. + * A declaration of interfaces of MemoryBlock classes . */ -public class MemoryBlock extends MemoryLocation { +public abstract class MemoryBlock { + @Nullable + protected final Object obj; - private final long length; + protected final long offset; + + protected final long length; /** * Optional page number; used when this MemoryBlock represents a page allocated by a - * TaskMemoryManager. This field is public so that it can be modified by the TaskMemoryManager, - * which lives in a different package. + * TaskMemoryManager. This field can be updated using setPageNumber method so that + * this can be modified by the TaskMemoryManager, which lives in a different package. */ - public int pageNumber = -1; + private int pageNumber = -1; public MemoryBlock(@Nullable Object obj, long offset, long length) { -super(obj, offset); +this.obj = obj; +this.offset = offset; this.length = length; } + public MemoryBlock() { +this(null, 0, 0); + } + + public final Object getBaseObject() { +return obj; + } + + public final long getBaseOffset() { +return offset; + } + /** * Returns the size of the memory block. */ - public long size() { + public final long size() { return length; } - /** - * Creates a memory block pointing to the memory used by the long array. - */ - public static MemoryBlock fromLongArray(final long[] array) { -return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length * 8L); + public final void setPageNumber(int pageNum) { +pageNumber = pageNum; + } + + public final int getPageNumber() { +return pageNumber; } /** * Fills the memory block with the specified byte value. */ - public void fill(byte value) { + public final void fill(byte value) { Platform.setMemory(obj, offset, length, value); } + + /** + * Instantiate the same type of MemoryBlock with new offset and size + */ + public abstract MemoryBlock allocate(long offset, long size); --- End diff -- I think that an issue is the name of this method. This `allocate()` method does not allocate actual memory region while `MemoryAllocator.allocate()` allocate a memory region. This `allocate()` method just allocates a `MemoryBlock()`. How about renaming `MemoryBlock.allocate()` to `MemoryBlock.allocateMemoryBlock()` instead of moving this to `MemoryAllocator`? What do you think? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19596: [SPARK-22369][PYTHON][DOCS] Exposes catalog API document...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19596 Yea definitely. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18544 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83178/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18544 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18544 **[Test build #83178 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83178/testReport)** for PR 18544 at commit [`95a946e`](https://github.com/apache/spark/commit/95a946e75a256c5651f667667f8a1ef8514dbb83). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18251: [SPARK-17788][SPARK-21033][SQL] fix the potential OOM in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18251 **[Test build #83180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83180/testReport)** for PR 18251 at commit [`d4e3021`](https://github.com/apache/spark/commit/d4e30212552900a45913c7f1866dbfbe49d57317). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19383: [SPARK-20643][core] Add listener implementation t...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19383#discussion_r147570235 --- Diff: project/MimaExcludes.scala --- @@ -36,6 +36,8 @@ object MimaExcludes { // Exclude rules for 2.3.x lazy val v23excludes = v22excludes ++ Seq( +// SPARK-18085: Better History Server scalability for many / large applications + ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.status.api.v1.ExecutorSummary.executorLogs"), --- End diff -- ah i see: `-import scala.collection.Map`. But is this really necessary to break the compatibility? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19567: [SPARK-22291][SQL] Conversion error when transforming ar...
Github user jmchung commented on the issue: https://github.com/apache/spark/pull/19567 Thanks @viirya, the title has been changed. Please correct me if the modified title is still inappropriate. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19596: [SPARK-22369][PYTHON][DOCS] Exposes catalog API document...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19596 cc @rxin, do you think it's okay to expose PySpark's Catalog class API doc? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19601 **[Test build #83179 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83179/testReport)** for PR 19601 at commit [`80b9e31`](https://github.com/apache/spark/commit/80b9e319211765807766e5cf70e995bdbbebf22e). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19601: [SPARK-22383][SQL] Generate code to directly get ...
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/19601 [SPARK-22383][SQL] Generate code to directly get value of primitive type array from ColumnVector for table cache ## What changes were proposed in this pull request? This PR generates the Java code to directly get a value for a primitive type array in ColumnVector without using an iterator for table cache (e.g. dataframe.cache). This PR improves runtime performance by eliminating data copy from column-oriented storage to InternalRow in a SpecificColumnarIterator iterator for primitive type. This is a follow-up PR of #18747. Benchmark result: **21.4x** ``` OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13 on Linux 4.4.0-22-generic Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz Filter for int primitive array with cache: Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative InternalRow codegen 1368 / 1887 23.0 43.5 1.0X Filter for int primitive array with cache: Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative ColumnVector codegen64 / 90488.1 2.0 1.0X ``` Benchmark program ``` intArrayBenchmark(sqlContext, 1024 * 1024 * 20) def intArrayBenchmark(sqlContext: SQLContext, values: Int, iters: Int = 20): Unit = { import sqlContext.implicits._ val benchmarkPT = new Benchmark("Filter for int primitive array with cache", values, iters) val df = sqlContext.sparkContext.parallelize(0 to ROWS, 1) .map(i => Array.range(i, values)).toDF("a").cache df.count // force to create df.cache val str = "ColumnVector" var c: Long = 0 benchmarkPT.addCase(s"$str codegen") { iter => c += df.filter(s"a[${values/2}] % 10 = 0").count } benchmarkPT.run() df.unpersist(true) System.gc() } ``` ## How was this patch tested? Added test cases into `ColumnVectorSuite`, `DataFrameTungstenSuite`, and `WholeStageCodegenSuite` You can merge this pull request into a Git repository by running: $ git pull https://github.com/kiszk/spark SPARK-22383 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19601.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19601 commit 80b9e319211765807766e5cf70e995bdbbebf22e Author: Kazuaki IshizakiDate: 2017-10-29T03:28:06Z initial commit --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19598 @gatorsmile could you please review this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19567: [SPARK-22291][SQL] PostgreSQL UUID[] to StringTyp...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19567#discussion_r147569347 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -456,8 +456,17 @@ object JdbcUtils extends Logging { case StringType => (array: Object) => -array.asInstanceOf[Array[java.lang.String]] - .map(UTF8String.fromString) +array match { + case _: Array[java.lang.String] => +array.asInstanceOf[Array[java.lang.String]] + .map(UTF8String.fromString) + case _: Array[java.util.UUID] => +array.asInstanceOf[Array[java.util.UUID]] --- End diff -- Ok. Then we remove this UUID pattern and only have used `Array[java.lang.Object]`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19567: [SPARK-22291][SQL] PostgreSQL UUID[] to StringType: Conv...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19567 This is not just for UUID[] for now. We should update the PR title. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19383: [SPARK-20643][core] Add listener implementation t...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19383#discussion_r147569230 --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala --- @@ -0,0 +1,531 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.status + +import java.util.Date + +import scala.collection.mutable.HashMap + +import org.apache.spark._ +import org.apache.spark.executor.TaskMetrics +import org.apache.spark.internal.Logging +import org.apache.spark.scheduler._ +import org.apache.spark.status.api.v1 +import org.apache.spark.storage._ +import org.apache.spark.ui.SparkUI +import org.apache.spark.util.kvstore.KVStore + +/** + * A Spark listener that writes application information to a data store. The types written to the + * store are defined in the `storeTypes.scala` file and are based on the public REST API. + */ +private class AppStatusListener(kvstore: KVStore) extends SparkListener with Logging { + + private var sparkVersion = SPARK_VERSION + private var appInfo: v1.ApplicationInfo = null + private var coresPerTask: Int = 1 + + // Keep track of live entities, so that task metrics can be efficiently updated (without + // causing too many writes to the underlying store, and other expensive operations). + private val liveStages = new HashMap[(Int, Int), LiveStage]() + private val liveJobs = new HashMap[Int, LiveJob]() + private val liveExecutors = new HashMap[String, LiveExecutor]() + private val liveTasks = new HashMap[Long, LiveTask]() + private val liveRDDs = new HashMap[Int, LiveRDD]() + + override def onOtherEvent(event: SparkListenerEvent): Unit = event match { +case SparkListenerLogStart(version) => sparkVersion = version +case _ => + } + + override def onApplicationStart(event: SparkListenerApplicationStart): Unit = { +assert(event.appId.isDefined, "Application without IDs are not supported.") + +val attempt = new v1.ApplicationAttemptInfo( + event.appAttemptId, + new Date(event.time), + new Date(-1), + new Date(event.time), + -1L, + event.sparkUser, + false, + sparkVersion) + +appInfo = new v1.ApplicationInfo( + event.appId.get, + event.appName, + None, + None, + None, + None, + Seq(attempt)) + +kvstore.write(new ApplicationInfoWrapper(appInfo)) + } + + override def onApplicationEnd(event: SparkListenerApplicationEnd): Unit = { +val old = appInfo.attempts.head +val attempt = new v1.ApplicationAttemptInfo( + old.attemptId, + old.startTime, + new Date(event.time), + new Date(event.time), + event.time - old.startTime.getTime(), + old.sparkUser, + true, + old.appSparkVersion) + +appInfo = new v1.ApplicationInfo( + appInfo.id, + appInfo.name, + None, + None, + None, + None, + Seq(attempt)) +kvstore.write(new ApplicationInfoWrapper(appInfo)) + } + + override def onExecutorAdded(event: SparkListenerExecutorAdded): Unit = { +// This needs to be an update in case an executor re-registers after the driver has +// marked it as "dead". +val exec = getOrCreateExecutor(event.executorId) +exec.host = event.executorInfo.executorHost +exec.isActive = true +exec.totalCores = event.executorInfo.totalCores +exec.maxTasks = event.executorInfo.totalCores / coresPerTask +exec.executorLogs = event.executorInfo.logUrlMap +update(exec) + } + + override def onExecutorRemoved(event: SparkListenerExecutorRemoved): Unit = { +liveExecutors.remove(event.executorId).foreach { exec => + exec.isActive = false + update(exec) +} + } + + override
[GitHub] spark pull request #19383: [SPARK-20643][core] Add listener implementation t...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19383#discussion_r147569232 --- Diff: project/MimaExcludes.scala --- @@ -36,6 +36,8 @@ object MimaExcludes { // Exclude rules for 2.3.x lazy val v23excludes = v22excludes ++ Seq( +// SPARK-18085: Better History Server scalability for many / large applications + ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.status.api.v1.ExecutorSummary.executorLogs"), --- End diff -- I removed an import and that changed the type of one of the fields. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19383: [SPARK-20643][core] Add listener implementation t...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19383#discussion_r147569226 --- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala --- @@ -129,9 +129,13 @@ class RDDDataDistribution private[spark]( val memoryUsed: Long, val memoryRemaining: Long, val diskUsed: Long, +@JsonDeserialize(contentAs = classOf[JLong]) --- End diff -- See jackson documentation. It tells jackson to deserialize the contents of a container as a specific type. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...
Github user stanzhai commented on the issue: https://github.com/apache/spark/pull/18544 fixed @gatorsmile . retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18544 **[Test build #83178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83178/testReport)** for PR 18544 at commit [`95a946e`](https://github.com/apache/spark/commit/95a946e75a256c5651f667667f8a1ef8514dbb83). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19576: [SPARK-19727][SQL][followup] Fix for round functi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19576 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19600: Added more information to Imputer
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19600 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19576: [SPARK-19727][SQL][followup] Fix for round function that...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19576 Thanks! Merged to master/2.2 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19600: Added more information to Imputer
GitHub user tengpeng opened a pull request: https://github.com/apache/spark/pull/19600 Added more information to Imputer Often times we want to impute custom values other than 'NaN'. My addition helps people locate this function without reading the API. ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tengpeng/spark patch-5 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19600.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19600 commit 2350fc1de3eaa0a3aafd7aba11c5d99b39d4c606 Author: tengpengDate: 2017-10-29T01:24:38Z Added more information to Imputer Often times we want to impute custom values other than 'NaN'. My addition helps people locate this function without reading the API. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19576: [SPARK-19727][SQL][followup] Fix for round function that...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19576 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19576: [SPARK-19727][SQL][followup] Fix for round function that...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19576 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83176/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19576: [SPARK-19727][SQL][followup] Fix for round function that...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19576 **[Test build #83176 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83176/testReport)** for PR 19576 at commit [`2fe8eda`](https://github.com/apache/spark/commit/2fe8eda698ec5c4bec549bf722e1ed4e96cf7313). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18937: [MINOR] Remove false comment from planStreamingAg...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18937 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18937: [MINOR] Remove false comment from planStreamingAggregati...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18937 Thanks! Merged to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18937: [MINOR] Remove false comment from planStreamingAggregati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18937 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18937: [MINOR] Remove false comment from planStreamingAggregati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18937 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83175/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18937: [MINOR] Remove false comment from planStreamingAggregati...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18937 **[Test build #83175 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83175/testReport)** for PR 18937 at commit [`28919cc`](https://github.com/apache/spark/commit/28919cc9dee8408612d94e2e03be5e5fbbc076e7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19589: [SPARKR][SPARK-22344] Set java.io.tmpdir for SparkR test...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19589 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83177/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19589: [SPARKR][SPARK-22344] Set java.io.tmpdir for SparkR test...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19589 **[Test build #83177 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83177/testReport)** for PR 19589 at commit [`9c8afea`](https://github.com/apache/spark/commit/9c8afea8d9e430f223678be0335f474f003565dd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19589: [SPARKR][SPARK-22344] Set java.io.tmpdir for SparkR test...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19589 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16812: [SPARK-19465][SQL] Added options for custom boolean valu...
Github user dhunziker commented on the issue: https://github.com/apache/spark/pull/16812 Oracle doesn't have boolean so it's usually modelled as char(1) with Y/N, Sybase doesn't have boolean either but bit which is 1/0. PosgreSQL supports a range of values (https://www.postgresql.org/docs/10/static/datatype-boolean.html). In regards to CSV parsers I found the same feature in SuperCSV and uniVocity (linked above). I think it's a very useful and common feature similar to custom date formats or null strings, but as mentioned before, not something that can't be worked around. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19383: [SPARK-20643][core] Add listener implementation t...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19383#discussion_r147566861 --- Diff: project/MimaExcludes.scala --- @@ -36,6 +36,8 @@ object MimaExcludes { // Exclude rules for 2.3.x lazy val v23excludes = v22excludes ++ Seq( +// SPARK-18085: Better History Server scalability for many / large applications + ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.status.api.v1.ExecutorSummary.executorLogs"), --- End diff -- what's going on here? I don't see this PR touch the code of `ExecutorSummary` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19383: [SPARK-20643][core] Add listener implementation t...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19383#discussion_r147566819 --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala --- @@ -0,0 +1,531 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.status + +import java.util.Date + +import scala.collection.mutable.HashMap + +import org.apache.spark._ +import org.apache.spark.executor.TaskMetrics +import org.apache.spark.internal.Logging +import org.apache.spark.scheduler._ +import org.apache.spark.status.api.v1 +import org.apache.spark.storage._ +import org.apache.spark.ui.SparkUI +import org.apache.spark.util.kvstore.KVStore + +/** + * A Spark listener that writes application information to a data store. The types written to the + * store are defined in the `storeTypes.scala` file and are based on the public REST API. + */ +private class AppStatusListener(kvstore: KVStore) extends SparkListener with Logging { + + private var sparkVersion = SPARK_VERSION + private var appInfo: v1.ApplicationInfo = null + private var coresPerTask: Int = 1 + + // Keep track of live entities, so that task metrics can be efficiently updated (without + // causing too many writes to the underlying store, and other expensive operations). + private val liveStages = new HashMap[(Int, Int), LiveStage]() + private val liveJobs = new HashMap[Int, LiveJob]() + private val liveExecutors = new HashMap[String, LiveExecutor]() + private val liveTasks = new HashMap[Long, LiveTask]() + private val liveRDDs = new HashMap[Int, LiveRDD]() + + override def onOtherEvent(event: SparkListenerEvent): Unit = event match { +case SparkListenerLogStart(version) => sparkVersion = version +case _ => + } + + override def onApplicationStart(event: SparkListenerApplicationStart): Unit = { +assert(event.appId.isDefined, "Application without IDs are not supported.") + +val attempt = new v1.ApplicationAttemptInfo( + event.appAttemptId, + new Date(event.time), + new Date(-1), + new Date(event.time), + -1L, + event.sparkUser, + false, + sparkVersion) + +appInfo = new v1.ApplicationInfo( + event.appId.get, + event.appName, + None, + None, + None, + None, + Seq(attempt)) + +kvstore.write(new ApplicationInfoWrapper(appInfo)) + } + + override def onApplicationEnd(event: SparkListenerApplicationEnd): Unit = { +val old = appInfo.attempts.head +val attempt = new v1.ApplicationAttemptInfo( + old.attemptId, + old.startTime, + new Date(event.time), + new Date(event.time), + event.time - old.startTime.getTime(), + old.sparkUser, + true, + old.appSparkVersion) + +appInfo = new v1.ApplicationInfo( + appInfo.id, + appInfo.name, + None, + None, + None, + None, + Seq(attempt)) +kvstore.write(new ApplicationInfoWrapper(appInfo)) + } + + override def onExecutorAdded(event: SparkListenerExecutorAdded): Unit = { +// This needs to be an update in case an executor re-registers after the driver has +// marked it as "dead". +val exec = getOrCreateExecutor(event.executorId) +exec.host = event.executorInfo.executorHost +exec.isActive = true +exec.totalCores = event.executorInfo.totalCores +exec.maxTasks = event.executorInfo.totalCores / coresPerTask +exec.executorLogs = event.executorInfo.logUrlMap +update(exec) + } + + override def onExecutorRemoved(event: SparkListenerExecutorRemoved): Unit = { +liveExecutors.remove(event.executorId).foreach { exec => + exec.isActive = false + update(exec) +} + } + +
[GitHub] spark pull request #19383: [SPARK-20643][core] Add listener implementation t...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19383#discussion_r147566828 --- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala --- @@ -129,9 +129,13 @@ class RDDDataDistribution private[spark]( val memoryUsed: Long, val memoryRemaining: Long, val diskUsed: Long, +@JsonDeserialize(contentAs = classOf[JLong]) --- End diff -- what does this mean? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19589: [SPARKR][SPARK-22344] Set java.io.tmpdir for SparkR test...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19589 **[Test build #83177 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83177/testReport)** for PR 19589 at commit [`9c8afea`](https://github.com/apache/spark/commit/9c8afea8d9e430f223678be0335f474f003565dd). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19571: [SPARK-15474][SQL] Write and read back non-emtpy schema ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19571 Sorry I miss-understood the problem at the beginning. I thought the new orc version just changes the existing APIs a little, but it turns out the new orc version has a new set of read/write APIs. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r147566301 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/LongArray.java --- @@ -33,15 +32,13 @@ private static final long WIDTH = 8; private final MemoryBlock memory; - private final Object baseObj; private final long baseOffset; private final long length; public LongArray(MemoryBlock memory) { assert memory.size() < (long) Integer.MAX_VALUE * 8: "Array size >= Integer.MAX_VALUE elements"; this.memory = memory; -this.baseObj = memory.getBaseObject(); this.baseOffset = memory.getBaseOffset(); --- End diff -- do we still need this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r147566248 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java --- @@ -418,7 +419,10 @@ public UTF8String getUTF8String(int ordinal) { final long offsetAndSize = getLong(ordinal); final int offset = (int) (offsetAndSize >> 32); final int size = (int) offsetAndSize; -return UTF8String.fromAddress(baseObject, baseOffset + offset, size); +MemoryBlock mb = (baseObject instanceof byte[]) ? + new ByteArrayMemoryBlock((byte[]) baseObject, baseOffset + offset, size) : + new LongArrayMemoryBlock((long[]) baseObject, baseOffset + offset, size); --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r147566242 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -230,7 +233,10 @@ public UTF8String getUTF8String(int ordinal) { final long offsetAndSize = getLong(ordinal); final int offset = (int) (offsetAndSize >> 32); final int size = (int) offsetAndSize; -return UTF8String.fromAddress(baseObject, baseOffset + offset, size); +MemoryBlock mb = (baseObject instanceof byte[]) ? + new ByteArrayMemoryBlock((byte[]) baseObject, baseOffset + offset, size) : + new LongArrayMemoryBlock((long[]) baseObject, baseOffset + offset, size); --- End diff -- shouldn't this logic belong to `UTF8String.fromAddress`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r147566212 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -50,11 +52,11 @@ // These are only updated by readExternal() or read() @Nonnull - private Object base; + private MemoryBlock base; private long offset; --- End diff -- I think `MemoryBlock` contains offset? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r147566203 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java --- @@ -17,47 +17,168 @@ package org.apache.spark.unsafe.memory; -import javax.annotation.Nullable; - import org.apache.spark.unsafe.Platform; +import javax.annotation.Nullable; + /** - * A consecutive block of memory, starting at a {@link MemoryLocation} with a fixed size. + * A declaration of interfaces of MemoryBlock classes . */ -public class MemoryBlock extends MemoryLocation { +public abstract class MemoryBlock { + @Nullable + protected final Object obj; - private final long length; + protected final long offset; + + protected final long length; /** * Optional page number; used when this MemoryBlock represents a page allocated by a - * TaskMemoryManager. This field is public so that it can be modified by the TaskMemoryManager, - * which lives in a different package. + * TaskMemoryManager. This field can be updated using setPageNumber method so that + * this can be modified by the TaskMemoryManager, which lives in a different package. */ - public int pageNumber = -1; + private int pageNumber = -1; public MemoryBlock(@Nullable Object obj, long offset, long length) { -super(obj, offset); +this.obj = obj; +this.offset = offset; this.length = length; } + public MemoryBlock() { +this(null, 0, 0); + } + + public final Object getBaseObject() { +return obj; + } + + public final long getBaseOffset() { +return offset; + } + /** * Returns the size of the memory block. */ - public long size() { + public final long size() { return length; } - /** - * Creates a memory block pointing to the memory used by the long array. - */ - public static MemoryBlock fromLongArray(final long[] array) { -return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length * 8L); + public final void setPageNumber(int pageNum) { +pageNumber = pageNum; + } + + public final int getPageNumber() { +return pageNumber; } /** * Fills the memory block with the specified byte value. */ - public void fill(byte value) { + public final void fill(byte value) { Platform.setMemory(obj, offset, length, value); } + + /** + * Instantiate the same type of MemoryBlock with new offset and size + */ + public abstract MemoryBlock allocate(long offset, long size); --- End diff -- It's weird to have this interface. I think `MemoryAllocator` is the right place to do it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r147566177 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java --- @@ -17,47 +17,168 @@ package org.apache.spark.unsafe.memory; -import javax.annotation.Nullable; - import org.apache.spark.unsafe.Platform; +import javax.annotation.Nullable; + /** - * A consecutive block of memory, starting at a {@link MemoryLocation} with a fixed size. + * A declaration of interfaces of MemoryBlock classes . */ -public class MemoryBlock extends MemoryLocation { +public abstract class MemoryBlock { --- End diff -- shall we add a `free` interface? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r147566155 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/UnsafeMemoryAllocator.java --- @@ -19,28 +19,48 @@ import org.apache.spark.unsafe.Platform; +import java.lang.reflect.InvocationTargetException; +import java.lang.reflect.Method; +import java.nio.ByteBuffer; + +import sun.nio.ch.DirectBuffer; + /** * A simple {@link MemoryAllocator} that uses {@code Unsafe} to allocate off-heap memory. */ public class UnsafeMemoryAllocator implements MemoryAllocator { @Override - public MemoryBlock allocate(long size) throws OutOfMemoryError { -long address = Platform.allocateMemory(size); -MemoryBlock memory = new MemoryBlock(null, address, size); -if (MemoryAllocator.MEMORY_DEBUG_FILL_ENABLED) { - memory.fill(MemoryAllocator.MEMORY_DEBUG_FILL_CLEAN_VALUE); + public OffHeapMemoryBlock allocate(long size) throws OutOfMemoryError { +Object buffer = ByteBuffer.allocateDirect((int)size); --- End diff -- what's the different between `Platform.allocateMemory` and `ByteBuffer.allocateDirect`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r147566118 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OffHeapMemoryBlock.java --- @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.unsafe.memory; + +import org.apache.spark.unsafe.Platform; + +public class OffHeapMemoryBlock extends MemoryBlock { + private Object directBuffer; + private final long address; + + static public final OffHeapMemoryBlock NULL = new OffHeapMemoryBlock(null, 0, 0); + + public OffHeapMemoryBlock(Object directBuffer, long address, long size) { +super(null, address, size); +this.address = address; +this.directBuffer = directBuffer; --- End diff -- shall we just `assert directBuffer == null` and remove the `directBuffer` member? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r147566106 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OffHeapMemoryBlock.java --- @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.unsafe.memory; + +import org.apache.spark.unsafe.Platform; + +public class OffHeapMemoryBlock extends MemoryBlock { + private Object directBuffer; --- End diff -- where do we use this `directBuffer`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r147566098 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OffHeapMemoryBlock.java --- @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.unsafe.memory; + +import org.apache.spark.unsafe.Platform; + +public class OffHeapMemoryBlock extends MemoryBlock { + private Object directBuffer; + private final long address; + + static public final OffHeapMemoryBlock NULL = new OffHeapMemoryBlock(null, 0, 0); --- End diff -- shall we call it `EMPTY`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r147566082 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java --- @@ -17,47 +17,168 @@ package org.apache.spark.unsafe.memory; -import javax.annotation.Nullable; - import org.apache.spark.unsafe.Platform; +import javax.annotation.Nullable; + /** - * A consecutive block of memory, starting at a {@link MemoryLocation} with a fixed size. + * A declaration of interfaces of MemoryBlock classes . */ -public class MemoryBlock extends MemoryLocation { +public abstract class MemoryBlock { + @Nullable + protected final Object obj; - private final long length; + protected final long offset; + + protected final long length; /** * Optional page number; used when this MemoryBlock represents a page allocated by a - * TaskMemoryManager. This field is public so that it can be modified by the TaskMemoryManager, - * which lives in a different package. + * TaskMemoryManager. This field can be updated using setPageNumber method so that + * this can be modified by the TaskMemoryManager, which lives in a different package. */ - public int pageNumber = -1; + private int pageNumber = -1; public MemoryBlock(@Nullable Object obj, long offset, long length) { -super(obj, offset); +this.obj = obj; +this.offset = offset; this.length = length; } + public MemoryBlock() { +this(null, 0, 0); + } + + public final Object getBaseObject() { --- End diff -- BTW if we really need to expose them, shall we expose `MemoryLocation` instead? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r147566059 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java --- @@ -73,6 +73,12 @@ public MemoryBlock allocate(long size) throws OutOfMemoryError { @Override public void free(MemoryBlock memory) { +assert(memory instanceof ByteArrayMemoryBlock || memory instanceof IntArrayMemoryBlock || --- End diff -- we expect to only see `LongArrayMemoryBlock` here, don't we? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r147565964 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java --- @@ -17,47 +17,168 @@ package org.apache.spark.unsafe.memory; -import javax.annotation.Nullable; - import org.apache.spark.unsafe.Platform; +import javax.annotation.Nullable; + /** - * A consecutive block of memory, starting at a {@link MemoryLocation} with a fixed size. + * A declaration of interfaces of MemoryBlock classes . */ -public class MemoryBlock extends MemoryLocation { +public abstract class MemoryBlock { + @Nullable + protected final Object obj; - private final long length; + protected final long offset; + + protected final long length; /** * Optional page number; used when this MemoryBlock represents a page allocated by a - * TaskMemoryManager. This field is public so that it can be modified by the TaskMemoryManager, - * which lives in a different package. + * TaskMemoryManager. This field can be updated using setPageNumber method so that + * this can be modified by the TaskMemoryManager, which lives in a different package. */ - public int pageNumber = -1; + private int pageNumber = -1; public MemoryBlock(@Nullable Object obj, long offset, long length) { -super(obj, offset); +this.obj = obj; +this.offset = offset; this.length = length; } + public MemoryBlock() { +this(null, 0, 0); + } + + public final Object getBaseObject() { +return obj; + } + + public final long getBaseOffset() { +return offset; + } + /** * Returns the size of the memory block. */ - public long size() { + public final long size() { return length; } - /** - * Creates a memory block pointing to the memory used by the long array. - */ - public static MemoryBlock fromLongArray(final long[] array) { -return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length * 8L); + public final void setPageNumber(int pageNum) { +pageNumber = pageNum; + } + + public final int getPageNumber() { +return pageNumber; } /** * Fills the memory block with the specified byte value. */ - public void fill(byte value) { + public final void fill(byte value) { Platform.setMemory(obj, offset, length, value); } + + /** + * Instantiate the same type of MemoryBlock with new offset and size + */ + public abstract MemoryBlock allocate(long offset, long size); + + + public abstract int getInt(long offset); + + public abstract void putInt(long offset, int value); + + public abstract boolean getBoolean(long offset); + + public abstract void putBoolean(long offset, boolean value); + + public abstract byte getByte(long offset); + + public abstract void putByte(long offset, byte value); + + public abstract short getShort(long offset); + + public abstract void putShort(long offset, short value); + + public abstract long getLong(long offset); + + public abstract void putLong(long offset, long value); + + public abstract float getFloat(long offset); + + public abstract void putFloat(long offset, float value); + + public abstract double getDouble(long offset); + + public abstract void putDouble(long offset, double value); + + public abstract Object getObjectVolatile(long offset); --- End diff -- document? I don't understand what it does by looking at the method name --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r147565937 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java --- @@ -17,47 +17,168 @@ package org.apache.spark.unsafe.memory; -import javax.annotation.Nullable; - import org.apache.spark.unsafe.Platform; +import javax.annotation.Nullable; + /** - * A consecutive block of memory, starting at a {@link MemoryLocation} with a fixed size. + * A declaration of interfaces of MemoryBlock classes . */ -public class MemoryBlock extends MemoryLocation { +public abstract class MemoryBlock { + @Nullable + protected final Object obj; - private final long length; + protected final long offset; + + protected final long length; /** * Optional page number; used when this MemoryBlock represents a page allocated by a - * TaskMemoryManager. This field is public so that it can be modified by the TaskMemoryManager, - * which lives in a different package. + * TaskMemoryManager. This field can be updated using setPageNumber method so that + * this can be modified by the TaskMemoryManager, which lives in a different package. */ - public int pageNumber = -1; + private int pageNumber = -1; public MemoryBlock(@Nullable Object obj, long offset, long length) { -super(obj, offset); +this.obj = obj; +this.offset = offset; this.length = length; } + public MemoryBlock() { +this(null, 0, 0); + } + + public final Object getBaseObject() { +return obj; + } + + public final long getBaseOffset() { +return offset; + } + /** * Returns the size of the memory block. */ - public long size() { + public final long size() { return length; } - /** - * Creates a memory block pointing to the memory used by the long array. - */ - public static MemoryBlock fromLongArray(final long[] array) { -return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length * 8L); + public final void setPageNumber(int pageNum) { +pageNumber = pageNum; + } + + public final int getPageNumber() { +return pageNumber; } /** * Fills the memory block with the specified byte value. */ - public void fill(byte value) { + public final void fill(byte value) { Platform.setMemory(obj, offset, length, value); } + + /** + * Instantiate the same type of MemoryBlock with new offset and size + */ + public abstract MemoryBlock allocate(long offset, long size); + + + public abstract int getInt(long offset); + + public abstract void putInt(long offset, int value); + + public abstract boolean getBoolean(long offset); + + public abstract void putBoolean(long offset, boolean value); + + public abstract byte getByte(long offset); + + public abstract void putByte(long offset, byte value); + + public abstract short getShort(long offset); + + public abstract void putShort(long offset, short value); + + public abstract long getLong(long offset); + + public abstract void putLong(long offset, long value); + + public abstract float getFloat(long offset); + + public abstract void putFloat(long offset, float value); + + public abstract double getDouble(long offset); + + public abstract void putDouble(long offset, double value); + + public abstract Object getObjectVolatile(long offset); + + public abstract void putObjectVolatile(long offset, Object value); + + public static final void copyMemory( --- End diff -- why these are static methods? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r147565897 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java --- @@ -17,47 +17,168 @@ package org.apache.spark.unsafe.memory; -import javax.annotation.Nullable; - import org.apache.spark.unsafe.Platform; +import javax.annotation.Nullable; + /** - * A consecutive block of memory, starting at a {@link MemoryLocation} with a fixed size. + * A declaration of interfaces of MemoryBlock classes . */ -public class MemoryBlock extends MemoryLocation { +public abstract class MemoryBlock { + @Nullable + protected final Object obj; - private final long length; + protected final long offset; + + protected final long length; /** * Optional page number; used when this MemoryBlock represents a page allocated by a - * TaskMemoryManager. This field is public so that it can be modified by the TaskMemoryManager, - * which lives in a different package. + * TaskMemoryManager. This field can be updated using setPageNumber method so that + * this can be modified by the TaskMemoryManager, which lives in a different package. */ - public int pageNumber = -1; + private int pageNumber = -1; public MemoryBlock(@Nullable Object obj, long offset, long length) { -super(obj, offset); +this.obj = obj; +this.offset = offset; this.length = length; } + public MemoryBlock() { +this(null, 0, 0); + } + + public final Object getBaseObject() { --- End diff -- we need some document about it. Now we have the general `MemoryBlock` abstraction, do we still need to expose the `baseObj` and `baseOffset` concept? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r147565847 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java --- @@ -17,47 +17,168 @@ package org.apache.spark.unsafe.memory; -import javax.annotation.Nullable; - import org.apache.spark.unsafe.Platform; +import javax.annotation.Nullable; --- End diff -- wrong import order --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r147565833 --- Diff: common/unsafe/src/main/java/org/apache/spark/sql/catalyst/expressions/HiveHasher.java --- @@ -38,6 +39,10 @@ public static int hashLong(long input) { return (int) ((input >>> 32) ^ input); } + public static int hashUnsafeBytesBlock(MemoryBlock base, long offset, int lengthInBytes) { --- End diff -- why we call it `hashUnsafeBytesBlock` while passing in a general `MemoryBlock`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19586: [SPARK-22367][CORE] Separate the serialization of class ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19586 looking at the `SerializationStream` interface, I think it's designed for read/write objects of different classes, so your optimization should not be applied there. Instead, I think we should introduce `SerializerInstance#serializeStreamForClass[T]`, which returns `ClassSpecificSerializationStream[T]` that is designed for writing objects of same class. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19586: [SPARK-22367][CORE] Separate the serialization of...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19586#discussion_r147565429 --- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala --- @@ -205,11 +205,45 @@ class KryoSerializationStream( private[this] var kryo: Kryo = serInstance.borrowKryo() + // This is only used when we write object and class separately. + var classWrote = false + override def writeObject[T: ClassTag](t: T): SerializationStream = { kryo.writeClassAndObject(output, t) --- End diff -- I was expecting kryo to buffer the distinct classes and only store an identifier/pointer for duplicated classes. Even if we write object and class every time, the overhead should be small. This is not true? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19586: [SPARK-22367][CORE] Separate the serialization of...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19586#discussion_r147565331 --- Diff: pom.xml --- @@ -133,7 +133,7 @@ 1.6.0 9.3.20.v20170531 3.1.0 -0.8.4 +0.9.2 --- End diff -- library upgrading deserves a separated PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19586: [SPARK-22367][CORE] Separate the serialization of...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19586#discussion_r147565323 --- Diff: pom.xml --- @@ -133,7 +133,7 @@ 1.6.0 9.3.20.v20170531 3.1.0 -0.8.4 +0.9.2 --- End diff -- please change it back. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19576: [SPARK-19727][SQL][followup] Fix for round functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19576#discussion_r147565256 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala --- @@ -258,6 +258,18 @@ class MathFunctionsSuite extends QueryTest with SharedSQLContext { ) } + test("round/bround with table columns") { --- End diff -- All the tests in `MathExpressionsSuite` are testing about literals, I think it's better to improve `MathExpressionsSuite` for attributes in a new PR, instead of doing it in a folllow-up PR. BTW the original PR didn't add test in `MathExpressionsSuite` either. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19576: [SPARK-19727][SQL][followup] Fix for round function that...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19576 **[Test build #83176 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83176/testReport)** for PR 19576 at commit [`2fe8eda`](https://github.com/apache/spark/commit/2fe8eda698ec5c4bec549bf722e1ed4e96cf7313). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18937: [MINOR] Remove false comment from planStreamingAggregati...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18937 **[Test build #83175 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83175/testReport)** for PR 18937 at commit [`28919cc`](https://github.com/apache/spark/commit/28919cc9dee8408612d94e2e03be5e5fbbc076e7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18937: [MINOR] Remove false comment from planStreamingAggregati...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18937 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19597: [SPARK-22375][TEST] Test script can fail if eggs are ins...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19597 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83172/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19597: [SPARK-22375][TEST] Test script can fail if eggs are ins...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19597 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19597: [SPARK-22375][TEST] Test script can fail if eggs are ins...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19597 **[Test build #83172 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83172/testReport)** for PR 19597 at commit [`59efcb8`](https://github.com/apache/spark/commit/59efcb8423b3341942c48ffdd82896ae369ce41b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19599#discussion_r147562600 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -440,6 +440,43 @@ class BooleanParam(parent: String, name: String, doc: String) // No need for isV * Specialized version of `Param[Array[String]]` for Java. */ @DeveloperApi +private[ml] class StringParam(parent: Params, name: String, doc: String, isValid: String => Boolean) + extends Param[String](parent, name, doc, isValid) { + + private var options: Option[Array[String]] = None + + def this(parent: Params, name: String, doc: String) = +this(parent, name, doc, ParamValidators.alwaysTrue) + + /** construct a StringParam with limited options (case-insensitive) */ + def this(parent: Params, name: String, doc: String, options: Array[String]) = { +this(parent, name, doc + s" Supported options (case-insensitive): ${options.mkString(", ")}.", + s => options.exists(s.equalsIgnoreCase)) --- End diff -- Though `options.exists(s.equalsIgnoreCase))` and `options.map(_.toLowerCase(Locale.ROOT)).contains(s.toLowerCase(Locale.ROOT))` should be essentially the same. IMO we still need to stick to just one of them thoroughly, which also depends on how to do param comparison (switch) in Estimators. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19599#discussion_r147562823 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala --- @@ -133,7 +134,7 @@ class NaiveBayes @Since("1.5.0") ( s" numClasses=$numClasses, but thresholds has length ${$(thresholds).length}") } -val modelTypeValue = $(modelType) +val modelTypeValue = $lc(modelType) --- End diff -- option 1: Compare StringParam and candidates after converting them to lower case. This requires that all the String Constants like `NaiveBayes.Multinomial` to be lower case. I don't like writing xxx.toLowerCase(Locale.ROOT) everywhere, so I created the $lc in Params to centralize it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19599#discussion_r147562645 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -224,8 +222,8 @@ class LinearRegression @Since("1.3.0") (@Since("1.3.0") override val uid: String elasticNetParam, fitIntercept, maxIter, regParam, standardization, aggregationDepth) instr.logNumFeatures(numFeatures) -if (($(solver) == Auto && - numFeatures <= WeightedLeastSquares.MAX_NUM_FEATURES) || $(solver) == Normal) { +if (($(solver).equalsIgnoreCase(Auto) && numFeatures <= WeightedLeastSquares.MAX_NUM_FEATURES) + || $(solver).equalsIgnoreCase(Normal)) { --- End diff -- Option 2: Compare StringParam and candidates using `equalsIgnoreCase`. This may be harder for match cases. str match { case s if str.equalsIgnoreCase("HELLO") => } --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19599#discussion_r147562530 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -440,6 +440,43 @@ class BooleanParam(parent: String, name: String, doc: String) // No need for isV * Specialized version of `Param[Array[String]]` for Java. --- End diff -- Eh, this need to be updated. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19597: [SPARK-22375][TEST] Test script can fail if eggs are ins...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19597 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83171/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19597: [SPARK-22375][TEST] Test script can fail if eggs are ins...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19597 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19597: [SPARK-22375][TEST] Test script can fail if eggs are ins...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19597 **[Test build #83171 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83171/testReport)** for PR 19597 at commit [`9568906`](https://github.com/apache/spark/commit/956890612db181f5f470593f10fe02048cecaf03). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19598 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83173/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19598 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19598 **[Test build #83173 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83173/testReport)** for PR 19598 at commit [`6057309`](https://github.com/apache/spark/commit/6057309a0cc98d0b480d468107fadd1b2a44a1c0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18747: [SPARK-20822][SQL] Generate code to directly get value f...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18747 @cloud-fan I updated benchmark result (2.9x) in the description. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19599 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83174/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19599 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19599 **[Test build #83174 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83174/testReport)** for PR 19599 at commit [`f6b918e`](https://github.com/apache/spark/commit/f6b918eda3448df054edd81a077fa1a3d996d5c1). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19599 **[Test build #83174 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83174/testReport)** for PR 19599 at commit [`f6b918e`](https://github.com/apache/spark/commit/f6b918eda3448df054edd81a077fa1a3d996d5c1). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/19599 [SPARK-22381] [ML] Add StringParam that supports valid options ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira/browse/SPARK-22381 During test with https://issues.apache.org/jira/browse/SPARK-22331, I found it might be a good idea to include the possible options in a StringParam. A `StringParam` extends `Param[String]` and allow user to specify the valid options in Array[String] (case insensitive). So far it can help achieve three goals: 1. Make the StringParam aware of its possible options and support native validations. 2. StringParam can list the supported options in exception message when user input wrong value. before: `$parent parameter $name given invalid value $value.` after: `$parent parameter $name given invalid value $value. Supported options: auto, normal, l-bfgs` 3. Allow automatic unit test coverage for case-insensitive String param. (built into checkParam) and IMO it also decrease the code redundancy and make it more maintainable. The StringParam is designed to be completely compatible with existing Param[String], just with the extra logic for supporting options, which means we don't need to convert all Param[String] to StringParam until we feel comfortable to do that. The PR contains the minimum changes required and use NaiveBayes and LinearRegression as examples. They're intentionally built with some difference to demo different options when comparing String Params. The PR intentionally leave SharedParamsCodeGen untouched for now, which can be easily added if StringParam sounds good. Also more unit tests can be added. ## How was this patch tested? Strengthen existing unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/hhbyyh/spark stringparam Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19599.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19599 commit f6b918eda3448df054edd81a077fa1a3d996d5c1 Author: Yuhao YangDate: 2017-10-28T18:44:46Z add StringParam --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19567: [SPARK-22291][SQL] PostgreSQL UUID[] to StringType: Conv...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19567 LGTM except one comment, good catch! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19567: [SPARK-22291][SQL] PostgreSQL UUID[] to StringTyp...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19567#discussion_r147560424 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -456,8 +456,17 @@ object JdbcUtils extends Logging { case StringType => (array: Object) => -array.asInstanceOf[Array[java.lang.String]] - .map(UTF8String.fromString) +array match { --- End diff -- looking at how we handle StringType, we call `ResultSet.getString` even if the underling type is not string, but uuid, inet, cidr, etc. I think we shoud follow the behavior here, i.e. just call `array.asInstanceOf[Array[java.lang.Object]].map(obj => UTF8String.fromString(obj.toString))` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18113: [SPARK-20890][SQL] Added min and max typed aggreg...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18113#discussion_r147559942 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/typedaggregators.scala --- @@ -26,43 +26,64 @@ import org.apache.spark.sql.expressions.Aggregator // This file defines internal implementations for aggregators. +class TypedSumDouble[IN](val f: IN => Double) + extends Aggregator[IN, java.lang.Double, java.lang.Double] { + + override def zero: java.lang.Double = null + override def reduce(b: java.lang.Double, a: IN): java.lang.Double = --- End diff -- OK let's follow the SQL semantic and deal with performance problem later. @setjet sorry for the late. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19587: [SPARK-22370][SQL][PYSPARK] Config values should ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19587 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19587: [SPARK-22370][SQL][PYSPARK] Config values should be capt...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19587 LGTM, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19598 **[Test build #83173 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83173/testReport)** for PR 19598 at commit [`6057309`](https://github.com/apache/spark/commit/6057309a0cc98d0b480d468107fadd1b2a44a1c0). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19598: [SPARK-22378]
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/19598 [SPARK-22378] ## What changes were proposed in this pull request? This PR eliminates redundant null check in generated code for extracting an element from complex types. Since these code generation does not take care of `nullable` in `DataType` such as `ArrayType`, the generated code always has `isNullAt(index)`. This PR avoids to generate `isNullAt(index)` if `nullable` is false in `DataType`. ## How was this patch tested? Added test cases into `ComplexTypeSuite` You can merge this pull request into a Git repository by running: $ git pull https://github.com/kiszk/spark SPARK-22378 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19598.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19598 commit 6057309a0cc98d0b480d468107fadd1b2a44a1c0 Author: Kazuaki IshizakiDate: 2017-10-28T16:53:14Z initial commit --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19597: [SPARK-22375][TEST] Test script can fail if eggs are ins...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19597 **[Test build #83172 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83172/testReport)** for PR 19597 at commit [`59efcb8`](https://github.com/apache/spark/commit/59efcb8423b3341942c48ffdd82896ae369ce41b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19597: [SPARK-22375][TEST] Test script can fail if eggs are ins...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19597 **[Test build #83171 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83171/testReport)** for PR 19597 at commit [`9568906`](https://github.com/apache/spark/commit/956890612db181f5f470593f10fe02048cecaf03). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19597: [SPARK-22375][TEST] Test script can fail if eggs are ins...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19597 I think you can add it here instead https://github.com/apache/spark/blob/0e31e28d483918c1de26f78068e78c2ca3cf7f3c/dev/tox.ini#L19 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19597: [SPARK-22375][TEST] Test script can fail if eggs are ins...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19597 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19597: [SPARK-22375][TEST] Test script can fail if eggs are ins...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19597 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19597: [SPARK-22375][TEST] Test script can fail if eggs ...
GitHub user xynny opened a pull request: https://github.com/apache/spark/pull/19597 [SPARK-22375][TEST] Test script can fail if eggs are installed by set⦠â¦up.py during test process ## What changes were proposed in this pull request? Ignore the python/.eggs folder when running lint-python ## How was this patch tested? 1) put a bad python file in python/.eggs and ran the original script. results were: xins-MBP:spark xinlu$ dev/lint-python PEP8 checks failed. ./python/.eggs/worker.py:33:4: E121 continuation line under-indented for hanging indent ./python/.eggs/worker.py:34:5: E131 continuation line unaligned for hanging indent 2) test same situation with change: xins-MBP:spark xinlu$ dev/lint-python PEP8 checks passed. The sphinx-build command was not found. Skipping pydoc checks for now You can merge this pull request into a Git repository by running: $ git pull https://github.com/xynny/spark SPARK-22375 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19597.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19597 commit 956890612db181f5f470593f10fe02048cecaf03 Author: Xin LuDate: 2017-10-28T16:23:33Z [SPARK-22375][TEST] Test script can fail if eggs are installed by setup.py during test process --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19596: [SPARK-22369][PYTHON][DOCS] Exposes catalog API document...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19596 **[Test build #83170 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83170/testReport)** for PR 19596 at commit [`82fe78b`](https://github.com/apache/spark/commit/82fe78bd6f16fabcc82e0020e0d1b77bc3b5a18f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19596: [SPARK-22369][PYTHON][DOCS] Exposes catalog API document...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19596 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83170/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19596: [SPARK-22369][PYTHON][DOCS] Exposes catalog API document...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19596 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org