[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20727 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88159/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20727 Merged build finished. Test PASSed.
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20727 **[Test build #88159 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88159/testReport)** for PR 20727 at commit [`d6e9160`](https://github.com/apache/spark/commit/d6e91604585b22a27fbd0b7caa0a8e96d3725400). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20791: [SPARK-23618][K8s][BUILD] Initialize BUILD_ARGS in docke...
Github user foxish commented on the issue: https://github.com/apache/spark/pull/20791 LGTM! Thanks @jooseong.
[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20793 Ah, the results are different since the number of operations is different. It may be an issue like #20630. I am curious why the tests fail when the seed is changed. Of course, I understand the sequence of random numbers must be reproducible for a given seed value within a package or implementation.
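For context on why the two variants hash different inputs: `java.lang.Long.SIZE` is the width of a long in bits (64), while `Long.BYTES` is its width in bytes (8), so the pre-patch code hashed a 64-byte buffer whose tail is zero padding. A minimal Java sketch (illustrative only, not Spark code):

```java
import java.nio.ByteBuffer;

public class BufferSizes {
    public static void main(String[] args) {
        // Long.SIZE is measured in bits, Long.BYTES in bytes.
        byte[] oversized = ByteBuffer.allocate(Long.SIZE).putLong(100L).array();
        byte[] exact = ByteBuffer.allocate(Long.BYTES).putLong(100L).array();
        System.out.println(oversized.length); // 64: the 8 payload bytes plus 56 zero bytes
        System.out.println(exact.length);     // 8: exactly the payload
    }
}
```

Since MurmurHash3 hashes every byte of its input, the 64-byte and 8-byte buffers naturally produce different hash values, which is why a fixed seed no longer reproduces the old random sequence.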
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20795 Can one of the admins verify this patch?
[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...
GitHub user kevinyu98 opened a pull request: https://github.com/apache/spark/pull/20795 [SPARK-23486]cache the function name from the catalog for lookupFunctions ## What changes were proposed in this pull request? This PR caches the function names from the Spark and external catalogs; they are used by lookupFunctions in the analyzer, and the cache is kept per query plan. The original problem is reported in [SPARK-19737](https://issues.apache.org/jira/browse/SPARK-19737). ## How was this patch tested? I did unit testing on my local machine; it shows that the cache is used when the same function appears multiple times in the same query. But I am not sure how to add a test case to Spark, can you advise? Thanks. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kevinyu98/spark spark-23486 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20795.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20795 commit 701100c11126d7437dc03ef20b484e84e2f9cb2a Author: Kevin Yu Date: 2018-03-11T06:40:27Z cache the function name from the catalog for lookupFunction
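The per-plan memoization described in the PR can be sketched independently of Spark's internals. The class and method names below are invented for illustration; the real patch works inside the analyzer's lookupFunctions, but the core idea is the same: consult the expensive catalog at most once per distinct function name within one query plan.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: memoize catalog lookups for the lifetime of one query plan.
class FunctionLookupCache {
    private final Map<String, Boolean> seen = new HashMap<>();
    private final Predicate<String> catalogLookup; // stands in for the external catalog call
    int catalogCalls = 0;                          // instrumentation for the demo only

    FunctionLookupCache(Predicate<String> catalogLookup) {
        this.catalogLookup = catalogLookup;
    }

    boolean functionExists(String name) {
        // computeIfAbsent hits the catalog only on the first lookup of each name.
        return seen.computeIfAbsent(name, n -> {
            catalogCalls++;
            return catalogLookup.test(n);
        });
    }
}
```

With a cache instance built per query plan, a query referencing `upper(...)` ten times would trigger a single catalog round trip rather than ten.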
[GitHub] spark issue #20791: [SPARK-23618][K8s][BUILD] Initialize BUILD_ARGS in docke...
Github user jooseong commented on the issue: https://github.com/apache/spark/pull/20791 Added [K8s] into the PR title. Thanks for the review.
[GitHub] spark issue #20791: [SPARK-23618][BUILD] Initialize BUILD_ARGS in docker-ima...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20791 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88157/ Test PASSed.
[GitHub] spark issue #20791: [SPARK-23618][BUILD] Initialize BUILD_ARGS in docker-ima...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20791 Merged build finished. Test PASSed.
[GitHub] spark issue #20791: [SPARK-23618][BUILD] Initialize BUILD_ARGS in docker-ima...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20791 **[Test build #88157 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88157/testReport)** for PR 20791 at commit [`096e992`](https://github.com/apache/spark/commit/096e99287a72b3ea164dbdf6c90edf4b256a2623). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20793 Merged build finished. Test FAILed.
[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20793 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88156/ Test FAILed.
[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20793 **[Test build #88156 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88156/testReport)** for PR 20793 at commit [`bb40ef2`](https://github.com/apache/spark/commit/bb40ef2e8d337508d60903a6a824b5aa45d87326). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20793 Does the `hashSeed` method produce the same hash value after this change?

```scala
scala> def hashSeed(seed: Long): Long = {
     |   val bytes = ByteBuffer.allocate(java.lang.Long.SIZE).putLong(seed).array()
     |   val lowBits = MurmurHash3.bytesHash(bytes)
     |   val highBits = MurmurHash3.bytesHash(bytes, lowBits)
     |   (highBits.toLong << 32) | (lowBits.toLong & 0xFFFFFFFFL)
     | }
hashSeed: (seed: Long)Long

scala> hashSeed(100)
res3: Long = 852394178374189935

scala> def hashSeed2(seed: Long): Long = {
     |   val bytes = ByteBuffer.allocate(java.lang.Long.BYTES).putLong(seed).array()
     |   val lowBits = MurmurHash3.bytesHash(bytes)
     |   val highBits = MurmurHash3.bytesHash(bytes, lowBits)
     |   (highBits.toLong << 32) | (lowBits.toLong & 0xFFFFFFFFL)
     | }
hashSeed2: (seed: Long)Long

scala> hashSeed2(100)
res7: Long = 1088402058313200430
```
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20727 **[Test build #88159 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88159/testReport)** for PR 20727 at commit [`d6e9160`](https://github.com/apache/spark/commit/d6e91604585b22a27fbd0b7caa0a8e96d3725400).
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20727 Merged build finished. Test PASSed.
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20727 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1454/ Test PASSed.
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20727 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88158/ Test FAILed.
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20727 Merged build finished. Test FAILed.
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20727 **[Test build #88158 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88158/testReport)** for PR 20727 at commit [`97a8422`](https://github.com/apache/spark/commit/97a8422c63931ba1709523bb9bd1f60fffee597b). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20727 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1453/ Test PASSed.
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20727 **[Test build #88158 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88158/testReport)** for PR 20727 at commit [`97a8422`](https://github.com/apache/spark/commit/97a8422c63931ba1709523bb9bd1f60fffee597b).
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20727 Merged build finished. Test PASSed.
[GitHub] spark pull request #20727: [SPARK-23577][SQL] Supports custom line separator...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20727#discussion_r173642056 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala --- @@ -42,7 +52,12 @@ class HadoopFileLinesReader( Array.empty) val attemptId = new TaskAttemptID(new TaskID(new JobID(), TaskType.MAP, 0), 0) val hadoopAttemptContext = new TaskAttemptContextImpl(conf, attemptId) -val reader = new LineRecordReader() +val reader = if (lineSeparator != "\n") { + new LineRecordReader(lineSeparator.getBytes("UTF-8")) --- End diff -- OK. Let me try to address this one.
[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20785 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88155/ Test PASSed.
[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20785 Merged build finished. Test PASSed.
[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20785 **[Test build #88155 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88155/testReport)** for PR 20785 at commit [`0034a58`](https://github.com/apache/spark/commit/0034a58437684fdcfde8511ef47278ff8bfb1fe2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17774: [SPARK-18371][Streaming] Spark Streaming backpressure ge...
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/17774 LGTM. @tdas @zsxwing, absent any objections from you in the next couple of days, I'll merge this.
[GitHub] spark issue #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][external/...
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19431 @tdas any concerns? If @omuravskiy doesn't express any objections (since these tests are basically taken directly from his linked PR) in the next couple of days, I'm inclined to merge this.
[GitHub] spark pull request #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][ex...
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/19431#discussion_r173641331 --- Diff: external/kafka-0-8/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala --- @@ -456,6 +455,60 @@ class DirectKafkaStreamSuite ssc.stop() } + test("backpressure.initialRate should honor maxRatePerPartition") { +backpressureTest(maxRatePerPartition = 1000, initialRate = 500, maxMessagesPerPartition = 250) + } + + test("use backpressure.initialRate with backpressure") { --- End diff -- Aren't the descriptions of these tests backwards, i.e. is this the one testing that maxRatePerPartition is honored?
[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/20767 Can you clarify why you want to allow only 1 cached consumer per topicpartition, closing any others at task end? It seems like opening and closing consumers would be less efficient than allowing a pool of more than one consumer per topicpartition.
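The trade-off raised here, one cached consumer per key versus a pool, can be sketched generically. Below is a hypothetical pool keyed by topic-partition (illustrative only, not the PR's actual design): acquire reuses an idle instance when one exists and creates a new one only when all are in use, and release returns the instance to the pool instead of closing it, avoiding open/close churn.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: pool several consumers per key instead of caching exactly one.
class ConsumerPool<K, C> {
    private final Map<K, Deque<C>> idle = new HashMap<>();
    private final Supplier<C> factory;
    int created = 0; // instrumentation for the demo only

    ConsumerPool(Supplier<C> factory) { this.factory = factory; }

    synchronized C acquire(K key) {
        Deque<C> q = idle.get(key);
        if (q != null && !q.isEmpty()) {
            return q.poll(); // reuse an idle instance, no new connection
        }
        created++;
        return factory.get(); // all instances busy, create another
    }

    synchronized void release(K key, C consumer) {
        // Return to the pool rather than closing, so concurrent tasks
        // on the same topic-partition don't pay a reconnect cost.
        idle.computeIfAbsent(key, k -> new ArrayDeque<>()).push(consumer);
    }
}
```

Two concurrent tasks on the same topic-partition would each get their own instance, and later tasks reuse both, which is the efficiency argument in the comment above.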
[GitHub] spark issue #20791: [SPARK-23618][BUILD] Initialize BUILD_ARGS in docker-ima...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20791 **[Test build #88157 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88157/testReport)** for PR 20791 at commit [`096e992`](https://github.com/apache/spark/commit/096e99287a72b3ea164dbdf6c90edf4b256a2623).
[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20793 **[Test build #88156 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88156/testReport)** for PR 20793 at commit [`bb40ef2`](https://github.com/apache/spark/commit/bb40ef2e8d337508d60903a6a824b5aa45d87326).
[GitHub] spark issue #20791: [SPARK-23618][BUILD] Initialize BUILD_ARGS in docker-ima...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20791 @foxish
[GitHub] spark issue #20791: [SPARK-23618][BUILD] Initialize BUILD_ARGS in docker-ima...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20791 Could you add [K8s] to the PR title?
[GitHub] spark issue #20791: [SPARK-23618][BUILD] Initialize BUILD_ARGS in docker-ima...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20791 Jenkins, ok to test
[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20793 Jenkins, ok to test
[GitHub] spark pull request #20727: [SPARK-23577][SQL] Supports custom line separator...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20727#discussion_r173639932 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala --- @@ -39,9 +39,12 @@ private[text] class TextOptions(@transient private val parameters: CaseInsensiti */ val wholeText = parameters.getOrElse(WHOLETEXT, "false").toBoolean + val lineSeparator: String = parameters.getOrElse(LINE_SEPARATOR, "\n") + require(lineSeparator.nonEmpty, s"'$LINE_SEPARATOR' cannot be an empty string.") } private[text] object TextOptions { val COMPRESSION = "compression" val WHOLETEXT = "wholetext" + val LINE_SEPARATOR = "lineSep" --- End diff -- One example might sound counterintuitive to you, but it looks less consistent with other places, at least the ones I usually refer to.
[GitHub] spark pull request #20727: [SPARK-23577][SQL] Supports custom line separator...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20727#discussion_r173639748 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala --- @@ -30,9 +31,19 @@ import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl /** * An adaptor from a [[PartitionedFile]] to an [[Iterator]] of [[Text]], which are all of the lines * in that file. + * + * @param file A part (i.e. "block") of a single file that should be read line by line. + * @param lineSeparator A line separator that should be used for each line. If the value is `None`, + * it covers `\r`, `\r\n` and `\n`. + * @param conf Hadoop configuration */ class HadoopFileLinesReader( -file: PartitionedFile, conf: Configuration) extends Iterator[Text] with Closeable { +file: PartitionedFile, +lineSeparator: Option[String], +conf: Configuration) extends Iterator[Text] with Closeable { --- End diff -- Yup, I am sorry if I wasn't clear. I mean [the doc describes](https://hadoop.apache.org/docs/r2.7.1/api/index.html?org/apache/hadoop/io/Text.html): > This class stores text using standard UTF8 encoding. I was wondering if that's an official way to use `Text`, because it sounds rather like an informal workaround.
[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20785 **[Test build #88155 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88155/testReport)** for PR 20785 at commit [`0034a58`](https://github.com/apache/spark/commit/0034a58437684fdcfde8511ef47278ff8bfb1fe2).
[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20785 Merged build finished. Test PASSed.
[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20785 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1452/ Test PASSed.
[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20785 retest this please
[GitHub] spark pull request #20767: [SPARK-23623] [SS] Avoid concurrent use of cached...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/20767#discussion_r173636109 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala --- @@ -342,80 +415,103 @@ private[kafka010] object CachedKafkaConsumer extends Logging { } } - def releaseKafkaConsumer( - topic: String, - partition: Int, - kafkaParams: ju.Map[String, Object]): Unit = { -val groupId = kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String] -val topicPartition = new TopicPartition(topic, partition) -val key = CacheKey(groupId, topicPartition) - + private def releaseConsumer(intConsumer: InternalKafkaConsumer): Unit = { synchronized { - val consumer = cache.get(key) - if (consumer != null) { -consumer.inuse = false - } else { -logWarning(s"Attempting to release consumer that does not exist") - } -} - } - /** - * Removes (and closes) the Kafka Consumer for the given topic, partition and group id. - */ - def removeKafkaConsumer( - topic: String, - partition: Int, - kafkaParams: ju.Map[String, Object]): Unit = { -val groupId = kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String] -val topicPartition = new TopicPartition(topic, partition) -val key = CacheKey(groupId, topicPartition) - -synchronized { - val removedConsumer = cache.remove(key) - if (removedConsumer != null) { -removedConsumer.close() + // If it has been marked for close, then do it any way + if (intConsumer.inuse && intConsumer.markedForClose) intConsumer.close() + intConsumer.inuse = false + + // Clear the consumer from the cache if this is indeed the consumer present in the cache + val key = new CacheKey(intConsumer.topicPartition, intConsumer.kafkaParams) + val cachedIntConsumer = cache.get(key) + if (cachedIntConsumer != null) { +if (cachedIntConsumer.eq(intConsumer)) { + // The released consumer is indeed the cached one. + cache.remove(key) +} else { + // The released consumer is not the cached one. Don't do anything. + // This should not happen as long as we maintain the invariant mentioned above. + logWarning( +s"Cached consumer not the same one as the one being release" + + s"\ncached = $cachedIntConsumer [${System.identityHashCode(cachedIntConsumer)}]" + + s"\nreleased = $intConsumer [${System.identityHashCode(intConsumer)}]") +} + } else { +// The released consumer is not in the cache. Don't do anything. +// This should not happen as long as we maintain the invariant mentioned above. +logWarning(s"Attempting to release consumer that is not in the cache") } } } /** * Get a cached consumer for groupId, assigned to topic and partition. * If matching consumer doesn't already exist, will be created using kafkaParams. + * The returned consumer must be released explicitly using [[KafkaDataConsumer.release()]]. + * + * Note: This method guarantees that the consumer returned is not currently in use by any one + * else. Within this guarantee, this will make a best effort attempt to re-use consumers by + * caching them and tracking when they are in use. */ - def getOrCreate( - topic: String, - partition: Int, - kafkaParams: ju.Map[String, Object]): CachedKafkaConsumer = synchronized { -val groupId = kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String] -val topicPartition = new TopicPartition(topic, partition) -val key = CacheKey(groupId, topicPartition) - -// If this is reattempt at running the task, then invalidate cache and start with -// a new consumer + def acquire( + topicPartition: TopicPartition, + kafkaParams: ju.Map[String, Object], + useCache: Boolean): KafkaDataConsumer = synchronized { +val key = new CacheKey(topicPartition, kafkaParams) +val existingInternalConsumer = cache.get(key) + +lazy val newInternalConsumer = new InternalKafkaConsumer(topicPartition, kafkaParams) + if (TaskContext.get != null && TaskContext.get.attemptNumber >= 1) { - removeKafkaConsumer(topic, partition, kafkaParams) - val consumer = new CachedKafkaConsumer(topicPartition, kafkaParams) - consumer.inuse = true - cache.put(key, consumer) - consumer -} else { - if
[GitHub] spark pull request #20767: [SPARK-23623] [SS] Avoid concurrent use of cached...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/20767#discussion_r173636002 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala --- @@ -342,80 +415,103 @@ private[kafka010] object CachedKafkaConsumer extends Logging { } } - def releaseKafkaConsumer( - topic: String, - partition: Int, - kafkaParams: ju.Map[String, Object]): Unit = { -val groupId = kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String] -val topicPartition = new TopicPartition(topic, partition) -val key = CacheKey(groupId, topicPartition) - + private def releaseConsumer(intConsumer: InternalKafkaConsumer): Unit = { synchronized { - val consumer = cache.get(key) - if (consumer != null) { -consumer.inuse = false - } else { -logWarning(s"Attempting to release consumer that does not exist") - } -} - } - /** - * Removes (and closes) the Kafka Consumer for the given topic, partition and group id. - */ - def removeKafkaConsumer( - topic: String, - partition: Int, - kafkaParams: ju.Map[String, Object]): Unit = { -val groupId = kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String] -val topicPartition = new TopicPartition(topic, partition) -val key = CacheKey(groupId, topicPartition) - -synchronized { - val removedConsumer = cache.remove(key) - if (removedConsumer != null) { -removedConsumer.close() + // If it has been marked for close, then do it any way + if (intConsumer.inuse && intConsumer.markedForClose) intConsumer.close() --- End diff -- Is it possible we can have the following condition, and should intConsumer.close() be called in that case? !intConsumer.inuse && intConsumer.markedForClose
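The corner case being raised can be seen in a stripped-down model of the release path (illustrative only, not the PR's actual code): with the guard written as `inuse && markedForClose`, a consumer that is not currently in use but has been marked for close slips through `release()` without ever being closed.

```java
// Stripped-down model of the release path under discussion (not Spark's code).
class PooledConsumer {
    boolean inuse = false;
    boolean markedForClose = false;
    boolean closed = false;

    void close() { closed = true; }

    // Guard as written in the diff: only closes consumers still marked in use.
    void release() {
        if (inuse && markedForClose) close();
        inuse = false;
    }
}
```

A `PooledConsumer` with `inuse == false` and `markedForClose == true` stays open after `release()`, which is exactly the `!intConsumer.inuse && intConsumer.markedForClose` condition the comment asks about.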
[GitHub] spark pull request #20727: [SPARK-23577][SQL] Supports custom line separator...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20727#discussion_r173633651 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala --- @@ -30,9 +31,19 @@ import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl /** * An adaptor from a [[PartitionedFile]] to an [[Iterator]] of [[Text]], which are all of the lines * in that file. + * + * @param file A part (i.e. "block") of a single file that should be read line by line. + * @param lineSeparator A line separator that should be used for each line. If the value is `None`, + * it covers `\r`, `\r\n` and `\n`. + * @param conf Hadoop configuration */ class HadoopFileLinesReader( -file: PartitionedFile, conf: Configuration) extends Iterator[Text] with Closeable { +file: PartitionedFile, +lineSeparator: Option[String], +conf: Configuration) extends Iterator[Text] with Closeable { --- End diff -- Some methods of Hadoop's Text have such an assumption about UTF-8 encoding. In general, a datasource could eliminate the restriction by using the Text class as a container of raw bytes and calling methods like **getBytes()** and **getLength()**.
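The charset assumption matters because the byte representation of the same string differs by encoding, so a separator converted with a hard-coded charset may not match the bytes actually on disk. A minimal illustration (the separator character is chosen arbitrarily):

```java
import java.nio.charset.StandardCharsets;

public class CharsetBytes {
    public static void main(String[] args) {
        String sep = "\u00A7"; // SECTION SIGN, a non-ASCII separator for illustration
        // The same character encodes to a different number of bytes per charset.
        System.out.println(sep.getBytes(StandardCharsets.UTF_8).length);      // 2
        System.out.println(sep.getBytes(StandardCharsets.ISO_8859_1).length); // 1
    }
}
```

A reader that always calls `getBytes("UTF-8")` on the separator would therefore fail to find record boundaries in a Latin-1 encoded file, which is why treating the payload as raw bytes sidesteps the problem.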
[GitHub] spark issue #20794: [SPARK-23644][CORE][UI] Use absolute path for REST call ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20794 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88153/ Test PASSed.
[GitHub] spark issue #20794: [SPARK-23644][CORE][UI] Use absolute path for REST call ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20794 Merged build finished. Test PASSed.
[GitHub] spark issue #20794: [SPARK-23644][CORE][UI] Use absolute path for REST call ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20794 **[Test build #88153 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88153/testReport)** for PR 20794 at commit [`17ea399`](https://github.com/apache/spark/commit/17ea399162167092e0362f90b49a03397ae82afe). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20727: [SPARK-23577][SQL] Supports custom line separator...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20727#discussion_r173633462 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala --- @@ -42,7 +52,12 @@ class HadoopFileLinesReader( Array.empty) val attemptId = new TaskAttemptID(new TaskID(new JobID(), TaskType.MAP, 0), 0) val hadoopAttemptContext = new TaskAttemptContextImpl(conf, attemptId) -val reader = new LineRecordReader() +val reader = if (lineSeparator != "\n") { + new LineRecordReader(lineSeparator.getBytes("UTF-8")) --- End diff -- Why do you think this class is responsible for converting the string separator to an array of bytes? In particular, the restriction to a single charset is not clear. The purpose of the class is to provide the Iterator interface over records/lines to datasources, and this class doesn't have to know about the datasource's charset. I would not stick to a particular charset here; I would expose the separator parameter as `Option[Array[Byte]]`, just as the LineReader provides a constructor with `byte[] recordDelimiter`.
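The byte-oriented approach MaxGekk suggests can be sketched as follows. This is a simplified Python stand-in (the function `split_records` is hypothetical, not Hadoop's `LineRecordReader`), showing that a separator treated as raw bytes needs no charset assumption:

```python
from typing import List, Optional


def split_records(data: bytes, sep: Optional[bytes]) -> List[bytes]:
    """Split raw bytes on an arbitrary byte delimiter.

    A simplified stand-in for a byte-oriented record reader; Hadoop's
    LineReader similarly accepts a byte[] recordDelimiter. When sep is
    None, fall back to universal newlines (\r, \r\n, \n), matching the
    default behavior described in the scaladoc above.
    """
    if sep is None:
        normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
        return normalized.split(b"\n")
    return data.split(sep)


# A non-UTF-8-specific delimiter works because no decoding is involved.
assert split_records(b"a\x00b\x00c", b"\x00") == [b"a", b"b", b"c"]
assert split_records(b"a\r\nb\nc", None) == [b"a", b"b", b"c"]
```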
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20779 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88154/ Test PASSed.
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20779 Merged build finished. Test PASSed.
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20779 **[Test build #88154 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88154/testReport)** for PR 20779 at commit [`603ce0f`](https://github.com/apache/spark/commit/603ce0fb29bfa5b5c0cfea69fb72e2a3128e772a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20727: [SPARK-23577][SQL] Supports custom line separator...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20727#discussion_r173632775 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala --- @@ -39,9 +39,12 @@ private[text] class TextOptions(@transient private val parameters: CaseInsensiti */ val wholeText = parameters.getOrElse(WHOLETEXT, "false").toBoolean + val lineSeparator: String = parameters.getOrElse(LINE_SEPARATOR, "\n") + require(lineSeparator.nonEmpty, s"'$LINE_SEPARATOR' cannot be an empty string.") } private[text] object TextOptions { val COMPRESSION = "compression" val WHOLETEXT = "wholetext" + val LINE_SEPARATOR = "lineSep" --- End diff -- In the example above, the term "line" is counterintuitive to me. I think of a line in a text file as a sequence of one or more characters displayed in a single horizontal row. I would prefer the short name *recSep*, or *recordSeparator* as the long name. I guess that when the option is used, it will separate text by something other than newline characters like `'\n'` or `'\r\n'`.
[GitHub] spark issue #18994: [SPARK-21784][SQL] Adds support for defining information...
Github user ioana-delaney commented on the issue: https://github.com/apache/spark/pull/18994 @sureshthalamati Hi Suresh, We are planning to proceed with the performance improvements. Will you be able to continue working on this PR? Thanks.
[GitHub] spark issue #19775: [SPARK-22343][core] Add support for publishing Spark met...
Github user matyix commented on the issue: https://github.com/apache/spark/pull/19775 For those who are still interested in using Prometheus, you can get the standalone package and source code here: https://github.com/banzaicloud/spark-metrics . Happy monitoring: catch issues early and avoid those PagerDuty notifications :).
[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20785 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88152/ Test FAILed.
[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20785 Merged build finished. Test FAILed.
[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20785 **[Test build #88152 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88152/testReport)** for PR 20785 at commit [`0034a58`](https://github.com/apache/spark/commit/0034a58437684fdcfde8511ef47278ff8bfb1fe2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20763 retest this please
[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20793 Good catch, LGTM
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20043 LGTM
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20779 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1451/ Test PASSed.
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20779 **[Test build #88154 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88154/testReport)** for PR 20779 at commit [`603ce0f`](https://github.com/apache/spark/commit/603ce0fb29bfa5b5c0cfea69fb72e2a3128e772a).
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20779 Merged build finished. Test PASSed.
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20779 Let me reduce the number of loops. Another option is to revert this change and use the non-loop version that worked without an exception.
[GitHub] spark issue #20794: [SPARK-23644][CORE][UI] Use absolute path for REST call ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20794 **[Test build #88153 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88153/testReport)** for PR 20794 at commit [`17ea399`](https://github.com/apache/spark/commit/17ea399162167092e0362f90b49a03397ae82afe).
[GitHub] spark issue #20794: [SPARK-23644][CORE][UI] Use absolute path for REST call ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20794 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1450/ Test PASSed.
[GitHub] spark issue #20794: [SPARK-23644][CORE][UI] Use absolute path for REST call ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20794 Merged build finished. Test PASSed.
[GitHub] spark pull request #20794: [SPARK-23644][CORE][UI] Use absolute path for RES...
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/20794 [SPARK-23644][CORE][UI] Use absolute path for REST call in SHS ## What changes were proposed in this pull request? The SHS uses a relative path for the REST API call that retrieves the list of applications. When the SHS is consumed through a proxy, this can be an issue if the path doesn't end with a "/". Therefore, we should use an absolute path for the REST call, as is done for all the other resources. ## How was this patch tested? Manual tests. Before the change: ![screen shot 2018-03-10 at 4 22 02 pm](https://user-images.githubusercontent.com/8821783/37244190-8ccf9d40-2485-11e8-8fa9-345bc81472fc.png) After the change: ![screen shot 2018-03-10 at 4 36 34 pm 1](https://user-images.githubusercontent.com/8821783/37244201-a1922810-2485-11e8-8856-eeab2bf5e180.png) You can merge this pull request into a Git repository by running: $ git pull https://github.com/mgaido91/spark SPARK-23644 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20794.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20794 commit 17ea399162167092e0362f90b49a03397ae82afe Author: Marco Gaido Date: 2018-03-10T15:49:52Z [SPARK-23644][CORE][UI] Use absolute path for REST call in SHS
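The proxy issue described in this PR is the standard relative-URL resolution rule: a relative path is resolved against the parent of the last path segment. A quick Python illustration (the proxy URLs below are hypothetical, not the SHS's actual endpoints):

```python
from urllib.parse import urljoin

# Without a trailing "/" on the proxied base URL, the last path segment
# ("shs") is treated as a resource name and replaced, so the proxy prefix
# is silently dropped when a relative REST path is resolved against it.
base_no_slash = "https://proxy.example.com/shs"
base_with_slash = "https://proxy.example.com/shs/"

assert urljoin(base_no_slash, "api/v1/applications") == \
    "https://proxy.example.com/api/v1/applications"      # prefix lost
assert urljoin(base_with_slash, "api/v1/applications") == \
    "https://proxy.example.com/shs/api/v1/applications"  # prefix kept
```

An absolute path sidesteps the ambiguity entirely, since it no longer depends on how the base URL happens to be written.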
[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r173625828 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala --- @@ -72,6 +72,29 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext { } } + // Text and Parquet format does not allow wrting data frame with empty schema. + Seq("parquet", "text").foreach { format => +test(s"SPARK-23372 writing empty dataframe should produce AnalysisException - $format") { + withTempPath { outputPath => +intercept[AnalysisException] { + spark.emptyDataFrame.write.format(format).save(outputPath.toString) +} + } +} + } + + // Formats excluding text and parquet allow writing empty data frames to files. + allFileBasedDataSources.filterNot(p => p == "text" || p == "parquet").foreach { format => +test(s"SPARK-23372 writing empty dataframe and reading from it - $format") { + withTempPath { outputPath => + spark.emptyDataFrame.write.format(format).save(outputPath.toString) + intercept[AnalysisException] { +val df = spark.read.format(format).load(outputPath.toString) --- End diff -- Sorry if I misunderstood. The link is https://github.com/apache/spark/pull/20579#issuecomment-364994881. Is that the right link?
[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20793 Can one of the admins verify this patch?
[GitHub] spark pull request #20793: [SPARK-23643] Shrinking the buffer in hashSeed up...
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/20793 [SPARK-23643] Shrinking the buffer in hashSeed up to size of the seed parameter ## What changes were proposed in this pull request? The hashSeed method allocates 64 bytes when only 8 are needed. The remaining bytes are always zeros, so they can be excluded from the hash calculation because they don't differentiate inputs. ## How was this patch tested? By running the existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/MaxGekk/spark-1 hash-buff-size Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20793.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20793 commit bb40ef2e8d337508d60903a6a824b5aa45d87326 Author: Maxim Gekk Date: 2018-03-10T13:14:33Z Shrinking the buffer up to size of the long type
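The observation in the PR description can be sketched numerically. The Python snippet below is a hedged illustration, not Spark's actual MurmurHash3-based `hashSeed`; it shows that padding the 8-byte seed out to 64 bytes only appends a constant run of zeros, which adds no information to differentiate inputs:

```python
import struct

# Pack a long seed into a buffer of the given size, zero-padded on the
# right -- a model of the 64-byte vs 8-byte buffers discussed in the PR.
def seed_buffer(seed: int, size: int) -> bytes:
    return struct.pack(">q", seed) + b"\x00" * (size - 8)

seed = 0x1234_5678_9ABC_DEF0
buf64 = seed_buffer(seed, 64)
buf8 = seed_buffer(seed, 8)

assert len(buf64) == 64 and len(buf8) == 8
# The extra 56 bytes are a constant suffix, identical for every seed,
# so including them in the hash input cannot help distinguish seeds.
assert buf64 == buf8 + b"\x00" * 56
```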
[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20701 Merged build finished. Test PASSed.
[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20701 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88150/ Test PASSed.
[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20701 **[Test build #88150 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88150/testReport)** for PR 20701 at commit [`b3d0523`](https://github.com/apache/spark/commit/b3d0523e5eed89dc800d0678adde59eb4ac4343e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20525 late LGTM too.
[GitHub] spark pull request #20788: [WIP][SPARK-21030][PYTHON][SQL] Adds more types f...
Github user DylanGuedes commented on a diff in the pull request: https://github.com/apache/spark/pull/20788#discussion_r173623998 --- Diff: python/pyspark/sql/dataframe.py --- @@ -437,10 +437,11 @@ def hint(self, name, *parameters): if not isinstance(name, str): raise TypeError("name should be provided as str, got {0}".format(type(name))) +allowed = [str, list, float, int] for p in parameters: -if not isinstance(p, str): +if not type(p) in allowed: --- End diff -- Didn't know that it was possible, nice!
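The feature being discussed is Python's built-in `isinstance` accepting a tuple of types, which is generally preferred over the `type(p) in allowed` check in the diff because it also honors subclasses:

```python
# isinstance accepts a tuple of types -- the idiomatic way to express
# "any of these types". Unlike `type(p) in allowed`, it also accepts
# subclasses (e.g. bool is a subclass of int).
allowed = (str, list, float, int)

assert isinstance("broadcast", allowed)
assert isinstance(3.14, allowed)
assert isinstance(True, allowed)          # bool passes via its int parent
assert not isinstance({"k": 1}, allowed)  # dict is not among the types
assert type(True) not in (str, list, float, int)  # the type-in check misses it
```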
[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20785 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1449/ Test PASSed.
[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20785 **[Test build #88152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88152/testReport)** for PR 20785 at commit [`0034a58`](https://github.com/apache/spark/commit/0034a58437684fdcfde8511ef47278ff8bfb1fe2).
[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20785 Merged build finished. Test PASSed.
[GitHub] spark issue #20717: [SPARK-23564][SQL] Add isNotNull check for left anti and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20717 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88149/ Test PASSed.
[GitHub] spark issue #20717: [SPARK-23564][SQL] Add isNotNull check for left anti and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20717 Merged build finished. Test PASSed.
[GitHub] spark issue #20717: [SPARK-23564][SQL] Add isNotNull check for left anti and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20717 **[Test build #88149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88149/testReport)** for PR 20717 at commit [`9e2d993`](https://github.com/apache/spark/commit/9e2d993d691ad37b230c9e14d16148b9dc9727e6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20779 I don't think so. There is an option to change the heap size for test execution, but I am not sure we are allowed to do that, or that it would be a good idea. Let's hear others' opinions...
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20779 Ah, I increased the heap size (4GB) in my environment with IntelliJ. Should we create a class like https://github.com/apache/spark/pull/20636?
[GitHub] spark issue #20719: [SPARK-23568][ML] Use metadata numAttributes if availabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20719 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88151/ Test PASSed.
[GitHub] spark issue #20719: [SPARK-23568][ML] Use metadata numAttributes if availabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20719 Merged build finished. Test PASSed.
[GitHub] spark issue #20719: [SPARK-23568][ML] Use metadata numAttributes if availabl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20719 **[Test build #88151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88151/testReport)** for PR 20719 at commit [`2d64a90`](https://github.com/apache/spark/commit/2d64a9028ea138aa8b538da25637771543109076). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20790: AccumulatorV2 subclass isZero scaladoc fix
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20790 Wait... I just found that you opened a JIRA - SPARK-23642. Please link it by prefixing the title with `[SPARK-23642][DOCS] ...`; see https://spark.apache.org/contributing.html
[GitHub] spark issue #20790: AccumulatorV2 subclass isZero scaladoc fix
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20790 Shall we fix the title to `[MINOR][DOCS] AccumulatorV2 ...` to be consistent with other PRs?
[GitHub] spark pull request #20790: AccumulatorV2 subclass isZero scaladoc fix
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20790#discussion_r173621260 --- Diff: core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala --- @@ -290,7 +290,8 @@ class LongAccumulator extends AccumulatorV2[jl.Long, jl.Long] { private var _count = 0L /** - * Adds v to the accumulator, i.e. increment sum by v and count by 1. + * Returns false if this accumulator has had any values added to it or the sum is non-zero. + * --- End diff -- I think this duplicates the doc from `AccumulatorV2.isZero`. Can we simply remove this wrong doc and revert the other changes, so that we can reuse the inherited doc from `AccumulatorV2.isZero` in all places?
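For context, the `isZero` contract under discussion can be modeled with a minimal sketch. This is a hypothetical Python analog, not Spark's actual `AccumulatorV2` API:

```python
# Minimal analog of the LongAccumulator contract being documented:
# is_zero returns True only when no values have been added and the sum
# is zero, i.e. the accumulator is still in its initial state.
class LongAccumulator:
    def __init__(self):
        self._sum = 0
        self._count = 0

    def add(self, v: int) -> None:
        # Adds v to the accumulator: increment sum by v and count by 1.
        self._sum += v
        self._count += 1

    def is_zero(self) -> bool:
        return self._sum == 0 and self._count == 0


acc = LongAccumulator()
assert acc.is_zero()
acc.add(0)                # adding zero still increments the count...
assert not acc.is_zero()  # ...so the accumulator is no longer "zero"
```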
[GitHub] spark issue #20771: [SPARK-23587][SQL] Add interpreted execution for MapObje...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20771 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88146/ Test PASSed.
[GitHub] spark issue #20771: [SPARK-23587][SQL] Add interpreted execution for MapObje...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20771 Merged build finished. Test PASSed.
[GitHub] spark issue #20771: [SPARK-23587][SQL] Add interpreted execution for MapObje...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20771 **[Test build #88146 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88146/testReport)** for PR 20771 at commit [`e725608`](https://github.com/apache/spark/commit/e725608d1b38a7a2b1a0677afca947cec6a12801). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20719: [SPARK-23568][ML] Use metadata numAttributes if availabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20719 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1448/ Test PASSed.
[GitHub] spark issue #20719: [SPARK-23568][ML] Use metadata numAttributes if availabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20719 Merged build finished. Test PASSed.