[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127895248

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1912,6 +1913,26 @@ class Analyzer(
         nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
       }.copy(child = newChild)
+      case j: Join if j.condition.isDefined && !j.condition.get.deterministic =>
+        j match {
+          // We can push down non-deterministic joining keys.
+          // We can't push down non-deterministic conditions.
+          case ExtractEquiJoinKeys(_, leftKeys, rightKeys, conditions, _, _)
--- End diff --

Joining keys can only come from an equi-join. This is exactly the use case discussed on the dev mailing list, and it is genuinely useful there. A general non-deterministic join condition pushdown doesn't make much sense: predicates like `rand(1) > 0 && rand(11) < 0` are a serious concern, because the join results can differ before and after the pushdown.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127894772

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1912,6 +1913,26 @@ class Analyzer(
         nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
       }.copy(child = newChild)
+      case j: Join if j.condition.isDefined && !j.condition.get.deterministic =>
+        j match {
+          // We can push down non-deterministic joining keys.
--- End diff --

Even for an equi-join, how about `rand(a) = rand(b)`?
[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127894313

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1912,6 +1913,26 @@ class Analyzer(
         nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
       }.copy(child = newChild)
+      case j: Join if j.condition.isDefined && !j.condition.get.deterministic =>
+        j match {
+          // We can push down non-deterministic joining keys.
--- End diff --

IIUC, joining keys actually satisfy what you said: they are evaluated in the same order and the same number of times as when we don't push them down. I can't think of an example where that doesn't hold, so may I ask if you have one?
[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127893995

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1912,6 +1913,26 @@ class Analyzer(
         nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
       }.copy(child = newChild)
+      case j: Join if j.condition.isDefined && !j.condition.get.deterministic =>
+        j match {
+          // We can push down non-deterministic joining keys.
+          // We can't push down non-deterministic conditions.
+          case ExtractEquiJoinKeys(_, leftKeys, rightKeys, conditions, _, _)
--- End diff --

Supporting only equi-joins does not sound reasonable here; the join condition can be any predicate. How about adding a SQLConf flag to control it? We could simply push it down regardless of whether the semantics stay the same, to be consistent with Hive, and turn the flag off by default.
[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127893543

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1912,6 +1913,26 @@ class Analyzer(
         nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
       }.copy(child = newChild)
+      case j: Join if j.condition.isDefined && !j.condition.get.deterministic =>
+        j match {
+          // We can push down non-deterministic joining keys.
--- End diff --

The major point here is that a non-deterministic join condition push-down is safe only when the results are exactly the same before and after the push-down. Once we push it down, it will basically be evaluated for each row of that side. Would it be evaluated in the same order and the same number of times if we did not push it down? We can find many different scenarios that break this.
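The evaluation-count concern can be made concrete with a small plain-Scala sketch (no Spark; all names here are illustrative). A predicate evaluated on the join output runs once per equi-matching pair, while the pushed-down version runs once per row of one side:

```scala
// Plain-Scala sketch of why pushing a non-deterministic predicate below a
// join can change how often (and on which rows) it is evaluated.
var evalCount = 0
def nonDet(x: Int): Boolean = { evalCount += 1; x % 2 == 0 }

val left  = Seq(1, 2, 3)
val right = Seq(2, 3, 4)

// Without pushdown: the predicate runs once per equi-matching pair.
evalCount = 0
val joined = for (l <- left; r <- right; if l == r && nonDet(l)) yield (l, r)
val countOnJoin = evalCount

// With pushdown: the predicate runs once per row of the left side,
// including rows that never find a match.
evalCount = 0
val pushed = for (l <- left.filter(nonDet); r <- right; if l == r) yield (l, r)
val countPushed = evalCount
```

The stand-in predicate here is actually deterministic, so the two outputs happen to agree; a real `rand`-backed predicate would see three draws instead of two, which is exactly why the results can differ.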
[GitHub] spark issue #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties from s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18668

Can one of the admins verify this patch?
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
GitHub user yaooqinn opened a pull request: https://github.com/apache/spark/pull/18668

[SPARK-21451][SQL] Get `spark.hadoop.*` properties from sysProps to hiveconf

## What changes were proposed in this pull request?

Get `spark.hadoop.*` properties from sysProps to hiveconf.

## How was this patch tested?

UT

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yaooqinn/spark SPARK-21451

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18668.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18668

commit 89d9b86616196fde5d0b3a08fb284e6af6afe588
Author: Kent Yao
Date: 2017-07-18T06:41:24Z

    HiveConf in SparkSQLCLIDriver doesn't respect spark.hadoop.some.hive.variables
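The mechanics being proposed can be sketched independently of Hive: copy every `spark.hadoop.`-prefixed property into the Hadoop-side configuration with the prefix stripped. In this sketch a plain mutable map stands in for the real `HiveConf`, and the property names are made up for illustration:

```scala
import scala.collection.mutable

// Hypothetical stand-ins: sysProps mimics JVM system properties, hiveConf
// mimics a HiveConf. Only spark.hadoop.* entries are copied, prefix removed.
val sysProps = Map(
  "spark.hadoop.hive.exec.dynamic.partition" -> "true",
  "spark.app.name"                           -> "cli"
)
val hiveConf = mutable.Map.empty[String, String]
sysProps.foreach { case (k, v) =>
  if (k.startsWith("spark.hadoop.")) {
    hiveConf(k.stripPrefix("spark.hadoop.")) = v
  }
}
```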
[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127893174

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1912,6 +1913,26 @@ class Analyzer(
         nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
       }.copy(child = newChild)
+      case j: Join if j.condition.isDefined && !j.condition.get.deterministic =>
+        j match {
+          // We can push down non-deterministic joining keys.
--- End diff --

We use `ExtractEquiJoinKeys` to extract the joining keys. You can check it.
[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18555

@gatorsmile Could you please review this code again?
[GitHub] spark issue #18656: [SPARK-21441]Incorrect Codegen in SortMergeJoinExec resu...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18656

Will `CodegenFallback` be used in whole-stage codegen? I think it's not supported there.
[GitHub] spark issue #12646: [SPARK-14878][SQL] Trim characters string function suppo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12646

**[Test build #79697 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79697/testReport)** for PR 12646 at commit [`9bb80ea`](https://github.com/apache/spark/commit/9bb80eaf8e0b4339850d8c48e221c8ad1e477552).
[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127892847

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1912,6 +1913,26 @@ class Analyzer(
         nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
       }.copy(child = newChild)
+      case j: Join if j.condition.isDefined && !j.condition.get.deterministic =>
+        j match {
+          // We can push down non-deterministic joining keys.
--- End diff --

What is the join key? Any definition?
[GitHub] spark issue #18655: [SPARK-21440][SQL][PYSPARK] Refactor ArrowConverters and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18655

**[Test build #79696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79696/testReport)** for PR 18655 at commit [`8ffedda`](https://github.com/apache/spark/commit/8ffedda9f05d379d700aef95dca049a751374f87).
[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127891910

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1912,6 +1913,26 @@ class Analyzer(
         nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
       }.copy(child = newChild)
+      case j: Join if j.condition.isDefined && !j.condition.get.deterministic =>
+        j match {
+          // We can push down non-deterministic joining keys.
--- End diff --

For the different join types, I think the joining keys are used to find matching/non-matching rows. Currently I can't think of a case where we can't push down non-deterministic joining keys. Maybe you can show an example?
[GitHub] spark issue #18656: [SPARK-21441]Incorrect Codegen in SortMergeJoinExec resu...
Github user DonnyZone commented on the issue: https://github.com/apache/spark/pull/18656

Hi @cloud-fan, @vanzin, could you help to take a look?
[GitHub] spark issue #18620: [SPARK-21401][ML][MLLIB] add poll function for BoundedPr...
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18620

Hi @MLnick, @srowen. My tests show that `pq.poll` is not significantly faster than `pq.toArray.sortBy`, but it is significantly faster than `pq.toArray.sorted`. Since not every `pq.toArray.sorted` (such as the one used in `topByKey`) can be replaced by `pq.toArray.sortBy`, replacing `pq.toArray.sorted` with `pq.poll` is beneficial. You can compare the performance of `pq.sorted`, `pq.sortBy`, and `pq.poll` using: https://github.com/apache/spark/pull/18624 The performance of `pq.toArray.sortBy` is about the same as `pq.poll`, roughly a 20% improvement over `pq.toArray.sorted`.
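The two drain strategies being compared can be illustrated with `java.util.PriorityQueue` (a stand-in here for Spark's `BoundedPriorityQueue`): `poll()` emits elements in heap order directly, whereas `toArray` copies the backing array, which is in no particular order, and so needs a separate sort. This sketch only demonstrates that both produce the same ordering, not the timing difference:

```scala
import java.util.PriorityQueue

val elems = Seq(5, 1, 4, 2, 3)

// poll-based drain: heap pops come out already sorted, no extra sort pass.
val pq1 = new PriorityQueue[Integer]()
elems.foreach(x => pq1.add(x))
val byPoll = Iterator.continually(pq1.poll()).takeWhile(_ != null).map(_.intValue).toList

// toArray-based drain: copy the unordered backing array, then sort it.
val pq2 = new PriorityQueue[Integer]()
elems.foreach(x => pq2.add(x))
val bySort = pq2.toArray.map(_.asInstanceOf[Integer].intValue).toList.sorted
```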
[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18654

**[Test build #79695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79695/testReport)** for PR 18654 at commit [`d118d68`](https://github.com/apache/spark/commit/d118d685374242599a12d6536675ba7aeae4bfb7).
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127888746

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.test.SharedSQLContext
+
+class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
+
+  test("empty file should be skipped while write to file") {
+    withTempPath { dir =>
--- End diff --

Much clearer :) No need to actually create source files.
[GitHub] spark issue #18468: [SPARK-20873][SQL] Creat CachedBatchColumnVector to abst...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18468

`ArrowColumnVector` is also a wrapper for an Arrow vector, and it doesn't introduce the vector-type stuff.
[GitHub] spark issue #18468: [SPARK-20873][SQL] Enhance ColumnVector to support compr...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18468

@cloud-fan Thank you for your comments. Based on [this discussion](https://github.com/apache/spark/pull/18468#discussion_r125395003), I introduced `VectorType`. I have just seen @ueshin's `ArrowColumnVector` implementation. I will update `CachedBatchColumnVector` based on your comments and @ueshin's implementation.
[GitHub] spark pull request #18487: [SPARK-21243][Core] Limit no. of map outputs in a...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18487#discussion_r127885748

--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -277,11 +290,13 @@ final class ShuffleBlockFetcherIterator(
       } else if (size < 0) {
         throw new BlockException(blockId, "Negative block size " + size)
       }
-      if (curRequestSize >= targetRequestSize) {
+      if (curRequestSize >= targetRequestSize ||
+          curBlocks.size >= maxBlocksInFlightPerAddress) {
--- End diff --

We may end up with a lot of adjacent fetch requests in the queue; shall we shuffle the request queue before fetching?
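The concern can be sketched in a few lines (illustrative names only, not the real `ShuffleBlockFetcherIterator` types): once blocks per request are capped, a single remote address can contribute several back-to-back requests, and shuffling the queue spreads the in-flight requests across addresses without changing the total work:

```scala
import scala.util.Random

// Hypothetical stand-in for a shuffle fetch request targeting one executor.
case class FetchRequest(address: String, blockIds: Seq[String])

// After splitting by a maxBlocksInFlightPerAddress-style cap, requests for
// the same executor sit next to each other in the queue.
val requests = Seq(
  FetchRequest("exec-1", Seq("b1", "b2")),
  FetchRequest("exec-1", Seq("b3", "b4")),
  FetchRequest("exec-1", Seq("b5", "b6")),
  FetchRequest("exec-2", Seq("b7"))
)

// Shuffling preserves the set of requests but randomizes which address
// each in-flight slot targets.
val randomized = Random.shuffle(requests)
```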
[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18654

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79694/
[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18654

Merged build finished. Test PASSed.
[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18654

**[Test build #79694 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79694/testReport)** for PR 18654 at commit [`f7d7c09`](https://github.com/apache/spark/commit/f7d7c091fbf11dde9e1dde0dae574d477406f5ed).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18649: [SPARK-21395][SQL] Spark SQL hive-thriftserver doesn't r...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18649

cc @jerryshao
[GitHub] spark issue #18468: [SPARK-20873][SQL] Enhance ColumnVector to support compr...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18468

I don't think this PR has a good abstraction of the problem. For the table cache, our goal is not to make the compressed data a `ColumnVector`, but to have an efficient way to convert the compressed data (a byte array) to a `ColumnVector`. I think the most efficient way is to not do the conversion at all, but to have a wrapper, i.e. a `class CachedBatchColumnVector(data: Array[Byte])` that implements the various `getXXX` methods by doing the decompression. Then we don't need to introduce the `VectorType` concept or change `ColumnVector`. @kiszk what do you think?
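A minimal sketch of the wrapper idea, using an assumed run-length-encoded byte layout as a stand-in for Spark's real cached-batch format: the accessor decodes on demand instead of materializing a fully decompressed vector up front.

```scala
import java.nio.ByteBuffer

// Hypothetical wrapper: `data` holds repeated (runLength: Int, value: Int)
// pairs. getInt scans the runs and decompresses lazily, rather than
// converting the whole batch to a ColumnVector first.
class CachedBatchColumnVector(data: Array[Byte]) {
  def getInt(rowId: Int): Int = {
    val buf = ByteBuffer.wrap(data)
    var remaining = rowId
    while (buf.hasRemaining) {
      val run = buf.getInt()
      val value = buf.getInt()
      if (remaining < run) return value
      remaining -= run
    }
    throw new IndexOutOfBoundsException(s"row $rowId")
  }
}

// Encode the column [7, 7, 7, 9] as runs (3 x 7) and (1 x 9).
val bytes = ByteBuffer.allocate(16).putInt(3).putInt(7).putInt(1).putInt(9).array()
val vec = new CachedBatchColumnVector(bytes)
```

The per-call scan is obviously not how a production accessor would work (it would cache the decoder position), but it shows the shape of the `getXXX`-decompresses-on-read design.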
[GitHub] spark issue #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame to avoid...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18634

@cloud-fan @jiangxb1987 Thanks for the help! I will refine it and post the result of a manual test later today :)
[GitHub] spark pull request #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame t...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18634#discussion_r127882623

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLWindowFunctionSuite.scala ---
@@ -356,6 +356,42 @@ class SQLWindowFunctionSuite extends QueryTest with SharedSQLContext {
     spark.catalog.dropTempView("nums")
   }
+  test("window function: mutiple window expressions specified by range in a single expression") {
+    val nums = sparkContext.parallelize(1 to 10).map(x => (x, x % 2)).toDF("x", "y")
+    nums.createOrReplaceTempView("nums")
--- End diff --

Also, this test case doesn't cover the case where CurrentRow is not in the window frame. We'd better add that scenario.
[GitHub] spark issue #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame to avoid...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18634

@jinxing64 I think this patch is straightforward. Can you do a manual test that OOMs before and works after this PR? We can put the test in the PR description so that other people can try it out.
[GitHub] spark pull request #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame t...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18634#discussion_r127882430

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLWindowFunctionSuite.scala ---
@@ -356,6 +356,42 @@ class SQLWindowFunctionSuite extends QueryTest with SharedSQLContext {
     spark.catalog.dropTempView("nums")
   }
+  test("window function: mutiple window expressions specified by range in a single expression") {
+    val nums = sparkContext.parallelize(1 to 10).map(x => (x, x % 2)).toDF("x", "y")
+    nums.createOrReplaceTempView("nums")
--- End diff --

BTW, this test is not very related to this PR; it just adds test coverage for the range window frame.
[GitHub] spark pull request #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame t...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18634#discussion_r127882358

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLWindowFunctionSuite.scala ---
@@ -356,6 +356,42 @@ class SQLWindowFunctionSuite extends QueryTest with SharedSQLContext {
     spark.catalog.dropTempView("nums")
   }
+  test("window function: mutiple window expressions specified by range in a single expression") {
+    val nums = sparkContext.parallelize(1 to 10).map(x => (x, x % 2)).toDF("x", "y")
+    nums.createOrReplaceTempView("nums")
--- End diff --

Wrap your test with `withTempView`, which can drop the view automatically.
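Helpers like `withTempView` follow the standard loan pattern: run the test body, then clean up even if the body throws. A Spark-free sketch of the shape (in Spark the real helper lives in the test utilities and drops the views via the catalog; here a recording buffer stands in for that):

```scala
import scala.collection.mutable

// Stand-in for the catalog: record which views were dropped.
val dropped = mutable.ArrayBuffer.empty[String]

// Loan pattern: execute the body, then drop the named views regardless of
// whether the body succeeded.
def withTempView(viewNames: String*)(body: => Unit): Unit = {
  try body finally viewNames.foreach(dropped += _)
}

withTempView("nums") {
  // test body would create and query the temp view here
}
```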
[GitHub] spark issue #18649: [SPARK-21395][SQL] Spark SQL hive-thriftserver doesn't r...
Github user debugger87 commented on the issue: https://github.com/apache/spark/pull/18649 @cloud-fan Any suggestions?
[GitHub] spark issue #18655: [SPARK-21440][SQL][PYSPARK] Refactor ArrowConverters and...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18655 Thank you for your comments. I agree that we should split this into smaller PRs. I'll push another commit to remove `ArrowColumnVector` from this as soon as possible.
[GitHub] spark issue #18468: [SPARK-20873][SQL] Enhance ColumnVector to support compr...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18468 ping @ueshin @cloud-fan
[GitHub] spark issue #18655: [SPARK-21440][SQL][PYSPARK] Refactor ArrowConverters and...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18655 yea let's put `ArrowColumnVector` and its tests in a new PR and merge that first. `ArrowWriter` will also be used for pandas UDF, see https://issues.apache.org/jira/browse/SPARK-21190 for more details, so it makes sense to move it to a separate file.
[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/18660 Also merged to branch-2.2
[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/18660 thanks @cloud-fan
[GitHub] spark issue #18667: Fix the simpleString used in error messages
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18667 Can one of the admins verify this patch?
[GitHub] spark pull request #18667: Fix the simpleString used in error messages
GitHub user fxbonnet opened a pull request: https://github.com/apache/spark/pull/18667 Fix the simpleString used in error messages ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/fxbonnet/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18667.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18667 commit e31555ba0b297054c504d3e2eaac20befb10738d Author: Francois-Xavier Bonnet Date: 2017-07-18T04:19:17Z Fix the simpleString used in error messages
[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Tim...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r127879502 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala --- @@ -792,6 +793,76 @@ class ArrowConvertersSuite extends SharedSQLContext with BeforeAndAfterAll { collectAndValidate(df, json, "binaryData.json") } + test("date type conversion") { +val json = + s""" + |{ + | "schema" : { + |"fields" : [ { + | "name" : "date", + | "type" : { + |"name" : "date", + |"unit" : "DAY" + | }, + | "nullable" : true, + | "children" : [ ], + | "typeLayout" : { + |"vectors" : [ { + | "type" : "VALIDITY", + | "typeBitWidth" : 1 + |}, { + | "type" : "DATA", + | "typeBitWidth" : 32 + |} ] + | } + |} ] + | }, + | "batches" : [ { + |"count" : 4, + |"columns" : [ { + | "name" : "date", + | "count" : 4, + | "VALIDITY" : [ 1, 1, 1, 1 ], + | "DATA" : [ -1, 0, 16533, 16930 ] + |} ] + | } ] + |} + """.stripMargin + +val sdf = new SimpleDateFormat("-MM-dd HH:mm:ss.SSS z", Locale.US) +val d1 = new Date(-1) // "1969-12-31 13:10:15.000 UTC" +val d2 = new Date(0) // "1970-01-01 13:10:15.000 UTC" +val d3 = new Date(sdf.parse("2015-04-08 13:10:15.000 UTC").getTime) +val d4 = new Date(sdf.parse("2016-05-09 12:01:01.000 UTC").getTime) + +// Date is created unaware of timezone, but DateTimeUtils force defaultTimeZone() + assert(DateTimeUtils.toJavaDate(DateTimeUtils.fromJavaDate(d2)).getTime == d2.getTime) --- End diff -- We handle `DateType` value as days from `1970-01-01` internally. When converting from/to `Date` to/from internal value, we assume the `Date` instance contains the timestamp of `00:00:00` time of the day in `TimeZone.getDefault()` timezone, which is the offset of the timezone. e.g. 
in JST (GMT+09:00): ``` scala> TimeZone.setDefault(TimeZone.getTimeZone("JST")) scala> Date.valueOf("1970-01-01").getTime() res6: Long = -32400000 ``` whereas in PST (GMT-08:00): ``` scala> TimeZone.setDefault(TimeZone.getTimeZone("PST")) scala> Date.valueOf("1970-01-01").getTime() res8: Long = 28800000 ``` We use `DateTimeUtils.defaultTimeZone()` to adjust the offset.
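The offset adjustment described above can be reproduced without Spark. This is a sketch of the idea behind `DateTimeUtils.fromJavaDate`, assuming the same days-since-epoch representation; `fromJavaDate` here is an illustrative stand-in, not the actual Spark implementation:

```scala
import java.sql.Date
import java.util.TimeZone

object DateDaysSketch {
  // Convert java.sql.Date to days since 1970-01-01 by adding back the
  // default-timezone offset that Date.getTime bakes in, then dividing
  // by milliseconds per day (sketch of DateTimeUtils.fromJavaDate).
  def fromJavaDate(d: Date): Int = {
    val millisLocal = d.getTime + TimeZone.getDefault.getOffset(d.getTime)
    Math.floorDiv(millisLocal, 86400000L).toInt
  }

  def main(args: Array[String]): Unit = {
    // Once the offset is removed, the day number no longer depends on
    // the JVM default timezone.
    for (tz <- Seq("JST", "PST")) {
      TimeZone.setDefault(TimeZone.getTimeZone(tz))
      println(s"$tz: ${fromJavaDate(Date.valueOf("2015-04-08"))}") // 16533 in both
    }
  }
}
```

The value 16533 matches the `DATA` array in the quoted test JSON for `2015-04-08`, which is why the conversion round-trips regardless of whether the JVM runs in JST or PST.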
[GitHub] spark pull request #18660: [SPARK-21445] Make IntWrapper and LongWrapper in ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18660
[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18660 thanks, merging to master! @brkyvz I think it's fine, this bug is very obvious.
[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/18660 I couldn't write an easy reproduction for the bug :(
[GitHub] spark pull request #18583: [SPARK-21332][SQL] Incorrect result type inferred...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18583
[GitHub] spark issue #18583: [SPARK-21332][SQL] Incorrect result type inferred for so...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18583 Thanks! Merging to master/2.2/2.1/2.0
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127876852 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala --- @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.test.SharedSQLContext + +class FileFormatWriterSuite extends QueryTest with SharedSQLContext { + + test("empty file should be skipped while write to file") { +withTempPath { dir => --- End diff -- Could we maybe just do as below? ```scala withTempPath { path => spark.range(100).repartition(10).where("id = 50").write.parquet(path) val partFiles = path.listFiles() .filter(f => f.isFile && !f.getName.startsWith(".") && !f.getName.startsWith("_")) assert(partFiles.length === 2) } ```
[GitHub] spark issue #18662: [SPARK-21444] Be more defensive when removing broadcasts...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/18662 Merged to master. Thanks for the quick reviews.
[GitHub] spark pull request #18662: [SPARK-21444] Be more defensive when removing bro...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18662
[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127875986 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer( nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e) }.copy(child = newChild) + case j: Join if j.condition.isDefined && !j.condition.get.deterministic => +j match { + // We can push down non-deterministic joining keys. --- End diff -- I meant joining keys. I am not sure if `a = c && rand(b) < 0` is a joining key?
[GitHub] spark issue #18666: [SPARK-21449][SQL][Hive]Close HiveClient's SessionState ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18666 Can one of the admins verify this patch?
[GitHub] spark pull request #18666: [SPARK-21449][SQL][Hive]Close HiveClient's Sessio...
GitHub user yaooqinn opened a pull request: https://github.com/apache/spark/pull/18666 [SPARK-21449][SQL][Hive]Close HiveClient's SessionState to delete residual dirs ## What changes were proposed in this pull request? When sparkSession.stop() is called, close the hive client too. ## How was this patch tested? manually You can merge this pull request into a Git repository by running: $ git pull https://github.com/yaooqinn/spark SPARK-21449 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18666.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18666 commit cac9fe7a627911079e55d5704fcf1b49228c5147 Author: Kent Yao Date: 2017-07-18T03:22:17Z Hive client's SessionState was not closed properly in HiveExternalCatalog
[GitHub] spark issue #18663: [SPARK-20079][yarn] Fix client AM not allocating executo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18663 Merged build finished. Test PASSed.
[GitHub] spark issue #18663: [SPARK-20079][yarn] Fix client AM not allocating executo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18663 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79692/ Test PASSed.
[GitHub] spark issue #18663: [SPARK-20079][yarn] Fix client AM not allocating executo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18663 **[Test build #79692 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79692/testReport)** for PR 18663 at commit [`1496b78`](https://github.com/apache/spark/commit/1496b78d2bcd2003b23307f767c57c0dc2818e16). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18305: [SPARK-20988][ML] Logistic regression uses aggreg...
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18305#discussion_r127874833 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala --- @@ -32,40 +34,45 @@ private[ml] trait DifferentiableRegularization[T] extends DiffFunction[T] { } /** - * A Breeze diff function for computing the L2 regularized loss and gradient of an array of + * A Breeze diff function for computing the L2 regularized loss and gradient of a vector of * coefficients. * * @param regParam The magnitude of the regularization. * @param shouldApply A function (Int => Boolean) indicating whether a given index should have *regularization applied to it. - * @param featuresStd Option indicating whether the regularization should be scaled by the standard - *deviation of the features. + * @param applyFeaturesStd Option for a function which maps coefficient index (column major) to the + * feature standard deviation. If `None`, no standardization is applied. 
*/ private[ml] class L2Regularization( -val regParam: Double, +override val regParam: Double, shouldApply: Int => Boolean, -featuresStd: Option[Array[Double]]) extends DifferentiableRegularization[Array[Double]] { +applyFeaturesStd: Option[Int => Double]) extends DifferentiableRegularization[Vector] { - override def calculate(coefficients: Array[Double]): (Double, Array[Double]) = { -var sum = 0.0 -val gradient = new Array[Double](coefficients.length) -coefficients.indices.filter(shouldApply).foreach { j => - val coef = coefficients(j) - featuresStd match { -case Some(stds) => - val std = stds(j) - if (std != 0.0) { -val temp = coef / (std * std) -sum += coef * temp -gradient(j) = regParam * temp - } else { -0.0 + override def calculate(coefficients: Vector): (Double, Vector) = { +coefficients match { + case dv: DenseVector => +var sum = 0.0 +val gradient = new Array[Double](dv.size) +dv.values.indices.filter(shouldApply).foreach { j => + val coef = coefficients(j) + applyFeaturesStd match { +case Some(getStd) => + val std = getStd(j) + if (std != 0.0) { +val temp = coef / (std * std) +sum += coef * temp +gradient(j) = regParam * temp + } else { +0.0 + } +case None => + sum += coef * coef + gradient(j) = coef * regParam --- End diff -- Trivial, to match `regParam * temp` above, how about using `regParam * coef`?
[GitHub] spark issue #18665: [SPARK-21446] [SQL] Fix setAutoCommit never executed
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18665 Can one of the admins verify this patch?
[GitHub] spark pull request #18665: [SPARK-21446] [SQL] Fix setAutoCommit never execu...
GitHub user DFFuture opened a pull request: https://github.com/apache/spark/pull/18665 [SPARK-21446] [SQL] Fix setAutoCommit never executed ## What changes were proposed in this pull request? JIRA Issue: https://issues.apache.org/jira/browse/SPARK-21446 options.asConnectionProperties can not have fetchsize, because fetchsize belongs to Spark-only options, and Spark-only options have been excluded in connection properties. So change properties of beforeFetch from options.asConnectionProperties.asScala.toMap to options.asProperties.asScala.toMap ## How was this patch tested? You can merge this pull request into a Git repository by running: $ git pull https://github.com/DFFuture/spark sparksql_pg Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18665.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18665 commit 9ba431a838a16a8371b3d3f6ef028158576f85d2 Author: DFFuture Date: 2017-07-18T00:36:06Z asConnectionProperties can not have fetchsize, change it to asProperties
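The distinction the PR relies on — Spark-only options such as `fetchsize` being stripped before properties are handed to the JDBC driver — can be sketched as follows. The option names and the filtering logic are an assumed simplification for illustration, not the actual `JDBCOptions` implementation:

```scala
object JdbcOptionsSketch {
  // Options consumed by Spark itself and therefore excluded from the
  // properties handed to the JDBC driver (assumed, simplified subset).
  val sparkOnlyOptions: Set[String] = Set("url", "dbtable", "fetchsize", "numpartitions")

  // Sketch of asConnectionProperties: drops Spark-only keys.
  def asConnectionProperties(opts: Map[String, String]): Map[String, String] =
    opts.filter { case (k, _) => !sparkOnlyOptions.contains(k.toLowerCase) }

  // Sketch of asProperties: keeps every option as-is.
  def asProperties(opts: Map[String, String]): Map[String, String] = opts

  def main(args: Array[String]): Unit = {
    val opts = Map("url" -> "jdbc:postgresql://host/db", "fetchsize" -> "1000", "user" -> "u")
    // fetchsize survives only in the unfiltered map, which is why code
    // reading it must use asProperties rather than asConnectionProperties.
    println(asConnectionProperties(opts).contains("fetchsize")) // false
    println(asProperties(opts).contains("fetchsize"))           // true
  }
}
```

Under this sketch, a `beforeFetch` hook that reads `fetchsize` from the filtered map never sees it, matching the bug the PR describes.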
[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127874260 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer( nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e) }.copy(child = newChild) + case j: Join if j.condition.isDefined && !j.condition.get.deterministic => +j match { + // We can push down non-deterministic joining keys. --- End diff -- The join type also matters. For example, are we able to push it to the left side for the right outer join?
[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127874213 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer( nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e) }.copy(child = newChild) + case j: Join if j.condition.isDefined && !j.condition.get.deterministic => +j match { + // We can push down non-deterministic joining keys. --- End diff -- `a = c && rand(3) * b < 0` Are we able to push down the second one?
[GitHub] spark issue #18662: [SPARK-21444] Be more defensive when removing broadcasts...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18662 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79691/ Test PASSed.
[GitHub] spark issue #18662: [SPARK-21444] Be more defensive when removing broadcasts...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18662 Merged build finished. Test PASSed.
[GitHub] spark issue #18662: [SPARK-21444] Be more defensive when removing broadcasts...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18662 **[Test build #79691 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79691/testReport)** for PR 18662 at commit [`a5ebcac`](https://github.com/apache/spark/commit/a5ebcac4ceb14eb8342ce085965b370186b4aba9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18305: [SPARK-20988][ML] Logistic regression uses aggreg...
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18305#discussion_r127873828 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -598,8 +598,23 @@ class LogisticRegression @Since("1.2.0") ( val regParamL2 = (1.0 - $(elasticNetParam)) * $(regParam) val bcFeaturesStd = instances.context.broadcast(featuresStd) -val costFun = new LogisticCostFun(instances, numClasses, $(fitIntercept), - $(standardization), bcFeaturesStd, regParamL2, multinomial = isMultinomial, +val getAggregatorFunc = new LogisticAggregator(bcFeaturesStd, numClasses, $(fitIntercept), + multinomial = isMultinomial)(_) +val getFeaturesStd = (j: Int) => if (j >= 0 && j < numCoefficientSets * numFeatures) { + featuresStd(j / numCoefficientSets) +} else { + 0.0 +} + +val regularization = if (regParamL2 != 0.0) { + val shouldApply = (idx: Int) => idx >= 0 && idx < numFeatures * numCoefficientSets --- End diff -- It seems that the `regularization` contains `intercept`, right? However, the comment in [LogisticRegression.scala: 1903L](https://github.com/apache/spark/pull/18305/files#diff-3734f1689cb8a80b07974eb93de0795dL1903) is: > // We do not apply regularization to the intercepts
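Under the flattened coefficient layout the diff suggests (all feature weights first, intercepts appended at the end when an intercept is fit), the `shouldApply` predicate in the diff does exclude the intercept slots. A small self-contained sketch — the layout is an assumption read off the diff, not verified against the full file:

```scala
object ShouldApplySketch {
  val numFeatures = 3
  val numCoefficientSets = 2
  val fitIntercept = true

  // Assumed flattened layout: numCoefficientSets * numFeatures feature
  // weights first, then numCoefficientSets intercepts when fitIntercept.
  val totalSize: Int =
    numCoefficientSets * numFeatures + (if (fitIntercept) numCoefficientSets else 0)

  // Same predicate as in the diff: regularize only feature-weight slots.
  val shouldApply: Int => Boolean =
    idx => idx >= 0 && idx < numFeatures * numCoefficientSets

  def main(args: Array[String]): Unit = {
    val regularized = (0 until totalSize).filter(shouldApply)
    // Feature slots 0..5 are regularized; intercept slots 6 and 7 are skipped.
    println(regularized.mkString(","))
  }
}
```

So under this assumed layout the predicate is consistent with the quoted comment "We do not apply regularization to the intercepts": indices at or beyond `numFeatures * numCoefficientSets` (the intercepts) fail the check.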
[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18654 **[Test build #79694 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79694/testReport)** for PR 18654 at commit [`f7d7c09`](https://github.com/apache/spark/commit/f7d7c091fbf11dde9e1dde0dae574d477406f5ed).
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127872988
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import java.io.{File, FilenameFilter}
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.test.SharedSQLContext
+
+class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
+
+  test("empty file should be skipped while write to file") {
+    withTempDir { dir =>
+      dir.delete()
+      spark.range(1).repartition(10).write.parquet(dir.toString)
+      val df = spark.read.parquet(dir.toString)
+      val allFiles = dir.listFiles(new FilenameFilter {
+        override def accept(dir: File, name: String): Boolean = {
+          !name.startsWith(".") && !name.startsWith("_")
+        }
+      })
+      assert(allFiles.length == 10)
+
+      withTempDir { dst_dir =>
+        dst_dir.delete()
+        df.where("id = 50").write.parquet(dst_dir.toString)
+        val allFiles = dst_dir.listFiles(new FilenameFilter {
+          override def accept(dir: File, name: String): Boolean = {
+            !name.startsWith(".") && !name.startsWith("_")
+          }
+        })
+        // First partition file and the data file
--- End diff --
Can't agree more. At first I tried to implement it like this, but `FileFormatWriter.write` can only see each task's own iterator.
[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18660 Merged build finished. Test PASSed.
[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18660 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79689/ Test PASSed.
[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18660 **[Test build #79689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79689/testReport)** for PR 18660 at commit [`d220290`](https://github.com/apache/spark/commit/d2202903518b3dfa0f4a719a0b9cb5431088ed66).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * ` public static class LongWrapper implements Serializable `
  * ` public static class IntWrapper implements Serializable `
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18664 **[Test build #79693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79693/testReport)** for PR 18664 at commit [`69e1e21`](https://github.com/apache/spark/commit/69e1e21bf4bebc7bea6bd9322e4300df71a90b18).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18664 Merged build finished. Test FAILed.
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18664 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79693/ Test FAILed.
[GitHub] spark pull request #18661: [SPARK-21409][SS] Follow up PR to allow different...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18661
[GitHub] spark issue #18661: [SPARK-21409][SS] Follow up PR to allow different types ...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/18661 Merging to master.
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127869754 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import java.io.{File, FilenameFilter} + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.test.SharedSQLContext + +class FileFormatWriterSuite extends QueryTest with SharedSQLContext { + + test("empty file should be skipped while write to file") { +withTempDir { dir => + dir.delete() + spark.range(1).repartition(10).write.parquet(dir.toString) + val df = spark.read.parquet(dir.toString) + val allFiles = dir.listFiles(new FilenameFilter { +override def accept(dir: File, name: String): Boolean = { + !name.startsWith(".") && !name.startsWith("_") +} + }) + assert(allFiles.length == 10) + + withTempDir { dst_dir => +dst_dir.delete() +df.where("id = 50").write.parquet(dst_dir.toString) --- End diff -- I mean.. for example, if we happen to have a single partition in the `df` in any event, I guess this test will become invalid ... 
[GitHub] spark issue #18661: [SPARK-21409][SS] Follow up PR to allow different types ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18661 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79690/ Test PASSed.
[GitHub] spark issue #18661: [SPARK-21409][SS] Follow up PR to allow different types ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18661 Merged build finished. Test PASSed.
[GitHub] spark issue #18661: [SPARK-21409][SS] Follow up PR to allow different types ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18661 **[Test build #79690 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79690/testReport)** for PR 18661 at commit [`351c207`](https://github.com/apache/spark/commit/351c20704e5ba2577bd18a5a9dd2f577141c453a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait StateStoreCustomMetric `
  * `case class StateStoreCustomSizeMetric(name: String, desc: String) extends StateStoreCustomMetric`
  * `case class StateStoreCustomTimingMetric(name: String, desc: String) extends StateStoreCustomMetric`
[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18654 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79687/ Test PASSed.
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127868378 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import java.io.{File, FilenameFilter} + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.test.SharedSQLContext + +class FileFormatWriterSuite extends QueryTest with SharedSQLContext { + + test("empty file should be skipped while write to file") { +withTempDir { dir => + dir.delete() + spark.range(1).repartition(10).write.parquet(dir.toString) + val df = spark.read.parquet(dir.toString) + val allFiles = dir.listFiles(new FilenameFilter { +override def accept(dir: File, name: String): Boolean = { + !name.startsWith(".") && !name.startsWith("_") +} + }) + assert(allFiles.length == 10) + + withTempDir { dst_dir => +dst_dir.delete() +df.where("id = 50").write.parquet(dst_dir.toString) --- End diff -- I was thinking just in order to make sure the (previous) number of files written out. 
[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18654 Merged build finished. Test PASSed.
[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18654 **[Test build #79687 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79687/testReport)** for PR 18654 at commit [`6153001`](https://github.com/apache/spark/commit/6153001bc42deee197030ad91fbb4f72bd1aa5d3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18631: [SPARK-21410][CORE] Create less partitions for Ra...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18631
[GitHub] spark issue #18631: [SPARK-21410][CORE] Create less partitions for RangePart...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18631 thanks, merging to master!
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127867549 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.datasources + +import java.io.{File, FilenameFilter} + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.test.SharedSQLContext + +class FileFormatWriterSuite extends QueryTest with SharedSQLContext { + + test("empty file should be skipped while write to file") { +withTempDir { dir => + dir.delete() + spark.range(1).repartition(10).write.parquet(dir.toString) + val df = spark.read.parquet(dir.toString) + val allFiles = dir.listFiles(new FilenameFilter { +override def accept(dir: File, name: String): Boolean = { + !name.startsWith(".") && !name.startsWith("_") +} + }) + assert(allFiles.length == 10) + + withTempDir { dst_dir => +dst_dir.delete() +df.where("id = 50").write.parquet(dst_dir.toString) +val allFiles = dst_dir.listFiles(new FilenameFilter { + override def accept(dir: File, name: String): Boolean = { +!name.startsWith(".") && !name.startsWith("_") + } +}) +// First partition file and the data file --- End diff -- Ideally we only need the first partition file if all other partitions are empty, but this is hard to do right now.
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127867486 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import java.io.{File, FilenameFilter} + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.test.SharedSQLContext + +class FileFormatWriterSuite extends QueryTest with SharedSQLContext { + + test("empty file should be skipped while write to file") { +withTempDir { dir => + dir.delete() + spark.range(1).repartition(10).write.parquet(dir.toString) + val df = spark.read.parquet(dir.toString) + val allFiles = dir.listFiles(new FilenameFilter { +override def accept(dir: File, name: String): Boolean = { + !name.startsWith(".") && !name.startsWith("_") +} + }) + assert(allFiles.length == 10) + + withTempDir { dst_dir => +dst_dir.delete() +df.where("id = 50").write.parquet(dst_dir.toString) --- End diff -- Why do we need the repartition?
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127867380 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import java.io.{File, FilenameFilter} + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.test.SharedSQLContext + +class FileFormatWriterSuite extends QueryTest with SharedSQLContext { + + test("empty file should be skipped while write to file") { +withTempDir { dir => + dir.delete() + spark.range(1).repartition(10).write.parquet(dir.toString) + val df = spark.read.parquet(dir.toString) + val allFiles = dir.listFiles(new FilenameFilter { --- End diff -- +1 for the shorter one
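The concrete "shorter one" being endorsed is not quoted in this thread, so the following is only an assumed sketch of what a more compact data-file filter could look like in Scala: list a directory and keep only files whose names start with neither `.` nor `_`, i.e. skip hidden files and Spark metadata files such as `_SUCCESS`.

```scala
import java.io.File

// Hypothetical compact replacement for the anonymous FilenameFilter in the
// test: keep only data files, dropping hidden (".") and metadata ("_") files.
def dataFiles(dir: File): Array[File] =
  dir.listFiles.filter(f => !f.getName.startsWith(".") && !f.getName.startsWith("_"))
```

Functionally this matches the test's `accept` predicate while avoiding the anonymous-class boilerplate.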
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127867341 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import java.io.{File, FilenameFilter} + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.test.SharedSQLContext + +class FileFormatWriterSuite extends QueryTest with SharedSQLContext { + + test("empty file should be skipped while write to file") { +withTempDir { dir => --- End diff -- +1
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127867290
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ---
@@ -236,7 +236,10 @@ object FileFormatWriter extends Logging {
     committer.setupTask(taskAttemptContext)

     val writeTask =
-      if (description.partitionColumns.isEmpty && description.bucketIdExpression.isEmpty) {
+      if (sparkPartitionId != 0 && !iterator.hasNext) {
--- End diff --
cc @hvanhovell
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127867254
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ---
@@ -236,7 +236,10 @@ object FileFormatWriter extends Logging {
     committer.setupTask(taskAttemptContext)

     val writeTask =
-      if (description.partitionColumns.isEmpty && description.bucketIdExpression.isEmpty) {
+      if (sparkPartitionId != 0 && !iterator.hasNext) {
--- End diff --
This is a little hacky, but I think it is the simplest fix.
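The condition in the hunk reads as a small pure predicate. A minimal sketch under assumed names (this is not the real `FileFormatWriter` API): task 0 always writes its file, so even a completely empty result still produces at least one output file and the written dataset stays readable, while every other task writes only when it actually has rows.

```scala
// Sketch of the skip rule above: skip the write task entirely when this is
// not the first partition and the task's iterator has no rows.
def shouldSkipWrite(sparkPartitionId: Int, rows: Iterator[Any]): Boolean =
  sparkPartitionId != 0 && !rows.hasNext
```

Note that `hasNext` only peeks; it does not consume rows, so a non-skipped task can still hand its full iterator to the write path.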
[GitHub] spark pull request #18632: [SPARK-21412][SQL] Reset BufferHolder while initi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18632#discussion_r127866899 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java --- @@ -51,6 +51,7 @@ public UnsafeRowWriter(BufferHolder holder, int numFields) { this.nullBitsSize = UnsafeRow.calculateBitSetWidthInBytes(numFields); this.fixedSize = nullBitsSize + 8 * numFields; this.startingOffset = holder.cursor; +holder.reset(); --- End diff -- I'm not very sure about this, but what if this writer is for an inner struct? Then the buffer holder is shared between many writers, and we should only reset it once.
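The concern above can be illustrated with a toy model: when a nested struct gets its own writer, that writer shares the parent's buffer holder, and each writer records where it starts via the holder's cursor. Calling `reset()` inside every writer constructor would rewind the cursor and clobber what the outer writer already wrote. This sketch uses hypothetical names, not the actual BufferHolder/UnsafeRowWriter API:

```scala
// Toy model of a grow-only buffer shared between an outer row writer and a
// nested struct writer. Names are illustrative only.
class SharedBuffer {
  var cursor: Int = 0
  def reset(): Unit = { cursor = 0 }
  def write(numBytes: Int): Unit = { cursor += numBytes }
}

class StructWriter(holder: SharedBuffer) {
  // If the constructor unconditionally called holder.reset() here, a nested
  // writer created mid-row would rewind the cursor to 0 and the outer
  // writer's data would be overwritten.
  val startingOffset: Int = holder.cursor
}
```

With a shared holder, only the top-level writer should reset; a nested writer must inherit the current cursor (e.g. `startingOffset == 16` after the outer writer has emitted 16 bytes).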
[GitHub] spark issue #18633: [SPARK-21411][YARN] Lazily create FS within kerberized U...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18633 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79684/ Test PASSed.
[GitHub] spark issue #18633: [SPARK-21411][YARN] Lazily create FS within kerberized U...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18633 Merged build finished. Test PASSed.
[GitHub] spark issue #18632: [SPARK-21412][SQL] Reset BufferHolder while initialize a...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18632 OK to test
[GitHub] spark issue #18633: [SPARK-21411][YARN] Lazily create FS within kerberized U...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18633 **[Test build #79684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79684/testReport)** for PR 18633 at commit [`95988c1`](https://github.com/apache/spark/commit/95988c112905018d20c6d78a2ab688164735ede6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r127866465 --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaUDFSuite.java --- @@ -121,4 +122,29 @@ public void udf6Test() { Row result = spark.sql("SELECT returnOne()").head(); Assert.assertEquals(1, result.getInt(0)); } + + public static class randUDFTest implements UDF1 { --- End diff -- `RandUDFTest`?
[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r127866406 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -103,4 +110,19 @@ case class UserDefinedFunction protected[sql] ( udf } } + + /** + * Updates UserDefinedFunction to non-deterministic. + * + * @since 2.3.0 + */ + def nonDeterministic(): UserDefinedFunction = { --- End diff -- not a big deal, let's keep it.
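The `nonDeterministic()` method in the diff above follows a copy-based builder style: marking a UDF as non-deterministic returns an updated value rather than mutating the original, so a deterministic UDF reference stays unchanged. A minimal sketch of that pattern, using a hypothetical stand-in rather than the real `UserDefinedFunction` class:

```scala
// Sketch of the builder-style API from the diff: nonDeterministic() returns
// an updated copy. MyUDF is a hypothetical stand-in, not the Spark class.
case class MyUDF(name: String, deterministic: Boolean = true) {
  // Returns a new instance flagged non-deterministic; `this` is untouched.
  def nonDeterministic(): MyUDF = copy(deterministic = false)
}
```

Because the original is left intact, the same UDF can be registered in both deterministic and non-deterministic forms without either affecting the other.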
[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r127866355 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLContextSuite.scala --- @@ -69,7 +69,7 @@ class SQLContextSuite extends SparkFunSuite with SharedSparkContext { // UDF should not be shared def myadd(a: Int, b: Int): Int = a + b -session1.udf.register[Int, Int, Int]("myadd", myadd) +session1.udf.register[Int, Int, Int]("myadd", myadd _) --- End diff -- this sounds like a source code compatibility issue, can we look into it?
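The change from `myadd` to `myadd _` in the diff above is about eta-expansion: the trailing underscore explicitly converts the method into a function value. When the compiler can no longer infer the expansion on its own (for example because the target method has several applicable overloads), callers that used to pass a bare method name must add the underscore, which is why this reads as a source compatibility concern. A small self-contained illustration with hypothetical names (not the actual `udf.register` signatures):

```scala
// Illustrative sketch of explicit eta-expansion. Registry and its overloads
// are hypothetical, standing in for an overloaded register method.
object Registry {
  def register(name: String, f: (Int, Int) => Int): String = name + "/2"
  def register(name: String, f: Int => Int): String = name + "/1"
}

object Udfs {
  def myadd(a: Int, b: Int): Int = a + b

  // `myadd _` eta-expands the method to an (Int, Int) => Int function value,
  // which lets overload resolution pick the two-argument register variant.
  val registered: String = Registry.register("myadd", Udfs.myadd _)
}
```

Pre-existing callers written as `register("myadd", myadd)` relied on the compiler performing this expansion implicitly; changing the API so the underscore becomes mandatory breaks them at compile time, which is the compatibility issue being raised.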
[GitHub] spark issue #18632: [SPARK-21412][SQL] Reset BufferHolder while initialize a...
Github user gczsjdy commented on the issue: https://github.com/apache/spark/pull/18632 @cloud-fan @viirya @gatorsmile Could you please help me review this?
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127865091 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import java.io.{File, FilenameFilter} + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.test.SharedSQLContext + +class FileFormatWriterSuite extends QueryTest with SharedSQLContext { + + test("empty file should be skipped while write to file") { +withTempDir { dir => + dir.delete() + spark.range(1).repartition(10).write.parquet(dir.toString) + val df = spark.read.parquet(dir.toString) + val allFiles = dir.listFiles(new FilenameFilter { +override def accept(dir: File, name: String): Boolean = { + !name.startsWith(".") && !name.startsWith("_") +} + }) + assert(allFiles.length == 10) --- End diff -- OK, I'll remove this assert and leave a note.
[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/18654 Yep, an empty result dir needs this meta; otherwise reading it will throw this exception: ``` org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.; at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$9.apply(DataSource.scala:188) at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$9.apply(DataSource.scala:188) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:187) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:381) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:190) at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:571) at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:555) ... 48 elided ```