[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16867 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/16867 [SPARK-16929] Improve performance when check speculatable tasks. ## What changes were proposed in this pull request? When checking speculatable tasks in `TaskSetManager`, the current code scans all task infos and sorts the durations of successful tasks, which takes O(N log N) time. Since `TaskSchedulerImpl`'s synchronized lock is held during the checking process, this can cause performance degradation when checking a large-scale task set, say hundreds of thousands of tasks. This change uses a `TreeSet` to cache the successful task infos and compares the median duration with running tasks, avoiding a scan of all task infos. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jinxing64/spark SPARK-16929 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16867.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16867 commit 1169d118662a9bfdabe88238352fe834a28aee14 Author: jinxing Date: 2017-02-07T02:35:10Z [SPARK-16929] Improve performance when check speculatable tasks.
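The key observation in the PR above is that the speculation check only needs the median of the successful tasks' durations, so re-sorting everything on each check is wasted work. The PR itself caches task infos in a `TreeSet`; as an illustration of the same idea using only the JDK, here is a hypothetical two-heap structure (the class name and API are invented for this sketch, not Spark's actual `TaskSetManager` code) that maintains a running median with O(log N) insertion and O(1) lookup:

```java
import java.util.Collections;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch: keep successful-task durations split into two heaps so
 * the median needed for a speculation check is available in O(1), with
 * O(log N) per insertion, instead of sorting all durations on every check.
 */
public class RunningMedian {
    // lower half of the durations (max-heap) and upper half (min-heap)
    private final PriorityQueue<Long> lower = new PriorityQueue<>(Collections.reverseOrder());
    private final PriorityQueue<Long> upper = new PriorityQueue<>();

    public void insert(long duration) {
        if (lower.isEmpty() || duration <= lower.peek()) {
            lower.add(duration);
        } else {
            upper.add(duration);
        }
        // rebalance so the halves differ in size by at most one
        if (lower.size() > upper.size() + 1) {
            upper.add(lower.poll());
        } else if (upper.size() > lower.size()) {
            lower.add(upper.poll());
        }
    }

    /** Median of all inserted durations; lower half holds the extra element, if any. */
    public long median() {
        return lower.peek();
    }

    public static void main(String[] args) {
        RunningMedian m = new RunningMedian();
        for (long d : new long[]{30, 10, 50, 20, 40}) {
            m.insert(d);
        }
        System.out.println(m.median()); // prints 30
    }
}
```

A scheduler holding its lock would then insert each task's duration once on success and read the median cheaply on every check, e.g. comparing running tasks against `median * speculationMultiplier` (the multiplier name here is illustrative).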
[GitHub] spark issue #16809: [SPARK-19463][SQL]refresh cache after the InsertIntoHado...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/16809 I just found that table refresh is tied to table insertion in `DataFrameWriter.saveAsTable` with overwrite mode and in `InsertIntoHiveTable`. Does `InsertIntoHadoopFsRelation` need to refresh the table?
[GitHub] spark issue #16862: [SPARK-19520][streaming] Do not encrypt data written to ...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/16862 @liancheng FYI
[GitHub] spark issue #16859: [SPARK-17714][Core][test-maven][test-hadoop2.6]Avoid usi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16859 **[Test build #3570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3570/testReport)** for PR 16859 at commit [`1c88474`](https://github.com/apache/spark/commit/1c8847494c29d4b51182ecfeebb5cc85e000e7a1).
* This patch **fails from timeout after a configured wait of `250m`**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `public class TransportChannelHandler extends ChannelInboundHandlerAdapter`
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16776 **[Test build #72634 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72634/testReport)** for PR 16776 at commit [`4db82b4`](https://github.com/apache/spark/commit/4db82b45ce061a131ece96f1ca554bc9e5423d46).
[GitHub] spark issue #16866: [SPARK-19529] TransportClientFactory.createClient() shou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16866 **[Test build #72633 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72633/testReport)** for PR 16866 at commit [`c1c4553`](https://github.com/apache/spark/commit/c1c4553e32826453ed39eaaefd1cd92ef0e36382).
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...
Github user watermen commented on the issue: https://github.com/apache/spark/pull/16677 @viirya We'd better not modify the API; `TaskMetrics` already has `resultSize`, so we can add `resultNum` alongside it.
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16715 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72630/ Test PASSed.
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16715 Merged build finished. Test PASSed.
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16715 **[Test build #72630 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72630/testReport)** for PR 16715 at commit [`1b70b91`](https://github.com/apache/spark/commit/1b70b919edea26321f21220f11d520d4f4f98ede).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16866: [SPARK-19529] TransportClientFactory.createClient...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/16866 [SPARK-19529] TransportClientFactory.createClient() shouldn't call awaitUninterruptibly() ## What changes were proposed in this pull request? This patch replaces a single `awaitUninterruptibly()` call with a plain `await()` call in Spark's common network layer in order to fix a bug which may cause tasks to be uncancellable. In Spark's Netty RPC layer, `TransportClientFactory.createClient()` calls `awaitUninterruptibly()` on a Netty future while waiting for a connection to be established. This creates a problem when a Spark task is interrupted while blocking in this call (which can happen in the event of a slow connection that will eventually time out). This has a bad impact on task cancellation when `interruptOnCancel = true`. As an example of the impact of this problem, I experienced significant numbers of uncancellable "zombie tasks" on a production cluster where several tasks were blocked trying to connect to a dead shuffle server and then continued running as zombies after I cancelled the associated Spark stage.
The zombie tasks ran for several minutes with the following stack:

```
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:460)
io.netty.util.concurrent.DefaultPromise.await0(DefaultPromise.java:607)
io.netty.util.concurrent.DefaultPromise.awaitUninterruptibly(DefaultPromise.java:301)
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:224)
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179) => holding Monitor(java.lang.Object@1849476028)
org.apache.spark.network.shuffle.ExternalShuffleClient$1.createAndStart(ExternalShuffleClient.java:105)
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120)
org.apache.spark.network.shuffle.ExternalShuffleClient.fetchBlocks(ExternalShuffleClient.java:114)
org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:169)
org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchUpToMaxBytes(ShuffleBlockFetcherIterator.scala:350)
org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:286)
org.apache.spark.storage.ShuffleBlockFetcherIterator.<init>(ShuffleBlockFetcherIterator.scala:120)
org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:45)
org.apache.spark.sql.execution.ShuffledRowRDD.compute(ShuffledRowRDD.scala:169)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
[...]
```

As far as I can tell, `awaitUninterruptibly()` might have been used in order to avoid having to declare that methods throw `InterruptedException` (this code is written in Java, hence the need to use checked exceptions). This patch simply replaces this with a regular, interruptible `await()` call.
This required several interface changes to declare a new checked exception (these are internal interfaces, though, and this change doesn't significantly impact binary compatibility). An alternative approach would be to wrap `InterruptedException` in `IOException` in order to avoid having to change interfaces. The problem with this approach is that the `network-shuffle` project's `RetryingBlockFetcher` code treats `IOException`s as transient failures when deciding whether to retry fetches, so throwing a wrapped `IOException` might cause an interrupted shuffle fetch to be retried, further prolonging the lifetime of a cancelled zombie task. ## How was this patch tested? Manually. You can merge this pull request into a Git repository by running: $ git pull https://github.com/JoshRosen/spark SPARK-19529 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16866.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16866 commit c1c4553e32826453ed39eaaefd1cd92ef0e36382 Author: Josh Rosen Date: 2017-02-09T07:25:29Z Use await() instead of awaitUninterruptibly()
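The behavioral difference the patch targets can be reproduced with the plain JDK. This sketch (the class and helper method are invented for illustration; a `CountDownLatch` stands in for the Netty promise) mimics the semantics of Netty's `awaitUninterruptibly()`: the interrupt is remembered but the thread keeps blocking, so a "cancelled" task stays stuck until some external event unblocks it.

```java
import java.util.concurrent.CountDownLatch;

/**
 * Minimal sketch of why an uninterruptible wait produces zombie tasks:
 * interrupting the waiting thread does not wake it up.
 */
public class AwaitDemo {
    /** Mimics awaitUninterruptibly(): swallow interrupts, restore the flag at the end. */
    static void awaitUninterruptibly(CountDownLatch latch) {
        boolean interrupted = false;
        for (;;) {
            try {
                latch.await();      // an interruptible wait would throw here and return control
                break;
            } catch (InterruptedException e) {
                interrupted = true; // remember the interrupt and keep waiting
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch never = new CountDownLatch(1);
        Thread zombie = new Thread(() -> awaitUninterruptibly(never));
        zombie.start();
        zombie.interrupt();   // "cancel" the task
        zombie.join(500);     // give it a chance to exit
        System.out.println("still blocked: " + zombie.isAlive()); // prints "still blocked: true"
        never.countDown();    // only an external event (e.g. the connection completing) unblocks it
        zombie.join();
    }
}
```

Switching the loop body to a single interruptible `latch.await()` that propagates `InterruptedException`, as the patch does for Netty's `await()`, lets the interrupt terminate the wait immediately.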
[GitHub] spark pull request #16865: [SPARK-19530][SQL] Use guava weigher for code cac...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16865#discussion_r100245982

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -1004,7 +1016,8 @@ object CodeGenerator extends Logging {
    * weak keys/values and thus does not respond to memory pressure.
    */
   private val cache = CacheBuilder.newBuilder()
-    .maximumSize(100)
+    .maximumWeight(10 * 1024 * 1024)
--- End diff --

Not sure if this is a proper number.
[GitHub] spark issue #16865: [SPARK-19530][SQL] Use guava weigher for code cache evic...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16865 cc @davies
[GitHub] spark issue #16865: [SPARK-19530][SQL] Use guava weigher for code cache evic...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16865 **[Test build #72632 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72632/testReport)** for PR 16865 at commit [`e6e2a8d`](https://github.com/apache/spark/commit/e6e2a8dd95512047346b939fc305dfaaef67f592).
[GitHub] spark pull request #16865: [SPARK-19530][SQL] Use guava weigher for code cac...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/16865 [SPARK-19530][SQL] Use guava weigher for code cache eviction ## What changes were proposed in this pull request? We use a guava cache to cache compiled code for codegen. Currently we use the number of entries in the cache (with a maximum of 100) to decide when to evict older entries. However, the entry count doesn't track the actual memory usage of the cache entries. As we now rely heavily on codegen and the generated code can be large, we shouldn't cap the cache by entry count. This patch switches to guava's `Weigher`, using the size of the bytecode as the weight of an entry. ## How was this patch tested? Jenkins tests. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 use-weight-for-code-cache Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16865.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16865 commit e6e2a8dd95512047346b939fc305dfaaef67f592 Author: Liang-Chi Hsieh Date: 2017-02-09T07:18:23Z Use weight for code cache.
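To illustrate what weight-based eviction buys over a flat entry count, here is a JDK-only sketch (the class and its API are invented for this example; Guava's `CacheBuilder.maximumWeight(...).weigher(...)` implements the same policy with proper concurrency and statistics). Each entry costs its bytecode size rather than 1, and least-recently-used entries are dropped until the total weight fits the budget:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical weight-bounded LRU cache for generated bytecode. */
public class WeightedCodeCache {
    private final long maxWeight;
    private long totalWeight = 0;
    // access-order LinkedHashMap iterates least-recently-used entries first
    private final LinkedHashMap<String, byte[]> cache =
        new LinkedHashMap<>(16, 0.75f, true);

    public WeightedCodeCache(long maxWeight) {
        this.maxWeight = maxWeight;
    }

    public void put(String key, byte[] bytecode) {
        byte[] old = cache.put(key, bytecode);
        if (old != null) {
            totalWeight -= old.length;
        }
        totalWeight += bytecode.length;
        // evict least-recently-used entries until we're back under budget
        Iterator<Map.Entry<String, byte[]>> it = cache.entrySet().iterator();
        while (totalWeight > maxWeight && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next();
            if (eldest.getValue() == bytecode) continue; // never evict the fresh entry
            totalWeight -= eldest.getValue().length;
            it.remove();
        }
    }

    public boolean contains(String key) {
        return cache.containsKey(key);
    }

    public static void main(String[] args) {
        WeightedCodeCache c = new WeightedCodeCache(100);
        c.put("small", new byte[40]);
        c.put("big", new byte[70]);   // 40 + 70 > 100, so "small" is evicted
        System.out.println(c.contains("small") + " " + c.contains("big")); // prints "false true"
    }
}
```

With a count-based cap, two entries would both fit regardless of size; weighing by bytecode size is what lets the cache respond to the actual memory footprint of generated code.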
[GitHub] spark issue #16857: [SPARK-19517][SS] KafkaSource fails to initialize partit...
Github user vitillo commented on the issue: https://github.com/apache/spark/pull/16857 @zsxwing Since I can't access the build results, could you please tell me why the patch fails to build?
[GitHub] spark issue #16785: [SPARK-19443][SQL] The function to generate constraints ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16785 Merged build finished. Test PASSed.
[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16630 Could somebody help review this PR? I think this will make gathering the estimation results in Scala much easier. This will also be helpful in constructing the tests. For example, the GLM tests with weights can be simplified a lot if we have all results in arrays and SEs etc are aligned with coefficients (current GLM tests with weight force no intercept to avoid this nuisance). @sethah @imatiach-msft @felixcheung
[GitHub] spark issue #16785: [SPARK-19443][SQL] The function to generate constraints ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16785 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72625/ Test PASSed.
[GitHub] spark issue #16785: [SPARK-19443][SQL][WIP] The function to generate constra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16785 **[Test build #72625 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72625/testReport)** for PR 16785 at commit [`8c98a5c`](https://github.com/apache/spark/commit/8c98a5c3ab1477408988c8cb682733e65dd554fc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON parsing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16750 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72626/ Test PASSed.
[GitHub] spark issue #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON parsing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16750 Merged build finished. Test PASSed.
[GitHub] spark issue #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON parsing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16750 **[Test build #72626 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72626/testReport)** for PR 16750 at commit [`ffc4912`](https://github.com/apache/spark/commit/ffc4912e17cc900fc9d7ceefd0f66461109728e9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/16787#discussion_r100241493

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -776,20 +778,21 @@ private[hive] class HiveClientImpl(
       client.dropDatabase(db, true, false, true)
     }
   }
+}
+
+private[hive] object HiveClientImpl {
+  private lazy val shimForHiveExecution = IsolatedClientLoader.hiveVersion(
--- End diff --

let me remove it, thanks!
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions be...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72631/testReport)** for PR 16787 at commit [`99d5bb2`](https://github.com/apache/spark/commit/99d5bb20a3f98220e8370c94b3620e9b2c6c61f2).
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions be...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/16787 thanks! @gatorsmile
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions be...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72627/ Test PASSed.
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions be...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Merged build finished. Test PASSed.
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions be...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72627/testReport)** for PR 16787 at commit [`b20d14f`](https://github.com/apache/spark/commit/b20d14fb6e70aaf6c4e09c644dd8ec6b8b5569dd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 OK. I'll try it immediately. Thank U very much!
[GitHub] spark pull request #16674: [SPARK-19331][SQL][TESTS] Improve the test covera...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16674#discussion_r100238713

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSQLViewSuite.scala ---
@@ -0,0 +1,190 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import org.apache.spark.sql.{AnalysisException, Row, SaveMode, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
+import org.apache.spark.sql.execution.SQLViewSuite
+import org.apache.spark.sql.hive.test.{TestHive, TestHiveSingleton}
+import org.apache.spark.sql.types.StructType
+
+/**
+ * A test suite for Hive view related functionality.
+ */
+class HiveSQLViewSuite extends SQLViewSuite with TestHiveSingleton {
+  protected override val spark: SparkSession = TestHive.sparkSession
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    // Create a simple table with two columns: id and id1
+    spark.range(1, 10).selectExpr("id", "id id1").write.format("json").saveAsTable("jt")
+  }
+
+  override def afterAll(): Unit = {
+    try {
+      spark.sql(s"DROP TABLE IF EXISTS jt")
+    } finally {
+      super.afterAll()
+    }
+  }
+
+  import testImplicits._
+
+  test("create a permanent/temp view using a hive, built-in, and permanent user function") {
+    val permanentFuncName = "myUpper"
+    val permanentFuncClass =
+      classOf[org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper].getCanonicalName
+    val builtInFuncNameInLowerCase = "abs"
+    val builtInFuncNameInMixedCase = "aBs"
+    val hiveFuncName = "histogram_numeric"
+
+    withUserDefinedFunction(permanentFuncName -> false) {
+      sql(s"CREATE FUNCTION $permanentFuncName AS '$permanentFuncClass'")
+      withTable("tab1") {
+        (1 to 10).map(i => (s"$i", i)).toDF("str", "id").write.saveAsTable("tab1")
+        Seq("VIEW", "TEMPORARY VIEW").foreach { viewMode =>
+          withView("view1") {
+            sql(
+              s"""
+                |CREATE $viewMode view1
+                |AS SELECT
+                |$permanentFuncName(str),
+                |$builtInFuncNameInLowerCase(id),
+                |$builtInFuncNameInMixedCase(id) as aBs,
+                |$hiveFuncName(id, 5) over()
+                |FROM tab1
+              """.stripMargin)
+            checkAnswer(sql("select count(*) FROM view1"), Row(10))
+          }
+        }
+      }
+    }
+  }
+
+  test("create a permanent/temp view using a temporary function") {
+    val tempFunctionName = "temp"
+    val functionClass =
+      classOf[org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper].getCanonicalName
+    withUserDefinedFunction(tempFunctionName -> true) {
+      sql(s"CREATE TEMPORARY FUNCTION $tempFunctionName AS '$functionClass'")
+      withView("view1", "tempView1") {
+        withTable("tab1") {
+          (1 to 10).map(i => s"$i").toDF("id").write.saveAsTable("tab1")
+
+          // temporary view
+          sql(s"CREATE TEMPORARY VIEW tempView1 AS SELECT $tempFunctionName(id) from tab1")
+          checkAnswer(sql("select count(*) FROM tempView1"), Row(10))
+
+          // permanent view
+          val e = intercept[AnalysisException] {
+            sql(s"CREATE VIEW view1 AS SELECT $tempFunctionName(id) from tab1")
+          }.getMessage
+          assert(e.contains("Not allowed to create a permanent view `view1` by referencing " +
+            s"a temporary function `$tempFunctionName`"))
+        }
+      }
+    }
+  }
+
+  test("create hive view for json table") {
+    // json table is not hive-compatible, make sure the new flag fix it.
+    withView("testView") {
[GitHub] spark issue #16854: [SPARK-15463][SQL] Add an API to load DataFrame from Dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16854 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72623/ Test PASSed.
[GitHub] spark issue #16854: [SPARK-15463][SQL] Add an API to load DataFrame from Dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16854 Merged build finished. Test PASSed.
[GitHub] spark issue #16809: [SPARK-19463][SQL]refresh cache after the InsertIntoHado...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16809 Where do we refresh the table for table insertion? Will we refresh twice (table and path)?
[GitHub] spark issue #16854: [SPARK-15463][SQL] Add an API to load DataFrame from Dat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16854 **[Test build #72623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72623/testReport)** for PR 16854 at commit [`a7e8c2b`](https://github.com/apache/spark/commit/a7e8c2bfaf98c27885907caa21cce7e93d4afd1b). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class UnivocityParser(`
[GitHub] spark issue #16672: [SPARK-19329][SQL]insert data to a not exist location da...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16672 ping @gatorsmile
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16638 You might be able to make it work by force-pushing the new changes with `git push -f origin NEW_BRANCH:REMOTE_BRANCH`
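The `NEW_BRANCH:REMOTE_BRANCH` refspec above can be sketched with local repositories so it runs anywhere; the repository, branch, and user names here are hypothetical stand-ins for the contributor's fork and the branch the PR tracks, not anything from the thread.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Bare repo standing in for the contributor's fork on GitHub (hypothetical).
git init -q --bare -b master fork.git

# A local clone with one commit on a freshly created branch.
git clone -q fork.git work
cd work
git -c user.name=dev -c user.email=dev@example.com commit --allow-empty -qm "reworked change"
git checkout -q -b new-branch

# Force-push the local branch onto the remote branch name the PR tracks,
# using the NEW_BRANCH:REMOTE_BRANCH refspec form from the comment above.
git push -q -f origin new-branch:pr-branch

# The fork now has refs/heads/pr-branch pointing at the new commit.
git ls-remote origin pr-branch | cut -f2
```

Because `-f` overwrites whatever the remote branch pointed at, the PR picks up the rewritten history without opening a new pull request.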
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16715 **[Test build #72630 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72630/testReport)** for PR 16715 at commit [`1b70b91`](https://github.com/apache/spark/commit/1b70b919edea26321f21220f11d520d4f4f98ede).
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16638 No worries, open/submit a new PR. : )
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions be...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16787 It's late on the east coast. Will review it tomorrow. : )
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 Oh, I see, I missed a step: `git remote add upstream ...`. But now, I have deleted my repository in my profile, so this PR can't know which repository it should be associated with. Do you have a way to help me recover from this problem?
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16715 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72628/ Test FAILed.
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16715 Merged build finished. Test FAILed.
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16715 **[Test build #72628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72628/testReport)** for PR 16715 at commit [`b45ec0a`](https://github.com/apache/spark/commit/b45ec0ab118545383526ffa80fa873a4ccc33307). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16638 You do not need to do step 1 every time. You might have missed the following two steps when you want to resolve your conflicts:
> git fetch upstream
> git merge upstream/master
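The fetch/merge sync above can be simulated end-to-end with local repositories; the `upstream` remote, directory names, and user identity here are hypothetical stand-ins for apache/spark and a contributor's clone.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Bare repo standing in for apache/spark (the "upstream" remote, hypothetical).
git init -q --bare -b master upstream.git

# Seed upstream with an initial commit via a throwaway clone.
git clone -q upstream.git seed
git -C seed -c user.name=dev -c user.email=dev@example.com commit --allow-empty -qm "base"
git -C seed push -q origin HEAD:master

# The contributor's clone, made before upstream moved on.
git clone -q upstream.git work
git -C work remote add upstream "$tmp/upstream.git"

# Upstream gains another commit after the clone, so "work" is now behind.
git -C seed -c user.name=dev -c user.email=dev@example.com commit --allow-empty -qm "upstream change"
git -C seed push -q origin HEAD:master

# The two sync steps from the comment above bring it in (a fast-forward merge).
git -C work fetch -q upstream
git -C work merge -q upstream/master
echo "commits after sync: $(git -C work rev-list --count HEAD)"
```

After the fetch/merge pair the local branch is even with upstream, so resolving conflicts against it (and then pushing to the fork) becomes possible.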
[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16664#discussion_r100236493

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       bucketSpec = getBucketSpec,
       options = extraOptions.toMap)
-    dataSource.write(mode, df)
+    val destination = source match {
+      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
+      case _ => extraOptions.get("path")
--- End diff --

Actually all the "magic keys" in the options used by `DataFrameWriter` are public APIs; they are not going to change, and users need to know about them.
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 Here's how I create a PR:
1. Fork the master of Apache;
2. Create a new branch from my master branch;
3. Select my new branch and create a new PR;
4. Edit my new branch code;
5. Commit and push.
Can you point out the missing or mistaken steps for me? Thank you for your guidance! @gatorsmile
[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16664#discussion_r100236345

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       bucketSpec = getBucketSpec,
       options = extraOptions.toMap)
-    dataSource.write(mode, df)
+    val destination = source match {
+      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
+      case _ => extraOptions.get("path")
--- End diff --

> e.g. calling the save method adds a "path" key to the option map, but is that key name a public API?

Yes, it is. E.g. in `df.write.format("parquet").option("path", some_path).save()`, the `path` is a "magic key" and we've exposed it to users, so `path` is a public API and if we change it, we will break existing applications.
[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16787#discussion_r100235734

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -815,7 +819,20 @@ private[hive] class HiveClientImpl(
     Option(hc.getComment).map(field.withComment).getOrElse(field)
   }

-  private def toHiveTable(table: CatalogTable): HiveTable = {
+  private def toInputFormat(name: String) =
+    Utils.classForName(name).asInstanceOf[Class[_ <: org.apache.hadoop.mapred.InputFormat[_, _]]]
+
+  private def toOutputFormat(name: String) =
+    Utils.classForName(name)
+      .asInstanceOf[Class[_ <: org.apache.hadoop.hive.ql.io.HiveOutputFormat[_, _]]]
+
+  /** Converts the native table metadata representation format CatalogTable to Hive's Table.
--- End diff --

style:
```
/**
 * doc
 */
```
[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16787#discussion_r100235680

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -776,20 +778,21 @@ private[hive] class HiveClientImpl(
       client.dropDatabase(db, true, false, true)
     }
   }
+}
+
+private[hive] object HiveClientImpl {
+  private lazy val shimForHiveExecution = IsolatedClientLoader.hiveVersion(
--- End diff --

is this still needed?
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16638 You might not be familiar with GitHub/Git. How about submitting a new PR? : )
[GitHub] spark pull request #16850: [SPARK-19413][SS] MapGroupsWithState for arbitrar...
Github user tdas closed the pull request at: https://github.com/apache/spark/pull/16850
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 My master branch is not synchronized with Apache's master. I did the pull operation, but my master branch was still not synchronized, and finally I removed my remote repository. Now I do not know how to associate a new branch with this PR. I think I made a mistake. @gatorsmile
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/16736 @gatorsmile @cloud-fan thank you for the time and efforts you've put in reviewing this!
[GitHub] spark pull request #16674: [SPARK-19331][SQL][TESTS] Improve the test covera...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16674#discussion_r100235360

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSQLViewSuite.scala ---
@@ -0,0 +1,190 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import org.apache.spark.sql.{AnalysisException, Row, SaveMode, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
+import org.apache.spark.sql.execution.SQLViewSuite
+import org.apache.spark.sql.hive.test.{TestHive, TestHiveSingleton}
+import org.apache.spark.sql.types.StructType
+
+/**
+ * A test suite for Hive view related functionality.
+ */
+class HiveSQLViewSuite extends SQLViewSuite with TestHiveSingleton {
+  protected override val spark: SparkSession = TestHive.sparkSession
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    // Create a simple table with two columns: id and id1
+    spark.range(1, 10).selectExpr("id", "id id1").write.format("json").saveAsTable("jt")
+  }
+
+  override def afterAll(): Unit = {
+    try {
+      spark.sql(s"DROP TABLE IF EXISTS jt")
+    } finally {
+      super.afterAll()
+    }
+  }
+
+  import testImplicits._
+
+  test("create a permanent/temp view using a hive, built-in, and permanent user function") {
+    val permanentFuncName = "myUpper"
+    val permanentFuncClass =
+      classOf[org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper].getCanonicalName
+    val builtInFuncNameInLowerCase = "abs"
+    val builtInFuncNameInMixedCase = "aBs"
+    val hiveFuncName = "histogram_numeric"
+
+    withUserDefinedFunction(permanentFuncName -> false) {
+      sql(s"CREATE FUNCTION $permanentFuncName AS '$permanentFuncClass'")
+      withTable("tab1") {
+        (1 to 10).map(i => (s"$i", i)).toDF("str", "id").write.saveAsTable("tab1")
+        Seq("VIEW", "TEMPORARY VIEW").foreach { viewMode =>
+          withView("view1") {
+            sql(
+              s"""
+                |CREATE $viewMode view1
+                |AS SELECT
+                |$permanentFuncName(str),
+                |$builtInFuncNameInLowerCase(id),
+                |$builtInFuncNameInMixedCase(id) as aBs,
+                |$hiveFuncName(id, 5) over()
+                |FROM tab1
+              """.stripMargin)
+            checkAnswer(sql("select count(*) FROM view1"), Row(10))
+          }
+        }
+      }
+    }
+  }
+
+  test("create a permanent/temp view using a temporary function") {
+    val tempFunctionName = "temp"
+    val functionClass =
+      classOf[org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper].getCanonicalName
+    withUserDefinedFunction(tempFunctionName -> true) {
+      sql(s"CREATE TEMPORARY FUNCTION $tempFunctionName AS '$functionClass'")
+      withView("view1", "tempView1") {
+        withTable("tab1") {
+          (1 to 10).map(i => s"$i").toDF("id").write.saveAsTable("tab1")
+
+          // temporary view
+          sql(s"CREATE TEMPORARY VIEW tempView1 AS SELECT $tempFunctionName(id) from tab1")
+          checkAnswer(sql("select count(*) FROM tempView1"), Row(10))
+
+          // permanent view
+          val e = intercept[AnalysisException] {
+            sql(s"CREATE VIEW view1 AS SELECT $tempFunctionName(id) from tab1")
+          }.getMessage
+          assert(e.contains("Not allowed to create a permanent view `view1` by referencing " +
+            s"a temporary function `$tempFunctionName`"))
+        }
+      }
+    }
+  }
+
+  test("create hive view for json table") {
+    // json table is not hive-compatible, make sure the new flag fix it.
+    withView("testView") {
[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16795 At least, `spark-master-test-maven-hadoop-2.6` goes green.
[GitHub] spark pull request #16736: [SPARK-19265][SQL][Follow-up] Configurable `table...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16736
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16736 LGTM Thanks! Merging to master.
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16638 Your master is clean (i.e., exactly identical to the upstream/master), right?
[GitHub] spark pull request #16837: [SPARK-19359][SQL] renaming partition should not ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16837
[GitHub] spark issue #16803: [SPARK-19458][BUILD]load hive jars from local repo which...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/16803 @dongjoon-hyun @srowen Could you help review this? Thanks very much!
[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16837 Thanks! Merging to master.
[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16775 ping @mengxr @jkbradley @liancheng @MLnick Could you take a look at this? Thanks.
[GitHub] spark issue #16803: [SPARK-19458][BUILD]load hive jars from local repo which...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16803 **[Test build #72629 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72629/testReport)** for PR 16803 at commit [`51b8f5e`](https://github.com/apache/spark/commit/51b8f5e4f75fcba524df8240c2384ff204fe93cc).
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/16760 Many thanks @gatorsmile
[GitHub] spark pull request #16760: [SPARK-18872][SQL][TESTS] New test cases for EXIS...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16760
[GitHub] spark issue #16803: [SPARK-19458][BUILD]load hive jars from local repo which...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/16803 If we do not set ivy.jars.repos, it will use the default ${user.home}/.m2 repo; and if we set ivy.jars.path to a path where the jars have already been downloaded, it can also load them from that path.
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16760 Thanks! Merging to master.
[GitHub] spark issue #16785: [SPARK-19443][SQL][WIP] The function to generate constra...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16785 Since this change is related to SQL, cc @cloud-fan @hvanhovell
[GitHub] spark issue #16785: [SPARK-19443][SQL][WIP] The function to generate constra...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16785 I haven't found a way to improve `getAliasedConstraints` significantly by rewriting its logic. The current way to improve its performance is to use a parallel collection to do the transformation in parallel. That cuts the running time roughly in half (see the benchmark in the PR description), but the running time (13.5 secs) is still too long compared with 1.6. We may consider #16775, which is another solution that fixes this issue by checkpointing datasets for pipelines with long stages, or adopt both of them.
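The idea in the comment above can be illustrated with a toy, plain-Scala sketch (no Catalyst involved; `rewrite` is a hypothetical stand-in for the per-expression work done in `getAliasedConstraints`): split an expensive per-constraint transformation across threads and gather the results.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Stand-in for the per-constraint transformation (hypothetical).
def rewrite(c: String): String = c.replace("a", "x")

val constraints = (1 to 8).map(i => s"a$i > 0")

// Sequential baseline.
val sequential = constraints.map(rewrite)

// Parallel variant: one Future per constraint, then gather the results.
// Order is preserved, so the outputs are identical to the sequential run.
val parallel = Await.result(
  Future.sequence(constraints.map(c => Future(rewrite(c)))),
  10.seconds)
```

As the benchmark in the PR description suggests, this kind of parallelism only shaves a constant factor off the running time; it does not change the asymptotic cost of constraint generation.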
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16715 **[Test build #72628 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72628/testReport)** for PR 16715 at commit [`b45ec0a`](https://github.com/apache/spark/commit/b45ec0ab118545383526ffa80fa873a4ccc33307).
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16715 Jenkins retest this please
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions be...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Merged build finished. Test FAILed.
[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r100232773 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/GetStructField2.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.planning + +import org.apache.spark.sql.catalyst.expressions.{Expression, GetStructField} +import org.apache.spark.sql.types.StructField + +/** + * A Scala extractor that extracts the child expression and struct field from a [[GetStructField]]. + * This is in contrast to the [[GetStructField]] case class extractor which returns the field + * ordinal instead of the field itself. + */ +private[planning] object GetStructField2 { --- End diff -- `GetStructFieldObject` or `GetStructFieldExtractor`?
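The distinction the extractor above draws can be shown with a minimal plain-Scala sketch (all names hypothetical, not the PR's Catalyst code): a case-class pattern exposes the field *ordinal*, while a custom `unapply` can surface the field object itself.

```scala
// Toy stand-ins for Catalyst's StructField and GetStructField.
case class Field(name: String, dataType: String)
case class GetField(child: String, ordinal: Int, fields: Seq[Field])

// Custom extractor (mirroring the `GetStructFieldObject` naming suggestion):
// yields the Field at the ordinal rather than the ordinal itself.
object GetFieldObject {
  def unapply(g: GetField): Option[(String, Field)] =
    Some((g.child, g.fields(g.ordinal)))
}

val expr = GetField("person", 1, Seq(Field("id", "int"), Field("age", "int")))

// Pattern matching now binds the field directly, no ordinal lookup needed.
val (child, field) = expr match { case GetFieldObject(c, f) => (c, f) }
```

This is why a distinct extractor object is useful even when a case class already provides `unapply`: the two can expose different views of the same node.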
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions be...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72616/ Test FAILed.
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions be...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72616/testReport)** for PR 16787 at commit [`bf09f15`](https://github.com/apache/spark/commit/bf09f15ca7c90138312eb73b819131adf16ac040). * This patch **fails from timeout after a configured wait of \`250m\`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16664#discussion_r100232628 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { bucketSpec = getBucketSpec, options = extraOptions.toMap) -dataSource.write(mode, df) +val destination = source match { + case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME) + case _ => extraOptions.get("path") --- End diff -- In Spark SQL, for metadata-like info, we store it as a key-value map. For example, [MetadataBuilder](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala) is used for this purpose. So far, the solution proposed in this PR does not look good to me. I do not think it is a good design. Even if we add a structured type, it could still change in the future. If you want to introduce an external public interface (like our data source APIs), we need a careful design. This should be done in a separate PR.
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 @watermen Thanks for the review. What is the advantage of adding it in `TaskMetrics` instead of `MapStatus`?
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions be...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72627 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72627/testReport)** for PR 16787 at commit [`b20d14f`](https://github.com/apache/spark/commit/b20d14fb6e70aaf6c4e09c644dd8ec6b8b5569dd).
[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16386 Merged build finished. Test PASSed.
[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16386 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72620/ Test PASSed.
[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16386 **[Test build #72620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72620/testReport)** for PR 16386 at commit [`f71a465`](https://github.com/apache/spark/commit/f71a465cf07fb9c043b2ccd86fa57e8e8ea9dc00). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/16664#discussion_r100231056 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { bucketSpec = getBucketSpec, options = extraOptions.toMap) -dataSource.write(mode, df) +val destination = source match { + case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME) + case _ => extraOptions.get("path") --- End diff -- > is like metadata It is metadata, but that doesn't mean it doesn't have meaning and thus doesn't need structure. Some of the metadata currently models "where" the data is being written. Internally it doesn't really matter much how it's handled (it's an "implementation detail"), but for someone building an application that uses this information, knowing that a particular key means "where the data will end up" *is* very important, and a structured type with proper, documented fields helps with that. We just happen to want that information, and we could use it either way, but that's beside the point. I'm arguing that there's value in exposing this data in a more structured manner than just an opaque map.
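The design contrast being debated in this thread can be sketched in plain Scala (all names here are hypothetical illustrations, not Spark APIs): an opaque key-value map forces consumers to know magic keys, while a small structured type documents "where the data ends up" as a typed field.

```scala
// Map-based representation: a listener must know that "path" (or
// "dbtable", or something else) is the key that means "destination".
val asMap: Map[String, String] =
  Map("path" -> "/data/out", "format" -> "parquet")
val where1 = asMap.getOrElse("path", "unknown")

// Structured alternative (hypothetical): the destination has a
// documented type, and a match is exhaustive over the known cases.
sealed trait WriteDestination
final case class FileDestination(path: String) extends WriteDestination
final case class JdbcDestination(table: String) extends WriteDestination

def describe(d: WriteDestination): String = d match {
  case FileDestination(p) => s"file: $p"
  case JdbcDestination(t) => s"jdbc: $t"
}

val where2 = describe(FileDestination("/data/out"))
```

Both representations carry the same information; the structured form trades flexibility for a compiler-checked, documented contract, which is the crux of the disagreement above.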
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16736 Merged build finished. Test PASSed.
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16736 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72619/ Test PASSed.
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16736 **[Test build #72619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72619/testReport)** for PR 16736 at commit [`f29c9d7`](https://github.com/apache/spark/commit/f29c9d77a683c1a63abac92f19210eadcb68682e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16760 Merged build finished. Test PASSed.
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16760 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72621/ Test PASSed.
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16760 **[Test build #72621 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72621/testReport)** for PR 16760 at commit [`2473e0c`](https://github.com/apache/spark/commit/2473e0c440a9d1cd761ae6d704d0aa02c63afd83). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16664#discussion_r100229660 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { bucketSpec = getBucketSpec, options = extraOptions.toMap) -dataSource.write(mode, df) +val destination = source match { + case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME) + case _ => extraOptions.get("path") --- End diff -- Based on my understanding, the extra information we pass to QueryExecutionListener is like metadata. It is just for helping users understand the context. I still do not understand why we need to define a class/trait for it. This extra class/trait looks weird for this goal, unless you have some applications that are built on this class/trait.
[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r100229358 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/GetStructField2.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.planning + +import org.apache.spark.sql.catalyst.expressions.{Expression, GetStructField} +import org.apache.spark.sql.types.StructField + +/** + * A Scala extractor that extracts the child expression and struct field from a [[GetStructField]]. + * This is in contrast to the [[GetStructField]] case class extractor which returns the field + * ordinal instead of the field itself. + */ +private[planning] object GetStructField2 { --- End diff -- How about `GetStructFieldObject`? Or `GetStructFieldRef`?
[GitHub] spark pull request #16854: [WIP][SPARK-15463][SQL] Add an API to load DataFr...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16854#discussion_r100229312 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -361,6 +362,41 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { } /** + * Loads a `Dataset[String]` storing CSV rows and returns the result as a `DataFrame`. + * + * Unless the schema is specified using the `schema` function, this function goes through the + * input once to determine the input schema. + * + * @param csvDataset input Dataset with one CSV row per record + * @since 2.2.0 + */ + def csv(csvDataset: Dataset[String]): DataFrame = { +val parsedOptions: CSVOptions = new CSVOptions(extraOptions.toMap) --- End diff -- Just to help review, there is a similar code path in https://github.com/apache/spark/blob/3d314d08c9420e74b4bb687603cdd11394eccab5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala#L105-L125
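The doc comment in the diff above says the function "goes through the input once to determine the input schema". A toy plain-Scala sketch (no Spark; `inferType` and `widen` are hypothetical simplifications of Spark's real CSV inference) shows what such a one-pass inference looks like: infer a per-cell type for each row, then fold the rows together, widening column types as needed.

```scala
// Classify a single cell value (simplified: int, double, or string).
def inferType(v: String): String =
  if (v.matches("-?\\d+")) "int"
  else if (v.matches("-?\\d+\\.\\d+")) "double"
  else "string"

// Widen two candidate types for the same column.
def widen(a: String, b: String): String =
  if (a == b) a
  else if (Set(a, b) == Set("int", "double")) "double"
  else "string"

val rows = Seq("1,foo,3.0", "2,bar,4", "3,baz,5.5")

// One pass: per-row cell types, folded column-wise with widening.
val schema = rows
  .map(_.split(",").map(inferType))
  .reduce((x, y) => x.zip(y).map { case (a, b) => widen(a, b) })
```

Here column 3 is seen as `double`, `int`, `double` across the rows, so it widens to `double`; this is why the overload must scan the whole input before it can fix a schema.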
[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r100229300 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/SelectedField.scala --- @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.planning + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types._ + +/** + * A Scala extractor that builds a [[StructField]] from a Catalyst complex type + * extractor. This is like the opposite of [[ExtractValue#apply]]. + */ +object SelectedField { + def unapply(expr: Expression): Option[StructField] = { +// If this expression is an alias, work on its child instead +val unaliased = expr match { + case Alias(child, _) => child + case expr => expr +} +selectField(unaliased, None) + } + + /** + * Converts some chain of complex type extractors into a [[StructField]]. 
+ * + * @param expr the top-level complex type extractor + * @param fieldOpt the subfield of [[expr]], where relevant + */ + private def selectField(expr: Expression, fieldOpt: Option[StructField]): Option[StructField] = +expr match { + case AttributeReference(name, _, nullable, _) => +fieldOpt.map(field => StructField(name, StructType(Array(field)), nullable)) + case GetArrayItem(GetStructField2(child, field @ StructField(name, + ArrayType(_, arrayNullable), fieldNullable, _)), _) => +val childField = fieldOpt.map(field => StructField(name, ArrayType( + StructType(Array(field)), arrayNullable), fieldNullable)).getOrElse(field) +selectField(child, Some(childField)) + case GetArrayStructFields(child, --- End diff -- I've spent some time this week developing a few different solutions to this problem; however, none of them are very easy to understand or verify. I'm going to spend some more time working on a simpler solution before posting something back.
[GitHub] spark issue #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON parsing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16750 **[Test build #72626 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72626/testReport)** for PR 16750 at commit [`ffc4912`](https://github.com/apache/spark/commit/ffc4912e17cc900fc9d7ceefd0f66461109728e9).
[GitHub] spark issue #16785: [SPARK-19443][SQL][WIP] The function to generate constra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16785 **[Test build #72625 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72625/testReport)** for PR 16785 at commit [`8c98a5c`](https://github.com/apache/spark/commit/8c98a5c3ab1477408988c8cb682733e65dd554fc).
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16715 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72624/ Test FAILed.
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16715 Merged build finished. Test FAILed.
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Merged build finished. Test FAILed.
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72622/ Test FAILed.