[GitHub] spark pull request: [SPARK-15114][SQL] Column name generated by ty...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13045#issuecomment-220242854 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58850/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15114][SQL] Column name generated by ty...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13045#issuecomment-220242851 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-15114][SQL] Column name generated by ty...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13045#issuecomment-220242701 **[Test build #58850 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58850/consoleFull)** for PR 13045 at commit [`9eb6f40`](https://github.com/apache/spark/commit/9eb6f4063adaf7cda79cdf0bf2ac11414ca5c1d2). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13135
[GitHub] spark pull request: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/13135#issuecomment-220241901 LGTM too. Merged to master/branch-2.0. Thanks!
[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13188#issuecomment-220241617 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58844/ Test PASSed.
[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13188#issuecomment-220241614 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13188#issuecomment-220241461 **[Test build #58844 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58844/consoleFull)** for PR 13188 at commit [`e584575`](https://github.com/apache/spark/commit/e584575bb786e77b7ea1d6de3f80ec556011d291). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` | and i_class in('personal', 'portable', 'reference', 'self-help')` * ` | and i_class in('accessories', 'classical', 'fragrances', 'pants')` * ` |and i_class in('personal', 'portable', 'refernece', 'self-help')` * ` |and i_class in('accessories', 'classical', 'fragrances', 'pants')` * ` | and i_class in('wallpaper', 'parenting', 'musical'))` * ` |and i_class in('womens', 'birdal', 'pants'))` * ` i_class IN ('personal', 'portable', 'reference', 'self-help') AND` * `i_class IN ('accessories', 'classical', 'fragrances', 'pants') AND` * ` AND i_class IN ('personal', 'portable', 'refernece', 'self-help')` * ` AND i_class IN ('accessories', 'classical', 'fragrances', 'pants')` * ` i_class IN ('computers', 'stereo', 'football'))` * ` i_class IN ('shirts', 'birdal', 'dresses')))`
[GitHub] spark pull request: [SPARK-14670][SQL][WIP] allow updating driver ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13189#issuecomment-220238181 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58848/ Test FAILed.
[GitHub] spark pull request: [SPARK-14670][SQL][WIP] allow updating driver ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13189#issuecomment-220238179 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14670][SQL][WIP] allow updating driver ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13189#issuecomment-220238104 **[Test build #58848 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58848/consoleFull)** for PR 13189 at commit [`8db358f`](https://github.com/apache/spark/commit/8db358f801f3dbd9f5eacf20dc10ef773c0d7ccb). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class SparkListenerDriverAccumUpdates(executionId: Long, accumUpdates: Seq[AccumulableInfo])`
[GitHub] spark pull request: [SPARK-15398][ML] Update the warning message t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13190#issuecomment-220237794 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58851/ Test PASSed.
[GitHub] spark pull request: [SPARK-15398][ML] Update the warning message t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13190#issuecomment-220237792 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15398][ML] Update the warning message t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13190#issuecomment-220237703 **[Test build #58851 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58851/consoleFull)** for PR 13190 at commit [`c6f3244`](https://github.com/apache/spark/commit/c6f324459204ab791ea1b7fa409080105a0301ee). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on the pull request: https://github.com/apache/spark/pull/10125#issuecomment-220237290

@cloud-fan I tried, and it still fails. It didn't go through the createDataFrame you added in SparkSession. It went through this one instead:

createDataFrame(data: java.util.List[_], beanClass: Class[_]): DataFrame -> val rows = SQLContext.beansToRows(data.asScala.iterator, beanInfo, attrSeq)

beansToRows creates internal rows, and it comes from SQLContext. Should we add RowEncoder to the beansToRows call, or leave the code as it is? Thanks.

Here is the trace:

scala.MatchError: 1234567 (of class java.math.BigInteger)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$DecimalConverter.toCatalystImpl(CatalystTypeConverters.scala:326)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$DecimalConverter.toCatalystImpl(CatalystTypeConverters.scala:323)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:892)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:892)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:892)
at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:890)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.toStream(Iterator.scala:1322)
at scala.collection.AbstractIterator.toStream(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toSeq(TraversableOnce.scala:298)
at scala.collection.AbstractIterator.toSeq(Iterator.scala:1336)
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:373)
at test.org.apache.spark.sql.JavaDataFrameSuite.testCreateDataFrameFromLocalJavaBeans(JavaDataFrameSuite.java:200)
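The `scala.MatchError` above means the decimal converter's pattern match has no case for `java.math.BigInteger`. The following is a minimal, self-contained sketch of the kind of case that is missing, assuming a simplified converter that normalizes everything to `java.math.BigDecimal`. `DecimalConversionSketch` and `toDecimal` are hypothetical names; this is not Spark's actual `CatalystTypeConverters` code.

```scala
import java.math.{BigDecimal => JBigDecimal, BigInteger}

// Hypothetical, simplified stand-in for a decimal converter's match.
// Without the BigInteger (and BigInt) cases, a BigInteger input would
// fall through every case and throw scala.MatchError, as in the trace.
object DecimalConversionSketch {
  def toDecimal(value: Any): JBigDecimal = value match {
    case d: BigDecimal  => d.bigDecimal               // scala.math.BigDecimal
    case d: JBigDecimal => d                          // java.math.BigDecimal
    case i: BigInteger  => new JBigDecimal(i)         // java.math.BigInteger, scale 0
    case i: BigInt      => new JBigDecimal(i.bigInteger) // scala.math.BigInt
    case other =>
      throw new IllegalArgumentException(s"Unsupported decimal input: ${other.getClass.getName}")
  }
}
```

With the extra cases, `toDecimal(new BigInteger("1234567"))` yields a `java.math.BigDecimal` with value 1234567 and scale 0 instead of failing.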
[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/13165#issuecomment-220237304 @felixcheung, this issue seems to relate to system2() only. However, let's wait for HyukjinKwon's test result. @HyukjinKwon, great, go ahead please.
[GitHub] spark pull request: [SPARK-15398][ML] Update the warning message t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13190#issuecomment-220236395 **[Test build #58851 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58851/consoleFull)** for PR 13190 at commit [`c6f3244`](https://github.com/apache/spark/commit/c6f324459204ab791ea1b7fa409080105a0301ee).
[GitHub] spark pull request: [SPARK-15398][ML] Update the warning message t...
GitHub user zhengruifeng opened a pull request:

https://github.com/apache/spark/pull/13190

[SPARK-15398][ML] Update the warning message to recommend ML usage

## What changes were proposed in this pull request?

MLlib is no longer recommended for use, and some of its methods are already deprecated. Update the warning message to recommend the ML API instead.

```
def showWarning() {
  System.err.println(
    """WARN: This is a naive implementation of Logistic Regression and is given as an example!
      |Please use either org.apache.spark.mllib.classification.LogisticRegressionWithSGD or
      |org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
      |for more conventional use.
    """.stripMargin)
}
```

To

```
def showWarning() {
  System.err.println(
    """WARN: This is a naive implementation of Logistic Regression and is given as an example!
      |Please use org.apache.spark.ml.classification.LogisticRegression
      |for more conventional use.
    """.stripMargin)
}
```

## How was this patch tested?

local build

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhengruifeng/spark update_recd

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13190.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #13190

commit c6f324459204ab791ea1b7fa409080105a0301ee
Author: Zheng RuiFeng
Date: 2016-05-19T06:00:59Z

create pr
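For readers following the new warning, here is a minimal sketch of the recommended `org.apache.spark.ml.classification.LogisticRegression` estimator, assuming Spark 2.0 with `org.apache.spark.ml.linalg.Vectors` on the classpath. The four-row training set and the `MlLogisticRegressionSketch` object are illustrative only and are not part of the PR.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Hypothetical helper: fits spark.ml LogisticRegression on a tiny in-memory
// dataset and returns the number of learned coefficients (one per feature).
object MlLogisticRegressionSketch {
  def fitTiny(): Int = {
    val spark = SparkSession.builder().master("local[1]").appName("ml-lr-sketch").getOrCreate()
    try {
      // The default column names expected by the estimator are "label" and "features".
      val training = spark.createDataFrame(Seq(
        (1.0, Vectors.dense(0.0, 1.1, 0.1)),
        (0.0, Vectors.dense(2.0, 1.0, -1.0)),
        (0.0, Vectors.dense(2.0, 1.3, 1.0)),
        (1.0, Vectors.dense(0.0, 1.2, -0.5))
      )).toDF("label", "features")

      val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)
      val model = lr.fit(training)
      model.coefficients.size
    } finally {
      spark.stop()
    }
  }
}
```

Unlike the RDD-based `mllib` classes named in the old warning, the `ml` estimator works on DataFrames and fits into `Pipeline`s.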
[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13188#discussion_r63826284

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/tpcds/TPCDSQueryBenchmark.scala ---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.parquet.tpcds
+
+import java.io.File
+
+import org.apache.spark.{SparkConf, SparkContext}
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark to measure TPCDS query performance.
+ * To run this:
+ *  spark-submit --class --jars
+ */
+object TPCDSQueryBenchmark {
+  val conf = new SparkConf()
+  conf.set("spark.sql.parquet.compression.codec", "snappy")
+  conf.set("spark.sql.shuffle.partitions", "4")
+  conf.set("spark.driver.memory", "3g")
+  conf.set("spark.executor.memory", "3g")
+  conf.set("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString)
+
+  val sc = new SparkContext("local[1]", "test-sql-context", conf)
+  val sqlContext = new SQLContext(sc)
--- End diff --

Hi, @sameeragarwal! This PR looks great. By the way, could you update lines 36~44 to use the new `SparkSession` builder pattern?
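The builder pattern requested above could look roughly like the following: a minimal sketch, assuming the Spark 2.0 `SparkSession.builder()` API, that carries over the settings from lines 36~44 of the diff. `TPCDSQueryBenchmarkSessionSketch` is a hypothetical name, and this is not necessarily the code that was eventually merged.

```scala
import org.apache.spark.sql.{SQLContext, SparkSession}

// Hypothetical replacement for the SparkConf/SparkContext/SQLContext setup.
object TPCDSQueryBenchmarkSessionSketch {
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("test-sql-context")
    .config("spark.sql.parquet.compression.codec", "snappy")
    .config("spark.sql.shuffle.partitions", "4")
    // Memory settings only take effect if applied before the JVM starts
    // (for example via spark-submit); they are kept here for parity.
    .config("spark.driver.memory", "3g")
    .config("spark.executor.memory", "3g")
    .config("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString)
    .getOrCreate()

  // For code that still expects an SQLContext during the migration.
  def sqlContext: SQLContext = spark.sqlContext
}
```

Because `getOrCreate()` reuses an existing session when one is active, the settings above are applied only when a new session is built.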
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220233909 @cloud-fan, it's ready again now. Could you merge this PR?
[GitHub] spark pull request: [SPARK-15114][SQL] Column name generated by ty...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13045#issuecomment-220233152 **[Test build #58850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58850/consoleFull)** for PR 13045 at commit [`9eb6f40`](https://github.com/apache/spark/commit/9eb6f4063adaf7cda79cdf0bf2ac11414ca5c1d2).
[GitHub] spark pull request: [SPARK-14670][SQL][WIP] allow updating driver ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13189#issuecomment-220233150 **[Test build #58848 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58848/consoleFull)** for PR 13189 at commit [`8db358f`](https://github.com/apache/spark/commit/8db358f801f3dbd9f5eacf20dc10ef773c0d7ccb).
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13186#issuecomment-220233159 **[Test build #58849 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58849/consoleFull)** for PR 13186 at commit [`ac3aa33`](https://github.com/apache/spark/commit/ac3aa334b59d430ea7c239c706ed7e490af5f0b2).
[GitHub] spark pull request: [SPARK-14346][SQL] Lists unsupported Hive feat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13173#issuecomment-220232881 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14346][SQL] Lists unsupported Hive feat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13173#issuecomment-220232882 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58845/ Test FAILed.
[GitHub] spark pull request: [SPARK-14346][SQL] Lists unsupported Hive feat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13173#issuecomment-220232874 **[Test build #58845 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58845/consoleFull)** for PR 13173 at commit [`ca22d71`](https://github.com/apache/spark/commit/ca22d7102537bd7411f37aa957f877802ebd6d17). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14670][SQL][WIP] allow updating driver ...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/13189#issuecomment-220232797 cc @andrewor14 @davies
[GitHub] spark pull request: [SPARK-14670][SQL][WIP] allow updating driver ...
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/13189

[SPARK-14670][SQL][WIP] allow updating driver side sql metrics

## What changes were proposed in this pull request?

On the SparkUI right now we have this SQLTab that displays accumulator values per operator. However, it only displays metrics updated on the executors, not on the driver. It is useful to also include driver metrics, e.g. broadcast time.

This is an alternative to https://github.com/apache/spark/pull/12427. This PR sends driver-side accumulator updates right after the update happens, not at the end of execution. However, it has some drawbacks:

1. If there is no update, we won't send zero-value updates, so the operator will be empty in the web UI and no metrics info is displayed.
2. We need to trigger the event explicitly; simply updating the accumulator is not enough.
3. It may be hard to use inside whole-stage codegen.

## How was this patch tested?

TODO

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark metrics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13189.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #13189

commit 8db358f801f3dbd9f5eacf20dc10ef773c0d7ccb
Author: Wenchen Fan
Date: 2016-05-19T05:36:34Z

allow updating driver side sql metrics
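Drawbacks 1 and 2 in the PR description can be illustrated with a small, self-contained model of the "post driver-side updates eagerly" approach. This is a hypothetical sketch, not the PR's code: `SketchListenerBus`, `DriverMetric`, `DriverMetrics`, `AccumUpdate`, and `DriverAccumUpdatesEvent` are made-up names, and the event shape only loosely mirrors the `SparkListenerDriverAccumUpdates(executionId, accumUpdates)` case class that the build bot reported for this PR.

```scala
import scala.collection.mutable

// Hypothetical stand-ins; none of these are Spark APIs.
final case class AccumUpdate(id: Long, name: String, value: Long)
final case class DriverAccumUpdatesEvent(executionId: Long, accumUpdates: Seq[AccumUpdate])

// Collects posted events in memory, standing in for a listener bus.
final class SketchListenerBus {
  val events: mutable.ArrayBuffer[DriverAccumUpdatesEvent] =
    mutable.ArrayBuffer.empty[DriverAccumUpdatesEvent]
  def post(event: DriverAccumUpdatesEvent): Unit = events += event
}

// A driver-side metric: updating it does not notify anyone by itself.
final class DriverMetric(val id: Long, val name: String) {
  private var current = 0L
  private var touched = false
  def add(v: Long): Unit = { current += v; touched = true }
  def value: Long = current
  def updated: Boolean = touched
}

object DriverMetrics {
  // Drawback 2: the caller must post explicitly, right after updating.
  def postDriverUpdates(bus: SketchListenerBus, executionId: Long, metrics: Seq[DriverMetric]): Unit = {
    val updates = metrics.filter(_.updated).map(m => AccumUpdate(m.id, m.name, m.value))
    // Drawback 1: with no updates nothing is posted, so a UI built from
    // these events has no value (not even zero) for the operator.
    if (updates.nonEmpty) bus.post(DriverAccumUpdatesEvent(executionId, updates))
  }
}
```

Posting right after each update keeps the driver-side values visible while the query is still running, at the cost of an explicit post at every update site.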
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13186#issuecomment-220232622 **[Test build #58847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58847/consoleFull)** for PR 13186 at commit [`5bcef84`](https://github.com/apache/spark/commit/5bcef84700bd4ec51097e58bea099ded54334a59).
[GitHub] spark pull request: [SPARK-14346][SQL] Lists unsupported Hive feat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13173#issuecomment-220232037 **[Test build #58845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58845/consoleFull)** for PR 13173 at commit [`ca22d71`](https://github.com/apache/spark/commit/ca22d7102537bd7411f37aa957f877802ebd6d17).
[GitHub] spark pull request: [SPARK-15308][SQL] RowEncoder should preserve ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13090#issuecomment-220232042 **[Test build #58846 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58846/consoleFull)** for PR 13090 at commit [`698c261`](https://github.com/apache/spark/commit/698c2619dc71650ef0faac278014b539387fb273).
[GitHub] spark pull request: [SPARK-14346][SQL] Lists unsupported Hive feat...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/13173#issuecomment-220231882 This doesn't seem to be a valid MiMa check failure; the tool actually crashed.
[GitHub] spark pull request: [SPARK-14346][SQL] Lists unsupported Hive feat...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/13173#issuecomment-220231892 retest this please
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13139#issuecomment-220231390 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58843/ Test PASSed.
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13139#issuecomment-220231329 **[Test build #58843 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58843/consoleFull)** for PR 13139 at commit [`e0079d0`](https://github.com/apache/spark/commit/e0079d03f279dc68eb19faed6d5cb6823802051a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13139#issuecomment-220231389 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13122#issuecomment-220231132 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58841/ Test PASSed.
[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13122#issuecomment-220231131 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13122#issuecomment-220230990 **[Test build #58841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58841/consoleFull)** for PR 13122 at commit [`84aa14a`](https://github.com/apache/spark/commit/84aa14a5deda14083520e8e23f83cdb7f5bbb2bc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15322][SQL][FOLLOW-UP] Update deprecate...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13187#issuecomment-220230863 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58840/ Test PASSed.
[GitHub] spark pull request: [SPARK-15322][SQL][FOLLOW-UP] Update deprecate...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13187#issuecomment-220230862 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13188#issuecomment-220230908 **[Test build #58844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58844/consoleFull)** for PR 13188 at commit [`e584575`](https://github.com/apache/spark/commit/e584575bb786e77b7ea1d6de3f80ec556011d291).
[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/13188 [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark queries for SparkSQL ## What changes were proposed in this pull request? Now that SparkSQL supports all TPC-DS queries, this patch adds all 99 benchmark queries inside SparkSQL. ## How was this patch tested? Benchmark only You can merge this pull request into a Git repository by running: $ git pull https://github.com/sameeragarwal/spark tpcds-all Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13188.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13188 commit e584575bb786e77b7ea1d6de3f80ec556011d291 Author: Sameer Agarwal Date: 2016-05-03T00:28:12Z Add all TPCDS queries
[GitHub] spark pull request: [SPARK-15322][SQL][FOLLOW-UP] Update deprecate...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13187#issuecomment-220230733 **[Test build #58840 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58840/consoleFull)** for PR 13187 at commit [`9b07d09`](https://github.com/apache/spark/commit/9b07d09301e9c6695e3586e06852f679594d988d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [WIP][SPARK-15078] Add all TPCDS 1.4 benchmark...
Github user sameeragarwal closed the pull request at: https://github.com/apache/spark/pull/12854
[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13122#issuecomment-220230272 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58839/ Test PASSed.
[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13122#issuecomment-220230271 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13139#issuecomment-220230302 **[Test build #58843 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58843/consoleFull)** for PR 13139 at commit [`e0079d0`](https://github.com/apache/spark/commit/e0079d03f279dc68eb19faed6d5cb6823802051a).
[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13122#issuecomment-220230146 **[Test build #58839 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58839/consoleFull)** for PR 13122 at commit [`0702178`](https://github.com/apache/spark/commit/0702178a3c485aa316d5b03b3aefb2ea4a228cc2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13139#issuecomment-220229725 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13139#issuecomment-220229727 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58842/ Test PASSed.
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13139#issuecomment-220229641 **[Test build #58842 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58842/consoleFull)** for PR 13139 at commit [`ce7c55e`](https://github.com/apache/spark/commit/ce7c55e14a76dc85bca51a2563d770e3eac3a2a2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r63823201 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1648,16 +1648,56 @@ object RewriteCorrelatedScalarSubquery extends Rule[LogicalPlan] { } /** + * Statically evaluate an expression containing one or more aggregates on an empty input. + */ + private def evalOnZeroTups(expr : Expression) : Option[Any] = { +// AggregateExpressions are Unevaluable, so we need to replace all aggregates +// in the expression with the value they would return for zero input tuples. +val rewrittenExpr = expr transform { + case a @ AggregateExpression(aggFunc, _, _, resultId) => +val resultLit = aggFunc.defaultResult match { + case Some(lit) => lit + case None => Literal.default(NullType) +} +Alias(resultLit, "aggVal") (exprId = resultId) +} +Option(rewrittenExpr.eval()) + } + + /** * Construct a new child plan by left joining the given subqueries to a base plan. */ private def constructLeftJoins( child: LogicalPlan, subqueries: ArrayBuffer[ScalarSubquery]): LogicalPlan = { subqueries.foldLeft(child) { case (currentChild, ScalarSubquery(query, conditions, _)) => +val aggOutputExpr = query.asInstanceOf[Aggregate].aggregateExpressions.head --- End diff -- Sorry, didn't see your reply before I posted mine. I must not have refreshed my browser. Thanks for the info on the possible cases. I'm testing the updated static evaluation code now.
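The idea behind `evalOnZeroTups` in the diff can be illustrated without Catalyst: replace every aggregate in an expression with the value it returns for zero input rows, then evaluate what remains. The mini expression tree below (`Expr`, `Lit`, `Add`, `Coalesce`, `Count`, `Sum`) is a hypothetical, simplified model, not Spark's `Expression` API, and `None` stands in for SQL NULL, mirroring the diff's fallback to a NULL literal when an aggregate has no default result. It shows why a static value is useful here: `count(*) + 1` over an empty group is 1, whereas a left join that finds no matching group would presumably produce NULL.

```scala
object ZeroTupleEval {
  // Simplified expression tree; None stands for SQL NULL.
  sealed trait Expr
  final case class Lit(value: Option[Long]) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr
  final case class Coalesce(first: Expr, fallback: Expr) extends Expr
  // Aggregates cannot be evaluated directly; they must be rewritten first.
  final case class Count(column: String) extends Expr
  final case class Sum(column: String) extends Expr

  // Replace each aggregate with the result it produces on zero input rows:
  // COUNT yields 0, SUM yields NULL.
  def replaceAggregates(e: Expr): Expr = e match {
    case Count(_)        => Lit(Some(0L))
    case Sum(_)          => Lit(None)
    case Add(l, r)       => Add(replaceAggregates(l), replaceAggregates(r))
    case Coalesce(f, fb) => Coalesce(replaceAggregates(f), replaceAggregates(fb))
    case lit: Lit        => lit
  }

  // Evaluate an aggregate-free expression; Add propagates NULL.
  def eval(e: Expr): Option[Long] = e match {
    case Lit(v)          => v
    case Add(l, r)       => for (a <- eval(l); b <- eval(r)) yield a + b
    case Coalesce(f, fb) => eval(f).orElse(eval(fb))
    case agg             => throw new IllegalArgumentException(s"unevaluable aggregate: $agg")
  }

  def evalOnZeroRows(e: Expr): Option[Long] = eval(replaceAggregates(e))
}
```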
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13139#issuecomment-220228793 **[Test build #58842 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58842/consoleFull)** for PR 13139 at commit [`ce7c55e`](https://github.com/apache/spark/commit/ce7c55e14a76dc85bca51a2563d770e3eac3a2a2).
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/13139#issuecomment-220228661 @yanboliang @MLnick Thanks for the feedback. For now, I've just addressed the comment about the optimization section. I'll address the other comments in my next commit (very soon!).
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/13139#discussion_r63823104 --- Diff: docs/ml-classification-regression.md --- @@ -374,6 +374,197 @@ regression model and extracting model summary statistics. +## Generalized linear regression + +When working with data that has a relatively small number of features (< 4096), Spark's GeneralizedLinearRegression interface +allows for flexible specification of [generalized linear models](https://en.wikipedia.org/wiki/Generalized_linear_model) (GLMs) which can be used for various types of +prediction problems including linear regression, Poisson regression, logistic regression, and others. + +Contrasted with linear regression where the output is assumed to follow a Gaussian +distribution, GLMs are specifications of linear models where the response variable $Y_i$ may take on _any_ +distribution from the [exponential family of distributions](https://en.wikipedia.org/wiki/Exponential_family). + +$$ +Y_i \sim f\left(\cdot|\theta_i, \phi, w_i\right) +$$ + +An exponential family distribution is any probability distribution of the form + +$$ +f\left(y|\theta, \phi, w\right) = \exp{\left(\frac{y\theta - b(\theta)}{\phi/w} - c(y, \phi)\right)} +$$ + +where the parameter of interest $\theta_i$ is related to the expected value of the response variable +$\mu_i$ by + +$$ +\theta_i = h(\mu_i) +$$ + +Here, $h(\mu_i)$ is defined by the form of the exponential family distribution used. GLMs also allow specification +of a link function, which defines the relationship between the expected value of the response variable $\mu_i$ +and the so called _linear predictor_ $\eta_i$: + +$$ +g(\mu_i) = \eta_i = \vec{x_i}^T \cdot \vec{\beta} +$$ + +Often, the link function is chosen such that $h(\mu) = g(\mu)$, which yields a simplified relationship +between the parameter of interest $\theta$ and the linear predictor $\eta$. 
In this case, the link +function $g(\mu)$ is said to be the "canonical" link function. + +$$ +\theta_i = h(g^{-1}(\eta_i)) = \eta_i +$$ + +A GLM finds the regression coefficients $\vec{\beta}$ which maximize the likelihood function. + +$$ +\min_{\vec{\beta}} \mathcal{L}(\vec{\theta}|\vec{y},X) = +\prod_{i=1}^{N} \exp{\left(\frac{y_i\theta_i - b(\theta_i)}{\phi/w_i} - c(y_i, \phi)\right)} +$$ + +where the parameter of interest $\theta_i$ is related to the regression coefficients $\vec{\beta}$ +by + +$$ +\theta_i = h(g^{-1}(\vec{x_i} \cdot \vec{\beta})) +$$ + +Spark's generalized linear regression interface also provides summary statistics for diagnosing the +fit of GLM models, including residuals, p-values, deviances, the Akaike information criterion, and +others. + +### Available families + + + + + + PDF + Response Type + Supported Links + + + + Gaussian + $\frac{1}{\sigma \sqrt{2\pi}} \exp \left( -\frac{(x - \mu)^2}{2\sigma^2}\right)$ + Continuous + Identity*, Log, Inverse + + + Binomial + $\binom{n}{k}p^k (1-p)^{n-k}$ + Binary + Logit*, Probit, CLogLog + + + Poisson + $\frac{\lambda^k e^{-\lambda}}{k!}$ + Count + Log*, Identity, Sqrt + + + Gamma + $\frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}$ + Continuous + Inverse*, Idenity, Log + +* Canonical Link + + + +### Optimization --- End diff -- So, I went ahead and added some more detail on the optimization routine. I made an effort to stress the limitations on numFeatures and to give some explanation as to why. Could you take a look at it? I didn't generate the docs to make sure it looks alright just yet, but I wanted to get that up so it could be reviewed.
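As a quick numerical check of the exponential-family form and of the canonical links marked with * in the quoted table, the plain-Scala sketch below writes the Poisson pmf as f(y | theta) = exp(y*theta - exp(theta) - log(y!)), i.e. theta = log(lambda), b(theta) = exp(theta), phi = w = 1 and c(y, phi) = log(y!), and compares it with lambda^y * exp(-lambda) / y!. The names `GlmSketch` and `Link` are hypothetical; this is an illustration, not Spark's `GeneralizedLinearRegression` API.

```scala
object GlmSketch {
  // A link function g (mean -> linear predictor) and its inverse.
  final case class Link(name: String, g: Double => Double, gInv: Double => Double)

  val identityLink = Link("identity", mu => mu, eta => eta)
  val logitLink = Link("logit", mu => math.log(mu / (1.0 - mu)), eta => 1.0 / (1.0 + math.exp(-eta)))
  val logLink = Link("log", mu => math.log(mu), eta => math.exp(eta))
  val inverseLink = Link("inverse", mu => 1.0 / mu, eta => 1.0 / eta)

  // Canonical links, following the entries marked with * in the table.
  val canonical: Map[String, Link] = Map(
    "gaussian" -> identityLink,
    "binomial" -> logitLink,
    "poisson"  -> logLink,
    "gamma"    -> inverseLink)

  def logFactorial(k: Int): Double = (1 to k).map(i => math.log(i.toDouble)).sum

  // Poisson in exponential-family form:
  // theta = log(lambda), b(theta) = exp(theta), phi = w = 1, c(y, phi) = log(y!).
  def poissonExpFamily(y: Int, theta: Double): Double =
    math.exp(y * theta - math.exp(theta) - logFactorial(y))

  // The textbook Poisson pmf lambda^y * exp(-lambda) / y!.
  def poissonPmf(y: Int, lambda: Double): Double =
    math.pow(lambda, y.toDouble) * math.exp(-lambda) / (1 to y).map(_.toDouble).product
}
```

With the log link, theta = log(mu) = g(mu), so theta equals the linear predictor eta, which is exactly the "canonical" case described in the diff. The Gamma entry follows the table's convention of treating 1/mu as canonical; the exponential-family parameter is strictly -1/mu, which differs only in sign.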
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13186#issuecomment-220228251 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13186#issuecomment-220228252 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58835/ Test FAILed.
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13186#issuecomment-220228173 **[Test build #58835 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58835/consoleFull)** for PR 13186 at commit [`23b43d4`](https://github.com/apache/spark/commit/23b43d4c837d762461dd56a62b85cb998919e0ef). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13182#discussion_r63822575 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala --- @@ -410,9 +410,10 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap private def init(): Unit = { if (mm != null) { + require(capacity < (512 << 20), "Cannot broadcast more than 512 millions rows") --- End diff -- Looks like it is.
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13182#discussion_r63822450 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala --- @@ -410,9 +410,10 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap private def init(): Unit = { if (mm != null) { + require(capacity < (512 << 20), "Cannot broadcast more than 512 millions rows") --- End diff -- yes
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13182#discussion_r63822349 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala --- @@ -410,9 +410,10 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap private def init(): Unit = { if (mm != null) { + require(capacity < (512 << 20), "Cannot broadcast more than 512 millions rows") --- End diff -- Is `capacity` the number of rows?
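For reference, `512 << 20` in the guard is 512 × 2^20 = 536,870,912, so the check rejects a `capacity` of roughly 537 million or more; the "512 millions" in the message counts in units of 2^20. The standalone restatement below uses hypothetical names (`CapacityGuardSketch`, `checkCapacity`, `isAllowed`) and assumes, as the replies in this thread indicate, that `capacity` counts rows.

```scala
object CapacityGuardSketch {
  // 512 << 20 == 512 * 2^20 == 536870912 (that is, 2^29).
  val MaxCapacity: Long = 512L << 20

  // Same shape as the require(...) in the diff: Scala's require throws
  // IllegalArgumentException when the condition is false.
  def checkCapacity(capacity: Long): Unit =
    require(capacity < MaxCapacity, "Cannot broadcast more than 512 millions rows")

  // Convenience wrapper that reports whether the guard would pass.
  def isAllowed(capacity: Long): Boolean =
    try { checkCapacity(capacity); true }
    catch { case _: IllegalArgumentException => false }
}
```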
[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13167
[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13167#issuecomment-220226195 Merging this into master and 2.0, thanks!
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/10061#discussion_r63822163 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -96,6 +100,7 @@ private[spark] object JsonProtocol { executorMetricsUpdateToJson(metricsUpdate) case blockUpdated: SparkListenerBlockUpdated => throw new MatchError(blockUpdated) // TODO(ekl) implement this + case _ => parse(mapper.writeValueAsString(event)) --- End diff -- > Events are a public API, and they should be carefully crafted, since changing them affects user applications (including event logs). If there is unnecessary information in the event, then it's a bug in the event definition, not here. Yea. I totally agree. However, my concern is that having this line here will make it harder for developers to spot issues during development. Since the serialization works automatically, reviewing what will be serialized, and which methods will be called during serialization, is no longer a mandatory step, which makes auditing much harder. Although it is more work for the developer to handle every event explicitly, when a pull request is submitted we can then clearly see what will be serialized and how an event is serialized. What do you think? btw, if I am missing any context, please let me know :)
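The following hypothetical Scala sketch makes the trade-off yhuai describes concrete. The event types and JSON field names are invented for illustration and are not `JsonProtocol`'s real ones. When events form a sealed hierarchy and every case is written out, adding a new event without a matching case makes the Scala compiler warn that the match is not exhaustive. A catch-all `case _ =>` that serializes reflectively would silence that warning, so the new event's JSON shape would never appear explicitly in a reviewed diff.

```scala
// Hypothetical event hierarchy, invented for illustration
sealed trait ListenerEvent
final case class JobStarted(jobId: Int) extends ListenerEvent
final case class JobEnded(jobId: Int, succeeded: Boolean) extends ListenerEvent

object ExplicitEventJson {
  // Each event type is serialized explicitly, so reviewers see exactly which
  // fields are written. Because ListenerEvent is sealed, a new subclass without
  // a case here triggers a non-exhaustive match warning at compile time.
  def toJson(event: ListenerEvent): String = event match {
    case JobStarted(id)   => s"""{"Event":"JobStarted","Job ID":$id}"""
    case JobEnded(id, ok) => s"""{"Event":"JobEnded","Job ID":$id,"Succeeded":$ok}"""
  }
}
```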
[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13167#issuecomment-220225651 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58836/ Test PASSed.
[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13167#issuecomment-220225648 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220225586 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220225588 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58837/ Test PASSed.
[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13167#issuecomment-220225530 **[Test build #58836 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58836/consoleFull)** for PR 13167 at commit [`a97e358`](https://github.com/apache/spark/commit/a97e3586b7b856d5a62981ff459f48da8d1128bb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220225490 **[Test build #58837 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58837/consoleFull)** for PR 12719 at commit [`0cb1136`](https://github.com/apache/spark/commit/0cb11361ff70d88ae09a4fd31154999fc9c3efae). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13165#issuecomment-220224055 @sun-rui @felixcheung Let me first try to build and run all the R tests on Windows, and then I will try to correct and add each test one by one. This will take a bit of time and I might have to ask a lot of questions, but I will try.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13156#discussion_r63820600 --- Diff: sql/hivecontext-compatibility/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala --- @@ -58,4 +58,16 @@ class HiveContext private[hive]( sparkSession.sharedState.asInstanceOf[HiveSharedState] } + /** + * Invalidate and refresh all the cached the metadata of the given table. For performance reasons, + * Spark SQL or the external data source library it uses might cache certain metadata about a + * table, such as the location of blocks. When those change outside of Spark SQL, users should + * call this function to invalidate the cache. + * + * @since 1.3.0 + */ + def refreshTable(tableName: String): Unit = { --- End diff -- +1
[GitHub] spark pull request: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13135#issuecomment-220223044 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58838/ Test PASSed.
[GitHub] spark pull request: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13135#issuecomment-220223043 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13135#issuecomment-220222980 **[Test build #58838 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58838/consoleFull)** for PR 13135 at commit [`9ec58e6`](https://github.com/apache/spark/commit/9ec58e6368d848b90b94145a1bb1354587898d82). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13181#issuecomment-220222603 Hi @marmbrus, it seems okay!
[GitHub] spark pull request: [SPARK-15322][SQL][FOLLOW-UP] Update deprecate...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13187#issuecomment-220222494 **[Test build #58840 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58840/consoleFull)** for PR 13187 at commit [`9b07d09`](https://github.com/apache/spark/commit/9b07d09301e9c6695e3586e06852f679594d988d).
[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13122#issuecomment-220222493 **[Test build #58841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58841/consoleFull)** for PR 13122 at commit [`84aa14a`](https://github.com/apache/spark/commit/84aa14a5deda14083520e8e23f83cdb7f5bbb2bc).
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13156#discussion_r63820108 --- Diff: sql/hivecontext-compatibility/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala --- @@ -58,4 +58,16 @@ class HiveContext private[hive]( sparkSession.sharedState.asInstanceOf[HiveSharedState] } + /** + * Invalidate and refresh all the cached the metadata of the given table. For performance reasons, + * Spark SQL or the external data source library it uses might cache certain metadata about a + * table, such as the location of blocks. When those change outside of Spark SQL, users should + * call this function to invalidate the cache. + * + * @since 1.3.0 + */ + def refreshTable(tableName: String): Unit = { --- End diff -- This class exists for compatibility purposes. Let's leave it as is.
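The Scaladoc in this diff describes a cache-then-invalidate pattern. Metadata such as block locations is cached for performance, so it has to be invalidated when it changes outside Spark SQL. The toy Scala class below is a hypothetical sketch of that idea only, and the class name and the `loadLocations` loader are invented. It is not how `HiveContext` or the session catalog implement `refreshTable`, and it does not model the `invalidateTable` distinction raised in this thread.

```scala
import scala.collection.mutable

// Hypothetical toy cache; loadLocations stands in for an expensive metastore/file-system lookup
final class ToyTableMetadataCache(loadLocations: String => Seq[String]) {
  private val cache = mutable.Map.empty[String, Seq[String]]

  // Served from the cache after the first lookup, so external changes are not seen
  def blockLocations(table: String): Seq[String] =
    cache.getOrElseUpdate(table, loadLocations(table))

  // Drops the cached entry; the next lookup reloads fresh metadata
  def refreshTable(table: String): Unit = cache.remove(table)
}
```

In this sketch, a second `blockLocations("t")` call returns the stale cached value even if the underlying locations changed. Only after `refreshTable("t")` does the next lookup call the loader again.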
[GitHub] spark pull request: [SPARK-15322][SQL][FOLLOW-UP] Update deprecate...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/13187 [SPARK-15322][SQL][FOLLOW-UP] Update deprecated accumulator usage into accumulatorV2 ## What changes were proposed in this pull request? This PR corrects another case that uses the deprecated `accumulableCollection` so that it uses `listAccumulator` instead, which the previous PR seems to have missed. Since `ArrayBuffer[InternalRow]` is `java.util.List[InternalRow]`, it seems reasonable to replace the usage. ## How was this patch tested? Related existing tests `InMemoryColumnarQuerySuite` and `CachedTableSuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark SPARK-15322 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13187.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13187 commit 9b07d09301e9c6695e3586e06852f679594d988d Author: hyukjinkwon Date: 2016-05-19T03:50:37Z Use list accumulator
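The idea behind a list accumulator is that per-task partial results are collected into a list and then merged on the driver. The class below is a hypothetical, self-contained Scala sketch of that add/merge/value contract. It is loosely modeled on the `isZero`/`add`/`merge`/`value` methods that `AccumulatorV2` exposes, but it is not Spark's list accumulator, and it omits registration, serialization, `copy`, and `reset`.

```scala
import java.util.{ArrayList, Collections, List => JList}

// Hypothetical stand-in for a list-collecting accumulator, not Spark's class
final class ToyListAccumulator[T] {
  // Synchronized because several threads might add to one instance
  private val buffer: JList[T] = Collections.synchronizedList(new ArrayList[T]())

  def isZero: Boolean = buffer.isEmpty

  def add(v: T): Unit = { buffer.add(v); () }

  // Driver-side merge of another (e.g. per-task) accumulator's contents
  def merge(other: ToyListAccumulator[T]): Unit = { buffer.addAll(other.value); () }

  // Returns a read-only snapshot of the collected elements
  def value: JList[T] = buffer.synchronized {
    Collections.unmodifiableList[T](new ArrayList[T](buffer))
  }
}
```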
[GitHub] spark pull request: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13135#issuecomment-220222031 **[Test build #58838 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58838/consoleFull)** for PR 13135 at commit [`9ec58e6`](https://github.com/apache/spark/commit/9ec58e6368d848b90b94145a1bb1354587898d82).
[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13122#issuecomment-220222027 **[Test build #58839 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58839/consoleFull)** for PR 13122 at commit [`0702178`](https://github.com/apache/spark/commit/0702178a3c485aa316d5b03b3aefb2ea4a228cc2).
[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13122#discussion_r63819835 --- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala --- @@ -234,6 +234,13 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging { ) } + test("unsupported operations") { --- End diff -- @hvanhovell The latest changes added the test cases for the unsupported operations.
[GitHub] spark pull request: [SPARK-15130][PySpark][ML][DOCS] pyspark expos...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/12914#issuecomment-220219840 @jkbradley @yanboliang @holdenk @sethah let's discuss the issue of defaults in param doc (see https://github.com/apache/spark/pull/13148#discussion_r63600571) on this PR since it is pertinent. Here, Holden raises 2 issues: 1. The Scaladoc contains default values for many params (sometimes in shared traits). In addition, the Scala `Param` itself has the self-contained `doc` field (typically not containing defaults, since the built-in doc shows current and default in `explainParam`). 2. The PyDoc only contains the `Param` `doc` field. (By the way, (1) implies that in cases where the default param value in the trait is overridden, the Scaladoc is incorrect, but that is another issue.) The result of (2) is that the HTML API doc doesn't look great, e.g. https://cloud.githubusercontent.com/assets/1036807/15381231/0a937dde-1d7e-11e6-885c-b120679f84ee.png Also, nowhere in the PyDoc are the defaults listed, while in the Scaladoc they are. I agree that it would be nice to have the defaults listed in the PyDoc in some way. 1. One solution is the original approach here, where defaults are put in the Param doc in a standard way, but stripped out during `explainParams`. This works but IMO is more prone to breaking in future if people forget to do things in exactly the correct format. It also doesn't directly solve the problem of the API doc looking ugly; 2. Another solution is the current approach here, where the attributes are turned into properties with a docstring (possibly including the default); this does solve the problem of nice display in the API doc. The downside here is the potentially fairly large change to make everything a property, the code duplication introduced (though kept to a minimum), and extra boilerplate when adding new params that could be more error-prone; 3. A third solution is what I've done [here](https://github.com/mlnick/spark/tree/sphinx-doc-params) as a PoC, which basically adds the built-in doc as the instance docstring for each Python `Param`. Then we override the `AttributeDocumenter` in Sphinx to handle it. The result displays nicely in the API doc (the same as the property approach, but no defaults are added). The other thing that changes is that the `__init__` docstring is brought back (for some reason the current docs are not showing it), which means that the defaults are essentially documented there for each class. In a way this seems more "Pythonic" to me (i.e. Python users are accustomed to seeing the default arg values in constructor doc, e.g. scikit-learn). 4. Another option is to do nothing (for now at least), except bring back the `__init__` docstring. This keeps the ugly-looking `Param` doc, but at least shows the default args for each class, and is the current behavior. We can do something like (1) or (3) later (but maybe not (2) during Spark 2.x as it may be too large a change). 5. A final option is to perhaps document defaults elsewhere (such as the setter for the param, which is usually implemented in the class or a model trait in Scala). Let's decide on an approach and make it consistent across the board.
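Issue (1) rests on the fact that the `Param` `doc` string usually omits the default, because the built-in explanation appends the default and current values. The Scala toy below is a hypothetical sketch of that formatting only. `ToyParam` and `explain` are invented names, and the exact output format of Spark's `explainParam` may differ from what is shown here.

```scala
// Hypothetical, simplified model of a param with a doc string and optional default/current values
final case class ToyParam[T](name: String, doc: String, default: Option[T], current: Option[T])

object ToyParamDoc {
  // Appends "(default: ..., current: ...)" so the doc text itself never needs to repeat the default
  def explain[T](p: ToyParam[T]): String = {
    val parts = p.default.map(d => s"default: $d").toList ++ p.current.map(c => s"current: $c").toList
    val valueStr = if (parts.isEmpty) "(undefined)" else parts.mkString("(", ", ", ")")
    s"${p.name}: ${p.doc} $valueStr"
  }
}
```

This reflects the design point in (1). If defaults live outside the `doc` text, an overridden default cannot leave a stale value in the doc string. The trade-off is that the default is only visible through the explain output and not in static API docs.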
[GitHub] spark pull request: [SPARK-15395][Core]Use getHostString to create...
Github user zzcclp commented on the pull request: https://github.com/apache/spark/pull/13185#issuecomment-220218127 @zsxwing will this PR be merged into branch 1.6?
[GitHub] spark pull request: [SPARK-15395][Core]Use getHostString to create...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13185
[GitHub] spark pull request: [SPARK-15395][Core]Use getHostString to create...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/13185#issuecomment-220217816 Didn't merge to 1.6 due to the conflicts.
[GitHub] spark pull request: [SPARK-15395][Core]Use getHostString to create...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/13185#issuecomment-220217602 Thanks. Merging to master, 2.0 and 1.6.
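The PR title is truncated here, so it is not shown what `getHostString` is used to create. As general JDK background, `java.net.InetSocketAddress.getHostString` returns the host name or the literal address string that the socket address was created with, without the reverse DNS lookup that `getHostName` may perform. The minimal Scala sketch below illustrates that difference. The object name and the port number are arbitrary assumptions.

```scala
import java.net.InetSocketAddress

object HostStringSketch {
  // Built from an IP literal, so constructing it needs no DNS lookup
  val addr = new InetSocketAddress("127.0.0.1", 7077)

  // Returns the string the address was created with; unlike getHostName, no reverse lookup
  def hostString: String = addr.getHostString
}
```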
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220217482 Oh, amazing. According to the last Jenkins results, the seven test failures in `catalyst` are the only ones. ``` [info] *** 7 TESTS FAILED *** [error] Failed: Total 1656, Failed 7, Errors 0, Passed 1649, Ignored 1 [error] Failed tests: [error] org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite [error] org.apache.spark.sql.catalyst.expressions.CastSuite [error] (catalyst/test:test) sbt.TestsFailedException: Tests unsuccessful [error] Total time: 222 s, completed May 18, 2016 8:11:07 PM ``` Anyway, I will handle them in another PR.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220217398 **[Test build #58837 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58837/consoleFull)** for PR 12719 at commit [`0cb1136`](https://github.com/apache/spark/commit/0cb11361ff70d88ae09a4fd31154999fc9c3efae).
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13186#issuecomment-220217381 **[Test build #58835 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58835/consoleFull)** for PR 13186 at commit [`23b43d4`](https://github.com/apache/spark/commit/23b43d4c837d762461dd56a62b85cb998919e0ef).
[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13167#issuecomment-220217395 **[Test build #58836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58836/consoleFull)** for PR 13167 at commit [`a97e358`](https://github.com/apache/spark/commit/a97e3586b7b856d5a62981ff459f48da8d1128bb).
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13156#discussion_r63817417 --- Diff: sql/hivecontext-compatibility/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala --- @@ -58,4 +58,16 @@ class HiveContext private[hive]( sparkSession.sharedState.asInstanceOf[HiveSharedState] } + /** + * Invalidate and refresh all the cached the metadata of the given table. For performance reasons, + * Spark SQL or the external data source library it uses might cache certain metadata about a + * table, such as the location of blocks. When those change outside of Spark SQL, users should + * call this function to invalidate the cache. + * + * @since 1.3.0 + */ + def refreshTable(tableName: String): Unit = { --- End diff -- If `invalidateTable` has a different meaning from `refreshTable`, should we also add it to `HiveContext`? cc @yhuai
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220217295 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58834/ Test FAILed.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220217294 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220217222 **[Test build #58834 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58834/consoleFull)** for PR 12719 at commit [`d8257ee`](https://github.com/apache/spark/commit/d8257eef75433fe25aa4fd9c8c387933f23cfd20). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220217246 I removed the last test commit.
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/13186 [SPARK-15397] [SQL] fix string udf locate as hive ## What changes were proposed in this pull request? In Hive, `locate("aa", "aaa", 0)` yields 0, `locate("aa", "aaa", 1)` yields 1, and `locate("aa", "aaa", 2)` yields 2, while in Spark, `locate("aa", "aaa", 0)` yields 1, `locate("aa", "aaa", 1)` yields 2, and `locate("aa", "aaa", 2)` yields 0. The difference comes from how the third parameter of the `locate` UDF is interpreted. In Hive it is the starting position and is 1-based, so passing 0 always returns 0; Spark effectively treats it as a 0-based offset. ## How was this patch tested? Tested with the modified `StringExpressionsSuite` and `StringFunctionsSuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark locate Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13186.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13186 commit 23b43d4c837d762461dd56a62b85cb998919e0ef Author: Daoyuan Wang Date: 2016-05-18T11:30:07Z fix string udf locate as hive
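A minimal Scala sketch of the two interpretations of the third `locate` argument described above. It assumes plain `java.lang.String` inputs instead of Spark's `UTF8String` and ignores null handling; `LocateSketch`, `hiveLocate`, and `legacySparkLocate` are hypothetical names, not Spark's actual `StringLocate` code. Both helpers return a 1-based position, with 0 meaning "not found".

```scala
object LocateSketch {
  // Hypothetical helper for the Hive-style semantics: `start` is a 1-based position.
  // Any start below 1 returns 0; otherwise the search begins at index start - 1.
  def hiveLocate(substr: String, str: String, start: Int): Int =
    if (start < 1) 0
    else str.indexOf(substr, start - 1) + 1 // indexOf returns -1 when absent, giving 0

  // Hypothetical helper for the Spark behavior reported before this PR:
  // `start` is effectively used as a 0-based offset into `str`.
  def legacySparkLocate(substr: String, str: String, start: Int): Int =
    str.indexOf(substr, start) + 1

  def main(args: Array[String]): Unit = {
    // Compare both interpretations on the examples from the PR description.
    for (start <- 0 to 2) {
      val hive = hiveLocate("aa", "aaa", start)
      val legacy = legacySparkLocate("aa", "aaa", start)
      println(s"start=$start hive=$hive legacySpark=$legacy")
    }
  }
}
```

Running `main` reproduces the values listed in the PR: `hive` gives 0, 1, 2 and `legacySpark` gives 1, 2, 0 for starts 0, 1, 2. The off-by-one between the two columns is exactly the shift between a 1-based position and a 0-based offset.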
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220216995 Thank you for understanding. I'll try to handle those test issues in another PR.