[GitHub] spark issue #15254: [SPARK-17679] [PYSPARK] remove unnecessary Py4J ListConv...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15254 +1 :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15324 **[Test build #66237 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66237/consoleFull)** for PR 15324 at commit [`52a974d`](https://github.com/apache/spark/commit/52a974dd30574247238749b59e226d549e90744f).
[GitHub] spark issue #15231: [SPARK-17658][SPARKR] read.df/write.df API taking path o...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/15231

I tested this, and this is the message I see:
```
16/10/02 05:43:17 ERROR RBackendHandler: getSQLDataType on org.apache.spark.sql.api.r.SQLUtils failed
Error in value[[3L]](cond) : Invalid type unknown
```
I think we should lose the first part, "in value[[3L]](cond)". Perhaps we could show the function name, "getSQLDataType", instead? I also think it's important to show where the message is coming from, so how about adding that to the `stop` calls, something like:
```
Error in getSQLDataType : illegal argument - Invalid type unknown
```
[GitHub] spark pull request #15293: [SPARK-17718] [Update MLib Classification Documen...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15293#discussion_r81464884
--- Diff: docs/mllib-linear-methods.md ---
@@ -78,6 +78,10 @@ methods `spark.mllib` supports:
+A binary label y is denoted as either +1 (positive) or −1 (negative), which is
--- End diff --
This duplicates an existing statement below. The idea was to move it up here rather than copy it.
[GitHub] spark pull request #14861: [SPARK-17287] [PYSPARK] Add recursive kwarg to Py...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/14861#discussion_r81464644
--- Diff: python/pyspark/context.py ---
@@ -762,13 +762,16 @@ def accumulator(self, value, accum_param=None):
         SparkContext._next_accum_id += 1
         return Accumulator(SparkContext._next_accum_id - 1, value, accum_param)
-    def addFile(self, path):
+    def addFile(self, path, recursive=False):
         """
         Add a file to be downloaded with this Spark job on every node.
         The C{path} passed can be either a local file, a file in HDFS
         (or other Hadoop-supported filesystems), or an HTTP, HTTPS or
         FTP URI.
+        A directory can be given if the recursive option is set to true.
+        Currently directories are onlysupported for Hadoop-supported filesystems.
--- End diff --
Minor nit: typo ("onlysupported" needs a space).
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15324 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66236/ Test FAILed.
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15324 Merged build finished. Test FAILed.
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15324 **[Test build #66236 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66236/consoleFull)** for PR 15324 at commit [`08b0baf`](https://github.com/apache/spark/commit/08b0baf1d00aeb2d6abf8c758d6c566f298548c3). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14897: [SPARK-17338][SQL] add global temp view
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14897#discussion_r81463019
--- Diff: docs/sql-programming-guide.md ---
@@ -220,6 +220,40 @@ The `sql` function enables applications to run SQL queries programmatically and
+## Global Temporary View
+
+Temporay views in Spark SQL are session-scoped and will disappear if the session that creates it
+terminates. If you want to have a temporary view that is shared among all sessions and keep alive
+until the Spark application terminiates, you can create a global temporary view. Global temporary
+view is tied to a system preserved database `global_temp`, and we must use the qualified name to
+refer it, e.g. `SELECT * FROM global_temp.view1`.
+
+
+
+{% include_example global_temp_view scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
+
+
+
+{% include_example global_temp_view java/org/apache/spark/examples/sql/JavaSparkSQLExample.java %}
+
+
+
+{% include_example global_temp_view python/sql/basic.py %}
+
+
+
+
+{% highlight sql %}
+
+CREATE GLOBAL TEMPORARY VIEW temp_view AS SELECT a + 1, b * 2 FROM tbl
+
+SELECT * FROM global_temp.temp_view
+
+{% endhighlight %}
+
+
--- End diff --
We need one more
[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15319 Merged build finished. Test PASSed.
[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15319 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66234/ Test PASSed.
[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15319 **[Test build #66234 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66234/consoleFull)** for PR 15319 at commit [`9639c71`](https://github.com/apache/spark/commit/9639c71862d1e7783bc3ca4d750d68e7aa35be92). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15309: [SPARK-17736] [Documentation][SparkR] [Update R R...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/15309#discussion_r81462880
--- Diff: docs/README.md ---
@@ -21,6 +21,8 @@ installed. Also install the following libraries:
 # Following is needed only for generating API docs
 $ sudo pip install sphinx
 $ sudo Rscript -e 'install.packages(c("knitr", "devtools", "roxygen2", "testthat"), repos="http://cran.stat.ucla.edu/")'
+$ sudo Rscript -e 'install.packages(c("rmarkdown"), repos="http://cran.stat.ucla.edu/")'
+$ sudo pip install pandoc pandoc-citeproc
--- End diff --
`pandoc` itself is not a Python package, but it appears the Python package `pypandoc` manages it. Not sure if `pypandoc` works with `pandoc-citeproc`, though.
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15324 **[Test build #66236 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66236/consoleFull)** for PR 15324 at commit [`08b0baf`](https://github.com/apache/spark/commit/08b0baf1d00aeb2d6abf8c758d6c566f298548c3).
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15324 Merged build finished. Test FAILed.
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15324 **[Test build #66235 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66235/consoleFull)** for PR 15324 at commit [`4d8a025`](https://github.com/apache/spark/commit/4d8a025198fae1febdcc6351d6d48e4666cc4e65). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15324 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66235/ Test FAILed.
[GitHub] spark issue #15322: [SPARK-17753][SQL] Allow a complex expression as the inp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15322 Merged build finished. Test PASSed.
[GitHub] spark issue #15322: [SPARK-17753][SQL] Allow a complex expression as the inp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15322 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66232/ Test PASSed.
[GitHub] spark issue #15322: [SPARK-17753][SQL] Allow a complex expression as the inp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15322 **[Test build #66232 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66232/consoleFull)** for PR 15322 at commit [`b25c849`](https://github.com/apache/spark/commit/b25c84949edf5cf224e8ca93c18734805760dc11). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15090 LGTM except for the minor comments above. The test cases mentioned above need to be added to `sql/hive/`, since correctness could be affected by the behavior of the Hive metastore.
[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15324 **[Test build #66235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66235/consoleFull)** for PR 15324 at commit [`4d8a025`](https://github.com/apache/spark/commit/4d8a025198fae1febdcc6351d6d48e4666cc4e65).
[GitHub] spark pull request #15293: [SPARK-17718] [Update MLib Classification Documen...
Github user jagadeesanas2 commented on a diff in the pull request: https://github.com/apache/spark/pull/15293#discussion_r81462301
--- Diff: docs/mllib-linear-methods.md ---
@@ -78,6 +78,10 @@ methods `spark.mllib` supports:
+A binary label y is denoted as either +1 (positive) or −1 (negative), which is
--- End diff --
As mentioned in the JIRA, I simply added detailed documentation to avoid future confusion.
[GitHub] spark pull request #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/15324

[SPARK-16872][ML] Gaussian Naive Bayes Classifier

## What changes were proposed in this pull request?
Implement Gaussian NB in ML.

## How was this patch tested?
Local test in spark-shell, comparing to scikit-learn. Added a unit test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhengruifeng/spark gnb_1001

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15324.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15324

commit 8e7e2d5e021b3ac14314a87bd7d894a8189edabf
Author: Zheng RuiFeng
Date: 2016-10-02T02:31:28Z

    create pr

commit 4d8a025198fae1febdcc6351d6d48e4666cc4e65
Author: Zheng RuiFeng
Date: 2016-10-02T02:37:54Z

    fix nit
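For readers following the Gaussian NB thread, here is a minimal sketch of what a Gaussian Naive Bayes classifier computes. It is plain Python and is not the code in this pull request (which targets Spark ML in Scala). The function names, the `var_floor` parameter, and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels, var_floor=1e-9):
    """Return {label: (log_prior, means, variances)} estimated from the data."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    model = {}
    for label, members in grouped.items():
        count = len(members)
        dims = len(members[0])
        means = [sum(m[j] for m in members) / count for j in range(dims)]
        # Maximum-likelihood variance per feature; var_floor (an assumption of
        # this sketch) keeps a constant feature from producing a zero variance.
        variances = [
            sum((m[j] - means[j]) ** 2 for m in members) / count + var_floor
            for j in range(dims)
        ]
        model[label] = (math.log(count / len(rows)), means, variances)
    return model


def log_joint(params, x):
    """log P(class) + sum_j log N(x_j | mean_j, variance_j)."""
    log_prior, means, variances = params
    total = log_prior
    for value, mean, var in zip(x, means, variances):
        total += -0.5 * math.log(2.0 * math.pi * var) - (value - mean) ** 2 / (2.0 * var)
    return total


def predict_gaussian_nb(model, x):
    """Pick the class with the largest log joint probability."""
    return max(model, key=lambda label: log_joint(model[label], x))


# Toy data (hypothetical): two well-separated clusters.
rows = [[1.0, 2.0], [1.2, 1.9], [0.8, 2.1], [5.0, 6.0], [5.1, 5.8], [4.9, 6.2]]
labels = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(rows, labels)
print(predict_gaussian_nb(model, [1.1, 2.0]), predict_gaussian_nb(model, [5.0, 6.1]))
```

Working in log space avoids underflow when many per-feature densities are multiplied, which is why the sketch sums log densities rather than multiplying probabilities.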
[GitHub] spark pull request #15309: [SPARK-17736] [Documentation][SparkR] [Update R R...
Github user jagadeesanas2 commented on a diff in the pull request: https://github.com/apache/spark/pull/15309#discussion_r81462265
--- Diff: docs/README.md ---
@@ -21,6 +21,8 @@ installed. Also install the following libraries:
 # Following is needed only for generating API docs
 $ sudo pip install sphinx
 $ sudo Rscript -e 'install.packages(c("knitr", "devtools", "roxygen2", "testthat"), repos="http://cran.stat.ucla.edu/")'
+$ sudo Rscript -e 'install.packages(c("rmarkdown"), repos="http://cran.stat.ucla.edu/")'
+$ sudo pip install pandoc pandoc-citeproc
--- End diff --
@felixcheung I agree. If we are installing manually, we need to use one of the commands below:
* Ubuntu/Debian: ``sudo apt-get install pandoc pandoc-citeproc``
* Fedora/Red Hat: ``sudo yum install pandoc``
* Arch: ``sudo pacman -S pandoc``
* Mac OS X with Homebrew: ``brew install pandoc pandoc-citeproc Caskroom/cask/mactex``
* Machine with Haskell: ``cabal install pandoc``

@srowen TTBOMK, as it's a Python package, it can also be managed via pip: https://pypi.python.org/pypi/pypandoc
[GitHub] spark issue #15311: [SPARK-17721][MLlib][backport] Fix for multiplying trans...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15311 Perfect, thanks! Merging now
[GitHub] spark issue #14326: [SPARK-3181] [ML] Implement RobustRegression with huber ...
Github user tewf commented on the issue: https://github.com/apache/spark/pull/14326 Could we instead implement a more general Robust Linear Model of the [M-estimator](http://research.microsoft.com/en-us/um/people/zhang/INRIA/Publis/Tutorial-Estim/node24.html) type, as is done in [statsmodels RLM](http://statsmodels.sourceforge.net/0.6.0/rlm.html) (see [RLM.py](http://statsmodels.sourceforge.net/0.6.0/_modules/statsmodels/robust/robust_linear_model.html#RLM))? The Huber loss would then be one of the M-estimators, maybe the default, as in statsmodels. I think [IterativelyReweightedLeastSquares](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquares.scala) was made with the intent of aiding the development of a robust M-estimator framework.
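To make the M-estimator idea above concrete, here is a rough sketch of iteratively reweighted least squares with Huber weights for a single-feature line fit. It is plain Python, not Spark's `IterativelyReweightedLeastSquares` and not the statsmodels `RLM` code. The function names, the fixed `delta=1.0`, the iteration count, and the toy data are assumptions for illustration, and the sketch skips the robust scale estimate that a full implementation would normally apply to the residuals.

```python
def huber_weight(residual, delta):
    """IRLS weight implied by the Huber loss: 1 inside the threshold, delta/|r| outside."""
    size = abs(residual)
    return 1.0 if size <= delta else delta / size


def weighted_line_fit(xs, ys, ws):
    """Closed-form weighted least squares for y = intercept + slope * x."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def irls_huber(xs, ys, delta=1.0, iterations=50):
    """Alternate between a weighted fit and recomputing Huber weights from residuals."""
    ws = [1.0] * len(xs)
    for _ in range(iterations):
        intercept, slope = weighted_line_fit(xs, ys, ws)
        ws = [huber_weight(y - (intercept + slope * x), delta) for x, y in zip(xs, ys)]
    return intercept, slope


# Toy data (hypothetical): y = 2x + 1 with one gross outlier at the last point.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[-1] = 100.0

ols_intercept, ols_slope = weighted_line_fit(xs, ys, [1.0] * len(xs))
huber_intercept, huber_slope = irls_huber(xs, ys)
print("OLS slope:", ols_slope, "Huber IRLS slope:", huber_slope)
```

Swapping `huber_weight` for another weight function is what would turn this into a different M-estimator, which is the generalization the comment asks about.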
[GitHub] spark issue #15323: [SPARK-17757][SQL] Remove CreateNamedStruct[Unsafe]
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15323 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66233/ Test FAILed.
[GitHub] spark issue #15323: [SPARK-17757][SQL] Remove CreateNamedStruct[Unsafe]
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15323 **[Test build #66233 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66233/consoleFull)** for PR 15323 at commit [`2e80037`](https://github.com/apache/spark/commit/2e800378121c7ffb2ee53c630c597acca95493a3). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15323: [SPARK-17757][SQL] Remove CreateNamedStruct[Unsafe]
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15323 Merged build finished. Test FAILed.
[GitHub] spark issue #15323: [SPARK-17757][SQL] Remove CreateNamedStruct[Unsafe]
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15323 **[Test build #66233 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66233/consoleFull)** for PR 15323 at commit [`2e80037`](https://github.com/apache/spark/commit/2e800378121c7ffb2ee53c630c597acca95493a3).
[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15319 **[Test build #66234 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66234/consoleFull)** for PR 15319 at commit [`9639c71`](https://github.com/apache/spark/commit/9639c71862d1e7783bc3ca4d750d68e7aa35be92).
[GitHub] spark pull request #15323: [SPARK-17757][SQL] Remove CreateNamedStruct[Unsaf...
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/15323

[SPARK-17757][SQL] Remove CreateNamedStruct[Unsafe]

## What changes were proposed in this pull request?
This PR removes the `CreateNamedStruct` and `CreateNamedStructUnsafe` expressions. This is only for simplification purposes.

## How was this patch tested?
Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hvanhovell/spark SPARK-17757

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15323.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15323

commit 2e800378121c7ffb2ee53c630c597acca95493a3
Author: Herman van Hovell
Date: 2016-10-02T01:48:05Z

    Remove CreateNamedStruct and CreateNamedStructUnsafe expressions
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15090

Another test case for Unicode column names in ANALYZE COLUMN:
```Scala
// scalastyle:off
// non ascii characters are not allowed in the source code, so we disable the scalastyle.
val colName1 = "`å1`"
val colName2 = "`å2`"
// scalastyle:on
withTable(table) {
  sql(s"CREATE TABLE $table ($colName1 int, $colName2 double) USING PARQUET")
  sql(s"INSERT INTO $table SELECT 1, 3.0")
  sql(s"ANALYZE TABLE $table COMPUTE STATISTICS FOR COLUMNS $colName2, $colName1")
  ...
}
```
[GitHub] spark issue #15322: [SPARK-17753][SQL] Allow a complex expression as the inp...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15322 @hvanhovell, it looks great. Most of the cases pass. I'm wondering if we can support the following, too?
```scala
sql("SELECT CASE 'a'='a' WHEN TRUE THEN 1 END").show
```
Since the following cases pass, the one above is minor.
```scala
sql("SELECT CASE ('a'='a') WHEN TRUE THEN 1 END").show
sql("SELECT CASE 1=1 WHEN TRUE THEN 1 END").show
sql("SELECT CASE 1='a' WHEN TRUE THEN 1 END").show
```
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15090 Could you add a positive test case when we turn on the case sensitivity? The scenario is like:
```
withTable(table) {
  withSQLConf("spark.sql.caseSensitive" -> "true") {
    sql(s"CREATE TABLE $table (c1 int, C1 double) USING PARQUET")
    sql(s"INSERT INTO $table SELECT 1, 3.0")
    sql(s"ANALYZE TABLE $table COMPUTE STATISTICS FOR COLUMNS c1, C1")
  }
}
```
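The scenario above depends on how requested column names are matched against the table schema. The following is a minimal, Spark-free sketch of that idea, assuming a resolver is simply a function `(String, String) => Boolean`, as the `resolver(attr.name, col)` call in the `computeColStats` diff suggests. The names `ColumnResolution`, `resolve`, `caseSensitive`, and `caseInsensitive` are hypothetical stand-ins for illustration, not Spark APIs.

```scala
object ColumnResolution {
  // Hypothetical stand-in: decides whether a schema column and a requested name match.
  type Resolver = (String, String) => Boolean

  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  /** Resolves requested names against the schema.
    * Returns (resolved schema columns, requested names that were duplicates).
    * Throws if a requested name matches no schema column.
    */
  def resolve(
      schema: Seq[String],
      requested: Seq[String],
      resolver: Resolver): (Seq[String], Seq[String]) = {
    requested.foldLeft((Vector.empty[String], Vector.empty[String])) {
      case ((resolved, duplicates), name) =>
        val column = schema
          .find(resolver(_, name))
          .getOrElse(throw new IllegalArgumentException(s"Invalid column name: $name."))
        // Deduplicate on the resolved column, not on the raw requested string.
        if (resolved.contains(column)) (resolved, duplicates :+ name)
        else (resolved :+ column, duplicates)
    }
  }
}
```

With `caseSensitive`, `c1` and `C1` resolve to two distinct columns of a `(c1, C1)` schema, which is the positive case requested above. With `caseInsensitive` against a `(c1, c2)` schema, the second request `C1` resolves to the already-seen `c1` and is reported as a duplicate.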
[GitHub] spark issue #15321: [SPARK-17671] [WEBUI] Spark 2.0 history server summary p...
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/15321 LGTM
[GitHub] spark issue #15322: [SPARK-17753][SQL] Allow a complex expression as the inp...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15322 Thank you for pinging me!
[GitHub] spark issue #15321: [SPARK-17671] [WEBUI] Spark 2.0 history server summary p...
Github user wgtmac commented on the issue: https://github.com/apache/spark/pull/15321 LGTM
[GitHub] spark issue #15322: [SPARK-17753][SQL] Allow a complex expression as the inp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15322 **[Test build #66232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66232/consoleFull)** for PR 15322 at commit [`b25c849`](https://github.com/apache/spark/commit/b25c84949edf5cf224e8ca93c18734805760dc11).
[GitHub] spark issue #15322: [SPARK-17753][SQL] Allow a complex expression as the inp...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/15322 cc @dongjoon-hyun
[GitHub] spark pull request #15322: [SPARK-17753][SQL] Allow a complex expression as ...
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/15322 [SPARK-17753][SQL] Allow a complex expression as the input a value based case statement ## What changes were proposed in this pull request? We currently only allow relatively simple expressions as the input for a value based case statement. Expressions like `case (a > 1) or (b = 2) when true then 1 when false then 0 end` currently fail. This PR adds support for such expressions. ## How was this patch tested? Added a test to the ExpressionParserSuite. You can merge this pull request into a Git repository by running: $ git pull https://github.com/hvanhovell/spark SPARK-17753 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15322.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15322 commit b25c84949edf5cf224e8ca93c18734805760dc11 Author: Herman van Hovell Date: 2016-10-02T01:02:56Z Allow a complex expression as the input a value based case statement
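To make the intended semantics of a value based case statement concrete, here is a small plain-Scala analogy, not Spark's parser or evaluator: the input expression is evaluated once and then compared against each `WHEN` value in order. `ValueCase`, `valueCase`, and `classify` are hypothetical names, and the sketch deliberately ignores SQL NULL comparison rules; it only models "no branch matched" as `None`.

```scala
object ValueCase {
  /** Models `CASE input WHEN v1 THEN r1 WHEN v2 THEN r2 ... END`.
    * Returns the result of the first branch whose value equals the input,
    * or None when no branch matches (SQL would yield NULL without an ELSE).
    */
  def valueCase[A, B](input: A)(branches: (A, B)*): Option[B] =
    branches.collectFirst { case (value, result) if value == input => result }

  /** Analogue of `case (a > 1) or (b = 2) when true then 1 when false then 0 end`. */
  def classify(a: Int, b: Int): Option[Int] =
    valueCase((a > 1) || (b == 2))(true -> 1, false -> 0)
}
```

The point of the analogy is that the input can be any expression, including a compound boolean one, because it is reduced to a single value before the branches are tested.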
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r81460994 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala --- @@ -0,0 +1,175 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command + +import scala.collection.mutable + +import org.apache.spark.sql._ +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases +import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, CatalogTable} +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.expressions.aggregate._ +import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, ColumnStat, LogicalPlan, Statistics} +import org.apache.spark.sql.execution.datasources.LogicalRelation +import org.apache.spark.sql.types._ + + +/** + * Analyzes the given columns of the given table to generate statistics, which will be used in + * query optimizations. 
+ */ +case class AnalyzeColumnCommand( +tableIdent: TableIdentifier, +columnNames: Seq[String]) extends RunnableCommand { + + override def run(sparkSession: SparkSession): Seq[Row] = { +val sessionState = sparkSession.sessionState +val db = tableIdent.database.getOrElse(sessionState.catalog.getCurrentDatabase) +val tableIdentWithDB = TableIdentifier(tableIdent.table, Some(db)) +val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdentWithDB)) + +relation match { + case catalogRel: CatalogRelation => +updateStats(catalogRel.catalogTable, + AnalyzeTableCommand.calculateTotalSize(sessionState, catalogRel.catalogTable)) + + case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined => +updateStats(logicalRel.catalogTable.get, logicalRel.relation.sizeInBytes) + + case otherRelation => +throw new AnalysisException("ANALYZE TABLE is not supported for " + + s"${otherRelation.nodeName}.") +} + +def updateStats(catalogTable: CatalogTable, newTotalSize: Long): Unit = { + val (rowCount, columnStats) = computeColStats(sparkSession, relation) + val statistics = Statistics( +sizeInBytes = newTotalSize, +rowCount = Some(rowCount), +colStats = columnStats ++ catalogTable.stats.map(_.colStats).getOrElse(Map())) + sessionState.catalog.alterTable(catalogTable.copy(stats = Some(statistics))) + // Refresh the cached data source table in the catalog. 
+ sessionState.catalog.refreshTable(tableIdentWithDB) +} + +Seq.empty[Row] + } + + def computeColStats( + sparkSession: SparkSession, + relation: LogicalPlan): (Long, Map[String, ColumnStat]) = { + +// check correctness of column names +val attributesToAnalyze = mutable.MutableList[Attribute]() +val duplicatedColumns = mutable.MutableList[String]() +val resolver = sparkSession.sessionState.conf.resolver +columnNames.foreach { col => + val exprOption = relation.output.find(attr => resolver(attr.name, col)) + val expr = exprOption.getOrElse(throw new AnalysisException(s"Invalid column name: $col.")) + // do deduplication + if (!attributesToAnalyze.contains(expr)) { +attributesToAnalyze += expr + } else { +duplicatedColumns += col + } +} +if (duplicatedColumns.nonEmpty) { + logWarning(s"Duplicated columns ${duplicatedColumns.mkString("(", ", ", ")")} detected " + +s"when analyzing columns ${columnNames.mkString("(", ", ", ")")}, ignoring them.") --- End diff -- How about this? ```Scala logWarning("A duplicate column name was detected in `ANALYZE TABLE` statement. " +
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15318 Thank you for review!
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15318 LGTM
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15318 Merged build finished. Test PASSed.
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15318 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66231/ Test PASSed.
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15318 **[Test build #66231 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66231/consoleFull)** for PR 15318 at commit [`94ae569`](https://github.com/apache/spark/commit/94ae56926c291050ae5c2be4c6f66c2ba84d150e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15318 Merged build finished. Test PASSed.
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15318 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66229/ Test PASSed.
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15318 **[Test build #66229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66229/consoleFull)** for PR 15318 at commit [`c574559`](https://github.com/apache/spark/commit/c574559d47b21987deb11a52e7f842650681619e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15318 **[Test build #66231 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66231/consoleFull)** for PR 15318 at commit [`94ae569`](https://github.com/apache/spark/commit/94ae56926c291050ae5c2be4c6f66c2ba84d150e).
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15318 I addressed the comments. The PR description looks okay as it is because, for the same reason, we cannot enumerate all cases there.
[GitHub] spark issue #15311: [SPARK-17721][MLlib][backport] Fix for multiplying trans...
Github user bwahlgreen commented on the issue: https://github.com/apache/spark/pull/15311 there
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15318 Ya. I agree. We can revisit if the existing behavior has some issue.
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15318 IMO, you do not need to fix the existing behavior. Maybe you also can check the original PR that delivered the feature of Temporal Interval and check the design and coverage. I quickly went over the SQL-99. Temporal operations are many. I am not sure whether we cover all the issues.
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15318 BTW, Spark accepts the four cases and returns one normalized case, 'INTERVAL 1 DAYS'. I think we don't need to change the current behavior here. Right?
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15318 I see.

> Please try all the variants

Our Apache Spark supports all the following 4 cases.
- interval 1 day
- interval 1 days
- interval '1' day
- interval '1' days
```scala
scala> sql("select current_timestamp + INTERVAL 1 DAY, current_timestamp + INTERVAL 1 DAYS, current_timestamp + INTERVAL '1' DAY, current_timestamp + INTERVAL '1' DAYS").show
+++++
|CAST(current_timestamp() + interval 1 days AS TIMESTAMP)|CAST(current_timestamp() + interval 1 days AS TIMESTAMP)|CAST(current_timestamp() + interval 1 days AS TIMESTAMP)|CAST(current_timestamp() + interval 1 days AS TIMESTAMP)|
+++++
|2016-10-02 14:49:...| 2016-10-02 14:49:...| 2016-10-02 14:49:...|2016-10-02 14:49:...|
+++++
```
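The observation that four spellings collapse to one generated form can be illustrated with a toy normalizer. This is only a regex-based sketch for day intervals and makes no claim about how Spark's parser or SQL generation actually works; `IntervalForms` and `normalize` are hypothetical names.

```scala
object IntervalForms {
  // Accepts `interval 1 day`, `interval 1 days`, `interval '1' day`, `interval '1' days`
  // (case-insensitive). Exactly one of the two capture groups is non-null on a match.
  private val DayInterval = """(?i)interval\s+(?:'(\d+)'|(\d+))\s+days?""".r

  /** Returns the single normalized spelling, or None if the literal is not a day interval. */
  def normalize(literal: String): Option[String] = literal.trim match {
    case DayInterval(quoted, bare) =>
      Some(s"interval ${Option(quoted).getOrElse(bare)} days")
    case _ => None
  }
}
```

All four variants listed above map to the same string, `interval 1 days`, which mirrors the identical column headers in the `show` output.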
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15318 I'll update the testcases and PR description. Thank you again, @gatorsmile !
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15318 It sounds like this is database vendor-specific. Please try all the variants. If the queries work in Spark SQL, add these queries into your test cases for SQL generation. BTW, the PR description can be updated.
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15318 To sum up, if you agree, I will change the following.
- Use `day` instead of `days` (I need to find where it is.)
- Use string type, `1`, instead of integer type, 1.
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15318 Maybe, could you try with '1' instead of 1?
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15318 For Postgres, it works like this.
```
select ts + interval '1' day, ts - interval '2' day from dates
```
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15318 There are a few things mixed up here. First of all, when I checked in MySQL, it supports `select ts + interval 1 day, ts - interval 2 day from dates`. Second, and more important, as you see, `days` is the string generated by the current Spark. I think we had better change that into `day`. Which version of Hive do you mean? For `INTERVAL`, Spark 1.6.2 and Hive 1.2 do not support that. BTW, I'm willing to fix anything. :)
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15318
```
db2 => select ts + interval 1 days, ts - interval 2 days from dates;
SQL0104N An unexpected token "1" was found following "select ts + interval". Expected tokens may include: "". SQLSTATE=42601
db2 => select ts + interval 1 day, ts - interval 2 day from dates
SQL0104N An unexpected token "1" was found following "select ts + interval". Expected tokens may include: "". SQLSTATE=42601
```
Also tried it in Hive.
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15318 I think you mean 'DAYS' is wrong, right? I think the following should work there.
```sql
select ts + interval 1 day, ts - interval 2 day from dates
```
[GitHub] spark pull request #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/15318#discussion_r81458744 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/ExpressionSQLBuilderSuite.scala --- @@ -119,4 +121,18 @@ class ExpressionSQLBuilderSuite extends SQLBuilderTest { s"(PARTITION BY `a`, `b` ORDER BY `c` ASC NULLS FIRST, `d` DESC NULLS LAST $frame)" ) } + + test("interval arithmetic") { +val interval = Literal(new CalendarInterval(0, 864L)) --- End diff -- Sure!
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15318 I have a question about the SQL statement:
```SQL
select ts + interval 1 days, ts - interval 2 days from dates
```
Is it a valid SQL statement? I tried it in Hive and DB2. Neither accepts it.
[GitHub] spark issue #15321: [SPARK-17671] [WEBUI] Spark 2.0 history server summary p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15321 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66230/ Test FAILed.
[GitHub] spark issue #15321: [SPARK-17671] [WEBUI] Spark 2.0 history server summary p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15321 Merged build finished. Test FAILed.
[GitHub] spark issue #15321: [SPARK-17671] [WEBUI] Spark 2.0 history server summary p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15321 **[Test build #66230 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66230/consoleFull)** for PR 15321 at commit [`14fb9a0`](https://github.com/apache/spark/commit/14fb9a0b9fbc41fbbfbba5daab4e8998eaa857fc). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15090 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66227/ Test PASSed.
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15090 Merged build finished. Test PASSed.
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15090 **[Test build #66227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66227/consoleFull)** for PR 15090 at commit [`734abad`](https://github.com/apache/spark/commit/734abad045a5378d14489a4e956b7a8e1c95a811). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14638 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66228/ Test PASSed.
[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14638 Merged build finished. Test PASSed.
[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14638 **[Test build #66228 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66228/consoleFull)** for PR 14638 at commit [`74c3e81`](https://github.com/apache/spark/commit/74c3e8113846521d84948c8a101aa5219593a58a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15321: [SPARK-17671] [WEBUI] Spark 2.0 history server summary p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15321 **[Test build #66230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66230/consoleFull)** for PR 15321 at commit [`14fb9a0`](https://github.com/apache/spark/commit/14fb9a0b9fbc41fbbfbba5daab4e8998eaa857fc).
[GitHub] spark issue #15321: [SPARK-17671] [WEBUI] Spark 2.0 history server summary p...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15321 CC @wgtmac and @ajbozarth for a look, in case I'm missing something.
[GitHub] spark pull request #15321: [SPARK-17671] [WEBUI] Spark 2.0 history server su...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/15321

[SPARK-17671] [WEBUI] Spark 2.0 history server summary page is slow even set spark.history.ui.maxApplications

## What changes were proposed in this pull request?

Return Iterator of applications internally in history server, for consistency and performance. See https://github.com/apache/spark/pull/15248 for some back-story.

The code called by and calling HistoryServer.getApplicationList wants an Iterator, but this method materializes an Iterable, which potentially causes a performance problem. It's simpler too to make this internal method also pass through an Iterator.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srowen/spark SPARK-17671

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15321.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15321

commit 14fb9a0b9fbc41fbbfbba5daab4e8998eaa857fc
Author: Sean Owen
Date: 2016-10-01T20:25:42Z

    Return Iterator of applications internally in history server, for consistency and performance
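The difference between passing an Iterator through and materializing the listing first, as described in the pull request above, can be sketched in isolation. This is a minimal illustration and not the history server code: `LazyListingSketch`, `AppInfo`, and both helper methods are hypothetical names, and the counter merely stands in for the cost of loading one application entry.

```scala
object LazyListingSketch {
  // Hypothetical stand-in for an application entry; not Spark's actual type.
  final case class AppInfo(id: Int)

  // Builds a lazy listing and reports each load through `onLoad`.
  private def listing(total: Int, onLoad: () => Unit): Iterator[AppInfo] =
    Iterator.range(0, total).map { i => onLoad(); AppInfo(i) }

  // Passes the Iterator through: only the entries actually consumed are loaded.
  def loadedWhenLazy(total: Int, wanted: Int): Int = {
    var loaded = 0
    listing(total, () => loaded += 1).take(wanted).foreach(_ => ())
    loaded
  }

  // Materializes the whole listing first, then takes the first few entries.
  def loadedWhenMaterialized(total: Int, wanted: Int): Int = {
    var loaded = 0
    listing(total, () => loaded += 1).toList.take(wanted).foreach(_ => ())
    loaded
  }

  def main(args: Array[String]): Unit = {
    println(s"lazy: ${loadedWhenLazy(1000, 10)} entries loaded")
    println(s"materialized: ${loadedWhenMaterialized(1000, 10)} entries loaded")
  }
}
```

With a limit such as `spark.history.ui.maxApplications` in play, the lazy path touches only as many entries as the caller asks for, while the materialized path pays for the full listing regardless of the limit.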
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15318 Thank you, @rxin and @gatorsmile . Finally, I added that correctly.
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15318 **[Test build #66229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66229/consoleFull)** for PR 15318 at commit [`c574559`](https://github.com/apache/spark/commit/c574559d47b21987deb11a52e7f842650681619e).
[GitHub] spark pull request #15309: [SPARK-17736] [Documentation][SparkR] [Update R R...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15309#discussion_r81457695

    --- Diff: docs/README.md ---
    @@ -21,6 +21,8 @@ installed. Also install the following libraries:
         # Following is needed only for generating API docs
         $ sudo pip install sphinx
         $ sudo Rscript -e 'install.packages(c("knitr", "devtools", "roxygen2", "testthat"), repos="http://cran.stat.ucla.edu/")'
    +    $ sudo Rscript -e 'install.packages(c("rmarkdown"), repos="http://cran.stat.ucla.edu/")'
    +    $ sudo pip install pandoc pandoc-citeproc
    --- End diff --

(And I say this without knowing much about it --) Isn't this a Python package? Would it be preferable to manage it via pip if so, since that's cross-platform, I think? Does the Ubuntu package just install the same thing?
[GitHub] spark pull request #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performanc...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15299
[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15299 Merged to master
[GitHub] spark issue #15302: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15302 Hi, @hvanhovell . Could you review this PR about 'ALTER TABLE DROP PARTITION'?
[GitHub] spark issue #15302: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15302 Merged build finished. Test PASSed.
[GitHub] spark issue #15302: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15302 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66225/ Test PASSed.
[GitHub] spark issue #15302: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15302 **[Test build #66225 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66225/consoleFull)** for PR 15302 at commit [`eca9c86`](https://github.com/apache/spark/commit/eca9c8676f8a5500b4ddcacb5758bf33f0deb47e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15319 Merged build finished. Test FAILed.
[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15319 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66224/ Test FAILed.
[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15319 **[Test build #66224 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66224/consoleFull)** for PR 15319 at commit [`e5912f8`](https://github.com/apache/spark/commit/e5912f86c94ff4d6303c9d0a9b80a30d30b99e3d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15318 Oh, I completely forgot about that test suite. Shame on me.
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15318 Thank you, @gatorsmile ! I'll add it there, too.
[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14638 **[Test build #66228 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66228/consoleFull)** for PR 14638 at commit [`74c3e81`](https://github.com/apache/spark/commit/74c3e8113846521d84948c8a101aa5219593a58a).
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15318 `LogicalPlanToSQLSuite`
[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14638 Rebased to the master.
[GitHub] spark issue #14623: [SPARK-17044][SQL] Make test files for window functions ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14623 Hi, @rxin . Do you think Apache Spark needs `window_functions.sql` in `SQLQueryTestSuite`?
[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14426 Hi, @rxin . Could you give me some guidance on this `Broadcast Hint for SQL Queries` if you have some time?
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r81456177

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala ---
    @@ -0,0 +1,175 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.command
    +
    +import scala.collection.mutable
    +
    +import org.apache.spark.sql._
    +import org.apache.spark.sql.catalyst.TableIdentifier
    +import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
    +import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, CatalogTable}
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.expressions.aggregate._
    +import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, ColumnStat, LogicalPlan, Statistics}
    +import org.apache.spark.sql.execution.datasources.LogicalRelation
    +import org.apache.spark.sql.types._
    +
    +
    +/**
    + * Analyzes the given columns of the given table to generate statistics, which will be used in
    + * query optimizations.
    + */
    +case class AnalyzeColumnCommand(
    +    tableIdent: TableIdentifier,
    +    columnNames: Seq[String]) extends RunnableCommand {
    +
    +  override def run(sparkSession: SparkSession): Seq[Row] = {
    +    val sessionState = sparkSession.sessionState
    +    val db = tableIdent.database.getOrElse(sessionState.catalog.getCurrentDatabase)
    +    val tableIdentWithDB = TableIdentifier(tableIdent.table, Some(db))
    +    val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdentWithDB))
    +
    +    relation match {
    +      case catalogRel: CatalogRelation =>
    +        updateStats(catalogRel.catalogTable,
    +          AnalyzeTableCommand.calculateTotalSize(sessionState, catalogRel.catalogTable))
    +
    +      case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined =>
    +        updateStats(logicalRel.catalogTable.get, logicalRel.relation.sizeInBytes)
    +
    +      case otherRelation =>
    +        throw new AnalysisException("ANALYZE TABLE is not supported for " +
    +          s"${otherRelation.nodeName}.")
    +    }
    +
    +    def updateStats(catalogTable: CatalogTable, newTotalSize: Long): Unit = {
    +      val (rowCount, columnStats) = computeColStats(sparkSession, relation)
    +      val statistics = Statistics(
    +        sizeInBytes = newTotalSize,
    +        rowCount = Some(rowCount),
    +        colStats = columnStats ++ catalogTable.stats.map(_.colStats).getOrElse(Map()))
    +      sessionState.catalog.alterTable(catalogTable.copy(stats = Some(statistics)))
    +      // Refresh the cached data source table in the catalog.
    +      sessionState.catalog.refreshTable(tableIdentWithDB)
    +    }
    +
    +    Seq.empty[Row]
    +  }
    +
    +  def computeColStats(
    +      sparkSession: SparkSession,
    +      relation: LogicalPlan): (Long, Map[String, ColumnStat]) = {
    +
    +    // check correctness of column names
    +    val attributesToAnalyze = mutable.MutableList[Attribute]()
    +    val duplicatedColumns = mutable.MutableList[String]()
    +    val resolver = sparkSession.sessionState.conf.resolver
    +    columnNames.foreach { col =>
    +      val exprOption = relation.output.find(attr => resolver(attr.name, col))
    +      val expr = exprOption.getOrElse(throw new AnalysisException(s"Invalid column name: $col."))
    +      // do deduplication
    +      if (!attributesToAnalyze.contains(expr)) {
    +        attributesToAnalyze += expr
    +      } else {
    +        duplicatedColumns += col
    +      }
    +    }
    +    if (duplicatedColumns.nonEmpty) {
    +      logWarning(s"Duplicated columns ${duplicatedColumns.mkString("(", ", ", ")")} detected " +
    +        s"when analyzing columns ${columnNames.mkString("(", ", ", ")")}, ignoring them.")
    +    }
    +
    +    // Collect statistics per column.
    +    // The first element in the result will be the overall row count, the following eleme