date:20161001

[GitHub] spark issue #15254: [SPARK-17679] [PYSPARK] remove unnecessary Py4J ListConv...

2016-10-01 Thread holdenk

Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/15254
  
+1 :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15324
  
**[Test build #66237 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66237/consoleFull)**
 for PR 15324 at commit 
[`52a974d`](https://github.com/apache/spark/commit/52a974dd30574247238749b59e226d549e90744f).



[GitHub] spark issue #15231: [SPARK-17658][SPARKR] read.df/write.df API taking path o...

2016-10-01 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/15231
  
I tested this, and this is the message I see:
```
16/10/02 05:43:17 ERROR RBackendHandler: getSQLDataType on org.apache.spark.sql.api.r.SQLUtils failed
Error in value[[3L]](cond) : Invalid type unknown
```

I think we should lose the first part, "in value[[3L]](cond)"? Perhaps we could show the function name "getSQLDataType" there instead?

Also, I think it'd be important to indicate where the message is coming from, so how about adding that to the `stop` calls, with something like:

```
Error in getSQLDataType : illegal argument - Invalid type unknown
```




[GitHub] spark pull request #15293: [SPARK-17718] [Update MLib Classification Documen...

2016-10-01 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15293#discussion_r81464884
  
--- Diff: docs/mllib-linear-methods.md ---
@@ -78,6 +78,10 @@ methods `spark.mllib` supports:
   
 
 
+A binary label y is denoted as either +1 (positive) or −1 (negative), which is
--- End diff --

This duplicates an existing statement below. The idea was to move it up here rather than copy it.
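As background for this doc change: the guide's formulas write a binary label as y ∈ {+1, −1}, while `spark.mllib` itself stores the negative class as 0 rather than −1 to stay consistent with multiclass labeling. Below is a minimal Python sketch of converting between the two encodings, with the hinge loss max{0, 1 − y wᵀx} expressed in terms of the signed label. The function names are hypothetical helpers and are not part of Spark's API.

```python
def to_signed_label(label):
    """Map a {0, 1} label (spark.mllib encoding) to {-1, +1} (formula notation)."""
    if label not in (0.0, 1.0):
        raise ValueError(f"expected a binary label 0 or 1, got {label!r}")
    return 2.0 * label - 1.0


def to_mllib_label(signed):
    """Map a {-1, +1} label back to the {0, 1} encoding."""
    if signed not in (-1.0, 1.0):
        raise ValueError(f"expected -1 or +1, got {signed!r}")
    return (signed + 1.0) / 2.0


def hinge_loss(w, x, label01):
    """Hinge loss max(0, 1 - y * w.x), where y is the signed version of a 0/1 label."""
    y = to_signed_label(label01)
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return max(0.0, 1.0 - margin)
```

For example, with w = [1.0, -1.0] and x = [2.0, 1.0], the dot product is 1.0. A positive example (label 1) therefore incurs zero loss, while a negative example (label 0, i.e. y = −1) incurs a loss of 2.0.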



[GitHub] spark pull request #14861: [SPARK-17287] [PYSPARK] Add recursive kwarg to Py...

2016-10-01 Thread holdenk

Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/14861#discussion_r81464644
  
--- Diff: python/pyspark/context.py ---
@@ -762,13 +762,16 @@ def accumulator(self, value, accum_param=None):
 SparkContext._next_accum_id += 1
 return Accumulator(SparkContext._next_accum_id - 1, value, accum_param)
 
-def addFile(self, path):
+def addFile(self, path, recursive=False):
 """
 Add a file to be downloaded with this Spark job on every node.
 The C{path} passed can be either a local file, a file in HDFS
 (or other Hadoop-supported filesystems), or an HTTP, HTTPS or
 FTP URI.
 
+A directory can be given if the recursive option is set to true.
+Currently directories are onlysupported for Hadoop-supported filesystems.
--- End diff --

Minor nit: typo (onlysupported needs a space)



[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15324
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66236/
Test FAILed.



[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15324
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15324
  
**[Test build #66236 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66236/consoleFull)**
 for PR 15324 at commit 
[`08b0baf`](https://github.com/apache/spark/commit/08b0baf1d00aeb2d6abf8c758d6c566f298548c3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #14897: [SPARK-17338][SQL] add global temp view

2016-10-01 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14897#discussion_r81463019
  
--- Diff: docs/sql-programming-guide.md ---
@@ -220,6 +220,40 @@ The `sql` function enables applications to run SQL queries programmatically and
 
 
 
+## Global Temporary View
+
+Temporary views in Spark SQL are session-scoped and will disappear if the session that creates it
+terminates. If you want to have a temporary view that is shared among all sessions and kept alive
+until the Spark application terminates, you can create a global temporary view. A global temporary
+view is tied to a system preserved database `global_temp`, and we must use the qualified name to
+refer to it, e.g. `SELECT * FROM global_temp.view1`.
+
+
+
+{% include_example global_temp_view scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
+
+
+
+{% include_example global_temp_view java/org/apache/spark/examples/sql/JavaSparkSQLExample.java %}
+
+
+
+{% include_example global_temp_view python/sql/basic.py %}
+
+
+
+
+{% highlight sql %}
+
+CREATE GLOBAL TEMPORARY VIEW temp_view AS SELECT a + 1, b * 2 FROM tbl
+
+SELECT * FROM global_temp.temp_view
+
+{% endhighlight %}
+
+
--- End diff --

We need one more 



[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15319
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15319
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66234/
Test PASSed.



[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15319
  
**[Test build #66234 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66234/consoleFull)**
 for PR 15319 at commit 
[`9639c71`](https://github.com/apache/spark/commit/9639c71862d1e7783bc3ca4d750d68e7aa35be92).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #15309: [SPARK-17736] [Documentation][SparkR] [Update R R...

2016-10-01 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15309#discussion_r81462880
  
--- Diff: docs/README.md ---
@@ -21,6 +21,8 @@ installed. Also install the following libraries:
 # Following is needed only for generating API docs
 $ sudo pip install sphinx
 $ sudo Rscript -e 'install.packages(c("knitr", "devtools", "roxygen2", "testthat"), repos="http://cran.stat.ucla.edu/")'
+$ sudo Rscript -e 'install.packages(c("rmarkdown"), repos="http://cran.stat.ucla.edu/")'
+$ sudo pip install pandoc pandoc-citeproc
--- End diff --

`pandoc` itself is not a Python package, but it appears that the Python package `pypandoc` manages it.
I'm not sure whether `pypandoc` works with `pandoc-citeproc`, though.



[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15324
  
**[Test build #66236 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66236/consoleFull)**
 for PR 15324 at commit 
[`08b0baf`](https://github.com/apache/spark/commit/08b0baf1d00aeb2d6abf8c758d6c566f298548c3).



[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15324
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15324
  
**[Test build #66235 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66235/consoleFull)**
 for PR 15324 at commit 
[`4d8a025`](https://github.com/apache/spark/commit/4d8a025198fae1febdcc6351d6d48e4666cc4e65).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15324
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66235/
Test FAILed.



[GitHub] spark issue #15322: [SPARK-17753][SQL] Allow a complex expression as the inp...

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15322
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15322: [SPARK-17753][SQL] Allow a complex expression as the inp...

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15322
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66232/
Test PASSed.



[GitHub] spark issue #15322: [SPARK-17753][SQL] Allow a complex expression as the inp...

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15322
  
**[Test build #66232 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66232/consoleFull)**
 for PR 15322 at commit 
[`b25c849`](https://github.com/apache/spark/commit/b25c84949edf5cf224e8ca93c18734805760dc11).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics

2016-10-01 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15090
  
LGTM except the above minor comments. 

The test cases mentioned above need to be added to `sql/hive/`, since correctness could be affected by the behavior of the Hive metastore.



[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15324
  
**[Test build #66235 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66235/consoleFull)**
 for PR 15324 at commit 
[`4d8a025`](https://github.com/apache/spark/commit/4d8a025198fae1febdcc6351d6d48e4666cc4e65).



[GitHub] spark pull request #15293: [SPARK-17718] [Update MLib Classification Documen...

2016-10-01 Thread jagadeesanas2

Github user jagadeesanas2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15293#discussion_r81462301
  
--- Diff: docs/mllib-linear-methods.md ---
@@ -78,6 +78,10 @@ methods `spark.mllib` supports:
   
 
 
+A binary label y is denoted as either +1 (positive) or −1 (negative), which is
--- End diff --

As mentioned in the JIRA, I simply added more detailed documentation to avoid future confusion.



[GitHub] spark pull request #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier

2016-10-01 Thread zhengruifeng

GitHub user zhengruifeng opened a pull request:

https://github.com/apache/spark/pull/15324

[SPARK-16872][ML] Gaussian Naive Bayes Classifier

## What changes were proposed in this pull request?
implement Gaussian NB in ML


## How was this patch tested?
local test in spark-shell, comparing to Scikit-Learn
add unit test
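The PR description says only "implement Gaussian NB in ML". As background, below is a minimal pure-Python sketch of the Gaussian Naive Bayes model itself. It estimates class priors and per-class, per-feature normal likelihoods, then combines them in log space. This is not the code from PR #15324 and does not use Spark's API. The class name and the `var_epsilon` smoothing constant are illustrative assumptions.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Toy Gaussian Naive Bayes classifier for dense lists of floats."""

    def __init__(self, var_epsilon=1e-9):
        # Added to every variance so that a constant feature cannot cause a
        # division by zero (an illustrative choice, not taken from the PR).
        self.var_epsilon = var_epsilon
        self.params = {}

    def fit(self, X, y):
        # Group training rows by class label.
        grouped = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append(row)
        n = len(X)
        self.params = {}
        for label, rows in grouped.items():
            count = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / count for col in columns]
            # Maximum-likelihood (biased) variance per feature, plus smoothing.
            variances = [
                sum((v - m) ** 2 for v in col) / count + self.var_epsilon
                for col, m in zip(columns, means)
            ]
            self.params[label] = (math.log(count / n), means, variances)
        return self

    def _log_joint(self, x, label):
        # log P(class) + sum_j log N(x_j | mean_j, var_j); the features are
        # treated as conditionally independent given the class.
        log_prior, means, variances = self.params[label]
        log_likelihood = sum(
            -0.5 * math.log(2.0 * math.pi * var) - (xj - m) ** 2 / (2.0 * var)
            for xj, m, var in zip(x, means, variances)
        )
        return log_prior + log_likelihood

    def predict(self, x):
        # Pick the class with the largest posterior (up to a shared constant).
        return max(self.params, key=lambda label: self._log_joint(x, label))
```

Usage: fitting on two well-separated clusters, `X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]` with labels `[0, 0, 0, 1, 1, 1]`, gives `predict([1.1, 2.1]) == 0` and `predict([5.1, 5.9]) == 1`. Working with log-probabilities rather than raw densities avoids underflow when there are many features.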




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhengruifeng/spark gnb_1001

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15324.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15324


commit 8e7e2d5e021b3ac14314a87bd7d894a8189edabf
Author: Zheng RuiFeng 
Date:   2016-10-02T02:31:28Z

create pr

commit 4d8a025198fae1febdcc6351d6d48e4666cc4e65
Author: Zheng RuiFeng 
Date:   2016-10-02T02:37:54Z

fix nit





[GitHub] spark pull request #15309: [SPARK-17736] [Documentation][SparkR] [Update R R...

2016-10-01 Thread jagadeesanas2

Github user jagadeesanas2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15309#discussion_r81462265
  
--- Diff: docs/README.md ---
@@ -21,6 +21,8 @@ installed. Also install the following libraries:
 # Following is needed only for generating API docs
 $ sudo pip install sphinx
 $ sudo Rscript -e 'install.packages(c("knitr", "devtools", "roxygen2", "testthat"), repos="http://cran.stat.ucla.edu/")'
+$ sudo Rscript -e 'install.packages(c("rmarkdown"), repos="http://cran.stat.ucla.edu/")'
+$ sudo pip install pandoc pandoc-citeproc
--- End diff --

@felixcheung I agree. If we are installing manually on Ubuntu/Debian, we need to install using the command below:
``sudo apt-get install pandoc pandoc-citeproc``

Similarly, for:
Fedora/Red Hat: ``sudo yum install pandoc``
Arch: ``sudo pacman -S pandoc``
Mac OS X with Homebrew: ``brew install pandoc pandoc-citeproc Caskroom/cask/mactex``
Machine with Haskell: ``cabal install pandoc``

@srowen
TTBOMK, as it's a Python package, it can also be managed via pip:
https://pypi.python.org/pypi/pypandoc



[GitHub] spark issue #15311: [SPARK-17721][MLlib][backport] Fix for multiplying trans...

2016-10-01 Thread jkbradley

Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/15311
  
Perfect, thanks!  Merging now



[GitHub] spark issue #14326: [SPARK-3181] [ML] Implement RobustRegression with huber ...

2016-10-01 Thread tewf

Github user tewf commented on the issue:

https://github.com/apache/spark/pull/14326
  
Could we instead implement a more general robust linear model of the [M-estimator](http://research.microsoft.com/en-us/um/people/zhang/INRIA/Publis/Tutorial-Estim/node24.html) type, as is done in [statsmodels RLM](http://statsmodels.sourceforge.net/0.6.0/rlm.html) (see [RLM.py](http://statsmodels.sourceforge.net/0.6.0/_modules/statsmodels/robust/robust_linear_model.html#RLM))? The Huber loss would then be one of the M-estimators, perhaps the default, as in statsmodels.

I think that [IterativelyReweightedLeastSquares](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquares.scala) was intended to aid in developing a robust M-estimator framework.
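The suggestion above is to treat Huber as one pluggable weight function inside a generic iteratively reweighted least squares (IRLS) loop. Below is a minimal pure-Python sketch of that idea for a single-feature line y = a + b·x. Each pass re-estimates a robust scale from the median absolute deviation (MAD) of the residuals, turns scaled residuals into weights, and solves a weighted least-squares fit. The tuning constant k = 1.345 matches the default of statsmodels' `HuberT` norm; the MAD scale rule and all function names are assumptions for illustration. None of this is Spark's `IterativelyReweightedLeastSquares` API or the code in PR #14326.

```python
import statistics


def huber_weight(scaled_residual, k=1.345):
    """Huber weight psi(r)/r: 1 inside [-k, k], k/|r| outside."""
    a = abs(scaled_residual)
    return 1.0 if a <= k else k / a


def weighted_line_fit(x, y, w):
    """Closed-form weighted least squares for y = intercept + slope * x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def m_estimate_line(x, y, weight_fn=huber_weight, max_iter=50, tol=1e-8):
    """Robust line fit by IRLS with a pluggable M-estimator weight function."""
    # The first pass is ordinary least squares (all weights equal to 1).
    intercept, slope = weighted_line_fit(x, y, [1.0] * len(x))
    for _ in range(max_iter):
        residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Robust scale: MAD / 0.6745 (consistent with sigma for normal noise).
        center = statistics.median(residuals)
        mad = statistics.median(abs(r - center) for r in residuals)
        if mad == 0:
            break  # a majority of points already fit exactly
        scale = mad / 0.6745
        weights = [weight_fn(r / scale) for r in residuals]
        new_intercept, new_slope = weighted_line_fit(x, y, weights)
        done = abs(new_intercept - intercept) < tol and abs(new_slope - slope) < tol
        intercept, slope = new_intercept, new_slope
        if done:
            break
    return intercept, slope
```

On points from y = 2x + 1 for x = 0..9, with the last y replaced by an outlier of 100, plain least squares gives a slope above 6, while `m_estimate_line` recovers a slope and intercept very close to 2 and 1. Swapping in a different M-estimator, such as Tukey's biweight, would only mean passing a different `weight_fn`; the reweighting loop stays the same.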



[GitHub] spark issue #15323: [SPARK-17757][SQL] Remove CreateNamedStruct[Unsafe]

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15323
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66233/
Test FAILed.



[GitHub] spark issue #15323: [SPARK-17757][SQL] Remove CreateNamedStruct[Unsafe]

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15323
  
**[Test build #66233 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66233/consoleFull)**
 for PR 15323 at commit 
[`2e80037`](https://github.com/apache/spark/commit/2e800378121c7ffb2ee53c630c597acca95493a3).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15323: [SPARK-17757][SQL] Remove CreateNamedStruct[Unsafe]

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15323
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15323: [SPARK-17757][SQL] Remove CreateNamedStruct[Unsafe]

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15323
  
**[Test build #66233 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66233/consoleFull)**
 for PR 15323 at commit 
[`2e80037`](https://github.com/apache/spark/commit/2e800378121c7ffb2ee53c630c597acca95493a3).



[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15319
  
**[Test build #66234 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66234/consoleFull)**
 for PR 15319 at commit 
[`9639c71`](https://github.com/apache/spark/commit/9639c71862d1e7783bc3ca4d750d68e7aa35be92).



[GitHub] spark pull request #15323: [SPARK-17757][SQL] Remove CreateNamedStruct[Unsaf...

2016-10-01 Thread hvanhovell

GitHub user hvanhovell opened a pull request:

https://github.com/apache/spark/pull/15323

[SPARK-17757][SQL] Remove CreateNamedStruct[Unsafe]

## What changes were proposed in this pull request?
This PR removes the `CreateNamedStruct` and `CreateNamedStructUnsafe` expressions. This is only for simplification purposes.

## How was this patch tested?
Existing tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hvanhovell/spark SPARK-17757

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15323.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15323


commit 2e800378121c7ffb2ee53c630c597acca95493a3
Author: Herman van Hovell 
Date:   2016-10-02T01:48:05Z

Remove CreateNamedStruct and CreateNamedStructUnsafe expressions





[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics

2016-10-01 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15090
  
Another test case for Unicode column names in ANALYZE COLUMN:

```Scala
// scalastyle:off
// non ascii characters are not allowed in the source code, so we disable the scalastyle.
val colName1 = "`å1`"
val colName2 = "`å2`"
// scalastyle:on
withTable(table) {
  sql(s"CREATE TABLE $table ($colName1 int, $colName2 double) USING PARQUET")
  sql(s"INSERT INTO $table SELECT 1, 3.0")
  sql(s"ANALYZE TABLE $table COMPUTE STATISTICS FOR COLUMNS $colName2, $colName1")
  // ...
}
```



[GitHub] spark issue #15322: [SPARK-17753][SQL] Allow a complex expression as the inp...

2016-10-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15322
  
@hvanhovell, it looks great.

Most of the cases pass. I'm wondering if we can support the following, too?
```scala
sql("SELECT CASE 'a'='a' WHEN TRUE THEN 1 END").show
```

Since the following already pass, the case above is a minor one.
```scala
sql("SELECT CASE ('a'='a') WHEN TRUE THEN 1 END").show
sql("SELECT CASE 1=1 WHEN TRUE THEN 1 END").show
sql("SELECT CASE 1='a' WHEN TRUE THEN 1 END").show
```



[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics

2016-10-01 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15090
  
Could you add a positive test case for when case sensitivity is turned on? 
The scenario is like this:
```scala
withTable(table) {
  withSQLConf("spark.sql.caseSensitive" -> "true") {
    sql(s"CREATE TABLE $table (c1 int, C1 double) USING PARQUET")
    sql(s"INSERT INTO $table SELECT 1, 3.0")
    sql(s"ANALYZE TABLE $table COMPUTE STATISTICS FOR COLUMNS c1, C1")
  }
}
```
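The behavior this test targets depends on the session's column resolver. Below is a hypothetical, simplified Scala model of the resolve-and-deduplicate loop in `computeColStats` from this PR: with a case-sensitive resolver, `c1` and `C1` resolve to two distinct columns, while a case-insensitive resolver maps both names to the same column and reports the second one as a duplicate. It mirrors the shape of Spark's `Resolver` type, `(String, String) => Boolean`, but compares plain column names rather than Spark attributes; `ColumnResolutionSketch` and its members are illustrative names, not Spark APIs.

```scala
// Hypothetical sketch, not Spark code: models column-name resolution plus deduplication.
object ColumnResolutionSketch {
  type Resolver = (String, String) => Boolean

  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  /** Resolves requested names against the table's columns. Returns the distinct
    * matched columns (in request order) and the requested names that resolved
    * to an already-matched column. Throws on a name that matches no column. */
  def resolve(
      tableColumns: Seq[String],
      requested: Seq[String],
      resolver: Resolver): (Seq[String], Seq[String]) =
    requested.foldLeft((Vector.empty[String], Vector.empty[String])) {
      case ((matched, duplicates), name) =>
        val col = tableColumns.find(c => resolver(c, name)).getOrElse(
          throw new IllegalArgumentException(s"Invalid column name: $name."))
        if (matched.contains(col)) (matched, duplicates :+ name)
        else (matched :+ col, duplicates)
    }

  def main(args: Array[String]): Unit = {
    // Case-sensitive: c1 and C1 are two different columns, nothing is dropped.
    println(resolve(Seq("c1", "C1"), Seq("c1", "C1"), caseSensitive)) // (Vector(c1, C1),Vector())
    // Case-insensitive: C1 resolves to c1 again and is reported as a duplicate.
    println(resolve(Seq("c1", "c2"), Seq("c1", "C1"), caseInsensitive)) // (Vector(c1),Vector(C1))
  }
}
```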



[GitHub] spark issue #15321: [SPARK-17671] [WEBUI] Spark 2.0 history server summary p...

2016-10-01 Thread ajbozarth

Github user ajbozarth commented on the issue:

https://github.com/apache/spark/pull/15321
  
LGTM



[GitHub] spark issue #15322: [SPARK-17753][SQL] Allow a complex expression as the inp...

2016-10-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15322
  
Thank you for pinging me!



[GitHub] spark issue #15321: [SPARK-17671] [WEBUI] Spark 2.0 history server summary p...

2016-10-01 Thread wgtmac

Github user wgtmac commented on the issue:

https://github.com/apache/spark/pull/15321
  
LGTM



[GitHub] spark issue #15322: [SPARK-17753][SQL] Allow a complex expression as the inp...

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15322
  
**[Test build #66232 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66232/consoleFull)**
 for PR 15322 at commit 
[`b25c849`](https://github.com/apache/spark/commit/b25c84949edf5cf224e8ca93c18734805760dc11).



[GitHub] spark issue #15322: [SPARK-17753][SQL] Allow a complex expression as the inp...

2016-10-01 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/15322
  
cc @dongjoon-hyun 



[GitHub] spark pull request #15322: [SPARK-17753][SQL] Allow a complex expression as ...

2016-10-01 Thread hvanhovell

GitHub user hvanhovell opened a pull request:

https://github.com/apache/spark/pull/15322

[SPARK-17753][SQL] Allow a complex expression as the input of a value based case statement

## What changes were proposed in this pull request?
We currently only allow relatively simple expressions as the input for a value-based case statement. Expressions like `case (a > 1) or (b = 2) when true then 1 when false then 0 end` currently fail. This PR adds support for such expressions.
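To make the semantics concrete, here is a minimal Scala sketch of how a value-based (simple) `CASE` evaluates under standard SQL rules: the input expression is computed once and compared for equality with each `WHEN` value in order, falling back to `ELSE` (or `NULL`) when nothing matches. `SimpleCaseSketch` is a hypothetical name, `Option` stands in for SQL `NULL`, and this models the semantics only, not Spark's own expression code.

```scala
// Hypothetical model of `CASE input WHEN v1 THEN r1 ... ELSE e END`; not Spark code.
object SimpleCaseSketch {
  // The input is evaluated once; the first WHEN value equal to it selects the result.
  // A None (NULL) input matches no branch, so the ELSE value (or None) is returned.
  def simpleCase[A, B](input: Option[A], branches: Seq[(A, B)], elseValue: Option[B]): Option[B] =
    input.flatMap(v => branches.collectFirst { case (w, r) if w == v => r }).orElse(elseValue)

  // Models: case (a > 1) or (b = 2) when true then 1 when false then 0 end
  def example(a: Int, b: Int): Option[Int] =
    simpleCase(Some((a > 1) || (b == 2)), Seq(true -> 1, false -> 0), None)

  def main(args: Array[String]): Unit = {
    println(example(2, 0)) // Some(1)
    println(example(0, 0)) // Some(0)
  }
}
```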

## How was this patch tested?
Added a test to the ExpressionParserSuite.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hvanhovell/spark SPARK-17753

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15322.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15322


commit b25c84949edf5cf224e8ca93c18734805760dc11
Author: Herman van Hovell 
Date:   2016-10-02T01:02:56Z

Allow a complex expression as the input a value based case statement





[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-10-01 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r81460994
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala ---
@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import scala.collection.mutable
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
+import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, CatalogTable}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate._
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, ColumnStat, LogicalPlan, Statistics}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.types._
+
+
+/**
+ * Analyzes the given columns of the given table to generate statistics, which will be used in
+ * query optimizations.
+ */
+case class AnalyzeColumnCommand(
+    tableIdent: TableIdentifier,
+    columnNames: Seq[String]) extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val sessionState = sparkSession.sessionState
+    val db = tableIdent.database.getOrElse(sessionState.catalog.getCurrentDatabase)
+    val tableIdentWithDB = TableIdentifier(tableIdent.table, Some(db))
+    val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdentWithDB))
+
+    relation match {
+      case catalogRel: CatalogRelation =>
+        updateStats(catalogRel.catalogTable,
+          AnalyzeTableCommand.calculateTotalSize(sessionState, catalogRel.catalogTable))
+
+      case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined =>
+        updateStats(logicalRel.catalogTable.get, logicalRel.relation.sizeInBytes)
+
+      case otherRelation =>
+        throw new AnalysisException("ANALYZE TABLE is not supported for " +
+          s"${otherRelation.nodeName}.")
+    }
+
+    def updateStats(catalogTable: CatalogTable, newTotalSize: Long): Unit = {
+      val (rowCount, columnStats) = computeColStats(sparkSession, relation)
+      val statistics = Statistics(
+        sizeInBytes = newTotalSize,
+        rowCount = Some(rowCount),
+        colStats = columnStats ++ catalogTable.stats.map(_.colStats).getOrElse(Map()))
+      sessionState.catalog.alterTable(catalogTable.copy(stats = Some(statistics)))
+      // Refresh the cached data source table in the catalog.
+      sessionState.catalog.refreshTable(tableIdentWithDB)
+    }
+
+    Seq.empty[Row]
+  }
+
+  def computeColStats(
+      sparkSession: SparkSession,
+      relation: LogicalPlan): (Long, Map[String, ColumnStat]) = {
+
+    // check correctness of column names
+    val attributesToAnalyze = mutable.MutableList[Attribute]()
+    val duplicatedColumns = mutable.MutableList[String]()
+    val resolver = sparkSession.sessionState.conf.resolver
+    columnNames.foreach { col =>
+      val exprOption = relation.output.find(attr => resolver(attr.name, col))
+      val expr = exprOption.getOrElse(throw new AnalysisException(s"Invalid column name: $col."))
+      // do deduplication
+      if (!attributesToAnalyze.contains(expr)) {
+        attributesToAnalyze += expr
+      } else {
+        duplicatedColumns += col
+      }
+    }
+    if (duplicatedColumns.nonEmpty) {
+      logWarning(s"Duplicated columns ${duplicatedColumns.mkString("(", ", ", ")")} detected " +
+        s"when analyzing columns ${columnNames.mkString("(", ", ", ")")}, ignoring them.")
--- End diff --

How about this?
```Scala
  logWarning("A duplicate column name was detected in `ANALYZE TABLE` statement. " +
```

[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15318
  
Thank you for review!



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15318
  
LGTM



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15318
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15318
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66231/



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15318
  
**[Test build #66231 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66231/consoleFull)**
 for PR 15318 at commit 
[`94ae569`](https://github.com/apache/spark/commit/94ae56926c291050ae5c2be4c6f66c2ba84d150e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15318
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15318
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66229/



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15318
  
**[Test build #66229 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66229/consoleFull)**
 for PR 15318 at commit 
[`c574559`](https://github.com/apache/spark/commit/c574559d47b21987deb11a52e7f842650681619e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15318
  
**[Test build #66231 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66231/consoleFull)**
 for PR 15318 at commit 
[`94ae569`](https://github.com/apache/spark/commit/94ae56926c291050ae5c2be4c6f66c2ba84d150e).



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15318
  
I addressed the comments. As for the PR description, I think it is okay as is, because for the same reason we cannot enumerate all the cases there.



[GitHub] spark issue #15311: [SPARK-17721][MLlib][backport] Fix for multiplying trans...

2016-10-01 Thread bwahlgreen

Github user bwahlgreen commented on the issue:

https://github.com/apache/spark/pull/15311
  
there



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15318
  
Ya, I agree. We can revisit this if the existing behavior turns out to have an issue.



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15318
  
IMO, you do not need to fix the existing behavior.

Maybe you can also check the original PR that delivered the temporal interval feature and review its design and coverage. I quickly went over SQL-99: there are many temporal operations, and I am not sure whether we cover all the cases.



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15318
  
BTW, Spark accepts all four cases and returns one normalized form, 'INTERVAL 1 DAYS'. I think we don't need to change the current behavior here, right?



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15318
  
I see.
> Please try all the variants

Apache Spark supports all four of the following cases.
- interval 1 day
- interval 1 days
- interval '1' day
- interval '1' days

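```scala
scala> sql("select current_timestamp + INTERVAL 1 DAY, current_timestamp + INTERVAL 1 DAYS, current_timestamp + INTERVAL '1' DAY, current_timestamp + INTERVAL '1' DAYS").show
+--------------------------------------------------------+--------------------------------------------------------+--------------------------------------------------------+--------------------------------------------------------+
|CAST(current_timestamp() + interval 1 days AS TIMESTAMP)|CAST(current_timestamp() + interval 1 days AS TIMESTAMP)|CAST(current_timestamp() + interval 1 days AS TIMESTAMP)|CAST(current_timestamp() + interval 1 days AS TIMESTAMP)|
+--------------------------------------------------------+--------------------------------------------------------+--------------------------------------------------------+--------------------------------------------------------+
|                                    2016-10-02 14:49:...|                                    2016-10-02 14:49:...|                                    2016-10-02 14:49:...|                                    2016-10-02 14:49:...|
+--------------------------------------------------------+--------------------------------------------------------+--------------------------------------------------------+--------------------------------------------------------+
```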



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15318
  
I'll update the testcases and PR description. Thank you again, @gatorsmile !



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15318
  
It sounds like this is database vendor-specific. Please try all the variants. If the queries work in Spark SQL, add them to your test cases for SQL generation. 

BTW, the PR description can be updated. 



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15318
  
To sum up, if you agree, I will change the following:
- Use `day` instead of `days` (I need to find where it is generated.)
- Use a string literal, `'1'`, instead of an integer literal, `1`.
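A hypothetical Scala sketch of the proposed change, limited to day intervals: the four spellings Spark accepts are parsed to a day count, then re-rendered in the SQL-standard form with a quoted value and a singular unit (`INTERVAL '1' DAY`), the same form as in the Postgres example from this thread. `DayIntervalSketch` and its methods are illustrative names only, not part of Spark's SQL builder.

```scala
// Hypothetical helper, not Spark code: normalizes day-interval literals.
object DayIntervalSketch {
  // Matches INTERVAL 1 DAY, INTERVAL 1 DAYS, INTERVAL '1' DAY and INTERVAL '1' DAYS
  // (case-insensitive). Group 1 captures a quoted value, group 2 an unquoted one.
  private val DayLiteral = """(?i)interval\s+(?:'(\d+)'|(\d+))\s+days?""".r

  def parseDays(literal: String): Option[Int] = literal.trim match {
    case DayLiteral(quoted, bare) => Some(Option(quoted).getOrElse(bare).toInt)
    case _                        => None
  }

  // Singular unit and a quoted string value, e.g. INTERVAL '1' DAY.
  def toStandardSql(days: Int): String = s"INTERVAL '$days' DAY"

  def main(args: Array[String]): Unit = {
    println(parseDays("interval '1' days")) // Some(1)
    println(toStandardSql(1)) // INTERVAL '1' DAY
  }
}
```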



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15318
  
Maybe you could try '1' instead of 1?



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15318
  
For Postgres, it works like this.
```
select ts + interval '1' day, ts - interval '2' day from dates
```



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15318
  
A few things are mixed up here.

First of all, I checked MySQL: it supports `select ts + interval 1 day, ts - interval 2 day from dates`.
Second, and more importantly, as you can see, `days` is the string generated by the current Spark. I think we had better change that to `day`.

Which version of Hive do you mean? For `INTERVAL`, Spark 1.6.2 and Hive 1.2 do not support that.

BTW, I'm willing to fix anything. :)



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15318
  
```
db2 => select ts + interval 1 days, ts - interval 2 days from dates;
SQL0104N  An unexpected token "1" was found following "select ts + interval".  Expected tokens may include:  "".  SQLSTATE=42601
db2 => select ts + interval 1 day, ts - interval 2 day from dates
SQL0104N  An unexpected token "1" was found following "select ts + interval".  Expected tokens may include:  "".  SQLSTATE=42601
```

Also tried it in Hive. 



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15318
  
I think you mean 'DAYS' is wrong, right? I think the following should work there.
```sql
select ts + interval 1 day, ts - interval 2 day from dates
```



[GitHub] spark pull request #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL ...

2016-10-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/15318#discussion_r81458744
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/ExpressionSQLBuilderSuite.scala ---
@@ -119,4 +121,18 @@ class ExpressionSQLBuilderSuite extends SQLBuilderTest {
       s"(PARTITION BY `a`, `b` ORDER BY `c` ASC NULLS FIRST, `d` DESC NULLS LAST $frame)"
     )
   }
+
+  test("interval arithmetic") {
+    val interval = Literal(new CalendarInterval(0, 864L))
--- End diff --

Sure!



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15318
  
I have a question about the SQL statement: 
```SQL
select ts + interval 1 days, ts - interval 2 days from dates
```

Is it a valid SQL statement? I tried it in Hive and DB2, and neither accepts it.


[GitHub] spark issue #15321: [SPARK-17671] [WEBUI] Spark 2.0 history server summary p...

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15321
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66230/
Test FAILed.



[GitHub] spark issue #15321: [SPARK-17671] [WEBUI] Spark 2.0 history server summary p...

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15321
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15321: [SPARK-17671] [WEBUI] Spark 2.0 history server summary p...

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15321
  
**[Test build #66230 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66230/consoleFull)**
 for PR 15321 at commit 
[`14fb9a0`](https://github.com/apache/spark/commit/14fb9a0b9fbc41fbbfbba5daab4e8998eaa857fc).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15090
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66227/
Test PASSed.



[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15090
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15090
  
**[Test build #66227 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66227/consoleFull)**
 for PR 15090 at commit 
[`734abad`](https://github.com/apache/spark/commit/734abad045a5378d14489a4e956b7a8e1c95a811).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14638
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66228/
Test PASSed.



[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14638
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14638
  
**[Test build #66228 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66228/consoleFull)**
 for PR 14638 at commit 
[`74c3e81`](https://github.com/apache/spark/commit/74c3e8113846521d84948c8a101aa5219593a58a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15321: [SPARK-17671] [WEBUI] Spark 2.0 history server summary p...

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15321
  
**[Test build #66230 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66230/consoleFull)**
 for PR 15321 at commit 
[`14fb9a0`](https://github.com/apache/spark/commit/14fb9a0b9fbc41fbbfbba5daab4e8998eaa857fc).



[GitHub] spark issue #15321: [SPARK-17671] [WEBUI] Spark 2.0 history server summary p...

2016-10-01 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15321
  
CC @wgtmac and @ajbozarth for a look, in case I'm missing something. 



[GitHub] spark pull request #15321: [SPARK-17671] [WEBUI] Spark 2.0 history server su...

2016-10-01 Thread srowen

GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/15321

[SPARK-17671] [WEBUI] Spark 2.0 history server summary page is slow even 
set spark.history.ui.maxApplications

## What changes were proposed in this pull request?

Return an Iterator of applications internally in the history server, for consistency and performance. See https://github.com/apache/spark/pull/15248 for some back-story.

The code called by and calling `HistoryServer.getApplicationList` wants an Iterator, but this method materializes an Iterable, which can cause a performance problem. It is also simpler to make this internal method pass an Iterator through.
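The Iterator-versus-Iterable point can be illustrated outside Spark. The sketch below is hypothetical: `ListingSketch`, `AppInfo`, `loadApps`, and the `produced` counter are made-up names, not history-server classes. It assumes the application listing is generated lazily.

Trimming an `Iterator` with `take` before collecting produces only the requested entries. Materializing the full collection first produces all of them, and that is the cost a cap such as `spark.history.ui.maxApplications` is meant to avoid.

```scala
// Hypothetical stand-ins for a lazily produced application listing.
object ListingSketch {
  final case class AppInfo(id: String)

  // Counts how many AppInfo values were actually produced.
  var produced = 0

  // A lazily generated listing of `total` applications.
  def loadApps(total: Int): Iterator[AppInfo] =
    Iterator.range(0, total).map { i =>
      produced += 1
      AppInfo(s"app-$i")
    }

  // Materializes every application before trimming to `max`.
  def listMaterialized(total: Int, max: Int): Seq[AppInfo] =
    loadApps(total).toVector.take(max)

  // Passes the Iterator through, so only `max` elements are ever produced.
  def listLazily(total: Int, max: Int): Seq[AppInfo] =
    loadApps(total).take(max).toVector
}
```

With 10,000 applications and a cap of 50, `listLazily` evaluates the mapping function 50 times, while `listMaterialized` evaluates it 10,000 times.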

## How was this patch tested?

Existing tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-17671

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15321.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15321


commit 14fb9a0b9fbc41fbbfbba5daab4e8998eaa857fc
Author: Sean Owen 
Date:   2016-10-01T20:25:42Z

Return Iterator of applications internally in history server, for 
consistency and performance





[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15318
  
Thank you, @rxin and @gatorsmile.
Finally, I added that correctly. 



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15318
  
**[Test build #66229 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66229/consoleFull)**
 for PR 15318 at commit 
[`c574559`](https://github.com/apache/spark/commit/c574559d47b21987deb11a52e7f842650681619e).



[GitHub] spark pull request #15309: [SPARK-17736] [Documentation][SparkR] [Update R R...

2016-10-01 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15309#discussion_r81457695
  
--- Diff: docs/README.md ---
@@ -21,6 +21,8 @@ installed. Also install the following libraries:
     # Following is needed only for generating API docs
     $ sudo pip install sphinx
     $ sudo Rscript -e 'install.packages(c("knitr", "devtools", "roxygen2", "testthat"), repos="http://cran.stat.ucla.edu/")'
+    $ sudo Rscript -e 'install.packages(c("rmarkdown"), repos="http://cran.stat.ucla.edu/")'
+    $ sudo pip install pandoc pandoc-citeproc
--- End diff --

(And I say this without knowing much about it.) Isn't this a Python package? If so, would it be preferable to manage it via pip, since that's cross-platform, I think? Does the Ubuntu package just install the same thing?



[GitHub] spark pull request #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performanc...

2016-10-01 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15299



[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-10-01 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15299
  
Merged to master



[GitHub] spark issue #15302: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...

2016-10-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15302
  
Hi, @hvanhovell.
Could you review this PR about 'ALTER TABLE DROP PARTITION'?



[GitHub] spark issue #15302: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15302
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15302: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15302
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66225/
Test PASSed.



[GitHub] spark issue #15302: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15302
  
**[Test build #66225 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66225/consoleFull)**
 for PR 15302 at commit 
[`eca9c86`](https://github.com/apache/spark/commit/eca9c8676f8a5500b4ddcacb5758bf33f0deb47e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15319
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...

2016-10-01 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15319
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66224/
Test FAILed.



[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15319
  
**[Test build #66224 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66224/consoleFull)**
 for PR 15319 at commit 
[`e5912f8`](https://github.com/apache/spark/commit/e5912f86c94ff4d6303c9d0a9b80a30d30b99e3d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15318
  
Oh, I completely forgot about that test suite. Shame on me.



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15318
  
Thank you, @gatorsmile! I'll add it there, too.



[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-10-01 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14638
  
**[Test build #66228 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66228/consoleFull)**
 for PR 14638 at commit 
[`74c3e81`](https://github.com/apache/spark/commit/74c3e8113846521d84948c8a101aa5219593a58a).



[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...

2016-10-01 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15318
  
`LogicalPlanToSQLSuite`



[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-10-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14638
  
Rebased onto master.



[GitHub] spark issue #14623: [SPARK-17044][SQL] Make test files for window functions ...

2016-10-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14623
  
Hi, @rxin.
Do you think Apache Spark needs `window_functions.sql` in 
`SQLQueryTestSuite`?



[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-10-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14426
  
Hi, @rxin.
Could you give me some guidance on this `Broadcast Hint for SQL Queries` PR if you have some time?



[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-10-01 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r81456177
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala ---
@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import scala.collection.mutable
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
+import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, 
CatalogTable}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate._
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, ColumnStat, LogicalPlan, Statistics}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.types._
+
+
+/**
+ * Analyzes the given columns of the given table to generate statistics, which will be used in
+ * query optimizations.
+ */
+case class AnalyzeColumnCommand(
+    tableIdent: TableIdentifier,
+    columnNames: Seq[String]) extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val sessionState = sparkSession.sessionState
+    val db = tableIdent.database.getOrElse(sessionState.catalog.getCurrentDatabase)
+    val tableIdentWithDB = TableIdentifier(tableIdent.table, Some(db))
+    val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdentWithDB))
+
+    relation match {
+      case catalogRel: CatalogRelation =>
+        updateStats(catalogRel.catalogTable,
+          AnalyzeTableCommand.calculateTotalSize(sessionState, catalogRel.catalogTable))
+
+      case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined =>
+        updateStats(logicalRel.catalogTable.get, logicalRel.relation.sizeInBytes)
+
+      case otherRelation =>
+        throw new AnalysisException("ANALYZE TABLE is not supported for " +
+          s"${otherRelation.nodeName}.")
+    }
+
+    def updateStats(catalogTable: CatalogTable, newTotalSize: Long): Unit = {
+      val (rowCount, columnStats) = computeColStats(sparkSession, relation)
+      val statistics = Statistics(
+        sizeInBytes = newTotalSize,
+        rowCount = Some(rowCount),
+        colStats = columnStats ++ catalogTable.stats.map(_.colStats).getOrElse(Map()))
+      sessionState.catalog.alterTable(catalogTable.copy(stats = Some(statistics)))
+      // Refresh the cached data source table in the catalog.
+      sessionState.catalog.refreshTable(tableIdentWithDB)
+    }
+
+    Seq.empty[Row]
+  }
+
+  def computeColStats(
+      sparkSession: SparkSession,
+      relation: LogicalPlan): (Long, Map[String, ColumnStat]) = {
+
+    // check correctness of column names
+    val attributesToAnalyze = mutable.MutableList[Attribute]()
+    val duplicatedColumns = mutable.MutableList[String]()
+    val resolver = sparkSession.sessionState.conf.resolver
+    columnNames.foreach { col =>
+      val exprOption = relation.output.find(attr => resolver(attr.name, col))
+      val expr = exprOption.getOrElse(throw new AnalysisException(s"Invalid column name: $col."))
+      // do deduplication
+      if (!attributesToAnalyze.contains(expr)) {
+        attributesToAnalyze += expr
+      } else {
+        duplicatedColumns += col
+      }
+    }
+    if (duplicatedColumns.nonEmpty) {
+      logWarning(s"Duplicated columns ${duplicatedColumns.mkString("(", ", ", ")")} detected " +
+        s"when analyzing columns ${columnNames.mkString("(", ", ", ")")}, ignoring them.")
+    }
+
+    // Collect statistics per column.
+    // The first element in the result will be the overall row count, the following eleme
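
The column check and the `colStats` merge in the quoted diff can be illustrated outside Spark. The following is a minimal, hypothetical Scala sketch, not the PR's code: `AnalyzeColumnsSketch`, `resolveColumns` and `caseInsensitive` are illustrative names, plain column names stand in for Catalyst `Attribute`s, and `IllegalArgumentException` stands in for `AnalysisException`. It also shows that for immutable Scala `Map`s, `a ++ b` keeps the value from `b` when both contain a key, so as written, `columnStats ++ catalogTable.stats.map(_.colStats).getOrElse(Map())` would let previously stored statistics for a column take precedence over the freshly computed ones.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, Spark-free sketch. Plain column names stand in for Catalyst
// Attributes, and IllegalArgumentException stands in for AnalysisException.
object AnalyzeColumnsSketch {
  // Stand-in for Spark's name resolver: decides whether two names match.
  type Resolver = (String, String) => Boolean
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Resolves each requested name against the relation's output columns.
  // Returns the distinct resolved columns (in request order) and the
  // requested names that pointed at an already-resolved column.
  def resolveColumns(
      output: Seq[String],
      requested: Seq[String],
      resolver: Resolver): (Seq[String], Seq[String]) = {
    val resolved = ArrayBuffer.empty[String]
    val duplicated = ArrayBuffer.empty[String]
    requested.foreach { col =>
      val attr = output
        .find(name => resolver(name, col))
        .getOrElse(throw new IllegalArgumentException(s"Invalid column name: $col."))
      if (resolved.contains(attr)) duplicated += col else resolved += attr
    }
    (resolved.toList, duplicated.toList)
  }

  def main(args: Array[String]): Unit = {
    val (cols, dups) =
      resolveColumns(Seq("id", "name", "age"), Seq("ID", "age", "id"), caseInsensitive)
    println(cols) // List(id, age)
    println(dups) // List(id)

    // With immutable Maps, `a ++ b` keeps b's value for a key present in both.
    val computed = Map("age" -> "fresh")
    val stored = Map("age" -> "stale", "id" -> "stale")
    println((computed ++ stored)("age")) // stale
    println((stored ++ computed)("age")) // fresh
  }
}
```

Swapping the operands, as in the last line, is one way to let newly computed statistics replace stored ones; the quoted diff does not say which behavior is intended.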
