[GitHub] spark pull request: [SPARK-15297] [SQL] Fix Set -V Command
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13081#issuecomment-219268076 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58610/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15297] [SQL] Fix Set -V Command
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13081#issuecomment-219268075 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15297] [SQL] Fix Set -V Command
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13081#issuecomment-219268048 **[Test build #58610 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58610/consoleFull)** for PR 13081 at commit [`ac371dc`](https://github.com/apache/spark/commit/ac371dc988aeaf37c88162b346f304bf7b01639f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15330] [SQL] Implement Reset Command
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13121#issuecomment-219265969 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58609/
[GitHub] spark pull request: [SPARK-15330] [SQL] Implement Reset Command
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13121#issuecomment-219265968 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15330] [SQL] Implement Reset Command
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13121#issuecomment-219265948 **[Test build #58609 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58609/consoleFull)** for PR 13121 at commit [`41efcb0`](https://github.com/apache/spark/commit/41efcb038358ad14c57212d7110fa88e355238c6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15297] [SQL] Fix Set -V Command
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13081#issuecomment-219265793 **[Test build #58610 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58610/consoleFull)** for PR 13081 at commit [`ac371dc`](https://github.com/apache/spark/commit/ac371dc988aeaf37c88162b346f304bf7b01639f).
[GitHub] spark pull request: [SPARK-14130] [SQL] Throw exceptions for ALTER...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/12714#discussion_r63285806

--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -179,6 +173,11 @@ unsupportedHiveNativeCommands
     | kw1=ALTER kw2=TABLE tableIdentifier kw3=TOUCH
     | kw1=ALTER kw2=TABLE tableIdentifier partitionSpec? kw3=COMPACT
     | kw1=ALTER kw2=TABLE tableIdentifier partitionSpec? kw3=CONCATENATE
+    | kw1=START kw2=TRANSACTION
+    | kw1=COMMIT
+    | kw1=ROLLBACK
+    | kw1=DFS
--- End diff --

We still need to ban the related CLI commands in the CLI Driver. Let me fix them.
[GitHub] spark pull request: [SPARK-15330] [SQL] Implement Reset Command
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13121#issuecomment-219263713 **[Test build #58609 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58609/consoleFull)** for PR 13121 at commit [`41efcb0`](https://github.com/apache/spark/commit/41efcb038358ad14c57212d7110fa88e355238c6).
[GitHub] spark pull request: [SPARK-15330] [SQL] Implement Reset Command
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/13121 [SPARK-15330] [SQL] Implement Reset Command

## What changes were proposed in this pull request?

Like the `Set` command, `Reset` is also supported by Hive. See the link: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli Below is the related Hive JIRA: https://issues.apache.org/jira/browse/HIVE-3202

This PR implements such a command for resetting the SQL-related configuration to its default values. One of the use cases shown in HIVE-3202 is listed below:

> For the purpose of optimization we set various configs per query. It's worthy but all those configs should be reset every time for next query.

## How was this patch tested?

Added a test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark resetCommand

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13121.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13121

commit 657ff6d57ebe9e72521d8b4c2393414e0c7a386c
Author: gatorsmile
Date: 2016-05-15T02:08:34Z
implement reset

commit 3e93b62b1aadb76e2d178adfa6655db3edded7e8
Author: gatorsmile
Date: 2016-05-15T02:58:46Z
fix spark-sql cli

commit 41efcb038358ad14c57212d7110fa88e355238c6
Author: gatorsmile
Date: 2016-05-15T03:05:38Z
improve the comments.
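The semantics the PR above describes — `SET` overrides a config per session, `RESET` drops every override and restores the registered defaults — can be sketched as follows. This is an illustrative Python sketch, not Spark's Scala implementation; the class name and structure are hypothetical.

```python
# Minimal sketch of SET/RESET semantics: a config store where session
# overrides shadow immutable defaults, and reset() clears all overrides.
class SQLConfSketch:
    def __init__(self, defaults):
        self._defaults = dict(defaults)   # baseline, never mutated
        self._session = {}                # per-session SET overrides

    def set(self, key, value):
        self._session[key] = value

    def get(self, key):
        # session override wins; fall back to the default
        return self._session.get(key, self._defaults[key])

    def reset(self):
        # RESET: wipe all session overrides at once
        self._session.clear()

conf = SQLConfSketch({"spark.sql.shuffle.partitions": "200"})
conf.set("spark.sql.shuffle.partitions", "5")   # tuned for one query
conf.reset()                                    # back to defaults for the next query
print(conf.get("spark.sql.shuffle.partitions"))  # → 200
```

This mirrors the HIVE-3202 use case quoted above: per-query tuning followed by a single command that restores a clean slate.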
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10125#discussion_r63284859

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -129,6 +129,23 @@ final class Decimal extends Ordered[Decimal] with Serializable {
   }

   /**
+   * Set this Decimal to the given BigInteger value. Will have precision 38 and scale 0.
+   */
+  def set(BigIntVal: BigInteger): Decimal = {
--- End diff --

I will change it.
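The diff above adds a setter that stores a `java.math.BigInteger` into a `Decimal` with precision 38 and scale 0. A hypothetical sketch of that contract in Python (names illustrative, not Spark's; Spark's actual overflow handling may differ):

```python
# Sketch: represent the value as an (unscaled value, precision, scale)
# triple with precision 38 and scale 0, rejecting integers that need
# more than 38 digits.
MAX_PRECISION = 38

def set_from_big_integer(value: int):
    digits = len(str(abs(value))) if value != 0 else 1
    if digits > MAX_PRECISION:
        raise ValueError(f"{value} needs {digits} digits; max is {MAX_PRECISION}")
    return (value, MAX_PRECISION, 0)   # unscaled value, precision, scale

print(set_from_big_integer(12345))  # → (12345, 38, 0)
```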
[GitHub] spark pull request: [SPARK-15318][ML][Example]:spark.ml Collaborat...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/13110#discussion_r63284035

--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/ALSExample.scala ---
@@ -28,7 +28,7 @@ object ALSExample {
   // $example on$
   case class Rating(userId: Int, movieId: Int, rating: Float, timestamp: Long)
-  object Rating {
+  object RatingUtil {
--- End diff --

I can move it into the main. I think it is not necessary. I will make the change and test it. Thanks!
[GitHub] spark pull request: [SPARK-15269][SQL] Set provided path to Catalo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13120#issuecomment-219253785 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-15269][SQL] Set provided path to Catalo...
Github user xwu0226 commented on the pull request: https://github.com/apache/spark/pull/13120#issuecomment-219253654 cc @liancheng @yhuai @gatorsmile Thanks!
[GitHub] spark pull request: [SPARK-15269][SQL] Set provided path to Catalo...
GitHub user xwu0226 opened a pull request: https://github.com/apache/spark/pull/13120 [SPARK-15269][SQL] Set provided path to CatalogTable.storage.locationURI when creating external non-hive compatible table

## What changes were proposed in this pull request?

### Symptom

```
scala> spark.range(1).write.json("/home/xwu0226/spark-test/data/spark-15269")
Datasource.write -> Path: file:/home/xwu0226/spark-test/data/spark-15269

scala> spark.sql("create table spark_15269 using json options(PATH '/home/xwu0226/spark-test/data/spark-15269')")
16/05/11 14:51:00 WARN CreateDataSourceTableUtils: Couldn't find corresponding Hive SerDe for data source provider json. Persisting data source relation `spark_15269` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
going through newSparkSQLSpecificMetastoreTable()
res1: org.apache.spark.sql.DataFrame = []

scala> spark.sql("drop table spark_15269")
res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("create table spark_15269 using json as select 1 as a")
org.apache.spark.sql.AnalysisException: path file:/user/hive/warehouse/spark_15269 already exists.;
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:88)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:62)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:60)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
  ...
```

The 2nd creation of the table fails, complaining that the path already exists.

### Root cause

When the first table is created as an external table with the data source path, but as json, `createDataSourceTables` considers it a non-hive-compatible table because `json` is not a Hive SerDe. Then `newSparkSQLSpecificMetastoreTable` is invoked to create the `CatalogTable` before asking HiveClient to create the metastore table. In this call, `locationURI` is not set, so when we convert the `CatalogTable` to a HiveTable before passing it to the Hive metastore, the Hive table's data location is not set. The Hive metastore then implicitly creates a data location of warehouse dir + table name, which is `file:/user/hive/warehouse/spark_15269` in the above case. When the table is dropped, Hive does not delete this implicitly created path because the table is external. When we create the 2nd table with a select and without a path, the table is created as a managed table and is given a default path in the options, as follows:

```
val optionsWithPath = if (!new CaseInsensitiveMap(options).contains("path")) {
  isExternal = false
  options + ("path" -> sessionState.catalog.defaultTablePath(tableIdent))
} else {
  options
}
```

This default path happens to be Hive's warehouse directory + the table name, which is the same path the Hive metastore implicitly created earlier for the 1st table. So when `InsertIntoHadoopFsRelation` tries to write the provided data to this data source table, it complains that the path already exists, since the SaveMode is `SaveMode.ErrorIfExists`.

### Solution

When creating an external datasource table that is non-hive compatible, make sure we set the provided path to `CatalogTable.storage.locationURI`, so the Hive metastore does not implicitly create a data location for the table.

## How was this patch tested?

A testcase is added, and regtests were run.
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xwu0226/spark SPARK-15269

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13120.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13120

commit 21d188321284a86176927445fd1703353e0add09
Author: xin Wu
Date: 2016-05-08T07:06:36Z
spark-15206 add testcases for distinct aggregate in having clause following up PR12974

commit e43d56ab260633d7c2af54a6960cec7eadff34c4
Author: xin Wu
Date: 2016-05-08T07:09:44Z
Revert "spark-15206 add testcases for distinct aggregate in having clause following up PR12974"
This reverts commit 98a1f804d7343ba77731f9aa400c00f1a26c03fe.

commit f9f1f1f36f3759eecfb6070b2372462ee454b700
Author: xin Wu
Date: 2016-05-13T00:39:45Z
SPARK-15269: set locationUFI to the non-hive compatible metastore table

commit 58ad82db21f90b571d70371ff25c167ecda17720
Author: xin Wu
Date: 2016-05-14T20:16:11Z
SPARK-15269: only for
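The path collision in the SPARK-15269 description above can be sketched compactly. This is an illustrative Python sketch (function names and the warehouse path are hypothetical, echoing the report), not Spark's code: when the external table's `locationURI` is never recorded, the metastore falls back to `<warehouse>/<table>`, and a later managed table with a defaulted path lands on the same directory.

```python
WAREHOUSE = "file:/user/hive/warehouse"

def default_table_path(table):
    # managed tables default to warehouse dir + table name
    return f"{WAREHOUSE}/{table}"

def metastore_location(table, location_uri):
    # the metastore falls back to an implicit warehouse path
    # when the table's locationURI is unset (the bug)
    return location_uri if location_uri is not None else default_table_path(table)

# 1st table: external, but its provided path was never propagated
first = metastore_location("spark_15269", None)
# 2nd table: managed, path defaulted to warehouse dir + table name
second = default_table_path("spark_15269")
print(first == second)  # → True: same directory, so ErrorIfExists fires
```

Setting the provided path on the first table (a non-None `location_uri` here) breaks the collision, which is exactly the proposed fix.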
[GitHub] spark pull request: [SPARK-15328][MLLIB][ML] Word2Vec import for o...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13119#issuecomment-219237391 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-15328][MLLIB][ML] Word2Vec import for o...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/13119 [SPARK-15328][MLLIB][ML] Word2Vec import for original binary format

## What changes were proposed in this pull request?

Add a `loadGoogleModel()` function to import the original word2vec binary format.

## How was this patch tested?

`mllib.feature.Word2VecSuite` and `ml.feature.Word2VecSuite`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13119.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13119

commit 17f98089b2371033c5c88933123f070c7ad4c145
Author: Yuming Wang
Date: 2016-05-14T18:44:15Z
Load Google word2vec model
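For context on what the PR above imports: the original Google word2vec binary format is a text header `"<vocab_size> <vector_size>\n"`, followed, for each word, by the token bytes, a space, and `vector_size` little-endian float32s. A minimal Python round-trip sketch (illustrative only; real loaders such as the PR's `loadGoogleModel()` or gensim handle more edge cases, e.g. an optional trailing newline per row):

```python
import io
import struct

def write_w2v(buf, vectors):
    # header: "vocab_size vector_size\n", then "word " + packed float32s
    dim = len(next(iter(vectors.values())))
    buf.write(f"{len(vectors)} {dim}\n".encode("utf-8"))
    for word, vec in vectors.items():
        buf.write(word.encode("utf-8") + b" ")
        buf.write(struct.pack(f"<{dim}f", *vec))

def read_w2v(buf):
    vocab_size, dim = map(int, buf.readline().split())
    out = {}
    for _ in range(vocab_size):
        token = bytearray()
        while (c := buf.read(1)) != b" ":   # token runs until the space
            token.extend(c)
        out[token.decode("utf-8")] = struct.unpack(f"<{dim}f", buf.read(4 * dim))
    return out

buf = io.BytesIO()
write_w2v(buf, {"king": [1.0, 2.0], "queen": [3.0, 4.0]})
buf.seek(0)
print(read_w2v(buf)["queen"])  # → (3.0, 4.0)
```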
[GitHub] spark pull request: [SPARK-12972] [CORE] Update org.apache.httpcom...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13117#issuecomment-219230521 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12972] [CORE] Update org.apache.httpcom...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13117#issuecomment-219230523 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58608/
[GitHub] spark pull request: [SPARK-12972] [CORE] Update org.apache.httpcom...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13117#issuecomment-219230470 **[Test build #58608 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58608/consoleFull)** for PR 13117 at commit [`1ff05ba`](https://github.com/apache/spark/commit/1ff05ba66f2595c850357ccf2150d6b9a3f61bfd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15304] [SPARK-15305] [SPARK-15306] [SQL...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/12812#discussion_r63278723

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -64,6 +65,19 @@ class SparkSession private(
   | Session-related state |
   * --- */

+  {
+    val defaultWarehousePath =
+      SQLConf.WAREHOUSE_PATH
+        .defaultValueString
+        .replace("${system:user.dir}", System.getProperty("user.dir"))
+    val warehousePath = sparkContext.conf.get(
+      SQLConf.WAREHOUSE_PATH.key,
+      defaultWarehousePath)
+    sparkContext.conf.set(SQLConf.WAREHOUSE_PATH.key, warehousePath)
+    sparkContext.conf.set("hive.metastore.warehouse.dir", warehousePath)
--- End diff --

Currently, the `Set` command does not work if the property is `hive.x.y.z`. Will try to submit a PR for resolving that tonight. Thanks!
[GitHub] spark pull request: [SPARK-15304] [SPARK-15305] [SPARK-15306] [SQL...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/12812#discussion_r63277350

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -64,6 +65,19 @@ class SparkSession private(
   | Session-related state |
   * --- */

+  {
+    val defaultWarehousePath =
+      SQLConf.WAREHOUSE_PATH
+        .defaultValueString
+        .replace("${system:user.dir}", System.getProperty("user.dir"))
+    val warehousePath = sparkContext.conf.get(
+      SQLConf.WAREHOUSE_PATH.key,
+      defaultWarehousePath)
+    sparkContext.conf.set(SQLConf.WAREHOUSE_PATH.key, warehousePath)
+    sparkContext.conf.set("hive.metastore.warehouse.dir", warehousePath)
--- End diff --

At runtime, if users change the value of `SQLConf.WAREHOUSE_PATH.key` by using the `Set` command, we still need to set `hive.metastore.warehouse.dir`. Right? In addition, I think we should disallow users to change the value of `hive.metastore.warehouse.dir` by using the `Set` command.
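The mirroring discussed above — keep `hive.metastore.warehouse.dir` in sync whenever the Spark key changes, and reject direct writes to the Hive key — can be sketched as follows. An illustrative Python sketch of the proposed policy (the two config keys are the real ones from the diff; the class and enforcement mechanism are hypothetical, not Spark's):

```python
SQL_KEY = "spark.sql.warehouse.dir"
HIVE_KEY = "hive.metastore.warehouse.dir"

class ConfSketch:
    def __init__(self):
        self._kv = {}

    def set(self, key, value):
        if key == HIVE_KEY:
            # disallow setting the Hive key directly, per the review comment
            raise ValueError(f"set {SQL_KEY} instead of {HIVE_KEY}")
        self._kv[key] = value
        if key == SQL_KEY:
            self._kv[HIVE_KEY] = value  # keep Hive's view consistent

    def get(self, key):
        return self._kv[key]

conf = ConfSketch()
conf.set(SQL_KEY, "/data/warehouse")
print(conf.get(HIVE_KEY))  # → /data/warehouse
```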
[GitHub] spark pull request: [SPARK-12972] [CORE] Update org.apache.httpcom...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13117#issuecomment-219224748 **[Test build #58608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58608/consoleFull)** for PR 13117 at commit [`1ff05ba`](https://github.com/apache/spark/commit/1ff05ba66f2595c850357ccf2150d6b9a3f61bfd).
[GitHub] spark pull request: update from orign
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13118#issuecomment-219223888 Can one of the admins verify this patch?
[GitHub] spark pull request: update from orign
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13118#issuecomment-219223860 @zhaorongsheng it seems this was opened by mistake. I guess this might have to be closed.
[GitHub] spark pull request: update from orign
GitHub user zhaorongsheng opened a pull request: https://github.com/apache/spark/pull/13118 update from orign

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhaorongsheng/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13118.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13118

commit 00a39d9c05c55b5ffcd4f49aadc91cedf227669a
Author: Patrick Wendell
Date: 2015-12-15T23:09:57Z
Preparing Spark release v1.6.0-rc3

commit 08aa3b47e6a295a8297e741effa14cd0d834aea8
Author: Patrick Wendell
Date: 2015-12-15T23:10:04Z
Preparing development version 1.6.0-SNAPSHOT

commit 9e4ac56452710ddd8efb695e69c8de49317e3f28
Author: tedyu
Date: 2015-12-16T02:15:10Z
[SPARK-12056][CORE] Part 2 Create a TaskAttemptContext only after calling setConf
This is continuation of SPARK-12056 where change is applied to SqlNewHadoopRDD.scala andrewor14 FYI
Author: tedyu
Closes #10164 from tedyu/master.
(cherry picked from commit f725b2ec1ab0d89e35b5e2d3ddeddb79fec85f6d)
Signed-off-by: Andrew Or

commit 2c324d35a698b353c2193e2f9bd8ba08c741c548
Author: Timothy Chen
Date: 2015-12-16T02:20:00Z
[SPARK-12351][MESOS] Add documentation about submitting Spark with mesos cluster mode.
Adding more documentation about submitting jobs with mesos cluster mode.
Author: Timothy Chen
Closes #10086 from tnachen/mesos_supervise_docs.
(cherry picked from commit c2de99a7c3a52b0da96517c7056d2733ef45495f)
Signed-off-by: Andrew Or

commit 8e9a600313f3047139d3cebef85acc782903123b
Author: Naveen
Date: 2015-12-16T02:25:22Z
[SPARK-9886][CORE] Fix to use ShutdownHookManager in ExternalBlockStore.scala
Author: Naveen
Closes #10313 from naveenminchu/branch-fix-SPARK-9886.
(cherry picked from commit 8a215d2338c6286253e20122640592f9d69896c8)
Signed-off-by: Andrew Or

commit 93095eb29a1e59dbdbf6220bfa732b502330e6ae
Author: Bryan Cutler
Date: 2015-12-16T02:28:16Z
[SPARK-12062][CORE] Change Master to asyc rebuild UI when application completes
This change builds the event history of completed apps asynchronously so the RPC thread will not be blocked and allow new workers to register/remove if the event log history is very large and takes a long time to rebuild.
Author: Bryan Cutler
Closes #10284 from BryanCutler/async-MasterUI-SPARK-12062.
(cherry picked from commit c5b6b398d5e368626e589feede80355fb74c2bd8)
Signed-off-by: Andrew Or

commit fb08f7b784bc8b5e0cd110f315f72c7d9fc65e08
Author: Wenchen Fan
Date: 2015-12-16T02:29:19Z
[SPARK-10477][SQL] using DSL in ColumnPruningSuite to improve readability
Author: Wenchen Fan
Closes #8645 from cloud-fan/test.
(cherry picked from commit a89e8b6122ee5a1517fbcf405b1686619db56696)
Signed-off-by: Andrew Or

commit a2d584ed9ab3c073df057bed5314bdf877a47616
Author: Timothy Hunter
Date: 2015-12-16T18:12:33Z
[SPARK-12324][MLLIB][DOC] Fixes the sidebar in the ML documentation
This fixes the sidebar, using a pure CSS mechanism to hide it when the browser's viewport is too narrow. Credit goes to the original author Titan-C (mentioned in the NOTICE). Note that I am not a CSS expert, so I can only address comments up to some extent.
Default view: https://cloud.githubusercontent.com/assets/7594753/11793597/6d1d6eda-a261-11e5-836b-6eb2054e9054.png
When collapsed manually by the user: https://cloud.githubusercontent.com/assets/7594753/11793669/c991989e-a261-11e5-8bf6-aecf3bdb6319.png
Disappears when column is too narrow: https://cloud.githubusercontent.com/assets/7594753/11793607/7754dbcc-a261-11e5-8b15-e0d074b0e47c.png
Can still be opened by the user if necessary: https://cloud.githubusercontent.com/assets/7594753/11793612/7bf82968-a261-11e5-9cc3-e827a7a6b2b0.png
Author: Timothy Hunter
[GitHub] spark pull request: [SPARK-12972] [CORE] Update org.apache.httpcom...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13117#issuecomment-219223504 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12972] [CORE] Update org.apache.httpcom...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13117#issuecomment-219223502 **[Test build #58607 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58607/consoleFull)** for PR 13117 at commit [`bc1720e`](https://github.com/apache/spark/commit/bc1720e60bb49165dca71691a3e0dfd2c23641b5). * This patch **fails build dependency tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12972] [CORE] Update org.apache.httpcom...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13117#issuecomment-219223505 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58607/ Test FAILed.
[GitHub] spark pull request: [SPARK-12972] [CORE] Update org.apache.httpcom...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13117#issuecomment-219223421 **[Test build #58607 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58607/consoleFull)** for PR 13117 at commit [`bc1720e`](https://github.com/apache/spark/commit/bc1720e60bb49165dca71691a3e0dfd2c23641b5).
[GitHub] spark pull request: [SPARK-12972] [CORE] Update org.apache.httpcom...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/13117#discussion_r63276920 --- Diff: sql/hive-thriftserver/pom.xml --- @@ -106,12 +111,6 @@ - --- End diff -- This isn't technically related, but is a simple fix for a build warning, and was editing the file anyway
[GitHub] spark pull request: [SPARK-12972] [CORE] Update org.apache.httpcom...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/13117 [SPARK-12972] [CORE] Update org.apache.httpcomponents.httpclient ## What changes were proposed in this pull request? (Retry of https://github.com/apache/spark/pull/13049) - update to httpclient 4.5 / httpcore 4.4 - remove some defunct exclusions - manage httpmime version to match - update selenium / httpunit to support 4.5 (possible now that Jetty 9 is used) ## How was this patch tested? Jenkins tests. Also, locally running the same test command of one Jenkins profile that failed: `mvn -Phadoop-2.6 -Pyarn -Phive -Phive-thriftserver -Pkinesis-asl ...` You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark SPARK-12972.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13117.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13117 commit bc1720e60bb49165dca71691a3e0dfd2c23641b5 Author: Sean Owen Date: 2016-05-14T14:26:39Z Update to httpclient 4.5 / httpcore 4.4. Remove some defunct exclusions; manage httpmime version to match. Update selenium / httpunit to support 4.5 (possible now that Jetty 9 is used)
[GitHub] spark pull request: [SPARK-15319][SPARKR][DOCS] Fix SparkR doc lay...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/13109#issuecomment-219222873 This looks better. but the roxygen style is a little bit deviated. The previous is like: #' function name #' description Current is like: #' function name - description We may need a consistent roxygen style documentation. At least for two styles: one function for one RD multiple functions for one RD And also if you type '?corr' in R, only corr() for Column functions is displayed. Since R is function oriented, I think two corr() descriptions better to be displayed together in one page?
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10125#discussion_r63276327 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala --- @@ -321,11 +323,13 @@ object CatalystTypeConverters { } private class DecimalConverter(dataType: DecimalType) -extends CatalystTypeConverter[Any, JavaBigDecimal, Decimal] { + extends CatalystTypeConverter[Any, JavaBigDecimal, Decimal] { --- End diff -- Why change this? I think we should use encoders most of the time.
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10125#discussion_r63276311 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala --- @@ -129,6 +129,23 @@ final class Decimal extends Ordered[Decimal] with Serializable { } /** + * Set this Decimal to the given BigInteger value. Will have precision 38 and scale 0. + */ + def set(BigIntVal: BigInteger): Decimal = { --- End diff -- lower case the variable name
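The fix the reviewer asks for is just renaming the parameter to lower camel case. A minimal sketch of the corrected setter, using a simplified stand-in class (not Spark's actual `Decimal` in `org.apache.spark.sql.types`, whose internals are more involved):

```scala
import java.math.BigInteger

// Simplified stand-in for Spark's Decimal; fields and behavior here are
// illustrative only.
final class SimpleDecimal {
  private var longVal: Long = 0L
  private var precision: Int = 1
  private var scale: Int = 0

  // Parameter name lower-cased, as the review asks (was `BigIntVal`).
  def set(bigintval: BigInteger): SimpleDecimal = {
    // longValueExact throws ArithmeticException on overflow rather than
    // silently truncating.
    this.longVal = bigintval.longValueExact()
    this.precision = 38
    this.scale = 0
    this
  }

  def toLong: Long = longVal
}

val d = (new SimpleDecimal).set(BigInteger.valueOf(12345L))
assert(d.toLong == 12345L)
```

Scala convention (and Spark's style guide) reserves upper camel case for types and constants, so `BigIntVal` reads as a type name rather than a parameter.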
[GitHub] spark pull request: [SPARK-15325][SQL] Replace the usage of deprec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13113#issuecomment-219216499 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58606/ Test PASSed.
[GitHub] spark pull request: [SPARK-15325][SQL] Replace the usage of deprec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13113#issuecomment-219216498 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15325][SQL] Replace the usage of deprec...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13113#issuecomment-219216467 **[Test build #58606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58606/consoleFull)** for PR 13113 at commit [`d4b41c5`](https://github.com/apache/spark/commit/d4b41c596fa9d95282633694508c4a910418a4ba). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12492] Using spark-sql commond to run q...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13115#issuecomment-219213787 (I think it would be nicer if the PR description is fill up.)
[GitHub] spark pull request: [SPARK-15324] [SQL] Add the takeSample functio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13116#discussion_r63274249 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -402,6 +402,76 @@ class DatasetSuite extends QueryTest with SharedSQLContext { 3, 17, 27, 58, 62) } + test("takeSample") { +val n = 1000 +val data = sparkContext.parallelize(1 to n, 2).toDS() +for (num <- List(0, 5, 20, 100)) { + val sample = data.takeSample(withReplacement = false, num = num) + assert(sample.count === num) // Got exactly num elements + assert(sample.distinct.count === num) // Elements are distinct + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 <= x && x <= n), s"element not in [1, $n]") +} +for (seed <- 1 to 5) { + val sample = data.takeSample(withReplacement = false, 20, seed) + assert(sample.count() === 20) // Got exactly 20 elements + assert(sample.distinct.count === 20) // Elements are distinct + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 <= x && x <= n), s"element not in [1, $n]") +} +for (seed <- 1 to 5) { + val sample = data.takeSample(withReplacement = false, 100, seed) + assert(sample.count === 100) // Got only 100 elements + assert(sample.distinct.count === 100) // Elements are distinct + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 <= x && x <= n), s"element not in [1, $n]") +} +for (seed <- 1 to 5) { + val sample = data.takeSample(withReplacement = true, 20, seed) + assert(sample.count === 20) // Got exactly 20 elements + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 <= x && x <= n), s"element not in [1, $n]") +} +{ + val sample = data.takeSample(withReplacement = true, num = 20) + assert(sample.count === 20) // Got exactly 100 elements +val sampleDisCount = sample.distinct.count + assert(sampleDisCount <= 20, "sampling with replacement returned all distinct elements") + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 
<= x && x <= n), s"element not in [1, $n]") +} +{ + val sample = data.takeSample(withReplacement = true, num = n) + assert(sample.count === n) // Got exactly 100 elements + // Chance of getting all distinct elements is astronomically low, so test we got < 100 + assert(sample.distinct.count < n, "sampling with replacement returned all distinct elements") + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 <= x && x <= n), s"element not in [1, $n]") +} +for (seed <- 1 to 5) { + val sample = data.takeSample(withReplacement = true, n, seed) + assert(sample.count === n) // Got exactly 100 elements + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 <= x && x <= n), s"element not in [1, $n]") +} +for (seed <- 1 to 5) { + val sample = data.takeSample(withReplacement = true, 2 * n, seed) + assert(sample.count === 2 * n) // Got exactly 200 elements + // Chance of getting all distinct elements is still quite low, so test we got < 100 + assert(sample.distinct.count < n, "sampling with replacement returned all distinct elements") +} +{ + val emptySet = sparkContext.parallelize(Seq.empty[Int], 2) + val sample = emptySet.takeSample(false, 20, 1) + assert(sample.length === 0) +} --- End diff -- (I think we might not need a extra closure here and below)
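For context, the invariants these test cases assert — no duplicates when sampling without replacement, near-certain duplicates when drawing n of n with replacement — can be sketched on plain collections with a hypothetical `takeSample` helper (not the Dataset API under review):

```scala
import scala.util.Random

// Hypothetical helper mirroring takeSample semantics on plain collections;
// not the actual Spark API.
def takeSample[T](data: IndexedSeq[T], withReplacement: Boolean,
                  num: Int, seed: Long): Seq[T] = {
  val rng = new Random(seed)
  if (withReplacement) {
    Seq.fill(num)(data(rng.nextInt(data.length))) // duplicates allowed
  } else {
    rng.shuffle(data).take(num) // each element picked at most once
  }
}

val n = 1000
val data = (1 to n).toVector

val noRep = takeSample(data, withReplacement = false, 20, seed = 42L)
assert(noRep.length == 20 && noRep.distinct.length == 20) // all distinct

// n draws with replacement from n elements almost surely collide
// (probability of all-distinct is n!/n^n, astronomically small).
val withRep = takeSample(data, withReplacement = true, n, seed = 42L)
assert(withRep.length == n && withRep.distinct.length < n)
```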
[GitHub] spark pull request: [SPARK-15324] [SQL] Add the takeSample functio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13116#discussion_r63274238 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -402,6 +402,76 @@ class DatasetSuite extends QueryTest with SharedSQLContext { 3, 17, 27, 58, 62) } + test("takeSample") { +val n = 1000 +val data = sparkContext.parallelize(1 to n, 2).toDS() +for (num <- List(0, 5, 20, 100)) { + val sample = data.takeSample(withReplacement = false, num = num) + assert(sample.count === num) // Got exactly num elements + assert(sample.distinct.count === num) // Elements are distinct + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 <= x && x <= n), s"element not in [1, $n]") +} +for (seed <- 1 to 5) { + val sample = data.takeSample(withReplacement = false, 20, seed) + assert(sample.count() === 20) // Got exactly 20 elements + assert(sample.distinct.count === 20) // Elements are distinct + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 <= x && x <= n), s"element not in [1, $n]") +} +for (seed <- 1 to 5) { + val sample = data.takeSample(withReplacement = false, 100, seed) + assert(sample.count === 100) // Got only 100 elements + assert(sample.distinct.count === 100) // Elements are distinct + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 <= x && x <= n), s"element not in [1, $n]") +} +for (seed <- 1 to 5) { + val sample = data.takeSample(withReplacement = true, 20, seed) + assert(sample.count === 20) // Got exactly 20 elements + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 <= x && x <= n), s"element not in [1, $n]") +} +{ + val sample = data.takeSample(withReplacement = true, num = 20) + assert(sample.count === 20) // Got exactly 100 elements +val sampleDisCount = sample.distinct.count --- End diff -- (It seems indentation is not consistent here.) 
[GitHub] spark pull request: [SPARK-15324] [SQL] Add the takeSample functio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13116#discussion_r63274228 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -18,6 +18,9 @@ package org.apache.spark.sql import java.io.CharArrayWriter +import java.util.Random + +import org.apache.spark.util.random.SamplingUtils --- End diff -- (it seems we need to reorder imports, https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Imports)
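Per the linked style guide, imports are grouped as java/javax, then scala, then third-party, then org.apache.spark, with a blank line between groups. A standalone sketch of the suggested reordering (the Spark import is shown as a comment so the snippet compiles on its own):

```scala
// Import groups per the Spark style guide: 1) java/javax, 2) scala,
// 3) third-party, 4) org.apache.spark — blank line between each group.
import java.io.CharArrayWriter
import java.util.Random

import scala.collection.mutable

// third-party imports would go here, then:
// import org.apache.spark.util.random.SamplingUtils

val writer = new CharArrayWriter()
val rng = new Random(0L)
val buf = mutable.ArrayBuffer(1, 2, 3)
assert(writer.toString.isEmpty && buf.sum == 6)
```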
[GitHub] spark pull request: [SPARK-15324] [SQL] Add the takeSample functio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13116#issuecomment-219213307 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-15315][SQL] Adding error check to the C...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13105#discussion_r63274194 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/DefaultSource.scala --- @@ -172,4 +173,13 @@ class DefaultSource extends FileFormat with DataSourceRegister { .mapPartitions(_.map(pair => new String(pair._2.getBytes, 0, pair._2.getLength, charset))) } } + + private def verifySchema(schema: StructType): Unit = { +schema.foreach(field => field.dataType match { --- End diff -- (Maybe starting with `{` for a multiple-line closure, `foreach { field =>`)
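The brace form the reviewer suggests might look like the following sketch, using simplified stand-ins for `StructType`/`StructField` (the actual `verifySchema` in `DefaultSource.scala` may reject other types as well):

```scala
// Simplified stand-ins for Spark's DataType/StructField; names illustrative.
sealed trait DataType
case object StringType extends DataType
case object BinaryType extends DataType
final case class Field(name: String, dataType: DataType)

// Brace form suggested in the review — `foreach { field =>` — instead of
// `foreach(field => field.dataType match { ... })` for a multi-line closure.
def verifySchema(schema: Seq[Field]): Unit = {
  schema.foreach { field =>
    field.dataType match {
      case BinaryType =>
        throw new UnsupportedOperationException(
          s"CSV data source does not support ${field.dataType} type for field ${field.name}.")
      case _ => // supported type, nothing to do
    }
  }
}

verifySchema(Seq(Field("a", StringType))) // passes silently
assert(scala.util.Try(verifySchema(Seq(Field("b", BinaryType)))).isFailure)
```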
[GitHub] spark pull request: [SPARK-15324] [SQL] Add the takeSample functio...
GitHub user burness opened a pull request: https://github.com/apache/spark/pull/13116 [SPARK-15324] [SQL] Add the takeSample function to the Dataset ## What changes were proposed in this pull request? In this pr, I add the takeSample function with the Dataset which is to sampling with the specify num instead of the fraction in sample function. ## How was this patch tested? add a test in `DatasetSuite` You can merge this pull request into a Git repository by running: $ git pull https://github.com/burness/spark takeSample Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13116.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13116 commit c003f24cf402bcf80c0d920d4291f3753cb76ed1 Author: burness Date: 2016-05-14T10:16:56Z add takeSample in Dataset commit 9874686563de7a5cf2bf312481910126f3dc0f12 Author: burness Date: 2016-05-14T10:20:31Z modify the format of the comment
[GitHub] spark pull request: [SPARK-12492] Using spark-sql commond to run q...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13115#issuecomment-219213070 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12492] Using spark-sql commond to run q...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13115#issuecomment-219213071 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58605/ Test FAILed.
[GitHub] spark pull request: [SPARK-12492] Using spark-sql commond to run q...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13115#issuecomment-219213068 **[Test build #58605 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58605/consoleFull)** for PR 13115 at commit [`3b5eb9b`](https://github.com/apache/spark/commit/3b5eb9bb6bdf7377144b9bdd6c97a9cd5f39d088). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12492] Using spark-sql commond to run q...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13115#issuecomment-219212931 **[Test build #58605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58605/consoleFull)** for PR 13115 at commit [`3b5eb9b`](https://github.com/apache/spark/commit/3b5eb9bb6bdf7377144b9bdd6c97a9cd5f39d088).
[GitHub] spark pull request: [SPARK-15325][SQL] Replace the usage of deprec...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13113#issuecomment-219212927 **[Test build #58606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58606/consoleFull)** for PR 13113 at commit [`d4b41c5`](https://github.com/apache/spark/commit/d4b41c596fa9d95282633694508c4a910418a4ba).
[GitHub] spark pull request: [SPARK-12492] Using spark-sql commond to run q...
Github user KaiXinXiaoLei commented on the pull request: https://github.com/apache/spark/pull/10900#issuecomment-219212923 @andrewor14 See https://github.com/apache/spark/pull/13115,
[GitHub] spark pull request: [SPARK-12492] Using spark-sql commond to run q...
Github user KaiXinXiaoLei closed the pull request at: https://github.com/apache/spark/pull/10900
[GitHub] spark pull request: [SPARK-12492] Using spark-sql commond to run q...
GitHub user KaiXinXiaoLei opened a pull request: https://github.com/apache/spark/pull/13115 [SPARK-12492] Using spark-sql commond to run query, write the event of SparkListenerJobStart See https://github.com/apache/spark/pull/10900 ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) You can merge this pull request into a Git repository by running: $ git pull https://github.com/KaiXinXiaoLei/spark sqlPage2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13115.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13115 commit 3b5eb9bb6bdf7377144b9bdd6c97a9cd5f39d088 Author: KaiXinXiaoLei Date: 2016-05-14T10:17:30Z sql page
[GitHub] spark pull request: Branch 1.4
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/13114#issuecomment-219212846 Close this PR @GuoNing89
[GitHub] spark pull request: Branch 1.4
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13114#issuecomment-219212714 Can one of the admins verify this patch?
[GitHub] spark pull request: Branch 1.4
GitHub user GuoNing89 opened a pull request: https://github.com/apache/spark/pull/13114 Branch 1.4 ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/spark branch-1.4 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13114.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13114 commit 4634be5a7db4f2fd82cfb5c602b79129d1d9e246 Author: Josh Rosen Date: 2015-06-14T16:34:35Z [SPARK-8354] [SQL] Fix off-by-factor-of-8 error when allocating scratch space in UnsafeFixedWidthAggregationMap UnsafeFixedWidthAggregationMap contains an off-by-factor-of-8 error when allocating row conversion scratch space: we take a size requirement, measured in bytes, then allocate a long array of that size. This means that we end up allocating 8x too much conversion space. This patch fixes this by allocating a `byte[]` array instead. This doesn't impose any new limitations on the maximum sizes of UnsafeRows, since UnsafeRowConverter already used integers when calculating the size requirements for rows.
Author: Josh Rosen Closes #6809 from JoshRosen/sql-bytes-vs-words-fix and squashes the following commits: 6520339 [Josh Rosen] Updates to reflect fact that UnsafeRow max size is constrained by max byte[] size (cherry picked from commit ea7fd2ff6454e8d819a39bf49901074e49b5714e) Signed-off-by: Josh Rosen commit 2805d145e30e4cabd11a7d33c4f80edbc54cc54a Author: Michael Armbrust Date: 2015-06-14T18:21:42Z [SPARK-8358] [SQL] Wait for child resolution when resolving generators Author: Michael Armbrust Closes #6811 from marmbrus/aliasExplodeStar and squashes the following commits: fbd2065 [Michael Armbrust] more style 806a373 [Michael Armbrust] fix style 7cbb530 [Michael Armbrust] [SPARK-8358][SQL] Wait for child resolution when resolving generatorsa (cherry picked from commit 9073a426e444e4bc6efa8608e54e0a986f38a270) Signed-off-by: Michael Armbrust commit 0ffbf085190b9d4dc13a8b6545e4e1022083bd35 Author: Peter Hoffmann Date: 2015-06-14T18:41:16Z fix read/write mixup Author: Peter Hoffmann Closes #6815 from hoffmann/patch-1 and squashes the following commits: 2abb6da [Peter Hoffmann] fix read/write mixup (cherry picked from commit f3f2a4397da164f0ddfa5d60bf441099296c4346) Signed-off-by: Reynold Xin commit fff8d7ee6c7e88ed96c29260480e8228e7fb1435 Author: tedyu Date: 2015-06-16T00:00:38Z SPARK-8336 Fix NullPointerException with functions.rand() This PR fixes the problem reported by Justin Yip in the thread 'NullPointerException with functions.rand()' Tested using spark-shell and verified that the following works: sqlContext.createDataFrame(Seq((1,2), (3, 100))).withColumn("index", rand(30)).show() Author: tedyu Closes #6793 from tedyu/master and squashes the following commits: 62fd97b [tedyu] Create RandomSuite 750f92c [tedyu] Add test for Rand() with seed a1d66c5 [tedyu] Fix NullPointerException with functions.rand() (cherry picked from commit 1a62d61696a0481508d83a07d19ab3701245ac20) Signed-off-by: Reynold Xin commit f287f7ea141fa7a3e9f8b7d3a2180b63cd77088d Author: 
huangzhaowei Date: 2015-06-16T06:16:09Z [SPARK-8367] [STREAMING] Add a limit for 'spark.streaming.blockInterval' since a data loss bug. The bug was reported in the JIRA [SPARK-8367](https://issues.apache.org/jira/browse/SPARK-8367). The resolution is limiting the configuration `spark.streaming.blockInterval` to a positive number. Author: huangzhaowei Author: huangzhaowei Closes #6818 from SaintBacchus/SPARK-8367 and squashes the following commits: c9d1927 [huangzhaowei] Update BlockGenerator.scala bd3f71a [huangzhaowei] Use requre instead of if 3d17796 [huangzhaowei] [SPARK_8367][Streaming]Add a limit for 'spark.streaming.blockInterval' since a data loss bug. (cherry picked from commit
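The off-by-factor-of-8 fix in SPARK-8354 above can be sketched outside Spark. This is a hypothetical, self-contained illustration (the object and method names are assumptions, not Spark's actual code) of why using a byte count as a `long[]` length over-allocates 8x, and why a `byte[]` of the same length is the fix:

```scala
object ScratchSpaceSketch {
  // Bug: a size requirement measured in BYTES is used as the length of a
  // long[] array, so each "byte" of the requirement costs 8 actual bytes.
  def buggyAllocatedBytes(requiredBytes: Int): Long =
    new Array[Long](requiredBytes).length * java.lang.Long.BYTES.toLong

  // Fix: a byte[] of that length allocates exactly the requested bytes.
  def fixedAllocatedBytes(requiredBytes: Int): Long =
    new Array[Byte](requiredBytes).length.toLong

  def main(args: Array[String]): Unit = {
    println(buggyAllocatedBytes(64)) // 512 -> 8x too much scratch space
    println(fixedAllocatedBytes(64)) // 64  -> exactly what was requested
  }
}
```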
[GitHub] spark pull request: [SPARK-15325][SQL] Replace the usage of deprec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13113#issuecomment-219212380 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-15325][SQL] Replace the usage of deprec...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13113#issuecomment-219212377 **[Test build #58604 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58604/consoleFull)** for PR 13113 at commit [`f854382`](https://github.com/apache/spark/commit/f85438276a40a13b260cb3b96d4dc0cea4113412). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15325][SQL] Replace the usage of deprec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13113#issuecomment-219212381 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58604/ Test FAILed.
[GitHub] spark pull request: [SPARK-15325][SQL] Replace the usage of deprec...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13113#issuecomment-219212325 **[Test build #58604 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58604/consoleFull)** for PR 13113 at commit [`f854382`](https://github.com/apache/spark/commit/f85438276a40a13b260cb3b96d4dc0cea4113412).
[GitHub] spark pull request: [SPARK-15325][SQL] Replace the usage of deprec...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/13113 [SPARK-15325][SQL] Replace the usage of deprecated DataSet API in tests (Scala/Java) ## What changes were proposed in this pull request? `unionAll(other: Dataset[T])` and `registerTempTable(tableName: String)` are deprecated but still being used across Spark tests. In Scala/Java, only `registerTempTable(tableName: String)` is being used. This PR replaces `registerTempTable(tableName: String)` with `createOrReplaceTempView(viewName: String)`. ## How was this patch tested? Jenkins tests. Existing tests should cover this. You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark SPARK-15325 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13113.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13113 commit f85438276a40a13b260cb3b96d4dc0cea4113412 Author: hyukjinkwon Date: 2016-05-14T09:38:52Z Replace the usage of registerTempTable to createOrReplaceTempView
[GitHub] spark pull request: [SPARK-15096][ML]:LogisticRegression MultiClas...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12969
[GitHub] spark pull request: [SPARK-15197][Docs] Added Scaladoc for countAp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12955
[GitHub] spark pull request: [SPARK-15096][ML]:LogisticRegression MultiClas...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/12969#issuecomment-219208737 Merged to master/2.0
[GitHub] spark pull request: [SPARK-15197][Docs] Added Scaladoc for countAp...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/12955#issuecomment-219208684 Merged to master/2.0
[GitHub] spark pull request: [SPARK-15263][Core] Make shuffle service dir c...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/13042#issuecomment-219208629 This seems OK to me. There is actually another delete-recursively method in TestShuffleDataContext in network-shuffle which should be able to use this method rather than define it again. It seems like this could be usefully implemented in the main `Utils.deleteRecursively` as well, rather than have two differing implementations. I'm trying to figure out whether that's a win or poses any risks; it probably speeds up some big deletes but does mean spawning a process, in many cases at JVM shutdown. If anyone's supportive of that we could try it here, but it's not essential
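The comment above contrasts a pure-JVM recursive delete with spawning an external process. For context, a minimal JVM-side recursive delete (a hedged sketch only; Spark's actual `Utils.deleteRecursively` additionally handles symlinks, retries, and shutdown-hook bookkeeping) looks like:

```scala
import java.io.File
import java.nio.file.Files

object DeleteSketch {
  // Minimal recursive delete: remove children first, then the entry itself.
  def deleteRecursively(file: File): Unit = {
    if (file.isDirectory) {
      // listFiles() returns null on I/O error; treat that as "no children".
      Option(file.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    }
    file.delete()
  }

  def main(args: Array[String]): Unit = {
    // Build a small temp tree, delete it, and confirm it is gone.
    val dir = Files.createTempDirectory("delete-sketch").toFile
    new File(dir, "child.txt").createNewFile()
    deleteRecursively(dir)
    println(dir.exists()) // false
  }
}
```

Spawning `rm -rf` instead can be faster for very large trees, but as noted above it costs a process per delete, which matters when many deletes run at JVM shutdown.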
[GitHub] spark pull request: [SPARK-15318][ML][Example]:spark.ml Collaborat...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/13110#discussion_r63273137 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/ALSExample.scala --- @@ -28,7 +28,7 @@ object ALSExample { // $example on$ case class Rating(userId: Int, movieId: Int, rating: Float, timestamp: Long) - object Rating { + object RatingUtil { --- End diff -- Is this object even needed? there's no reason this couldn't just be defined in main?
[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/13112#issuecomment-219208034 Rather than change this in just a couple places, can you update all internal usages of the old accumulator API?
[GitHub] spark pull request: [SPARK-15323] Fix reading of partitioned forma...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13104#issuecomment-219207979 Let me cc @liancheng
[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13112#discussion_r63272945 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/stopwatches.scala --- @@ -19,7 +19,8 @@ package org.apache.spark.ml.util import scala.collection.mutable -import org.apache.spark.{Accumulator, SparkContext} +import org.apache.spark.{SparkContext} +import org.apache.spark.util.LongAccumulator; --- End diff -- (The imports might have to be cleaned up as below:)

```scala
import org.apache.spark.SparkContext
import org.apache.spark.util.LongAccumulator
```
[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13112#issuecomment-219207501 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58603/ Test PASSed.
[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13112#issuecomment-219207485 **[Test build #58603 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58603/consoleFull)** for PR 13112 at commit [`2761dff`](https://github.com/apache/spark/commit/2761dff513eb2da87464735722807e3ea0ea7676). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13112#issuecomment-219207500 Merged build finished. Test PASSed.
[GitHub] spark pull request: Fix reading of partitioned format=text dataset...
Github user jurriaan commented on the pull request: https://github.com/apache/spark/pull/13104#issuecomment-219207482 I'll create a JIRA just to be sure, thanks!
[GitHub] spark pull request: Fix reading of partitioned format=text dataset...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13104#issuecomment-219207342 Oh, I just meant that it changes the code to support partitioned tables for the text data source, which seems to be disabled in Spark 2.0. It seems the guide says a JIRA is unnecessary only when behaviour stays the same regardless of the PR.
[GitHub] spark pull request: Fix reading of partitioned format=text dataset...
Github user jurriaan commented on the pull request: https://github.com/apache/spark/pull/13104#issuecomment-219207031 @HyukjinKwon It's related to https://issues.apache.org/jira/browse/SPARK-14463. Or should I create a new JIRA? And how is this changing existing behaviour? It was working perfectly fine in Spark 1.6.1.
[GitHub] spark pull request: [SPARK-15222] [SparkR] [ML] SparkR ML examples...
Github user JeremyNixon commented on the pull request: https://github.com/apache/spark/pull/13000#issuecomment-219206838 As SparkR grows at some point it will make sense to split the docs into different files to separate out different parts of the library - do you think that it's worth splitting off the SQL/core examples from the machine learning examples at this point?
[GitHub] spark pull request: [SPARK-15222] [SparkR] [ML] SparkR ML examples...
Github user JeremyNixon commented on a diff in the pull request: https://github.com/apache/spark/pull/13000#discussion_r63272747 --- Diff: examples/src/main/r/ml.R --- @@ -25,30 +25,102 @@ library(SparkR) sc <- sparkR.init(appName="SparkR-ML-example") sqlContext <- sparkRSQL.init(sc) -# Train GLM of family 'gaussian' + spark.glm and glm ## + +# Fit a generalized linear model with spark.glm training1 <- suppressWarnings(createDataFrame(sqlContext, iris)) test1 <- training1 -model1 <- glm(Sepal_Length ~ Sepal_Width + Species, training1, family = "gaussian") +model1 <- spark.glm(training1, Sepal_Length ~ Sepal_Width + Species, family = "gaussian") # Model summary summary(model1) # Prediction predictions1 <- predict(model1, test1) -head(select(predictions1, "Sepal_Length", "prediction")) +showDF(predictions1) + +# Fit a generalized linear model with glm (R-compliant) +sameModel <- glm(Sepal_Length ~ Sepal_Width + Species, training1, family = "gaussian") +summary(sameModel) + + spark.survreg ## + +# Use the ovarian dataset available in R survival package +library(survival) -# Train GLM of family 'binomial' -training2 <- filter(training1, training1$Species != "setosa") +# Fit an accelerated failure time (AFT) survival regression model with spark.survreg +training2 <- suppressWarnings(createDataFrame(sqlContext, ovarian)) test2 <- training2 -model2 <- glm(Species ~ Sepal_Length + Sepal_Width, data = training2, family = "binomial") --- End diff -- It may be worth keeping in the classification example for glm - users who come to the docs to see what's possible and who aren't familiar with link functions or don't assume that a binomial link function exists may not realize that it's possible to do classification with the algorithm.
[GitHub] spark pull request: [SPARK-15222] [SparkR] [ML] SparkR ML examples...
Github user JeremyNixon commented on a diff in the pull request: https://github.com/apache/spark/pull/13000#discussion_r63272738 --- Diff: examples/src/main/r/ml.R --- @@ -25,30 +25,102 @@ library(SparkR) sc <- sparkR.init(appName="SparkR-ML-example") sqlContext <- sparkRSQL.init(sc) -# Train GLM of family 'gaussian' + spark.glm and glm ## + +# Fit a generalized linear model with spark.glm training1 <- suppressWarnings(createDataFrame(sqlContext, iris)) test1 <- training1 -model1 <- glm(Sepal_Length ~ Sepal_Width + Species, training1, family = "gaussian") +model1 <- spark.glm(training1, Sepal_Length ~ Sepal_Width + Species, family = "gaussian") # Model summary summary(model1) --- End diff -- For user readability, it would be great if the models were given names that aligned with their algorithm - something like glmModel, naiveBayesModel, that makes it clear which model corresponds to which algorithm. For the predictions the same change may be helpful for knowing at a glance which variables correspond to their outputs. In the MLlib docs the examples are cleanly separated from one another so that there's no ambiguity, but as these are in a large contiguous file it may make sense to disambiguate things.
[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13112#issuecomment-219206065 **[Test build #58603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58603/consoleFull)** for PR 13112 at commit [`2761dff`](https://github.com/apache/spark/commit/2761dff513eb2da87464735722807e3ea0ea7676).
[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/13112#issuecomment-219206010 Jenkins add to whitelist
[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/13112#issuecomment-219204283 This looks good. ping @mengxr @jkbradley @MLnick Could you help to add @WeichenXu123 to whitelist? Thanks.
[GitHub] spark pull request: [SPARK-15320] [SQL] Spark-SQL Cli Ignores Para...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/13111#issuecomment-219204110 In the PR https://github.com/apache/spark/pull/12812, we mention we will not use `hive.metastore.warehouse.dir` to set the location. Thus, should we issue an exception if users try to set it as a CLI parameter, or just issue a warning LOG message?
[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13112#issuecomment-219204019 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/13112 [SPARK-15322][mllib]update deprecate accumulator usage into accumulatorV2 in mllib ## What changes were proposed in this pull request? MLlib uses the deprecated sc.accumulator method in two places; this updates them to the new API: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala line 282 mllib/src/main/scala/org/apache/spark/ml/util/stopwatches.scala line 106 ## How was this patch tested? Re-ran the build and tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/WeichenXu123/spark update_accuV2_in_mllib Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13112.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13112 commit 2761dff513eb2da87464735722807e3ea0ea7676 Author: WeichenXu Date: 2016-05-14T06:10:34Z update deprecate accumulator usage into accumulatorV2 in mllib
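The migration described above moves from the deprecated `sc.accumulator` to the AccumulatorV2 API. As a standalone illustration only (a simplified mock, not Spark's actual `LongAccumulator`; the real class also handles serialization, copying, and driver-side registration), the V2-style contract the new API is built around looks roughly like:

```scala
// Simplified stand-in for the AccumulatorV2-style contract
// (add / merge / reset / isZero / value).
class SimpleLongAccumulator {
  private var _sum: Long = 0L
  def add(v: Long): Unit = _sum += v
  def merge(other: SimpleLongAccumulator): Unit = _sum += other.value
  def reset(): Unit = _sum = 0L
  def isZero: Boolean = _sum == 0L
  def value: Long = _sum
}

object AccumulatorSketch {
  def main(args: Array[String]): Unit = {
    val acc = new SimpleLongAccumulator
    acc.add(2L)
    acc.add(3L)
    // merge simulates combining a task-local accumulator back into the total
    val other = new SimpleLongAccumulator
    other.add(5L)
    acc.merge(other)
    println(acc.value) // 10
  }
}
```

In real Spark code the equivalent change is replacing `sc.accumulator(0L)` with `sc.longAccumulator`, which returns an instance implementing this kind of interface.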
[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12754#issuecomment-219203483 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58602/ Test PASSed.
[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12754#issuecomment-219203482 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12754#issuecomment-219203467 **[Test build #58602 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58602/consoleFull)** for PR 12754 at commit [`8b9f33a`](https://github.com/apache/spark/commit/8b9f33a0a5991959743e29e9f61175a20ce14a87). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15320] [SQL] Spark-SQL Cli Ignores Para...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/13111#issuecomment-219203243 @yhuai @andrewor14 @rxin @liancheng **Question**: This PR is to set `spark.sql.warehouse.dir` by using the user-specified value of `hive.metastore.warehouse.dir` in the CLI command line. Another option is to issue an exception and force users to use `spark.sql.warehouse.dir`. Let me know which one is preferable. Thanks!
[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12754#issuecomment-219203197 **[Test build #58602 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58602/consoleFull)** for PR 12754 at commit [`8b9f33a`](https://github.com/apache/spark/commit/8b9f33a0a5991959743e29e9f61175a20ce14a87).