date:20160809

[GitHub] spark pull request #14559: [SPARK-16968]Add additional options in jdbc when ...

2016-08-09 Thread GraceH

Github user GraceH commented on a diff in the pull request:

https://github.com/apache/spark/pull/14559#discussion_r74027475
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -447,7 +447,16 @@ final class DataFrameWriter[T] private[sql](ds: 
Dataset[T]) {
   // Create the table if the table didn't exist.
   if (!tableExists) {
 val schema = JdbcUtils.schemaString(df, url)
-val sql = s"CREATE TABLE $table ($schema)"
+// To allow certain options to append when create a new table, 
which can be
+// table_options or partition_options.
+// E.g., "CREATE TABLE t (name string) ENGINE=InnoDB DEFAULT 
CHARSET=utf8"
+val createtblOptions = {
+  extraOptions.get("jdbc.create.table.options") match {
--- End diff --

Thanks, Sean. Actually, I am a little hesitant here. For example, the existing 
"mergeSchema" option is not named like the other option names (those prefixed 
with "spark"). 
```
val mergedDF = spark.read.option("mergeSchema", "true").parquet("data/test_table")
```

How about using a short name such as "createTableOptions"?
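
To make the proposal concrete, here is a minimal sketch of how a short option name could feed into the generated statement. It is standalone Scala with no Spark dependency; `CreateTableSql` and `createTableSql` are hypothetical names used only for illustration, and the key "createTableOptions" is the name suggested above, not a confirmed API.

```scala
object CreateTableSql {
  // Hypothetical helper: appends user-supplied table options, if any,
  // to the generated CREATE TABLE statement.
  def createTableSql(
      table: String,
      schema: String,
      extraOptions: Map[String, String]): String = {
    // "createTableOptions" is the short name proposed in this thread (an assumption).
    val options = extraOptions.get("createTableOptions").map(" " + _).getOrElse("")
    s"CREATE TABLE $table ($schema)$options"
  }
}
```

With no option set, the statement is the same as the one produced before the change.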


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14561: [SPARK-16972][CORE] Move DriverEndpoint out of CoarseGra...

2016-08-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14561
  
**[Test build #63435 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63435/consoleFull)**
 for PR 14561 at commit 
[`def6954`](https://github.com/apache/spark/commit/def695421948db1efd0418625243ed645d0958fa).



[GitHub] spark issue #14561: [SPARK-16972][CORE] Move DriverEndpoint out of CoarseGra...

2016-08-09 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14561
  
Jenkins test this please



[GitHub] spark pull request #14559: [SPARK-16968]Add additional options in jdbc when ...

2016-08-09 Thread GraceH

Github user GraceH commented on a diff in the pull request:

https://github.com/apache/spark/pull/14559#discussion_r74026903
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -447,7 +447,16 @@ final class DataFrameWriter[T] private[sql](ds: 
Dataset[T]) {
   // Create the table if the table didn't exist.
   if (!tableExists) {
 val schema = JdbcUtils.schemaString(df, url)
-val sql = s"CREATE TABLE $table ($schema)"
+// To allow certain options to append when create a new table, 
which can be
+// table_options or partition_options.
+// E.g., "CREATE TABLE t (name string) ENGINE=InnoDB DEFAULT 
CHARSET=utf8"
+val createtblOptions = {
+  extraOptions.get("jdbc.create.table.options") match {
+case Some(value) => " " + value
+case None => ""
+  }
+}
+val sql = s"CREATE TABLE $table ($schema)" + createtblOptions
--- End diff --

Yes, you're right. I will fix that; it reads better as a single piece. 



[GitHub] spark issue #14557: [SPARK-16709][CORE] Kill the running task if stage faile...

2016-08-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14557
  
**[Test build #63434 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63434/consoleFull)**
 for PR 14557 at commit 
[`9ea08e8`](https://github.com/apache/spark/commit/9ea08e8680a544fc051574efcecff00270f2d2d6).



[GitHub] spark issue #14561: [SPARK-16972][CORE] Move DriverEndpoint out of CoarseGra...

2016-08-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14561
  
Can one of the admins verify this patch?



[GitHub] spark pull request #14561: SPARK-16972: Move DriverEndpoint out of CoarseGra...

2016-08-09 Thread lshmouse

GitHub user lshmouse opened a pull request:

https://github.com/apache/spark/pull/14561

SPARK-16972: Move DriverEndpoint out of CoarseGrainedSchedulerBackend

## What changes were proposed in this pull request?
Move DriverEndpoint out of CoarseGrainedSchedulerBackend to make both 
classes cleaner.

## How was this patch tested?
The unit tests pass locally.

(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lshmouse/spark DriverEndpoint

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14561.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14561


commit def695421948db1efd0418625243ed645d0958fa
Author: Liu Shaohui 
Date:   2016-08-09T09:25:41Z

SPARK-16972: Move DriverEndpoint out of CoarseGrainedSchedulerBackend





[GitHub] spark issue #14552: [SPARK-16952] don't lookup spark home directory when exe...

2016-08-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14552
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63426/
Test FAILed.



[GitHub] spark issue #14552: [SPARK-16952] don't lookup spark home directory when exe...

2016-08-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14552
  
Merged build finished. Test FAILed.



[GitHub] spark issue #14552: [SPARK-16952] don't lookup spark home directory when exe...

2016-08-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14552
  
**[Test build #63426 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63426/consoleFull)**
 for PR 14552 at commit 
[`a19cec7`](https://github.com/apache/spark/commit/a19cec746c3314fa12844adcd04eeb9fb900cd46).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14557: [SPARK-16709][CORE] Kill the running task if stage faile...

2016-08-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14557
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63424/
Test PASSed.



[GitHub] spark issue #14557: [SPARK-16709][CORE] Kill the running task if stage faile...

2016-08-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14557
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14557: [SPARK-16709][CORE] Kill the running task if stage faile...

2016-08-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14557
  
**[Test build #63424 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63424/consoleFull)**
 for PR 14557 at commit 
[`9263678`](https://github.com/apache/spark/commit/926367815a262c89a24f86fd735348f493e64881).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14558: [SPARK-16508][SparkR] Fix warnings on undocumented/dupli...

2016-08-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14558
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14558: [SPARK-16508][SparkR] Fix warnings on undocumented/dupli...

2016-08-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14558
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63423/
Test PASSed.



[GitHub] spark issue #14558: [SPARK-16508][SparkR] Fix warnings on undocumented/dupli...

2016-08-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14558
  
**[Test build #63423 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63423/consoleFull)**
 for PR 14558 at commit 
[`82e2f09`](https://github.com/apache/spark/commit/82e2f09517e9f3d726af0046d251748f892f59c8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #14534: [SPARK-16941]Use concurrentHashMap instead of sca...

2016-08-09 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14534#discussion_r74024410
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/server/SparkSQLOperationManager.scala
 ---
@@ -39,15 +38,19 @@ private[thriftserver] class SparkSQLOperationManager()
   val handleToOperation = ReflectionUtils
 .getSuperField[JMap[OperationHandle, Operation]](this, 
"handleToOperation")
 
-  val sessionToActivePool = Map[SessionHandle, String]()
-  val sessionToContexts = Map[SessionHandle, SQLContext]()
+  val sessionToActivePool = new ConcurrentHashMap[SessionHandle, String]()
--- End diff --

It's only `private[thriftserver]`. It's minor, and a whole lot of stuff in 
Spark that should be `private` isn't, but I wondered if it was worth it here 
because you're concerned with synchronizing access to this object and therefore 
possibly concerned with what is accessing it. The usages you changed look like 
they're sufficiently protected, but are there others BTW? 
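
As an illustration of the point about visibility, here is a minimal sketch in which the concurrent map is a `private` field and every access goes through a small set of methods. `SessionPools` is a hypothetical stand-in for the class in the diff, and plain `String` keys replace `SessionHandle` so the example is self-contained.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in: the map is private, so all reads and writes
// are visible in this one class and easy to audit.
class SessionPools {
  private val sessionToActivePool = new ConcurrentHashMap[String, String]()

  def setPool(session: String, pool: String): Unit = {
    sessionToActivePool.put(session, pool)
    ()
  }

  // ConcurrentHashMap.get returns null for a missing key; wrap it in Option.
  def poolFor(session: String): Option[String] =
    Option(sessionToActivePool.get(session))

  def close(session: String): Unit = {
    sessionToActivePool.remove(session)
    ()
  }
}
```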



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

2016-08-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13988
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63425/
Test PASSed.



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

2016-08-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13988
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

2016-08-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13988
  
**[Test build #63425 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63425/consoleFull)**
 for PR 13988 at commit 
[`a634435`](https://github.com/apache/spark/commit/a63443505483287fa9bb20312a24b38e75f90588).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14534: [SPARK-16941]Use concurrentHashMap instead of scala Map ...

2016-08-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14534
  
**[Test build #63433 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63433/consoleFull)**
 for PR 14534 at commit 
[`0a436a0`](https://github.com/apache/spark/commit/0a436a0a911151e0cc823a81974473f89e8bb966).



[GitHub] spark issue #14560: [SPARK-16971][SQL] Strip trailing zeros for decimal's st...

2016-08-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14560
  
**[Test build #63432 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63432/consoleFull)**
 for PR 14560 at commit 
[`d11ce1f`](https://github.com/apache/spark/commit/d11ce1f8554f5028e7a64b3b6abe5e1b6a290529).



[GitHub] spark pull request #14534: [SPARK-16941]Use concurrentHashMap instead of sca...

2016-08-09 Thread SaintBacchus

Github user SaintBacchus commented on a diff in the pull request:

https://github.com/apache/spark/pull/14534#discussion_r74023262
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/server/SparkSQLOperationManager.scala
 ---
@@ -39,15 +38,19 @@ private[thriftserver] class SparkSQLOperationManager()
   val handleToOperation = ReflectionUtils
 .getSuperField[JMap[OperationHandle, Operation]](this, 
"handleToOperation")
 
-  val sessionToActivePool = Map[SessionHandle, String]()
-  val sessionToContexts = Map[SessionHandle, SQLContext]()
+  val sessionToActivePool = new ConcurrentHashMap[SessionHandle, String]()
--- End diff --

The whole class is private; is it necessary to make the field private as well?



[GitHub] spark issue #14560: [SPARK-16971][SQL] Strip trailing zeros for decimals whe...

2016-08-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14560
  
**[Test build #63431 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63431/consoleFull)**
 for PR 14560 at commit 
[`b8a5267`](https://github.com/apache/spark/commit/b8a5267d495f4a9bf882c82b730c660858e1eebf).



[GitHub] spark pull request #14560: [SPARK-16971][SQL] Strip trailing zeros for decim...

2016-08-09 Thread HyukjinKwon

GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/14560

[SPARK-16971][SQL] Strip trailing zeros for decimals when using show() API 
in Dataset

## What changes were proposed in this pull request?

Currently, `Dataset.show()` prints all the trailing zeros for decimals. For 
example,

```
spark.range(11).toDF("a").select('a.cast(DecimalType(30, 20))).show()
```

prints below:

```bash
+--------+
|       a|
+--------+
|   0E-20|
|1.000...|
|2.000...|
|3.000...|
|4.000...|
|5.000...|
|6.000...|
|7.000...|
|8.000...|
|9.000...|
|10.00...|
+--------+
```

It might be confusing, in particular, for `0E-20`. Also, I think we can 
strip the trailing zeros.

This PR fixes this as below:

```bash
+---+
|  a|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
| 10|
+---+
```
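
The before and after output can be reproduced outside Spark with `java.math.BigDecimal`. The sketch below is only an illustration of the idea and is not necessarily how this PR implements it; `DecimalDisplay` and `render` are hypothetical names.

```scala
object DecimalDisplay {
  // Render a decimal without trailing zeros and without scientific notation.
  // Zero is handled explicitly so the result does not depend on how the JDK
  // strips zeros from a zero value with a large scale (such as 0E-20).
  def render(d: java.math.BigDecimal): String =
    if (d.signum() == 0) "0"
    else d.stripTrailingZeros().toPlainString()
}
```

`toPlainString` matters here: after stripping, a value such as 10 is held as `1E+1`, and `toString` would print it in that form.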

## How was this patch tested?

Unit test in `DataFrameSuite`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-16971

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14560.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14560


commit b8a5267d495f4a9bf882c82b730c660858e1eebf
Author: hyukjinkwon 
Date:   2016-08-09T08:58:13Z

Strip trailing zeros for decimals when using show() API in Dataset





[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-09 Thread zjffdu

Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74022257
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
@@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
 @Since("2.0.0") val indices: Array[Int],
 @Since("2.0.0") val values: Array[Double]) extends Vector {
 
-  require(indices.length == values.length, "Sparse vectors require that 
the dimension of the" +
-s" indices match the dimension of the values. You provided 
${indices.length} indices and " +
-s" ${values.length} values.")
-  require(indices.length <= size, s"You provided ${indices.length} indices 
and values, " +
-s"which exceeds the specified vector size ${size}.")
+  validate()
+
+  private def validate(): Unit = {
+require(size >= 0, "The size of the requested sparse vector must be 
greater than 0.")
--- End diff --

Yes, I do see a test for a zero-length vector. 

https://github.com/apache/spark/blob/master/mllib-local/src/test/scala/org/apache/spark/ml/linalg/VectorsSuite.scala#L81
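
For reference, the checks under discussion can be sketched as a standalone class. `SparseVec` is a hypothetical simplification of `SparseVector`, and the error message wording is my own; the three conditions are the ones visible in the diff. A size of 0 is accepted, which is consistent with the existing zero-length test.

```scala
// Hypothetical, simplified version of the validated constructor.
final class SparseVec(
    val size: Int,
    val indices: Array[Int],
    val values: Array[Double]) {

  validate()

  private def validate(): Unit = {
    // size == 0 is allowed, so the requirement is "non-negative".
    require(size >= 0, s"The size of a sparse vector must be non-negative, got $size.")
    require(indices.length == values.length,
      s"Got ${indices.length} indices and ${values.length} values; the counts must match.")
    require(indices.length <= size,
      s"Got ${indices.length} indices and values, which exceeds the vector size $size.")
  }
}
```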



[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74022313
  
--- Diff: R/pkg/R/functions.R ---
@@ -1273,12 +1267,14 @@ setMethod("round",
 #' bround
 #'
 #' Returns the value of the column `e` rounded to `scale` decimal places 
using HALF_EVEN rounding
-#' mode if `scale` >= 0 or at integral part when `scale` < 0.
+#' mode if `scale` >= 0 or at integer part when `scale` < 0.
 #' Also known as Gaussian rounding or bankers' rounding that rounds to the 
nearest even number.
 #' bround(2.5, 0) = 2, bround(3.5, 0) = 4.
 #'
 #' @param x Column to compute on.
-#'
+#' @param scale round to `scale` digits to the right of the decimal point 
when `scale` > 0,
--- End diff --

It seems this duplicates L1270, and it might be confusing since they seem 
to describe different behavior?
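
To make the two cases in the doc comment concrete, here is a small sketch of HALF_EVEN rounding for positive, zero, and negative `scale`. It uses `java.math.BigDecimal` directly and is not SparkR's implementation; `BankersRounding` is a hypothetical name.

```scala
import java.math.RoundingMode

object BankersRounding {
  // scale > 0 keeps that many digits to the right of the decimal point;
  // scale < 0 rounds within the integer part (tens, hundreds, ...).
  def bround(x: Double, scale: Int): Double =
    java.math.BigDecimal.valueOf(x).setScale(scale, RoundingMode.HALF_EVEN).doubleValue()
}
```

The values quoted in the doc comment, `bround(2.5, 0) = 2` and `bround(3.5, 0) = 4`, both follow from ties going to the nearest even digit.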



[GitHub] spark issue #13868: [SPARK-15899] [SQL] Fix the construction of the file pat...

2016-08-09 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/13868
  
OK will merge soonish if there are no further comments. Thanks.



[GitHub] spark issue #13868: [SPARK-15899] [SQL] Fix the construction of the file pat...

2016-08-09 Thread avulanov

Github user avulanov commented on the issue:

https://github.com/apache/spark/pull/13868
  
@srowen Sure. I've addressed @vanzin's comments.



[GitHub] spark issue #13868: [SPARK-15899] [SQL] Fix the construction of the file pat...

2016-08-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13868
  
**[Test build #63430 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63430/consoleFull)**
 for PR 13868 at commit 
[`ea24b59`](https://github.com/apache/spark/commit/ea24b59fe83c37dbab27579141b5c63cccee138d).



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-09 Thread zjffdu

Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74021856
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
@@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
 @Since("2.0.0") val indices: Array[Int],
 @Since("2.0.0") val values: Array[Double]) extends Vector {
 
-  require(indices.length == values.length, "Sparse vectors require that 
the dimension of the" +
-s" indices match the dimension of the values. You provided 
${indices.length} indices and " +
-s" ${values.length} values.")
-  require(indices.length <= size, s"You provided ${indices.length} indices 
and values, " +
-s"which exceeds the specified vector size ${size}.")
+  validate()
--- End diff --

I also considered a `{...}` block, but felt that putting the checks into one 
method was cleaner. I can switch to the block if a method does not fit the 
Spark code style. 



[GitHub] spark pull request #14557: [SPARK-16709][CORE] Kill the running task if stag...

2016-08-09 Thread shenh062326

Github user shenh062326 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14557#discussion_r74021599
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1564,6 +1564,14 @@ class SparkContext(config: SparkConf) extends 
Logging with ExecutorAllocationCli
 }
   }
 
+  def killTasks(tasks: HashSet[Long], taskInfo: HashMap[Long, TaskInfo]): 
Boolean = {
--- End diff --

Jerryshao, thanks for the suggestion. I will move the method to TaskSetManager.



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-09 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74021332
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
@@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
 @Since("2.0.0") val indices: Array[Int],
 @Since("2.0.0") val values: Array[Double]) extends Vector {
 
-  require(indices.length == values.length, "Sparse vectors require that 
the dimension of the" +
-s" indices match the dimension of the values. You provided 
${indices.length} indices and " +
-s" ${values.length} values.")
-  require(indices.length <= size, s"You provided ${indices.length} indices 
and values, " +
-s"which exceeds the specified vector size ${size}.")
+  validate()
+
+  private def validate(): Unit = {
+require(size >= 0, "The size of the requested sparse vector must be 
greater than 0.")
--- End diff --

This allows a size 0 vector now. I guess that's good, because `DenseVector` 
allows this (a 0 length array).
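
A minimal sketch of the point above, assuming a hypothetical `SketchSparseVector` class rather than Spark's real `SparseVector`: with `size >= 0` an empty vector passes validation, so the message here is worded as "non-negative" (that wording is an assumption of this sketch, not text from the pull request).

```scala
// Hypothetical stand-in for the vector under review; not Spark's SparseVector.
final class SketchSparseVector(
    val size: Int,
    val indices: Array[Int],
    val values: Array[Double]) {

  validate()

  private def validate(): Unit = {
    // `>= 0` admits an empty vector, so the message says "non-negative".
    require(size >= 0,
      s"The size of the requested sparse vector must be non-negative, got $size.")
    require(indices.length == values.length,
      s"Expected matching lengths, got ${indices.length} indices and ${values.length} values.")
    require(indices.length <= size,
      s"You provided ${indices.length} indices and values, which exceeds the vector size $size.")
  }
}

object SketchSparseVector {
  // Returns true when construction succeeds, false when a `require` fails.
  def isValid(size: Int, indices: Array[Int], values: Array[Double]): Boolean =
    try { new SketchSparseVector(size, indices, values); true }
    catch { case _: IllegalArgumentException => false }
}
```

Calling `SketchSparseVector.isValid(0, Array.empty[Int], Array.empty[Double])` succeeds, while a negative size is rejected.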



[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74021040
  
--- Diff: R/pkg/R/functions.R ---
@@ -1560,7 +1556,8 @@ setMethod("stddev_samp",
 #'
 #' Creates a new struct column that composes multiple input columns.
 #'
-#' @param x Column to compute on.
+#' @param x a column to compute on.
+#' @param ... additional column(s) to be included.
--- End diff --

optional?



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-09 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74021092
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
@@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
 @Since("2.0.0") val indices: Array[Int],
 @Since("2.0.0") val values: Array[Double]) extends Vector {
 
-  require(indices.length == values.length, "Sparse vectors require that 
the dimension of the" +
-s" indices match the dimension of the values. You provided 
${indices.length} indices and " +
-s" ${values.length} values.")
-  require(indices.length <= size, s"You provided ${indices.length} indices 
and values, " +
-s"which exceeds the specified vector size ${size}.")
+  validate()
--- End diff --

They wouldn't become fields unless used outside the constructor. You can 
also use a simple scope `{...}` to guard against this. I understand the 
argument and don't feel strongly either way, but we don't do this in other code 
in general.
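
A rough sketch of the scoping idea, using a hypothetical `ScopedChecks` class that is not part of the pull request. It uses `locally` from `Predef` to make the scope explicit; a bare `{...}` block behaves the same way, but placed after another statement it can be parsed as an argument to that statement.

```scala
// Hypothetical class for illustration only; not taken from the pull request.
final class ScopedChecks(val indices: Array[Int], val values: Array[Double]) {

  // `n` and `m` are local to this block, so they can never become fields,
  // even if a later change tries to reference them from a method.
  locally {
    val n = indices.length
    val m = values.length
    require(n == m, s"You provided $n indices and $m values.")
  }

  // Counts the stored values that are not zero.
  def nonZeroCount: Int = values.count(_ != 0.0)
}
```

The trade-off against a private `validate()` method is mostly stylistic: the block keeps the checks inline in the constructor, the method gives them a name.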



[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74020775
  
--- Diff: R/pkg/R/functions.R ---
@@ -2654,6 +2647,9 @@ setMethod("expr", signature(x = "character"),
 #'
 #' Formats the arguments in printf-style and returns the result as a 
string column.
 #'
+#' @param format a character object of format strings.
+#' @param x a Column object.
+#' @param ... additional columns.
--- End diff --

Let's keep type names capitalized? `Column` or `Columns`



[GitHub] spark pull request #14534: [SPARK-16941]Use concurrentHashMap instead of sca...

2016-08-09 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14534#discussion_r74020655
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/server/SparkSQLOperationManager.scala
 ---
@@ -39,15 +38,19 @@ private[thriftserver] class SparkSQLOperationManager()
   val handleToOperation = ReflectionUtils
 .getSuperField[JMap[OperationHandle, Operation]](this, 
"handleToOperation")
 
-  val sessionToActivePool = Map[SessionHandle, String]()
-  val sessionToContexts = Map[SessionHandle, SQLContext]()
+  val sessionToActivePool = new ConcurrentHashMap[SessionHandle, String]()
+  val sessionToContexts = new ConcurrentHashMap[SessionHandle, 
SQLContext]()
 
   override def newExecuteStatementOperation(
   parentSession: HiveSession,
   statement: String,
   confOverlay: JMap[String, String],
   async: Boolean): ExecuteStatementOperation = synchronized {
-val sqlContext = sessionToContexts(parentSession.getSessionHandle)
+val sqlContext = sessionToContexts.get(parentSession.getSessionHandle)
+if (null == sqlContext) {
--- End diff --

Does this have to be `HiveSQLException`? I'd just use `require` to generate 
an `IllegalArgumentException`
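
A small sketch of the `require` suggestion, assuming a hypothetical `SessionRegistry` object in which `String` stands in for both `SessionHandle` and `SQLContext`. It relies only on the fact that `ConcurrentHashMap.get` returns `null` for a missing key.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical registry; String stands in for SessionHandle and SQLContext.
object SessionRegistry {
  private val sessionToContexts = new ConcurrentHashMap[String, String]()

  def register(session: String, context: String): Unit =
    sessionToContexts.put(session, context)

  def contextFor(session: String): String = {
    // ConcurrentHashMap.get returns null for a missing key instead of throwing.
    val context = sessionToContexts.get(session)
    // require throws IllegalArgumentException when the condition is false.
    require(context != null, s"Session $session has no registered context.")
    context
  }
}
```

Whether `IllegalArgumentException` is acceptable to callers that expect a `HiveSQLException` is exactly the open question in the comment above; the sketch does not settle it.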



[GitHub] spark pull request #14534: [SPARK-16941]Use concurrentHashMap instead of sca...

2016-08-09 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14534#discussion_r74020589
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/server/SparkSQLOperationManager.scala
 ---
@@ -39,15 +38,19 @@ private[thriftserver] class SparkSQLOperationManager()
   val handleToOperation = ReflectionUtils
 .getSuperField[JMap[OperationHandle, Operation]](this, 
"handleToOperation")
 
-  val sessionToActivePool = Map[SessionHandle, String]()
-  val sessionToContexts = Map[SessionHandle, SQLContext]()
+  val sessionToActivePool = new ConcurrentHashMap[SessionHandle, String]()
--- End diff --

While we're here, make them `private` for a bit more future-proofing of 
access to these



[GitHub] spark issue #14525: [SPARK-16324] [SQL] regexp_extract should doc that it re...

2016-08-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14525
  
**[Test build #63429 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63429/consoleFull)**
 for PR 14525 at commit 
[`a48be1f`](https://github.com/apache/spark/commit/a48be1f2955c6bd73ebdf3b03fcdadd8eb347278).



[GitHub] spark pull request #14528: [SPARK-16940][SQL] `checkAnswer` should raise `Te...

2016-08-09 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14528



[GitHub] spark issue #14528: [SPARK-16940][SQL] `checkAnswer` should raise `TestFaile...

2016-08-09 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14528
  
Merged to master



[GitHub] spark pull request #14534: [SPARK-16941]Use concurrentHashMap instead of sca...

2016-08-09 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14534#discussion_r74020482
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
 ---
@@ -206,15 +206,16 @@ private[hive] class SparkExecuteStatementOperation(
   statementId,
   parentSession.getUsername)
 sqlContext.sparkContext.setJobGroup(statementId, statement)
-sessionToActivePool.get(parentSession.getSessionHandle).foreach { pool 
=>
+val pool = sessionToActivePool.get(parentSession.getSessionHandle)
+if(null != pool) {
--- End diff --

`if (pool != null)`
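
For comparison, a sketch of the two ways to handle the nullable result of a `ConcurrentHashMap` lookup. The `PoolLookup` object and its `String` keys are hypothetical. The `Option(...)` variant is an alternative not proposed in the review; it is shown only because it keeps the `foreach` style of the removed line.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical holder; String stands in for SessionHandle.
object PoolLookup {
  private val sessionToActivePool = new ConcurrentHashMap[String, String]()

  def setPool(session: String, pool: String): Unit =
    sessionToActivePool.put(session, pool)

  // Style requested in the review: a plain null check, value on the left.
  def poolNullCheck(session: String): Option[String] = {
    val pool = sessionToActivePool.get(session)
    if (pool != null) Some(pool) else None
  }

  // Alternative: Option(x) turns null into None, so callers can use foreach.
  def poolViaOption(session: String): Option[String] =
    Option(sessionToActivePool.get(session))
}
```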



[GitHub] spark issue #14540: [SPARK-16950] [PySpark] fromOffsets parameter support in...

2016-08-09 Thread szczeles

Github user szczeles commented on the issue:

https://github.com/apache/spark/pull/14540
  
@holdenk I've checked the setSeed methods in MLlib and it seems py4j handles 
them well. If a function takes simple arguments (strings, numeric, bool), py4j 
applies a conversion between types (see 
https://github.com/bartdag/py4j/blob/master/py4j-java/src/main/java/py4j/reflection/MethodInvoker.java#L99).

For setSeed(Long), if the argument is mapped to Integer, it goes through 
toString and Long.parseLong (see 
https://github.com/bartdag/py4j/blob/master/py4j-java/src/main/java/py4j/reflection/TypeConverter.java#L88).

Apparently, this conversion does not work for complex types like the 
fromOffsets map.
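
To make the described path concrete, here is a sketch that only mimics the Integer-to-Long conversion as summarized above. It is not py4j's code; the `SeedConversion` object and its method name are hypothetical.

```scala
// Illustration only: mimics the Integer -> Long path described above.
// This is not py4j's code; the object and method names are hypothetical.
object SeedConversion {
  def toLongViaString(arg: Any): Long = arg match {
    // Already a Long: nothing to convert.
    case l: java.lang.Long => l.longValue
    // An Integer goes through its string form and Long.parseLong.
    case i: java.lang.Integer => java.lang.Long.parseLong(i.toString)
    // Anything else (for example a map) has no simple conversion here.
    case other =>
      throw new IllegalArgumentException(
        s"No simple conversion for ${other.getClass.getName}")
  }
}
```

Simple numeric arguments convert, while a map-valued argument such as the fromOffsets parameter falls through to the error branch in this sketch.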



[GitHub] spark issue #14175: [SPARK-16522][MESOS] Spark application throws exception ...

2016-08-09 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14175
  
Merged to master, but it doesn't pick cleanly into 2.0, and the conflict in 
the tests wasn't entirely trivial. You can open another PR if it's important.



[GitHub] spark pull request #14175: [SPARK-16522][MESOS] Spark application throws exc...

2016-08-09 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14175



[GitHub] spark pull request #14533: [SPARK-16606] [CORE] Misleading warning for Spark...

2016-08-09 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14533



[GitHub] spark issue #14533: [SPARK-16606] [CORE] Misleading warning for SparkContext...

2016-08-09 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14533
  
Merged to master



[GitHub] spark pull request #14539: [SPARK-16947][SQL] Improve type coercion for inli...

2016-08-09 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/14539#discussion_r74018815
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -1192,8 +1192,8 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
   }
 
   test("SPARK-10740: handle nondeterministic expressions correctly for set 
operations") {
-val df1 = (1 to 20).map(Tuple1.apply).toDF("i")
-val df2 = (1 to 10).map(Tuple1.apply).toDF("i")
+val df1 = spark.range(1, 20).select('id.cast("int").as("i"))
+val df2 = spark.range(1, 10).select('id.cast("int").as("i"))
--- End diff --

The syntax `(1 to 20).map(Tuple1.apply).toDF("i")` produces a 
`LocalRelation`. These `LocalRelation`s are subsequently `Union`'ed. The new 
optimizer rules reduce this `Union` into a single `LocalRelation`, which fails 
this test because the new approach results in an already evaluated 
`LocalRelation` (using a different seed for the RNG) instead of a `Union` with 
two separate partitions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13868: [SPARK-15899] [SQL] Fix the construction of the file pat...

2016-08-09 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/13868
  
@avulanov can you have one more look at Marcelo's last small comments?



[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74017865
  
--- Diff: R/pkg/R/functions.R ---
@@ -3033,6 +3033,9 @@ setMethod("when", signature(condition = "Column", 
value = "ANY"),
 #' Evaluates a list of conditions and returns \code{yes} if the conditions 
are satisfied.
 #' Otherwise \code{no} is returned for unmatched conditions.
 #'
+#' @param test a Column expression that describes the condition.
+#' @param yes return values for true elements of test.
--- End diff --

true -> TRUE
false -> FALSE?



[GitHub] spark issue #14491: [SPARK-16886] [EXAMPLES][SQL] structured streaming netwo...

2016-08-09 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14491
  
@ganeshchand could you address his last comment?



[GitHub] spark pull request #14557: [SPARK-16709][CORE] Kill the running task if stag...

2016-08-09 Thread jerryshao

Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/14557#discussion_r74017373
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1564,6 +1564,14 @@ class SparkContext(config: SparkConf) extends 
Logging with ExecutorAllocationCli
 }
   }
 
+  def killTasks(tasks: HashSet[Long], taskInfo: HashMap[Long, TaskInfo]): 
Boolean = {
--- End diff --

It is not suitable to add a public method here in `SparkContext`. 
`SparkContext` is a public entry point, so any method added here should be 
considered carefully. In your case it looks like only Spark will use this 
method internally, so why not change `TaskSetManager` directly?



[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74017287
  
--- Diff: R/pkg/R/generics.R ---
@@ -1022,6 +1059,7 @@ setGeneric("month", function(x) { 
standardGeneric("month") })
 #' @export
 setGeneric("months_between", function(y, x) { 
standardGeneric("months_between") })
 
+#' @param x a SparkDataFrame or a Column object.
 #' @rdname nrow
--- End diff --

The "n" column function shouldn't be under rdname nrow, which is for the 
count of a DataFrame. I'd change this to a new rdname, `count`, and put 
"count" and "n" under that.



[GitHub] spark issue #14556: [SPARK-16966][Core] Make App Name to the valid name inst...

2016-08-09 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14556
  
Just so it's not missed, I have a slightly different proposal in the JIRA.



[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74016459
  
--- Diff: R/pkg/R/generics.R ---
@@ -1091,8 +1129,8 @@ setGeneric("reverse", function(x) { 
standardGeneric("reverse") })
 #' @export
 setGeneric("rint", function(x, ...) { standardGeneric("rint") })
 
-#' @rdname row_number
-#' @export
+# @rdname row_number
+# @export
--- End diff --

`#'` changed to `#`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74016421
  
--- Diff: R/pkg/R/generics.R ---
@@ -1046,8 +1084,8 @@ setGeneric("ntile", function(x) { 
standardGeneric("ntile") })
 #' @export
 setGeneric("n_distinct", function(x, ...) { standardGeneric("n_distinct") 
})
 
-#' @rdname percent_rank
-#' @export
+# @rdname percent_rank
--- End diff --

`#'` changed to `#`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74016338
  
--- Diff: R/pkg/R/mllib.R ---
@@ -57,6 +57,9 @@ setClass("KMeansModel", representation(jobj = "jobj"))
 #'
 #' Saves the MLlib model to the input path. For more information, see the 
specific
 #' MLlib model below.
+#' @param object a fitted ML model object.
+#' @param path the directory where the model is saved.
+#' @param ... additional argument(s) passed to the method.
--- End diff --

Does it complain about this? This Rd does not have a function signature, so 
it shouldn't ask for parameters to be documented.



[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74016268
  
--- Diff: R/pkg/R/mllib.R ---
@@ -69,6 +72,8 @@ NULL
 #'
 #' Makes predictions from a MLlib model. For more information, see the 
specific
 #' MLlib model below.
+#' @param object a fitted ML model object.
+#' @param ... additional argument(s) passed to the method.
--- End diff --

Does it complain about this? This Rd does not have a function signature, so 
it shouldn't ask for parameters to be documented.



[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74016085
  
--- Diff: R/pkg/R/mllib.R ---
@@ -82,15 +87,16 @@ NULL
 #' Users can call \code{summary} to print a summary of the fitted model, 
\code{predict} to make
 #' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
 #'
-#' @param data SparkDataFrame for training.
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'operators are supported, including '~', '.', ':', '+', 
and '-'.
-#' @param family A description of the error distribution and link function 
to be used in the model.
+#' @param family a description of the error distribution and link function 
to be used in the model.
 #'   This can be a character string naming a family function, 
a family function or
 #'   the result of a call to a family function. Refer R family 
at
 #'   
\url{https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html}.
-#' @param tol Positive convergence tolerance of iterations.
-#' @param maxIter Integer giving the maximal number of IRLS iterations.
+#' @param tol positive convergence tolerance of iterations.
+#' @param maxIter integer giving the maximal number of IRLS iterations.
+#' @param ... additional arguments passed to the method.
--- End diff --

there is no `...` here in the signature?



[GitHub] spark issue #14180: Wheelhouse and VirtualEnv support

2016-08-09 Thread Stibbons

Github user Stibbons commented on the issue:

https://github.com/apache/spark/pull/14180
  
Yes I am back from vacation! Can work on it now :)



[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74015938
  
--- Diff: R/pkg/R/mllib.R ---
@@ -298,14 +304,15 @@ setMethod("summary", signature(object = 
"NaiveBayesModel"),
 #' Users can call \code{summary} to print a summary of the fitted model, 
\code{predict} to make
 #' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
 #'
-#' @param data SparkDataFrame for training
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'operators are supported, including '~', '.', ':', '+', 
and '-'.
 #'Note that the response variable of formula is empty in 
spark.kmeans.
-#' @param k Number of centers
-#' @param maxIter Maximum iteration number
-#' @param initMode The initialization algorithm choosen to fit the model
-#' @return \code{spark.kmeans} returns a fitted k-means model
+#' @param ... additional argument(s) passed to the method.
--- End diff --

there is no `...` here?



[GitHub] spark issue #14556: [SPARK-16966][Core] Make App Name to the valid name inst...

2016-08-09 Thread jerryshao

Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/14556
  
Would you please add a unit test to verify the changes?



[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74015800
  
--- Diff: R/pkg/R/mllib.R ---
@@ -346,8 +353,11 @@ setMethod("spark.kmeans", signature(data = 
"SparkDataFrame", formula = "formula"
 #' Get fitted result from a k-means model, similarly to R's fitted().
 #' Note: A saved-loaded model does not support this method.
 #'
-#' @param object A fitted k-means model
-#' @return \code{fitted} returns a SparkDataFrame containing fitted values
+#' @param object a fitted k-means model.
+#' @param method type of fitted results, `"centers"` for cluster centers
--- End diff --

I wouldn't put it in ` and " - roxygen2 doesn't really handle `



[GitHub] spark pull request #14559: [SPARK-16968]Add additional options in jdbc when ...

2016-08-09 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14559#discussion_r74015718
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -447,7 +447,16 @@ final class DataFrameWriter[T] private[sql](ds: 
Dataset[T]) {
   // Create the table if the table didn't exist.
   if (!tableExists) {
 val schema = JdbcUtils.schemaString(df, url)
-val sql = s"CREATE TABLE $table ($schema)"
+// To allow certain options to append when create a new table, 
which can be
+// table_options or partition_options.
+// E.g., "CREATE TABLE t (name string) ENGINE=InnoDB DEFAULT 
CHARSET=utf8"
+val createtblOptions = {
+  extraOptions.get("jdbc.create.table.options") match {
+case Some(value) => " " + value
+case None => ""
+  }
+}
+val sql = s"CREATE TABLE $table ($schema)" + createtblOptions
--- End diff --

Why not also use interpolation for the new var?
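
A minimal sketch of what that could look like, assuming stand-in values for `table`, `schema` and `extraOptions` (the `CreateTableSql` object and its method are hypothetical, not part of `DataFrameWriter`). It also swaps the `match` for `Option.map`/`getOrElse`, which is only one possible way to drop the extra block:

```scala
// Hypothetical helper for illustration; the real logic lives inline in DataFrameWriter.jdbc.
object CreateTableSql {
  def createTableSql(
      table: String,
      schema: String,
      extraOptions: Map[String, String]): String = {
    // Prefix the user-supplied options with a space, or fall back to an empty string.
    // The option key is the one from the diff; its final name is still under discussion.
    val createTableOptions =
      extraOptions.get("jdbc.create.table.options").map(v => s" $v").getOrElse("")
    // Interpolate the new value the same way as $table and $schema.
    s"CREATE TABLE $table ($schema)$createTableOptions"
  }
}
```

With no option set, the statement is unchanged from the current behaviour, since the interpolated value is empty.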



[GitHub] spark pull request #14559: [SPARK-16968]Add additional options in jdbc when ...

2016-08-09 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14559#discussion_r74015696
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -447,7 +447,16 @@ final class DataFrameWriter[T] private[sql](ds: 
Dataset[T]) {
   // Create the table if the table didn't exist.
   if (!tableExists) {
 val schema = JdbcUtils.schemaString(df, url)
-val sql = s"CREATE TABLE $table ($schema)"
+// To allow certain options to append when create a new table, 
which can be
+// table_options or partition_options.
+// E.g., "CREATE TABLE t (name string) ENGINE=InnoDB DEFAULT 
CHARSET=utf8"
+val createtblOptions = {
+  extraOptions.get("jdbc.create.table.options") match {
--- End diff --

Probably need a different prop name starting with spark. See other option 
naming conventions. The outer scope isn't necessary.



[GitHub] spark pull request #14539: [SPARK-16947][SQL] Improve type coercion for inli...

2016-08-09 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/14539#discussion_r74015279
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 ---
@@ -756,16 +756,20 @@ case class Repartition(numPartitions: Int, shuffle: 
Boolean, child: LogicalPlan)
 /**
  * A relation with one row. This is used in "SELECT ..." without a from 
clause.
  */
-case object OneRowRelation extends LeafNode {
+abstract class AbstractOneRowRelation extends LeafNode {
--- End diff --

Yeah that is fair



[GitHub] spark issue #14547: [SPARK-16718][MLlib] gbm-style treeboost [WIP]

2016-08-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14547
  
**[Test build #63428 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63428/consoleFull)**
 for PR 14547 at commit 
[`b4e5e6c`](https://github.com/apache/spark/commit/b4e5e6cc6a48ba5160c9aa8a0e03800f193b561e).



[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74014809
  
--- Diff: R/pkg/R/mllib.R ---
@@ -563,11 +574,12 @@ read.ml <- function(path) {
 #' \code{predict} to make predictions on new data, and 
\code{write.ml}/\code{read.ml} to
 #' save/load fitted models.
 #'
-#' @param data A SparkDataFrame for training
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'operators are supported, including '~', ':', '+', and 
'-'.
 #'Note that operator '.' is not supported currently
-#' @return \code{spark.survreg} returns a fitted AFT survival regression 
model
+#' @param ... additional argument(s) passed to the method.
--- End diff --

or document as `Currently not used.` like 
http://ugrad.stat.ubc.ca/R/library/e1071/html/predict.naiveBayes.html



[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74014676
  
--- Diff: R/pkg/R/mllib.R ---
@@ -414,11 +425,12 @@ setMethod("predict", signature(object = 
"KMeansModel"),
 #' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
 #' Only categorical data is supported.
 #'
-#' @param data A \code{SparkDataFrame} of observations and labels for 
model fitting
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a \code{SparkDataFrame} of observations and labels for 
model fitting.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'   operators are supported, including '~', '.', ':', '+', 
and '-'.
-#' @param smoothing Smoothing parameter
-#' @return \code{spark.naiveBayes} returns a fitted naive Bayes model
+#' @param smoothing smoothing parameter.
+#' @param ... additional parameter(s) passed to the method.
--- End diff --

same here - `...` are unused



[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74014544
  
--- Diff: R/pkg/R/mllib.R ---
@@ -563,11 +574,12 @@ read.ml <- function(path) {
 #' \code{predict} to make predictions on new data, and 
\code{write.ml}/\code{read.ml} to
 #' save/load fitted models.
 #'
-#' @param data A SparkDataFrame for training
-#' @param formula A symbolic description of the model to be fitted. 
Currently only a few formula
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
 #'operators are supported, including '~', ':', '+', and 
'-'.
 #'Note that operator '.' is not supported currently
-#' @return \code{spark.survreg} returns a fitted AFT survival regression 
model
+#' @param ... additional argument(s) passed to the method.
--- End diff --

There are a few cases where it is not clear why `...` should be in the 
function signature. I think we should remove them since they are not used.



[GitHub] spark issue #14559: [SPARK-16968]Add additional options in jdbc when creatin...

2016-08-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14559
  
**[Test build #63427 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63427/consoleFull)**
 for PR 14559 at commit 
[`b302b1c`](https://github.com/apache/spark/commit/b302b1c7ec75ae1e78d132f7ecdb9bb7f33816d4).



[GitHub] spark issue #13146: [SPARK-13081][PYSPARK][SPARK_SUBMIT]. Allow set pythonEx...

2016-08-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13146
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13146: [SPARK-13081][PYSPARK][SPARK_SUBMIT]. Allow set pythonEx...

2016-08-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13146
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63420/
Test PASSed.



[GitHub] spark issue #13146: [SPARK-13081][PYSPARK][SPARK_SUBMIT]. Allow set pythonEx...

2016-08-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13146
  
**[Test build #63420 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63420/consoleFull)**
 for PR 13146 at commit 
[`8119f6d`](https://github.com/apache/spark/commit/8119f6ded867a8b2e0b212f3247f52278b9e8c28).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74013650
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -328,6 +328,7 @@ sparkRHive.init <- function(jsc = NULL) {
 #' @param sparkPackages Character vector of packages from 
spark-packages.org
 #' @param enableHiveSupport Enable support for Hive, fallback if not built 
with Hive support; once
 #'set, this cannot be turned off on an existing session
+#' @param ... additional parameters passed to the method
--- End diff --

I'd clarify as in L 317, for example, "named Spark properties passed to the 
method"



[GitHub] spark pull request #14559: [SPARK-16968]Add additional options in jdbc when ...

2016-08-09 Thread GraceH

GitHub user GraceH opened a pull request:

https://github.com/apache/spark/pull/14559

[SPARK-16968]Add additional options in jdbc when creating a new table

## What changes were proposed in this pull request?

This PR allows the user to append additional options when creating a 
new table in the JDBC writer. 
The options can be table_options or partition_options.
E.g., "CREATE TABLE t (name string) ENGINE=InnoDB DEFAULT CHARSET=utf8"

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
Test results will be supplied soon.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/GraceH/spark jdbc_options

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14559.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14559


commit b302b1c7ec75ae1e78d132f7ecdb9bb7f33816d4
Author: GraceH <93113...@qq.com>
Date:   2016-08-09T06:47:51Z

Add additional options in jdbc when creating a new table





[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74013042
  
--- Diff: R/pkg/R/generics.R ---
@@ -465,10 +477,14 @@ setGeneric("dapply", function(x, func, schema) { 
standardGeneric("dapply") })
 #' @export
 setGeneric("dapplyCollect", function(x, func) { 
standardGeneric("dapplyCollect") })
 
+#' @param x a SparkDataFrame or GroupedData.
+#' @param ... additional argument(s) passed to the method.
 #' @rdname gapply
 #' @export
 setGeneric("gapply", function(x, ...) { standardGeneric("gapply") })
 
+#' @param x a SparkDataFrame or GroupedData.
--- End diff --

same here for gapplyCollect



[GitHub] spark issue #14552: [SPARK-16952] don't lookup spark home directory when exe...

2016-08-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14552
  
**[Test build #63426 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63426/consoleFull)**
 for PR 14552 at commit 
[`a19cec7`](https://github.com/apache/spark/commit/a19cec746c3314fa12844adcd04eeb9fb900cd46).



[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74013005
  
--- Diff: R/pkg/R/generics.R ---
@@ -465,10 +477,14 @@ setGeneric("dapply", function(x, func, schema) { 
standardGeneric("dapply") })
 #' @export
 setGeneric("dapplyCollect", function(x, func) { 
standardGeneric("dapplyCollect") })
 
+#' @param x a SparkDataFrame or GroupedData.
--- End diff --

gapply is only for GroupedData?



[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...

2016-08-09 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14558#discussion_r74012847
  
--- Diff: R/pkg/R/generics.R ---
@@ -395,6 +396,9 @@ setGeneric("value", function(bcast) { 
standardGeneric("value") })
 
   SparkDataFrame Methods 
 
+#' @param x a SparkDataFrame or GroupedData.
--- End diff --

Hmm.. I see why this would be a place for it. I think it would be easier to 
maintain if the documentation is next to the function body instead of the 
generics, but I haven't completely figured out the best way to do that yet.

The approach we have so far is to keep most of the tags/docs on one of the 
definitions - do you think it would work here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14552: [SPARK-16952] don't lookup spark home directory when exe...

2016-08-09 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14552
  
Seems OK to me.



[GitHub] spark issue #14552: [SPARK-16952] don't lookup spark home directory when exe...

2016-08-09 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14552
  
Jenkins retest this please



[GitHub] spark issue #14546: [SPARK-16955][SQL] Using ordinals in ORDER BY and GROUP ...

2016-08-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14546
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63419/
Test PASSed.



[GitHub] spark issue #14546: [SPARK-16955][SQL] Using ordinals in ORDER BY and GROUP ...

2016-08-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14546
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14546: [SPARK-16955][SQL] Using ordinals in ORDER BY and GROUP ...

2016-08-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14546
  
**[Test build #63419 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63419/consoleFull)**
 for PR 14546 at commit 
[`1dc193a`](https://github.com/apache/spark/commit/1dc193a15fa02359bf3e767662c7ef633464caac).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13146: [SPARK-13081][PYSPARK][SPARK_SUBMIT]. Allow set pythonEx...

2016-08-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13146
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13146: [SPARK-13081][PYSPARK][SPARK_SUBMIT]. Allow set pythonEx...

2016-08-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13146
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63417/
Test PASSed.



[GitHub] spark issue #13146: [SPARK-13081][PYSPARK][SPARK_SUBMIT]. Allow set pythonEx...

2016-08-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13146
  
**[Test build #63417 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63417/consoleFull)**
 for PR 13146 at commit 
[`3826f33`](https://github.com/apache/spark/commit/3826f3340785a4f3e1c0ad92bd0bfff32a3525c0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14546: [SPARK-16955][SQL] Using ordinals in ORDER BY and GROUP ...

2016-08-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14546
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63418/
Test PASSed.



[GitHub] spark pull request #14519: [SPARK-16933] [ML] Fix AFTAggregator in AFTSurviv...

2016-08-09 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14519#discussion_r74011434
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala ---
@@ -478,21 +482,23 @@ object AFTSurvivalRegressionModel extends MLReadable[AFTSurvivalRegressionModel]
  *$$
  * 
  *
- * @param parameters including three part: The log of scale parameter, the intercept and
- *regression coefficients corresponding to the features.
+ * @param bcParameters The broadcasted value includes three part: The log of scale parameter,
+ * the intercept and regression coefficients corresponding to the features.
  * @param fitIntercept Whether to fit an intercept term.
- * @param featuresStd The standard deviation values of the features.
+ * @param bcFeaturesStd The broadcast standard deviation values of the features.
  */
 private class AFTAggregator(
-parameters: BDV[Double],
+bcParameters: Broadcast[BDV[Double]],
 fitIntercept: Boolean,
-featuresStd: Array[Double]) extends Serializable {
+bcFeaturesStd: Broadcast[Array[Double]]) extends Serializable {
 
+  // make transient so we do not serialize between aggregation stages
+  @transient private lazy val parameters = bcParameters.value
   // the regression coefficients to the covariates
-  private val coefficients = parameters.slice(2, parameters.length)
-  private val intercept = parameters(1)
+  @transient private lazy val coefficients = parameters.slice(2, parameters.length)
+  @transient private lazy val intercept = parameters(1)
   // sigma is the scale parameter of the AFT model
-  private val sigma = math.exp(parameters(0))
+  @transient private lazy val sigma = math.exp(parameters(0))
 
--- End diff --

If we declare
@transient val xxx = ... as a class member,
the compiler puts the assignment into the class constructor.
On deserialization the field is not restored because it is transient, and the
constructor is not called, so the val will be null.

@transient lazy val xxx = ...
uses a different mechanism:
the value is computed and assigned the first time the val is accessed,
so it does not have the problem above.
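
The difference described above can be reproduced with plain Java serialization. The following is a minimal sketch, not code from the PR: `Holder`, `TransientDemo`, and `roundTrip` are hypothetical names introduced only for illustration, and the example assumes it is compiled and run as a standalone program rather than inside Spark.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical class, not from the PR: it only contrasts the two declarations.
class Holder(val params: Array[Double]) extends Serializable {
  // Assigned in the constructor and skipped by serialization,
  // so it is null after a serialization round trip.
  @transient val eager: Array[Double] = params.map(_ * 2)

  // Computed on first access, so it is rebuilt from `params`
  // when first read on the deserialized copy.
  @transient lazy val deferred: Array[Double] = params.map(_ * 2)
}

object TransientDemo {
  // Serialize and then deserialize a value with plain Java serialization.
  def roundTrip[A <: Serializable](a: A): A = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(a)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[A] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val copy = roundTrip(new Holder(Array(1.0, 2.0)))
    println(copy.eager == null)   // true: the constructor did not run again
    println(copy.deferred.toList) // List(2.0, 4.0): recomputed lazily
  }
}
```

Here `params` plays the role of the broadcast handle in the diff: it is the only state that travels with the object, and everything derived from it is rebuilt on demand.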



[GitHub] spark issue #14546: [SPARK-16955][SQL] Using ordinals in ORDER BY and GROUP ...

2016-08-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14546
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14546: [SPARK-16955][SQL] Using ordinals in ORDER BY and GROUP ...

2016-08-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14546
  
**[Test build #63418 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63418/consoleFull)** for PR 14546 at commit [`1ca8d59`](https://github.com/apache/spark/commit/1ca8d59dc3f94dd491740ae89f4d8c8223b11944).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14558: [SPARK-16508][SparkR] Fix warnings on undocumented/dupli...

2016-08-09 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/14558
  
Ah, I'm actually about halfway through this as well, but let's review yours.




[GitHub] spark issue #14517: [SPARK-16931][PYTHON] PySpark APIS for bucketBy and sort...

2016-08-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14517
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63422/
Test FAILed.



[GitHub] spark issue #14517: [SPARK-16931][PYTHON] PySpark APIS for bucketBy and sort...

2016-08-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14517
  
Merged build finished. Test FAILed.



[GitHub] spark issue #14517: [SPARK-16931][PYTHON] PySpark APIS for bucketBy and sort...

2016-08-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14517
  
**[Test build #63422 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63422/consoleFull)** for PR 14517 at commit [`31c43e6`](https://github.com/apache/spark/commit/31c43e6f3d9544478142990b4968fb105d8a03d4).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

2016-08-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13988
  
**[Test build #63425 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63425/consoleFull)** for PR 13988 at commit [`a634435`](https://github.com/apache/spark/commit/a63443505483287fa9bb20312a24b38e75f90588).



[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-08-09 Thread junyangq

Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14384#discussion_r74009550
  
--- Diff: R/pkg/R/mllib.R ---
@@ -632,3 +642,147 @@ setMethod("predict", signature(object = "AFTSurvivalRegressionModel"),
   function(object, newData) {
 return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf)))
   })
+
+
+#' Alternating Least Squares (ALS) for Collaborative Filtering
+#'
+#' \code{spark.als} learns latent factors in collaborative filtering via alternating least
+#' squares. Users can call \code{summary} to obtain fitted latent factors, \code{predict}
+#' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' For more details, see
+#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' Collaborative Filtering}.
+#' Additional arguments can be passed to the methods.
+#' \describe{
+#'\item{nonnegative}{logical value indicating whether to apply nonnegativity constraints.
+#'   Default: FALSE}
+#'\item{implicitPrefs}{logical value indicating whether to use implicit preference.
+#' Default: FALSE}
+#'\item{alpha}{alpha parameter in the implicit preference formulation (>= 0). Default: 1.0}
+#'\item{seed}{integer seed for random number generation. Default: 0}
+#'\item{numUserBlocks}{number of user blocks used to parallelize computation (> 0).
+#' Default: 10}
+#'\item{numItemBlocks}{number of item blocks used to parallelize computation (> 0).
+#' Default: 10}
+#'\item{checkpointInterval}{number of checkpoint intervals (>= 1) or disable checkpoint (-1).
+#'  Default: 10}
+#'}
+#'
+#' @param data A SparkDataFrame for training
+#' @param ratingCol column name for ratings
+#' @param userCol column name for user ids. Ids must be (or can be coerced into) integers
+#' @param itemCol column name for item ids. Ids must be (or can be coerced into) integers
+#' @param rank rank of the matrix factorization (> 0)
+#' @param reg regularization parameter (>= 0)
+#' @param maxIter maximum number of iterations (>= 0)
+
+#' @return \code{spark.als} returns a fitted ALS model
+#' @rdname spark.als
+#' @aliases spark.als,SparkDataFrame
+#' @name spark.als
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(ratings)
--- End diff --

Good point. Thanks!



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-09 Thread zjffdu

Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74009150
  
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
@@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
 @Since("2.0.0") val indices: Array[Int],
 @Since("2.0.0") val values: Array[Double]) extends Vector {
 
-  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
-s" indices match the dimension of the values. You provided ${indices.length} indices and " +
-s" ${values.length} values.")
-  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
-s"which exceeds the specified vector size ${size}.")
+  validate()
--- End diff --

Two reasons:
* It groups the validation code together.
* I may define some temporary variables for validation; without a method they
would become fields of SparseVector.
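
A minimal sketch of the second point, assuming a simplified stand-in class: `SparseVec` is hypothetical and is not Spark's `SparseVector`, and the specific checks (including the strictly increasing index check) are illustrative assumptions rather than the checks added in this PR. In Scala, a `val` or `var` written directly in the class body becomes a field of every instance, while the same variable inside a method stays local.

```scala
// Hypothetical class, not Spark's SparseVector: a simplified stand-in.
class SparseVec(val size: Int, val indices: Array[Int], val values: Array[Double]) {

  // Run all checks once, at construction time.
  validate()

  // All checks live in one place. `prev` is local to this method,
  // so it does not become a field of SparseVec.
  private def validate(): Unit = {
    require(size >= 0, s"Vector size must be non-negative, got $size.")
    require(indices.length == values.length,
      s"Got ${indices.length} indices and ${values.length} values.")
    var prev = -1
    indices.foreach { i =>
      require(prev < i && i < size,
        s"Index $i is out of order or outside [0, $size).")
      prev = i
    }
  }
}
```

With this sketch, `new SparseVec(3, Array(0, 2), Array(1.0, 2.0))` constructs normally, while `new SparseVec(3, Array(0, 5), Array(1.0, 2.0))` throws an `IllegalArgumentException` from `require`.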



[GitHub] spark issue #14557: [SPARK-16709][CORE] Kill the running task if stage faile...

2016-08-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14557
  
**[Test build #63424 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63424/consoleFull)** for PR 14557 at commit [`9263678`](https://github.com/apache/spark/commit/926367815a262c89a24f86fd735348f493e64881).


