[GitHub] spark issue #16720: [SPARK-19387][SPARKR] Tests do not run with SparkR sourc...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16720
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72103/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16720: [SPARK-19387][SPARKR] Tests do not run with SparkR sourc...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16720
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16720: [SPARK-19387][SPARKR] Tests do not run with SparkR sourc...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16720
  
**[Test build #72103 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72103/testReport)** for PR 16720 at commit [`f51f504`](https://github.com/apache/spark/commit/f51f504acb3a64da27bf0bddbb156c68d62d89bb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16720: [SPARK-19387][SPARKR] Tests do not run with SparkR sourc...

2017-01-27 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16720
  
Sure, I've simplified it.

Good point on the ordering. Digging into it, it looks like it's just file-system search order, which really is not reliable.

We could certainly add a test util, though some tests are different; for example, test_context doesn't need a SparkSession.





[GitHub] spark issue #16720: [SPARK-19387][SPARKR] Tests do not run with SparkR sourc...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16720
  
**[Test build #72103 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72103/testReport)** for PR 16720 at commit [`f51f504`](https://github.com/apache/spark/commit/f51f504acb3a64da27bf0bddbb156c68d62d89bb).





[GitHub] spark pull request #16721: [SPARKR][DOCS] update R API doc for subset/extrac...

2017-01-27 Thread felixcheung
GitHub user felixcheung reopened a pull request:

https://github.com/apache/spark/pull/16721

[SPARKR][DOCS] update R API doc for subset/extract

## What changes were proposed in this pull request?

With extract `[[` or replace `[[<-`, the parameter `i` is a column index; that needs to be corrected in the doc. Also a few minor updates: examples, links.

## How was this patch tested?

manual


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixcheung/spark rsubsetdoc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16721.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16721


commit 2c1f67353b6049e7679947d9c6c1e9901d7e1c9f
Author: Felix Cheung 
Date:   2017-01-27T22:50:09Z

update doc

commit bff1e56af55fdbf5e216d49a5e673cee6085cc13
Author: Felix Cheung 
Date:   2017-01-27T22:58:00Z

vignettes error

commit de56852daf03de33fbc6dfa0280e1ea5f5f32cc7
Author: Felix Cheung 
Date:   2017-01-27T23:00:19Z

do not link to rename







[GitHub] spark pull request #16721: [SPARKR][DOCS] update R API doc for subset/extrac...

2017-01-27 Thread felixcheung
Github user felixcheung closed the pull request at:

https://github.com/apache/spark/pull/16721





[GitHub] spark issue #16636: [SPARK-19279] [SQL] Infer Schema for Hive Serde Tables a...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16636
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72101/
Test PASSed.





[GitHub] spark issue #16636: [SPARK-19279] [SQL] Infer Schema for Hive Serde Tables a...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16636
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16636: [SPARK-19279] [SQL] Infer Schema for Hive Serde Tables a...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16636
  
**[Test build #72101 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72101/testReport)** for PR 16636 at commit [`c5cfa1a`](https://github.com/apache/spark/commit/c5cfa1afe3896ab92b34b833ba6c18cfce88b224).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16726: [SPARK-19390][SQL] Replace the unnecessary usages of hiv...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16726
  
**[Test build #72102 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72102/testReport)** for PR 16726 at commit [`e9e7486`](https://github.com/apache/spark/commit/e9e748601da00e594f73b68fe31f3b80385a0bac).





[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16724
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72100/
Test PASSed.





[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16724
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16724
  
**[Test build #72100 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72100/testReport)** for PR 16724 at commit [`aaa3c3d`](https://github.com/apache/spark/commit/aaa3c3dd42a9e1b79d30d31790560464be6df6c1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16726: [SPARK-19390][SQL] Replace the unnecessary usages...

2017-01-27 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/16726

[SPARK-19390][SQL] Replace the unnecessary usages of hiveQlTable

### What changes were proposed in this pull request?
`catalogTable` is the native table metadata structure for Spark SQL, so we should avoid using Hive's table metadata structure `Table` in our code base. This PR replaces it.

### How was this patch tested?
The existing test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark cleanupMetastoreRelation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16726.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16726


commit e9e748601da00e594f73b68fe31f3b80385a0bac
Author: gatorsmile 
Date:   2017-01-28T06:11:12Z

replace hiveQlTable by CatalogTable







[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

2017-01-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16700#discussion_r98325754
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
          spec, partitionColumnNames, tablePath)
        try {
          tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
+
+      // If the newSpec contains a partition more than one level deep, FileSystem.rename
+      // just deletes the leaf (i.e. wrongPath), so we should check whether wrongPath's
+      // parents need to be deleted. For example, given a newSpec 'A=1/B=2', after calling
+      // Hive's client.renamePartitions the location path in the FileSystem is changed to
+      // 'a=1/b=2', which is wrongPath. Then, although we renamed it to 'A=1/B=2' and
+      // 'a=1/b=2' is deleted from the FileSystem, 'a=1' still exists, and we also need
+      // to delete it.
+  val delHivePartPathAfterRename = getExtraPartPathCreatedByHive(
--- End diff --

So far, the partition rename DDL we support takes a single pair of partition specs, i.e. `ALTER TABLE table PARTITION spec1 RENAME TO PARTITION spec2`, so this is not an issue for end users.

Your concern looks reasonable, but I think we should not support multi-partition renaming in the SessionCatalog and ExternalCatalog; it just makes the error handling more complex. Let me remove it.





[GitHub] spark issue #13072: [SPARK-15288] [Mesos] Mesos dispatcher should handle gra...

2017-01-27 Thread devaraj-kavali
Github user devaraj-kavali commented on the issue:

https://github.com/apache/spark/pull/13072
  
MesosClusterDispatcher, like Executor, has multiple threads; when any one of them terminates due to an error/exception, the MesosClusterDispatcher process keeps running without performing the terminated thread's functionality. I think we need to handle uncaught exceptions from the MesosClusterDispatcher's threads with an UncaughtExceptionHandler and take action, instead of letting MesosClusterDispatcher keep running without performing that functionality and without notifying the user.
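For context, a minimal sketch of what such a process-wide handler could look like. The object name and the `onFatal` hook are illustrative only, not Spark's actual code; a real dispatcher would likely shut down or exit the process in `onFatal`:

```scala
object DispatcherExceptionHandler {
  /** Install a default handler so a thread dying with an uncaught exception
    * is reported and acted upon instead of being silently swallowed.
    * `onFatal` is the action to take (by default, terminate the JVM).
    */
  def install(onFatal: Throwable => Unit = _ => sys.exit(1)): Unit =
    Thread.setDefaultUncaughtExceptionHandler(
      new Thread.UncaughtExceptionHandler {
        override def uncaughtException(t: Thread, e: Throwable): Unit = {
          // Report which thread died and why, then delegate the fatal action.
          System.err.println(s"Uncaught exception in thread ${t.getName}: $e")
          onFatal(e)
        }
      })
}
```

With `install()` called at process start, a worker thread that dies from an unexpected exception no longer leaves the process running silently in a degraded state.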





[GitHub] spark issue #16725: [SPARK-19377] [WEBUI] [CORE] Killed tasks should have th...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16725
  
Can one of the admins verify this patch?





[GitHub] spark pull request #16725: [SPARK-19377] [WEBUI] [CORE] Killed tasks should ...

2017-01-27 Thread devaraj-kavali
GitHub user devaraj-kavali opened a pull request:

https://github.com/apache/spark/pull/16725

[SPARK-19377] [WEBUI] [CORE] Killed tasks should have the status as KILLED

## What changes were proposed in this pull request?

The killed status was not being copied when building the newTaskInfo object, which drops unnecessary details to reduce memory usage. This patch copies the killed status into the newTaskInfo object, so the Web UI shows KILLED instead of the wrong status.
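The pattern at issue can be sketched with a simplified, self-contained stand-in for the task record (class and field names are illustrative, not Spark's actual TaskInfo): when a trimmed copy is rebuilt field by field, every status flag must be carried over, or the derived status is wrong.

```scala
// Simplified stand-in for a task record; fields are illustrative.
class TaskInfo(val taskId: Long) {
  var finished: Boolean = false
  var failed: Boolean = false
  var killed: Boolean = false
  var accumulables: Seq[String] = Nil  // the "heavy" details dropped for the UI

  def status: String =
    if (killed) "KILLED"
    else if (failed) "FAILED"
    else if (finished) "SUCCESS"
    else "RUNNING"
}

// Build the trimmed record for the UI: drop accumulables, but copy every
// status flag. Forgetting `killed` here is the kind of omission the patch
// fixes; a killed task would then display as SUCCESS or RUNNING.
def newTaskInfo(info: TaskInfo): TaskInfo = {
  val trimmed = new TaskInfo(info.taskId)
  trimmed.finished = info.finished
  trimmed.failed = info.failed
  trimmed.killed = info.killed
  trimmed
}
```

A field-by-field clone like this is fragile whenever a new flag is added upstream, which is why the bug could slip in unnoticed.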

## How was this patch tested?

Current behaviour of displaying tasks in stage UI page,

| Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | GC Time | Input Size / Records | Write Time | Shuffle Write Size / Records | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 143 | 10 | 0 | SUCCESS | NODE_LOCAL | 6 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |
| 156 | 11 | 0 | SUCCESS | NODE_LOCAL | 5 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |



Web UI display after applying the patch,

| Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | GC Time | Input Size / Records | Write Time | Shuffle Write Size / Records | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 143 | 10 | 0 | KILLED | NODE_LOCAL | 6 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |
| 156 | 11 | 0 | KILLED | NODE_LOCAL | 5 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/devaraj-kavali/spark SPARK-19377

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16725.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16725


commit 6206d109b646e55223a4b162a37e70f42f4570a1
Author: Devaraj K 
Date:   2017-01-28T05:53:21Z

[SPARK-19377] [WEBUI] [CORE] Killed tasks should have the status as KILLED







[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

2017-01-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16700#discussion_r98325213
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
          spec, partitionColumnNames, tablePath)
        try {
          tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
+
+      // If the newSpec contains a partition more than one level deep, FileSystem.rename
+      // just deletes the leaf (i.e. wrongPath), so we should check whether wrongPath's
+      // parents need to be deleted. For example, given a newSpec 'A=1/B=2', after calling
+      // Hive's client.renamePartitions the location path in the FileSystem is changed to
+      // 'a=1/b=2', which is wrongPath. Then, although we renamed it to 'A=1/B=2' and
+      // 'a=1/b=2' is deleted from the FileSystem, 'a=1' still exists, and we also need
+      // to delete it.
+  val delHivePartPathAfterRename = getExtraPartPathCreatedByHive(
--- End diff --

`client.renamePartitions` is called at the beginning of `renamePartitions` for all specs at once. It creates the directories `a=1`, `a=1/b=2`, and `a=1/b=3`.

When you iterate over the specs and rename the directories with FileSystem.rename, in the first iteration `a=1/b=2` is renamed and, with this change, `a=1` is deleted, which deletes `a=1/b=3` too. So in the next iteration, the renaming of `a=1/b=3` to `A=1/B=3` will fail.
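The ordering problem described above can be reproduced on a local file system. The sketch below uses plain `java.nio.file` rather than the Hadoop `FileSystem` API from the PR, and the eager deletion of the stale parent is a deliberately simplified stand-in for the cleanup step under discussion:

```scala
import java.nio.file.{Files, Path}
import java.util.Comparator

object RenameOrderingDemo {
  /** Returns whether 'a=1/b=3' survives the first iteration's cleanup. */
  def run(): Boolean = {
    val root = Files.createTempDirectory("tbl")
    // State left behind by Hive's client.renamePartitions: lower-cased paths.
    Files.createDirectories(root.resolve("a=1/b=2"))
    Files.createDirectories(root.resolve("a=1/b=3"))

    // First iteration: rename 'a=1/b=2' to the correctly-cased 'A=1/B=2'.
    Files.createDirectories(root.resolve("A=1"))
    Files.move(root.resolve("a=1/b=2"), root.resolve("A=1/B=2"))

    // Eager cleanup of the stale parent 'a=1' (simplified): deleting it
    // recursively also removes 'a=1/b=3', the next iteration's rename source.
    val walk = Files.walk(root.resolve("a=1"))
    try {
      walk.sorted(Comparator.reverseOrder[Path]()).forEach(p => Files.delete(p))
    } finally walk.close()

    Files.exists(root.resolve("a=1/b=3"))
  }
}
```

`RenameOrderingDemo.run()` returns `false`: `a=1/b=3` is gone before its own rename can run, which is the failure viirya describes.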





[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

2017-01-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16700#discussion_r98325110
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
          spec, partitionColumnNames, tablePath)
        try {
          tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
+
+      // If the newSpec contains a partition more than one level deep, FileSystem.rename
+      // just deletes the leaf (i.e. wrongPath), so we should check whether wrongPath's
+      // parents need to be deleted. For example, given a newSpec 'A=1/B=2', after calling
+      // Hive's client.renamePartitions the location path in the FileSystem is changed to
+      // 'a=1/b=2', which is wrongPath. Then, although we renamed it to 'A=1/B=2' and
+      // 'a=1/b=2' is deleted from the FileSystem, 'a=1' still exists, and we also need
+      // to delete it.
+  val delHivePartPathAfterRename = getExtraPartPathCreatedByHive(
--- End diff --

The path `a=1` was created when you called `client.renamePartitions`, right? Based on my understanding, when you rename `A=1/B=3`, Hive will create the directories `a=1` and `a=1/b=3`, so the rename will not fail. Have you tried it?





[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16722
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72099/
Test FAILed.





[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16722
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16722
  
**[Test build #72099 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72099/testReport)** for PR 16722 at commit [`2112720`](https://github.com/apache/spark/commit/21127206db1c42710e63174904267663c9d92790).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16636: [SPARK-19279] [SQL] Infer Schema for Hive Serde T...

2017-01-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16636#discussion_r98324289
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -455,4 +462,133 @@ private[spark] object HiveUtils extends Logging {
     case (decimal, DecimalType()) => decimal.toString
     case (other, tpe) if primitiveTypes contains tpe => other.toString
   }
+
+  /** Converts the native StructField to Hive's FieldSchema. */
+  private def toHiveColumn(c: StructField): FieldSchema = {
+    val typeString = if (c.metadata.contains(HiveUtils.hiveTypeString)) {
+      c.metadata.getString(HiveUtils.hiveTypeString)
+    } else {
+      c.dataType.catalogString
+    }
+    new FieldSchema(c.name, typeString, c.getComment.orNull)
+  }
+
+  /** Builds the native StructField from Hive's FieldSchema. */
+  private def fromHiveColumn(hc: FieldSchema): StructField = {
+    val columnType = try {
+      CatalystSqlParser.parseDataType(hc.getType)
+    } catch {
+      case e: ParseException =>
+        throw new SparkException("Cannot recognize hive type string: " + hc.getType, e)
+    }
+
+    val metadata = new MetadataBuilder().putString(HiveUtils.hiveTypeString, hc.getType).build()
+    val field = StructField(
+      name = hc.getName,
+      dataType = columnType,
+      nullable = true,
+      metadata = metadata)
+    Option(hc.getComment).map(field.withComment).getOrElse(field)
+  }
+
+  // TODO: merge this with HiveClientImpl#toHiveTable
--- End diff --

So far, merging them is a little bit tricky, because our execution uses Hive 1.2.1, but the Hive metadata APIs support versions from 0.12 to 1.2. Thus, it does not make sense to do it.

Also, schema inference does not currently use the metadata Hive client. I checked the code; the changes between 0.12 and 1.2 look fine to me, and schema inference should work correctly. I think I need to add a test case to VersionSuite.scala.





[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16724
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16724
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72097/
Test PASSed.





[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16724
  
**[Test build #72097 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72097/testReport)** for PR 16724 at commit [`b84c08b`](https://github.com/apache/spark/commit/b84c08bd6f1d66e09cafa9026b7da48b3f67ece4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16636: [SPARK-19279] [SQL] Block Creating a Hive Table W...

2017-01-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16636#discussion_r98324106
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -455,4 +462,133 @@ private[spark] object HiveUtils extends Logging {
     case (decimal, DecimalType()) => decimal.toString
     case (other, tpe) if primitiveTypes contains tpe => other.toString
   }
+
+  /** Converts the native StructField to Hive's FieldSchema. */
+  private def toHiveColumn(c: StructField): FieldSchema = {
+    val typeString = if (c.metadata.contains(HiveUtils.hiveTypeString)) {
+      c.metadata.getString(HiveUtils.hiveTypeString)
+    } else {
+      c.dataType.catalogString
+    }
+    new FieldSchema(c.name, typeString, c.getComment.orNull)
+  }
+
+  /** Builds the native StructField from Hive's FieldSchema. */
+  private def fromHiveColumn(hc: FieldSchema): StructField = {
+    val columnType = try {
+      CatalystSqlParser.parseDataType(hc.getType)
+    } catch {
+      case e: ParseException =>
+        throw new SparkException("Cannot recognize hive type string: " + hc.getType, e)
+    }
+
+    val metadata = new MetadataBuilder().putString(HiveUtils.hiveTypeString, hc.getType).build()
+    val field = StructField(
+      name = hc.getName,
+      dataType = columnType,
+      nullable = true,
+      metadata = metadata)
+    Option(hc.getComment).map(field.withComment).getOrElse(field)
+  }
+
+  // TODO: merge this with HiveClientImpl#toHiveTable
+  /** Converts the native table metadata representation format CatalogTable to Hive's Table. */
+  def toHiveTable(catalogTable: CatalogTable): HiveTable = {
+    // We start by constructing an API table as Hive performs several important transformations
+    // internally when converting an API table to a QL table.
+    val tTable = new org.apache.hadoop.hive.metastore.api.Table()
+    tTable.setTableName(catalogTable.identifier.table)
+    tTable.setDbName(catalogTable.database)
+
+    val tableParameters = new java.util.HashMap[String, String]()
+    tTable.setParameters(tableParameters)
+    catalogTable.properties.foreach { case (k, v) => tableParameters.put(k, v) }
+
+    tTable.setTableType(catalogTable.tableType match {
+      case CatalogTableType.EXTERNAL => HiveTableType.EXTERNAL_TABLE.toString
+      case CatalogTableType.MANAGED => HiveTableType.MANAGED_TABLE.toString
+      case CatalogTableType.VIEW => HiveTableType.VIRTUAL_VIEW.toString
+    })
+
+    val sd = new org.apache.hadoop.hive.metastore.api.StorageDescriptor()
+    tTable.setSd(sd)
+
+    // Note: In Hive the schema and partition columns must be disjoint sets
+    val (partCols, schema) = catalogTable.schema.map(toHiveColumn).partition { c =>
+      catalogTable.partitionColumnNames.contains(c.getName)
+    }
+    sd.setCols(schema.asJava)
+    tTable.setPartitionKeys(partCols.asJava)
+
+    catalogTable.storage.locationUri.foreach(sd.setLocation)
+    catalogTable.storage.inputFormat.foreach(sd.setInputFormat)
+    catalogTable.storage.outputFormat.foreach(sd.setOutputFormat)
+
+    val serdeInfo = new org.apache.hadoop.hive.metastore.api.SerDeInfo
+    catalogTable.storage.serde.foreach(serdeInfo.setSerializationLib)
+    sd.setSerdeInfo(serdeInfo)
+
+    val serdeParameters = new java.util.HashMap[String, String]()
+    catalogTable.storage.properties.foreach { case (k, v) => serdeParameters.put(k, v) }
+    serdeInfo.setParameters(serdeParameters)
+
+    new HiveTable(tTable)
+  }
+
+  /**
+   * Converts the native partition metadata representation format CatalogTablePartition to
+   * Hive's Partition.
+   */
+  def toHivePartition(
+      catalogTable: CatalogTable,
+      hiveTable: HiveTable,
+      partition: CatalogTablePartition): HivePartition = {
+    val tPartition = new org.apache.hadoop.hive.metastore.api.Partition
+    tPartition.setDbName(catalogTable.database)
+    tPartition.setTableName(catalogTable.identifier.table)
+    tPartition.setValues(catalogTable.partitionColumnNames.map(partition.spec(_)).asJava)
+
+    val sd = new org.apache.hadoop.hive.metastore.api.StorageDescriptor()
+    tPartition.setSd(sd)
+
+    // Note: In Hive the schema and partition columns must be disjoint sets
+    val schema = catalogTable.schema.map(toHiveColumn).filter { c =>
+      !catalogTable.partitionColumnNames.contains(c.getName)
+    }
+    sd.setCols(schema.asJava)
+
+    partition.storage.locationUri.foreach(sd.setLocation)
+    partition.storage.inputFormat.foreach(sd.setInputFormat)
+
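The schema/partition-key split performed in the quoted `toHiveTable` can be sketched without any Hive or Spark classes. `Field` below is a hypothetical stand-in for Hive's `FieldSchema`; this illustrates only the disjoint-sets partitioning logic, not the real conversion:

```scala
// Minimal sketch of the partition-key split in toHiveTable above.
// `Field` is a hypothetical stand-in for Hive's FieldSchema.
object PartitionSplitSketch {
  final case class Field(name: String, typ: String)

  // In Hive, data columns and partition columns must be disjoint sets,
  // so the full schema is partitioned on the partition-column names.
  def split(
      schema: Seq[Field],
      partitionColumnNames: Set[String]): (Seq[Field], Seq[Field]) =
    schema.partition(f => partitionColumnNames.contains(f.name))

  def main(args: Array[String]): Unit = {
    val schema = Seq(
      Field("id", "bigint"),
      Field("ds", "string"),
      Field("value", "double"))
    val (partCols, dataCols) = split(schema, Set("ds"))
    // "ds" becomes a partition key; the rest stay in the storage descriptor.
    assert(partCols.map(_.name) == Seq("ds"))
    assert(dataCols.map(_.name) == Seq("id", "value"))
  }
}
```

As in the real code, each column appears in exactly one of the two result lists, which is what `sd.setCols` / `tTable.setPartitionKeys` rely on.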

[GitHub] spark issue #16636: [SPARK-19279] [SQL] Block Creating a Hive Table With an ...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16636
  
**[Test build #72101 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72101/testReport)**
 for PR 16636 at commit 
[`c5cfa1a`](https://github.com/apache/spark/commit/c5cfa1afe3896ab92b34b833ba6c18cfce88b224).





[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16724
  
**[Test build #72100 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72100/testReport)**
 for PR 16724 at commit 
[`aaa3c3d`](https://github.com/apache/spark/commit/aaa3c3dd42a9e1b79d30d31790560464be6df6c1).





[GitHub] spark pull request #16636: [SPARK-19279] [SQL] Block Creating a Hive Table W...

2017-01-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16636#discussion_r98323927
  

[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-27 Thread lw-lin
Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16719#discussion_r98322977
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala ---
@@ -78,14 +78,18 @@ object Canonicalize extends {
     case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
     case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
 
-    case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => GreaterThan(r, l)
-    case Not(GreaterThan(l, r)) => LessThanOrEqual(l, r)
-    case Not(LessThan(l, r)) if l.hashCode() > r.hashCode() => LessThan(r, l)
-    case Not(LessThan(l, r)) => GreaterThanOrEqual(l, r)
-    case Not(GreaterThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
-    case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r)
-    case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
-    case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r)
+    case Not(GreaterThan(l, r)) =>
+      assert(l.hashCode() <= r.hashCode())
--- End diff --

thanks! Maybe an alternative is to add a comment saying it's guaranteed that `l.hashCode <= r.hashCode`; otherwise people might wonder at first glance why there is no `case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode()` arm.
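The invariant under discussion can be illustrated with a stripped-down rule set (plain case classes standing in for Catalyst expressions; this is an illustration, not Spark's actual `Canonicalize` code): once operand-ordering rules have run, every comparison satisfies `l.hashCode() <= r.hashCode()`, so the `Not` cases need no hash-code guard.

```scala
// Illustration of the invariant discussed above. By the time a Not(cmp)
// rule fires, an earlier rule has already swapped the comparison's operands
// so that l.hashCode() <= r.hashCode() holds.
object CanonSketch {
  sealed trait Expr
  final case class Lit(v: Int) extends Expr
  final case class GT(l: Expr, r: Expr) extends Expr // l >  r
  final case class LT(l: Expr, r: Expr) extends Expr // l <  r
  final case class GE(l: Expr, r: Expr) extends Expr // l >= r
  final case class LE(l: Expr, r: Expr) extends Expr // l <= r
  final case class Not(e: Expr) extends Expr

  def canonicalize(e: Expr): Expr = e match {
    // Step 1: order operands of every comparison by hash code,
    // flipping the operator so the meaning is preserved.
    case GT(l, r) if l.hashCode() > r.hashCode() => LT(r, l)
    case LT(l, r) if l.hashCode() > r.hashCode() => GT(r, l)
    case GE(l, r) if l.hashCode() > r.hashCode() => LE(r, l)
    case LE(l, r) if l.hashCode() > r.hashCode() => GE(r, l)
    // Step 2: push Not into an already-canonicalized comparison; the
    // guard-free patterns are safe because step 1 already ran on the child.
    case Not(inner) => canonicalize(inner) match {
      case GT(l, r) => LE(l, r)
      case LT(l, r) => GE(l, r)
      case GE(l, r) => LT(l, r)
      case LE(l, r) => GT(l, r)
      case other    => Not(other)
    }
    case other => other
  }
}
```

Whatever the operands' hash codes, `Not(a > b)` and `a <= b` canonicalize to the same form, which is the property the test suite checks.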





[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16724
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16724
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72096/
Test PASSed.





[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16724
  
**[Test build #72096 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72096/testReport)**
 for PR 16724 at commit 
[`93d3806`](https://github.com/apache/spark/commit/93d380620c411dc33c14a4787f2ceee28e9c155c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-27 Thread lw-lin
Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16719#discussion_r98322886
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---
@@ -75,10 +107,14 @@ class ExpressionSetSuite extends SparkFunSuite {
   setTest(1, aUpper >= bUpper, bUpper <= aUpper)
 
   // `Not` canonicalization
-  setTest(1, Not(aUpper > 1), aUpper <= 1, Not(Literal(1) < aUpper), Literal(1) >= aUpper)
-  setTest(1, Not(aUpper < 1), aUpper >= 1, Not(Literal(1) > aUpper), Literal(1) <= aUpper)
-  setTest(1, Not(aUpper >= 1), aUpper < 1, Not(Literal(1) <= aUpper), Literal(1) > aUpper)
-  setTest(1, Not(aUpper <= 1), aUpper > 1, Not(Literal(1) >= aUpper), Literal(1) < aUpper)
+  setTest(1, Not(maxHash > 1), maxHash <= 1, Not(Literal(1) < maxHash), Literal(1) >= maxHash)
+  setTest(1, Not(minHash > 1), minHash <= 1, Not(Literal(1) < minHash), Literal(1) >= minHash)
+  setTest(1, Not(maxHash < 1), maxHash >= 1, Not(Literal(1) > maxHash), Literal(1) <= maxHash)
+  setTest(1, Not(minHash < 1), minHash >= 1, Not(Literal(1) > minHash), Literal(1) <= minHash)
+  setTest(1, Not(maxHash >= 1), maxHash < 1, Not(Literal(1) <= maxHash), Literal(1) > maxHash)
+  setTest(1, Not(minHash >= 1), minHash < 1, Not(Literal(1) <= minHash), Literal(1) > minHash)
+  setTest(1, Not(maxHash <= 1), maxHash > 1, Not(Literal(1) >= maxHash), Literal(1) < maxHash)
+  setTest(1, Not(minHash <= 1), minHash > 1, Not(Literal(1) >= minHash), Literal(1) < minHash)
--- End diff --

Yea, sure, they are covered correctly even prior to this patch's changes!

The previous `aUpper`'s hashCode is either greater than or less than `1`'s hashCode, but cannot be both, while this change aims to test both cases -- but I'm quite open to reverting the changes if they are considered unnecessary.





[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16722
  
**[Test build #72099 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72099/testReport)**
 for PR 16722 at commit 
[`2112720`](https://github.com/apache/spark/commit/21127206db1c42710e63174904267663c9d92790).





[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

2017-01-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16700#discussion_r98322377
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
           spec, partitionColumnNames, tablePath)
         try {
           tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
+
+          // If newSpec contains a partition spec of more than one level, FileSystem.rename
+          // only deletes the leaf (i.e. wrongPath), so we should check whether wrongPath's
+          // parents also need to be deleted. For example, given a newSpec 'A=1/B=2', after
+          // calling Hive's client.renamePartitions the location path in the FileSystem is
+          // changed to 'a=1/b=2', which is wrongPath. Although we then renamed it to
+          // 'A=1/B=2' and 'a=1/b=2' was removed from the FileSystem, 'a=1' still exists
+          // and needs to be deleted as well.
+          val delHivePartPathAfterRename = getExtraPartPathCreatedByHive(
--- End diff --

Hmmm, could there possibly be multiple specs sharing the same parent directory, e.g., 'A=1/B=2', 'A=1/B=3', ...?

If so, once you delete the path 'a=1' here, I think the rename will fail when processing the next spec 'A=1/B=3'.
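The hazard described here can be sketched with plain path strings (a toy model, not the real Hadoop `Path` code): the stale lower-cased parent directory must not be deleted while another spec still lives under it.

```scala
// Sketch of why eagerly deleting the stale parent directory is unsafe when
// several partition specs share it. Plain strings stand in for filesystem
// paths; lower-cased paths model what Hive actually wrote on disk.
object StaleParentSketch {
  def parent(p: String): String = p.substring(0, p.lastIndexOf('/'))

  def main(args: Array[String]): Unit = {
    val specs = Seq("A=1/B=2", "A=1/B=3")
    val wrongPaths = specs.map(_.toLowerCase) // a=1/b=2, a=1/b=3
    // After renaming a=1/b=2 -> A=1/B=2, the stale parent is "a=1" --
    // but it is still the parent of a=1/b=3, so it may only be removed
    // once no remaining spec lives under it.
    val staleParent = parent(wrongPaths.head) // "a=1"
    val stillNeeded = wrongPaths.tail.exists(p => parent(p) == staleParent)
    assert(stillNeeded) // deleting "a=1" now would break the next rename
  }
}
```

A safe cleanup would therefore defer (or re-check) the parent deletion until all specs sharing that directory have been processed.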





[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16722
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72098/
Test FAILed.





[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16722
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16722
  
**[Test build #72098 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72098/testReport)**
 for PR 16722 at commit 
[`8278724`](https://github.com/apache/spark/commit/827872489194e46421263c28231beb9cb6646dfe).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16722
  
**[Test build #72098 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72098/testReport)**
 for PR 16722 at commit 
[`8278724`](https://github.com/apache/spark/commit/827872489194e46421263c28231beb9cb6646dfe).





[GitHub] spark issue #14725: [SPARK-17161] [PYSPARK][ML] Add PySpark-ML JavaWrapper c...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14725
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14725: [SPARK-17161] [PYSPARK][ML] Add PySpark-ML JavaWrapper c...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14725
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72092/
Test PASSed.





[GitHub] spark issue #14725: [SPARK-17161] [PYSPARK][ML] Add PySpark-ML JavaWrapper c...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14725
  
**[Test build #72092 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72092/testReport)**
 for PR 14725 at commit 
[`8b401ec`](https://github.com/apache/spark/commit/8b401ecc814a31f600da3fe28c4ff393bd4b3269).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16724
  
**[Test build #72097 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72097/testReport)**
 for PR 16724 at commit 
[`b84c08b`](https://github.com/apache/spark/commit/b84c08bd6f1d66e09cafa9026b7da48b3f67ece4).





[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

2017-01-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16700
  
Thanks! Merging it to master.

You can address the minor comments in your other PRs.





[GitHub] spark pull request #16716: [SPARK-19378][SS] Ensure continuity of stateOpera...

2017-01-27 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16716#discussion_r98320764
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryStatusAndProgressSuite.scala ---
@@ -171,6 +174,42 @@ class StreamingQueryStatusAndProgressSuite extends StreamTest {
       query.stop()
     }
   }
+
+  test("SPARK-19378: Continue reporting stateOp metrics even if there is no active trigger") {
+    import testImplicits._
+
+    withSQLConf(SQLConf.STREAMING_NO_DATA_PROGRESS_EVENT_INTERVAL.key -> "10") {
+      val inputData = MemoryStream[Int]
+
+      val query = inputData.toDS().toDF("value")
+        .select('value)
+        .groupBy($"value")
+        .agg(count("*"))
+        .writeStream
+        .queryName("metric_continuity")
+        .format("memory")
+        .outputMode("complete")
+        .start()
+      try {
+        inputData.addData(1, 2)
+        query.processAllAvailable()
+
+        val progress = query.lastProgress
+        assert(progress.stateOperators.length > 0)
+        // Should emit new progresses every 10 ms, but we could be facing a slow Jenkins
+        eventually(timeout(1 minute)) {
+          val nextProgress = query.lastProgress
+          assert(nextProgress.timestamp !== progress.timestamp)
+          assert(nextProgress.numInputRows === 0)
+          assert(nextProgress.stateOperators.head.numRowsTotal === 2)
+          assert(nextProgress.stateOperators.head.numRowsTotal === 2)
--- End diff --

why is this line twice?





[GitHub] spark issue #16720: [SPARK-19387][SPARKR] Tests do not run with SparkR sourc...

2017-01-27 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/16720
  
I am not sure the tests are ever meant to run on a cluster (see the number of uses of LocalSparkContext in core/src/test/scala). The main reason I don't want to introduce the 'first test' approach is that we would then rely too much on test names not clashing / getting in front of each other, which seems fragile.

The other thing that might be good is to create a test util function like `initializeTestSparkContext`, and inside it put both the session start and the install steps.
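A helper like the suggested `initializeTestSparkContext` (a proposed name in this thread, not an existing API) amounts to a lazily initialized shared fixture: setup runs exactly once regardless of which test touches it first. A generic, non-Spark sketch of that idea:

```scala
// Generic sketch of the proposed test-setup helper: the expensive
// initialization (session start, package install, etc.) runs exactly once,
// no matter which test calls it first, so test ordering no longer matters.
object TestFixture {
  @volatile private var initCount = 0

  // On the JVM, `lazy val` gives thread-safe, once-only initialization.
  lazy val context: String = {
    initCount += 1
    "initialized-context" // stand-in for the real session-start + install steps
  }

  // Forces initialization and reports how many times it actually ran.
  def initializations: Int = { context; initCount }
}
```

Every test then calls the helper instead of relying on a specially named "first test" to perform setup.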





[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16699
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72095/
Test PASSed.





[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16699
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16723: [SPARK-19389][ML][PYTHON][DOC] Minor doc fixes for ML Py...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16723
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72094/
Test PASSed.





[GitHub] spark issue #16723: [SPARK-19389][ML][PYTHON][DOC] Minor doc fixes for ML Py...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16723
  
**[Test build #72094 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72094/testReport)**
 for PR 16723 at commit 
[`fa522e6`](https://github.com/apache/spark/commit/fa522e6fba43faf481935f1204ddf9c13d82227f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16699
  
**[Test build #72095 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72095/testReport)**
 for PR 16699 at commit 
[`52bc32b`](https://github.com/apache/spark/commit/52bc32b2d86b2cd5ce092f86ee61f8fe9aebec5d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16723: [SPARK-19389][ML][PYTHON][DOC] Minor doc fixes for ML Py...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16723
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16043
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72088/
Test PASSed.





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16043
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16043
  
**[Test build #72088 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72088/testReport)**
 for PR 16043 at commit 
[`91001fc`](https://github.com/apache/spark/commit/91001fc97af2e80272ef29f90038cc99283ca258).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16650
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16650
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72087/
Test PASSed.





[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16650
  
**[Test build #72087 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72087/testReport)**
 for PR 16650 at commit 
[`eed4112`](https://github.com/apache/spark/commit/eed4112c092d49b4eafab363f3b0a16d83ec7c9d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

2017-01-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16700





[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

2017-01-27 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16724
  
cc @cloud-fan @rxin @hvanhovell 





[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

2017-01-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16700#discussion_r98317878
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
   spec, partitionColumnNames, tablePath)
 try {
   tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
+
+  // If the newSpec contains more than one depth partition, 
FileSystem.rename just deletes
+  // the leaf(i.e. wrongPath), we should check if wrongPath's 
parents need to be deleted.
+  // For example, give a newSpec 'A=1/B=2', after calling Hive's 
client.renamePartitions,
+  // the location path in FileSystem is changed to 'a=1/b=2', 
which is wrongPath, then
+  // although we renamed it to 'A=1/B=2', 'a=1/b=2' in FileSystem 
is deleted, but 'a=1'
--- End diff --

Either `although` or `but` needs to be deleted. 
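The ancestor-cleanup idea discussed in the quoted comment can be sketched as pure path logic (Python, hypothetical helper name; a sketch of the idea, not Spark's actual implementation): once the leaf directory has been renamed away, walk up from the wrong lowercase path toward the table root and collect the now-redundant parent directories, deepest first.

```python
from pathlib import PurePosixPath

def extra_paths_to_delete(table_path, wrong_path):
    """Return the ancestors of wrong_path strictly between it and
    table_path, deepest first. After the leaf (wrong_path) has been
    renamed to the correctly cased location, these are the leftover
    lowercase parents that should be checked and deleted."""
    table = PurePosixPath(table_path)
    current = PurePosixPath(wrong_path)
    ancestors = []
    # Stop once the parent is the table root (or we hit the filesystem root).
    while current.parent != table and current.parent != current:
        current = current.parent
        ancestors.append(str(current))
    return ancestors

# Mirroring the comment's example: for a two-level spec a=1/b=2, renaming
# the leaf leaves the single extra parent a=1 behind.
extra_paths_to_delete("/warehouse/t", "/warehouse/t/a=1/b=2")
```

For a one-level partition spec the list is empty, since the leaf's parent is the table path itself and nothing extra is left behind.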





[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

2017-01-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16700#discussion_r98317808
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
   spec, partitionColumnNames, tablePath)
 try {
   tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
+
+  // If the newSpec contains more than one depth partition, 
FileSystem.rename just deletes
+  // the leaf(i.e. wrongPath), we should check if wrongPath's 
parents need to be deleted.
+  // For example, give a newSpec 'A=1/B=2', after calling Hive's 
client.renamePartitions,
+  // the location path in FileSystem is changed to 'a=1/b=2', 
which is wrongPath, then
--- End diff --

`, then` -> `. Then`





[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

2017-01-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16700#discussion_r98317719
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
   spec, partitionColumnNames, tablePath)
 try {
   tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
+
+  // If the newSpec contains more than one depth partition, 
FileSystem.rename just deletes
+  // the leaf(i.e. wrongPath), we should check if wrongPath's 
parents need to be deleted.
+  // For example, give a newSpec 'A=1/B=2', after calling Hive's 
client.renamePartitions,
--- End diff --

`give` -> `given`





[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

2017-01-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16700#discussion_r98317671
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -839,6 +839,26 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
 spec.map { case (k, v) => partCols.find(_.equalsIgnoreCase(k)).get -> 
v }
   }
 
+
+  /**
+   * The partition path created by Hive is in lowercase, while Spark SQL 
will
+   * rename it with the partition name in partitionColumnNames, and this 
function
+   * returns the extra lowercase path created by Hive, and then we can 
delete it.
--- End diff --

Nit: all of them are commas. You need to use periods. : )





[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

2017-01-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16700#discussion_r98317696
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -839,6 +839,26 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
 spec.map { case (k, v) => partCols.find(_.equalsIgnoreCase(k)).get -> 
v }
   }
 
+
+  /**
+   * The partition path created by Hive is in lowercase, while Spark SQL 
will
+   * rename it with the partition name in partitionColumnNames, and this 
function
+   * returns the extra lowercase path created by Hive, and then we can 
delete it.
+   * e.g. /path/A=1/B=2/C=3 is changed to /path/A=4/B=5/C=6, this function 
returns
--- End diff --

The same issue here. 





[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16724
  
**[Test build #72096 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72096/testReport)**
 for PR 16724 at commit 
[`93d3806`](https://github.com/apache/spark/commit/93d380620c411dc33c14a4787f2ceee28e9c155c).





[GitHub] spark pull request #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows a...

2017-01-27 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/16724

[SPARK-19352][WIP][SQL] Keep sort order of rows after external sorter when 
writing

## What changes were proposed in this pull request?

WIP

## How was this patch tested?

Will add test case later.

Please review http://spark.apache.org/contributing.html before opening a 
pull request

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 
keep-sort-order-after-external-sorter

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16724.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16724


commit 93d380620c411dc33c14a4787f2ceee28e9c155c
Author: Liang-Chi Hsieh 
Date:   2017-01-28T00:45:11Z

Keep sort order of rows after external sorter when writing.







[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16699
  
**[Test build #72095 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72095/testReport)**
 for PR 16699 at commit 
[`52bc32b`](https://github.com/apache/spark/commit/52bc32b2d86b2cd5ce092f86ee61f8fe9aebec5d).





[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM

2017-01-27 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/16699
  
@zhengruifeng Thanks for the suggestions. Added casting and 
instrumentation. 
@imatiach-msft Thanks for the clarification! It is probably worth another 
PR to clean up all the tests in GLM.  
Let me know if there are any additional comments! 





[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16722
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72093/
Test FAILed.





[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16722
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16722
  
**[Test build #72093 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72093/testReport)**
 for PR 16722 at commit 
[`2729a63`](https://github.com/apache/spark/commit/2729a63a8f5eab7f0eb88a9b995175bda1f82a1e).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16723: [SPARK-19389][ML][PYTHON][DOC] Minor doc fixes for ML Py...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16723
  
**[Test build #72094 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72094/testReport)**
 for PR 16723 at commit 
[`fa522e6`](https://github.com/apache/spark/commit/fa522e6fba43faf481935f1204ddf9c13d82227f).





[GitHub] spark issue #16723: [SPARK-19389][ML][PYTHON][DOC] Minor doc fixes for ML Py...

2017-01-27 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/16723
  
@wangmiao1981  Would you mind checking this?  It has small fixes I noticed 
when reviewing your PR for Python LinearSVC.





[GitHub] spark pull request #16723: [SPARK-19389][ML][PYTHON][DOC] Minor doc fixes fo...

2017-01-27 Thread jkbradley
GitHub user jkbradley opened a pull request:

https://github.com/apache/spark/pull/16723

[SPARK-19389][ML][PYTHON][DOC] Minor doc fixes for ML Python Params and 
LinearSVC

## What changes were proposed in this pull request?

* Removed Since tags in Python Params since they are inherited by other 
classes
* Fixed doc links for LinearSVC

## How was this patch tested?

* doc tests
* generating docs locally and checking manually

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkbradley/spark pyparam-fix-doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16723.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16723


commit fa522e6fba43faf481935f1204ddf9c13d82227f
Author: Joseph K. Bradley 
Date:   2017-01-28T00:28:59Z

removed Since tags in Python Params since they are inherited by other 
classes.  fixed doc links for LinearSVC







[GitHub] spark pull request #16636: [SPARK-19279] [SQL] Block Creating a Hive Table W...

2017-01-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16636#discussion_r98315149
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -1527,6 +1527,21 @@ class DDLSuite extends QueryTest with 
SharedSQLContext with BeforeAndAfterEach {
 }
   }
 
+  test("create a data source table without schema") {
+import testImplicits._
+withTempPath { tempDir =>
+  withTable("tab1", "tab2") {
+(("a", "b") :: Nil).toDF().write.json(tempDir.getCanonicalPath)
+
+val e = intercept[AnalysisException] { sql("CREATE TABLE tab1 
USING json") }.getMessage
--- End diff --

This error message is not from the code added by this PR. It is from [the 
original 
logics](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L343-L345).
 The error message is right if our file-based data sources are unable to infer 
the schema.

Sure, I will add a test case for `LibSVM `





[GitHub] spark issue #16636: [SPARK-19279] [SQL] Block Creating a Hive Table With an ...

2017-01-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16636
  
: ) Done. Found a solution to infer the schema of Hive Serde tables. Let me 
clean the code now. 





[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16722
  
**[Test build #72093 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72093/testReport)**
 for PR 16722 at commit 
[`2729a63`](https://github.com/apache/spark/commit/2729a63a8f5eab7f0eb88a9b995175bda1f82a1e).





[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16722
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72091/
Test FAILed.





[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16722
  
**[Test build #72091 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72091/testReport)** for PR 16722 at commit [`7dc1437`](https://github.com/apache/spark/commit/7dc1437df21999554e42d35d1d544839074414cf).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

2017-01-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16694





[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

2017-01-27 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/16694
  
LGTM, thank you!
Merging with master





[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16722
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16722
  
**[Test build #72091 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72091/testReport)** for PR 16722 at commit [`7dc1437`](https://github.com/apache/spark/commit/7dc1437df21999554e42d35d1d544839074414cf).





[GitHub] spark issue #14725: [SPARK-17161] [PYSPARK][ML] Add PySpark-ML JavaWrapper c...

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14725
  
**[Test build #72092 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72092/testReport)** for PR 14725 at commit [`8b401ec`](https://github.com/apache/spark/commit/8b401ecc814a31f600da3fe28c4ff393bd4b3269).





[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...

2017-01-27 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/16722
  
ping @jkbradley @imatiach-msft 





[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-01-27 Thread sethah
GitHub user sethah opened a pull request:

https://github.com/apache/spark/pull/16722

[SPARK-9478][ML][MLlib] Add sample weights to decision trees

## What changes were proposed in this pull request?

This patch adds support for sample weights to `DecisionTreeRegressor` and 
`DecisionTreeClassifier`. 

*Note:* This patch does not add support for sample weights to RandomForest. 
As discussed in the JIRA, we would like to add sample weights into the bagging 
process. This patch is large enough as is, and there are some additional 
considerations to be made for random forests. Since the machinery introduced 
here needs to be present regardless, I have opted to leave random forests for a 
follow-up PR. 

## How was this patch tested?

The algorithms are tested to ensure that:
1. Arbitrary scaling of constant weights has no effect
2. Outliers with small weights do not affect the learned model
3. Oversampling and weighting are equivalent

Unit tests are also added to test other smaller components.
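Test 3 above (oversampling and weighting are equivalent) can be illustrated with a small weighted-impurity computation. This is a plain-Python sketch, not code from the patch; `weighted_gini` is a hypothetical helper:

```python
from collections import Counter

def weighted_gini(labels, weights):
    """Gini impurity where each label contributes its sample weight."""
    total = sum(weights)
    by_label = Counter()
    for y, w in zip(labels, weights):
        by_label[y] += w
    return 1.0 - sum((w / total) ** 2 for w in by_label.values())

# A point duplicated three times with weight 1 ...
dup = weighted_gini([0, 0, 0, 1], [1.0, 1.0, 1.0, 1.0])
# ... gives the same impurity as that point appearing once with weight 3.
wtd = weighted_gini([0, 1], [3.0, 1.0])
assert abs(dup - wtd) < 1e-12
```

Any impurity computed from such weighted sufficient statistics inherits this equivalence, which is what the scaling and oversampling tests rely on.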

## Summary of changes

* Impurity aggregators now store weighted sufficient statistics, and also 
hold the raw (unweighted) count, since the raw count is needed to enforce 
`minInstancesPerNode`. 

* This patch maintains the meaning of `minInstancesPerNode`: the parameter 
still corresponds to raw, unweighted counts. It also adds a new parameter, 
`minWeightFractionPerNode`, which requires that each node contain at least 
`minWeightFractionPerNode * weightedNumExamples` total weight.
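As a sketch of how the two minimums interact (hypothetical helper, not the patch's actual code), a candidate split must clear the raw-count bar and the weight bar on both sides:

```python
def split_is_valid(left_count, left_weight, right_count, right_weight,
                   min_instances_per_node, min_weight_fraction_per_node,
                   total_weight):
    # minInstancesPerNode gates on raw, unweighted counts;
    # minWeightFractionPerNode gates on total sample weight per child.
    min_weight = min_weight_fraction_per_node * total_weight
    return (left_count >= min_instances_per_node and
            right_count >= min_instances_per_node and
            left_weight >= min_weight and
            right_weight >= min_weight)

# A split whose left child holds enough rows but too little weight is rejected.
print(split_is_valid(5, 0.5, 10, 9.5, 2, 0.1, 10.0))  # False
```

This is why both the raw counts and the weights must survive aggregation: neither constraint can be checked from the other's statistic.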

* This patch modifies `findSplitsForContinuousFeatures` to use weighted 
sums. Unit tests are added.
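The weighted split-finding idea can be sketched as a weighted-quantile pass over the sorted feature values (illustrative only; `weighted_split_candidates` is a made-up name, not the method's real signature):

```python
def weighted_split_candidates(values, weights, num_bins):
    # Walk the sorted values accumulating weight instead of counts, and
    # emit a threshold each time a bin's share of total weight is reached.
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    step = total / num_bins
    splits, acc, target = [], 0.0, step
    for v, w in pairs:
        acc += w
        if acc >= target and len(splits) < num_bins - 1:
            splits.append(v)
            target += step
    return splits

# Uniform weights give ordinary quantiles; a heavy point pulls the split toward it.
print(weighted_split_candidates([1, 2, 3, 4], [1, 1, 1, 1], 2))  # [2]
print(weighted_split_candidates([1, 2, 3, 4], [3, 1, 1, 1], 2))  # [1]
```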

* TreePoint is modified to hold a sample weight.

* BaggedPoint is modified from:

```scala
private[spark] class BaggedPoint[Datum](val datum: Datum, val subsampleWeights: Array[Double]) extends Serializable
```

to

```scala
private[spark] class BaggedPoint[Datum](
    val datum: Datum,
    val subsampleCounts: Array[Int],
    val sampleWeight: Double) extends Serializable
```

We do not simply multiply the counts by the weight and store the product, 
because both the raw counts and the weight are needed in order to use both 
`minInstancesPerNode` and `minWeightFractionPerNode`.

*Note:* many of the file changes are due simply to using `Instance` 
instead of `LabeledPoint`.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sethah/spark SPARK-9478-tree

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16722.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16722


commit 2d86cea640634a205e378bddee0b01780d019ea2
Author: sethah 
Date:   2017-01-27T16:38:36Z

add weights to dt

commit 7dc1437df21999554e42d35d1d544839074414cf
Author: sethah 
Date:   2017-01-27T20:34:24Z

dt tests passing







[GitHub] spark issue #16281: [SPARK-13127][SQL] Update Parquet to 1.9.0

2017-01-27 Thread julienledem
Github user julienledem commented on the issue:

https://github.com/apache/spark/pull/16281
  
FYI: Parquet 1.8.2 vote thread passed: 
https://mail-archives.apache.org/mod_mbox/parquet-dev/201701.mbox/%3CCAO4re1mHLT%2BLYn8s1RTEDZK8-9WSVugY8-HQqAN%2BtU%3DBOi1L9w%40mail.gmail.com%3E





[GitHub] spark issue #16721: [SPARKR][DOCS] update R API doc for subset/extract

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16721
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72090/
Test PASSed.





[GitHub] spark issue #16721: [SPARKR][DOCS] update R API doc for subset/extract

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16721
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16721: [SPARKR][DOCS] update R API doc for subset/extract

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16721
  
**[Test build #72090 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72090/testReport)** for PR 16721 at commit [`de56852`](https://github.com/apache/spark/commit/de56852daf03de33fbc6dfa0280e1ea5f5f32cc7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16721: [SPARKR][DOCS] update R API doc for subset/extract

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16721
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72089/
Test PASSed.





[GitHub] spark issue #16721: [SPARKR][DOCS] update R API doc for subset/extract

2017-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16721
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16721: [SPARKR][DOCS] update R API doc for subset/extract

2017-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16721
  
**[Test build #72089 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72089/testReport)** for PR 16721 at commit [`2c1f673`](https://github.com/apache/spark/commit/2c1f67353b6049e7679947d9c6c1e9901d7e1c9f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15768: [SPARK-18080][ML][PySpark] Locality Sensitive Hashing (L...

2017-01-27 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/15768
  
Btw, @yanboliang and @Yunni  did you sync?  I'm fine with the takeover, but 
don't want to stomp on toes.  Both can be listed as authors when this gets 
merged.  Should we close this issue with the other taking its place?





[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

2017-01-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/16694#discussion_r98309772
  
--- Diff: python/pyspark/ml/classification.py ---
@@ -60,6 +61,137 @@ def numClasses(self):
 
 
 @inherit_doc
+class LinearSVC(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
+                HasRegParam, HasTol, HasRawPredictionCol, HasFitIntercept, HasStandardization,
+                HasThreshold, HasWeightCol, HasAggregationDepth, JavaMLWritable, JavaMLReadable):
+"""
+Linear SVM Classifier (https://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM)
+This binary classifier optimizes the Hinge Loss using the OWLQN optimizer.
+
+>>> from pyspark.sql import Row
+>>> from pyspark.ml.linalg import Vectors
+>>> bdf = sc.parallelize([
+... Row(label=1.0, weight=2.0, features=Vectors.dense(1.0)),
+... Row(label=0.0, weight=2.0, features=Vectors.sparse(1, [], []))]).toDF()
+>>> svm = LinearSVC(maxIter=5, regParam=0.01, weightCol="weight")
+>>> model = svm.fit(bdf)
+>>> model.coefficients
+DenseVector([1.909])
+>>> model.intercept
+-1.0045358384178
+>>> model.numClasses
+2
+>>> model.numFeatures
+1
+>>> test0 = sc.parallelize([Row(features=Vectors.dense(-1.0))]).toDF()
+>>> result = model.transform(test0).head()
+>>> result.prediction
+0.0
+>>> result.rawPrediction
+DenseVector([2.9135, -2.9135])
+>>> test1 = sc.parallelize([Row(features=Vectors.sparse(1, [0], [1.0]))]).toDF()
+>>> model.transform(test1).head().prediction
+1.0
+>>> svm.setParams("vector")
--- End diff --

I know, there are some not great examples to follow.  It'd be nice to clean 
those out sometime...




