[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14452
  
**[Test build #64278 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64278/consoleFull)**
 for PR 14452 at commit 
[`f0954cd`](https://github.com/apache/spark/commit/f0954cddbf80b65f1e0aac694be97b4bf9e29436).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14754: [SPARK-17188][SQL] Moves class QuantileSummaries to proj...

2016-08-22 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14754
  
thanks, merging to master!





[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14447
  
**[Test build #64279 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64279/consoleFull)**
 for PR 14447 at commit 
[`067f8fd`](https://github.com/apache/spark/commit/067f8fd110599d786ab44e66109f8af178133cc0).





[GitHub] spark pull request #14757: [SPARK-17190] [SQL] Removal of HiveSharedState

2016-08-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14757#discussion_r75810819
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
@@ -139,12 +139,9 @@ private[hive] class TestHiveSparkSession(
 
   assume(sc.conf.get(CATALOG_IMPLEMENTATION) == "hive")
 
-  // TODO: Let's remove HiveSharedState and TestHiveSessionState. Otherwise,
-  // we are not really testing the reflection logic based on the setting of
-  // CATALOG_IMPLEMENTATION.
   @transient
-  override lazy val sharedState: HiveSharedState = {
-existingSharedState.getOrElse(new HiveSharedState(sc))
+  override lazy val sharedState: SharedState = {
--- End diff --

i see, thanks





[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14447
  
**[Test build #64274 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64274/consoleFull)**
 for PR 14447 at commit 
[`0dea0d2`](https://github.com/apache/spark/commit/0dea0d290472049cb7ba0f7226c523756161c8ed).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14447
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14447
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64274/
Test FAILed.





[GitHub] spark issue #14618: [SPARK-17030] [SQL] Remove/Cleanup HiveMetastoreCatalog....

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14618
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14618: [SPARK-17030] [SQL] Remove/Cleanup HiveMetastoreCatalog....

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14618
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64271/
Test PASSed.





[GitHub] spark issue #14618: [SPARK-17030] [SQL] Remove/Cleanup HiveMetastoreCatalog....

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14618
  
**[Test build #64271 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64271/consoleFull)**
 for PR 14618 at commit 
[`85bd7dd`](https://github.com/apache/spark/commit/85bd7dd33e4c006488a24b7e758271d1029de8b5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13796: [SPARK-7159][ML] Add multiclass logistic regression to S...

2016-08-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/13796
  
@dbtsai 
This JIRA,
https://issues.apache.org/jira/browse/SPARK-17163
considers unifying the interface for binary logistic regression and softmax.
But I think they are not in fact equivalent (when numClass == 2).
I think the better way is:
1. Extend binary logistic regression to the numClass > 2 case, but 
optimize it over (numClass-1) parameters.
2. Modify the computation for the softmax reg == 0 case to use the approach described in `1`.
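
One way to read the point above, as a minimal sketch rather than the code under review: for two classes, a softmax with one weight vector per class predicts the same probabilities as binary logistic regression whose single weight vector is the difference of the two, yet the L2 penalties of the two parameterizations generally differ. All names here (`SoftmaxVsLogistic`, `softmaxP1`, `logisticP1`, `l2Penalty`) are hypothetical and none of them come from Spark.

```scala
object SoftmaxVsLogistic {
  // Dot product of two equal-length vectors.
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // P(y = 1 | x) under a two-class softmax with one weight vector per class.
  def softmaxP1(w0: Array[Double], w1: Array[Double], x: Array[Double]): Double = {
    val (m0, m1) = (dot(w0, x), dot(w1, x))
    val shift = math.max(m0, m1) // subtract the larger margin for numerical stability
    val (e0, e1) = (math.exp(m0 - shift), math.exp(m1 - shift))
    e1 / (e0 + e1)
  }

  // P(y = 1 | x) under binary logistic regression with a single weight vector.
  def logisticP1(w: Array[Double], x: Array[Double]): Double =
    1.0 / (1.0 + math.exp(-dot(w, x)))

  // Sum of squared L2 norms over all weight vectors of a model.
  def l2Penalty(ws: Array[Double]*): Double = ws.map(w => dot(w, w)).sum

  def main(args: Array[String]): Unit = {
    val w0 = Array(1.0, -2.0)
    val w1 = Array(3.0, 0.5)
    val x = Array(0.7, -1.2)
    // Pivot parameterization: class 0 weights are fixed at zero, leaving
    // (numClass - 1) = 1 weight vector to optimize.
    val pivot = w1.zip(w0).map { case (a, b) => a - b }

    println(softmaxP1(w0, w1, x))  // same probability as the next line
    println(logisticP1(pivot, x))
    println(l2Penalty(w0, w1))     // penalty of the two-vector softmax
    println(l2Penalty(pivot))      // penalty of the pivot form, generally different
  }
}
```

Without regularization the two forms describe the same model, so the softmax solution is not unique; with an L2 penalty the objectives differ, which may be why the comment treats the reg == 0 case separately.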






[GitHub] spark issue #14239: [SPARK-16593] [CORE] [WIP] Provide a pre-fetch mechanism...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14239
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64277/
Test FAILed.





[GitHub] spark issue #14239: [SPARK-16593] [CORE] [WIP] Provide a pre-fetch mechanism...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14239
  
**[Test build #64277 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64277/consoleFull)**
 for PR 14239 at commit 
[`32c63be`](https://github.com/apache/spark/commit/32c63bee5a12c2deb231896ed43d7b0a8cf7141e).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14239: [SPARK-16593] [CORE] [WIP] Provide a pre-fetch mechanism...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14239
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #14757: [SPARK-17190] [SQL] Removal of HiveSharedState

2016-08-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14757#discussion_r75807732
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
@@ -139,12 +139,9 @@ private[hive] class TestHiveSparkSession(
 
   assume(sc.conf.get(CATALOG_IMPLEMENTATION) == "hive")
 
-  // TODO: Let's remove HiveSharedState and TestHiveSessionState. Otherwise,
-  // we are not really testing the reflection logic based on the setting of
-  // CATALOG_IMPLEMENTATION.
   @transient
-  override lazy val sharedState: HiveSharedState = {
-existingSharedState.getOrElse(new HiveSharedState(sc))
+  override lazy val sharedState: SharedState = {
--- End diff --

Although `existingSharedState` is using the same name, we did not override 
[the 
one](https://github.com/apache/spark/blob/18c2c92580bdc27aa5129d9e7abda418a3633ea6/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L55)
 in `SharedState`. Thus, they are different. 

Actually, I tried it. It breaks [one test 
case](https://github.com/apache/spark/blob/a117afa7c2d94f943106542ec53d74ba2b5f1058/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala#L1082)
 

BTW, it works if `TestHiveSparkSession` overrides both [`sparkContext` and 
`existingSharedState`](https://github.com/apache/spark/blob/18c2c92580bdc27aa5129d9e7abda418a3633ea6/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L54-L55).
 However, we have to change `existingSharedState` in `SparkSession` to 
non-private.
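
The name clash described above can be sketched in isolation. This is a simplified illustration, not Spark code: `Parent`, `ChildNoOverride`, and `ChildWithOverride` are hypothetical stand-ins for `SparkSession` and `TestHiveSparkSession`, with a `String` in place of the shared state. It assumes the parent keeps its constructor parameter private, so a same-named parameter in a subclass is a separate value and not an override.

```scala
object SharedStateSketch {
  // Stand-in for the parent session: `existing` is private to this class.
  class Parent(private val existing: Option[String]) {
    lazy val shared: String = existing.getOrElse("parent-default")
  }

  // The subclass parameter has the same name but is a distinct value.
  // The inherited lazy val still reads the parent's private field (None here).
  class ChildNoOverride(existing: Option[String]) extends Parent(None)

  // Overriding the lazy val is what makes the subclass's own parameter take effect.
  class ChildWithOverride(existing: Option[String]) extends Parent(None) {
    override lazy val shared: String = existing.getOrElse("child-default")
  }

  def main(args: Array[String]): Unit = {
    println(new ChildNoOverride(Some("x")).shared)   // parent-default
    println(new ChildWithOverride(Some("x")).shared) // x
  }
}
```

Under these assumptions, dropping the override changes which value is consulted, which is consistent with the test breakage reported above.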





[GitHub] spark issue #14239: [SPARK-16593] [CORE] [WIP] Provide a pre-fetch mechanism...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14239
  
**[Test build #64277 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64277/consoleFull)**
 for PR 14239 at commit 
[`32c63be`](https://github.com/apache/spark/commit/32c63bee5a12c2deb231896ed43d7b0a8cf7141e).





[GitHub] spark issue #14531: [SPARK-16943] [SPARK-16942] [SQL] Fix multiple bugs in C...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14531
  
**[Test build #64276 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64276/consoleFull)**
 for PR 14531 at commit 
[`63c4bc5`](https://github.com/apache/spark/commit/63c4bc598987272460dfca9d324bb26af5451280).





[GitHub] spark pull request #14531: [SPARK-16943] [SPARK-16942] [SQL] Fix multiple bu...

2016-08-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14531#discussion_r75807324
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -75,18 +81,52 @@ case class CreateTableLikeCommand(
   throw new AnalysisException(
 s"Source table in CREATE TABLE LIKE does not exist: '$sourceTable'")
 }
-if (catalog.isTemporaryTable(sourceTable)) {
-  throw new AnalysisException(
-s"Source table in CREATE TABLE LIKE cannot be temporary: '$sourceTable'")
-}
 
-val tableToCreate = catalog.getTableMetadata(sourceTable).copy(
-  identifier = targetTable,
-  tableType = CatalogTableType.MANAGED,
-  createTime = System.currentTimeMillis,
-  lastAccessTime = -1).withNewStorage(locationUri = None)
+val sourceTableDesc = catalog.getTableMetadata(sourceTable)
+val sourceStorageFormat = sourceTableDesc.storage
 
-catalog.createTable(tableToCreate, ifNotExists)
+sourceTableDesc.tableType match {
+  case CatalogTableType.MANAGED | CatalogTableType.EXTERNAL | CatalogTableType.VIEW => // OK
+  case o => throw new AnalysisException(
+s"CREATE TABLE LIKE is not allowed when the source table is ${o.name}")
+}
+
+// For EXTERNAL_TABLE, the table properties has a particular field. To change it
+// to a MANAGED_TABLE, we need to remove it; Otherwise, it will be EXTERNAL_TABLE,
+// even if we set the tableType to MANAGED
+// (metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1095-L1105)
+// Table comment is stored as a table property. To clean it, we also should remove them.
+val newTableProp = sourceTableDesc.properties.filterKeys(_ != "EXTERNAL")
--- End diff --

Done. : )





[GitHub] spark issue #14768: [MINOR][BUILD] Fix Java CheckStyle Error

2016-08-22 Thread Sherry302
Github user Sherry302 commented on the issue:

https://github.com/apache/spark/pull/14768
  
Thanks, @jerryshao. I have updated the PR.





[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/8880
  
Merged build finished. Test PASSed.





[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/8880
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64266/
Test PASSed.





[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/8880
  
**[Test build #64266 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64266/consoleFull)**
 for PR 8880 at commit 
[`4f0732f`](https://github.com/apache/spark/commit/4f0732f3a3f3b6a81067852963356c017b6cda81).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14239: [SPARK-16593] [CORE] [WIP] Provide a pre-fetch mechanism...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14239
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64275/
Test FAILed.





[GitHub] spark issue #14239: [SPARK-16593] [CORE] [WIP] Provide a pre-fetch mechanism...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14239
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14239: [SPARK-16593] [CORE] [WIP] Provide a pre-fetch mechanism...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14239
  
**[Test build #64275 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64275/consoleFull)**
 for PR 14239 at commit 
[`09ab278`](https://github.com/apache/spark/commit/09ab2789baf95e47653a87c0bb35d9c979ecf5c7).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14754: [SPARK-17188][SQL] Moves class QuantileSummaries to proj...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14754
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64268/
Test PASSed.





[GitHub] spark issue #14754: [SPARK-17188][SQL] Moves class QuantileSummaries to proj...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14754
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #14757: [SPARK-17190] [SQL] Removal of HiveSharedState

2016-08-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14757#discussion_r75806381
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
@@ -139,12 +139,9 @@ private[hive] class TestHiveSparkSession(
 
   assume(sc.conf.get(CATALOG_IMPLEMENTATION) == "hive")
 
-  // TODO: Let's remove HiveSharedState and TestHiveSessionState. Otherwise,
-  // we are not really testing the reflection logic based on the setting of
-  // CATALOG_IMPLEMENTATION.
   @transient
-  override lazy val sharedState: HiveSharedState = {
-existingSharedState.getOrElse(new HiveSharedState(sc))
+  override lazy val sharedState: SharedState = {
--- End diff --

But is `existingSharedState.getOrElse(new SharedState(sc))` the same as what 
the parent does?





[GitHub] spark issue #14754: [SPARK-17188][SQL] Moves class QuantileSummaries to proj...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14754
  
**[Test build #64268 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64268/consoleFull)**
 for PR 14754 at commit 
[`e8745e7`](https://github.com/apache/spark/commit/e8745e7fd62e975504bfac8a08c1c862829450ea).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14239: [SPARK-16593] [CORE] [WIP] Provide a pre-fetch mechanism...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14239
  
**[Test build #64275 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64275/consoleFull)**
 for PR 14239 at commit 
[`09ab278`](https://github.com/apache/spark/commit/09ab2789baf95e47653a87c0bb35d9c979ecf5c7).





[GitHub] spark issue #14769: [MINOR][SQL] Remove implemented functions from comments ...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14769
  
Can one of the admins verify this patch?





[GitHub] spark pull request #14769: [MINOR][SQL] Remove implemented functions from co...

2016-08-22 Thread Sherry302
GitHub user Sherry302 opened a pull request:

https://github.com/apache/spark/pull/14769

[MINOR][SQL] Remove implemented functions from comments of 'HiveSessionCatalog.scala'

## What changes were proposed in this pull request?
This PR removes implemented functions from comments of
`HiveSessionCatalog.scala`: `java_method`, `posexplode`, `str_to_map`.

## How was this patch tested?
Manual.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sherry302/spark cleanComment

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14769.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14769


commit 8f3e25fe3fb88ba51c8c01013786041f58e80427
Author: Weiqing Yang 
Date:   2016-08-23T05:43:36Z

[MINOR][SQL] Remove implemented functions from comments of 
'HiveSessionCatalog.scala'







[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14447
  
**[Test build #64274 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64274/consoleFull)**
 for PR 14447 at commit 
[`0dea0d2`](https://github.com/apache/spark/commit/0dea0d290472049cb7ba0f7226c523756161c8ed).





[GitHub] spark pull request #14768: [MINOR][BUILD] Fix Java CheckStyle Error

2016-08-22 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/14768#discussion_r75805277
  
--- Diff: 
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
 ---
@@ -522,7 +522,8 @@ public long spill() throws IOException {
   // is accessing the current record. We free this page in that 
caller's next loadNext()
   // call.
   for (MemoryBlock page : allocatedPages) {
-if (!loaded || page.pageNumber != 
((UnsafeInMemorySorter.SortedIterator)upstream).getCurrentPageNumber()) {
+if (!loaded || page.pageNumber != 
((UnsafeInMemorySorter.SortedIterator)upstream)
+.getCurrentPageNumber()) {
--- End diff --

2 space indent?





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14753
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64262/
Test PASSed.





[GitHub] spark issue #14757: [SPARK-17190] [SQL] Removal of HiveSharedState

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14757
  
**[Test build #64273 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64273/consoleFull)**
 for PR 14757 at commit 
[`4a14bed`](https://github.com/apache/spark/commit/4a14bed0341c73e8d69df2d727d9ede138058d59).





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14753
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14753
  
**[Test build #64262 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64262/consoleFull)**
 for PR 14753 at commit 
[`0173d2c`](https://github.com/apache/spark/commit/0173d2c8cbaef33c8de517d65104a4fd92dce894).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14753
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64264/
Test PASSed.





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14753
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14757: [SPARK-17190] [SQL] Removal of HiveSharedState

2016-08-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14757
  
@cloud-fan Yeah, `getNewClient` is equivalent to `getClient.newSession`. After 
rethinking it, maybe we do not need it. : ) Thus, the latest changes remove it 
too.





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14753
  
**[Test build #64264 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64264/consoleFull)**
 for PR 14753 at commit 
[`d3108ab`](https://github.com/apache/spark/commit/d3108ab7ea1e10b8de31f1fd6546cc3275d6e48a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14757: [SPARK-17190] [SQL] Removal of HiveSharedState

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14757
  
**[Test build #64272 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64272/consoleFull)**
 for PR 14757 at commit 
[`d61ced8`](https://github.com/apache/spark/commit/d61ced8a2e81cef13afa5e0aa6203e505dd67570).





[GitHub] spark pull request #14757: [SPARK-17190] [SQL] Removal of HiveSharedState

2016-08-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14757#discussion_r75804665
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
@@ -139,12 +139,9 @@ private[hive] class TestHiveSparkSession(
 
   assume(sc.conf.get(CATALOG_IMPLEMENTATION) == "hive")
 
-  // TODO: Let's remove HiveSharedState and TestHiveSessionState. 
Otherwise,
--- End diff --

The comment is moved to 
https://github.com/gatorsmile/spark/blob/d61ced8a2e81cef13afa5e0aa6203e505dd67570/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala#L147-L148

Thanks!






[GitHub] spark pull request #14757: [SPARK-17190] [SQL] Removal of HiveSharedState

2016-08-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14757#discussion_r75804066
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
@@ -139,12 +139,9 @@ private[hive] class TestHiveSparkSession(
 
   assume(sc.conf.get(CATALOG_IMPLEMENTATION) == "hive")
 
-  // TODO: Let's remove HiveSharedState and TestHiveSessionState. 
Otherwise,
-  // we are not really testing the reflection logic based on the setting of
-  // CATALOG_IMPLEMENTATION.
   @transient
-  override lazy val sharedState: HiveSharedState = {
-existingSharedState.getOrElse(new HiveSharedState(sc))
+  override lazy val sharedState: SharedState = {
--- End diff --

`sharedState` in `SparkSession` is unable to access the 
`existingSharedState` defined in `TestHiveSparkSession`.  We are unable to 
override it because [it is 
private](https://github.com/apache/spark/blob/18c2c92580bdc27aa5129d9e7abda418a3633ea6/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L55).
 





[GitHub] spark issue #14765: [SPARK-15815] Keeping tell yarn the target executors in ...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14765
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64263/
Test FAILed.





[GitHub] spark issue #14765: [SPARK-15815] Keeping tell yarn the target executors in ...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14765
  
**[Test build #64263 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64263/consoleFull)**
 for PR 14765 at commit 
[`59de77b`](https://github.com/apache/spark/commit/59de77b5f523340d50836f072b38afee9bf579c3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14765: [SPARK-15815] Keeping tell yarn the target executors in ...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14765
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14768: [MINOR][BUILD] Fix Java CheckStyle Error

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14768
  
Can one of the admins verify this patch?





[GitHub] spark issue #13796: [SPARK-7159][ML] Add multiclass logistic regression to S...

2016-08-22 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/13796
  
The solution to this overparameterized problem in the link is just to add 
regularization, which users may not want. I think we need to optimize it 
over (k-1) parameters, and then put the final or the first one back by centering 
the intercepts and weights. @sethah could you open a JIRA to track this issue? 
Thanks.
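
The centering step mentioned above can be illustrated with a minimal Java sketch. It is not Spark's implementation, and the class and method names (`SoftmaxCentering`, `predict`, `centerWeights`, `centerIntercepts`) are made up for this example. It only shows that subtracting the across-class mean from the weights and intercepts leaves the softmax probabilities unchanged, which is why a centered solution can stand in for any other member of the same family.

```java
import java.util.Arrays;

// Illustrative sketch only; names are hypothetical and this is not Spark's code.
final class SoftmaxCentering {

  // Numerically stable softmax over per-class margins.
  static double[] softmax(double[] margins) {
    double max = Arrays.stream(margins).max().getAsDouble();
    double[] probs = new double[margins.length];
    double sum = 0.0;
    for (int k = 0; k < margins.length; k++) {
      probs[k] = Math.exp(margins[k] - max);
      sum += probs[k];
    }
    for (int k = 0; k < probs.length; k++) {
      probs[k] /= sum;
    }
    return probs;
  }

  // Class probabilities for one example x; weights[k] is the weight vector of class k.
  static double[] predict(double[][] weights, double[] intercepts, double[] x) {
    double[] margins = new double[weights.length];
    for (int k = 0; k < weights.length; k++) {
      margins[k] = intercepts[k];
      for (int j = 0; j < x.length; j++) {
        margins[k] += weights[k][j] * x[j];
      }
    }
    return softmax(margins);
  }

  // Returns a copy of weights with the across-class mean removed from each feature column.
  static double[][] centerWeights(double[][] weights) {
    int numClasses = weights.length;
    int numFeatures = weights[0].length;
    double[][] centered = new double[numClasses][numFeatures];
    for (int j = 0; j < numFeatures; j++) {
      double mean = 0.0;
      for (int k = 0; k < numClasses; k++) {
        mean += weights[k][j] / numClasses;
      }
      for (int k = 0; k < numClasses; k++) {
        centered[k][j] = weights[k][j] - mean;
      }
    }
    return centered;
  }

  // Returns a copy of intercepts with their mean removed.
  static double[] centerIntercepts(double[] intercepts) {
    double mean = Arrays.stream(intercepts).average().getAsDouble();
    return Arrays.stream(intercepts).map(b -> b - mean).toArray();
  }
}
```

Calling `predict` with the original parameters and with the centered ones should give the same probabilities up to floating-point rounding, because every class margin is shifted by the same amount.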





[GitHub] spark pull request #14768: [MINOR][BUILD] Fix Java CheckStyle Error

2016-08-22 Thread Sherry302
GitHub user Sherry302 opened a pull request:

https://github.com/apache/spark/pull/14768

[MINOR][BUILD] Fix Java CheckStyle Error

## What changes were proposed in this pull request?
As Spark 2.0.1 will be released soon (mentioned in the spark dev mailing 
list), besides the critical bugs, it's better to fix the code style errors 
before the release.

Before:  
```
./dev/lint-java
Checkstyle checks failed at following occurrences:
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[525] (sizes) LineLength: Line is longer than 100 characters (found 119).
[ERROR] src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java:[64] (sizes) LineLength: Line is longer than 100 characters (found 103).
```
After:
```
./dev/lint-java
Using `mvn` from path: /usr/local/bin/mvn
Checkstyle checks passed.
```
## How was this patch tested?
Manual.
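
For readers unfamiliar with the rule behind the errors above, here is a rough Java sketch of a line-length check with a 100-character limit. It is only an approximation for illustration: the class name `LineLengthSketch` is hypothetical, and Checkstyle's real `LineLength` check has more options than a plain `String.length()` comparison.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; not Checkstyle's implementation.
final class LineLengthSketch {

  // Returns the 1-based numbers of all lines longer than maxLength characters.
  static List<Integer> linesLongerThan(List<String> lines, int maxLength) {
    List<Integer> offenders = new ArrayList<>();
    for (int i = 0; i < lines.size(); i++) {
      if (lines.get(i).length() > maxLength) {
        offenders.add(i + 1);
      }
    }
    return offenders;
  }
}
```

A line of exactly 100 characters passes; only longer lines are reported, matching the "longer than 100 characters" wording in the output above.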


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sherry302/spark fixjavastyle

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14768.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14768


commit a36989105086f60417f21341d8573b4d3c6bc7eb
Author: Weiqing Yang 
Date:   2016-08-23T04:42:04Z

[MINOR][BUILD] Fix Java CheckStyle Error







[GitHub] spark issue #14761: [SparkR][Minor] Add installation message for remote mast...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14761
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64270/
Test PASSed.





[GitHub] spark issue #14761: [SparkR][Minor] Add installation message for remote mast...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14761
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14728: [SPARK-17165][SQL] FileStreamSource should not track the...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14728
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64260/
Test PASSed.





[GitHub] spark issue #14728: [SPARK-17165][SQL] FileStreamSource should not track the...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14728
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-22 Thread mpjlu
Github user mpjlu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14597#discussion_r75802771
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -171,14 +180,47 @@ object ChiSqSelectorModel extends 
Loader[ChiSqSelectorModel] {
 
 /**
  * Creates a ChiSquared feature selector.
- * @param numTopFeatures number of features that selector will select
- *   (ordered by statistic value descending)
- *   Note that if the number of features is less than 
numTopFeatures,
- *   then this will select all features.
  */
-@Since("1.3.0")
-class ChiSqSelector @Since("1.3.0") (
-  @Since("1.3.0") val numTopFeatures: Int) extends Serializable {
+@Since("2.1.0")
+class ChiSqSelector @Since("2.1.0") () extends Serializable {
+  private var numTopFeatures: Int = 50
+  private var percentile: Double = 0.1
+  private var alpha: Double = 0.05
+  private var selectorType = ChiSqSelectorType.KBest
+
+  @Since("1.3.0")
+  def this(numTopFeatures: Int) {
+this()
+this.numTopFeatures = numTopFeatures
+  }
+
+  @Since("2.1.0")
+  def setNumTopFeatures(value: Int): this.type = {
+numTopFeatures = value
+selectorType = ChiSqSelectorType.KBest
+this
+  }
+
+  @Since("2.1.0")
+  def setPercentile(value: Double): this.type = {
+require(value <= 1 && value >= 0, "Percentile should be larger than 0 
and less than 100")
+percentile = value
+selectorType = ChiSqSelectorType.Percentile
+this
+  }
+
+  @Since("2.1.0")
+  def setAlpha(value: Double): this.type = {
--- End diff --

require is added, thanks





[GitHub] spark issue #14728: [SPARK-17165][SQL] FileStreamSource should not track the...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14728
  
**[Test build #64260 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64260/consoleFull)**
 for PR 14728 at commit 
[`a371f05`](https://github.com/apache/spark/commit/a371f05843d1eae3ae09e21bbd1cedeffb19d0e2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-22 Thread mpjlu
Github user mpjlu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14597#discussion_r75802754
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -171,14 +180,47 @@ object ChiSqSelectorModel extends 
Loader[ChiSqSelectorModel] {
 
 /**
  * Creates a ChiSquared feature selector.
- * @param numTopFeatures number of features that selector will select
- *   (ordered by statistic value descending)
- *   Note that if the number of features is less than 
numTopFeatures,
- *   then this will select all features.
  */
-@Since("1.3.0")
-class ChiSqSelector @Since("1.3.0") (
-  @Since("1.3.0") val numTopFeatures: Int) extends Serializable {
+@Since("2.1.0")
+class ChiSqSelector @Since("2.1.0") () extends Serializable {
+  private var numTopFeatures: Int = 50
+  private var percentile: Double = 0.1
+  private var alpha: Double = 0.05
+  private var selectorType = ChiSqSelectorType.KBest
+
+  @Since("1.3.0")
+  def this(numTopFeatures: Int) {
+this()
+this.numTopFeatures = numTopFeatures
+  }
+
+  @Since("2.1.0")
+  def setNumTopFeatures(value: Int): this.type = {
+numTopFeatures = value
+selectorType = ChiSqSelectorType.KBest
+this
+  }
+
+  @Since("2.1.0")
+  def setPercentile(value: Double): this.type = {
+require(value <= 1 && value >= 0, "Percentile should be larger than 0 
and less than 100")
--- End diff --

Thanks, the code is updated
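
For context, the range check in the quoted diff accepts fractions between 0 and 1, while its message speaks of 0 and 100. Below is a minimal Java sketch of a setter whose error message matches the range it enforces. The class `PercentileParamSketch` is a hypothetical stand-in, not the updated PR code; it throws `IllegalArgumentException`, which is also what Scala's `require` throws on failure.

```java
// Hypothetical stand-in for the selector in the diff; names are illustrative.
final class PercentileParamSketch {
  private double percentile = 0.1;  // same default as in the quoted diff

  // Accepts a fraction in [0, 1] and rejects anything else, including NaN.
  PercentileParamSketch setPercentile(double value) {
    if (!(value >= 0.0 && value <= 1.0)) {
      throw new IllegalArgumentException(
          "Percentile must be in the range [0, 1], got " + value);
    }
    this.percentile = value;
    return this;  // fluent style, like the setters in the diff
  }

  double getPercentile() {
    return percentile;
  }
}
```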





[GitHub] spark issue #14761: [SparkR][Minor] Add installation message for remote mast...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14761
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14761: [SparkR][Minor] Add installation message for remote mast...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14761
  
**[Test build #64269 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64269/consoleFull)**
 for PR 14761 at commit 
[`95fec8b`](https://github.com/apache/spark/commit/95fec8be7d69a7478d1ff7ffbd4853073d17d0e0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14761: [SparkR][Minor] Add installation message for remote mast...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14761
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64269/
Test PASSed.





[GitHub] spark issue #14763: [SPARK-17194] Use single quotes when generating SQL for ...

2016-08-22 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14763
  
Hi, @JoshRosen .
The generated SQL seems to have changed. In order to pass 
`LogicalPlanToSQLSuite`, you can regenerate the answer set with the following 
command.
```
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "hive/test-only *LogicalPlanToSQLSuite"
```
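
As a rough illustration of the golden-file pattern that this command relies on, here is a hedged Java sketch. It is not the code used by `LogicalPlanToSQLSuite`; the class `GoldenFileSketch` and its methods are hypothetical, and only the environment variable name is taken from the command above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative golden-file helper; names are hypothetical.
final class GoldenFileSketch {

  // True when the environment asks for golden files to be rewritten.
  static boolean regenerateRequested() {
    return "1".equals(System.getenv("SPARK_GENERATE_GOLDEN_FILES"));
  }

  // When regenerate is true, overwrites the golden file with the actual output.
  // Otherwise reports whether the actual output equals the stored answer.
  static boolean checkOrRegenerate(Path goldenFile, String actual, boolean regenerate) {
    try {
      if (regenerate) {
        Files.write(goldenFile, actual.getBytes(StandardCharsets.UTF_8));
        return true;
      }
      String expected = new String(Files.readAllBytes(goldenFile), StandardCharsets.UTF_8);
      return expected.equals(actual);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A test would typically call `checkOrRegenerate(path, generatedSql, GoldenFileSketch.regenerateRequested())`, so a normal run compares against the stored answer and a run with the variable set refreshes it.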





[GitHub] spark issue #13796: [SPARK-7159][ML] Add multiclass logistic regression to S...

2016-08-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/13796
  
@dbtsai 
Simply put, when reg == 0 the minimizer is not a single point but a plane, 
so the second derivative along that plane is zero. So ...
Here is a reference that mentions this problem: 
http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression
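
A short worked version of this argument, written in standard softmax notation. The symbols $W$, $w_k$, $c$, $x$ and $\ell$ are introduced here for illustration and do not come from the PR: $W$ has the class weight vectors $w_k^\top$ as rows, and $\ell$ is the unregularized loss.

```latex
% Softmax model with one weight vector per class
P(y = k \mid x; W) = \frac{\exp(w_k^\top x)}{\sum_{j=1}^{K} \exp(w_j^\top x)}

% Adding the same vector c to every class vector cancels out
\frac{\exp\big((w_k + c)^\top x\big)}{\sum_{j=1}^{K} \exp\big((w_j + c)^\top x\big)}
  = \frac{e^{c^\top x}\,\exp(w_k^\top x)}{e^{c^\top x}\,\sum_{j=1}^{K} \exp(w_j^\top x)}
  = P(y = k \mid x; W)

% Hence the loss is constant along these directions
\ell\big(W + t\,\mathbf{1}c^\top\big) = \ell(W) \quad \text{for all } t \in \mathbb{R},
\qquad
\frac{d^2}{dt^2}\,\ell\big(W + t\,\mathbf{1}c^\top\big) = 0
```

So if $W^*$ minimizes the unregularized loss, then so does $W^* + \mathbf{1}c^\top$ for every $c$, and the Hessian is singular along those directions.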





[GitHub] spark issue #14618: [SPARK-17030] [SQL] Remove/Cleanup HiveMetastoreCatalog....

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14618
  
**[Test build #64271 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64271/consoleFull)**
 for PR 14618 at commit 
[`85bd7dd`](https://github.com/apache/spark/commit/85bd7dd33e4c006488a24b7e758271d1029de8b5).





[GitHub] spark issue #14761: [SparkR][Minor] Add installation message for remote mast...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14761
  
**[Test build #64270 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64270/consoleFull)**
 for PR 14761 at commit 
[`39c34da`](https://github.com/apache/spark/commit/39c34dac40943da5e74af3cfc0d3fb69ebe1c8ac).





[GitHub] spark pull request #14757: [SPARK-17190] [SQL] Removal of HiveSharedState

2016-08-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14757#discussion_r75800807
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
@@ -139,12 +139,9 @@ private[hive] class TestHiveSparkSession(
 
   assume(sc.conf.get(CATALOG_IMPLEMENTATION) == "hive")
 
-  // TODO: Let's remove HiveSharedState and TestHiveSessionState. Otherwise,
--- End diff --

Yeah. Let me move it back.





[GitHub] spark issue #13796: [SPARK-7159][ML] Add multiclass logistic regression to S...

2016-08-22 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/13796
  
@WeichenXu123 Have you run into this potential issue with any dataset? If so, 
we may need to consider optimizing softmax with pivoting when `reg == 0`.
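
One common way to read "pivoting" here (an assumption; the thread does not define it) is to pick a reference class and express every class's weights relative to it. The pivot's weights are then fixed at zero and the predicted probabilities are unchanged. A minimal sketch in plain Scala, with hypothetical names and no Spark dependency:

```scala
object SoftmaxPivotSketch {
  // Class probabilities for one feature vector x, given one weight vector per class.
  def softmax(w: Seq[Seq[Double]], x: Seq[Double]): Seq[Double] = {
    val margins = w.map(wk => wk.zip(x).map { case (a, b) => a * b }.sum)
    val maxMargin = margins.max // subtract the max margin for numerical stability
    val exps = margins.map(m => math.exp(m - maxMargin))
    val total = exps.sum
    exps.map(_ / total)
  }

  // Re-express the weights relative to a pivot class by subtracting
  // the pivot's weight vector from every class (hypothetical helper).
  def pivot(w: Seq[Seq[Double]], pivotClass: Int): Seq[Seq[Double]] = {
    val ref = w(pivotClass)
    w.map(wk => wk.zip(ref).map { case (a, b) => a - b })
  }

  // Largest absolute element-wise difference between two probability vectors.
  def maxAbsDiff(p: Seq[Double], q: Seq[Double]): Double =
    p.zip(q).map { case (a, b) => math.abs(a - b) }.max
}
```

With the pivot class pinned to zero, only K-1 weight vectors remain free, which removes the direction along which the unregularized minimizer is not unique.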





[GitHub] spark pull request #14757: [SPARK-17190] [SQL] Removal of HiveSharedState

2016-08-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14757#discussion_r75800677
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala
 ---
@@ -40,6 +40,15 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEac
 
   protected def resetState(): Unit = { }
 
+  // Clear all state before each test
+  override def beforeEach(): Unit = {
+    try {
+      resetState()
--- End diff --


https://github.com/apache/spark/blob/7bb64aae27f670531699f59d3f410e38866609b7/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogSuite.scala#L31-L35

Before this PR, we created a metastore at a temp location to avoid any 
potential conflict from having multiple connections to a single Derby instance. 
This does not sound like an issue now. Is my understanding right?

If we do not create a new metastore, we have to clean the existing 
metastore; otherwise, [the 
check](https://github.com/apache/spark/blob/5effc016c893ce917d535cc1b5026d8e4c846721/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala#L60-L61)
 in the test case could fail.
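
To illustrate why a shared metastore needs a reset between tests, here is a minimal sketch in plain Scala. It does not use ScalaTest or the real catalog classes; `sharedDatabases`, `resetState` and `withCleanState` are hypothetical stand-ins for a reused metastore and a `beforeEach` hook.

```scala
import scala.collection.mutable

object ResetStateSketch {
  // Hypothetical shared state standing in for a metastore reused across tests.
  val sharedDatabases: mutable.Set[String] = mutable.Set("default")

  // Restore the shared state to what a freshly created metastore would contain.
  def resetState(): Unit = {
    sharedDatabases.clear()
    sharedDatabases += "default"
  }

  // Run a test body against clean state, mirroring a beforeEach hook.
  def withCleanState[A](body: => A): A = {
    resetState()
    body
  }
}
```

Without the reset, a database left behind by one test would still be visible to the next, so any test that assumes a fresh catalog could fail.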






[GitHub] spark issue #14767: [SparkR][Minor] Remove Reference Link for the Common Win...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14767
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64267/
Test PASSed.





[GitHub] spark issue #14761: [SparkR][Minor] Add installation message for remote mast...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14761
  
**[Test build #64269 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64269/consoleFull)**
 for PR 14761 at commit 
[`95fec8b`](https://github.com/apache/spark/commit/95fec8be7d69a7478d1ff7ffbd4853073d17d0e0).





[GitHub] spark issue #14767: [SparkR][Minor] Remove Reference Link for the Common Win...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14767
  
**[Test build #64267 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64267/consoleFull)**
 for PR 14767 at commit 
[`2bf9be0`](https://github.com/apache/spark/commit/2bf9be051b5a4fe6d90ea34f6397d42f7a27fe88).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14767: [SparkR][Minor] Remove Reference Link for the Common Win...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14767
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13796: [SPARK-7159][ML] Add multiclass logistic regression to S...

2016-08-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/13796
  
I found a problem in the merged code: when `reg == 0`, the minimizer of the 
softmax cost is not unique.
In that case the Hessian matrix is non-invertible, and I think it 
may cause quasi-Newton methods such as L-BFGS to run into numerical problems.
So, would it be better to forbid the `reg == 0` case for softmax parameters? 
cc @sethah @dbtsai 
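
For reference, the non-uniqueness follows from the shift invariance of softmax. A short derivation, assuming the usual parameterization with one weight vector $w_k$ per class and no intercept (the notation is an assumption, not taken from the PR):

```latex
\begin{align*}
p_k(x; w_1, \dots, w_K) &= \frac{\exp(w_k^\top x)}{\sum_{j=1}^{K} \exp(w_j^\top x)}, \\
p_k(x; w_1 + c, \dots, w_K + c)
  &= \frac{\exp(c^\top x)\,\exp(w_k^\top x)}{\exp(c^\top x)\sum_{j=1}^{K} \exp(w_j^\top x)}
   = p_k(x; w_1, \dots, w_K).
\end{align*}
% The unregularized loss \ell depends on the weights only through the p_k,
% so for any vector c and scalar t:
\begin{align*}
\ell(w_1 + t c, \dots, w_K + t c) = \ell(w_1, \dots, w_K)
\quad\Longrightarrow\quad
\frac{d^2}{dt^2}\,\ell(w_1 + t c, \dots, w_K + t c) = 0 .
\end{align*}
```

The loss therefore has zero curvature along every direction that shifts all classes by the same vector, so the Hessian is singular when there is no regularization. An L2 penalty does change under such a shift, which restores a unique minimizer.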





[GitHub] spark issue #14625: [SPARK-17045] [SQL] Build/move Join-related test cases i...

2016-08-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14625
  
Sure, will split it into multiple PRs. Thanks!





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14753
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #14757: [SPARK-17190] [SQL] Removal of HiveSharedState

2016-08-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14757#discussion_r75799684
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -41,13 +42,20 @@ import org.apache.spark.sql.types.{DataType, StructType}
  * A persistent implementation of the system catalog using Hive.
  * All public methods must be synchronized for thread-safety.
  */
-private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configuration)
+private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configuration)
   extends ExternalCatalog with Logging {
 
   import CatalogTypes.TablePartitionSpec
   import HiveExternalCatalog._
   import CatalogTableType._
 
+  /**
+   * A Hive client used to interact with the metastore.
+   */
+  private val client: HiveClient = {
--- End diff --

Sure, will do it.





[GitHub] spark pull request #14625: [SPARK-17045] [SQL] Build/move Join-related test ...

2016-08-22 Thread gatorsmile
Github user gatorsmile closed the pull request at:

https://github.com/apache/spark/pull/14625





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14753
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64256/
Test PASSed.





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14753
  
**[Test build #64256 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64256/consoleFull)**
 for PR 14753 at commit 
[`7d88b20`](https://github.com/apache/spark/commit/7d88b2046e691f3c5d0a396e0cee0c0f85ec8308).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14766: [SPARK-17197] [ML] [PySpark] PySpark LiR/LoR supports tr...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14766
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14766: [SPARK-17197] [ML] [PySpark] PySpark LiR/LoR supports tr...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14766
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64265/
Test PASSed.





[GitHub] spark issue #14766: [SPARK-17197] [ML] [PySpark] PySpark LiR/LoR supports tr...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14766
  
**[Test build #64265 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64265/consoleFull)**
 for PR 14766 at commit 
[`0d0f20e`](https://github.com/apache/spark/commit/0d0f20e5117aa685342801b7c3896cf59e38f07c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class HasAggregationDepth(Params):`





[GitHub] spark issue #14754: [SPARK-17188][SQL] Moves class QuantileSummaries to proj...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14754
  
**[Test build #64268 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64268/consoleFull)**
 for PR 14754 at commit 
[`e8745e7`](https://github.com/apache/spark/commit/e8745e7fd62e975504bfac8a08c1c862829450ea).





[GitHub] spark issue #14752: [SPARK-17186][SQL] remove catalog table type INDEX

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14752
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64253/
Test PASSed.





[GitHub] spark issue #14767: [SparkR][Minor] Remove Reference Link for the Common Win...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14767
  
**[Test build #64267 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64267/consoleFull)**
 for PR 14767 at commit 
[`2bf9be0`](https://github.com/apache/spark/commit/2bf9be051b5a4fe6d90ea34f6397d42f7a27fe88).





[GitHub] spark issue #14752: [SPARK-17186][SQL] remove catalog table type INDEX

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14752
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #14767: [SparkR][Minor] Remove Reference Link for the Com...

2016-08-22 Thread junyangq
GitHub user junyangq opened a pull request:

https://github.com/apache/spark/pull/14767

[SparkR][Minor] Remove Reference Link for the Common Windows Environment 
Variables.

## What changes were proposed in this pull request?

The PR removes the reference link in the doc for environment variables for 
common Windows folders. The CRAN check returned code 503 (service unavailable) 
for the original link. 


## How was this patch tested?

Manual check.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/junyangq/spark SPARKR-RemoveLink

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14767.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14767


commit 2bf9be051b5a4fe6d90ea34f6397d42f7a27fe88
Author: Junyang Qian 
Date:   2016-08-23T03:42:25Z

Remove reference link for the common Windows environment variables.







[GitHub] spark issue #14752: [SPARK-17186][SQL] remove catalog table type INDEX

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14752
  
**[Test build #64253 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64253/consoleFull)**
 for PR 14752 at commit 
[`d1e83d2`](https://github.com/apache/spark/commit/d1e83d2ace803b51ea3d25c575b84e1af6bde353).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14753
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14753
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64255/
Test PASSed.





[GitHub] spark issue #14754: [SPARK-17188][SQL] Moves class QuantileSummaries to proj...

2016-08-22 Thread clockfly
Github user clockfly commented on the issue:

https://github.com/apache/spark/pull/14754
  
@rxin updated.





[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14753
  
**[Test build #64255 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64255/consoleFull)**
 for PR 14753 at commit 
[`0fdc1ea`](https://github.com/apache/spark/commit/0fdc1eadf46c6db96cf479fd317e8e0b89e65b05).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class TypedImperativeAggregate[T] extends ImperativeAggregate`





[GitHub] spark pull request #14753: [SPARK-17187][SQL] Supports using arbitrary Java ...

2016-08-22 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/14753#discussion_r75798274
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
 ---
@@ -389,3 +389,175 @@ abstract class DeclarativeAggregate
     def right: AttributeReference = inputAggBufferAttributes(aggBufferAttributes.indexOf(a))
   }
 }
+
+/**
+ * Aggregation function which allows **arbitrary** user-defined java object to be used as internal
+ * aggregation buffer object.
+ *
+ * {{{
+ *                aggregation buffer for normal aggregation function `avg`
+ *                    |
+ *                    v
+ *                  +--------------+---------------+-----------------------------------+
+ *                  |  sum1 (Long) | count1 (Long) | generic user-defined java objects |
+ *                  +--------------+---------------+-----------------------------------+
+ *                                                     ^
+ *                                                     |
+ *                    Aggregation buffer object for `TypedImperativeAggregate` aggregation function
+ * }}}
+ *
+ * Work flow (Partial mode aggregate at Mapper side, and Final mode aggregate at Reducer side):
+ *
+ * Stage 1: Partial aggregate at Mapper side:
+ *
+ *  1. The framework calls `createAggregationBuffer(): T` to create an empty internal aggregation
+ *     buffer object.
+ *  2. Upon each input row, the framework calls
+ *     `update(buffer: T, input: InternalRow): Unit` to update the aggregation buffer object T.
+ *  3. After processing all rows of current group (group by key), the framework will serialize
+ *     aggregation buffer object T to SparkSQL internally supported underlying storage format, and
+ *     persist the serializable format to disk if needed.
+ *  4. The framework moves on to next group, until all groups have been processed.
+ *
+ * Shuffling exchange data to Reducer tasks...
+ *
+ * Stage 2: Final mode aggregate at Reducer side:
+ *
+ *  1. The framework calls `createAggregationBuffer(): T` to create an empty internal aggregation
+ *     buffer object (type T) for merging.
+ *  2. For each aggregation output of Stage 1, The framework de-serializes the storage
+ *     format and generates one input aggregation object (type T).
+ *  3. For each input aggregation object, the framework calls `merge(buffer: T, input: T): Unit`
+ *     to merge the input aggregation object into aggregation buffer object.
+ *  4. After processing all input aggregation objects of current group (group by key), the framework
+ *     calls method `eval(buffer: T)` to generate the final output for this group.
+ *  5. The framework moves on to next group, until all groups have been processed.
+ */
+abstract class TypedImperativeAggregate[T] extends ImperativeAggregate {
+
+  /**
+   * Creates an empty aggregation buffer object. This is called before processing each key group
+   * (group by key).
+   *
+   * @return an aggregation buffer object
+   */
+  def createAggregationBuffer(): T
+
+  /**
+   * In-place updates the aggregation buffer object with an input row. buffer = buffer + input.
+   * This is typically called when doing Partial or Complete mode aggregation.
+   *
+   * @param buffer The aggregation buffer object.
+   * @param input an input row
+   */
+  def update(buffer: T, input: InternalRow): Unit
+
+  /**
+   * Merges an input aggregation object into aggregation buffer object. buffer = buffer + input.
+   * This is typically called when doing PartialMerge or Final mode aggregation.
+   *
+   * @param buffer the aggregation buffer object used to store the aggregation result.
+   * @param input an input aggregation object. Input aggregation object can be produced by
+   *              de-serializing the partial aggregate's output from Mapper side.
+   */
+  def merge(buffer: T, input: T): Unit
--- End diff --

@cloud-fan you can find an example at

https://github.com/apache/spark/commit/0a777cc8880edd01e209f73d96fe1c644a0c473f
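
The create/update/merge/eval workflow described in the quoted Scaladoc can be sketched outside Spark as follows. This is only an illustration under stated assumptions: `SimpleTypedAggregate`, `AvgBuffer` and the `partial`/`finalResult`/`run` helpers are hypothetical, input rows are plain `Double`s rather than `InternalRow`, and the serialization and shuffle steps between the two stages are omitted.

```scala
object TypedAggregateSketch {
  // Hypothetical stand-in for the four-method contract in the diff.
  trait SimpleTypedAggregate[T, R] {
    def createAggregationBuffer(): T
    def update(buffer: T, input: Double): Unit
    def merge(buffer: T, input: T): Unit
    def eval(buffer: T): R
  }

  // An arbitrary user-defined buffer object: a mutable running sum and count.
  final class AvgBuffer(var sum: Double = 0.0, var count: Long = 0L)

  object Average extends SimpleTypedAggregate[AvgBuffer, Option[Double]] {
    def createAggregationBuffer(): AvgBuffer = new AvgBuffer()
    def update(buffer: AvgBuffer, input: Double): Unit = {
      buffer.sum += input
      buffer.count += 1
    }
    def merge(buffer: AvgBuffer, input: AvgBuffer): Unit = {
      buffer.sum += input.sum
      buffer.count += input.count
    }
    // An empty group has no average.
    def eval(buffer: AvgBuffer): Option[Double] =
      if (buffer.count == 0) None else Some(buffer.sum / buffer.count)
  }

  // Stage 1 (mapper side): build one partial buffer from the rows of one partition.
  def partial[T, R](agg: SimpleTypedAggregate[T, R], rows: Seq[Double]): T = {
    val buffer = agg.createAggregationBuffer()
    rows.foreach(row => agg.update(buffer, row))
    buffer
  }

  // Stage 2 (reducer side): merge the partial buffers, then evaluate the result.
  def finalResult[T, R](agg: SimpleTypedAggregate[T, R], partials: Seq[T]): R = {
    val buffer = agg.createAggregationBuffer()
    partials.foreach(p => agg.merge(buffer, p))
    agg.eval(buffer)
  }

  // Run both stages over a list of partitions for a single group.
  def run[T, R](agg: SimpleTypedAggregate[T, R], partitions: Seq[Seq[Double]]): R =
    finalResult(agg, partitions.map(p => partial(agg, p)))
}
```

The point of the typed buffer is that `update` and `merge` mutate an ordinary object in place, instead of reading and writing fixed-width fields of a row-based buffer.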





[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/8880
  
**[Test build #64266 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64266/consoleFull)**
 for PR 8880 at commit 
[`4f0732f`](https://github.com/apache/spark/commit/4f0732f3a3f3b6a81067852963356c017b6cda81).





[GitHub] spark issue #14766: [SPARK-17197] [ML] [PySpark] PySpark LiR/LoR supports tr...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14766
  
**[Test build #64265 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64265/consoleFull)**
 for PR 14766 at commit 
[`0d0f20e`](https://github.com/apache/spark/commit/0d0f20e5117aa685342801b7c3896cf59e38f07c).





[GitHub] spark issue #14524: [SPARK-16832] [ML] [WIP] CrossValidator and TrainValidat...

2016-08-22 Thread mengxr
Github user mengxr commented on the issue:

https://github.com/apache/spark/pull/14524
  
Sorry for the late response! I'm against this change since it introduces
nondeterministic behavior and makes applications hard to debug. For example,
say I want to cross-validate an estimator that accepts a random seed, but I
leave its random seed as the default. The cross-validation result would be
inaccurate if a new random seed were generated every time.
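
The reproducibility concern above can be illustrated with a minimal Scala sketch. It does not use Spark; `noisyScore` and `defaultSeed` are hypothetical names standing in for a seed-dependent estimator and its fixed default seed.

```scala
import scala.util.Random

object SeedSketch {
  // Hypothetical stand-in for fitting and scoring an estimator whose
  // result depends on its random seed.
  def noisyScore(seed: Long): Double = new Random(seed).nextGaussian()

  // Assumed fixed default seed, left untouched by the caller.
  val defaultSeed: Long = 42L

  // Fixed default: repeated runs give the same score, so runs are comparable.
  def withFixedSeed(): Double = noisyScore(defaultSeed)

  // Fresh seed on every call: the score generally changes from run to run,
  // which makes results harder to compare and to debug.
  def withFreshSeed(): Double = noisyScore(Random.nextLong())
}
```

With the fixed default, two runs of `SeedSketch.withFixedSeed()` return the same value; with a freshly generated seed they generally do not.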





[GitHub] spark pull request #14753: [SPARK-17187][SQL] Supports using arbitrary Java ...

2016-08-22 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/14753#discussion_r75797813
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala ---
@@ -389,3 +389,146 @@ abstract class DeclarativeAggregate
 def right: AttributeReference = inputAggBufferAttributes(aggBufferAttributes.indexOf(a))
   }
 }
+
+/**
+ * Aggregation function which allows **arbitrary** user-defined java object to be used as internal
+ * aggregation buffer object.
+ *
+ * {{{
+ *    aggregation buffer for normal aggregation function `avg`
+ *        |
+ *        v
+ *  +--------------+---------------+-----------------------------------+
+ *  |  sum1 (Long) | count1 (Long) | generic user-defined java objects |
+ *  +--------------+---------------+-----------------------------------+
+ *                                        ^
+ *                                        |
+ *        Aggregation buffer object for `TypedImperativeAggregate` aggregation function
+ * }}}
+ *
+ * Work flow (Partial mode aggregate at Mapper side, and Final mode aggregate at Reducer side):
+ *
+ * Stage 1: Partial aggregate at Mapper side:
+ *
+ *  1. The framework calls `createAggregationBuffer(): T` to create an empty internal aggregation
+ *     buffer object.
+ *  2. Upon each input row, the framework calls
+ *     `update(buffer: T, input: InternalRow): Unit` to update the aggregation buffer object T.
+ *  3. After processing all rows of current group (group by key), the framework will serialize
+ *     aggregation buffer object T to storage format (Array[Byte]) and persist the Array[Byte]
+ *     to disk if needed.
+ *  4. The framework moves on to next group, until all groups have been processed.
+ *
+ * Shuffling exchange data to Reducer tasks...
+ *
+ * Stage 2: Final mode aggregate at Reducer side:
+ *
+ *  1. The framework calls `createAggregationBuffer(): T` to create an empty internal aggregation
+ *     buffer object (type T) for merging.
+ *  2. For each aggregation output of Stage 1, The framework de-serializes the storage
+ *     format (Array[Byte]) and produces one input aggregation object (type T).
+ *  3. For each input aggregation object, the framework calls `merge(buffer: T, input: T): Unit`
+ *     to merge the input aggregation object into aggregation buffer object.
+ *  4. After processing all input aggregation objects of current group (group by key), the framework
+ *     calls method `eval(buffer: T)` to generate the final output for this group.
+ *  5. The framework moves on to next group, until all groups have been processed.
+ */
+abstract class TypedImperativeAggregate[T] extends ImperativeAggregate {
+
+  /**
+   * Creates an empty aggregation buffer object. This is called before processing each key group
+   * (group by key).
+   *
+   * @return an aggregation buffer object
+   */
+  def createAggregationBuffer(): T
+
+  /**
+   * In-place updates the aggregation buffer object with an input row. buffer = buffer + input.
+   * This is typically called when doing Partial or Complete mode aggregation.
+   *
+   * @param buffer The aggregation buffer object.
+   * @param input an input row
+   */
+  def update(buffer: T, input: InternalRow): Unit
+
+  /**
+   * Merges an input aggregation object into aggregation buffer object. buffer = buffer + input.
+   * This is typically called when doing PartialMerge or Final mode aggregation.
+   *
+   * @param buffer the aggregation buffer object used to store the aggregation result.
+   * @param input an input aggregation object. Input aggregation object can be produced by
+   *              de-serializing the partial aggregate's output from Mapper side.
+   */
+  def merge(buffer: T, input: T): Unit
+
+  /**
+   * Generates the final aggregation result value for current key group with the aggregation buffer
+   * object.
+   *
+   * @param buffer aggregation buffer object.
+   * @return The aggregation result of current key group
+   */
+  def eval(buffer: T): Any
+
+  /** Returns the class of aggregation buffer object */
+  def aggregationBufferClass: Class[T]
+
+  /** Serializes the aggregation buffer object T to Array[Byte] */
+  def serialize(buffer: T): Array[Byte]
--- End diff --
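
The two-stage work flow described in the doc comment of the diff can be sketched without Spark, roughly as follows. This is only an illustration under stated assumptions: `TypedAggSketch`, `AvgBuffer`, `AvgSketch` and `TwoStageDemo` are hypothetical names, the type parameter `In` stands in for `InternalRow`, and `deserialize` is an assumed counterpart to `serialize` that is not visible in the quoted part of the diff.

```scala
import java.nio.ByteBuffer

// Hypothetical, Spark-free stand-in for the interface quoted in the diff.
// `In` replaces InternalRow; `deserialize` is an assumed inverse of `serialize`.
trait TypedAggSketch[In, T] {
  def createAggregationBuffer(): T
  def update(buffer: T, input: In): Unit
  def merge(buffer: T, input: T): Unit
  def eval(buffer: T): Any
  def serialize(buffer: T): Array[Byte]
  def deserialize(bytes: Array[Byte]): T
}

// An arbitrary user-defined, mutable buffer object.
final class AvgBuffer(var sum: Long, var count: Long)

object AvgSketch extends TypedAggSketch[Long, AvgBuffer] {
  def createAggregationBuffer(): AvgBuffer = new AvgBuffer(0L, 0L)

  // buffer = buffer + input (one input value)
  def update(buffer: AvgBuffer, input: Long): Unit = {
    buffer.sum += input
    buffer.count += 1
  }

  // buffer = buffer + input (another partial buffer)
  def merge(buffer: AvgBuffer, input: AvgBuffer): Unit = {
    buffer.sum += input.sum
    buffer.count += input.count
  }

  def eval(buffer: AvgBuffer): Any =
    if (buffer.count == 0) Double.NaN else buffer.sum.toDouble / buffer.count

  // Storage format: two 8-byte longs.
  def serialize(buffer: AvgBuffer): Array[Byte] =
    ByteBuffer.allocate(16).putLong(buffer.sum).putLong(buffer.count).array()

  def deserialize(bytes: Array[Byte]): AvgBuffer = {
    val bb = ByteBuffer.wrap(bytes)
    new AvgBuffer(bb.getLong(), bb.getLong())
  }
}

object TwoStageDemo {
  // Stage 1 (Mapper side): create a buffer, update it per row, serialize it.
  def partial[In, T](agg: TypedAggSketch[In, T], rows: Seq[In]): Array[Byte] = {
    val buffer = agg.createAggregationBuffer()
    rows.foreach(agg.update(buffer, _))
    agg.serialize(buffer)
  }

  // Stage 2 (Reducer side): deserialize each partial result, merge, evaluate.
  def finalAgg[In, T](agg: TypedAggSketch[In, T], partials: Seq[Array[Byte]]): Any = {
    val buffer = agg.createAggregationBuffer()
    partials.foreach(bytes => agg.merge(buffer, agg.deserialize(bytes)))
    agg.eval(buffer)
  }
}
```

For example, calling `TwoStageDemo.partial(AvgSketch, ...)` once per partition and passing the resulting byte arrays to `TwoStageDemo.finalAgg` mimics the shuffle between the Partial and Final stages for a single group.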

Here we limit the serializable format to `Array[Byte]`.

The reason is that SpecialMutableRow will do a type check for atomic types
for each `update` call of the aggregation

[GitHub] spark issue #14765: [SPARK-15815] Keeping tell yarn the target executors in ...

2016-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14765
  
**[Test build #64263 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64263/consoleFull)**
 for PR 14765 at commit 
[`59de77b`](https://github.com/apache/spark/commit/59de77b5f523340d50836f072b38afee9bf579c3).





[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

2016-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13988
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #14766: [SPARK-17197] [ML] [PySpark] PySpark LiR/LoR supp...

2016-08-22 Thread yanboliang
GitHub user yanboliang opened a pull request:

https://github.com/apache/spark/pull/14766

[SPARK-17197] [ML] [PySpark] PySpark LiR/LoR supports tree aggregation 
level configurable.

## What changes were proposed in this pull request?
[SPARK-17197](https://issues.apache.org/jira/browse/SPARK-17197) makes the tree
aggregation level in LiR/LoR configurable; this PR adds PySpark support for
that parameter.

## How was this patch tested?
Since ```aggregationDepth``` is an expert param, I prefer not to test it in
the doctest, which also serves as an example. Here is the offline test result:





You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanboliang/spark spark-17197

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14766.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14766


commit 0d0f20e5117aa685342801b7c3896cf59e38f07c
Author: Yanbo Liang 
Date:   2016-08-23T03:15:02Z

PySpark LiR/LoR supports tree aggregation level configurable.






