[GitHub] spark pull request: [SQL] sum and avg on empty table should always...

2014-12-11 Thread adrian-wang
GitHub user adrian-wang opened a pull request:

https://github.com/apache/spark/pull/3675

[SQL] sum and avg on empty table should always return null

So the optimizations are not valid. Also, I think the optimization here is
rarely encountered, so removing it will not affect performance.

I'll create a JIRA once JIRA is back. Can we merge #3445 before I add a
comparison test case from this?
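
For context, a minimal plain-Scala sketch (not part of the patch) of the semantics in question:
SUM and AVG over zero rows are NULL, never 0, so folding them to a constant is only valid when
the input is known to be non-empty.

```scala
// Plain-Scala model of the aggregate semantics, with SQL NULL represented as Option.
object EmptyAggSketch {
  def sqlSum(xs: Seq[Int]): Option[Long] =
    if (xs.isEmpty) None else Some(xs.map(_.toLong).sum)   // empty input => NULL, not 0

  def sqlAvg(xs: Seq[Int]): Option[Double] =
    if (xs.isEmpty) None else Some(xs.sum.toDouble / xs.size)

  def main(args: Array[String]): Unit = {
    println(sqlSum(Nil))           // None  -> SQL NULL
    println(sqlAvg(Nil))           // None  -> SQL NULL
    println(sqlSum(Seq(1, 2, 3)))  // Some(6)
  }
}
```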

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adrian-wang/spark sumempty

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3675.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3675


commit 42df76399a9f815ddd235273d32ebfaafcc7c2fe
Author: Daoyuan Wang daoyuan.w...@intel.com
Date:   2014-12-11T08:07:54Z

sum and avg on empty table should always return null







[GitHub] spark pull request: [SQL] sum and avg on empty table should always...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3675#issuecomment-66585913
  
  [Test build #24357 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24357/consoleFull)
 for   PR 3675 at commit 
[`42df763`](https://github.com/apache/spark/commit/42df76399a9f815ddd235273d32ebfaafcc7c2fe).
 * This patch merges cleanly.





[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...

2014-12-11 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/3672#discussion_r21662905
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala
 ---
@@ -85,10 +86,22 @@ private[hive] class HiveThriftServer2(hiveContext: HiveContext)
     setSuperField(this, "cliService", sparkSqlCliService)
     addService(sparkSqlCliService)
 
-    val thriftCliService = new ThriftBinaryCLIService(sparkSqlCliService)
-    setSuperField(this, "thriftCLIService", thriftCliService)
-    addService(thriftCliService)
+    if (isHTTPTransportMode(hiveConf)){
+      val thriftCliService = new ThriftHttpCLIService(sparkSqlCliService)
+      setSuperField(this, "thriftCLIService", thriftCliService)
+      addService(thriftCliService)
+    } else {
+      val thriftCliService = new ThriftBinaryCLIService(sparkSqlCliService)
+      setSuperField(this, "thriftCLIService", thriftCliService)
+      addService(thriftCliService)
+    }
 
     initCompositeService(hiveConf)
   }
+
+  private def isHTTPTransportMode(hiveConf: HiveConf): Boolean = {
+    val transportMode: String = hiveConf.getVar(ConfVars.HIVE_SERVER2_TRANSPORT_MODE)
+    return transportMode.equalsIgnoreCase("http")
--- End diff --

In Scala we don't need `return` here.
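
For example, the helper in the diff above could read as follows (a sketch only, assuming it
stays inside `HiveThriftServer2`; imports shown for completeness):

```scala
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.conf.HiveConf.ConfVars

// The last expression of a Scala method is its result, so no explicit `return` is needed.
private def isHTTPTransportMode(hiveConf: HiveConf): Boolean = {
  val transportMode = hiveConf.getVar(ConfVars.HIVE_SERVER2_TRANSPORT_MODE)
  transportMode.equalsIgnoreCase("http")
}
```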





[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...

2014-12-11 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/3672#discussion_r21662926
  
--- Diff: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
 ---
@@ -70,11 +70,16 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
     port
   }
 
-  def withJdbcStatement(serverStartTimeout: FiniteDuration = 1.minute)(f: Statement => Unit) {
+  def withJdbcStatement(serverStartTimeout: FiniteDuration = 1.minute, httpMode: Boolean = false)(f: Statement => Unit) {
     val port = randomListeningPort
 
-    startThriftServer(port, serverStartTimeout) {
-      val jdbcUri = s"jdbc:hive2://${"localhost"}:$port/"
+    startThriftServer(port, serverStartTimeout, httpMode) {
+      val jdbcUri = if (httpMode) {
+        s"jdbc:hive2://${"localhost"}:$port/default?hive.server2.transport.mode=http;hive.server2.thrift.http.path=cliservice"
--- End diff --

100 columns exceeded.
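
One possible way to stay under the limit (a sketch only, not necessarily what will be chosen;
it reuses `httpMode` and `port` from the surrounding test) is to build the URI from shorter pieces:

```scala
// Split the long interpolated string so no source line exceeds 100 columns.
val jdbcUri = if (httpMode) {
  s"jdbc:hive2://localhost:$port/default?" +
    "hive.server2.transport.mode=http;hive.server2.thrift.http.path=cliservice"
} else {
  s"jdbc:hive2://localhost:$port/"
}
```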





[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...

2014-12-11 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/3672#discussion_r21662942
  
--- Diff: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
 ---
@@ -113,7 +118,8 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
 
   def startThriftServer(
       port: Int,
-      serverStartTimeout: FiniteDuration = 1.minute)(
+      serverStartTimeout: FiniteDuration = 1.minute,
+      httpMode: Boolean = false )(
--- End diff --

Please remove the space before `)`





[GitHub] spark pull request: SPARK-4817[STREAMING]Print the specified numbe...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3662#issuecomment-66586675
  
  [Test build #24358 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24358/consoleFull)
 for   PR 3662 at commit 
[`411b287`](https://github.com/apache/spark/commit/411b28709b55cfa94ebd04ced6d67df997ebf467).
 * This patch merges cleanly.





[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...

2014-12-11 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/3672#discussion_r21663017
  
--- Diff: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
 ---
@@ -121,15 +127,28 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
     val warehousePath = getTempFilePath("warehouse")
     val metastorePath = getTempFilePath("metastore")
     val metastoreJdbcUri = s"jdbc:derby:;databaseName=$metastorePath;create=true"
+
     val command =
-      s"""$startScript
-         |  --master local
-         |  --hiveconf hive.root.logger=INFO,console
-         |  --hiveconf ${ConfVars.METASTORECONNECTURLKEY}=$metastoreJdbcUri
-         |  --hiveconf ${ConfVars.METASTOREWAREHOUSE}=$warehousePath
-         |  --hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_BIND_HOST}=${"localhost"}
-         |  --hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_PORT}=$port
-       """.stripMargin.split("\\s+").toSeq
+      if (httpMode){
+        s"""$startScript
+           |  --master local
+           |  --hiveconf hive.root.logger=INFO,console
+           |  --hiveconf ${ConfVars.METASTORECONNECTURLKEY}=$metastoreJdbcUri
+           |  --hiveconf ${ConfVars.METASTOREWAREHOUSE}=$warehousePath
+           |  --hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_BIND_HOST}=${"localhost"}
+           |  --hiveconf ${ConfVars.HIVE_SERVER2_TRANSPORT_MODE}=${"http"}
--- End diff --

The `${...}` wrapper is not needed on this line or on the line above. It's safe
to use double quotes within `"""..."""`.
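
In other words (a small sketch of the suggestion, reusing the `ConfVars` import from the file
above), a string literal can be written directly inside the interpolated string, so wrapping it
in `${...}` adds nothing:

```scala
import org.apache.hadoop.hive.conf.HiveConf.ConfVars

// Redundant: a string literal wrapped in an interpolation block.
val redundant = s"--hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_BIND_HOST}=${"localhost"}"
// Equivalent and simpler: write the text directly.
val simpler   = s"--hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_BIND_HOST}=localhost"
```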





[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...

2014-12-11 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/3672#discussion_r21663058
  
--- Diff: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
 ---
@@ -121,15 +127,28 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
     val warehousePath = getTempFilePath("warehouse")
     val metastorePath = getTempFilePath("metastore")
     val metastoreJdbcUri = s"jdbc:derby:;databaseName=$metastorePath;create=true"
+
     val command =
-      s"""$startScript
-         |  --master local
-         |  --hiveconf hive.root.logger=INFO,console
-         |  --hiveconf ${ConfVars.METASTORECONNECTURLKEY}=$metastoreJdbcUri
-         |  --hiveconf ${ConfVars.METASTOREWAREHOUSE}=$warehousePath
-         |  --hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_BIND_HOST}=${"localhost"}
-         |  --hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_PORT}=$port
-       """.stripMargin.split("\\s+").toSeq
+      if (httpMode){
+        s"""$startScript
+           |  --master local
+           |  --hiveconf hive.root.logger=INFO,console
+           |  --hiveconf ${ConfVars.METASTORECONNECTURLKEY}=$metastoreJdbcUri
+           |  --hiveconf ${ConfVars.METASTOREWAREHOUSE}=$warehousePath
+           |  --hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_BIND_HOST}=${"localhost"}
+           |  --hiveconf ${ConfVars.HIVE_SERVER2_TRANSPORT_MODE}=${"http"}
+           |  --hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_HTTP_PORT}=$port
+         """.stripMargin.split("\\s+").toSeq
+      } else {
+        s"""$startScript
+           |  --master local
+           |  --hiveconf hive.root.logger=INFO,console
+           |  --hiveconf ${ConfVars.METASTORECONNECTURLKEY}=$metastoreJdbcUri
+           |  --hiveconf ${ConfVars.METASTOREWAREHOUSE}=$warehousePath
+           |  --hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_BIND_HOST}=${"localhost"}
--- End diff --

Ah, I see, the original code uses a redundant `${...}` wrapper; please help
remove this one :)





[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...

2014-12-11 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/3672#discussion_r21663078
  
--- Diff: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
 ---
@@ -217,6 +236,25 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
     }
   }
 
+  test("Test JDBC query execution in Http Mode") {
+    withJdbcStatement( httpMode = true ) { statement =>
--- End diff --

Please remove spaces after `(` and before `)`.





[GitHub] spark pull request: [SPARK-4827][SQL] Fix resolution of deeply nes...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3674#issuecomment-66587059
  
  [Test build #24355 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24355/consoleFull)
 for   PR 3674 at commit 
[`d83d6a1`](https://github.com/apache/spark/commit/d83d6a150d85bc9033742256c518e08770296371).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...

2014-12-11 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/3672#discussion_r21663114
  
--- Diff: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
 ---
@@ -267,6 +305,14 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
     }
   }
 
+  test("Checks Hive version in Http Mode") {
+    withJdbcStatement( httpMode = true ) { statement =>
--- End diff --

Remove the extra spaces.





[GitHub] spark pull request: [SPARK-4825] [SQL] CTAS fails to resolve when ...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3673#issuecomment-66587032
  
  [Test build #24359 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24359/consoleFull)
 for   PR 3673 at commit 
[`e8cbd56`](https://github.com/apache/spark/commit/e8cbd561beb2476eb810ff2c7f5dadbae49cdadf).
 * This patch merges cleanly.





[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...

2014-12-11 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/3672#issuecomment-66587060
  
This LGTM except for several minor styling issues. Thanks for working on this!





[GitHub] spark pull request: [SPARK-4827][SQL] Fix resolution of deeply nes...

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3674#issuecomment-66587064
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24355/
Test PASSed.





[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...

2014-12-11 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/3672#issuecomment-66587346
  
One more thing: please rename the PR title to `[SQL] SPARK-4700: ...`. You can
find the names of all valid Spark components in JIRA. (I can't provide a URL
right now because JIRA is reindexing...)





[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-11 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/3643#issuecomment-66587933
  
Thanks for that. I added a new commit to make the methods private now.





[GitHub] spark pull request: [SPARK-4777][CORE] Some block memory after unr...

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3629#issuecomment-66588176
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24352/
Test PASSed.





[GitHub] spark pull request: [SPARK-4777][CORE] Some block memory after unr...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3629#issuecomment-66588161
  
  [Test build #24352 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24352/consoleFull)
 for   PR 3629 at commit 
[`34cfbe8`](https://github.com/apache/spark/commit/34cfbe8b309addb98deb23429626b14cb13a8e2a).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...

2014-12-11 Thread Lewuathe
Github user Lewuathe commented on the pull request:

https://github.com/apache/spark/pull/3636#issuecomment-66588539
  
@jkbradley I updated it. Could you check?





[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3672#issuecomment-66590225
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24356/
Test PASSed.





[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3672#issuecomment-66590223
  
  [Test build #24356 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24356/consoleFull)
 for   PR 3672 at commit 
[`377532c`](https://github.com/apache/spark/commit/377532cdff819010aef1786f84c987eddb63af45).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SQL] sum and avg on empty table should always...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3675#issuecomment-66592192
  
  [Test build #24357 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24357/consoleFull)
 for   PR 3675 at commit 
[`42df763`](https://github.com/apache/spark/commit/42df76399a9f815ddd235273d32ebfaafcc7c2fe).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SQL] sum and avg on empty table should always...

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3675#issuecomment-66592197
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24357/
Test PASSed.





[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2405#issuecomment-66592619
  
  [Test build #24360 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24360/consoleFull)
 for   PR 2405 at commit 
[`b016a81`](https://github.com/apache/spark/commit/b016a81cc89d04ef3cb535f9a39ffdb26eaa32d7).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2405#issuecomment-66592751
  
  [Test build #24360 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24360/consoleFull)
 for   PR 2405 at commit 
[`b016a81`](https://github.com/apache/spark/commit/b016a81cc89d04ef3cb535f9a39ffdb26eaa32d7).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class UnresolvedGetField(child: Expression, fieldName: String) 
extends UnaryExpression `
  * `case class StructGetField(child: Expression, field: StructField, 
ordinal: Int) extends UnaryExpression `
  * `case class ArrayGetField(child: Expression, field: StructField, 
ordinal: Int, containsNull: Boolean)`






[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2405#issuecomment-66592755
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24360/
Test FAILed.





[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...

2014-12-11 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/2405#issuecomment-66593009
  
Hi @marmbrus @liancheng, I have updated this PR to support `GetField` on
one level of array of struct for now. As I mentioned in
https://github.com/apache/spark/pull/2543, resolving `GetField` during the
analysis phase makes things like this PR easier. Please let me know if you
think something is wrong here. Thanks!
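
For readers following along, a plain-Scala sketch (not the PR's implementation) of the behaviour
this adds: field access distributes over one level of array of struct, so `a.b` on an
array-of-struct column yields the array of `b` values.

```scala
// Hypothetical struct type, used only for illustration.
case class Item(b: Int)

object DotOnArraySketch {
  def main(args: Array[String]): Unit = {
    val a: Seq[Item] = Seq(Item(1), Item(2), Item(3))
    // SQL dot notation `a.b` over an array of structs ~ mapping the field access over the array:
    val aDotB: Seq[Int] = a.map(_.b)
    println(aDotB)  // List(1, 2, 3)
  }
}
```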





[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2405#issuecomment-66593677
  
  [Test build #24362 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24362/consoleFull)
 for   PR 2405 at commit 
[`6e9f94b`](https://github.com/apache/spark/commit/6e9f94bab93c95f90ce790fce3b11d15d4dd1ad3).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4644][Core] Implement skewed join

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3505#issuecomment-66593703
  
  [Test build #24361 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24361/consoleFull)
 for   PR 3505 at commit 
[`af7eb71`](https://github.com/apache/spark/commit/af7eb714ab9916a628859682b8cbf9c4c2396029).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2405#issuecomment-66594023
  
  [Test build #24362 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24362/consoleFull)
 for   PR 2405 at commit 
[`6e9f94b`](https://github.com/apache/spark/commit/6e9f94bab93c95f90ce790fce3b11d15d4dd1ad3).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class UnresolvedGetField(child: Expression, fieldName: String) 
extends UnaryExpression `
  * `case class StructGetField(child: Expression, field: StructField, 
ordinal: Int)`
  * `case class ArrayGetField(child: Expression, field: StructField, 
ordinal: Int, containsNull: Boolean)`






[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2405#issuecomment-66594025
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24362/
Test FAILed.





[GitHub] spark pull request: [SPARK-4825] [SQL] CTAS fails to resolve when ...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3673#issuecomment-66594222
  
  [Test build #24359 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24359/consoleFull)
 for   PR 3673 at commit 
[`e8cbd56`](https://github.com/apache/spark/commit/e8cbd561beb2476eb810ff2c7f5dadbae49cdadf).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4825] [SQL] CTAS fails to resolve when ...

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3673#issuecomment-66594228
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24359/
Test PASSed.





[GitHub] spark pull request: [SPARK-4644][Core] Implement skewed join

2014-12-11 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/3505#issuecomment-66594423
  
I removed my changes to the `join` methods. Now this PR only adds new
`skewedJoin` methods, and users need to call them explicitly.

> Other DSLs on top of Spark core like Pig, Hive, and Scalding.

A great point I had never thought of.






[GitHub] spark pull request: SPARK-4817[STREAMING]Print the specified numbe...

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3662#issuecomment-66595143
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24358/
Test PASSed.





[GitHub] spark pull request: SPARK-4817[STREAMING]Print the specified numbe...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3662#issuecomment-66595135
  
  [Test build #24358 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24358/consoleFull)
 for   PR 3662 at commit 
[`411b287`](https://github.com/apache/spark/commit/411b28709b55cfa94ebd04ced6d67df997ebf467).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4818][Core] Add 'iterator' to reduce me...

2014-12-11 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3671#discussion_r21668908
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala 
---
@@ -493,9 +493,9 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
   def leftOuterJoin[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, Option[W]))] = {
     this.cogroup(other, partitioner).flatMapValues { pair =>
       if (pair._2.isEmpty) {
-        pair._1.map(v => (v, None))
+        pair._1.iterator.map(v => (v, None): (V, Option[W]))
--- End diff --

Interesting, are these types required? Or can it be limited to just
changing `None` to `None: Option[W]`?
Not that it hurts to spell out the types.





[GitHub] spark pull request: [SPARK-4644][Core] Implement skewed join

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3505#issuecomment-66603178
  
  [Test build #24361 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24361/consoleFull)
 for   PR 3505 at commit 
[`af7eb71`](https://github.com/apache/spark/commit/af7eb714ab9916a628859682b8cbf9c4c2396029).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class ChunkBuffer[T: ClassTag](parameters: ChunkParameters)`
  * `class ExternalOrderingAppendOnlyMap[K, V, C](`






[GitHub] spark pull request: [SPARK-4644][Core] Implement skewed join

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3505#issuecomment-66603180
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24361/
Test PASSed.





[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2405#issuecomment-66604421
  
  [Test build #24363 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24363/consoleFull)
 for   PR 2405 at commit 
[`fa0d2c7`](https://github.com/apache/spark/commit/fa0d2c78aba12201098e3e4db2d0fda9e357d0bd).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4818][Core] Add 'iterator' to reduce me...

2014-12-11 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/3671#discussion_r21670105
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala 
---
@@ -493,9 +493,9 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
   def leftOuterJoin[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, Option[W]))] = {
     this.cogroup(other, partitioner).flatMapValues { pair =>
       if (pair._2.isEmpty) {
-        pair._1.map(v => (v, None))
+        pair._1.iterator.map(v => (v, None): (V, Option[W]))
--- End diff --

> `None` to `None: Option[W]`

I have tried that, but it does not work.





[GitHub] spark pull request: [SPARK-4829] [SQL] add rule to fold count(expr...

2014-12-11 Thread adrian-wang
GitHub user adrian-wang opened a pull request:

https://github.com/apache/spark/pull/3676

[SPARK-4829] [SQL] add rule to fold count(expr) if expr is not null
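
A minimal plain-Scala sketch of the premise behind the rule (not the optimizer code itself):
`COUNT(expr)` counts only the non-NULL values of `expr`, so when `expr` can never be NULL it is
equivalent to counting every row, i.e. `COUNT(1)`.

```scala
// Model rows of a single nullable column as Option values.
object CountFoldSketch {
  def countExpr[T](rows: Seq[Option[T]]): Int = rows.count(_.isDefined)  // COUNT(expr): skips NULLs
  def countStar[T](rows: Seq[Option[T]]): Int = rows.size                // COUNT(1): every row

  def main(args: Array[String]): Unit = {
    val nonNullable: Seq[Option[Int]] = Seq(Some(1), Some(2), Some(3))   // expr is never NULL here
    println(countExpr(nonNullable) == countStar(nonNullable))            // true: the fold is safe
  }
}
```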



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adrian-wang/spark countexpr

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3676.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3676


commit dc5765b1cf01553bcf2a24fee2b8447c951cd3ed
Author: Daoyuan Wang daoyuan.w...@intel.com
Date:   2014-12-11T08:57:05Z

add rule to fold count(expr) if expr is not null







[GitHub] spark pull request: [SPARK-4829] [SQL] add rule to fold count(expr...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3676#issuecomment-66607006
  
  [Test build #24364 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24364/consoleFull)
 for   PR 3676 at commit 
[`dc5765b`](https://github.com/apache/spark/commit/dc5765b1cf01553bcf2a24fee2b8447c951cd3ed).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66608522
  
  [Test build #24365 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24365/consoleFull)
 for   PR 1269 at commit 
[`4ac42d1`](https://github.com/apache/spark/commit/4ac42d1dad85593b1f05c02b2a2b48080abaaa05).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66611133
  
  [Test build #24366 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24366/consoleFull)
 for   PR 1269 at commit 
[`e5f4a7b`](https://github.com/apache/spark/commit/e5f4a7b54d0cf7e73c0f567084439216a34fe9bd).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66611226
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24366/
Test FAILed.





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66611223
  
  [Test build #24366 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24366/consoleFull)
 for   PR 1269 at commit 
[`e5f4a7b`](https://github.com/apache/spark/commit/e5f4a7b54d0cf7e73c0f567084439216a34fe9bd).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class DocumentParameters(val document: Document,`
  * `class GlobalParameters(val phi : Array[Array[Float]], val alphabetSize 
: Int)`
  * `class PLSA(@transient protected val sc: SparkContext,`
  * `class RobustDocumentParameters(document: Document,`
  * `class RobustGlobalParameters(phi : Array[Array[Float]],`
  * `class RobustPLSA(@transient protected val sc: SparkContext,`
  * `trait DocumentOverTopicDistributionRegularizer extends Serializable 
with MatrixInPlaceModification `
  * `trait TopicsRegularizer extends MatrixInPlaceModification `
  * `class UniformDocumentOverTopicRegularizer extends 
DocumentOverTopicDistributionRegularizer `
  * `class UniformTopicRegularizer extends TopicsRegularizer `
  * `class Document(val tokens: SparseVector[Int]) extends Serializable `
  * `class TokenEnumerator extends Serializable `






[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2405#issuecomment-66611502
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24363/
Test PASSed.





[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2405#issuecomment-66611497
  
  [Test build #24363 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24363/consoleFull)
 for   PR 2405 at commit 
[`fa0d2c7`](https://github.com/apache/spark/commit/fa0d2c78aba12201098e3e4db2d0fda9e357d0bd).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class UnresolvedGetField(child: Expression, fieldName: String) 
extends UnaryExpression `
  * `trait GetField extends UnaryExpression `
  * `case class StructGetField(child: Expression, field: StructField, 
ordinal: Int)`
  * `case class ArrayGetField(child: Expression, field: StructField, 
ordinal: Int, containsNull: Boolean)`






[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66612337
  
  [Test build #24367 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24367/consoleFull)
 for   PR 1269 at commit 
[`8e953e7`](https://github.com/apache/spark/commit/8e953e7d378fe012b2c3364b9cc570cd1af57f0e).
 * This patch merges cleanly.





[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...

2014-12-11 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/3672#issuecomment-66612787
  
Could you please also add a section to the SQL programming guide page explaining
how to enable HTTP mode?
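
For reference, a sketch of what such a section could show, based on the JDBC URI already used in
this PR's test suite (the host, port, and driver class here are illustrative assumptions):

```scala
import java.sql.DriverManager

// Connect to the Thrift server over HTTP transport via JDBC (sketch only, e.g. from a Scala
// REPL with the Hive JDBC driver on the classpath).
Class.forName("org.apache.hive.jdbc.HiveDriver")
val url = "jdbc:hive2://localhost:10001/default" +
  "?hive.server2.transport.mode=http;hive.server2.thrift.http.path=cliservice"
val conn = DriverManager.getConnection(url)
val stmt = conn.createStatement()
```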





[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...

2014-12-11 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/3672#discussion_r21673560
  
--- Diff: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
 ---
@@ -121,15 +127,28 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
     val warehousePath = getTempFilePath("warehouse")
     val metastorePath = getTempFilePath("metastore")
     val metastoreJdbcUri = s"jdbc:derby:;databaseName=$metastorePath;create=true"
+
     val command =
-      s"""$startScript
-         |  --master local
-         |  --hiveconf hive.root.logger=INFO,console
-         |  --hiveconf ${ConfVars.METASTORECONNECTURLKEY}=$metastoreJdbcUri
-         |  --hiveconf ${ConfVars.METASTOREWAREHOUSE}=$warehousePath
-         |  --hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_BIND_HOST}=${"localhost"}
-         |  --hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_PORT}=$port
-       """.stripMargin.split("\\s+").toSeq
+      if (httpMode){
--- End diff --

A space before `{`





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread akopich
Github user akopich commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66613030
  
@jkbradley 

I moved Dirichlet to mllib/stats and added setters to `TokenEnumerator`.

BTW, why was it decided to use setters instead of constructor parameters? We can 
set default parameter values in the constructor... I don't contest the decision -- 
I'm just curious.
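
For illustration, a minimal sketch of the two styles under discussion (class and 
parameter names here are hypothetical, not the PR's actual API):

```scala
// Constructor with default parameter values:
class TokenEnumeratorA(rareTokenThreshold: Int = 2, lowercase: Boolean = true)

// Setters returning this.type, the builder style used throughout MLlib:
class TokenEnumeratorB {
  private var rareTokenThreshold: Int = 2
  def setRareTokenThreshold(value: Int): this.type = {
    this.rareTokenThreshold = value
    this
  }
}

// Usage of the second style:
// new TokenEnumeratorB().setRareTokenThreshold(3)
```

One common argument for the setter style is that new parameters can be added later 
without touching existing constructor signatures.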

As far as I can see, we've got only two things left -- scalastyle and 
testing against another open source project. 

I definitely can test it against [tm 
project](https://github.com/ispras/tm). 

Is it enough to run both implementations on the same data and obtain nearly 
the same perplexity values? Is it necessary to add a unit test for this? (It 
may be a headache, because tm was not tested against Scala 2.10...)







[GitHub] spark pull request: [SPARK-4829] [SQL] add rule to fold count(expr...

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3676#issuecomment-66613387
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24364/
Test PASSed.





[GitHub] spark pull request: [SPARK-4829] [SQL] add rule to fold count(expr...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3676#issuecomment-66613382
  
  [Test build #24364 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24364/consoleFull)
 for   PR 3676 at commit 
[`dc5765b`](https://github.com/apache/spark/commit/dc5765b1cf01553bcf2a24fee2b8447c951cd3ed).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4458][Build] Eliminate compilation of t...

2014-12-11 Thread tdas
Github user tdas closed the pull request at:

https://github.com/apache/spark/pull/3324





[GitHub] spark pull request: [SPARK-4458][Build] Eliminate compilation of t...

2014-12-11 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/3324#issuecomment-66613560
  
Alright, I am going to close this PR then.





[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-11 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/3643#issuecomment-66614419
  
Hi, intersect, diff and foreach are all replaced with while loops in the new 
commit, following the BLAS.dot pattern. Please see if there is any problem. Thanks.
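
For context, a minimal sketch of the kind of while-loop rewrite described here 
(dense case shown; not the PR's actual code):

```scala
// Squared distance with an explicit while loop instead of intersect/diff/foreach,
// avoiding per-element closure and iterator overhead on the hot path.
def squaredDistance(v1: Array[Double], v2: Array[Double]): Double = {
  require(v1.length == v2.length, "vectors must have the same length")
  var sum = 0.0
  var i = 0
  while (i < v1.length) {
    val d = v1(i) - v2(i)
    sum += d * d
    i += 1
  }
  sum
}
```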





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66616629
  
  [Test build #24365 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24365/consoleFull)
 for   PR 1269 at commit 
[`4ac42d1`](https://github.com/apache/spark/commit/4ac42d1dad85593b1f05c02b2a2b48080abaaa05).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class DocumentParameters(val document: Document,`
  * `class GlobalParameters(val phi : Array[Array[Float]], val alphabetSize 
: Int)`
  * `class PLSA(@transient protected val sc: SparkContext,`
  * `class RobustDocumentParameters(document: Document,`
  * `class RobustGlobalParameters(phi : Array[Array[Float]],`
  * `class RobustPLSA(@transient protected val sc: SparkContext,`
  * `trait DocumentOverTopicDistributionRegularizer extends Serializable 
with MatrixInPlaceModification `
  * `trait TopicsRegularizer extends MatrixInPlaceModification `
  * `class UniformDocumentOverTopicRegularizer extends 
DocumentOverTopicDistributionRegularizer `
  * `class UniformTopicRegularizer extends TopicsRegularizer `
  * `class Document(val tokens: SparseVector[Int]) extends Serializable `






[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66616637
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24365/
Test PASSed.





[GitHub] spark pull request: [SPARK-4526][MLLIB]Gradient should be added ba...

2014-12-11 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/3677

[SPARK-4526][MLLIB]Gradient should be added batch computing interface.
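
For illustration only, a hypothetical sketch of what a batch interface on top of 
MLlib's existing per-example `Gradient.compute` could look like (not the API 
proposed by this PR):

```scala
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.optimization.Gradient

// Accumulates the gradients of a whole batch into cumGradient and returns the summed loss.
def computeBatch(
    gradient: Gradient,
    batch: Iterable[(Double, Vector)],   // (label, features) pairs
    weights: Vector,
    cumGradient: Vector): Double = {
  var lossSum = 0.0
  batch.foreach { case (label, features) =>
    // Gradient.compute adds this example's gradient into cumGradient and returns its loss.
    lossSum += gradient.compute(features, label, weights, cumGradient)
  }
  lossSum
}
```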



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SPARK-4526

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3677.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3677


commit 1fd1d020cb05f3d7d09289d7ae64869dbea58695
Author: GuoQiang Li wi...@qq.com
Date:   2014-12-11T13:19:26Z

Gradient should be added batch computing interface.







[GitHub] spark pull request: [SPARK-4526][MLLIB]Gradient should be added ba...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3677#issuecomment-66617874
  
  [Test build #24368 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24368/consoleFull)
 for   PR 3677 at commit 
[`1fd1d02`](https://github.com/apache/spark/commit/1fd1d020cb05f3d7d09289d7ae64869dbea58695).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66617927
  
  [Test build #24367 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24367/consoleFull)
 for   PR 1269 at commit 
[`8e953e7`](https://github.com/apache/spark/commit/8e953e7d378fe012b2c3364b9cc570cd1af57f0e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class DocumentParameters(val document: Document,`
  * `class GlobalParameters(val phi : Array[Array[Float]], val alphabetSize 
: Int)`
  * `class PLSA(@transient protected val sc: SparkContext,`
  * `class RobustDocumentParameters(document: Document,`
  * `class RobustGlobalParameters(phi : Array[Array[Float]],`
  * `class RobustPLSA(@transient protected val sc: SparkContext,`
  * `trait DocumentOverTopicDistributionRegularizer extends Serializable 
with MatrixInPlaceModification `
  * `trait TopicsRegularizer extends MatrixInPlaceModification `
  * `class UniformDocumentOverTopicRegularizer extends 
DocumentOverTopicDistributionRegularizer `
  * `class UniformTopicRegularizer extends TopicsRegularizer `
  * `class Document(val tokens: SparseVector[Int]) extends Serializable `
  * `class TokenEnumerator extends Serializable `






[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66617936
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24367/
Test FAILed.





[GitHub] spark pull request: [SPARK-4526][MLLIB]Gradient should be added ba...

2014-12-11 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/3677#issuecomment-66618650
  
cc /@mengxr





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66619759
  
  [Test build #24369 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24369/consoleFull)
 for   PR 1269 at commit 
[`c54afc9`](https://github.com/apache/spark/commit/c54afc96bb493143d9ce0484118a452ad8c7514d).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4806] Streaming doc update for 1.2

2014-12-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3653





[GitHub] spark pull request: [SPARK-4806] Streaming doc update for 1.2

2014-12-11 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/3653#issuecomment-66626293
  
@JoshRosen I have addressed your final comments and merged it. Thank you 
very much.





[GitHub] spark pull request: [SPARK-4806] Streaming doc update for 1.2

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3653#issuecomment-66627047
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24370/
Test FAILed.





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66627555
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24369/
Test FAILed.





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66627541
  
  [Test build #24369 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24369/consoleFull)
 for   PR 1269 at commit 
[`c54afc9`](https://github.com/apache/spark/commit/c54afc96bb493143d9ce0484118a452ad8c7514d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class DocumentParameters(val document: Document,`
  * `class GlobalParameters(val phi : Array[Array[Float]], val alphabetSize 
: Int)`
  * `class PLSA(@transient protected val sc: SparkContext,`
  * `class RobustDocumentParameters(document: Document,`
  * `class RobustGlobalParameters(phi : Array[Array[Float]],`
  * `class RobustPLSA(@transient protected val sc: SparkContext,`
  * `trait DocumentOverTopicDistributionRegularizer extends Serializable 
with MatrixInPlaceModification `
  * `trait TopicsRegularizer extends MatrixInPlaceModification `
  * `class UniformDocumentOverTopicRegularizer extends 
DocumentOverTopicDistributionRegularizer `
  * `class UniformTopicRegularizer extends TopicsRegularizer `
  * `class Document(val tokens: SparseVector[Int]) extends Serializable `
  * `class TokenEnumerator extends Serializable `






[GitHub] spark pull request: [SPARK-4526][MLLIB]Gradient should be added ba...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3677#issuecomment-66627835
  
  [Test build #24368 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24368/consoleFull)
 for   PR 3677 at commit 
[`1fd1d02`](https://github.com/apache/spark/commit/1fd1d020cb05f3d7d09289d7ae64869dbea58695).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4526][MLLIB]Gradient should be added ba...

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3677#issuecomment-66627845
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24368/
Test PASSed.





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66629318
  
  [Test build #24371 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24371/consoleFull)
 for   PR 1269 at commit 
[`0764aaa`](https://github.com/apache/spark/commit/0764aaa9e8737c824ad0a71ec6ecb197476e2419).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-11 Thread tgaloppo
Github user tgaloppo commented on a diff in the pull request:

https://github.com/apache/spark/pull/3022#discussion_r21683030
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
 ---
@@ -0,0 +1,283 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import breeze.linalg.{DenseVector => BreezeVector, DenseMatrix => BreezeMatrix}
+import breeze.linalg.{Transpose, det, inv}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.mllib.linalg.{Matrices, Vector, Vectors}
+import org.apache.spark.{Accumulator, AccumulatorParam, SparkContext}
+import org.apache.spark.SparkContext.DoubleAccumulatorParam
+
+/**
+ * Expectation-Maximization for multivariate Gaussian Mixture Models.
+ * 
+ */
+object GMMExpectationMaximization {
+  /**
+   * Trains a GMM using the given parameters
+   * 
+   * @param data training points stored as RDD[Vector]
+   * @param k the number of Gaussians in the mixture
+   * @param maxIterations the maximum number of iterations to perform
+   * @param delta change in log-likelihood at which convergence is 
considered achieved
+   */
+  def train(data: RDD[Vector], k: Int, maxIterations: Int, delta: Double): 
GaussianMixtureModel = {
+new GMMExpectationMaximization().setK(k)
+  .setMaxIterations(maxIterations)
+  .setDelta(delta)
+  .run(data)
+  }
+  
+  /**
+   * Trains a GMM using the given parameters
+   * 
+   * @param data training points stored as RDD[Vector]
+   * @param k the number of Gaussians in the mixture
+   * @param maxIterations the maximum number of iterations to perform
+   */
+  def train(data: RDD[Vector], k: Int, maxIterations: Int): 
GaussianMixtureModel = {
+new 
GMMExpectationMaximization().setK(k).setMaxIterations(maxIterations).run(data)
+  }
+  
+  /**
+   * Trains a GMM using the given parameters
+   * 
+   * @param data training points stored as RDD[Vector]
+   * @param k the number of Gaussians in the mixture
+   * @param delta change in log-likelihood at which convergence is 
considered achieved
+   */
+  def train(data: RDD[Vector], k: Int, delta: Double): 
GaussianMixtureModel = {
+new GMMExpectationMaximization().setK(k).setDelta(delta).run(data)
+  }
+  
+  /**
+   * Trains a GMM using the given parameters
+   * 
+   * @param data training points stored as RDD[Vector]
+   * @param k the number of Gaussians in the mixture
+   */
+  def train(data: RDD[Vector], k: Int): GaussianMixtureModel = {
+new GMMExpectationMaximization().setK(k).run(data)
+  }
+}
+
+/**
+ * This class performs multivariate Gaussian expectation maximization.  It 
will 
+ * maximize the log-likelihood for a mixture of k Gaussians, iterating 
until
+ * the log-likelihood changes by less than delta, or until it has reached
+ * the max number of iterations.  
+ */
+class GMMExpectationMaximization private (
+private var k: Int, 
+private var delta: Double, 
+private var maxIterations: Int) extends Serializable {
+  
+  // Type aliases for convenience
+  private type DenseDoubleVector = BreezeVector[Double]
+  private type DenseDoubleMatrix = BreezeMatrix[Double]
+  
+  // number of samples per cluster to use when initializing Gaussians
+  private val nSamples = 5;
+  
+  // A default instance, 2 Gaussians, 100 iterations, 0.01 log-likelihood 
threshold
+  def this() = this(2, 0.01, 100)
+  
+  /** Set the number of Gaussians in the mixture model.  Default: 2 */
+  def setK(k: Int): this.type = {
+this.k = k
+this
+  }
+  
+  /** Set the maximum number of iterations to run. Default: 100 */
+  def setMaxIterations(maxIterations: Int): this.type = {
+this.maxIterations = maxIterations
+this
+  }
+  
+  /**
+   * Set the largest 
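
The quoted excerpt is truncated above. For reference, a minimal usage sketch of the 
`train` helpers it defines (assumes an existing `SparkContext` named `sc`; the data 
points are made up):

```scala
import org.apache.spark.mllib.linalg.Vectors

val points = sc.parallelize(Seq(
  Vectors.dense(1.0, 2.0),
  Vectors.dense(1.1, 2.1),
  Vectors.dense(9.0, 8.0)))

// k = 2 Gaussians, at most 100 iterations, stop once the log-likelihood changes by < 0.01
val model = GMMExpectationMaximization.train(points, 2, 100, 0.01)
```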

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-11 Thread tgaloppo
Github user tgaloppo commented on a diff in the pull request:

https://github.com/apache/spark/pull/3022#discussion_r21683119
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
 ---
@@ -0,0 +1,283 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import breeze.linalg.{DenseVector => BreezeVector, DenseMatrix => BreezeMatrix}
+import breeze.linalg.{Transpose, det, inv}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.mllib.linalg.{Matrices, Vector, Vectors}
+import org.apache.spark.{Accumulator, AccumulatorParam, SparkContext}
+import org.apache.spark.SparkContext.DoubleAccumulatorParam
+
+/**
+ * Expectation-Maximization for multivariate Gaussian Mixture Models.
+ * 
+ */
+object GMMExpectationMaximization {
+  /**
+   * Trains a GMM using the given parameters
+   * 
+   * @param data training points stored as RDD[Vector]
+   * @param k the number of Gaussians in the mixture
+   * @param maxIterations the maximum number of iterations to perform
+   * @param delta change in log-likelihood at which convergence is 
considered achieved
+   */
+  def train(data: RDD[Vector], k: Int, maxIterations: Int, delta: Double): 
GaussianMixtureModel = {
+new GMMExpectationMaximization().setK(k)
+  .setMaxIterations(maxIterations)
+  .setDelta(delta)
+  .run(data)
+  }
+  
+  /**
+   * Trains a GMM using the given parameters
+   * 
+   * @param data training points stored as RDD[Vector]
+   * @param k the number of Gaussians in the mixture
+   * @param maxIterations the maximum number of iterations to perform
+   */
+  def train(data: RDD[Vector], k: Int, maxIterations: Int): 
GaussianMixtureModel = {
+new 
GMMExpectationMaximization().setK(k).setMaxIterations(maxIterations).run(data)
+  }
+  
+  /**
+   * Trains a GMM using the given parameters
+   * 
+   * @param data training points stored as RDD[Vector]
+   * @param k the number of Gaussians in the mixture
+   * @param delta change in log-likelihood at which convergence is 
considered achieved
+   */
+  def train(data: RDD[Vector], k: Int, delta: Double): 
GaussianMixtureModel = {
+new GMMExpectationMaximization().setK(k).setDelta(delta).run(data)
+  }
+  
+  /**
+   * Trains a GMM using the given parameters
+   * 
+   * @param data training points stored as RDD[Vector]
+   * @param k the number of Gaussians in the mixture
+   */
+  def train(data: RDD[Vector], k: Int): GaussianMixtureModel = {
+new GMMExpectationMaximization().setK(k).run(data)
+  }
+}
+
+/**
+ * This class performs multivariate Gaussian expectation maximization.  It 
will 
+ * maximize the log-likelihood for a mixture of k Gaussians, iterating 
until
+ * the log-likelihood changes by less than delta, or until it has reached
+ * the max number of iterations.  
+ */
+class GMMExpectationMaximization private (
+private var k: Int, 
+private var delta: Double, 
+private var maxIterations: Int) extends Serializable {
+  
+  // Type aliases for convenience
+  private type DenseDoubleVector = BreezeVector[Double]
+  private type DenseDoubleMatrix = BreezeMatrix[Double]
+  
+  // number of samples per cluster to use when initializing Gaussians
+  private val nSamples = 5;
+  
+  // A default instance, 2 Gaussians, 100 iterations, 0.01 log-likelihood 
threshold
+  def this() = this(2, 0.01, 100)
+  
+  /** Set the number of Gaussians in the mixture model.  Default: 2 */
+  def setK(k: Int): this.type = {
+this.k = k
+this
+  }
+  
+  /** Set the maximum number of iterations to run. Default: 100 */
+  def setMaxIterations(maxIterations: Int): this.type = {
+this.maxIterations = maxIterations
+this
+  }
+  
+  /**
+   * Set the largest 

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-11 Thread tgaloppo
Github user tgaloppo commented on the pull request:

https://github.com/apache/spark/pull/3022#issuecomment-66636308
  
@jkbradley Thank you for your comments.  I am working to resolve these 
issues and will push these changes in a day or two.





[GitHub] spark pull request: Do not include SPARK_CLASSPATH if empty

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3678#issuecomment-66638939
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-2980][mllib] testing the Chi-squared hy...

2014-12-11 Thread jbencook
GitHub user jbencook opened a pull request:

https://github.com/apache/spark/pull/3679

[SPARK-2980][mllib] testing the Chi-squared hypothesis test

This PR tests the pyspark Chi-squared hypothesis test from this commit: 
c8abddc5164d8cf11cdede6ab3d5d1ea08028708 and moves some of the error messaging 
into Python.

It is a port of the Scala tests here: 
[HypothesisTestSuite.scala](https://github.com/apache/spark/blob/master/mllib/src/test/scala/orgapache/spark/mllib/stat/HypothesisTestSuite.scala)

Hopefully, SPARK-2980 can be closed.
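
For reference, the Scala-side goodness-of-fit call that these Python tests mirror 
looks roughly like this (minimal usage sketch, not code from this PR):

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.Statistics

val observed = Vectors.dense(4.0, 6.0, 5.0)
// With no expected vector given, the test is against a uniform distribution.
val result = Statistics.chiSqTest(observed)
println(s"statistic=${result.statistic}, pValue=${result.pValue}")
```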

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jbencook/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3679.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3679


commit 3aeb0d91007960f33076b6e6775944bb9d81ead8
Author: jbencook jbenjaminc...@gmail.com
Date:   2014-12-11T15:44:08Z

[SPARK-2980][mllib] bringing Chi-squared error messages to the python side

commit a17ee843185bdb1ee96574712450243d112fbce6
Author: jbencook jbenjaminc...@gmail.com
Date:   2014-12-11T15:44:34Z

[SPARK-2980][mllib] adding unit tests for the pyspark chi-squared test







[GitHub] spark pull request: [SPARK-2980][mllib] testing the Chi-squared hy...

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3679#issuecomment-66639796
  
Can one of the admins verify this patch?





[GitHub] spark pull request: Do not include SPARK_CLASSPATH if empty

2014-12-11 Thread darabos
GitHub user darabos opened a pull request:

https://github.com/apache/spark/pull/3678

Do not include SPARK_CLASSPATH if empty

My guess for fixing https://issues.apache.org/jira/browse/SPARK-4831.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/darabos/spark patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3678.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3678


commit 36e12437a6cfd3eab1568ca50a5b8fc26ed275c1
Author: Daniel Darabos darabos.dan...@gmail.com
Date:   2014-12-11T15:49:23Z

Do not include SPARK_CLASSPATH if empty.

Adding an empty string to the classpath adds the current directory.
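
A small illustration of the underlying issue (not the patch itself; paths are 
examples): an empty element in a classpath string resolves to the current working 
directory, so empty entries should be skipped when the classpath is assembled.

```scala
val sparkClassPath = sys.env.getOrElse("SPARK_CLASSPATH", "")  // may be empty
val otherEntries = Seq("/opt/spark/conf", "/opt/spark/lib/spark-assembly.jar")
// Dropping empty entries avoids silently adding "." to the classpath.
val classPath = (sparkClassPath +: otherEntries)
  .filter(_.nonEmpty)
  .mkString(java.io.File.pathSeparator)
```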







[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66642598
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24371/
Test PASSed.





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66642585
  
  [Test build #24371 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24371/consoleFull)
 for   PR 1269 at commit 
[`0764aaa`](https://github.com/apache/spark/commit/0764aaa9e8737c824ad0a71ec6ecb197476e2419).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class DocumentParameters(val document: Document,`
  * `class GlobalParameters(val phi : Array[Array[Float]], val alphabetSize 
: Int)`
  * `class PLSA(@transient protected val sc: SparkContext,`
  * `class RobustDocumentParameters(document: Document,`
  * `class RobustGlobalParameters(phi : Array[Array[Float]],`
  * `class RobustPLSA(@transient protected val sc: SparkContext,`
  * `trait DocumentOverTopicDistributionRegularizer extends Serializable 
with MatrixInPlaceModification `
  * `trait TopicsRegularizer extends MatrixInPlaceModification `
  * `class UniformDocumentOverTopicRegularizer extends 
DocumentOverTopicDistributionRegularizer `
  * `class UniformTopicRegularizer extends TopicsRegularizer `
  * `class Document(val tokens: SparseVector[Int]) extends Serializable `
  * `class TokenEnumerator extends Serializable `






[GitHub] spark pull request: [WIP][SPARK-4476][SQL] Use MapType for dict in...

2014-12-11 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/3406#issuecomment-66643222
  
Yeah, I am sorry I have not had a chance to continue my work. I need to 
finish the unit test part.





[GitHub] spark pull request: [SPARK-3575][SQL] Removes the Metastore Parque...

2014-12-11 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/3441#discussion_r21688244
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -81,9 +80,27 @@ private[hive] class HiveMetastoreCatalog(hive: 
HiveContext) extends Catalog with
 }
 
   // Since HiveQL is case insensitive for table names we make them all 
lowercase.
-  MetastoreRelation(
+  val relation = MetastoreRelation(
     databaseName, tblName, alias)(
       table.getTTable, partitions.map(part => part.getTPartition))(hive)
+
+  if (hive.convertMetastoreParquet &&
+      relation.tableDesc.getSerdeClassName.toLowerCase.contains("parquet")) {
+    val path = if (relation.hiveQlTable.isPartitioned) {
+      partitions.map(_.getLocation).mkString(",")
--- End diff --

Yeah, I forgot that in the case of `MetastoreRelation`, partition pruning is done 
within `HiveTableScan`... I'll add a WIP tag to this PR and add back partition 
pruning.





[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-11 Thread ilganeli
Github user ilganeli commented on the pull request:

https://github.com/apache/spark/pull/3518#issuecomment-66658459
  
Hi @JoshRosen - with the updates I've made is this ok to merge?





[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3022#discussion_r21695326
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala
 ---
@@ -0,0 +1,283 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import breeze.linalg.{DenseVector => BreezeVector, DenseMatrix => BreezeMatrix}
+import breeze.linalg.{Transpose, det, inv}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.mllib.linalg.{Matrices, Vector, Vectors}
+import org.apache.spark.{Accumulator, AccumulatorParam, SparkContext}
+import org.apache.spark.SparkContext.DoubleAccumulatorParam
+
+/**
+ * Expectation-Maximization for multivariate Gaussian Mixture Models.
+ * 
+ */
+object GMMExpectationMaximization {
+  /**
+   * Trains a GMM using the given parameters
+   * 
+   * @param data training points stored as RDD[Vector]
+   * @param k the number of Gaussians in the mixture
+   * @param maxIterations the maximum number of iterations to perform
+   * @param delta change in log-likelihood at which convergence is 
considered achieved
+   */
+  def train(data: RDD[Vector], k: Int, maxIterations: Int, delta: Double): 
GaussianMixtureModel = {
+new GMMExpectationMaximization().setK(k)
+  .setMaxIterations(maxIterations)
+  .setDelta(delta)
+  .run(data)
+  }
+  
+  /**
+   * Trains a GMM using the given parameters
+   * 
+   * @param data training points stored as RDD[Vector]
+   * @param k the number of Gaussians in the mixture
+   * @param maxIterations the maximum number of iterations to perform
+   */
+  def train(data: RDD[Vector], k: Int, maxIterations: Int): 
GaussianMixtureModel = {
+new 
GMMExpectationMaximization().setK(k).setMaxIterations(maxIterations).run(data)
+  }
+  
+  /**
+   * Trains a GMM using the given parameters
+   * 
+   * @param data training points stored as RDD[Vector]
+   * @param k the number of Gaussians in the mixture
+   * @param delta change in log-likelihood at which convergence is 
considered achieved
+   */
+  def train(data: RDD[Vector], k: Int, delta: Double): 
GaussianMixtureModel = {
+new GMMExpectationMaximization().setK(k).setDelta(delta).run(data)
+  }
+  
+  /**
+   * Trains a GMM using the given parameters
+   * 
+   * @param data training points stored as RDD[Vector]
+   * @param k the number of Gaussians in the mixture
+   */
+  def train(data: RDD[Vector], k: Int): GaussianMixtureModel = {
+new GMMExpectationMaximization().setK(k).run(data)
+  }
+}
+
+/**
+ * This class performs multivariate Gaussian expectation maximization.  It 
will 
+ * maximize the log-likelihood for a mixture of k Gaussians, iterating 
until
+ * the log-likelihood changes by less than delta, or until it has reached
+ * the max number of iterations.  
+ */
+class GMMExpectationMaximization private (
+private var k: Int, 
+private var delta: Double, 
+private var maxIterations: Int) extends Serializable {
+  
+  // Type aliases for convenience
+  private type DenseDoubleVector = BreezeVector[Double]
+  private type DenseDoubleMatrix = BreezeMatrix[Double]
+  
+  // number of samples per cluster to use when initializing Gaussians
+  private val nSamples = 5;
+  
+  // A default instance, 2 Gaussians, 100 iterations, 0.01 log-likelihood 
threshold
+  def this() = this(2, 0.01, 100)
+  
+  /** Set the number of Gaussians in the mixture model.  Default: 2 */
+  def setK(k: Int): this.type = {
+this.k = k
+this
+  }
+  
+  /** Set the maximum number of iterations to run. Default: 100 */
+  def setMaxIterations(maxIterations: Int): this.type = {
+this.maxIterations = maxIterations
+this
+  }
+  
+  /**
+   * Set the 

[GitHub] spark pull request: [SPARK-3405] add subnet-id and vpc-id options ...

2014-12-11 Thread tylerprete
Github user tylerprete commented on the pull request:

https://github.com/apache/spark/pull/2872#issuecomment-5398
  
@jontg I'm using this patch with your modifications (private_ip_address), 
but I'm getting the following error when the script tries to start the 
master:

SHUTDOWN_MSG: Shutting down NameNode at java.net.UnknownHostException: 
ip-10-0-2-213: ip-10-0-2-213

10.0.2.213 is the master's ip in this case, but it looks like it's picking 
up ip-10-0-2-213 as the hostname and that isn't resolving. Did you run into 
anything like this, and if so, how'd you resolve it?

Thanks!





[GitHub] spark pull request: [SPARK-4728] Add exponential, gamma, and log n...

2014-12-11 Thread rnowling
GitHub user rnowling opened a pull request:

https://github.com/apache/spark/pull/3680

[SPARK-4728] Add exponential, gamma, and log normal sampling to MLlib da...

...ta generators

This patch adds:

* Exponential, gamma, and log normal generators that wrap Apache Commons 
math3 to the private API
* Functions for generating exponential, gamma, and log normal RDDs and 
vector RDDs
* Tests for the above
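
A minimal sketch of the shape such a wrapper can take (the trait and class names 
below are assumptions for illustration, not the PR's actual code):

```scala
import org.apache.commons.math3.distribution.ExponentialDistribution

// Hypothetical generator interface, assumed for illustration.
trait DistributionGenerator extends Serializable {
  def nextValue(): Double
  def setSeed(seed: Long): Unit
}

// Wraps the commons-math3 exponential distribution; mean = 1 / rate.
class ExponentialGenerator(mean: Double) extends DistributionGenerator {
  private val rng = new ExponentialDistribution(mean)
  override def nextValue(): Double = rng.sample()
  override def setSeed(seed: Long): Unit = rng.reseedRandomGenerator(seed)
}
```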

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rnowling/spark spark4728

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3680.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3680


commit 9f96232a675ae0850275347c3cc9bd69676df5af
Author: RJ Nowling rnowl...@gmail.com
Date:   2014-12-11T18:31:38Z

[SPARK-4728] Add exponential, gamma, and log normal sampling to MLlib data 
generators







[GitHub] spark pull request: SPARK-4159 [CORE] Maven build doesn't run JUni...

2014-12-11 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3651#discussion_r21697554
  
--- Diff: yarn/pom.xml ---
@@ -152,6 +147,15 @@
          </environmentVariables>
        </configuration>
      </plugin>
+     <plugin>
+       <groupId>org.apache.maven.plugins</groupId>
+       <artifactId>maven-surefire-plugin</artifactId>
+       <configuration>
+         <environmentVariables>
+           <SPARK_HOME>${basedir}/../..</SPARK_HOME>
--- End diff --

OK, in the name of keeping it simple I might not touch this one this time. 
Since this occurs in only 2 places, it doesn't save much.





[GitHub] spark pull request: [SPARK-4728][MLLib] Add exponential, gamma, an...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3680#issuecomment-7105
  
  [Test build #24372 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24372/consoleFull)
 for   PR 3680 at commit 
[`9f96232`](https://github.com/apache/spark/commit/9f96232a675ae0850275347c3cc9bd69676df5af).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4740][investigation-only] Disable trans...

2014-12-11 Thread rxin
Github user rxin closed the pull request at:

https://github.com/apache/spark/pull/3667





[GitHub] spark pull request: [SPARK-4740][investigation-only] Disable trans...

2014-12-11 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/3667#issuecomment-8233
  
Alright closing now since transferTo isn't the issue at all.






[GitHub] spark pull request: SPARK-4159 [CORE] Maven build doesn't run JUni...

2014-12-11 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3651#discussion_r21699001
  
--- Diff: pom.xml ---
@@ -941,19 +950,38 @@
         <fork>true</fork>
       </configuration>
     </plugin>
+    <!-- Surefire runs all Java tests -->
     <plugin>
       <groupId>org.apache.maven.plugins</groupId>
       <artifactId>maven-surefire-plugin</artifactId>
-      <version>2.17</version>
+      <version>2.18</version>
+      <!-- Note config is repeated in scalatest config -->
       <configuration>
-        <!-- Uses scalatest instead -->
-        <skipTests>true</skipTests>
+        <includes>
+          <include>**/Test*.java</include>
+          <include>**/*Test.java</include>
+          <include>**/*TestCase.java</include>
+          <include>**/*Suite.java</include>
+        </includes>
+        <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
+        <argLine>-Xmx3g -XX:MaxPermSize=${MaxPermGen} -XX:ReservedCodeCacheSize=512m</argLine>
+        <systemProperties>
+          <java.awt.headless>true</java.awt.headless>
+          <spark.test.home>${session.executionRootDirectory}</spark.test.home>
+          <spark.testing>1</spark.testing>
+          <spark.ui.enabled>false</spark.ui.enabled>
+          <spark.ui.showConsoleProgress>false</spark.ui.showConsoleProgress>
+          <spark.executor.extraClassPath>${test_classpath}</spark.executor.extraClassPath>
+          <spark.driver.allowMultipleContexts>true</spark.driver.allowMultipleContexts>
+        </systemProperties>
       </configuration>
     </plugin>
+    <!-- Scalatest runs all Scala tests -->
     <plugin>
       <groupId>org.scalatest</groupId>
       <artifactId>scalatest-maven-plugin</artifactId>
       <version>1.0</version>
+      <!-- Note config is repeated in surefire config -->
       <configuration>
         <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
--- End diff --

No, the files underneath are named by test suite, so they won't collide. I 
double-checked just now.





[GitHub] spark pull request: SPARK-4159 [CORE] Maven build doesn't run JUni...

2014-12-11 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3651#issuecomment-9398
  
Good point about `log4j.appender.file.append=false`. It looks like the 
Scala tests overwrite it. Hm, why not set append to `true` indeed? It's in 
`target`, so it gets deleted by `clean`.





[GitHub] spark pull request: [PySpark] Fix tests with Python 2.6 in 1.0 bra...

2014-12-11 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/3668#issuecomment-66670110
  
Yeah, branch 0.9 is also having the same problem. I haven't looked deeply into 
the issue yet, but maybe @shaneknapp has a better idea?





[GitHub] spark pull request: [SPARK-4754] Refactor SparkContext into Execut...

2014-12-11 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/3614#issuecomment-66670276
  
retest this please





[GitHub] spark pull request: [SPARK-4754] Refactor SparkContext into Execut...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3614#issuecomment-66670974
  
  [Test build #24373 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24373/consoleFull) for PR 3614 at commit [`187070d`](https://github.com/apache/spark/commit/187070d22b629a783203aa9d5013b4d38b769ca2).
  * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4728][MLLib] Add exponential, gamma, an...

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3680#issuecomment-66676689
  
  [Test build #24374 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24374/consoleFull) for PR 3680 at commit [`84fd98d`](https://github.com/apache/spark/commit/84fd98d6b1e625e1c143bf16fccbf91ff2040d08).
  * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4808] Remove Spillable minimum threshol...

2014-12-11 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/3656#issuecomment-66676807
  
@lawlerd things are done this way because estimating the size for every 
record would be prohibitively expensive.  Also, the trackMemoryThreshold is 
required at least until we figure out a solution for SPARK-4452.  Without it, 
when there are multiple shuffle data structures in a thread and the first takes 
a bunch of memory, the second ends up spilling on every record (this was a 
blocker for 1.2).

Your concern - that we're not tracking memory 100% accurately - is of course 
valid. One response to this is that we're conservative with the memory we do use, 
e.g. we only use up to spark.shuffle.safetyFraction (default 80%) of the available 
shuffle memory.

One improvement that might make sense would be to do the sampling based on 
memory size rather than number of records. So if we noticed that records were 
larger, we would sample more frequently and maybe adjust the 
trackMemoryThreshold.
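
To make the mechanism under discussion concrete, here is a minimal Scala sketch of 
a sampling-based spill check (the class name, constants, and callback signatures 
are assumptions for illustration -- this is not Spark's actual `Spillable` code):

```scala
// Illustrative sketch of a sampled memory check; not Spark's Spillable implementation.
class SamplingSpillCheck(
    estimateSize: () => Long,       // caller-supplied estimate of the buffered collection's size in bytes
    requestMemory: Long => Long) {  // asks a shared pool for more bytes and returns how many were granted

  private var elementsRead = 0L
  private var myMemoryThreshold = 0L
  private val trackMemoryThreshold = 1000L  // don't consider spilling before this many records
  private val sampleInterval = 32L          // estimate size every 32 records rather than on every insert

  /** Call once per inserted record; returns true if the caller should spill to disk now. */
  def maybeSpill(): Boolean = {
    elementsRead += 1
    if (elementsRead <= trackMemoryThreshold || elementsRead % sampleInterval != 0) {
      return false
    }
    val currentSize = estimateSize()
    if (currentSize <= myMemoryThreshold) {
      return false
    }
    // Try to claim roughly double the current size before giving up and spilling.
    myMemoryThreshold += requestMemory(2 * currentSize - myMemoryThreshold)
    if (currentSize > myMemoryThreshold) {
      elementsRead = 0L
      myMemoryThreshold = 0L
      true
    } else {
      false
    }
  }
}
```

Because the size estimate only runs every `sampleInterval` records, a few 
unexpectedly large records can slip in between samples; that is the inaccuracy the 
safetyFraction headroom absorbs, and sampling by estimated bytes rather than 
record count, as suggested above, would shrink that window further.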




