[GitHub] spark pull request: [SQL] sum and avg on empty table should always...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/3675

[SQL] sum and avg on empty table should always return null

So the optimizations are not valid. Also, I think the case optimized here is rarely encountered, so removing the optimizations will not affect performance. I'll create a JIRA after JIRA is back. Can we merge #3445 before I add a comparison test case from this?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adrian-wang/spark sumempty

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3675.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3675

commit 42df76399a9f815ddd235273d32ebfaafcc7c2fe
Author: Daoyuan Wang daoyuan.w...@intel.com
Date: 2014-12-11T08:07:54Z

    sum and avg on empty table should always return null

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
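For context, the SQL semantics the patch enforces can be sketched in plain Scala, with `Option` standing in for SQL NULL (a hedged illustration, not the patch itself): unlike Scala's `Seq.sum`, which returns 0 on an empty collection, SQL's SUM and AVG return NULL when there are no input rows.

```scala
// Illustration only (not the patch): Option models SQL NULL.
// SUM and AVG over an empty table are NULL, not 0.
object EmptyAggregates {
  def sqlSum(rows: Seq[Double]): Option[Double] =
    if (rows.isEmpty) None else Some(rows.sum)

  def sqlAvg(rows: Seq[Double]): Option[Double] =
    if (rows.isEmpty) None else Some(rows.sum / rows.size)
}
```

This is why an optimizer rule that rewrites `SUM(x)` on a provably empty input to a constant 0 would be incorrect.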
[GitHub] spark pull request: [SQL] sum and avg on empty table should always...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3675#issuecomment-66585913

[Test build #24357 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24357/consoleFull) for PR 3675 at commit [`42df763`](https://github.com/apache/spark/commit/42df76399a9f815ddd235273d32ebfaafcc7c2fe).

* This patch merges cleanly.
[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/3672#discussion_r21662905

--- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala ---
@@ -85,10 +86,22 @@ private[hive] class HiveThriftServer2(hiveContext: HiveContext)
     setSuperField(this, "cliService", sparkSqlCliService)
     addService(sparkSqlCliService)

-    val thriftCliService = new ThriftBinaryCLIService(sparkSqlCliService)
-    setSuperField(this, "thriftCLIService", thriftCliService)
-    addService(thriftCliService)
+    if (isHTTPTransportMode(hiveConf)){
+      val thriftCliService = new ThriftHttpCLIService(sparkSqlCliService)
+      setSuperField(this, "thriftCLIService", thriftCliService)
+      addService(thriftCliService)
+    } else {
+      val thriftCliService = new ThriftBinaryCLIService(sparkSqlCliService)
+      setSuperField(this, "thriftCLIService", thriftCliService)
+      addService(thriftCliService)
+    }

     initCompositeService(hiveConf)
   }
+
+  private def isHTTPTransportMode(hiveConf: HiveConf): Boolean = {
+    val transportMode: String = hiveConf.getVar(ConfVars.HIVE_SERVER2_TRANSPORT_MODE)
+    return transportMode.equalsIgnoreCase("http")
--- End diff --

In Scala we don't need `return` here.
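To illustrate the reviewer's point (a minimal standalone sketch, not the PR's code): in Scala the last expression of a method body is its result, so an explicit `return` is redundant, and inside closures it can even behave surprisingly (non-local return via exception).

```scala
// The method body's last expression is its result; no `return` needed.
def isHttpTransportMode(transportMode: String): Boolean =
  transportMode.equalsIgnoreCase("http")
```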
[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/3672#discussion_r21662926

--- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala ---
@@ -70,11 +70,16 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
     port
   }

-  def withJdbcStatement(serverStartTimeout: FiniteDuration = 1.minute)(f: Statement => Unit) {
+  def withJdbcStatement(serverStartTimeout: FiniteDuration = 1.minute, httpMode: Boolean = false)(f: Statement => Unit) {
     val port = randomListeningPort

-    startThriftServer(port, serverStartTimeout) {
-      val jdbcUri = s"jdbc:hive2://${"localhost"}:$port/"
+    startThriftServer(port, serverStartTimeout, httpMode) {
+      val jdbcUri = if (httpMode) {
+        s"jdbc:hive2://${"localhost"}:$port/default?hive.server2.transport.mode=http;hive.server2.thrift.http.path=cliservice"
--- End diff --

100 columns exceeded.
[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/3672#discussion_r21662942

--- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala ---
@@ -113,7 +118,8 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
   def startThriftServer(
       port: Int,
-      serverStartTimeout: FiniteDuration = 1.minute)(
+      serverStartTimeout: FiniteDuration = 1.minute,
+      httpMode: Boolean = false )(
--- End diff --

Please remove the space before `)`.
[GitHub] spark pull request: SPARK-4817[STREAMING]Print the specified numbe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3662#issuecomment-66586675

[Test build #24358 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24358/consoleFull) for PR 3662 at commit [`411b287`](https://github.com/apache/spark/commit/411b28709b55cfa94ebd04ced6d67df997ebf467).

* This patch merges cleanly.
[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/3672#discussion_r21663017

--- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala ---
@@ -121,15 +127,28 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
     val warehousePath = getTempFilePath("warehouse")
     val metastorePath = getTempFilePath("metastore")
     val metastoreJdbcUri = s"jdbc:derby:;databaseName=$metastorePath;create=true"
+
     val command =
-      s"""$startScript
-        | --master local
-        | --hiveconf hive.root.logger=INFO,console
-        | --hiveconf ${ConfVars.METASTORECONNECTURLKEY}=$metastoreJdbcUri
-        | --hiveconf ${ConfVars.METASTOREWAREHOUSE}=$warehousePath
-        | --hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_BIND_HOST}=${"localhost"}
-        | --hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_PORT}=$port
-      """.stripMargin.split("\\s+").toSeq
+      if (httpMode){
+        s"""$startScript
+          | --master local
+          | --hiveconf hive.root.logger=INFO,console
+          | --hiveconf ${ConfVars.METASTORECONNECTURLKEY}=$metastoreJdbcUri
+          | --hiveconf ${ConfVars.METASTOREWAREHOUSE}=$warehousePath
+          | --hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_BIND_HOST}=${"localhost"}
+          | --hiveconf ${ConfVars.HIVE_SERVER2_TRANSPORT_MODE}=${"http"}
--- End diff --

The `${...}` wrapper is not needed in this line as well as the line above. It's safe to use double quotes within `s"""..."""`.
[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/3672#discussion_r21663058

--- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala ---
@@ -121,15 +127,28 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
     val warehousePath = getTempFilePath("warehouse")
     val metastorePath = getTempFilePath("metastore")
     val metastoreJdbcUri = s"jdbc:derby:;databaseName=$metastorePath;create=true"
+
     val command =
-      s"""$startScript
-        | --master local
-        | --hiveconf hive.root.logger=INFO,console
-        | --hiveconf ${ConfVars.METASTORECONNECTURLKEY}=$metastoreJdbcUri
-        | --hiveconf ${ConfVars.METASTOREWAREHOUSE}=$warehousePath
-        | --hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_BIND_HOST}=${"localhost"}
-        | --hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_PORT}=$port
-      """.stripMargin.split("\\s+").toSeq
+      if (httpMode){
+        s"""$startScript
+          | --master local
+          | --hiveconf hive.root.logger=INFO,console
+          | --hiveconf ${ConfVars.METASTORECONNECTURLKEY}=$metastoreJdbcUri
+          | --hiveconf ${ConfVars.METASTOREWAREHOUSE}=$warehousePath
+          | --hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_BIND_HOST}=${"localhost"}
+          | --hiveconf ${ConfVars.HIVE_SERVER2_TRANSPORT_MODE}=${"http"}
+          | --hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_HTTP_PORT}=$port
+        """.stripMargin.split("\\s+").toSeq
+      } else {
+        s"""$startScript
+          | --master local
+          | --hiveconf hive.root.logger=INFO,console
+          | --hiveconf ${ConfVars.METASTORECONNECTURLKEY}=$metastoreJdbcUri
+          | --hiveconf ${ConfVars.METASTOREWAREHOUSE}=$warehousePath
+          | --hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_BIND_HOST}=${"localhost"}
--- End diff --

Ah, I see, the original code uses a redundant `${...}` wrapper, please help removing this one :)
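The interpolation point the reviewer is making can be shown with a minimal sketch (hypothetical host and port values, not the suite's code): inside an `s"..."` string, `$name` suffices for a bare identifier; `${...}` is only needed around full expressions, and wrapping a string literal like `${"localhost"}` is redundant.

```scala
val host = "localhost"
val port = 10000

// All three forms yield the same string; the plain `$name` form is preferred.
val plain   = s"jdbc:hive2://$host:$port/"
val braced  = s"jdbc:hive2://${host}:${port}/"        // redundant braces
val literal = s"jdbc:hive2://${"localhost"}:$port/"   // redundant literal wrapper
```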
[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/3672#discussion_r21663078

--- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala ---
@@ -217,6 +236,25 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
     }
   }

+  test("Test JDBC query execution in Http Mode") {
+    withJdbcStatement( httpMode = true ) { statement =>
--- End diff --

Please remove spaces after `(` and before `)`.
[GitHub] spark pull request: [SPARK-4827][SQL] Fix resolution of deeply nes...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3674#issuecomment-66587059

[Test build #24355 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24355/consoleFull) for PR 3674 at commit [`d83d6a1`](https://github.com/apache/spark/commit/d83d6a150d85bc9033742256c518e08770296371).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/3672#discussion_r21663114

--- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala ---
@@ -267,6 +305,14 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
     }
   }

+  test("Checks Hive version in Http Mode") {
+    withJdbcStatement( httpMode = true ) { statement =>
--- End diff --

Remove the extra spaces.
[GitHub] spark pull request: [SPARK-4825] [SQL] CTAS fails to resolve when ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3673#issuecomment-66587032

[Test build #24359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24359/consoleFull) for PR 3673 at commit [`e8cbd56`](https://github.com/apache/spark/commit/e8cbd561beb2476eb810ff2c7f5dadbae49cdadf).

* This patch merges cleanly.
[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3672#issuecomment-66587060

This LGTM except for several minor styling issues. Thanks for working on this!
[GitHub] spark pull request: [SPARK-4827][SQL] Fix resolution of deeply nes...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3674#issuecomment-66587064

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24355/
[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3672#issuecomment-66587346

One more thing, please rename the PR title to [SQL] SPARK-4700: You can find names of all valid Spark components from the JIRA. (Couldn't provide a URL right now because JIRA is reindexing...)
[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3643#issuecomment-66587933

Thanks for that. I've added a new commit to make the methods private.
[GitHub] spark pull request: [SPARK-4777][CORE] Some block memory after unr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3629#issuecomment-66588176

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24352/
[GitHub] spark pull request: [SPARK-4777][CORE] Some block memory after unr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3629#issuecomment-66588161

[Test build #24352 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24352/consoleFull) for PR 3629 at commit [`34cfbe8`](https://github.com/apache/spark/commit/34cfbe8b309addb98deb23429626b14cb13a8e2a).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...
Github user Lewuathe commented on the pull request: https://github.com/apache/spark/pull/3636#issuecomment-66588539

@jkbradley I updated. Could you check it?
[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3672#issuecomment-66590225

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24356/
[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3672#issuecomment-66590223

[Test build #24356 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24356/consoleFull) for PR 3672 at commit [`377532c`](https://github.com/apache/spark/commit/377532cdff819010aef1786f84c987eddb63af45).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SQL] sum and avg on empty table should always...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3675#issuecomment-66592192

[Test build #24357 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24357/consoleFull) for PR 3675 at commit [`42df763`](https://github.com/apache/spark/commit/42df76399a9f815ddd235273d32ebfaafcc7c2fe).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SQL] sum and avg on empty table should always...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3675#issuecomment-66592197

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24357/
[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-66592619

[Test build #24360 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24360/consoleFull) for PR 2405 at commit [`b016a81`](https://github.com/apache/spark/commit/b016a81cc89d04ef3cb535f9a39ffdb26eaa32d7).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-66592751

[Test build #24360 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24360/consoleFull) for PR 2405 at commit [`b016a81`](https://github.com/apache/spark/commit/b016a81cc89d04ef3cb535f9a39ffdb26eaa32d7).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class UnresolvedGetField(child: Expression, fieldName: String) extends UnaryExpression `
  * `case class StructGetField(child: Expression, field: StructField, ordinal: Int) extends UnaryExpression `
  * `case class ArrayGetField(child: Expression, field: StructField, ordinal: Int, containsNull: Boolean)`
[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-66592755

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24360/
[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-66593009

Hi @marmbrus @liancheng, I have updated this PR to support `GetField` on one level of array of struct for now. As I mentioned in https://github.com/apache/spark/pull/2543, resolving `GetField` during the analysis phase makes changes like this PR easier. Please let me know if you think something is wrong here. Thanks!
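The behavior this PR targets can be sketched with made-up case classes (a hedged illustration, not Catalyst's implementation): applying dot notation to an array of structs extracts the named field from every element, yielding an array of the field's type.

```scala
// Hypothetical data model, not Catalyst's internals: a struct with
// field `a`, nested inside an array.
case class Inner(a: Int)

// Dot notation on an array of structs extracts the field from every
// element, producing an array of the field's type.
def getFieldFromArray(arr: Seq[Inner]): Seq[Int] = arr.map(_.a)
```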
[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-66593677

[Test build #24362 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24362/consoleFull) for PR 2405 at commit [`6e9f94b`](https://github.com/apache/spark/commit/6e9f94bab93c95f90ce790fce3b11d15d4dd1ad3).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4644][Core] Implement skewed join
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3505#issuecomment-66593703

[Test build #24361 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24361/consoleFull) for PR 3505 at commit [`af7eb71`](https://github.com/apache/spark/commit/af7eb714ab9916a628859682b8cbf9c4c2396029).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-66594023

[Test build #24362 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24362/consoleFull) for PR 2405 at commit [`6e9f94b`](https://github.com/apache/spark/commit/6e9f94bab93c95f90ce790fce3b11d15d4dd1ad3).

* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class UnresolvedGetField(child: Expression, fieldName: String) extends UnaryExpression `
  * `case class StructGetField(child: Expression, field: StructField, ordinal: Int)`
  * `case class ArrayGetField(child: Expression, field: StructField, ordinal: Int, containsNull: Boolean)`
[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-66594025

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24362/
[GitHub] spark pull request: [SPARK-4825] [SQL] CTAS fails to resolve when ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3673#issuecomment-66594222 [Test build #24359 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24359/consoleFull) for PR 3673 at commit [`e8cbd56`](https://github.com/apache/spark/commit/e8cbd561beb2476eb810ff2c7f5dadbae49cdadf). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4825] [SQL] CTAS fails to resolve when ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3673#issuecomment-66594228 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24359/ Test PASSed.
[GitHub] spark pull request: [SPARK-4644][Core] Implement skewed join
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/3505#issuecomment-66594423 I removed my changes to the `join` methods. Now this patch only adds new `skewedJoin` methods, and users need to call them explicitly. Considering other DSLs on top of Spark core, like Pig, Hive, and Scalding, is a great point I had not thought of.
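For context, a usage sketch of the opt-in API described above. The `skewedJoin` name comes from this PR, but the exact signature, the `RDD` import, and a live `SparkContext` named `sc` are assumptions, not the merged API:

```scala
// Sketch only: assumes this PR's skewedJoin is available on pair RDDs and
// that `sc` is an existing SparkContext. The regular join is left untouched;
// users call skewedJoin explicitly when they know a key is skewed.
import org.apache.spark.rdd.RDD

val clicks: RDD[(String, Long)] =
  sc.parallelize(Seq(("hot", 1L), ("hot", 2L), ("cold", 3L)))
val users: RDD[(String, String)] =
  sc.parallelize(Seq(("hot", "u1"), ("cold", "u2")))

val plain  = clicks.join(users)        // all values for "hot" meet in one task
val skewed = clicks.skewedJoin(users)  // hypothetical: skewed keys are split across tasks first
```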
[GitHub] spark pull request: SPARK-4817[STREAMING]Print the specified numbe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3662#issuecomment-66595143 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24358/ Test PASSed.
[GitHub] spark pull request: SPARK-4817[STREAMING]Print the specified numbe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3662#issuecomment-66595135 [Test build #24358 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24358/consoleFull) for PR 3662 at commit [`411b287`](https://github.com/apache/spark/commit/411b28709b55cfa94ebd04ced6d67df997ebf467). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4818][Core] Add 'iterator' to reduce me...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3671#discussion_r21668908
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -493,9 +493,9 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
   def leftOuterJoin[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, Option[W]))] = {
     this.cogroup(other, partitioner).flatMapValues { pair =>
       if (pair._2.isEmpty) {
-        pair._1.map(v => (v, None))
+        pair._1.iterator.map(v => (v, None): (V, Option[W]))
--- End diff --
Interesting: are these types required, or can it be limited to just changing `None` to `None: Option[W]`? Not that it hurts to spell out the types.
[GitHub] spark pull request: [SPARK-4644][Core] Implement skewed join
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3505#issuecomment-66603178 [Test build #24361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24361/consoleFull) for PR 3505 at commit [`af7eb71`](https://github.com/apache/spark/commit/af7eb714ab9916a628859682b8cbf9c4c2396029). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class ChunkBuffer[T: ClassTag](parameters: ChunkParameters)` * `class ExternalOrderingAppendOnlyMap[K, V, C](`
[GitHub] spark pull request: [SPARK-4644][Core] Implement skewed join
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3505#issuecomment-66603180 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24361/ Test PASSed.
[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-66604421 [Test build #24363 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24363/consoleFull) for PR 2405 at commit [`fa0d2c7`](https://github.com/apache/spark/commit/fa0d2c78aba12201098e3e4db2d0fda9e357d0bd). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4818][Core] Add 'iterator' to reduce me...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/3671#discussion_r21670105
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -493,9 +493,9 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
   def leftOuterJoin[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, Option[W]))] = {
     this.cogroup(other, partitioner).flatMapValues { pair =>
       if (pair._2.isEmpty) {
-        pair._1.map(v => (v, None))
+        pair._1.iterator.map(v => (v, None): (V, Option[W]))
--- End diff --
Changing just `None` to `None: Option[W]`: I have tried that, but it does not work.
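For readers following the type discussion: a self-contained sketch (outside Spark, with a hypothetical helper name) of the ascription style the patch settles on, where the whole tuple is annotated so the branch produces an `Iterator[(V, Option[W])]`:

```scala
// Standalone illustration of ascribing the tuple rather than a bare None.
// `padLeft` is a made-up name mirroring the shape of leftOuterJoin's
// flatMapValues body from the diff above.
def padLeft[V, W](vs: Seq[V], ws: Seq[W]): Iterator[(V, Option[W])] =
  if (ws.isEmpty)
    vs.iterator.map(v => (v, None): (V, Option[W]))  // ascribe the pair
  else
    vs.iterator.flatMap(v => ws.iterator.map(w => (v, Some(w))))
```

Using an `iterator` here (as the patch does) avoids materializing an intermediate collection per key.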
[GitHub] spark pull request: [SPARK-4829] [SQL] add rule to fold count(expr...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/3676 [SPARK-4829] [SQL] add rule to fold count(expr) if expr is not null You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark countexpr Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3676.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3676 commit dc5765b1cf01553bcf2a24fee2b8447c951cd3ed Author: Daoyuan Wang daoyuan.w...@intel.com Date: 2014-12-11T08:57:05Z add rule to fold count(expr) if expr is not null
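The idea behind the fold can be illustrated independently of Catalyst: when `expr` can never be null, `COUNT(expr)` counts every row, so it is equivalent to `COUNT(1)`. A hypothetical sketch over plain Scala collections (not the PR's actual rule):

```scala
// COUNT semantics: count(col) skips NULLs, so for a non-nullable column it
// equals count(1). Row and its fields are made-up names for illustration.
case class Row(id: Int, nickname: Option[String]) // id is non-nullable

val rows = Seq(Row(1, Some("a")), Row(2, None), Row(3, Some("c")))

val countId       = rows.count(_ => true)            // COUNT(id): id is never null
val countNickname = rows.count(_.nickname.isDefined) // COUNT(nickname): skips NULLs
```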
[GitHub] spark pull request: [SPARK-4829] [SQL] add rule to fold count(expr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3676#issuecomment-66607006 [Test build #24364 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24364/consoleFull) for PR 3676 at commit [`dc5765b`](https://github.com/apache/spark/commit/dc5765b1cf01553bcf2a24fee2b8447c951cd3ed). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66608522 [Test build #24365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24365/consoleFull) for PR 1269 at commit [`4ac42d1`](https://github.com/apache/spark/commit/4ac42d1dad85593b1f05c02b2a2b48080abaaa05). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66611133 [Test build #24366 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24366/consoleFull) for PR 1269 at commit [`e5f4a7b`](https://github.com/apache/spark/commit/e5f4a7b54d0cf7e73c0f567084439216a34fe9bd). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66611226 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24366/ Test FAILed.
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66611223 [Test build #24366 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24366/consoleFull) for PR 1269 at commit [`e5f4a7b`](https://github.com/apache/spark/commit/e5f4a7b54d0cf7e73c0f567084439216a34fe9bd). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class DocumentParameters(val document: Document,` * `class GlobalParameters(val phi : Array[Array[Float]], val alphabetSize : Int)` * `class PLSA(@transient protected val sc: SparkContext,` * `class RobustDocumentParameters(document: Document,` * `class RobustGlobalParameters(phi : Array[Array[Float]],` * `class RobustPLSA(@transient protected val sc: SparkContext,` * `trait DocumentOverTopicDistributionRegularizer extends Serializable with MatrixInPlaceModification ` * `trait TopicsRegularizer extends MatrixInPlaceModification ` * `class UniformDocumentOverTopicRegularizer extends DocumentOverTopicDistributionRegularizer ` * `class UniformTopicRegularizer extends TopicsRegularizer ` * `class Document(val tokens: SparseVector[Int]) extends Serializable ` * `class TokenEnumerator extends Serializable `
[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-66611502 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24363/ Test PASSed.
[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-66611497 [Test build #24363 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24363/consoleFull) for PR 2405 at commit [`fa0d2c7`](https://github.com/apache/spark/commit/fa0d2c78aba12201098e3e4db2d0fda9e357d0bd). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class UnresolvedGetField(child: Expression, fieldName: String) extends UnaryExpression ` * `trait GetField extends UnaryExpression ` * `case class StructGetField(child: Expression, field: StructField, ordinal: Int)` * `case class ArrayGetField(child: Expression, field: StructField, ordinal: Int, containsNull: Boolean)`
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66612337 [Test build #24367 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24367/consoleFull) for PR 1269 at commit [`8e953e7`](https://github.com/apache/spark/commit/8e953e7d378fe012b2c3364b9cc570cd1af57f0e). * This patch merges cleanly.
[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3672#issuecomment-66612787 Could you please also add a section to the SQL programming guide describing how to enable HTTP mode?
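A hedged sketch of what such a section might show. The property names below follow standard HiveServer2 configuration conventions; whether this PR wires them through the start script exactly this way is an assumption:

```shell
# Sketch: start the Thrift server in HTTP transport mode instead of binary.
# Property names follow HiveServer2 conventions; verify against the PR itself.
./sbin/start-thriftserver.sh \
  --hiveconf hive.server2.transport.mode=http \
  --hiveconf hive.server2.thrift.http.port=10001 \
  --hiveconf hive.server2.thrift.http.path=cliservice
```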
[GitHub] spark pull request: [SparkSQL, Thrift] SPARK-4700: Add HTTP protoc...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/3672#discussion_r21673560
--- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala ---
@@ -121,15 +127,28 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
   val warehousePath = getTempFilePath("warehouse")
   val metastorePath = getTempFilePath("metastore")
   val metastoreJdbcUri = s"jdbc:derby:;databaseName=$metastorePath;create=true"
+
   val command =
-    s"""$startScript
-       |  --master local
-       |  --hiveconf hive.root.logger=INFO,console
-       |  --hiveconf ${ConfVars.METASTORECONNECTURLKEY}=$metastoreJdbcUri
-       |  --hiveconf ${ConfVars.METASTOREWAREHOUSE}=$warehousePath
-       |  --hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_BIND_HOST}=${"localhost"}
-       |  --hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_PORT}=$port
-     """.stripMargin.split("\\s+").toSeq
+    if (httpMode){
--- End diff --
A space before `{`
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66613030 @jkbradley I moved Dirichlet to mllib/stats and added setters to `TokenEnumerator`. BTW, why was it decided to use setters instead of constructors? We could set default parameter values in the constructor... I don't contest the decision, I'm just curious. As far as I can see, we've only got two things left: scalastyle and testing against another open-source project. I can definitely test it against the [tm project](https://github.com/ispras/tm). Is it enough to run both implementations on the same data and obtain nearly the same perplexity values? Is it necessary to add a unit test for this? (It may be a headache, because tm has not been tested against Scala 2.10...)
[GitHub] spark pull request: [SPARK-4829] [SQL] add rule to fold count(expr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3676#issuecomment-66613387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24364/ Test PASSed.
[GitHub] spark pull request: [SPARK-4829] [SQL] add rule to fold count(expr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3676#issuecomment-66613382 [Test build #24364 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24364/consoleFull) for PR 3676 at commit [`dc5765b`](https://github.com/apache/spark/commit/dc5765b1cf01553bcf2a24fee2b8447c951cd3ed). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4458][Build] Eliminate compilation of t...
Github user tdas closed the pull request at: https://github.com/apache/spark/pull/3324
[GitHub] spark pull request: [SPARK-4458][Build] Eliminate compilation of t...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3324#issuecomment-66613560 Alright, I am going to close this PR then.
[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3643#issuecomment-66614419 Hi, `intersect`, `diff`, and `foreach` are all replaced with while-loops in the new commit, following the `BLAS.dot` pattern. Please see if there is any problem. Thanks.
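The while-loop style mentioned above can be shown with a self-contained sketch (assumed input shapes, not the PR's actual code): a sparse dot product over sorted index/value arrays, using the two-cursor pattern that `BLAS.dot` follows instead of collection combinators like `intersect`:

```scala
// Two-cursor merge over sorted sparse indices; no intermediate collections.
// Assumes xIdx and yIdx are strictly increasing, paired with xVal and yVal.
def sparseDot(xIdx: Array[Int], xVal: Array[Double],
              yIdx: Array[Int], yVal: Array[Double]): Double = {
  var i = 0   // cursor into x
  var j = 0   // cursor into y
  var sum = 0.0
  while (i < xIdx.length && j < yIdx.length) {
    if (xIdx(i) == yIdx(j)) { sum += xVal(i) * yVal(j); i += 1; j += 1 }
    else if (xIdx(i) < yIdx(j)) i += 1
    else j += 1
  }
  sum
}

// Example: overlap at indices 2 and 4 gives 2.0*10.0 + 3.0*100.0 == 320.0
val d = sparseDot(Array(0, 2, 4), Array(1.0, 2.0, 3.0),
                  Array(2, 4), Array(10.0, 100.0))
```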
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66616629 [Test build #24365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24365/consoleFull) for PR 1269 at commit [`4ac42d1`](https://github.com/apache/spark/commit/4ac42d1dad85593b1f05c02b2a2b48080abaaa05). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class DocumentParameters(val document: Document,` * `class GlobalParameters(val phi : Array[Array[Float]], val alphabetSize : Int)` * `class PLSA(@transient protected val sc: SparkContext,` * `class RobustDocumentParameters(document: Document,` * `class RobustGlobalParameters(phi : Array[Array[Float]],` * `class RobustPLSA(@transient protected val sc: SparkContext,` * `trait DocumentOverTopicDistributionRegularizer extends Serializable with MatrixInPlaceModification ` * `trait TopicsRegularizer extends MatrixInPlaceModification ` * `class UniformDocumentOverTopicRegularizer extends DocumentOverTopicDistributionRegularizer ` * `class UniformTopicRegularizer extends TopicsRegularizer ` * `class Document(val tokens: SparseVector[Int]) extends Serializable `
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66616637 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24365/ Test PASSed.
[GitHub] spark pull request: [SPARK-4526][MLLIB]Gradient should be added ba...
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/3677 [SPARK-4526][MLLIB]Gradient should be added batch computing interface. You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark SPARK-4526 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3677.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3677 commit 1fd1d020cb05f3d7d09289d7ae64869dbea58695 Author: GuoQiang Li wi...@qq.com Date: 2014-12-11T13:19:26Z Gradient should be added batch computing interface.
[GitHub] spark pull request: [SPARK-4526][MLLIB]Gradient should be added ba...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3677#issuecomment-66617874 [Test build #24368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24368/consoleFull) for PR 3677 at commit [`1fd1d02`](https://github.com/apache/spark/commit/1fd1d020cb05f3d7d09289d7ae64869dbea58695). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66617927 [Test build #24367 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24367/consoleFull) for PR 1269 at commit [`8e953e7`](https://github.com/apache/spark/commit/8e953e7d378fe012b2c3364b9cc570cd1af57f0e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class DocumentParameters(val document: Document,` * `class GlobalParameters(val phi : Array[Array[Float]], val alphabetSize : Int)` * `class PLSA(@transient protected val sc: SparkContext,` * `class RobustDocumentParameters(document: Document,` * `class RobustGlobalParameters(phi : Array[Array[Float]],` * `class RobustPLSA(@transient protected val sc: SparkContext,` * `trait DocumentOverTopicDistributionRegularizer extends Serializable with MatrixInPlaceModification ` * `trait TopicsRegularizer extends MatrixInPlaceModification ` * `class UniformDocumentOverTopicRegularizer extends DocumentOverTopicDistributionRegularizer ` * `class UniformTopicRegularizer extends TopicsRegularizer ` * `class Document(val tokens: SparseVector[Int]) extends Serializable ` * `class TokenEnumerator extends Serializable `
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66617936 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24367/ Test FAILed.
[GitHub] spark pull request: [SPARK-4526][MLLIB]Gradient should be added ba...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/3677#issuecomment-66618650 cc @mengxr
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66619759 [Test build #24369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24369/consoleFull) for PR 1269 at commit [`c54afc9`](https://github.com/apache/spark/commit/c54afc96bb493143d9ce0484118a452ad8c7514d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4806] Streaming doc update for 1.2
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3653
[GitHub] spark pull request: [SPARK-4806] Streaming doc update for 1.2
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3653#issuecomment-66626293 @JoshRosen I have addressed your final comments and merged it. Thank you very much.
[GitHub] spark pull request: [SPARK-4806] Streaming doc update for 1.2
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3653#issuecomment-66627047 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24370/ Test FAILed.
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66627555 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24369/ Test FAILed.
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66627541 [Test build #24369 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24369/consoleFull) for PR 1269 at commit [`c54afc9`](https://github.com/apache/spark/commit/c54afc96bb493143d9ce0484118a452ad8c7514d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DocumentParameters(val document: Document,`
  * `class GlobalParameters(val phi : Array[Array[Float]], val alphabetSize : Int)`
  * `class PLSA(@transient protected val sc: SparkContext,`
  * `class RobustDocumentParameters(document: Document,`
  * `class RobustGlobalParameters(phi : Array[Array[Float]],`
  * `class RobustPLSA(@transient protected val sc: SparkContext,`
  * `trait DocumentOverTopicDistributionRegularizer extends Serializable with MatrixInPlaceModification `
  * `trait TopicsRegularizer extends MatrixInPlaceModification `
  * `class UniformDocumentOverTopicRegularizer extends DocumentOverTopicDistributionRegularizer `
  * `class UniformTopicRegularizer extends TopicsRegularizer `
  * `class Document(val tokens: SparseVector[Int]) extends Serializable `
  * `class TokenEnumerator extends Serializable `
[GitHub] spark pull request: [SPARK-4526][MLLIB]Gradient should be added ba...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3677#issuecomment-66627835 [Test build #24368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24368/consoleFull) for PR 3677 at commit [`1fd1d02`](https://github.com/apache/spark/commit/1fd1d020cb05f3d7d09289d7ae64869dbea58695). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4526][MLLIB]Gradient should be added ba...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3677#issuecomment-66627845 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24368/ Test PASSed.
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66629318 [Test build #24371 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24371/consoleFull) for PR 1269 at commit [`0764aaa`](https://github.com/apache/spark/commit/0764aaa9e8737c824ad0a71ec6ecb197476e2419). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs
Github user tgaloppo commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21683030
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala ---
@@ -0,0 +1,283 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import breeze.linalg.{DenseVector => BreezeVector, DenseMatrix => BreezeMatrix}
+import breeze.linalg.{Transpose, det, inv}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.mllib.linalg.{Matrices, Vector, Vectors}
+import org.apache.spark.{Accumulator, AccumulatorParam, SparkContext}
+import org.apache.spark.SparkContext.DoubleAccumulatorParam
+
+/**
+ * Expectation-Maximization for multivariate Gaussian Mixture Models.
+ *
+ */
+object GMMExpectationMaximization {
+  /**
+   * Trains a GMM using the given parameters
+   *
+   * @param data training points stored as RDD[Vector]
+   * @param k the number of Gaussians in the mixture
+   * @param maxIterations the maximum number of iterations to perform
+   * @param delta change in log-likelihood at which convergence is considered achieved
+   */
+  def train(data: RDD[Vector], k: Int, maxIterations: Int, delta: Double): GaussianMixtureModel = {
+    new GMMExpectationMaximization().setK(k)
+      .setMaxIterations(maxIterations)
+      .setDelta(delta)
+      .run(data)
+  }
+
+  /**
+   * Trains a GMM using the given parameters
+   *
+   * @param data training points stored as RDD[Vector]
+   * @param k the number of Gaussians in the mixture
+   * @param maxIterations the maximum number of iterations to perform
+   */
+  def train(data: RDD[Vector], k: Int, maxIterations: Int): GaussianMixtureModel = {
+    new GMMExpectationMaximization().setK(k).setMaxIterations(maxIterations).run(data)
+  }
+
+  /**
+   * Trains a GMM using the given parameters
+   *
+   * @param data training points stored as RDD[Vector]
+   * @param k the number of Gaussians in the mixture
+   * @param delta change in log-likelihood at which convergence is considered achieved
+   */
+  def train(data: RDD[Vector], k: Int, delta: Double): GaussianMixtureModel = {
+    new GMMExpectationMaximization().setK(k).setDelta(delta).run(data)
+  }
+
+  /**
+   * Trains a GMM using the given parameters
+   *
+   * @param data training points stored as RDD[Vector]
+   * @param k the number of Gaussians in the mixture
+   */
+  def train(data: RDD[Vector], k: Int): GaussianMixtureModel = {
+    new GMMExpectationMaximization().setK(k).run(data)
+  }
+}
+
+/**
+ * This class performs multivariate Gaussian expectation maximization. It will
+ * maximize the log-likelihood for a mixture of k Gaussians, iterating until
+ * the log-likelihood changes by less than delta, or until it has reached
+ * the max number of iterations.
+ */
+class GMMExpectationMaximization private (
+    private var k: Int,
+    private var delta: Double,
+    private var maxIterations: Int) extends Serializable {
+
+  // Type aliases for convenience
+  private type DenseDoubleVector = BreezeVector[Double]
+  private type DenseDoubleMatrix = BreezeMatrix[Double]
+
+  // number of samples per cluster to use when initializing Gaussians
+  private val nSamples = 5;
+
+  // A default instance, 2 Gaussians, 100 iterations, 0.01 log-likelihood threshold
+  def this() = this(2, 0.01, 100)
+
+  /** Set the number of Gaussians in the mixture model. Default: 2 */
+  def setK(k: Int): this.type = {
+    this.k = k
+    this
+  }
+
+  /** Set the maximum number of iterations to run. Default: 100 */
+  def setMaxIterations(maxIterations: Int): this.type = {
+    this.maxIterations = maxIterations
+    this
+  }
+
+  /**
+   * Set the largest
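The convergence criterion documented above (iterate until the log-likelihood improves by less than `delta`, or `maxIterations` is reached) can be illustrated outside Spark. The following is a minimal pure-Python, one-dimensional EM sketch of that loop; it is not the PR's implementation, and every name in it is hypothetical.

```python
import math

def em_gmm_1d(data, k=2, max_iterations=100, delta=0.01):
    """Fit a 1-D mixture of k Gaussians by EM; stop when the log-likelihood
    improves by less than `delta`, or after max_iterations."""
    n = len(data)
    # Crude init: spread means over the data range, unit variances, uniform weights.
    lo, hi = min(data), max(data)
    means = [lo + (hi - lo) * (i + 1) / (k + 1) for i in range(k)]
    variances = [1.0] * k
    weights = [1.0 / k] * k

    def pdf(x, mu, var):
        return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    prev_ll = float("-inf")
    for _ in range(max_iterations):
        # E-step: responsibilities resp[i][j] = P(component j | point i).
        resp, ll = [], 0.0
        for x in data:
            probs = [w * pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(probs)
            ll += math.log(total)
            resp.append([p / total for p in probs])
        if ll - prev_ll < delta:  # converged: log-likelihood gain below delta
            break
        prev_ll = ll
        # M-step: re-estimate weights, means, variances from responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                1e-6)  # floor the variance to avoid a degenerate component
    return weights, means, variances
```

On two well-separated clusters the loop typically stops long before `max_iterations`, which is the behavior the `delta` parameter in the patch controls.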
[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs
Github user tgaloppo commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21683119
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala --- (quotes the same hunk of GMMExpectationMaximization.scala as the previous comment)
[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs
Github user tgaloppo commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-66636308 @jkbradley Thank you for your comments. I am working to resolve these issues and will push these changes in a day or two.
[GitHub] spark pull request: Do not include SPARK_CLASSPATH if empty
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3678#issuecomment-66638939 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-2980][mllib] testing the Chi-squared hy...
GitHub user jbencook opened a pull request: https://github.com/apache/spark/pull/3679 [SPARK-2980][mllib] testing the Chi-squared hypothesis test This PR tests the PySpark Chi-squared hypothesis test from this commit: c8abddc5164d8cf11cdede6ab3d5d1ea08028708 and moves some of the error messaging into Python. It is a port of the Scala tests here: [HypothesisTestSuite.scala](https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/mllib/stat/HypothesisTestSuite.scala) Hopefully, SPARK-2980 can be closed. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jbencook/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3679.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3679 commit 3aeb0d91007960f33076b6e6775944bb9d81ead8 Author: jbencook jbenjaminc...@gmail.com Date: 2014-12-11T15:44:08Z [SPARK-2980][mllib] bringing Chi-squared error messages to the python side commit a17ee843185bdb1ee96574712450243d112fbce6 Author: jbencook jbenjaminc...@gmail.com Date: 2014-12-11T15:44:34Z [SPARK-2980][mllib] adding unit tests for the pyspark chi-squared test
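For context on what the ported tests exercise: the Pearson chi-squared goodness-of-fit statistic compares observed category counts against expected ones. The sketch below shows only the statistic itself in plain Python; it is not MLlib's implementation (MLlib exposes this through its `Statistics.chiSqTest` API), and the function name is hypothetical.

```python
def chi_squared_statistic(observed, expected=None):
    """Pearson's chi-squared goodness-of-fit statistic:
    sum over categories of (observed - expected)^2 / expected.
    If no expected counts are given, a uniform distribution is assumed,
    matching the common default for goodness-of-fit tests."""
    n = sum(observed)
    if expected is None:
        expected = [n / len(observed)] * len(observed)
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

A library test can then assert the statistic against hand-computed values, e.g. `chi_squared_statistic([4, 6, 5, 5])` is 0.4 under the uniform assumption.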
[GitHub] spark pull request: [SPARK-2980][mllib] testing the Chi-squared hy...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3679#issuecomment-66639796 Can one of the admins verify this patch?
[GitHub] spark pull request: Do not include SPARK_CLASSPATH if empty
GitHub user darabos opened a pull request: https://github.com/apache/spark/pull/3678 Do not include SPARK_CLASSPATH if empty My guess for fixing https://issues.apache.org/jira/browse/SPARK-4831. You can merge this pull request into a Git repository by running: $ git pull https://github.com/darabos/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3678.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3678 commit 36e12437a6cfd3eab1568ca50a5b8fc26ed275c1 Author: Daniel Darabos darabos.dan...@gmail.com Date: 2014-12-11T15:49:23Z Do not include SPARK_CLASSPATH if empty. Adding an empty string to the classpath adds the current directory.
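The commit message above points at the underlying JVM behavior: an empty entry in a `:`-joined classpath is treated as the current working directory, so unconditionally appending `$SPARK_CLASSPATH` adds `.` to the classpath whenever the variable is unset. A minimal sketch of the kind of guard the fix implies (the helper name is hypothetical, not code from the PR):

```python
def build_classpath(entries):
    """Join classpath entries with ':' while dropping empty strings.
    Naively joining would leave an empty entry, which the JVM interprets
    as the current working directory -- the accidental inclusion that
    SPARK-4831 describes."""
    return ":".join(e for e in entries if e)
```

For example, `":".join(["/opt/spark/jars/*", ""])` yields a trailing empty entry, while `build_classpath(["/opt/spark/jars/*", ""])` does not.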
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66642598 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24371/ Test PASSed.
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66642585 [Test build #24371 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24371/consoleFull) for PR 1269 at commit [`0764aaa`](https://github.com/apache/spark/commit/0764aaa9e8737c824ad0a71ec6ecb197476e2419).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DocumentParameters(val document: Document,`
  * `class GlobalParameters(val phi : Array[Array[Float]], val alphabetSize : Int)`
  * `class PLSA(@transient protected val sc: SparkContext,`
  * `class RobustDocumentParameters(document: Document,`
  * `class RobustGlobalParameters(phi : Array[Array[Float]],`
  * `class RobustPLSA(@transient protected val sc: SparkContext,`
  * `trait DocumentOverTopicDistributionRegularizer extends Serializable with MatrixInPlaceModification `
  * `trait TopicsRegularizer extends MatrixInPlaceModification `
  * `class UniformDocumentOverTopicRegularizer extends DocumentOverTopicDistributionRegularizer `
  * `class UniformTopicRegularizer extends TopicsRegularizer `
  * `class Document(val tokens: SparseVector[Int]) extends Serializable `
  * `class TokenEnumerator extends Serializable `
[GitHub] spark pull request: [WIP][SPARK-4476][SQL] Use MapType for dict in...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3406#issuecomment-66643222 Yeah, I am sorry I have not had a chance to continue my work. I need to finish the unit test part.
[GitHub] spark pull request: [SPARK-3575][SQL] Removes the Metastore Parque...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/3441#discussion_r21688244
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -81,9 +80,27 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
 }
 // Since HiveQL is case insensitive for table names we make them all lowercase.
-      MetastoreRelation(
+      val relation = MetastoreRelation(
         databaseName, tblName, alias)(
         table.getTTable, partitions.map(part => part.getTPartition))(hive)
+
+      if (hive.convertMetastoreParquet &&
+          relation.tableDesc.getSerdeClassName.toLowerCase.contains("parquet")) {
+        val path = if (relation.hiveQlTable.isPartitioned) {
+          partitions.map(_.getLocation).mkString(",")
--- End diff --
Yea, forgot that in case of `MetastoreRelation` partition pruning is done within `HiveTableScan`... I'll add a WIP tag to this PR and add back partition pruning.
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-66658459 Hi @JoshRosen - with the updates I've made is this ok to merge?
[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3022#discussion_r21695326

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala ---
    @@ -0,0 +1,283 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.clustering
    +
    +import breeze.linalg.{DenseVector => BreezeVector, DenseMatrix => BreezeMatrix}
    +import breeze.linalg.{Transpose, det, inv}
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.mllib.linalg.{Matrices, Vector, Vectors}
    +import org.apache.spark.{Accumulator, AccumulatorParam, SparkContext}
    +import org.apache.spark.SparkContext.DoubleAccumulatorParam
    +
    +/**
    + * Expectation-Maximization for multivariate Gaussian Mixture Models.
    + */
    +object GMMExpectationMaximization {
    +  /**
    +   * Trains a GMM using the given parameters
    +   *
    +   * @param data training points stored as RDD[Vector]
    +   * @param k the number of Gaussians in the mixture
    +   * @param maxIterations the maximum number of iterations to perform
    +   * @param delta change in log-likelihood at which convergence is considered achieved
    +   */
    +  def train(data: RDD[Vector], k: Int, maxIterations: Int, delta: Double): GaussianMixtureModel = {
    +    new GMMExpectationMaximization().setK(k)
    +      .setMaxIterations(maxIterations)
    +      .setDelta(delta)
    +      .run(data)
    +  }
    +
    +  /**
    +   * Trains a GMM using the given parameters
    +   *
    +   * @param data training points stored as RDD[Vector]
    +   * @param k the number of Gaussians in the mixture
    +   * @param maxIterations the maximum number of iterations to perform
    +   */
    +  def train(data: RDD[Vector], k: Int, maxIterations: Int): GaussianMixtureModel = {
    +    new GMMExpectationMaximization().setK(k).setMaxIterations(maxIterations).run(data)
    +  }
    +
    +  /**
    +   * Trains a GMM using the given parameters
    +   *
    +   * @param data training points stored as RDD[Vector]
    +   * @param k the number of Gaussians in the mixture
    +   * @param delta change in log-likelihood at which convergence is considered achieved
    +   */
    +  def train(data: RDD[Vector], k: Int, delta: Double): GaussianMixtureModel = {
    +    new GMMExpectationMaximization().setK(k).setDelta(delta).run(data)
    +  }
    +
    +  /**
    +   * Trains a GMM using the given parameters
    +   *
    +   * @param data training points stored as RDD[Vector]
    +   * @param k the number of Gaussians in the mixture
    +   */
    +  def train(data: RDD[Vector], k: Int): GaussianMixtureModel = {
    +    new GMMExpectationMaximization().setK(k).run(data)
    +  }
    +}
    +
    +/**
    + * This class performs multivariate Gaussian expectation maximization. It will
    + * maximize the log-likelihood for a mixture of k Gaussians, iterating until
    + * the log-likelihood changes by less than delta, or until it has reached
    + * the max number of iterations.
    + */
    +class GMMExpectationMaximization private (
    +    private var k: Int,
    +    private var delta: Double,
    +    private var maxIterations: Int) extends Serializable {
    +
    +  // Type aliases for convenience
    +  private type DenseDoubleVector = BreezeVector[Double]
    +  private type DenseDoubleMatrix = BreezeMatrix[Double]
    +
    +  // number of samples per cluster to use when initializing Gaussians
    +  private val nSamples = 5;
    +
    +  // A default instance, 2 Gaussians, 100 iterations, 0.01 log-likelihood threshold
    +  def this() = this(2, 0.01, 100)
    +
    +  /** Set the number of Gaussians in the mixture model. Default: 2 */
    +  def setK(k: Int): this.type = {
    +    this.k = k
    +    this
    +  }
    +
    +  /** Set the maximum number of iterations to run. Default: 100 */
    +  def setMaxIterations(maxIterations: Int): this.type = {
    +    this.maxIterations = maxIterations
    +    this
    +  }
    +
    +  /**
    +   * Set the
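The class doc above describes the core loop: iterate E- and M-steps until the log-likelihood improves by less than `delta` or `maxIterations` is reached. A toy one-dimensional Python sketch of that loop (illustrative only — the PR's Scala code is multivariate and RDD-based, and our deterministic quantile initialization replaces its random `nSamples`-point seeding):

```python
import math

def em_gmm_1d(data, k=2, delta=0.01, max_iterations=100):
    """Toy 1-D EM for a k-component Gaussian mixture. Stops when the
    log-likelihood improves by less than `delta`, or after
    `max_iterations` iterations, matching the stopping rule above."""
    s = sorted(data)
    # Deterministic quantile init (the PR instead seeds each Gaussian
    # from a handful of randomly sampled points).
    means = [s[i * (len(s) - 1) // max(k - 1, 1)] for i in range(k)]
    variances = [1.0] * k
    weights = [1.0 / k] * k

    def pdf(x, mu, var):
        return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

    prev_ll = float("-inf")
    for _ in range(max_iterations):
        # E-step: responsibility of each component for each point,
        # plus the current log-likelihood
        gamma, ll = [], 0.0
        for x in data:
            probs = [w * pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(probs)
            ll += math.log(total)
            gamma.append([p / total for p in probs])
        # M-step: re-estimate mixture weights, means, and variances
        for j in range(k):
            nj = sum(g[j] for g in gamma)
            weights[j] = nj / len(data)
            means[j] = sum(g[j] * x for g, x in zip(gamma, data)) / nj
            variances[j] = max(sum(g[j] * (x - means[j]) ** 2
                                   for g, x in zip(gamma, data)) / nj, 1e-6)
        if ll - prev_ll < delta:  # converged
            break
        prev_ll = ll
    return weights, means, variances
```

On well-separated data such as `[-0.1, 0.0, 0.1, 9.9, 10.0, 10.1]` this converges in a few iterations to means near 0 and 10 with roughly equal weights.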
[GitHub] spark pull request: [SPARK-3405] add subnet-id and vpc-id options ...
Github user tylerprete commented on the pull request: https://github.com/apache/spark/pull/2872#issuecomment-5398 @jontg I'm using this patch with your modifications (private_ip_address), but I'm getting the following error when the script tries to start the master: SHUTDOWN_MSG: Shutting down NameNode at java.net.UnknownHostException: ip-10-0-2-213: ip-10-0-2-213 10.0.2.213 is the master's IP in this case, but it looks like the script is picking up ip-10-0-2-213 as the hostname, and that name isn't resolving. Did you run into anything like this, and if so, how'd you resolve it? Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
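A quick way to confirm the failure mode above from outside the JVM is to check whether the EC2-style name resolves on the master itself (this diagnostic helper is ours, not part of the spark-ec2 script; on a VPC node, `ip-10-0-2-213`-style names usually need an `/etc/hosts` entry or VPC DNS enabled):

```python
import socket

def resolves(hostname):
    """Return True if `hostname` resolves to an IP address.

    If this returns False for the master's ip-10-0-2-213-style name,
    the JVM will throw the same UnknownHostException quoted above.
    """
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False
```

Running `resolves("ip-10-0-2-213")` on the master should return True once the hostname is mapped; if not, adding `10.0.2.213 ip-10-0-2-213` to `/etc/hosts` is one common workaround.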
[GitHub] spark pull request: [SPARK-4728] Add exponential, gamma, and log n...
GitHub user rnowling opened a pull request: https://github.com/apache/spark/pull/3680 [SPARK-4728] Add exponential, gamma, and log normal sampling to MLlib data generators This patch adds: * Exponential, gamma, and log normal generators that wrap Apache Commons math3 to the private API * Functions for generating exponential, gamma, and log normal RDDs and vector RDDs * Tests for the above You can merge this pull request into a Git repository by running: $ git pull https://github.com/rnowling/spark spark4728 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3680.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3680 commit 9f96232a675ae0850275347c3cc9bd69676df5af Author: RJ Nowling rnowl...@gmail.com Date: 2014-12-11T18:31:38Z [SPARK-4728] Add exponential, gamma, and log normal sampling to MLlib data generators
[GitHub] spark pull request: SPARK-4159 [CORE] Maven build doesn't run JUni...
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3651#discussion_r21697554

    --- Diff: yarn/pom.xml ---
    @@ -152,6 +147,15 @@
             </environmentVariables>
           </configuration>
         </plugin>
    +    <plugin>
    +      <groupId>org.apache.maven.plugins</groupId>
    +      <artifactId>maven-surefire-plugin</artifactId>
    +      <configuration>
    +        <environmentVariables>
    +          <SPARK_HOME>${basedir}/../../</SPARK_HOME>

    --- End diff --

    OK, in the name of keeping it simple I might not touch this this time. Since this occurs in only 2 places, it doesn't save much.
[GitHub] spark pull request: [SPARK-4728][MLLib] Add exponential, gamma, an...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3680#issuecomment-7105 [Test build #24372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24372/consoleFull) for PR 3680 at commit [`9f96232`](https://github.com/apache/spark/commit/9f96232a675ae0850275347c3cc9bd69676df5af). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4740][investigation-only] Disable trans...
Github user rxin closed the pull request at: https://github.com/apache/spark/pull/3667
[GitHub] spark pull request: [SPARK-4740][investigation-only] Disable trans...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3667#issuecomment-8233 Alright, closing now since transferTo isn't the issue at all.
[GitHub] spark pull request: SPARK-4159 [CORE] Maven build doesn't run JUni...
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3651#discussion_r21699001

    --- Diff: pom.xml ---
    @@ -941,19 +950,38 @@
             <fork>true</fork>
           </configuration>
         </plugin>
    +    <!-- Surefire runs all Java tests -->
         <plugin>
           <groupId>org.apache.maven.plugins</groupId>
           <artifactId>maven-surefire-plugin</artifactId>
    -      <version>2.17</version>
    +      <version>2.18</version>
    +      <!-- Note: config is repeated in scalatest config -->
           <configuration>
    -        <!-- Uses scalatest instead -->
    -        <skipTests>true</skipTests>
    +        <includes>
    +          <include>**/Test*.java</include>
    +          <include>**/*Test.java</include>
    +          <include>**/*TestCase.java</include>
    +          <include>**/*Suite.java</include>
    +        </includes>
    +        <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
    +        <argLine>-Xmx3g -XX:MaxPermSize=${MaxPermGen} -XX:ReservedCodeCacheSize=512m</argLine>
    +        <systemProperties>
    +          <java.awt.headless>true</java.awt.headless>
    +          <spark.test.home>${session.executionRootDirectory}</spark.test.home>
    +          <spark.testing>1</spark.testing>
    +          <spark.ui.enabled>false</spark.ui.enabled>
    +          <spark.ui.showConsoleProgress>false</spark.ui.showConsoleProgress>
    +          <spark.executor.extraClassPath>${test_classpath}</spark.executor.extraClassPath>
    +          <spark.driver.allowMultipleContexts>true</spark.driver.allowMultipleContexts>
    +        </systemProperties>
           </configuration>
         </plugin>
    +    <!-- Scalatest runs all Scala tests -->
         <plugin>
           <groupId>org.scalatest</groupId>
           <artifactId>scalatest-maven-plugin</artifactId>
           <version>1.0</version>
    +      <!-- Note: config is repeated in surefire config -->
           <configuration>
             <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>

    --- End diff --

    No, the files underneath are named by test suite, so they won't collide. I double-checked just now.
[GitHub] spark pull request: SPARK-4159 [CORE] Maven build doesn't run JUni...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3651#issuecomment-9398 Good point about `log4j.appender.file.append=false`. It looks like the Scala tests overwrite. Hm, why not set append to `true` indeed? It's in `target`, so it gets deleted by `clean`.
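For reference, a minimal log4j 1.2 properties sketch of the setting under discussion (the appender name and layout here are illustrative, not copied from Spark's actual test config; the log path in `target` matches the comment above):

```properties
# Illustrative log4j.properties fragment; appender name "file" is assumed.
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=target/unit-tests.log
# append=false: each test JVM truncates the log on startup;
# append=true: runs accumulate until the file is removed by `mvn clean`.
log4j.appender.file.append=true
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
```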
[GitHub] spark pull request: [PySpark] Fix tests with Python 2.6 in 1.0 bra...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3668#issuecomment-66670110 Yeah, branch 0.9 is also having the same problem. I haven't looked deeply into the issue yet, but maybe @shaneknapp has a better idea?
[GitHub] spark pull request: [SPARK-4754] Refactor SparkContext into Execut...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3614#issuecomment-66670276 retest this please
[GitHub] spark pull request: [SPARK-4754] Refactor SparkContext into Execut...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3614#issuecomment-66670974 [Test build #24373 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24373/consoleFull) for PR 3614 at commit [`187070d`](https://github.com/apache/spark/commit/187070d22b629a783203aa9d5013b4d38b769ca2). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4728][MLLib] Add exponential, gamma, an...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3680#issuecomment-66676689 [Test build #24374 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24374/consoleFull) for PR 3680 at commit [`84fd98d`](https://github.com/apache/spark/commit/84fd98d6b1e625e1c143bf16fccbf91ff2040d08). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4808] Remove Spillable minimum threshol...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3656#issuecomment-66676807 @lawlerd things are done this way because estimating the size for every record would be prohibitively expensive. Also, the trackMemoryThreshold is required at least until we figure out a solution for SPARK-4452. Without it, when there are multiple shuffle data structures in a thread and the first takes a bunch of memory, the second ends up spilling on every record (this was a blocker for 1.2). Your concern of course is valid - that we're not tracking memory 100% accurately. One response to this is that we're conservative with it: e.g., we only use up to spark.shuffle.safetyFraction (default 80%) of the available shuffle memory. One improvement that might make sense would be to do the sampling based on memory size rather than number of records. So if we notice that records are larger, we would sample more frequently and maybe adjust the trackMemoryThreshold.
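The last suggestion — triggering a full size estimate after a budget of bytes rather than a count of records — can be sketched in a few lines. This is a toy illustration: `SizeSampler` is our name, not a Spark API, and `sys.getsizeof` stands in for the expensive estimator (Spark's `SizeEstimator`) that a real implementation would invoke only at sample points:

```python
import sys

class SizeSampler:
    """Toy tracker that re-runs the expensive size estimate once the
    bytes added since the last estimate exceed a threshold, instead of
    re-estimating every N records."""

    def __init__(self, sample_every_bytes=1 << 20):
        self.sample_every_bytes = sample_every_bytes
        self.bytes_since_sample = 0
        self.estimated_total = 0
        self.full_estimates = 0  # how many expensive estimates we ran

    def record_added(self, record):
        # Cheap per-record size guess accumulated between sample points.
        size = sys.getsizeof(record)
        self.bytes_since_sample += size
        self.estimated_total += size
        if self.bytes_since_sample >= self.sample_every_bytes:
            # A real implementation would run the expensive full
            # estimate here and correct estimated_total with it.
            self.full_estimates += 1
            self.bytes_since_sample = 0
        return self.estimated_total
```

Because the trigger is a byte budget, larger records trip it sooner, so the sampling frequency adapts to record size automatically — exactly the property the comment above asks for.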