[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/11555#issuecomment-193141362 Thank you! Talk to you tomorrow. BTW, we also need to fix several window expressions, for example `row_number`, `cume_dist`, `rank`, `dense_rank` and `percent_rank`. We need to override their default `sql` functions. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
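A minimal sketch of why a default `sql` rendering might need to be overridden for a ranking function. This is a toy model, not Spark's Catalyst classes: `Expr`, `Column`, `Sum`, `DenseRank` and `defaultSql` are hypothetical names, and the assumption that the function carries its ordering expressions as children (so the default rendering would print them as arguments) is not stated in the thread.

```scala
// Toy expression model (hypothetical; NOT Spark's Catalyst API).
object WindowSqlSketch {
  sealed trait Expr {
    def prettyName: String
    def children: Seq[Expr]

    // Default rendering: NAME(child1, child2, ...)
    final def defaultSql: String =
      s"${prettyName.toUpperCase}(${children.map(_.sql).mkString(", ")})"

    // Subclasses may override this when the default rendering is wrong.
    def sql: String = defaultSql
  }

  final case class Column(name: String) extends Expr {
    val prettyName = "column"
    val children = Seq.empty[Expr]
    override def sql: String = s"`$name`"
  }

  // An ordinary aggregate: the default rendering is already valid SQL.
  final case class Sum(child: Expr) extends Expr {
    val prettyName = "sum"
    val children = Seq(child)
  }

  // Assumption: the node keeps the ORDER BY expressions as children,
  // but the SQL function itself takes no arguments.
  final case class DenseRank(children: Seq[Expr]) extends Expr {
    val prettyName = "dense_rank"
    override def sql: String = "DENSE_RANK()"
  }
}
```

In this sketch `Sum(Column("value")).sql` keeps the default form, while `DenseRank` drops the children that the default rendering would otherwise print between the parentheses.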
[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/11555#issuecomment-193139472 have a good rest, we can discuss more tomorrow :)
[GitHub] spark pull request: [SPARK-13712] [ML] Add OneVsOne to ML
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11554#issuecomment-193139012 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52543/ Test PASSed.
[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/11555#issuecomment-193139048 Sorry, I have an early morning conference call with the patent attorneys. I will reply to your response tomorrow. Thanks!
[GitHub] spark pull request: [SPARK-13712] [ML] Add OneVsOne to ML
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11554#issuecomment-193139011 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13712] [ML] Add OneVsOne to ML
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11554#issuecomment-193138893 **[Test build #52543 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52543/consoleFull)** for PR 11554 at commit [`0250d32`](https://github.com/apache/spark/commit/0250d32c870686539d84a82a56098e144151b45d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/11555#issuecomment-193137731

So far, the test cases I wrote are listed below. I think we still need to add more to cover all the cases.

```
test("window basic") {
  checkHiveQl(
    s"""
       |select key, value,
       |round(avg(value) over (), 2)
       |from parquet_t1 order by key
    """.stripMargin)
}

test("window with different window specification") {
  checkHiveQl(
    s"""
       |select key, value,
       |dense_rank() over (order by key, value) as dr,
       |sum(value) over (partition by key order by key) as sum
       |from parquet_t1
    """.stripMargin)
}

test("window with the same window specification with aggregate + having") {
  checkHiveQl(
    s"""
       |select key, value,
       |sum(value) over (partition by key % 5 order by key) as dr
       |from parquet_t1 group by key, value having key > 5
    """.stripMargin)
}

test("window with the same window specification with aggregate functions") {
  checkHiveQl(
    s"""
       |select key, value,
       |sum(value) over (partition by key % 5 order by key) as dr
       |from parquet_t1 group by key, value
    """.stripMargin)
}

test("window with the same window specification with aggregate") {
  checkHiveQl(
    s"""
       |select key, value,
       |dense_rank() over (distribute by key sort by key, value) as dr,
       |count(key)
       |from parquet_t1 group by key, value
    """.stripMargin)
}

test("window with the same window specification without aggregate and filter") {
  checkHiveQl(
    s"""
       |select key, value,
       |dense_rank() over (distribute by key sort by key, value) as dr,
       |count(key) over(distribute by key sort by key, value) as ca
       |from parquet_t1
    """.stripMargin)
}
```
[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11487#issuecomment-193135944 Build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11487#issuecomment-193135946 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52541/ Test PASSed.
[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11487#issuecomment-193135642 **[Test build #52541 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52541/consoleFull)** for PR 11487 at commit [`0055fd1`](https://github.com/apache/spark/commit/0055fd1cf4b1f4cea91fd5a4f89589d82715f2c7). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/11555#issuecomment-193135603

@cloud-fan The issue is much more complex in my implementation. As you saw in the JIRA, I originally wanted to add an extra `subqueryAlias` between each Window. However, I hit a couple of problems caused by `subqueryAlias`. Thus, I finally decided to recover the original SQL statement. Below is my code draft, not yet cleaned up.

```scala
private def getAllWindowExprs(
    plan: Window,
    windowExprs: ArrayBuffer[NamedExpression]): (LogicalPlan, ArrayBuffer[NamedExpression]) = {
  plan.child match {
    // Keep descending while the child is another Window, accumulating its expressions
    case w: Window =>
      getAllWindowExprs(w, windowExprs ++ plan.windowExpressions)
    case _ => (plan.child, windowExprs ++ plan.windowExpressions)
  }
}

// Replace the attributes of aliased expressions in window expressions
// by the original expressions in Project or Aggregate
private def replaceAliasedByExpr(
    projectList: Seq[NamedExpression],
    windowExprs: Seq[NamedExpression]): Seq[Expression] = {
  val aliasMap = AttributeMap(projectList.collect {
    case a: Alias => (a.toAttribute, a.child)
  })

  windowExprs.map { expr =>
    expr.transformDown {
      case ar: AttributeReference if aliasMap.contains(ar) => aliasMap(ar)
    }
  }
}

// Returns (select list, GROUP BY clause body, HAVING clause, plan below the windows)
private def buildProjectListForWindow(plan: Window): (String, String, String, LogicalPlan) = {
  // get all the windowExpressions from all the adjacent Window
  val (child, windowExpressions) = getAllWindowExprs(plan, ArrayBuffer.empty[NamedExpression])
  child match {
    case p: Project =>
      val newWindowExpr = replaceAliasedByExpr(p.projectList, windowExpressions)
      ((p.projectList ++ newWindowExpr).map(_.sql).mkString(", "), "", "", p.child)

    case _: Aggregate | _ @ Filter(_, _: Aggregate) =>
      val agg: Aggregate = child match {
        case a: Aggregate => a
        case Filter(_, a: Aggregate) => a
      }
      val newWindowExpr = replaceAliasedByExpr(agg.aggregateExpressions, windowExpressions)
      val groupingSQL = agg.groupingExpressions.map(_.sql).mkString(", ")
      val havingSQL = child match {
        case _: Aggregate => ""
        case Filter(condition, _: Aggregate) => "HAVING " + condition.sql
      }
      ((agg.aggregateExpressions ++ newWindowExpr).map(_.sql).mkString(", "),
        groupingSQL,
        havingSQL,
        agg.child)
  }
}

private def windowToSQL(plan: Window): String = {
  val (selectList, groupingSQL, havingSQL, nextPlan) = buildProjectListForWindow(plan)
  build(
    "SELECT",
    selectList,
    if (nextPlan == OneRowRelation) "" else "FROM",
    toSQL(nextPlan),
    if (groupingSQL.isEmpty) "" else "GROUP BY",
    groupingSQL,
    havingSQL
  )
}
```
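The core idea in the draft above, collapsing a chain of adjacent Window operators back into a single SELECT list, can be illustrated with a self-contained toy model. Everything here is hypothetical and only for illustration: the `Plan`, `Table`, `Project` and `Window` case classes stand in for Catalyst's logical plans, expressions are plain strings, and the `AS sub` subquery name is invented. Alias substitution, GROUP BY and HAVING handling are left out.

```scala
import scala.annotation.tailrec

// Toy logical plan (hypothetical; NOT Spark's Catalyst classes).
object CollapseWindowsSketch {
  sealed trait Plan
  final case class Table(name: String) extends Plan
  final case class Project(columns: Seq[String], child: Plan) extends Plan
  final case class Window(windowExprs: Seq[String], child: Plan) extends Plan

  // Walk down a chain of adjacent Window nodes, accumulating their
  // expressions (outermost first). Returns the first non-Window child
  // together with all collected window expressions.
  @tailrec
  def collectWindowExprs(
      plan: Window,
      acc: Vector[String] = Vector.empty): (Plan, Vector[String]) = {
    val all = acc ++ plan.windowExprs
    plan.child match {
      case w: Window => collectWindowExprs(w, all)
      case other => (other, all)
    }
  }

  private def select(columns: Seq[String], from: Plan): String = from match {
    case Table(name) => s"SELECT ${columns.mkString(", ")} FROM $name"
    // Invented alias, only so that the toy output stays well-formed.
    case other => s"SELECT ${columns.mkString(", ")} FROM (${toSQL(other)}) AS sub"
  }

  def toSQL(plan: Plan): String = plan match {
    case Table(name) => name
    case Project(columns, child) => select(columns, child)
    case w: Window =>
      val (below, exprs) = collectWindowExprs(w)
      below match {
        // Merge the projection and every window expression into one SELECT list.
        case Project(columns, child) => select(columns ++ exprs, child)
        case other => select("*" +: exprs, other)
      }
  }
}
```

For two stacked `Window` nodes over a `Project` on `parquet_t1`, `toSQL` emits one flat `SELECT ... FROM parquet_t1` rather than nested subqueries, which is the shape the draft aims to recover.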
[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11555#issuecomment-193134125 **[Test build #52545 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52545/consoleFull)** for PR 11555 at commit [`3ce072b`](https://github.com/apache/spark/commit/3ce072b4682a362d578a01181e3b8699cc38de93).
[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11555#issuecomment-193133649 **[Test build #52544 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52544/consoleFull)** for PR 11555 at commit [`559bbc5`](https://github.com/apache/spark/commit/559bbc5bfb20105a5fead499e30583cbfa98d103). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11555#issuecomment-193133656 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52544/ Test FAILed.
[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11555#issuecomment-193133655 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11555#issuecomment-193133481 **[Test build #52544 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52544/consoleFull)** for PR 11555 at commit [`559bbc5`](https://github.com/apache/spark/commit/559bbc5bfb20105a5fead499e30583cbfa98d103).
[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/11555#issuecomment-193133048 cc @liancheng @gatorsmile
[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/11555

[SPARK-12718][SQL] SQL generation support for window functions

## What changes were proposed in this pull request?

Add SQL generation support for window functions

## How was this patch tested?

new tests in `LogicalPlanToSQLSuite`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark window

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11555.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11555

commit 559bbc5bfb20105a5fead499e30583cbfa98d103
Author: Wenchen Fan
Date: 2016-03-07T07:07:26Z

    SQL generation support for window functions
[GitHub] spark pull request: [SPARK-13712] [ML] Add OneVsOne to ML
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11554#issuecomment-193131502 **[Test build #52543 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52543/consoleFull)** for PR 11554 at commit [`0250d32`](https://github.com/apache/spark/commit/0250d32c870686539d84a82a56098e144151b45d).
[GitHub] spark pull request: [SPARK-13712] [ML] Add OneVsOne to ML
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/11554

[SPARK-13712] [ML] Add OneVsOne to ML

JIRA: https://issues.apache.org/jira/browse/SPARK-13712

## What changes were proposed in this pull request?

Add OneVsOne meta method for multi-class classification to ML

## How was this patch tested?

manual tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhengruifeng/spark onevsone

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11554.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11554

commit f12a554dfdabc8b3b8cdba50e00128fada981733
Author: Zheng RuiFeng
Date: 2016-03-05T05:02:40Z

    create onevsone

commit 76dff5cb4c1c454afe7a434e4e626a01af3ff2b2
Author: Zheng RuiFeng
Date: 2016-03-05T08:47:55Z

    add test

commit 0250d32c870686539d84a82a56098e144151b45d
Author: Zheng RuiFeng
Date: 2016-03-07T06:23:59Z

    fix bug in sql
[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11487#issuecomment-193124038 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11487#issuecomment-193124043 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52542/ Test PASSed.
[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11487#issuecomment-193123747 **[Test build #52542 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52542/consoleFull)** for PR 11487 at commit [`7ac9648`](https://github.com/apache/spark/commit/7ac9648f43ce6989827c793f3f1872558baaa4ef). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13651] Generator outputs are not resolv...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11497#issuecomment-193123600 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13651] Generator outputs are not resolv...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11497#issuecomment-193123602 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52540/ Test PASSed.
[GitHub] spark pull request: [SPARK-13651] Generator outputs are not resolv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11497#issuecomment-193123443 **[Test build #52540 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52540/consoleFull)** for PR 11497 at commit [`93d6e69`](https://github.com/apache/spark/commit/93d6e6970325d67ebb6b92e0c77b078507627843). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13025] Allow users to set initial model...
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/11459#issuecomment-193122668 Should this wait until [PR-9](https://github.com/apache/spark/pull/9) is merged?
[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/11301#issuecomment-193118103 @kiszk It seems this PR covers only `Expression`. Why don't you cover operators like sort and join too?
[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/11301#discussion_r55165010

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
    @@ -418,6 +419,13 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
       override def toString: String = treeString

    +  def toOriginString: String =
    +    if (this.origin.callSite.isDefined && !this.isInstanceOf[BoundReference]) {
    --- End diff --

Could you tell me why `BoundReference` is exceptional?
[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/11301#discussion_r55165016

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
    @@ -50,7 +50,7 @@ object ExpressionSet {
     class ExpressionSet protected(
         protected val baseSet: mutable.Set[Expression] = new mutable.HashSet,
         protected val originals: mutable.Buffer[Expression] = new ArrayBuffer)
    -  extends Set[Expression] {
    +  extends Set[Expression] with Serializable {
    --- End diff --

If `ExpressionSet` is really serialized only in the case of `LogicalPlan`, we could move `constraints` from `QueryPlan` to `LogicalPlan`, but I'm not sure that is the correct way. Have you ever had any problems because `ExpressionSet` is not `Serializable`?
[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/11301#discussion_r55165012

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
@@ -57,15 +58,15 @@ object CurrentOrigin {
   def reset(): Unit = value.set(Origin())

-  def setPosition(line: Int, start: Int): Unit = {
+  def setPosition(callSite: String, line: Int, start: Int): Unit = {
     value.set(
-      value.get.copy(line = Some(line), startPosition = Some(start)))
+      value.get.copy(callSite = Some(callSite), line = Some(line), startPosition = Some(start)))
   }

   def withOrigin[A](o: Origin)(f: => A): A = {
+    val current = get
     set(o)
-    val ret = try f finally { reset() }
-    reset()
+    val ret = try f finally { set(current) }
--- End diff --

This change may be correct, but I noticed that after it we have another issue when we operate on a `DataFrame` using both the DSL-like API and SQL/HiveQL. For example, if we run the following code:

```
val df = sc.parallelize(1 to 10).toDF
val filtered = df.filter("_1 > 4")
val selected = filtered.select($"_1" * 10)
selected.show()
```

then we get generated code like the following:

```
...
/* 055 */   while (rdd_batchIdx < numRows) {
/* 056 */     InternalRow rdd_row = rdd_batch.getRow(rdd_batchIdx++);
/* 057 */     /* input[0, int] */
/* 058 */     boolean rdd_isNull = rdd_row.isNullAt(0);
/* 059 */     int rdd_value = rdd_isNull ? -1 : (rdd_row.getInt(0));
/* 060 */     /* (input[0, int] > 4) @ filter at SPARK13432.scala:14 */
/* 061 */     boolean filter_isNull = true;
/* 062 */     boolean filter_value = false;
/* 063 */
/* 064 */     if (!rdd_isNull) {
/* 065 */       filter_isNull = false; // resultCode could change nullability.
/* 066 */       filter_value = rdd_value > 4;
/* 067 */
/* 068 */     }
/* 069 */     if (!filter_isNull && filter_value) {
/* 070 */       filter_metricValue.add(1);
/* 071 */
/* 072 */       /* (input[0, int] * 10) @ filter at SPARK13432.scala:14 */
/* 073 */       boolean project_isNull = true;
/* 074 */       int project_value = -1;
...
```

At line #072, the comment should not say `filter`, and the line of the original code is not 14. I think the comment should just say `/* (input[0, int] * 10) */`. This issue occurs because the origin is not reset properly.
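The difference between the two `withOrigin` variants in the diff can be sketched outside of Spark. What follows is a minimal Python analogue, not Spark code: `with_origin_reset`, `with_origin_restore`, and `current_origin` are hypothetical names, and a plain module-level variable stands in for the thread-local `value`.

```python
# Python analogue of CurrentOrigin.withOrigin; all names here are hypothetical.
_origin = None  # stands in for the thread-local Origin


def with_origin_reset(origin, f):
    """Old behaviour: clear the origin after running f."""
    global _origin
    _origin = origin
    try:
        return f()
    finally:
        _origin = None


def with_origin_restore(origin, f):
    """New behaviour: put back whatever origin was set before the call."""
    global _origin
    previous = _origin
    _origin = origin
    try:
        return f()
    finally:
        _origin = previous


def current_origin():
    return _origin


def outer_after_inner(with_origin):
    """Read the origin inside an outer scope right after a nested scope ends."""
    def body():
        with_origin("inner", lambda: None)
        return current_origin()
    return with_origin("outer", body)


lost = outer_after_inner(with_origin_reset)    # None: outer origin was wiped
kept = outer_after_inner(with_origin_restore)  # "outer": outer origin survives

# A stale origin that was set earlier and never cleared survives the restore variant.
_origin = "filter at SPARK13432.scala:14"
with_origin_restore("select", lambda: None)
stale = current_origin()
```

The last three lines are one way the symptom described in this comment could arise: restoring the previous value is only correct if that previous value was itself cleared at the right time. Whether this is the actual cause in the PR is not confirmed here.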
[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/11551#issuecomment-193113588

Now, it's **763** seconds. That looks close to the minimum and seems to keep all 4 processes fully busy.

```
Tests passed in 763 seconds
```
[GitHub] spark pull request: [SPARK-13231] Make count failed values a user ...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/5#issuecomment-193112524

@andrewor14 Can you take a look?
[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11487#issuecomment-193098087

**[Test build #52542 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52542/consoleFull)** for PR 11487 at commit [`7ac9648`](https://github.com/apache/spark/commit/7ac9648f43ce6989827c793f3f1872558baaa4ef).
[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11487#issuecomment-193096199

**[Test build #52541 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52541/consoleFull)** for PR 11487 at commit [`0055fd1`](https://github.com/apache/spark/commit/0055fd1cf4b1f4cea91fd5a4f89589d82715f2c7).
[GitHub] spark pull request: [SPARK-13651] Generator outputs are not resolv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11497#issuecomment-193095096

**[Test build #52540 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52540/consoleFull)** for PR 11497 at commit [`93d6e69`](https://github.com/apache/spark/commit/93d6e6970325d67ebb6b92e0c77b078507627843).
[GitHub] spark pull request: [SPARK-13651] Generator outputs are not resolv...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/11497#issuecomment-193094294

retest this please
[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11551#issuecomment-193089248

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11551#issuecomment-193089252

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52539/ Test PASSed.
[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11551#issuecomment-193089010

**[Test build #52539 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52539/consoleFull)** for PR 11551 at commit [`668c7b1`](https://github.com/apache/spark/commit/668c7b12b380f5f8f1020faf2594a95cac95453c).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/11487#discussion_r55160833

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala ---
@@ -237,4 +241,45 @@ class ScalaReflectionSuite extends SparkFunSuite {
     assert(anyTypes.forall(!_.isPrimitive))
     assert(anyTypes === Seq(classOf[java.lang.Object], classOf[java.lang.Object]))
   }
+
+  private def testThreadSafetyFor(name: String)(exec: () => Any) = {
+    test(s"thread safety of ${name}") {
+      for (_ <- 0 until 100) {
--- End diff --

@srowen Thank you for your comment.

> `(0 until 100).foreach`?

I repeat the test 100 times here because it is a thread-safety test: a thread-safety problem shows up in some runs but not in others.

> You can import `java.net.URLClassLoader`.

I'll change it to use an import.

> It doesn't really seem like you need a method here; it took a moment to see there was a test in here.

I'll move the test out of the method.

> Maybe it's obvious to you but why do all these classes/methods need to be tested separately?

The methods are public, i.e. they can be called from multiple threads, so I thought they also need to be tested. But I wonder whether some of them could be removed.

> And is this locking still safe in 2.11?

Yes, reflection in Scala 2.11 is thread-safe. If we didn't support Scala 2.10, this locking in `ScalaReflection` would not be needed.
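The reason for repeating the test can be illustrated with a small harness. This is a hedged Python sketch, not the Scala suite from the PR: `run_concurrently` and its parameters are hypothetical, and it only shows the pattern of calling the same function from several threads, many times over, and collecting any exception that is raised.

```python
import threading


def run_concurrently(func, threads=8, repetitions=100):
    """Call func from several threads, repeated many times; return raised errors."""
    errors = []
    lock = threading.Lock()

    def worker():
        try:
            func()
        except Exception as exc:  # record the failure instead of losing it in the thread
            with lock:
                errors.append(exc)

    for _ in range(repetitions):
        pool = [threading.Thread(target=worker) for _ in range(threads)]
        for t in pool:
            t.start()
        for t in pool:
            t.join()
    return errors


# A trivially thread-safe call should produce no errors.
errors = run_concurrently(lambda: sorted([3, 1, 2]))
```

A race that appears only in a small fraction of runs is unlikely to be caught by a single pass, which is why the repetition count matters more than the exact looping construct.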
[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11551#issuecomment-193082740

**[Test build #52539 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52539/consoleFull)** for PR 11551 at commit [`668c7b1`](https://github.com/apache/spark/commit/668c7b12b380f5f8f1020faf2594a95cac95453c).
[GitHub] spark pull request: [SPARK-13651] Generator outputs are not resolv...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/11497#issuecomment-193082802

test this please
[GitHub] spark pull request: [SPARK-13651] Generator outputs are not resolv...
Github user dilipbiswal commented on the pull request: https://github.com/apache/spark/pull/11497#issuecomment-193074477

@cloud-fan Can we trigger a test please?
[GitHub] spark pull request: [SPARK-13600] [MLlib] [WIP] Incorrect number o...
Github user oliverpierson commented on the pull request: https://github.com/apache/spark/pull/11553#issuecomment-193073200

This is still a work in progress; I just wanted to get the PR up so it's on the radar. Still need to:

- [ ] Add an external Parameter (with a default value) for setting the acceptable error
- [ ] Investigate whether or not +/- Infinity needs to be added to the splits/quantiles given by approxQuantiles
[GitHub] spark pull request: [SPARK-13600] [MLlib] [WIP] Incorrect number o...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11553#issuecomment-193072808

Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-13600] [MLlib] [WIP] Incorrect number o...
GitHub user oliverpierson opened a pull request: https://github.com/apache/spark/pull/11553

[SPARK-13600] [MLlib] [WIP] Incorrect number of buckets in QuantileDiscretizer

## What changes were proposed in this pull request?

QuantileDiscretizer can return an unexpected number of buckets in certain cases. This PR proposes to fix this issue and also refactor QuantileDiscretizer to use approxQuantiles from DataFrame stats functions.

## How was this patch tested?

QuantileDiscretizerSuite unit tests (some existing tests will change or even be removed in this PR)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/oliverpierson/spark SPARK-13600

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11553.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11553

commit 7ff2da10141378cb9511672964f85615f937484d
Author: Oliver Pierson
Date: 2016-03-07T03:07:20Z

    refactored QuantileDiscretizer to use dataframe stats
[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11550#issuecomment-193071199

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11550#issuecomment-193071201

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52535/ Test PASSed.
[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11550#issuecomment-193070983

**[Test build #52535 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52535/consoleFull)** for PR 11550 at commit [`5c990cd`](https://github.com/apache/spark/commit/5c990cd8c996c9a624439749d8809624c2457051).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13651] Generator outputs are not resolv...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/11497#issuecomment-193066450

LGTM, cc @davies (who fixed this special case before)
[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/11551#issuecomment-193065038

Hi, @JoshRosen. Could you review this, please?
[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/11301#issuecomment-193065041

Sorry for the late reply. I have some comments and will leave them soon.
[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/11551#issuecomment-193064926

As I wrote in the JIRA issue, the total time of all tests is **3077s**, so the minimum required time with 4 processes is about 769s. In the actual Jenkins result it is now **804** seconds, a reduction of about 160 seconds.

```
Tests passed in 804 seconds
```

In the case where `PyPy` and `Python3.4` are removed, this priority queue reduces the total running time compared with the FIFO queue.
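The numbers in this comment can be checked with a few lines of arithmetic. The sketch below assumes the work could be split perfectly evenly across processes, which real test suites cannot be, so the bound is only a floor. The 962-second figure is the Jenkins run quoted in the PR description; the variable names are illustrative.

```python
total_seconds = 3077      # sum of all test durations, from this comment
processes = 4

# Ideal finish time if the work were perfectly divisible across processes.
lower_bound = total_seconds / processes        # 769.25, quoted as 769s

fifo_seconds = 962        # Jenkins run quoted in the PR description
priority_seconds = 804    # Jenkins run quoted in this comment

saved = fifo_seconds - priority_seconds        # 158, i.e. "about 160 seconds"
gap_to_bound = priority_seconds - lower_bound  # 34.75 seconds above the floor
```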
[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/11235#issuecomment-193064804

ping @marmbrus @rxin @davies @liancheng Is this ready to go? Or do you have other comments? Thanks!
[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11551#issuecomment-193062807

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52538/ Test PASSed.
[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11551#issuecomment-193062804

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11551#issuecomment-193062184

**[Test build #52538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52538/consoleFull)** for PR 11551 at commit [`6a0e099`](https://github.com/apache/spark/commit/6a0e09907eb60c12074fb32b6bdfff574d64ccf2).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13034] Add export/import for all estima...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11552#issuecomment-193057635

Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-13034] Add export/import for all estima...
GitHub user GayathriMurali opened a pull request: https://github.com/apache/spark/pull/11552

[SPARK-13034] Add export/import for all estimators and transformers(w…

## What changes were proposed in this pull request?

Add export/import for all estimators and transformers (which have Scala implementation) under pyspark/ml/classification.py.

JIRA: https://issues.apache.org/jira/browse/SPARK-13034

## How was this patch tested?

Unit tests added to tests.py

…hich have Scala implementation) under pyspark/ml/classification.py

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/GayathriMurali/spark SPARK-13034

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11552.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11552

commit 4760bcb73913dcb80f2419ce0fc989a119f02044
Author: GayathriMurali
Date: 2016-03-07T02:18:18Z

    [SPARK-13034] Add export/import for all estimators and transformers(which have Scala implementation) under pyspark/ml/classification.py
[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11551#issuecomment-193048081

**[Test build #52538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52538/consoleFull)** for PR 11551 at commit [`6a0e099`](https://github.com/apache/spark/commit/6a0e09907eb60c12074fb32b6bdfff574d64ccf2).
[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11551#issuecomment-193042342

Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11551#issuecomment-193042335

**[Test build #52537 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52537/consoleFull)** for PR 11551 at commit [`69fc65b`](https://github.com/apache/spark/commit/69fc65b63cac71fd733976862abc902cb2e37ecc).

* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11551#issuecomment-193042343

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52537/ Test FAILed.
[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11551#issuecomment-193041678

**[Test build #52537 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52537/consoleFull)** for PR 11551 at commit [`69fc65b`](https://github.com/apache/spark/commit/69fc65b63cac71fd733976862abc902cb2e37ecc).
[GitHub] spark pull request: [SPARK-13015][Docs] Replace example code in ml...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11128#issuecomment-193039764

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13015][Docs] Replace example code in ml...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11128#issuecomment-193039715

**[Test build #52536 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52536/consoleFull)** for PR 11128 at commit [`02b4e22`](https://github.com/apache/spark/commit/02b4e2235c5421458cb8fa96c734c54c9bad9457).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13015][Docs] Replace example code in ml...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11128#issuecomment-193039765 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52536/ Test PASSed.
[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/11551

[SPARK-12243][BUILD][PYTHON] PySpark tests are slow in Jenkins.

## What changes were proposed in this pull request?

In the Jenkins pull request builder, PySpark tests take around [962 seconds](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52530/console) of end-to-end time to run, even though we run four Python test suites in parallel. According to the log, the main reason is that the long-running tests start last because of the FIFO queue. As a first step, this PR reduces the test time by starting some long-running tests first, using a simple priority queue.

```
Running PySpark tests
...
Finished test(python3.4): pyspark.streaming.tests (213s)
Finished test(pypy): pyspark.sql.tests (92s)
Finished test(pypy): pyspark.streaming.tests (280s)
Tests passed in 962 seconds
```

## How was this patch tested?

Manual check: see the 'Running PySpark tests' part of the Jenkins log.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-12243

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11551.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11551

commit 69fc65b63cac71fd733976862abc902cb2e37ecc
Author: Dongjoon Hyun
Date: 2016-03-06T19:52:01Z

    PySpark tests are slow in Jenkins.
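To illustrate why start order matters when four workers share one queue, here is a minimal sketch that simulates the wall-clock time for FIFO order versus longest-first order. It is not the code from this PR: the `makespan` helper and the suite durations are hypothetical, chosen only for illustration. When all tasks are known up front, sorting by descending duration is equivalent to draining a priority queue keyed on duration.

```python
import heapq


def makespan(durations, workers):
    """Simulate workers pulling tasks in the given order; return total wall-clock time."""
    # Each heap entry is the time at which one worker becomes free.
    free_at = [0] * workers
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest available worker takes the next task
        heapq.heappush(free_at, start + d)  # that worker is busy until start + d
    return max(free_at)


# Hypothetical suite durations in seconds (illustrative, not taken from the Jenkins log).
suites = [30, 40, 50, 60, 90, 210, 280]

fifo = makespan(suites, workers=4)
longest_first = makespan(sorted(suites, reverse=True), workers=4)
print(fifo, longest_first)
```

With these made-up numbers the longest-first order finishes as soon as the single longest suite does, while FIFO leaves that suite to start late and extends the total run.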
[GitHub] spark pull request: [SPARK-13015][Docs] Replace example code in ml...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11128#issuecomment-193038016 **[Test build #52536 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52536/consoleFull)** for PR 11128 at commit [`02b4e22`](https://github.com/apache/spark/commit/02b4e2235c5421458cb8fa96c734c54c9bad9457).
[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11550#issuecomment-193035803 @falaki Would you maybe review this please..?
[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11550#issuecomment-193035983 **[Test build #52535 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52535/consoleFull)** for PR 11550 at commit [`5c990cd`](https://github.com/apache/spark/commit/5c990cd8c996c9a624439749d8809624c2457051).
[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11550#issuecomment-193035562 @rxin There should be a conflict with https://github.com/apache/spark/pull/11315, which I think is supposed to be merged (judging from your comment). I will resolve the conflict as soon as either this one or that one is merged.
[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/11550

[SPARK-13667][SQL] Support for specifying custom date format for date and timestamp types at CSV datasource.

## What changes were proposed in this pull request?

This PR adds support for specifying a custom date format for `DateType` and `TimestampType`.

For `TimestampType`, this uses the given format to infer the schema and also to convert the values. For `DateType`, this uses the given format to convert the values.

If `dateFormat` is not given, it works with `Timestamp.valueOf()` and `Date.valueOf()` for backward compatibility. When it is given, it uses `SimpleDateFormat` for parsing data.

In addition, `IntegerType`, `DoubleType` and `LongType` have a higher priority than `TimestampType` in type inference. This means even if the given format is `` or `.MM`, it will be inferred as `IntegerType` or `DoubleType`. Since it is type inference, I think it is okay to give such precedence.

In addition, I renamed `csv.CSVInferSchema` to `csv.InferSchema`, as the JSON datasource has `json.InferSchema`. Although they have the same names, I did this because I thought the parent package name can still differentiate them. Accordingly, the suite name was also changed from `CSVInferSchemaSuite` to `InferSchemaSuite`.

## How was this patch tested?

Unit tests, plus `./dev/run_tests` for coding style tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-13667

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11550.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11550

commit 5c990cd8c996c9a624439749d8809624c2457051
Author: hyukjinkwon
Date: 2016-03-07T01:16:07Z

    Support for specifying custom date format for date and timestamp types.
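The inference precedence described in this PR can be sketched roughly as follows. This is only a Python analogue under stated assumptions, not the Scala code in the PR: `infer_type` is a hypothetical helper, the format strings use Python's `strptime` directives rather than `SimpleDateFormat` patterns, and the default format is an assumed stand-in for what `Timestamp.valueOf()` accepts.

```python
from datetime import datetime

# Assumed stand-in for the default timestamp layout; the PR itself relies on Timestamp.valueOf().
DEFAULT_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def infer_type(value, date_format=None):
    """Return a type name for one CSV field, trying numeric types before timestamps."""
    # Integer types are tried first, so an all-digit value never reaches the timestamp branch.
    try:
        n = int(value)
        return "IntegerType" if -2**31 <= n < 2**31 else "LongType"
    except ValueError:
        pass
    # Doubles come next, so a value such as "2016.03" is still numeric.
    try:
        float(value)
        return "DoubleType"
    except ValueError:
        pass
    # Only now is the (custom or default) date format consulted.
    try:
        datetime.strptime(value, date_format or DEFAULT_TIMESTAMP_FORMAT)
        return "TimestampType"
    except ValueError:
        return "StringType"


print(infer_type("07/03/2016", "%d/%m/%Y"))
print(infer_type("2016", "%Y"))
```

The second call shows the consequence the description mentions: a value that matches a digits-only date format is still inferred as an integer, because numeric types take precedence.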
[GitHub] spark pull request: [SPARK-12566] [ML] [WIP] GLM model family, lin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11549#issuecomment-193020982 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12566] [ML] [WIP] GLM model family, lin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11549#issuecomment-193020984 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52534/ Test FAILed.
[GitHub] spark pull request: [SPARK-12566] [ML] [WIP] GLM model family, lin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11549#issuecomment-193020961 **[Test build #52534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52534/consoleFull)** for PR 11549 at commit [`1cca19e`](https://github.com/apache/spark/commit/1cca19e68d9ef256769594e02d123ce6e3b0bd7d). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12566] [ML] [WIP] GLM model family, lin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11549#issuecomment-193015610 **[Test build #52534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52534/consoleFull)** for PR 11549 at commit [`1cca19e`](https://github.com/apache/spark/commit/1cca19e68d9ef256769594e02d123ce6e3b0bd7d).
[GitHub] spark pull request: [SPARK-12566] [ML] [WIP] GLM model family, lin...
Github user hhbyyh commented on the pull request: https://github.com/apache/spark/pull/11549#issuecomment-193015522

We already have a `glm` in SparkR, which is based on `LogisticRegressionModel` and `LinearRegressionModel`. As I understand it, there are three ways to extend it:

1. Change the current glm to use `GeneralizedLinearRegression`. Create another `lm` interface for SparkR, and use LR as the model.
2. Keep the glm R interface and replace its implementation with GLM. This means R cannot invoke LR anymore.
3. Keep the glm R interface, and combine the implementation with both LR and GLM based on a different solver parameter.

I'd prefer option 1. I'm going to send one PR (WIP) for solution 2, which can later be adjusted to 1 or 3.
[GitHub] spark pull request: [SPARK-12566] [ML] [WIP] GLM model family, lin...
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/11549

[SPARK-12566] [ML] [WIP] GLM model family, link function support in SparkR:::glm

## What changes were proposed in this pull request?

This JIRA is for extending the support of MLlib's Generalized Linear Models (GLMs) to more model families and link functions in SparkR. After SPARK-12811, we should be able to wrap GeneralizedLinearRegression in SparkR with support for popular families and link functions.

## How was this patch tested?

WIP; some manual tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hhbyyh/spark glmR

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11549.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11549

commit 6c933650389798f8e3caf3e50604bceae79a126e
Author: Yuhao Yang
Date: 2016-03-06T23:00:44Z

    change R glm to use GLM

commit 1cca19e68d9ef256769594e02d123ce6e3b0bd7d
Author: Yuhao Yang
Date: 2016-03-06T23:27:58Z

    refine family
[GitHub] spark pull request: [SPARK-13019][Docs] Replace example code in ml...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11108#issuecomment-193013801 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13019][Docs] Replace example code in ml...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11108#issuecomment-193013802 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52533/ Test PASSed.
[GitHub] spark pull request: [SPARK-13019][Docs] Replace example code in ml...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11108#issuecomment-193013759 **[Test build #52533 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52533/consoleFull)** for PR 11108 at commit [`3329394`](https://github.com/apache/spark/commit/33293947bde90fd29014587cd42533df121bd783). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13019][Docs] Replace example code in ml...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11108#issuecomment-193012431 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13019][Docs] Replace example code in ml...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11108#issuecomment-193012434 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52532/ Test PASSed.
[GitHub] spark pull request: [SPARK-13019][Docs] Replace example code in ml...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11108#issuecomment-193012216 **[Test build #52532 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52532/consoleFull)** for PR 11108 at commit [`e2737ee`](https://github.com/apache/spark/commit/e2737eedd6c45c82f25045442b1d811ab2c395ec). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13019][Docs] Replace example code in ml...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11108#issuecomment-193011222 **[Test build #52533 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52533/consoleFull)** for PR 11108 at commit [`3329394`](https://github.com/apache/spark/commit/33293947bde90fd29014587cd42533df121bd783).
[GitHub] spark pull request: [SPARK-13019][Docs] Replace example code in ml...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11108#issuecomment-193010336 **[Test build #52532 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52532/consoleFull)** for PR 11108 at commit [`e2737ee`](https://github.com/apache/spark/commit/e2737eedd6c45c82f25045442b1d811ab2c395ec).
[GitHub] spark pull request: [SPARK-13019][Docs] Replace example code in ml...
Github user keypointt commented on the pull request: https://github.com/apache/spark/pull/11108#issuecomment-193010119 thanks a lot @yinxusen , I'm fixing the import format in the other PRs, will commit soon
[GitHub] spark pull request: [SPARK-13686][MLLIB][STREAMING] Add a construc...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/11527#issuecomment-193008392 Hi, @mengxr and @jkbradley . Could you review this PR, please?
[GitHub] spark pull request: [SPARK-13651] Generator outputs are not resolv...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/11497#discussion_r55149604

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -512,6 +512,9 @@ class Analyzer(
           // A special case for Generate, because the output of Generate should not be resolved by
           // ResolveReferences. Attributes in the output will be resolved by ResolveGenerate.
    +      case g @ Generate(generator, _, _, _, _, _)
    +        if !g.resolved && generator.resolved => g
    +
           case g @ Generate(generator, join, outer, qualifier, output, child)
    --- End diff --

@cloud-fan Thanks !! Made the change.
[GitHub] spark pull request: [SPARK-13686][MLLIB][STREAMING] Add a construc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11527#issuecomment-193003088 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52530/ Test PASSed.
[GitHub] spark pull request: [SPARK-13686][MLLIB][STREAMING] Add a construc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11527#issuecomment-193003083 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13686][MLLIB][STREAMING] Add a construc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11527#issuecomment-193002661 **[Test build #52530 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52530/consoleFull)** for PR 11527 at commit [`92be84f`](https://github.com/apache/spark/commit/92be84f7b6bc45fdd82ae21d8e1245d0549e0f83). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11861][ML] Add feature importances for ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9912#issuecomment-193001422 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11861][ML] Add feature importances for ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9912#issuecomment-193001424 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52531/ Test PASSed.
[GitHub] spark pull request: [SPARK-11861][ML] Add feature importances for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9912#issuecomment-193001372 **[Test build #52531 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52531/consoleFull)** for PR 9912 at commit [`cc2eb44`](https://github.com/apache/spark/commit/cc2eb44afecc442649e2b20369b78c31506d3597). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11861][ML] Add feature importances for ...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9912#discussion_r55147885

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala ---
    @@ -167,19 +167,15 @@ class RandomForestClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
           .setSeed(123)
         // In this data, feature 1 is very important.
    -    val data: RDD[LabeledPoint] = sc.parallelize(Seq(
    -      new LabeledPoint(0, Vectors.dense(1, 0, 0, 0, 1)),
    -      new LabeledPoint(1, Vectors.dense(1, 1, 0, 1, 0)),
    -      new LabeledPoint(1, Vectors.dense(1, 1, 0, 0, 0)),
    -      new LabeledPoint(0, Vectors.dense(1, 0, 0, 0, 0)),
    -      new LabeledPoint(1, Vectors.dense(1, 1, 0, 0, 0))
    -    ))
    +    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
         val categoricalFeatures = Map.empty[Int, Int]
         val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
         val importances = rf.fit(df).featureImportances
         val mostImportantFeature = importances.argmax
         assert(mostImportantFeature === 1)
    +    assert(importances.toArray.sum === 1.0)
    --- End diff --

I updated the feature importance tests here, as well, with additional checks.
[GitHub] spark pull request: [SPARK-11861][ML] Add feature importances for ...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/9912#discussion_r55147865

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala ---
    @@ -169,6 +169,20 @@ final class DecisionTreeClassificationModel private[ml] (
         s"DecisionTreeClassificationModel (uid=$uid) of depth $depth with $numNodes nodes"
       }
    +  /**
    +   * Estimate of the importance of each feature.
    +   *
    +   * This generalizes the idea of "Gini" importance to other losses,
    +   * following the explanation of Gini importance from "Random Forests" documentation
    +   * by Leo Breiman and Adele Cutler, and following the implementation from scikit-learn.
    --- End diff --

I added a note in the docs for `DecisionTreeRegressor` and `DecisionTreeClassifier`. I can update the format or the wording if needed.
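As a rough illustration of the idea in that doc comment, the sketch below accumulates, for each feature, the impurity gain of every split on it weighted by the number of samples reaching the split, then normalizes so the importances sum to 1. This is an assumption about the general Gini-style computation, not the Spark implementation: `feature_importances` and the split tuples are hypothetical.

```python
def feature_importances(splits, num_features):
    """Sum gain * sample count per split feature, then normalize to sum to 1."""
    totals = [0.0] * num_features
    for feature, gain, count in splits:
        # A split contributes more when it reduces impurity more and sees more samples.
        totals[feature] += gain * count
    total = sum(totals)
    # A tree with no splits has no importance to distribute.
    return [t / total for t in totals] if total > 0 else totals


# Hypothetical splits: (feature index, impurity gain, samples reaching the node).
splits = [(1, 0.4, 100), (0, 0.1, 60), (1, 0.2, 40)]
importances = feature_importances(splits, num_features=3)
most_important = importances.index(max(importances))
print(importances, most_important)
```

The normalization step is what makes a check like the suite's sum-to-one assertion meaningful, although in floating point an exact equality comparison can be fragile.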
[GitHub] spark pull request: [SPARK-11861][ML] Add feature importances for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9912#issuecomment-192986914 **[Test build #52531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52531/consoleFull)** for PR 9912 at commit [`cc2eb44`](https://github.com/apache/spark/commit/cc2eb44afecc442649e2b20369b78c31506d3597).