[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...

2016-03-06 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/11555#issuecomment-193141362
  
Thank you! Talk to you tomorrow.

BTW, we also need to fix several window expressions, for example 
`row_number`, `cume_dist`, `rank`, `dense_rank` and `percent_rank`. We need to 
override their default `sql` methods.
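
To illustrate why rank-like window functions may need their own `sql` override, here is a minimal Scala sketch. `ToyExpr`, `ToyCol`, `ToySum` and `ToyDenseRank` are hypothetical stand-ins, not Spark's Catalyst classes. The assumption is that the default `sql` renders every child as a call argument, which would be wrong for a function whose ordering columns belong in the `OVER` clause.

```scala
// Toy stand-ins for Catalyst expressions; these are NOT Spark's real classes.
trait ToyExpr {
  def prettyName: String
  def children: Seq[ToyExpr]
  // Assumed default rendering: name(child1, child2, ...)
  def sql: String = s"$prettyName(${children.map(_.sql).mkString(", ")})"
}

case class ToyCol(name: String) extends ToyExpr {
  val prettyName: String = name
  val children: Seq[ToyExpr] = Nil
  override def sql: String = name
}

// An ordinary aggregate: the default rendering is what we want.
case class ToySum(children: Seq[ToyExpr]) extends ToyExpr {
  val prettyName: String = "sum"
}

// A rank-like function that keeps its ordering columns as children.
case class ToyDenseRank(children: Seq[ToyExpr]) extends ToyExpr {
  val prettyName: String = "dense_rank"
  // The ordering is printed in the OVER clause, so the call takes no arguments.
  override def sql: String = s"$prettyName()"
}
```

With these definitions, `ToySum(Seq(ToyCol("value"))).sql` uses the default rendering, while `ToyDenseRank(Seq(ToyCol("key"), ToyCol("value"))).sql` ignores its children and prints an empty argument list.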

 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...

2016-03-06 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/11555#issuecomment-193139472
  
Have a good rest; we can discuss more tomorrow :)





[GitHub] spark pull request: [SPARK-13712] [ML] Add OneVsOne to ML

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11554#issuecomment-193139012
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52543/
Test PASSed.





[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...

2016-03-06 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/11555#issuecomment-193139048
  
Sorry, I have an early morning conference call with the patent attorneys. 
I will reply to your response tomorrow. Thanks!





[GitHub] spark pull request: [SPARK-13712] [ML] Add OneVsOne to ML

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11554#issuecomment-193139011
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13712] [ML] Add OneVsOne to ML

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11554#issuecomment-193138893
  
**[Test build #52543 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52543/consoleFull)**
 for PR 11554 at commit 
[`0250d32`](https://github.com/apache/spark/commit/0250d32c870686539d84a82a56098e144151b45d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...

2016-03-06 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/11555#issuecomment-193137731
  
So far, the test cases I wrote are listed below. I think we still need to 
add more to cover all the cases. 
```
  test("window basic") {
    checkHiveQl(
      s"""
         |select key, value,
         |round(avg(value) over (), 2)
         |from parquet_t1 order by key
      """.stripMargin)
  }

  test("window with different window specification") {
    checkHiveQl(
      s"""
         |select key, value,
         |dense_rank() over (order by key, value) as dr,
         |sum(value) over (partition by key order by key) as sum
         |from parquet_t1
      """.stripMargin)
  }

  test("window with the same window specification with aggregate + having") {
    checkHiveQl(
      s"""
         |select key, value,
         |sum(value) over (partition by key % 5 order by key) as dr
         |from parquet_t1 group by key, value having key > 5
      """.stripMargin)
  }

  test("window with the same window specification with aggregate functions") {
    checkHiveQl(
      s"""
         |select key, value,
         |sum(value) over (partition by key % 5 order by key) as dr
         |from parquet_t1 group by key, value
      """.stripMargin)
  }

  test("window with the same window specification with aggregate") {
    checkHiveQl(
      s"""
         |select key, value,
         |dense_rank() over (distribute by key sort by key, value) as dr,
         |count(key)
         |from parquet_t1 group by key, value
      """.stripMargin)
  }

  test("window with the same window specification without aggregate and filter") {
    checkHiveQl(
      s"""
         |select key, value,
         |dense_rank() over (distribute by key sort by key, value) as dr,
         |count(key) over(distribute by key sort by key, value) as ca
         |from parquet_t1
      """.stripMargin)
  }
```





[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11487#issuecomment-193135944
  
Build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11487#issuecomment-193135946
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52541/
Test PASSed.





[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11487#issuecomment-193135642
  
**[Test build #52541 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52541/consoleFull)**
 for PR 11487 at commit 
[`0055fd1`](https://github.com/apache/spark/commit/0055fd1cf4b1f4cea91fd5a4f89589d82715f2c7).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...

2016-03-06 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/11555#issuecomment-193135603
  
@cloud-fan The issue is much more complex in my implementation. As you saw 
in the JIRA, I originally wanted to add an extra `subqueryAlias` between the 
Window operators. However, I hit a couple of problems caused by 
`subqueryAlias`, so I finally decided to recover the original SQL statement 
instead. Below is my draft code, before any cleanup.

```scala
  // Collect the window expressions of all adjacent Window operators and
  // return them together with the first non-Window descendant.
  private def getAllWindowExprs(
      plan: Window,
      windowExprs: ArrayBuffer[NamedExpression]): (LogicalPlan, ArrayBuffer[NamedExpression]) = {
    plan.child match {
      case w: Window =>
        getAllWindowExprs(w, windowExprs ++ plan.windowExpressions)
      case _ => (plan.child, windowExprs ++ plan.windowExpressions)
    }
  }

  // Replace the attributes of aliased expressions in window expressions
  // by the original expressions in Project or Aggregate
  private def replaceAliasedByExpr(
      projectList: Seq[NamedExpression],
      windowExprs: Seq[NamedExpression]): Seq[Expression] = {
    val aliasMap = AttributeMap(projectList.collect {
      case a: Alias => (a.toAttribute, a.child)
    })

    windowExprs.map { expr =>
      expr.transformDown {
        case ar: AttributeReference if aliasMap.contains(ar) => aliasMap(ar)
      }
    }
  }

  // Returns (select list, GROUP BY list, HAVING clause, plan below the windows)
  private def buildProjectListForWindow(plan: Window): (String, String, String, LogicalPlan) = {
    // get all the windowExpressions from all the adjacent Window
    val (child, windowExpressions) = getAllWindowExprs(plan, ArrayBuffer.empty[NamedExpression])

    child match {
      case p: Project =>
        val newWindowExpr = replaceAliasedByExpr(p.projectList, windowExpressions)
        ((p.projectList ++ newWindowExpr).map(_.sql).mkString(", "), "", "", p.child)

      case _: Aggregate | _ @ Filter(_, _: Aggregate) =>
        val agg: Aggregate = child match {
          case a: Aggregate => a
          case Filter(_, a: Aggregate) => a
        }

        val newWindowExpr = replaceAliasedByExpr(agg.aggregateExpressions, windowExpressions)

        val groupingSQL = agg.groupingExpressions.map(_.sql).mkString(", ")

        val havingSQL = child match {
          case _: Aggregate => ""
          case Filter(condition, _: Aggregate) => "HAVING " + condition.sql
        }

        ((agg.aggregateExpressions ++ newWindowExpr).map(_.sql).mkString(", "),
          groupingSQL,
          havingSQL,
          agg.child)
    }
  }

  private def windowToSQL(plan: Window): String = {
    val (selectList, groupingSQL, havingSQL, nextPlan) = buildProjectListForWindow(plan)

    build(
      "SELECT",
      selectList,
      if (nextPlan == OneRowRelation) "" else "FROM",
      toSQL(nextPlan),
      if (groupingSQL.isEmpty) "" else "GROUP BY",
      groupingSQL,
      havingSQL
    )
  }
```
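
The alias-replacement step in the draft can be sketched on its own with a toy expression tree. Everything here (`Node`, `Ref`, `Mod`, `SumOver`, `inlineAliases`, and the alias name `_w0`) is hypothetical and only mimics the idea of `replaceAliasedByExpr`; it does not use Catalyst.

```scala
object AliasInlining {
  // Minimal expression tree with a SQL rendering.
  sealed trait Node { def sql: String }
  case class Ref(name: String) extends Node {
    def sql: String = name
  }
  case class Mod(left: Node, right: Int) extends Node {
    def sql: String = s"(${left.sql} % $right)"
  }
  case class SumOver(arg: Node, partitionBy: Node) extends Node {
    def sql: String = s"sum(${arg.sql}) OVER (PARTITION BY ${partitionBy.sql})"
  }

  // Replace every reference that names an alias with the aliased expression,
  // so the window expression no longer depends on the projection below it.
  def inlineAliases(e: Node, aliases: Map[String, Node]): Node = e match {
    case Ref(n) => aliases.getOrElse(n, e)
    case Mod(l, r) => Mod(inlineAliases(l, aliases), r)
    case SumOver(a, p) => SumOver(inlineAliases(a, aliases), inlineAliases(p, aliases))
  }

  // The projection below the window defines `_w0` as `key % 5`.
  val aliases: Map[String, Node] = Map("_w0" -> Mod(Ref("key"), 5))
  val windowExpr: Node = SumOver(Ref("value"), Ref("_w0"))
  val recovered: Node = inlineAliases(windowExpr, aliases)
}
```

Here `AliasInlining.recovered.sql` renders the partition expression inline rather than through the intermediate alias, which is the shape of SQL the draft tries to recover.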





[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11555#issuecomment-193134125
  
**[Test build #52545 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52545/consoleFull)**
 for PR 11555 at commit 
[`3ce072b`](https://github.com/apache/spark/commit/3ce072b4682a362d578a01181e3b8699cc38de93).





[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11555#issuecomment-193133649
  
**[Test build #52544 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52544/consoleFull)**
 for PR 11555 at commit 
[`559bbc5`](https://github.com/apache/spark/commit/559bbc5bfb20105a5fead499e30583cbfa98d103).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11555#issuecomment-193133656
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52544/
Test FAILed.





[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11555#issuecomment-193133655
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11555#issuecomment-193133481
  
**[Test build #52544 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52544/consoleFull)**
 for PR 11555 at commit 
[`559bbc5`](https://github.com/apache/spark/commit/559bbc5bfb20105a5fead499e30583cbfa98d103).





[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...

2016-03-06 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/11555#issuecomment-193133048
  
cc @liancheng @gatorsmile 





[GitHub] spark pull request: [SPARK-12718][SQL] SQL generation support for ...

2016-03-06 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/11555

[SPARK-12718][SQL] SQL generation support for window functions

## What changes were proposed in this pull request?

Add SQL generation support for window functions


## How was this patch tested?

new tests in `LogicalPlanToSQLSuite`



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark window

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11555.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11555


commit 559bbc5bfb20105a5fead499e30583cbfa98d103
Author: Wenchen Fan 
Date:   2016-03-07T07:07:26Z

SQL generation support for window functions







[GitHub] spark pull request: [SPARK-13712] [ML] Add OneVsOne to ML

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11554#issuecomment-193131502
  
**[Test build #52543 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52543/consoleFull)**
 for PR 11554 at commit 
[`0250d32`](https://github.com/apache/spark/commit/0250d32c870686539d84a82a56098e144151b45d).





[GitHub] spark pull request: [SPARK-13712] [ML] Add OneVsOne to ML

2016-03-06 Thread zhengruifeng
GitHub user zhengruifeng opened a pull request:

https://github.com/apache/spark/pull/11554

[SPARK-13712] [ML] Add OneVsOne to ML

JIRA: https://issues.apache.org/jira/browse/SPARK-13712

## What changes were proposed in this pull request?

Add OneVsOne meta method for multi-class classification to ML


## How was this patch tested?

manual tests



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhengruifeng/spark onevsone

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11554.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11554


commit f12a554dfdabc8b3b8cdba50e00128fada981733
Author: Zheng RuiFeng 
Date:   2016-03-05T05:02:40Z

create onevsone

commit 76dff5cb4c1c454afe7a434e4e626a01af3ff2b2
Author: Zheng RuiFeng 
Date:   2016-03-05T08:47:55Z

add test

commit 0250d32c870686539d84a82a56098e144151b45d
Author: Zheng RuiFeng 
Date:   2016-03-07T06:23:59Z

fix bug in sql







[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11487#issuecomment-193124038
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11487#issuecomment-193124043
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52542/
Test PASSed.





[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11487#issuecomment-193123747
  
**[Test build #52542 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52542/consoleFull)**
 for PR 11487 at commit 
[`7ac9648`](https://github.com/apache/spark/commit/7ac9648f43ce6989827c793f3f1872558baaa4ef).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13651] Generator outputs are not resolv...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11497#issuecomment-193123600
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13651] Generator outputs are not resolv...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11497#issuecomment-193123602
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52540/
Test PASSed.





[GitHub] spark pull request: [SPARK-13651] Generator outputs are not resolv...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11497#issuecomment-193123443
  
**[Test build #52540 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52540/consoleFull)**
 for PR 11497 at commit 
[`93d6e69`](https://github.com/apache/spark/commit/93d6e6970325d67ebb6b92e0c77b078507627843).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13025] Allow users to set initial model...

2016-03-06 Thread sethah
Github user sethah commented on the pull request:

https://github.com/apache/spark/pull/11459#issuecomment-193122668
  
Should this wait until 
[PR-9](https://github.com/apache/spark/pull/9) is merged?





[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...

2016-03-06 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/11301#issuecomment-193118103
  
@kiszk It seems this PR covers only `Expression`. Why don't you cover 
operators like sort and join too?





[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...

2016-03-06 Thread sarutak
Github user sarutak commented on a diff in the pull request:

https://github.com/apache/spark/pull/11301#discussion_r55165010
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala 
---
@@ -418,6 +419,13 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
 
   override def toString: String = treeString
 
+  def toOriginString: String =
+    if (this.origin.callSite.isDefined && !this.isInstanceOf[BoundReference]) {
--- End diff --

Could you tell me why `BoundReference` is treated as an exception here?





[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...

2016-03-06 Thread sarutak
Github user sarutak commented on a diff in the pull request:

https://github.com/apache/spark/pull/11301#discussion_r55165016
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
 ---
@@ -50,7 +50,7 @@ object ExpressionSet {
 class ExpressionSet protected(
     protected val baseSet: mutable.Set[Expression] = new mutable.HashSet,
     protected val originals: mutable.Buffer[Expression] = new ArrayBuffer)
-  extends Set[Expression] {
+  extends Set[Expression] with Serializable {
--- End diff --

If `ExpressionSet` is really serialized only in the case of `LogicalPlan`, 
we could move `constraints` from `QueryPlan` to `LogicalPlan`, but I'm not sure 
that is the correct way.
Have you ever hit a problem because `ExpressionSet` is not `Serializable`?





[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...

2016-03-06 Thread sarutak
Github user sarutak commented on a diff in the pull request:

https://github.com/apache/spark/pull/11301#discussion_r55165012
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala 
---
@@ -57,15 +58,15 @@ object CurrentOrigin {
 
   def reset(): Unit = value.set(Origin())
 
-  def setPosition(line: Int, start: Int): Unit = {
+  def setPosition(callSite: String, line: Int, start: Int): Unit = {
 value.set(
-  value.get.copy(line = Some(line), startPosition = Some(start)))
+  value.get.copy(callSite = Some(callSite), line = Some(line), startPosition = Some(start)))
   }
 
   def withOrigin[A](o: Origin)(f: => A): A = {
+val current = get
 set(o)
-val ret = try f finally { reset() }
-reset()
+val ret = try f finally { set(current) }
--- End diff --

This might be the correct change, but I noticed that after it we have 
another issue when we operate on a `DataFrame` using both the DSL-like API 
and SQL/HiveQL.

For example, suppose we run the following code.
```
val df = sc.parallelize(1 to 10).toDF
val filtered = df.filter("_1 > 4")
val selected = filtered.select($"_1" * 10)
selected.show()
```

The generated code then looks as follows.

```
...

/* 055 */   while (rdd_batchIdx < numRows) {
/* 056 */ InternalRow rdd_row = rdd_batch.getRow(rdd_batchIdx++);
/* 057 */ /* input[0, int] */
/* 058 */ boolean rdd_isNull = rdd_row.isNullAt(0);
/* 059 */ int rdd_value = rdd_isNull ? -1 : (rdd_row.getInt(0));
/* 060 */ /* (input[0, int] > 4) @ filter at SPARK13432.scala:14 */
/* 061 */ boolean filter_isNull = true;
/* 062 */ boolean filter_value = false;
/* 063 */ 
/* 064 */ if (!rdd_isNull) {
/* 065 */   filter_isNull = false; // resultCode could change nullability.
/* 066 */   filter_value = rdd_value > 4;
/* 067 */   
/* 068 */ }
/* 069 */ if (!filter_isNull && filter_value) {
/* 070 */   filter_metricValue.add(1);
/* 071 */   
/* 072 */   /* (input[0, int] * 10) @ filter at SPARK13432.scala:14 */
/* 073 */   boolean project_isNull = true;
/* 074 */   int project_value = -1;

...
```
At line #072, the comment should not say `filter`, and the line of the 
original code is not 14.
I think the comment should just say `/* (input[0, int] * 10) */`.

This happens because the origin is not reset properly.
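
To make the difference between the two `withOrigin` variants in the diff concrete, here is a minimal standalone sketch. It is not the Catalyst code: `Origin` is reduced to the three fields named in the diff, and `OriginSketch`, `withOriginReset` and `withOriginRestore` are hypothetical names. It only shows one way a stale origin can survive, under the assumption that an earlier step left an origin set.

```scala
// Simplified stand-in for Catalyst's Origin (assumption: only these fields matter here).
case class Origin(
    callSite: Option[String] = None,
    line: Option[Int] = None,
    startPosition: Option[Int] = None)

object OriginSketch {
  private val value = new ThreadLocal[Origin] {
    override def initialValue(): Origin = Origin()
  }

  def get: Origin = value.get()
  def set(o: Origin): Unit = value.set(o)
  def reset(): Unit = value.set(Origin())

  // Variant before the diff: always clear the origin afterwards.
  def withOriginReset[A](o: Origin)(f: => A): A = {
    set(o)
    try f finally reset()
  }

  // Variant after the diff: put back whatever was set before the call.
  def withOriginRestore[A](o: Origin)(f: => A): A = {
    val current = get
    set(o)
    try f finally set(current)
  }
}
```

If an origin such as `Origin(callSite = Some("filter at SPARK13432.scala:14"))` is already set when `withOriginRestore` is called, it is still set afterwards, so a later expression may pick it up. With `withOriginReset` the origin is empty afterwards.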






[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...

2016-03-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/11551#issuecomment-193113588
  
It is now **763** seconds. That looks close to the minimum and seems to use 
all 4 processes fully.
```
Tests passed in 763 seconds
```





[GitHub] spark pull request: [SPARK-13231] Make count failed values a user ...

2016-03-06 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/5#issuecomment-193112524
  
@andrewor14 Can you take a look ?





[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11487#issuecomment-193098087
  
**[Test build #52542 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52542/consoleFull)**
 for PR 11487 at commit 
[`7ac9648`](https://github.com/apache/spark/commit/7ac9648f43ce6989827c793f3f1872558baaa4ef).





[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11487#issuecomment-193096199
  
**[Test build #52541 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52541/consoleFull)**
 for PR 11487 at commit 
[`0055fd1`](https://github.com/apache/spark/commit/0055fd1cf4b1f4cea91fd5a4f89589d82715f2c7).





[GitHub] spark pull request: [SPARK-13651] Generator outputs are not resolv...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11497#issuecomment-193095096
  
**[Test build #52540 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52540/consoleFull)**
 for PR 11497 at commit 
[`93d6e69`](https://github.com/apache/spark/commit/93d6e6970325d67ebb6b92e0c77b078507627843).





[GitHub] spark pull request: [SPARK-13651] Generator outputs are not resolv...

2016-03-06 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/11497#issuecomment-193094294
  
retest this please





[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11551#issuecomment-193089248
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11551#issuecomment-193089252
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52539/





[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11551#issuecomment-193089010
  
**[Test build #52539 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52539/consoleFull)**
 for PR 11551 at commit 
[`668c7b1`](https://github.com/apache/spark/commit/668c7b12b380f5f8f1020faf2594a95cac95453c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...

2016-03-06 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11487#discussion_r55160833
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala
 ---
@@ -237,4 +241,45 @@ class ScalaReflectionSuite extends SparkFunSuite {
 assert(anyTypes.forall(!_.isPrimitive))
 assert(anyTypes === Seq(classOf[java.lang.Object], 
classOf[java.lang.Object]))
   }
+
+  private def testThreadSafetyFor(name: String)(exec: () => Any) = {
+test(s"thread safety of ${name}") {
+  for (_ <- 0 until 100) {
--- End diff --

@srowen Thank you for your comment.

> `(0 until 100).foreach`?

I repeat the test 100 times here because it checks thread safety. A 
thread-safety problem shows up only intermittently, so a single run may miss it.

> You can import `java.net.URLClassLoader`.

I'll change it to use an import.

> It doesn't really seem like you need a method here; it took a moment to 
see there was a test in here.

I'll move the test out of the method.

> Maybe it's obvious to you but why do all these classes/methods need to be 
tested separately?

The methods are public, i.e. they can be called from multiple threads, so I 
thought they also need to be tested.
But I wonder whether some of them could be removed.

> And is this locking still safe in 2.11?

Yes, reflection in Scala 2.11 is thread-safe.
If we did not support Scala 2.10, this locking in `ScalaReflection` would 
not be needed.
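
The idea of repeating a concurrent call many times can be sketched without any Spark dependency. The helper below is a hypothetical illustration, not the test in this PR: `ThreadSafetySketch` and `countFailures` are made-up names, and it only counts calls that throw, whereas a real race may also show up as a wrong result.

```scala
import java.util.concurrent.{Callable, Executors, TimeUnit}
import scala.util.Try

object ThreadSafetySketch {
  // Runs `body` once on each of `threads` threads, repeats that `rounds` times,
  // and returns how many calls threw a non-fatal exception.
  def countFailures(threads: Int, rounds: Int)(body: () => Any): Int = {
    val pool = Executors.newFixedThreadPool(threads)
    try {
      (0 until rounds).map { _ =>
        // Submit all tasks of a round first so that they overlap in time.
        val tasks = (0 until threads).map { _ =>
          pool.submit(new Callable[Boolean] {
            override def call(): Boolean = Try(body()).isFailure
          })
        }
        tasks.count(_.get())
      }.sum
    } finally {
      pool.shutdown()
      pool.awaitTermination(10, TimeUnit.SECONDS)
    }
  }
}
```

A test would then assert that `countFailures(threads, 100)(...)` is zero for the call under test. More rounds raise the chance of hitting an intermittent race but never prove its absence.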






[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11551#issuecomment-193082740
  
**[Test build #52539 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52539/consoleFull)**
 for PR 11551 at commit 
[`668c7b1`](https://github.com/apache/spark/commit/668c7b12b380f5f8f1020faf2594a95cac95453c).





[GitHub] spark pull request: [SPARK-13651] Generator outputs are not resolv...

2016-03-06 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/11497#issuecomment-193082802
  
test this please





[GitHub] spark pull request: [SPARK-13651] Generator outputs are not resolv...

2016-03-06 Thread dilipbiswal
Github user dilipbiswal commented on the pull request:

https://github.com/apache/spark/pull/11497#issuecomment-193074477
  
@cloud-fan Can we trigger a test please ?





[GitHub] spark pull request: [SPARK-13600] [MLlib] [WIP] Incorrect number o...

2016-03-06 Thread oliverpierson
Github user oliverpierson commented on the pull request:

https://github.com/apache/spark/pull/11553#issuecomment-193073200
  
This is still a work in progress; I just wanted to get the PR up so it's on 
the radar.  Still need to:

- [ ] add an external Parameter (with default value) for setting the 
acceptable error
- [ ] investigate whether or not +/- Infinity needs to be added to the 
splits/quantiles given by approxQuantiles
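
The second item can be illustrated with a small, Spark-free sketch. It assumes the quantile values arrive as a plain `Seq[Double]`. `SplitsSketch` and `toSplits` are hypothetical names, and this is not necessarily how the PR builds its splits.

```scala
object SplitsSketch {
  // Turns raw quantile values into bucket boundaries:
  // drop duplicates and infinities, sort, then add +/- Infinity as outer bounds.
  def toSplits(quantiles: Seq[Double]): Array[Double] = {
    val interior = quantiles.distinct.sorted.filter(q => !q.isInfinity)
    (Double.NegativeInfinity +: interior :+ Double.PositiveInfinity).toArray
  }

  // n splits delimit n - 1 buckets.
  def numBuckets(splits: Array[Double]): Int = splits.length - 1
}
```

With this scheme, repeated quantile values collapse into one boundary, so `Seq(1.0, 1.0, 2.0)` yields the splits `-Infinity, 1.0, 2.0, Infinity` and three buckets. That is one way the bucket count can differ from what was requested, though the comment above does not say this is the cause of SPARK-13600.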





[GitHub] spark pull request: [SPARK-13600] [MLlib] [WIP] Incorrect number o...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11553#issuecomment-193072808
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-13600] [MLlib] [WIP] Incorrect number o...

2016-03-06 Thread oliverpierson
GitHub user oliverpierson opened a pull request:

https://github.com/apache/spark/pull/11553

[SPARK-13600] [MLlib] [WIP] Incorrect number of buckets in 
QuantileDiscretizer

## What changes were proposed in this pull request?
QuantileDiscretizer can return an unexpected number of buckets in certain 
cases.  This PR proposes to fix this issue and also refactor 
QuantileDiscretizer to use approxQuantiles from DataFrame stats functions.
## How was this patch tested?
QuantileDiscretizerSuite unit tests (some existing tests will change or 
even be removed in this PR)





You can merge this pull request into a Git repository by running:

$ git pull https://github.com/oliverpierson/spark SPARK-13600

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11553.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11553


commit 7ff2da10141378cb9511672964f85615f937484d
Author: Oliver Pierson 
Date:   2016-03-07T03:07:20Z

refactored QuantileDiscretizer to use dataframe stats







[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11550#issuecomment-193071199
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11550#issuecomment-193071201
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52535/





[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11550#issuecomment-193070983
  
**[Test build #52535 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52535/consoleFull)**
 for PR 11550 at commit 
[`5c990cd`](https://github.com/apache/spark/commit/5c990cd8c996c9a624439749d8809624c2457051).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13651] Generator outputs are not resolv...

2016-03-06 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/11497#issuecomment-193066450
  
LGTM, cc @davies (who fixed this special case before)





[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...

2016-03-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/11551#issuecomment-193065038
  
Hi, @JoshRosen .
Could you review this, please?





[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...

2016-03-06 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/11301#issuecomment-193065041
  
Sorry for the late reply. I have some comments and will leave them soon.





[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...

2016-03-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/11551#issuecomment-193064926
  
As I wrote in the Jira issue, the total time of all tests is **3077s**, so 
the minimum required time for 4 processes is about 769s (3077 / 4). The actual 
Jenkins result is now **804** seconds, a reduction of about 160 seconds.
```
Tests passed in 804 seconds
```

When `PyPy` and `Python3.4` are removed, this priority queue reduces the 
total running time compared with a FIFO queue.
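
The effect of the queue order on total running time can be sketched with a small simulation. The comment does not spell out the queue's priority, so this assumes it is "longest expected test first". `SchedulingSketch`, `makespan` and the durations are hypothetical, not the real PySpark test timings.

```scala
object SchedulingSketch {
  // Hands each task, in the given order, to the worker that becomes free first
  // and returns the time at which the last worker finishes.
  def makespan(durations: Seq[Int], workers: Int): Int = {
    val freeAt = Array.fill(workers)(0)
    durations.foreach { d =>
      val w = freeAt.indexOf(freeAt.min)
      freeAt(w) += d
    }
    freeAt.max
  }

  // Made-up durations in seconds: six short tests followed by one long test.
  val fifoOrder: Seq[Int] = Seq(100, 100, 100, 100, 100, 100, 400)
  val longestFirst: Seq[Int] = fifoOrder.sorted(Ordering[Int].reverse)
}
```

With 4 workers, the FIFO order finishes at 500 seconds because the long test starts late, while the longest-first order finishes at 400 seconds. That matches the lower bound, which is the larger of total / 4 = 250 and the longest single test, 400.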





[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-06 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-193064804
  
ping @marmbrus @rxin @davies @liancheng Is this ready to go, or do you have 
other comments? Thanks!





[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11551#issuecomment-193062807
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52538/





[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11551#issuecomment-193062804
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11551#issuecomment-193062184
  
**[Test build #52538 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52538/consoleFull)**
 for PR 11551 at commit 
[`6a0e099`](https://github.com/apache/spark/commit/6a0e09907eb60c12074fb32b6bdfff574d64ccf2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13034] Add export/import for all estima...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11552#issuecomment-193057635
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-13034] Add export/import for all estima...

2016-03-06 Thread GayathriMurali
GitHub user GayathriMurali opened a pull request:

https://github.com/apache/spark/pull/11552

[SPARK-13034] Add export/import for all estimators and transformers(w…

## What changes were proposed in this pull request?
Add export/import for all estimators and transformers(which have Scala 
implementation) under pyspark/ml/classification.py.

JIRA : https://issues.apache.org/jira/browse/SPARK-13034

## How was this patch tested?

Unit tests added to tests.py

…hich have Scala implementation) under pyspark/ml/classification.py

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/GayathriMurali/spark SPARK-13034

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11552.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11552


commit 4760bcb73913dcb80f2419ce0fc989a119f02044
Author: GayathriMurali 
Date:   2016-03-07T02:18:18Z

[SPARK-13034] Add export/import for all estimators and transformers(which 
have Scala implementation) under pyspark/ml/classification.py







[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11551#issuecomment-193048081
  
**[Test build #52538 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52538/consoleFull)**
 for PR 11551 at commit 
[`6a0e099`](https://github.com/apache/spark/commit/6a0e09907eb60c12074fb32b6bdfff574d64ccf2).





[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11551#issuecomment-193042342
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11551#issuecomment-193042335
  
**[Test build #52537 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52537/consoleFull)**
 for PR 11551 at commit 
[`69fc65b`](https://github.com/apache/spark/commit/69fc65b63cac71fd733976862abc902cb2e37ecc).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11551#issuecomment-193042343
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52537/
Test FAILed.





[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11551#issuecomment-193041678
  
**[Test build #52537 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52537/consoleFull)**
 for PR 11551 at commit 
[`69fc65b`](https://github.com/apache/spark/commit/69fc65b63cac71fd733976862abc902cb2e37ecc).





[GitHub] spark pull request: [SPARK-13015][Docs] Replace example code in ml...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11128#issuecomment-193039764
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13015][Docs] Replace example code in ml...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11128#issuecomment-193039715
  
**[Test build #52536 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52536/consoleFull)**
 for PR 11128 at commit 
[`02b4e22`](https://github.com/apache/spark/commit/02b4e2235c5421458cb8fa96c734c54c9bad9457).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13015][Docs] Replace example code in ml...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11128#issuecomment-193039765
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52536/
Test PASSed.





[GitHub] spark pull request: [SPARK-12243][BUILD][PYTHON] PySpark tests are...

2016-03-06 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/11551

[SPARK-12243][BUILD][PYTHON] PySpark tests are slow in Jenkins.

## What changes were proposed in this pull request?

In the Jenkins pull request builder, PySpark tests take around 
[962 seconds](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52530/console) 
of end-to-end time to run, even though we run four Python test suites in 
parallel. According to the log, the main reason is that the long-running 
tests start last because of the FIFO queue. As a first step, this PR reduces 
the test time by starting some long-running tests first, using a simple 
priority queue.

```

Running PySpark tests

...
Finished test(python3.4): pyspark.streaming.tests (213s)
Finished test(pypy): pyspark.sql.tests (92s)
Finished test(pypy): pyspark.streaming.tests (280s)
Tests passed in 962 seconds
```
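
The effect of the ordering can be illustrated with a small simulation. This is only a sketch, not the code in this PR: the suite names other than `pyspark.streaming.tests`, all durations, and the helper names `makespan` and `prioritized` are hypothetical. It compares FIFO order against an order in which known long-running tests get a lower priority number and are therefore started first.

```python
import heapq


def makespan(durations, workers):
    """Simulate `workers` parallel workers that take tests in the given
    order and return the total wall-clock time."""
    # Each heap entry is the time at which one worker becomes free.
    free_at = [0] * workers
    heapq.heapify(free_at)
    for seconds in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + seconds)
    return max(free_at)


def prioritized(tests, long_running):
    """Return tests ordered so that names in `long_running` come first;
    all other tests keep their original relative order."""
    heap = []
    for index, (name, seconds) in enumerate(tests):
        priority = 0 if name in long_running else 100
        # `index` breaks ties, so equal priorities stay in FIFO order.
        heapq.heappush(heap, (priority, index, name, seconds))
    return [heapq.heappop(heap)[2:] for _ in range(len(heap))]


# Hypothetical suites and durations (seconds); the slow one is queued last.
tests = [
    ("suite_a", 30), ("suite_b", 40), ("suite_c", 35), ("suite_d", 45),
    ("suite_e", 50), ("suite_f", 20), ("pyspark.streaming.tests", 280),
]

fifo = makespan([s for _, s in tests], workers=4)
ordered = prioritized(tests, {"pyspark.streaming.tests"})
prio = makespan([s for _, s in ordered], workers=4)
print(fifo, prio)  # 320 280
```

With these made-up numbers the slow suite starts at second 40 under FIFO and at second 0 under the priority order, so the total drops from 320 to 280 seconds, which is the duration of the slow suite itself.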

## How was this patch tested?

Manual check.
Check 'Running PySpark tests' part of the Jenkins log.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-12243

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11551.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11551


commit 69fc65b63cac71fd733976862abc902cb2e37ecc
Author: Dongjoon Hyun 
Date:   2016-03-06T19:52:01Z

PySpark tests are slow in Jenkins.







[GitHub] spark pull request: [SPARK-13015][Docs] Replace example code in ml...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11128#issuecomment-193038016
  
**[Test build #52536 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52536/consoleFull)**
 for PR 11128 at commit 
[`02b4e22`](https://github.com/apache/spark/commit/02b4e2235c5421458cb8fa96c734c54c9bad9457).





[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/11550#issuecomment-193035803
  
@falaki Could you review this, please?





[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11550#issuecomment-193035983
  
**[Test build #52535 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52535/consoleFull)**
 for PR 11550 at commit 
[`5c990cd`](https://github.com/apache/spark/commit/5c990cd8c996c9a624439749d8809624c2457051).





[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/11550#issuecomment-193035562
  
@rxin This should conflict with 
https://github.com/apache/spark/pull/11315, which I think is going to be 
merged (judging from your comment).

I will resolve the conflict as soon as either this one or that one is 
merged.





[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-06 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/11550

[SPARK-13667][SQL] Support for specifying custom date format for date and 
timestamp types at CSV datasource.

## What changes were proposed in this pull request?

This PR adds support for specifying a custom date format for `DateType` and 
`TimestampType`.

For `TimestampType`, the given format is used both to infer the schema and to 
convert the values.
For `DateType`, the given format is used to convert the values.
If `dateFormat` is not given, it falls back to `Timestamp.valueOf()` 
and `Date.valueOf()` for backwards compatibility.
When it is given, it uses `SimpleDateFormat` to parse the data.

In addition, `IntegerType`, `DoubleType` and `LongType` have a higher 
priority than `TimestampType` in type inference. This means even if the given 
format is `` or `.MM`, it will be inferred as `IntegerType` or 
`DoubleType`. Since it is type inference, I think it is okay to give such 
precedences.
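
The precedence described above can be sketched as follows. This is a Python analogue for illustration only, not the Scala code in this PR: the function name `infer_type`, the returned labels, and the `strptime` patterns (standing in for `SimpleDateFormat` patterns) are all assumptions.

```python
from datetime import datetime


def infer_type(value, date_format=None):
    """Infer a column type for one string value, trying numeric types
    before the timestamp format (hypothetical helper)."""
    try:
        number = int(value)
        # Values outside the 32-bit range need the wider integer type.
        return "integer" if -2**31 <= number < 2**31 else "long"
    except ValueError:
        pass
    try:
        float(value)
        return "double"
    except ValueError:
        pass
    if date_format is not None:
        try:
            datetime.strptime(value, date_format)
            return "timestamp"
        except ValueError:
            pass
    return "string"


# A year-only pattern never wins, because the value is a valid integer.
print(infer_type("2016", "%Y"))              # integer
# The same holds for a pattern that looks like a decimal number.
print(infer_type("2016.03", "%Y.%m"))        # double
# A value that is not numeric falls through to the timestamp format.
print(infer_type("2016-03-06", "%Y-%m-%d"))  # timestamp
```

Because the numeric checks run first, a date pattern whose values also parse as numbers is only ever used for conversion, never for inference.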

In addition, I renamed `csv.CSVInferSchema` to `csv.InferSchema`, as the JSON 
datasource has `json.InferSchema`. Although the two now have the same name, I did 
this because I thought the parent package name still distinguishes them.  
Accordingly, the suite name was also changed from `CSVInferSchemaSuite` to 
`InferSchemaSuite`.



## How was this patch tested?

Tested with unit tests, and with `./dev/run_tests` for the coding style checks.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-13667

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11550.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11550


commit 5c990cd8c996c9a624439749d8809624c2457051
Author: hyukjinkwon 
Date:   2016-03-07T01:16:07Z

Support for specifying custom date format for date and timestamp types.







[GitHub] spark pull request: [SPARK-12566] [ML] [WIP] GLM model family, lin...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11549#issuecomment-193020982
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12566] [ML] [WIP] GLM model family, lin...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11549#issuecomment-193020984
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52534/
Test FAILed.





[GitHub] spark pull request: [SPARK-12566] [ML] [WIP] GLM model family, lin...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11549#issuecomment-193020961
  
**[Test build #52534 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52534/consoleFull)**
 for PR 11549 at commit 
[`1cca19e`](https://github.com/apache/spark/commit/1cca19e68d9ef256769594e02d123ce6e3b0bd7d).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12566] [ML] [WIP] GLM model family, lin...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11549#issuecomment-193015610
  
**[Test build #52534 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52534/consoleFull)**
 for PR 11549 at commit 
[`1cca19e`](https://github.com/apache/spark/commit/1cca19e68d9ef256769594e02d123ce6e3b0bd7d).





[GitHub] spark pull request: [SPARK-12566] [ML] [WIP] GLM model family, lin...

2016-03-06 Thread hhbyyh
Github user hhbyyh commented on the pull request:

https://github.com/apache/spark/pull/11549#issuecomment-193015522
  
We already have a `glm` in SparkR, which is based on 
`LogisticRegressionModel` and `LinearRegressionModel`. As I understand it, 
there are three ways to extend it:

1. Change the current glm to use `GeneralizedLinearRegression`. Create 
another `lm` interface for SparkR, and use LR as the model. 
2. Keep the glm R interface and replace its implementation with GLM. This 
means R cannot invoke LR anymore.
3. Keep the glm R interface, and combine the implementations of LR and 
GLM, chosen by a solver parameter.

I'd prefer option 1. I'm going to send one PR (WIP) for option 2, 
which can later be adjusted to 1 or 3.






[GitHub] spark pull request: [SPARK-12566] [ML] [WIP] GLM model family, lin...

2016-03-06 Thread hhbyyh
GitHub user hhbyyh opened a pull request:

https://github.com/apache/spark/pull/11549

[SPARK-12566] [ML] [WIP] GLM model family, link function support in 
SparkR:::glm

## What changes were proposed in this pull request?

This JIRA is for extending the support of MLlib's Generalized Linear Models 
(GLMs) to more model families and link functions in SparkR. After SPARK-12811, 
we should be able to wrap GeneralizedLinearRegression in SparkR with support of 
popular families and link functions.


## How was this patch tested?

WIP, some manual test



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hhbyyh/spark glmR

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11549.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11549


commit 6c933650389798f8e3caf3e50604bceae79a126e
Author: Yuhao Yang 
Date:   2016-03-06T23:00:44Z

change R glm to use GLM

commit 1cca19e68d9ef256769594e02d123ce6e3b0bd7d
Author: Yuhao Yang 
Date:   2016-03-06T23:27:58Z

refine family







[GitHub] spark pull request: [SPARK-13019][Docs] Replace example code in ml...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11108#issuecomment-193013801
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13019][Docs] Replace example code in ml...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11108#issuecomment-193013802
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52533/
Test PASSed.





[GitHub] spark pull request: [SPARK-13019][Docs] Replace example code in ml...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11108#issuecomment-193013759
  
**[Test build #52533 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52533/consoleFull)**
 for PR 11108 at commit 
[`3329394`](https://github.com/apache/spark/commit/33293947bde90fd29014587cd42533df121bd783).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13019][Docs] Replace example code in ml...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11108#issuecomment-193012431
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13019][Docs] Replace example code in ml...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11108#issuecomment-193012434
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52532/
Test PASSed.





[GitHub] spark pull request: [SPARK-13019][Docs] Replace example code in ml...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11108#issuecomment-193012216
  
**[Test build #52532 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52532/consoleFull)**
 for PR 11108 at commit 
[`e2737ee`](https://github.com/apache/spark/commit/e2737eedd6c45c82f25045442b1d811ab2c395ec).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13019][Docs] Replace example code in ml...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11108#issuecomment-193011222
  
**[Test build #52533 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52533/consoleFull)**
 for PR 11108 at commit 
[`3329394`](https://github.com/apache/spark/commit/33293947bde90fd29014587cd42533df121bd783).





[GitHub] spark pull request: [SPARK-13019][Docs] Replace example code in ml...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11108#issuecomment-193010336
  
**[Test build #52532 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52532/consoleFull)**
 for PR 11108 at commit 
[`e2737ee`](https://github.com/apache/spark/commit/e2737eedd6c45c82f25045442b1d811ab2c395ec).





[GitHub] spark pull request: [SPARK-13019][Docs] Replace example code in ml...

2016-03-06 Thread keypointt
Github user keypointt commented on the pull request:

https://github.com/apache/spark/pull/11108#issuecomment-193010119
  
Thanks a lot, @yinxusen. I'm fixing the import format in the other PRs and 
will commit soon.





[GitHub] spark pull request: [SPARK-13686][MLLIB][STREAMING] Add a construc...

2016-03-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/11527#issuecomment-193008392
  
Hi, @mengxr and @jkbradley .
Could you review this PR, please?





[GitHub] spark pull request: [SPARK-13651] Generator outputs are not resolv...

2016-03-06 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/11497#discussion_r55149604
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -512,6 +512,9 @@ class Analyzer(
 
   // A special case for Generate, because the output of Generate should not be resolved by
   // ResolveReferences. Attributes in the output will be resolved by ResolveGenerate.
+  case g @ Generate(generator, _, _, _, _, _)
+if !g.resolved && generator.resolved => g
+
   case g @ Generate(generator, join, outer, qualifier, output, child)
--- End diff --

@cloud-fan Thanks !! Made the change.





[GitHub] spark pull request: [SPARK-13686][MLLIB][STREAMING] Add a construc...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11527#issuecomment-193003088
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52530/
Test PASSed.





[GitHub] spark pull request: [SPARK-13686][MLLIB][STREAMING] Add a construc...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11527#issuecomment-193003083
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13686][MLLIB][STREAMING] Add a construc...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11527#issuecomment-193002661
  
**[Test build #52530 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52530/consoleFull)**
 for PR 11527 at commit 
[`92be84f`](https://github.com/apache/spark/commit/92be84f7b6bc45fdd82ae21d8e1245d0549e0f83).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11861][ML] Add feature importances for ...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9912#issuecomment-193001422
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11861][ML] Add feature importances for ...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9912#issuecomment-193001424
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52531/





[GitHub] spark pull request: [SPARK-11861][ML] Add feature importances for ...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9912#issuecomment-193001372
  
**[Test build #52531 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52531/consoleFull)** for PR 9912 at commit [`cc2eb44`](https://github.com/apache/spark/commit/cc2eb44afecc442649e2b20369b78c31506d3597).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11861][ML] Add feature importances for ...

2016-03-06 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/9912#discussion_r55147885
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala ---
@@ -167,19 +167,15 @@ class RandomForestClassifierSuite extends SparkFunSuite with MLlibTestSparkConte
   .setSeed(123)
 
 // In this data, feature 1 is very important.
-val data: RDD[LabeledPoint] = sc.parallelize(Seq(
-  new LabeledPoint(0, Vectors.dense(1, 0, 0, 0, 1)),
-  new LabeledPoint(1, Vectors.dense(1, 1, 0, 1, 0)),
-  new LabeledPoint(1, Vectors.dense(1, 1, 0, 0, 0)),
-  new LabeledPoint(0, Vectors.dense(1, 0, 0, 0, 0)),
-  new LabeledPoint(1, Vectors.dense(1, 1, 0, 0, 0))
-))
+val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
 val categoricalFeatures = Map.empty[Int, Int]
 val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
 
 val importances = rf.fit(df).featureImportances
 val mostImportantFeature = importances.argmax
 assert(mostImportantFeature === 1)
+assert(importances.toArray.sum === 1.0)
--- End diff --

I updated the feature importance tests here, as well, with additional 
checks.
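
The two properties the updated test asserts (feature 1 has the largest importance, and the importances sum to 1) can be sketched without Spark. This is only an illustration on plain arrays: `ImportanceChecks`, `isNormalized`, `argmax` and the tolerance `tol` are hypothetical names that do not appear in the PR, and the tolerance-based comparison is an assumption of this sketch, whereas the diff above compares against `1.0` directly.

```scala
// Hypothetical standalone helper, not part of the PR or of Spark.
object ImportanceChecks {
  // True when every importance is non-negative and the values sum to 1
  // within the given absolute tolerance (the tolerance is an assumption).
  def isNormalized(importances: Array[Double], tol: Double = 1e-9): Boolean =
    importances.forall(_ >= 0.0) && math.abs(importances.sum - 1.0) <= tol

  // Index of the largest value; throws on an empty array.
  def argmax(values: Array[Double]): Int =
    values.indices.maxBy(i => values(i))
}
```

For example, `ImportanceChecks.argmax(Array(0.1, 0.7, 0.2))` picks index 1, mirroring the `mostImportantFeature === 1` assertion in the diff.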





[GitHub] spark pull request: [SPARK-11861][ML] Add feature importances for ...

2016-03-06 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/9912#discussion_r55147865
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala ---
@@ -169,6 +169,20 @@ final class DecisionTreeClassificationModel private[ml] (
 s"DecisionTreeClassificationModel (uid=$uid) of depth $depth with $numNodes nodes"
   }
 
+  /**
+   * Estimate of the importance of each feature.
+   *
+   * This generalizes the idea of "Gini" importance to other losses,
+   * following the explanation of Gini importance from "Random Forests" documentation
+   * by Leo Breiman and Adele Cutler, and following the implementation from scikit-learn.
--- End diff --

I added a note in the docs for `DecisionTreeRegressor` and 
`DecisionTreeClassifier`. I can update the format or the wording if needed.
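
For readers unfamiliar with the idea in the doc comment, here is a minimal sketch of an impurity-based importance for a single tree. It assumes that each internal node records the feature it splits on, the impurity gain of the split, and the number of instances reaching it; the importance of a feature is then the sum of count-weighted gains over the nodes that split on it, normalized to sum to 1. The types `Node`, `Leaf`, `Internal` and the object `FeatureImportanceSketch` are hypothetical and are not the classes used in the PR.

```scala
// Hypothetical minimal tree types for illustration; these are not Spark's classes.
sealed trait Node
final case class Leaf(count: Double) extends Node
final case class Internal(
    feature: Int,   // index of the feature this node splits on
    gain: Double,   // impurity decrease achieved by the split
    count: Double,  // number of training instances reaching this node
    left: Node,
    right: Node) extends Node

object FeatureImportanceSketch {
  // Sums gain * count per split feature, then normalizes the totals to sum to 1.
  // If no split has positive gain, the all-zero vector is returned unchanged.
  def featureImportances(root: Node, numFeatures: Int): Array[Double] = {
    val totals = Array.fill(numFeatures)(0.0)
    def visit(node: Node): Unit = node match {
      case Internal(feature, gain, count, left, right) =>
        totals(feature) += gain * count
        visit(left)
        visit(right)
      case _: Leaf => ()
    }
    visit(root)
    val sum = totals.sum
    if (sum > 0.0) totals.map(_ / sum) else totals
  }
}
```

With a root that splits on feature 1 and a smaller child split on feature 0, feature 1 receives most of the importance, which is the kind of behaviour the test data in this PR is designed to exercise.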





[GitHub] spark pull request: [SPARK-11861][ML] Add feature importances for ...

2016-03-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9912#issuecomment-192986914
  
**[Test build #52531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52531/consoleFull)** for PR 9912 at commit [`cc2eb44`](https://github.com/apache/spark/commit/cc2eb44afecc442649e2b20369b78c31506d3597).




