date:20170511

[GitHub] spark pull request #17953: [SPARK-20680][SQL] Spark-sql do not support for v...

2017-05-11 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17953#discussion_r116157223
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -1504,6 +1504,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] 
with Logging {
   case ("decimal", precision :: Nil) => 
DecimalType(precision.getText.toInt, 0)
   case ("decimal", precision :: scale :: Nil) =>
 DecimalType(precision.getText.toInt, scale.getText.toInt)
+  case ("void", Nil) => NullType
--- End diff --

Hive 2.x disables it. Could you add some test cases by reading and writing 
the tables? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...

2017-05-11 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17959
  
merged to master/2.2, please send a follow-up PR to address @gatorsmile 's 
comments, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17711: [SPARK-19951][SQL] Add string concatenate operato...

2017-05-11 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17711#discussion_r116156736
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql ---
@@ -32,3 +32,11 @@ select 1 - 2;
 select 2 * 5;
 select 5 % 3;
 select pmod(-7, 3);
+
+-- check operator precedence (We follow Oracle operator precedence: 
https://docs.oracle.com/cd/A87860_01/doc/server.817/a85397/operator.htm#997691)
--- End diff --

ok!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...

2017-05-11 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17711
  
@maropu The solution using `tailrec` looks more straightforward. Could you 
submit the PR based on that? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17711: [SPARK-19951][SQL] Add string concatenate operato...

2017-05-11 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17711#discussion_r116156556
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql ---
@@ -32,3 +32,11 @@ select 1 - 2;
 select 2 * 5;
 select 5 % 3;
 select pmod(-7, 3);
+
+-- check operator precedence (We follow Oracle operator precedence: 
https://docs.oracle.com/cd/A87860_01/doc/server.817/a85397/operator.htm#997691)
--- End diff --

The link could be ineffective in the future. Could you also copy the table 
contents here? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17959: [SPARK-20718][SQL] FileSourceScanExec with differ...

2017-05-11 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17959


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...

2017-05-11 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17959
  
How about `HiveTableScanExec`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17959: [SPARK-20718][SQL] FileSourceScanExec with differ...

2017-05-11 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17959#discussion_r116156087
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala 
---
@@ -519,8 +519,18 @@ case class FileSourceScanExec(
   relation,
   output.map(QueryPlan.normalizeExprId(_, output)),
   requiredSchema,
-  partitionFilters.map(QueryPlan.normalizeExprId(_, output)),
-  dataFilters.map(QueryPlan.normalizeExprId(_, output)),
+  canonicalizeFilters(partitionFilters, output),
+  canonicalizeFilters(dataFilters, output),
   None)
   }
+
+  private def canonicalizeFilters(filters: Seq[Expression], output: 
Seq[Attribute])
--- End diff --

Add a function description?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...

2017-05-11 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17948#discussion_r116155996
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1003,18 +1003,32 @@ class Analyzer(
*/
   object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
 
+// This is a strict check though, we put this to apply the rule only 
if the expression is not
+// resolvable by child.
+private def notResolvableByChild(attrName: String, child: 
LogicalPlan): Boolean = {
+  !child.output.exists(a => resolver(a.name, attrName))
+}
+
 override def apply(plan: LogicalPlan): LogicalPlan = 
plan.resolveOperators {
   case agg @ Aggregate(groups, aggs, child)
   if conf.groupByAliases && child.resolved && 
aggs.forall(_.resolved) &&
-groups.exists(_.isInstanceOf[UnresolvedAttribute]) =>
-// This is a strict check though, we put this to apply the rule 
only in alias expressions
-def notResolvableByChild(attrName: String): Boolean =
-  !child.output.exists(a => resolver(a.name, attrName))
-agg.copy(groupingExpressions = groups.map {
-  case u: UnresolvedAttribute if notResolvableByChild(u.name) =>
+groups.exists(!_.resolved) =>
+agg.copy(groupingExpressions = groups.map { _.transform {
+case u: UnresolvedAttribute if notResolvableByChild(u.name, 
child) =>
+  aggs.find(ne => resolver(ne.name, u.name)).getOrElse(u)
+  }
+})
+
+  case gs @ GroupingSets(selectedGroups, groups, child, aggs)
+  if conf.groupByAliases && child.resolved && 
aggs.forall(_.resolved) &&
+(selectedGroups :+ 
groups).exists(_.exists(_.isInstanceOf[UnresolvedAttribute])) =>
+def mayResolveAttrByAggregateExprs(exprs: Seq[Expression]): 
Seq[Expression] = exprs.map {
--- End diff --

I think we should do `exprs.map { _.transform { ...` like above.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14582: [SPARK-16997][SQL] Allow loading of JSON float values as...

2017-05-11 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14582
  
We could ask it to mailing-list if you strongly feel about this. For 
example, `from_json` function was also asked too to mailing list before getting 
merged.

I think we should not add all the variants just for consistency and this is 
why I asked more interests. There are many variants for language-specific and 
application-specific and I usually stay against if there is an easy workaround 
and looks a kind of variant.

I wouldn't stay against if there are more demands or interests for this. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...

2017-05-11 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17948#discussion_r116155780
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1003,18 +1003,32 @@ class Analyzer(
*/
   object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
 
+// This is a strict check though, we put this to apply the rule only 
if the expression is not
+// resolvable by child.
+private def notResolvableByChild(attrName: String, child: 
LogicalPlan): Boolean = {
+  !child.output.exists(a => resolver(a.name, attrName))
+}
+
 override def apply(plan: LogicalPlan): LogicalPlan = 
plan.resolveOperators {
   case agg @ Aggregate(groups, aggs, child)
   if conf.groupByAliases && child.resolved && 
aggs.forall(_.resolved) &&
-groups.exists(_.isInstanceOf[UnresolvedAttribute]) =>
-// This is a strict check though, we put this to apply the rule 
only in alias expressions
-def notResolvableByChild(attrName: String): Boolean =
-  !child.output.exists(a => resolver(a.name, attrName))
-agg.copy(groupingExpressions = groups.map {
-  case u: UnresolvedAttribute if notResolvableByChild(u.name) =>
+groups.exists(!_.resolved) =>
+agg.copy(groupingExpressions = groups.map { _.transform {
+case u: UnresolvedAttribute if notResolvableByChild(u.name, 
child) =>
+  aggs.find(ne => resolver(ne.name, u.name)).getOrElse(u)
+  }
+})
+
+  case gs @ GroupingSets(selectedGroups, groups, child, aggs)
+  if conf.groupByAliases && child.resolved && 
aggs.forall(_.resolved) &&
+(selectedGroups :+ 
groups).exists(_.exists(_.isInstanceOf[UnresolvedAttribute])) =>
--- End diff --

`groups` should cover `selectedGroups`. So we may not need to add 
`selectedGroups` here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP/GROUPI...

2017-05-11 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17948
  
LGTM too.  : )


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

2017-05-11 Thread ghoto

Github user ghoto commented on a diff in the pull request:

https://github.com/apache/spark/pull/17940#discussion_r116155652
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -992,7 +992,20 @@ object Matrices {
 new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose)
   case sm: BSM[Double] =>
 // There is no isTranspose flag for sparse matrices in Breeze
-new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, 
sm.data)
+
+// Some Breeze CSCMatrices may have extra trailing zeros in
+// .rowIndices and .data, which are added after some matrix
+// operations for efficiency.
+//
+// Therefore the last element of sm.colPtrs would no longer be
+// coherent with the size of sm.rowIndices and sm.data
+// despite sm being a valid CSCMatrix.
+// We need to truncate both arrays (rowIndices, data)
+// to the real size of the vector sm.activeSize to allow valid 
conversion
+
+val truncRowIndices = sm.rowIndices.slice(0, sm.activeSize)
+val truncData = sm.data.slice(0, sm.activeSize)
--- End diff --

I'm implementing both suggestions, however, wouldn't be the sm.copy more 
expensive than just doing those 2 slices?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...

2017-05-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17959
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76843/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...

2017-05-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17959
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...

2017-05-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17959
  
**[Test build #76843 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76843/testReport)**
 for PR 17959 at commit 
[`9ec86ec`](https://github.com/apache/spark/commit/9ec86ec1941bf0c329f4c6a1fb75271e91e51660).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait DataSourceScanExec extends LeafExecNode with CodegenSupport with 
PredicateHelper `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17957: [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState...

2017-05-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17957
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17957: [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState...

2017-05-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17957
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76842/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17957: [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState...

2017-05-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17957
  
**[Test build #76842 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76842/testReport)**
 for PR 17957 at commit 
[`4032940`](https://github.com/apache/spark/commit/40329404299ece70aef7ef245704978fb9d1e6f9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added if joi...

2017-05-11 Thread tejasapatil

Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/16985
  
@cloud-fan : I have made suggested change(s).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

2017-05-11 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17956
  
LGTM except for a minor comment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

2017-05-11 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17956#discussion_r116154314
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
 ---
@@ -1988,4 +1988,47 @@ class JsonSuite extends QueryTest with 
SharedSQLContext with TestJsonData {
   assert(errMsg.startsWith("The field for corrupt records must be 
string type and nullable"))
 }
   }
+
+  test("SPARK-18772: Parse special floats correctly") {
+// positive cases
+val jsons = Seq(
+  """{"a": "-INF"}""",
+  """{"a": "INF"}""",
+  """{"a": "-INF"}""",
+  """{"a": "NaN"}""",
+  """{"a": "Infinity"}""",
+  """{"a": "+Infinity"}""",
+  """{"a": "-Infinity"}""")
+
+val checks: Seq[Double => Boolean] = Seq(
+  _.isNegInfinity,
+  _.isPosInfinity,
+  _.isNegInfinity,
+  _.isNaN,
+  _.isPosInfinity,
+  _.isPosInfinity,
+  _.isNegInfinity)
+
+Seq(FloatType, DoubleType).foreach { dt =>
+  jsons.zip(checks).foreach { case (json, check) =>
+val ds = spark.read
+  .schema(StructType(Seq(StructField("a", dt
+  .json(Seq(json).toDS())
+  .select($"a".cast(DoubleType)).as[Double]
+assert(check(ds.first()))
+  }
+}
+
+// negative case
+Seq(FloatType, DoubleType).foreach { dt =>
+  val e = intercept[SparkException] {
+spark.read
+  .option("mode", "FAILFAST")
+  .schema(StructType(Seq(StructField("a", dt
+  .json(Seq( """{"a": "nan"}""").toDS())
--- End diff --

Shall we also test other negative cases?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...

2017-05-11 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17052
  
Yea, I just wanted to check if it is in progress in any way. Thanks for 
your input.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14582: [SPARK-16997][SQL] Allow loading of JSON float values as...

2017-05-11 Thread lalinsky

Github user lalinsky commented on the issue:

https://github.com/apache/spark/pull/14582
  
You were asking for more interest in the feature, there was no way I could 
answer that. :)

Regarding the change itself, the system can already auto cast integer to a 
timestamp, but not a floating point number. Floating number timestamps are 
pretty common in a Python ecosystem, more so than integer ones. From my point 
of view, that's an inconsistent and surprising behavior, that I wanted to 
correct. I wouldn't send the patch if it didn't work for any number, but having 
it done for just one number type seemed wrong to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...

2017-05-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17711
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...

2017-05-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17711
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76840/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...

2017-05-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17711
  
**[Test build #76840 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76840/testReport)**
 for PR 17711 at commit 
[`089db30`](https://github.com/apache/spark/commit/089db30958d2d78b131ed10eea0b733a18056bf7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...

2017-05-11 Thread uncleGen

Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17052
  
@HyukjinKwon Sorry! Busy for this period of time. Let me resolve this 
conflict.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

2017-05-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17956
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76841/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

2017-05-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17956
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

2017-05-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17956
  
**[Test build #76841 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76841/testReport)**
 for PR 17956 at commit 
[`660a284`](https://github.com/apache/spark/commit/660a2843050d99b15ca7676ead8b8be4117267f1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17935: [SPARK-20690][SQL] Analyzer shouldn't add missing...

2017-05-11 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17935#discussion_r116153501
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala 
---
@@ -868,6 +868,29 @@ class SubquerySuite extends QueryTest with 
SharedSQLContext {
   Row(3, 3.0, 2, 3.0) :: Row(3, 3.0, 2, 3.0) :: Nil)
   }
 
+  test("SPARK-20690: Do not add missing attributes through subqueries") {
+withTempView("onerow") {
+  Seq(1).toDF("c1").createOrReplaceTempView("onerow")
+
+  val e = intercept[AnalysisException] {
+sql(
+  """
+| select 1
+| from   (select 1 from onerow t1 LIMIT 1)
--- End diff --

I'm surprised we support this syntax, I think the FROM clause must have an 
alias.

I checked with postgres, it will throw exception `subquery in FROM must 
have an alias`, can you check with other databases? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP/GROUPI...

2017-05-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17948
  
**[Test build #76849 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76849/testReport)**
 for PR 17948 at commit 
[`a809274`](https://github.com/apache/spark/commit/a8092742b99c9d43b04b4a4941345f179996a50f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17960: [SPARK-20719] [SQL] Support LIMIT ALL

2017-05-11 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17960#discussion_r116153263
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/limit.sql ---
@@ -1,23 +1,27 @@
 
 -- limit on various data types
-select * from testdata limit 2;
-select * from arraydata limit 2;
-select * from mapdata limit 2;
+SELECT * FROM testdata LIMIT 2;
--- End diff --

I just wonder why these should be upper-cased just for curiosity. Is this 
way preferred?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...

2017-05-11 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17948#discussion_r116153034
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1003,18 +1003,31 @@ class Analyzer(
*/
   object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
 
+// This is a strict check though, we put this to apply the rule only 
if the expression is not
+// resolvable by child.
+private def notResolvableByChild(attrName: String, child: 
LogicalPlan): Boolean =
+  !child.output.exists(a => resolver(a.name, attrName))
--- End diff --

Thanks! Fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17644: [SPARK-17729] [SQL] Enable creating hive bucketed...

2017-05-11 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17644#discussion_r116152814
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -871,6 +886,23 @@ private[hive] object HiveClientImpl {
   hiveTable.setViewOriginalText(t)
   hiveTable.setViewExpandedText(t)
 }
+
+table.bucketSpec match {
+  case Some(bucketSpec) =>
+hiveTable.setNumBuckets(bucketSpec.numBuckets)
+hiveTable.setBucketCols(bucketSpec.bucketColumnNames.toList.asJava)
--- End diff --

For data source table, which can be created by `CREATE TABLE src(...) USING 
parquet ...`, the bucketing information is in table properties, and hive will 
always read this table as a non-bucketed table.

After your PR, for bucketed data source tables written by Spark, Hive will 
read them as bucketed tables and cause problems, because the bucket hashing 
function is different.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...

2017-05-11 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17948#discussion_r116152643
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1003,18 +1003,31 @@ class Analyzer(
*/
   object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
 
+// This is a strict check though, we put this to apply the rule only 
if the expression is not
+// resolvable by child.
+private def notResolvableByChild(attrName: String, child: 
LogicalPlan): Boolean =
+  !child.output.exists(a => resolver(a.name, attrName))
--- End diff --

Nit: style
```Scala
private def notResolvableByChild(attrName: String, child: LogicalPlan): 
Boolean = {
  !child.output.exists(a => resolver(a.name, attrName))
}
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...

2017-05-11 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17959
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP/GROUPI...

2017-05-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17948
  
**[Test build #76848 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76848/testReport)**
 for PR 17948 at commit 
[`0163656`](https://github.com/apache/spark/commit/0163656b8e5325cda7b80e0c0268c24608e9b871).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17960: [SPARK-20719] [SQL] Support LIMIT ALL

2017-05-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17960
  
**[Test build #76847 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76847/testReport)**
 for PR 17960 at commit 
[`b4a4b0a`](https://github.com/apache/spark/commit/b4a4b0aee836cd6f8944716bc84323487527fc19).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17960: [SPARK-20719] [SQL] Support LIMIT ALL

2017-05-11 Thread gatorsmile

GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/17960

[SPARK-20719] [SQL] Support LIMIT ALL

### What changes were proposed in this pull request?
`LIMIT ALL` is the same as omitting the `LIMIT` clause. It is supported by 
both PrestgreSQL and Presto. This PR is to support it by adding it in the 
parser. 

### How was this patch tested?
Added a test case

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark LimitAll

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17960.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17960


commit b4a4b0aee836cd6f8944716bc84323487527fc19
Author: Xiao Li 
Date:   2017-05-12T04:34:37Z

fix.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...

2017-05-11 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17948#discussion_r116151791
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1003,18 +1003,30 @@ class Analyzer(
*/
   object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
 
+// This is a strict check though, we put this to apply the rule only 
in alias expressions
--- End diff --

ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP/GROUPI...

2017-05-11 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17948
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...

2017-05-11 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17948#discussion_r116150714
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1003,18 +1003,30 @@ class Analyzer(
*/
   object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
 
+// This is a strict check though, we put this to apply the rule only 
in alias expressions
--- End diff --

`... only if the expression is not resolvable by child`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17936: [SPARK-20638][Core][WIP]Optimize the CartesianRDD to red...

2017-05-11 Thread ConeyLiu

Github user ConeyLiu commented on the issue:

https://github.com/apache/spark/pull/17936
  
The cluster test result. The `RDD.cartesian` is used in Spark mllib ALS 
algorithm, and compared with the latest spark master branch. 

Environments:  Spark on Yarn with 9 executors(10 cores & 30 GB Mem) on 
three nodes.
Test Data: The Data: User 480,000, and Item 17,000.

Test Case:
```
object TestNetflixlib {
  def main(args: Array[String]): Unit = {
val conf = new SparkConf().setAppName("Test Netflix mlib")
val sc = new SparkContext(conf)

val data = sc.textFile("hdfs://10.1.2.173:9000/nf_training_set.txt")

val ratings = data.map(_.split("::") match {
  case Array(user, item, rate) => Rating(user.toInt, item.toInt, 
rate.toDouble)
})

val rank = 0
val numIterations = 10
val train_start = System.nanoTime()
val model = ALS.train(ratings, rank, numIterations, 0.01)
val training_time = (System.nanoTime() - train_start)/ 1e9
println(s"Training time(s): $training_time")

val rec_start = System.nanoTime()
val userRec = model.recommendProductsForUsers(20)
println(userRec.count())
val rec_time = (System.nanoTime() - rec_start) / 1e9
println(s"Recommend time(s): $rec_time")
  }
}
```

Test Results:
| Master Branch | Improved Branch | Percentage of ascension |
| --| -- | -- |
| 139.934s | 162.597s | 16 % |
| 148.138s | 157.597s | 6% |
| 157.899s | 189.580s | 20% |
| 135.520s | 152.486s | 13% |
| 166.101s | 184.485s | 11 % |


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16697: [SPARK-19358][CORE] LiveListenerBus shall log the event ...

2017-05-11 Thread mridulm

Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/16697
  
Yes. If inside, you are right - only the first will be logged !


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16697: [SPARK-19358][CORE] LiveListenerBus shall log the event ...

2017-05-11 Thread CodingCat

Github user CodingCat commented on the issue:

https://github.com/apache/spark/pull/16697
  
you mean outside of 
https://github.com/apache/spark/pull/16697/files#diff-ca0fe05a42fd5edcab8a1bdaa8e58db9R210?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17942: [SPARK-20702][Core]TaskContextImpl.markTaskComple...

2017-05-11 Thread mridulm

Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/17942#discussion_r116148995
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -115,26 +115,33 @@ private[spark] abstract class Task[T](
   case t: Throwable =>
 e.addSuppressed(t)
 }
+context.markTaskCompleted(Some(e))
 throw e
 } finally {
-  // Call the task completion callbacks.
-  context.markTaskCompleted()
   try {
-Utils.tryLogNonFatalError {
-  // Release memory used by this thread for unrolling blocks
-  
SparkEnv.get.blockManager.memoryStore.releaseUnrollMemoryForThisTask(MemoryMode.ON_HEAP)
-  
SparkEnv.get.blockManager.memoryStore.releaseUnrollMemoryForThisTask(MemoryMode.OFF_HEAP)
-  // Notify any tasks waiting for execution memory to be freed to 
wake up and try to
-  // acquire memory again. This makes impossible the scenario 
where a task sleeps forever
-  // because there are no other tasks left to notify it. Since 
this is safe to do but may
-  // not be strictly necessary, we should revisit whether we can 
remove this in the future.
-  val memoryManager = SparkEnv.get.memoryManager
-  memoryManager.synchronized { memoryManager.notifyAll() }
-}
+// Call the task completion callbacks. If "markTaskCompleted" is 
called twice, the second
+// one is no-op.
--- End diff --

Missed this comment.
LGTM. Thanks for clarifying @zsxwing 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17942: [SPARK-20702][Core]TaskContextImpl.markTaskComple...

2017-05-11 Thread mridulm

Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/17942#discussion_r116148841
  
--- Diff: core/src/main/scala/org/apache/spark/util/taskListeners.scala ---
@@ -55,14 +55,16 @@ class TaskCompletionListenerException(
   extends RuntimeException {
 
   override def getMessage: String = {
-if (errorMessages.size == 1) {
--- End diff --

Thx for clarifying !


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

2017-05-11 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17956
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17906: [SPARK-20665][SQL]"Bround" and "Round" function return N...

2017-05-11 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17906
  
thanks, merging to master/2.2/2.1/2.0!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17906: [SPARK-20665][SQL]"Bround" and "Round" function r...

2017-05-11 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17906


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...

2017-05-11 Thread wzhfy

Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/17959
  
also cc @gatorsmile 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

2017-05-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17956
  
**[Test build #76846 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76846/testReport)**
 for PR 17956 at commit 
[`aa7c658`](https://github.com/apache/spark/commit/aa7c6580a733c0d964cabd1fcabf1f2730227f10).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17958: [SPARK-20716][SS] StateStore.abort() should not throw ex...

2017-05-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17958
  
**[Test build #76845 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76845/testReport)**
 for PR 17958 at commit 
[`1e36134`](https://github.com/apache/spark/commit/1e361344101ccaf3d8a9ddebe6767b610f0916ed).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17958: [SPARK-20716][SS] StateStore.abort() should not throw ex...

2017-05-11 Thread tdas

Github user tdas commented on the issue:

https://github.com/apache/spark/pull/17958
  
@zsxwing @marmbrus 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17958: [SPARK-20716][SS] StateStore.abort() should not throw ex...

2017-05-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17958
  
**[Test build #76844 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76844/testReport)**
 for PR 17958 at commit 
[`e10101e`](https://github.com/apache/spark/commit/e10101eafe2329031d079977ab3f3e0aaee98908).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...

2017-05-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17959
  
**[Test build #76843 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76843/testReport)**
 for PR 17959 at commit 
[`9ec86ec`](https://github.com/apache/spark/commit/9ec86ec1941bf0c329f4c6a1fb75271e91e51660).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...

2017-05-11 Thread wzhfy

Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/17959
  
cc @cloud-fan @hvanhovell 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17959: [SPARK-20718][SQL] FileSourceScanExec with differ...

2017-05-11 Thread wzhfy

GitHub user wzhfy opened a pull request:

https://github.com/apache/spark/pull/17959

[SPARK-20718][SQL] FileSourceScanExec with different filter orders should 
be the same after canonicalization

## What changes were proposed in this pull request?

Since `constraints` in `QueryPlan` is a set, the order of filters can 
differ. Usually this is ok because of canonicalization. However, in 
`FileSourceScanExec`, its data filters and partition filters are sequences, and 
their orders are not canonicalized. So `def sameResult` returns different 
results for different orders of data/partition filters. This leads to, e.g. 
different decision for `ReuseExchange`, and thus results in unstable 
performance.

## How was this patch tested?

Added a new test for `FileSourceScanExec.sameResult`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wzhfy/spark canonicalizeFileSourceScanExec

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17959.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17959


commit 9ec86ec1941bf0c329f4c6a1fb75271e91e51660
Author: wangzhenhua 
Date:   2017-05-12T03:11:34Z

same result for FileSourceScanExec with different filter orders




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17958: [SPARK-20716][SS] StateStore.abort() should not t...

2017-05-11 Thread tdas

GitHub user tdas opened a pull request:

https://github.com/apache/spark/pull/17958

[SPARK-20716][SS] StateStore.abort() should not throw exceptions

## What changes were proposed in this pull request?

StateStore.abort() should do a best effort attempt to clean up temporary 
resources. It should not throw errors, especially because its called in a 
TaskCompletionListener, because this error could hide previous real errors in 
the task.

## How was this patch tested?
No unit test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tdas/spark SPARK-20716

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17958.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17958


commit e10101eafe2329031d079977ab3f3e0aaee98908
Author: Tathagata Das 
Date:   2017-05-12T03:21:54Z

Ignored exceptions




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17887: [SPARK-20399][SQL] Add a config to fallback strin...

2017-05-11 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17887


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17887: [SPARK-20399][SQL] Add a config to fallback string liter...

2017-05-11 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17887
  
Thanks @cloud-fan @gatorsmile 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17887: [SPARK-20399][SQL] Add a config to fallback string liter...

2017-05-11 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17887
  
thanks, merging to master/2.2!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17954: [SPARK-20714] [SS] Fix match error when watermark is set...

2017-05-11 Thread tdas

Github user tdas commented on the issue:

https://github.com/apache/spark/pull/17954
  
@marmbrus @zsxwing


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

2017-05-11 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17956#discussion_r116145813
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
 ---
@@ -127,13 +126,15 @@ class JacksonParser(
 case VALUE_STRING =>
   // Special case handling for NaN and Infinity.
   val value = parser.getText
-  val lowerCaseValue = value.toLowerCase(Locale.ROOT)
-  if (lowerCaseValue.equals("nan") ||
-lowerCaseValue.equals("infinity") ||
-lowerCaseValue.equals("-infinity") ||
-lowerCaseValue.equals("inf") ||
-lowerCaseValue.equals("-inf")) {
+  if (value.equals("NaN") ||
+value.equals("Infinity") ||
+value.equals("+Infinity") ||
+value.equals("-Infinity")) {
 value.toFloat
+  } else if (value.equals("+INF") || value.equals("INF")) {
--- End diff --

how about
```
parser.getText match {
  case "NaN" => Float.NaN
  case "+INF" | "INF" | "+Infinity" | "Infinity" => Float.PositiveInfinity
  case "-INF" | "-Infinity" => Float.NegativeInfinity
  case other => throw new RuntimeException(s"Cannot parse $other as 
FloatType.")
}
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17957: [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState...

2017-05-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17957
  
**[Test build #76842 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76842/testReport)**
 for PR 17957 at commit 
[`4032940`](https://github.com/apache/spark/commit/40329404299ece70aef7ef245704978fb9d1e6f9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17957: [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState...

2017-05-11 Thread tdas

Github user tdas commented on the issue:

https://github.com/apache/spark/pull/17957
  
@marmbrus 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17957: [SPARK-20717][SS] Minor tweaks to the MapGroupsWi...

2017-05-11 Thread tdas

GitHub user tdas opened a pull request:

https://github.com/apache/spark/pull/17957

[SPARK-20717][SS] Minor tweaks to the MapGroupsWithState behavior

## What changes were proposed in this pull request?

Timeout and state data are two independent entities and should be settable 
independently. Therefore, in the same call of the user-defined function, one 
should be able to set the timeout before initializing the state and also after 
removing the state. Whether timeouts can be set or not, should not depend on 
the current state, and vice versa. 

However, a limitation of the current implementation is that state cannot be 
null while timeout is set. This is checked lazily after the function call has 
completed.

## How was this patch tested?
- Updated existing unit tests that test the behavior of 
GroupState.setTimeout*** wrt to the current state
- Added new tests that verify the disallowed cases where state is undefined 
but timeout is set.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tdas/spark SPARK-20717

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17957.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17957


commit 3cfd446b0a926aad115acdf5bd8dc8cd03ee743a
Author: Tathagata Das 
Date:   2017-05-12T02:31:15Z

Tweaks to the mapGroupsWithState behavior

commit 40329404299ece70aef7ef245704978fb9d1e6f9
Author: Tathagata Das 
Date:   2017-05-12T03:06:52Z

More tweaks




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

2017-05-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17858
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76839/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

2017-05-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17858
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

2017-05-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17858
  
**[Test build #76839 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76839/testReport)**
 for PR 17858 at commit 
[`639d63a`](https://github.com/apache/spark/commit/639d63a20e94523f3443bc83b272fc60c1f5627a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-05-11 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17222#discussion_r116145107
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala
 ---
@@ -20,16 +20,19 @@ package org.apache.spark.sql.hive.execution
 import scala.collection.JavaConverters._
 import scala.util.Random
 
+import _root_.test.org.apache.spark.sql.MyDoubleAvg
+import _root_.test.org.apache.spark.sql.MyDoubleSum
--- End diff --

`_root_`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17955: [SPARK-20715] Store MapStatuses only in MapOutputTracker...

2017-05-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17955
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76838/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17955: [SPARK-20715] Store MapStatuses only in MapOutputTracker...

2017-05-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17955
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17955: [SPARK-20715] Store MapStatuses only in MapOutputTracker...

2017-05-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17955
  
**[Test build #76838 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76838/testReport)**
 for PR 17955 at commit 
[`e3da298`](https://github.com/apache/spark/commit/e3da298d59c764388ec6ca93ec23ba3eb8de96d3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-05-11 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17222#discussion_r116144809
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -491,20 +491,42 @@ class UDFRegistration private[sql] (functionRegistry: 
FunctionRegistry) extends
 case 21 => register(name, udf.asInstanceOf[UDF20[_, _, _, _, 
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
 case 22 => register(name, udf.asInstanceOf[UDF21[_, _, _, _, 
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
 case 23 => register(name, udf.asInstanceOf[UDF22[_, _, _, _, 
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
-case n => logError(s"UDF class with ${n} type arguments is not 
supported ")
+case n =>
+  throw new IOException(s"UDF class with ${n} type arguments 
is not supported.")
--- End diff --

why IOException?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

2017-05-11 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17956#discussion_r116144531
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
 ---
@@ -127,13 +126,15 @@ class JacksonParser(
 case VALUE_STRING =>
   // Special case handling for NaN and Infinity.
   val value = parser.getText
-  val lowerCaseValue = value.toLowerCase(Locale.ROOT)
-  if (lowerCaseValue.equals("nan") ||
-lowerCaseValue.equals("infinity") ||
-lowerCaseValue.equals("-infinity") ||
-lowerCaseValue.equals("inf") ||
-lowerCaseValue.equals("-inf")) {
+  if (value.equals("NaN") ||
--- End diff --

https://github.com/apache/spark/pull/9759#r63321521

> "infinity".toDouble, "inf".toDouble are not legal. These non-numeric 
numbers are case-sensitive, both for Jackson and Scala.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

2017-05-11 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17956
  
cc @NathanHowell, @cloud-fan and @viirya.

(I just want to note this will not change any input/output but just the 
exception type and avoid additional conversion try.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

2017-05-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17956
  
**[Test build #76841 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76841/testReport)**
 for PR 17956 at commit 
[`660a284`](https://github.com/apache/spark/commit/660a2843050d99b15ca7676ead8b8be4117267f1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

2017-05-11 Thread HyukjinKwon

GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/17956

[SPARK-18772][SQL] Avoid unnecessary conversion try for special floats in 
JSON and add related tests

## What changes were proposed in this pull request?

This PR is based on  https://github.com/apache/spark/pull/16199 and 
extracts the valid change from https://github.com/apache/spark/pull/9759 to 
resolve SPARK-18772

It hardly changes input/output at all but just avoid additional conversion 
try with `toFloat` and `toDouble`.

**Before**

```scala
scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._

scala> spark.read.schema(StructType(Seq(StructField("a", 
DoubleType.option("mode", "FAILFAST").json(Seq("""{"a": 
"nan"}""").toDS).show()
17/05/12 11:30:41 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
java.lang.NumberFormatException: For input string: "nan"
...
```

**After**

```scala
scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._

scala> spark.read.schema(StructType(Seq(StructField("a", 
DoubleType.option("mode", "FAILFAST").json(Seq("""{"a": 
"nan"}""").toDS).show()
17/05/12 11:44:30 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.RuntimeException: Cannot parse nan as DoubleType.
...
```

## How was this patch tested?

Unit tests added in `JsonSuite`.

Closes #16199

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-18772

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17956.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17956


commit f83eec7a4c8fce223c5fd28b411fe2d1ae1da8dd
Author: Nathan Howell 
Date:   2016-12-07T23:32:14Z

[SPARK-18772][SQL] NaN/Infinite float parsing in JSON is inconsistent

commit 660a2843050d99b15ca7676ead8b8be4117267f1
Author: hyukjinkwon 
Date:   2017-05-12T02:22:27Z

Avoid unnecessary cast try for special floats in JSON and add related tests




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17942: [SPARK-20702][Core]TaskContextImpl.markTaskComple...

2017-05-11 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/17942#discussion_r116143769
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -115,26 +115,33 @@ private[spark] abstract class Task[T](
   case t: Throwable =>
 e.addSuppressed(t)
 }
+context.markTaskCompleted(Some(e))
 throw e
 } finally {
-  // Call the task completion callbacks.
-  context.markTaskCompleted()
   try {
-Utils.tryLogNonFatalError {
-  // Release memory used by this thread for unrolling blocks
-  
SparkEnv.get.blockManager.memoryStore.releaseUnrollMemoryForThisTask(MemoryMode.ON_HEAP)
-  
SparkEnv.get.blockManager.memoryStore.releaseUnrollMemoryForThisTask(MemoryMode.OFF_HEAP)
-  // Notify any tasks waiting for execution memory to be freed to 
wake up and try to
-  // acquire memory again. This makes impossible the scenario 
where a task sleeps forever
-  // because there are no other tasks left to notify it. Since 
this is safe to do but may
-  // not be strictly necessary, we should revisit whether we can 
remove this in the future.
-  val memoryManager = SparkEnv.get.memoryManager
-  memoryManager.synchronized { memoryManager.notifyAll() }
-}
+// Call the task completion callbacks. If "markTaskCompleted" is 
called twice, the second
+// one is no-op.
+context.markTaskCompleted(None)
--- End diff --

@mridulm there is a `completed` flag in `markTaskCompleted`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-05-11 Thread mallman

Github user mallman commented on the issue:

https://github.com/apache/spark/pull/17633
  
Hey guys. Just a quick update. I made good progress on implementing 
multi-version testing today, however it's not quite ready. I'm going to be on 
leave from tomorrow through the rest of next week, so I'm kind of doubtful I'll 
push anything new until May 22nd.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...

2017-05-11 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17711
  
I quickly brushed up the Optimizer code based on your advice:
Using `Stack`:

https://github.com/apache/spark/commit/a17d933aaad5c38a9d9ff20f978c1bad6c774fb1#diff-a1acb054bc376603ef510e6d0ee0R551

Using `tailrec`:

https://github.com/apache/spark/compare/master...maropu:SPARK-19951-3#diff-a1acb054bc376603ef510e6d0ee0R552

I checked the spark style-guide and I probably think we'd better to use 
more readable one. So, `tailrec` is better? I'll submit the `tailrec` one after 
this merged.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17887: [SPARK-20399][SQL] Add a config to fallback string liter...

2017-05-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17887
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76837/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17887: [SPARK-20399][SQL] Add a config to fallback string liter...

2017-05-11 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17887
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17887: [SPARK-20399][SQL] Add a config to fallback string liter...

2017-05-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17887
  
**[Test build #76837 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76837/testReport)**
 for PR 17887 at commit 
[`375eb9c`](https://github.com/apache/spark/commit/375eb9cd747cc75d2f51da1dabe824dbbce52790).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...

2017-05-11 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17711
  
**[Test build #76840 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76840/testReport)**
 for PR 17711 at commit 
[`089db30`](https://github.com/apache/spark/commit/089db30958d2d78b131ed10eea0b733a18056bf7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17942: [SPARK-20702][Core]TaskContextImpl.markTaskComple...

2017-05-11 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17942#discussion_r116143097
  
--- Diff: core/src/main/scala/org/apache/spark/util/taskListeners.scala ---
@@ -55,14 +55,16 @@ class TaskCompletionListenerException(
   extends RuntimeException {
 
   override def getMessage: String = {
-if (errorMessages.size == 1) {
--- End diff --

It's a common pattern in scala.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17711: [SPARK-19951][SQL] Add string concatenate operato...

2017-05-11 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17711#discussion_r116142572
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql ---
@@ -32,3 +32,11 @@ select 1 - 2;
 select 2 * 5;
 select 5 % 3;
 select pmod(-7, 3);
+
+-- check operator precedence
--- End diff --

ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...

2017-05-11 Thread sarutak

Github user sarutak commented on the issue:

https://github.com/apache/spark/pull/14719
  
@HyukjinKwon Thanks for pinging me! I still think this issue should be 
fixed but I didn't notice @nsyca's last comment. I'll consider the problem 
which he mentioned soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16697: [SPARK-19358][CORE] LiveListenerBus shall log the event ...

2017-05-11 Thread mridulm

Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/16697
  
`onDropEvent` is invoked for every dropped event : not just the first.
If all you need is a way to find out what the dropped events were - simply 
enable trace logging for the class after addition of the log line.
With everything else being the same, you will see TRACE log message with 
all the dropped events.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17906: [SPARK-20665][SQL]"Bround" and "Round" function return N...

2017-05-11 Thread 10110346

Github user 10110346 commented on the issue:

https://github.com/apache/spark/pull/17906
  
@cloud-fan  Spark 2.0 and  Spark 2.1 have the same issue. I have updated 
the affected versions in the JIRA. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13621: [SPARK-10408] [ML] Implement stacked autoencoder

2017-05-11 Thread JeremyNixon

Github user JeremyNixon commented on the issue:

https://github.com/apache/spark/pull/13621
  
I ran the Keras experiment with code up at [[GitHub link] 
](https://github.com/JeremyNixon/autoencoder) if anyone wants to build on this 
or replicate it.

Running Sethâs example on the training data set, I was able to get the 
results below. 

![screen shot 2017-05-11 at 10 08 37 
pm](https://cloud.githubusercontent.com/assets/4738024/25979615/9567a8bc-3697-11e7-81ed-be3fd073f4c5.png)

I agree that we should add modern activation functions. More importantly, 
we should add improved optimizers and a modular API to make this valuable to 
real users. 

Iâm going to do a code review here and at scalable-deeplearning in the 
next few days regardless of the decision we make around this. I think that 
these improvements (activation functions, optimizers) should be a part of a 
flexible modular library if we want to give users a modern experience. 




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17869: [SPARK-20609][CORE]Run the SortShuffleSuite unit tests h...

2017-05-11 Thread heary-cao

Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/17869
  
@HyukjinKwon 
I would like to suggest that
modify ALSCleanerSuite with another PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP/GROUPI...

2017-05-11 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17948
  
@cloud-fan Could you check this? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17952: [SPARK-20705][WEB-UI]The sort function can not be used i...

2017-05-11 Thread guoxiaolongzte

Github user guoxiaolongzte commented on the issue:

https://github.com/apache/spark/pull/17952
  
Obviously you said it was right. I have modified as requested. I have been 
manually tested, it is ok.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

2017-05-11 Thread hhbyyh

Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/17940#discussion_r116139174
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -992,7 +992,20 @@ object Matrices {
 new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose)
   case sm: BSM[Double] =>
 // There is no isTranspose flag for sparse matrices in Breeze
-new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, 
sm.data)
+
+// Some Breeze CSCMatrices may have extra trailing zeros in
+// .rowIndices and .data, which are added after some matrix
+// operations for efficiency.
+//
+// Therefore the last element of sm.colPtrs would no longer be
+// coherent with the size of sm.rowIndices and sm.data
+// despite sm being a valid CSCMatrix.
+// We need to truncate both arrays (rowIndices, data)
+// to the real size of the vector sm.activeSize to allow valid 
conversion
+
+val truncRowIndices = sm.rowIndices.slice(0, sm.activeSize)
+val truncData = sm.data.slice(0, sm.activeSize)
--- End diff --

This is the same as calling compact(). But the good thing is that it won't 
impact the original matrix.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

2017-05-11 Thread hhbyyh

Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/17940#discussion_r116139610
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala ---
@@ -46,6 +46,26 @@ class MatricesSuite extends SparkFunSuite {
 }
   }
 
+  test("Test Breeze Conversion Bug - SPARK-20687") {
--- End diff --

specific name: Test FromBreeze when Breeze.CSCMatrix.rowIndices has 
trailing zeros.

And move the test after another unit test "fromBreeze with sparse matrix"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

2017-05-11 Thread hhbyyh

Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/17940#discussion_r116139038
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -992,7 +992,20 @@ object Matrices {
 new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose)
   case sm: BSM[Double] =>
 // There is no isTranspose flag for sparse matrices in Breeze
-new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, 
sm.data)
+
+// Some Breeze CSCMatrices may have extra trailing zeros in
+// .rowIndices and .data, which are added after some matrix
+// operations for efficiency.
+//
+// Therefore the last element of sm.colPtrs would no longer be
+// coherent with the size of sm.rowIndices and sm.data
+// despite sm being a valid CSCMatrix.
+// We need to truncate both arrays (rowIndices, data)
+// to the real size of the vector sm.activeSize to allow valid 
conversion
+
--- End diff --

Maybe we can add some if else here, since slice will copy the array and 
often that's not needed. 
Please refer to 
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/linalg/CSCMatrix.scala#L130


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 3 4 5 6 7 >

1 - 100 of 621 matches

Mail list logo