[GitHub] spark pull request #17928: [SPARK-20311][SQL] Support aliases for table valu...

2017-05-09 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17928#discussion_r115664938
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala ---
@@ -468,7 +468,18 @@ class PlanParserSuite extends PlanTest {
   test("table valued function") {
     assertEqual(
       "select * from range(2)",
-      UnresolvedTableValuedFunction("range", Literal(2) :: Nil).select(star()))
+      UnresolvedTableValuedFunction("range", Literal(2) :: Nil, Seq.empty).select(star()))
+  }
+
+  test("SPARK-20311 range(N) as alias") {
+    assertEqual(
+      "select * from range(10) AS t",
--- End diff --

ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17930: [SPARK-20688][SQL] correctly check analysis for scalar s...

2017-05-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17930
  
LGTM 





[GitHub] spark pull request #17713: [SPARK-20417][SQL] Move subquery error handling t...

2017-05-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17713#discussion_r115664356
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -414,4 +350,272 @@ trait CheckAnalysis extends PredicateHelper {

     plan.foreach(_.setAnalyzed())
   }
+
+  /**
+   * Validates subquery expressions in the plan. Upon failure, returns a user-facing error.
+   */
+  private def checkSubqueryExpression(plan: LogicalPlan, expr: SubqueryExpression): Unit = {
+    def checkAggregateInScalarSubquery(
+        conditions: Seq[Expression],
+        query: LogicalPlan, agg: Aggregate): Unit = {
+      // Make sure correlated scalar subqueries contain one row for every outer row by
+      // enforcing that they are aggregates containing exactly one aggregate expression.
+      val aggregates = agg.expressions.flatMap(_.collect {
+        case a: AggregateExpression => a
+      })
+      if (aggregates.isEmpty) {
+        failAnalysis("The output of a correlated scalar subquery must be aggregated")
+      }
+
+      // SPARK-18504/SPARK-18814: Block cases where GROUP BY columns
+      // are not part of the correlated columns.
+      val groupByCols = AttributeSet(agg.groupingExpressions.flatMap(_.references))
+      // Collect the local references from the correlated predicate in the subquery.
+      val subqueryColumns = getCorrelatedPredicates(query).flatMap(_.references)
+        .filterNot(conditions.flatMap(_.references).contains)
+      val correlatedCols = AttributeSet(subqueryColumns)
+      val invalidCols = groupByCols -- correlatedCols
+      // GROUP BY columns must be a subset of columns in the predicates
+      if (invalidCols.nonEmpty) {
+        failAnalysis(
+          "A GROUP BY clause in a scalar correlated subquery " +
+            "cannot contain non-correlated columns: " +
+            invalidCols.mkString(","))
+      }
+    }
+
+    // Skip subquery aliases added by the Analyzer.
+    // For projects, do the necessary mapping and skip to its child.
+    def cleanQueryInScalarSubquery(p: LogicalPlan): LogicalPlan = p match {
+      case s: SubqueryAlias => cleanQueryInScalarSubquery(s.child)
+      case p: Project => cleanQueryInScalarSubquery(p.child)
+      case child => child
+    }
+
+    expr match {
+      case ScalarSubquery(query, conditions, _) =>
+        // Scalar subquery must return one column as output.
+        if (query.output.size != 1) {
+          failAnalysis(
+            s"Scalar subquery must return only one column, but got ${query.output.size}")
+        }
+
+        if (conditions.nonEmpty) {
+          cleanQueryInScalarSubquery(query) match {
+            case a: Aggregate => checkAggregateInScalarSubquery(conditions, query, a)
+            case Filter(_, a: Aggregate) => checkAggregateInScalarSubquery(conditions, query, a)
+            case fail => failAnalysis(s"Correlated scalar subqueries must be aggregated: $fail")
+          }
+
+          // Only certain operators are allowed to host subquery expressions containing
+          // outer references.
+          plan match {
+            case _: Filter | _: Aggregate | _: Project => // Ok
+            case other => failAnalysis(
+              "Correlated scalar sub-queries can only be used in a " +
+                s"Filter/Aggregate/Project: $plan")
+          }
+        }
+
+      case inSubqueryOrExistsSubquery =>
+        plan match {
+          case _: Filter => // Ok
+          case _ =>
+            failAnalysis(s"IN/EXISTS predicate sub-queries can only be used in a Filter: $plan")
+        }
+    }
+
+    // Validate the subquery plan.
+    checkAnalysis(expr.plan)
--- End diff --

See the PR: https://github.com/apache/spark/pull/17930

It should be moved earlier. 
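The SPARK-18504/SPARK-18814 rule quoted in the diff above — every GROUP BY column of a correlated scalar subquery must also be a correlated column, otherwise the subquery can return more than one row per outer row — can be sketched outside Spark with plain sets. This is an illustrative sketch only; the function name and string column names are hypothetical, not Spark APIs.

```python
def check_group_by_correlated(group_by_cols: set, correlated_cols: set) -> None:
    """Sketch of the GROUP BY check from the diff above: reject any GROUP BY
    column of a correlated scalar subquery that is not a correlated column."""
    invalid = group_by_cols - correlated_cols
    if invalid:
        raise ValueError(
            "A GROUP BY clause in a scalar correlated subquery "
            "cannot contain non-correlated columns: " + ",".join(sorted(invalid)))

# Grouped only on the correlated column: accepted.
check_group_by_correlated({"c1"}, {"c1", "c2"})

# Grouped on a non-correlated column c3: rejected.
try:
    check_group_by_correlated({"c1", "c3"}, {"c1"})
except ValueError as e:
    print(e)
```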





[GitHub] spark pull request #17911: [SPARK-20668][SQL] Modify ScalaUDF to handle null...

2017-05-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17911





[GitHub] spark pull request #17918: [SPARK-20678][SQL] Ndv for columns not in filter ...

2017-05-09 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17918#discussion_r115664182
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala ---
@@ -217,32 +217,18 @@ case class InnerOuterEstimation(conf: SQLConf, join: Join) extends Logging {
       if (joinKeyStats.contains(a)) {
         outputAttrStats += a -> joinKeyStats(a)
       } else {
-        val leftRatio = if (leftRows != 0) {
-          BigDecimal(outputRows) / BigDecimal(leftRows)
-        } else {
-          BigDecimal(0)
-        }
-        val rightRatio = if (rightRows != 0) {
-          BigDecimal(outputRows) / BigDecimal(rightRows)
-        } else {
-          BigDecimal(0)
-        }
         val oldColStat = oldAttrStats(a)
         val oldNdv = oldColStat.distinctCount
-        // We only change (scale down) the number of distinct values if the number of rows
-        // decreases after join, because join won't produce new values even if the number of
-        // rows increases.
-        val newNdv = if (join.left.outputSet.contains(a) && leftRatio < 1) {
-          ceil(BigDecimal(oldNdv) * leftRatio)
-        } else if (join.right.outputSet.contains(a) && rightRatio < 1) {
-          ceil(BigDecimal(oldNdv) * rightRatio)
+        val newNdv = if (join.left.outputSet.contains(a)) {
+          updateNdv(oldNumRows = leftRows, newNumRows = outputRows, oldNdv = oldNdv)
         } else {
-          oldNdv
+          updateNdv(oldNumRows = rightRows, newNumRows = outputRows, oldNdv = oldNdv)
         }
+        val newColStat =
--- End diff --

Oh, I looked at the wrong file. I'll fix it now
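The `updateNdv` helper that the refactored diff calls is not shown in the quoted hunk. Its intended behavior, per the removed comment — scale the number of distinct values down only when the row count shrinks, since a join never produces new values — can be sketched as follows. This is assumed semantics for illustration, not the actual Spark implementation.

```python
import math

def update_ndv(old_num_rows: int, new_num_rows: int, old_ndv: int) -> int:
    """Sketch of the updateNdv helper referenced in the diff: NDV is only
    scaled down when rows decrease after the join, because a join cannot
    introduce values that were not already present in the input."""
    if old_num_rows == 0 or new_num_rows >= old_num_rows:
        return old_ndv
    return math.ceil(old_ndv * new_num_rows / old_num_rows)

print(update_ndv(100, 50, 40))   # rows halved -> NDV scaled down to 20
print(update_ndv(100, 200, 40))  # rows grew -> NDV unchanged at 40
```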





[GitHub] spark pull request #17918: [SPARK-20678][SQL] Ndv for columns not in filter ...

2017-05-09 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17918#discussion_r115664061
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala ---
@@ -217,32 +217,18 @@ case class InnerOuterEstimation(conf: SQLConf, join: Join) extends Logging {
       if (joinKeyStats.contains(a)) {
         outputAttrStats += a -> joinKeyStats(a)
       } else {
-        val leftRatio = if (leftRows != 0) {
-          BigDecimal(outputRows) / BigDecimal(leftRows)
-        } else {
-          BigDecimal(0)
-        }
-        val rightRatio = if (rightRows != 0) {
-          BigDecimal(outputRows) / BigDecimal(rightRows)
-        } else {
-          BigDecimal(0)
-        }
         val oldColStat = oldAttrStats(a)
         val oldNdv = oldColStat.distinctCount
-        // We only change (scale down) the number of distinct values if the number of rows
-        // decreases after join, because join won't produce new values even if the number of
-        // rows increases.
-        val newNdv = if (join.left.outputSet.contains(a) && leftRatio < 1) {
-          ceil(BigDecimal(oldNdv) * leftRatio)
-        } else if (join.right.outputSet.contains(a) && rightRatio < 1) {
-          ceil(BigDecimal(oldNdv) * rightRatio)
+        val newNdv = if (join.left.outputSet.contains(a)) {
+          updateNdv(oldNumRows = leftRows, newNumRows = outputRows, oldNdv = oldNdv)
         } else {
-          oldNdv
+          updateNdv(oldNumRows = rightRows, newNumRows = outputRows, oldNdv = oldNdv)
         }
+        val newColStat =
--- End diff --

Yes, please look at [this line](https://github.com/apache/spark/pull/17918/files#diff-e068b2e4d8b82a9587450cd17d8d7226R791). I don't know why it is not folded.





[GitHub] spark issue #17911: [SPARK-20668][SQL] Modify ScalaUDF to handle nullability...

2017-05-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17911
  
Thanks! Merging to master.





[GitHub] spark pull request #17918: [SPARK-20678][SQL] Ndv for columns not in filter ...

2017-05-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17918#discussion_r115663707
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala ---
@@ -217,32 +217,18 @@ case class InnerOuterEstimation(conf: SQLConf, join: Join) extends Logging {
       if (joinKeyStats.contains(a)) {
         outputAttrStats += a -> joinKeyStats(a)
       } else {
-        val leftRatio = if (leftRows != 0) {
-          BigDecimal(outputRows) / BigDecimal(leftRows)
-        } else {
-          BigDecimal(0)
-        }
-        val rightRatio = if (rightRows != 0) {
-          BigDecimal(outputRows) / BigDecimal(rightRows)
-        } else {
-          BigDecimal(0)
-        }
         val oldColStat = oldAttrStats(a)
         val oldNdv = oldColStat.distinctCount
-        // We only change (scale down) the number of distinct values if the number of rows
-        // decreases after join, because join won't produce new values even if the number of
-        // rows increases.
-        val newNdv = if (join.left.outputSet.contains(a) && leftRatio < 1) {
-          ceil(BigDecimal(oldNdv) * leftRatio)
-        } else if (join.right.outputSet.contains(a) && rightRatio < 1) {
-          ceil(BigDecimal(oldNdv) * rightRatio)
+        val newNdv = if (join.left.outputSet.contains(a)) {
+          updateNdv(oldNumRows = leftRows, newNumRows = outputRows, oldNdv = oldNdv)
         } else {
-          oldNdv
+          updateNdv(oldNumRows = rightRows, newNumRows = outputRows, oldNdv = oldNdv)
         }
+        val newColStat =
--- End diff --

is it fixed?





[GitHub] spark pull request #17928: [SPARK-20311][SQL] Support aliases for table valu...

2017-05-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17928#discussion_r115663460
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala ---
@@ -468,7 +468,18 @@ class PlanParserSuite extends PlanTest {
   test("table valued function") {
     assertEqual(
       "select * from range(2)",
-      UnresolvedTableValuedFunction("range", Literal(2) :: Nil).select(star()))
+      UnresolvedTableValuedFunction("range", Literal(2) :: Nil, Seq.empty).select(star()))
+  }
+
+  test("SPARK-20311 range(N) as alias") {
+    assertEqual(
+      "select * from range(10) AS t",
--- End diff --

You can also fix the similar issues in your other test cases.





[GitHub] spark issue #17928: [SPARK-20311][SQL] Support aliases for table value funct...

2017-05-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17928
  
LGTM





[GitHub] spark pull request #17928: [SPARK-20311][SQL] Support aliases for table valu...

2017-05-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17928#discussion_r115663412
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala ---
@@ -468,7 +468,18 @@ class PlanParserSuite extends PlanTest {
   test("table valued function") {
     assertEqual(
       "select * from range(2)",
-      UnresolvedTableValuedFunction("range", Literal(2) :: Nil).select(star()))
+      UnresolvedTableValuedFunction("range", Literal(2) :: Nil, Seq.empty).select(star()))
+  }
+
+  test("SPARK-20311 range(N) as alias") {
+    assertEqual(
+      "select * from range(10) AS t",
--- End diff --

Nit: ` SELECT * FROM range(10) AS t`
BTW, we prefer to use upper case for the SQL keywords. 





[GitHub] spark pull request #17912: [SPARK-20670] [ML] Simplify FPGrowth transform

2017-05-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17912





[GitHub] spark issue #17929: [SPARK-20686][SQL] PropagateEmptyRelation incorrectly ha...

2017-05-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17929
  
thanks, merging to master/2.2/2.1!





[GitHub] spark pull request #17928: [SPARK-20311][SQL] Support aliases for table valu...

2017-05-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17928#discussion_r115663190
  
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -472,15 +472,23 @@ identifierComment
     ;

 relationPrimary
-    : tableIdentifier sample? (AS? strictIdentifier)?        #tableName
-    | '(' queryNoWith ')' sample? (AS? strictIdentifier)?    #aliasedQuery
-    | '(' relation ')' sample? (AS? strictIdentifier)?       #aliasedRelation
-    | inlineTable                                            #inlineTableDefault2
-    | identifier '(' (expression (',' expression)*)? ')'     #tableValuedFunction
+    : tableIdentifier sample? (AS? strictIdentifier)?      #tableName
+    | '(' queryNoWith ')' sample? (AS? strictIdentifier)?  #aliasedQuery
+    | '(' relation ')' sample? (AS? strictIdentifier)?     #aliasedRelation
+    | inlineTable                                          #inlineTableDefault2
+    | functionTable                                        #tableValuedFunction
     ;

 inlineTable
-    : VALUES expression (',' expression)* (AS? identifier identifierList?)?
+    : VALUES expression (',' expression)* tableAlias
     ;

+functionTable
+    : identifier '(' (expression (',' expression)*)? ')' tableAlias
+    ;
+
+tableAlias
+    : (AS? strictIdentifier identifierList?)?
--- End diff --

This also hits another bug in inline tables. Maybe you can also include the following query in the test case `inline-table.sql`?
```
sql("SELECT * FROM VALUES (\"one\", 1), (\"two\", 2), (\"three\", null) CROSS JOIN VALUES (\"one\", 1), (\"two\", 2), (\"three\", null)")
```





[GitHub] spark issue #17912: [SPARK-20670] [ML] Simplify FPGrowth transform

2017-05-09 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17912
  
merged to master





[GitHub] spark pull request #17929: [SPARK-20686][SQL] PropagateEmptyRelation incorre...

2017-05-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17929





[GitHub] spark issue #17887: [SPARK-20399][SQL] Add a config to fallback string liter...

2017-05-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17887
  
LGTM except the document change like @gatorsmile suggested





[GitHub] spark issue #17929: [SPARK-20686][SQL] PropagateEmptyRelation incorrectly ha...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17929
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76724/
Test PASSed.





[GitHub] spark issue #17929: [SPARK-20686][SQL] PropagateEmptyRelation incorrectly ha...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17929
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17929: [SPARK-20686][SQL] PropagateEmptyRelation incorrectly ha...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17929
  
**[Test build #76724 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76724/testReport)** for PR 17929 at commit [`fe8fe9a`](https://github.com/apache/spark/commit/fe8fe9a79f6a0228a08298c60942d6becfe20338).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17932: [SPARK-20689][PYSPARK] python doctest leaking bucketed t...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17932
  
**[Test build #76730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76730/testReport)** for PR 17932 at commit [`8890b60`](https://github.com/apache/spark/commit/8890b60de7b94b97b9d87560cbb06faa8a838bf3).





[GitHub] spark pull request #17909: [SPARK-20661][WIP] try to dump table names

2017-05-09 Thread felixcheung
Github user felixcheung closed the pull request at:

https://github.com/apache/spark/pull/17909





[GitHub] spark issue #17909: [SPARK-20661][WIP] try to dump table names

2017-05-09 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17909
  
opened PR 17932





[GitHub] spark issue #17887: [SPARK-20399][SQL] Add a config to fallback string liter...

2017-05-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17887
  
OK. I thought about it too after reading the doc of RLike.





[GitHub] spark issue #17887: [SPARK-20399][SQL] Add a config to fallback string liter...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17887
  
**[Test build #76729 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76729/testReport)** for PR 17887 at commit [`3241b88`](https://github.com/apache/spark/commit/3241b88c37478652c78b1d8d4809385b47410c51).





[GitHub] spark pull request #17932: [SPARK-20689][PYSPARK] python doctest leaking buc...

2017-05-09 Thread felixcheung
GitHub user felixcheung opened a pull request:

https://github.com/apache/spark/pull/17932

[SPARK-20689][PYSPARK] python doctest leaking bucketed table

## What changes were proposed in this pull request?

It turns out the pyspark doctest is calling saveAsTable without ever dropping the tables. Since we have separate python tests for bucketed tables, and the doctest does not check results, there is really no need to run the doctest.

## How was this patch tested?

Jenkins

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixcheung/spark pytablecleanup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17932.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17932


commit 8890b60de7b94b97b9d87560cbb06faa8a838bf3
Author: Felix Cheung 
Date:   2017-05-10T06:16:16Z

disable run bucketBy saveAsTable in pyspark doctest







[GitHub] spark issue #17887: [SPARK-20399][SQL] Add a config to fallback string liter...

2017-05-09 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/17887
  
Thanks @viirya! We'll backport it and test it out soon.





[GitHub] spark issue #17887: [SPARK-20399][SQL] Add a config to fallback string liter...

2017-05-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17887
  
Could you also add some examples in the function descriptions? It might help users understand how to escape it correctly. Thanks!





[GitHub] spark pull request #17711: [SPARK-19951][SQL] Add string concatenate operato...

2017-05-09 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17711#discussion_r115661251
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/operator.sql ---
@@ -32,3 +32,11 @@ select 1 - 2;
 select 2 * 5;
 select 5 % 3;
 select pmod(-7, 3);
+
+-- check operator precedence
--- End diff --

ok





[GitHub] spark pull request #17887: [SPARK-20399][SQL] Add a config to fallback strin...

2017-05-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17887#discussion_r115661190
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -196,6 +196,14 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val ESCAPED_STRING_LITERALS = 
buildConf("spark.sql.parser.escapedStringLiterals")
+.internal()
+.doc("When true, string literals (including regex patterns) remains 
escaped in our SQL " +
--- End diff --

Sure.





[GitHub] spark issue #17887: [SPARK-20399][SQL] Add a config to fallback string liter...

2017-05-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17887
  
@gatorsmile OK. Let me update it.





[GitHub] spark issue #17887: [SPARK-20399][SQL] Add a config to fallback string liter...

2017-05-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17887
  
Could you update the descriptions of the affected functions, for example 
`RLike`? I believe @dbtsai's team is not the only one to hit this issue, so it 
should be documented in the function descriptions. Thanks!
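As an aside, the escaping pitfall behind this request can be sketched in Python (an analogy to illustrate why the documentation matters, not Spark's parser): each layer of string-literal unescaping forces users to double the backslashes in their regex patterns.

```python
import re

# The regex engine must receive the two characters \d to match a digit.
raw_pattern = r"\d+"        # raw literal: no unescaping, like "escaped" mode
source_literal = "\\d+"     # ordinary literal: unescaped once by the parser
# Both reach the engine as \d+ here; the point is that each extra layer of
# unescaping forces users to double their backslashes in source text.

assert source_literal == raw_pattern
assert re.fullmatch(raw_pattern, "2017") is not None
assert re.fullmatch(raw_pattern, "abc") is None
```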





[GitHub] spark pull request #17711: [SPARK-19951][SQL] Add string concatenate operato...

2017-05-09 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17711#discussion_r115661240
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
 ---
@@ -290,4 +290,15 @@ class SparkSqlParserSuite extends PlanTest {
   basePlan,
   numPartitions = newConf.numShufflePartitions)))
   }
+
+  test("pipeline concatenation") {
+val concat = Concat(
+  Concat(UnresolvedAttribute("a") :: UnresolvedAttribute("b") :: Nil) 
::
+  UnresolvedAttribute("c") ::
+  Nil
+)
+assertEqual(
+  "SELECT a || b || c FROM t",
--- End diff --

ok, I'll try to add a new rule for that. Thanks!
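For context, the flattening rule being discussed can be sketched like this in Python (a toy expression tree with hypothetical names, not Catalyst's actual API): nested `Concat` nodes produced by left-associative parsing are collapsed into one flat node.

```python
from dataclasses import dataclass


@dataclass
class Attr:
    name: str


@dataclass
class Concat:
    children: list


def flatten_concat(expr):
    """Collapse nested Concat nodes into a single flat Concat,
    the way an optimizer-style rewrite could handle a || b || c."""
    if not isinstance(expr, Concat):
        return expr
    flat = []
    for child in (flatten_concat(c) for c in expr.children):
        if isinstance(child, Concat):
            flat.extend(child.children)
        else:
            flat.append(child)
    return Concat(flat)


# a || b || c parses left-associatively as Concat(Concat(a, b), c)
nested = Concat([Concat([Attr("a"), Attr("b")]), Attr("c")])
assert flatten_concat(nested) == Concat([Attr("a"), Attr("b"), Attr("c")])
```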





[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17711
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17742: [Spark-11968][ML][MLLIB]Optimize MLLIB ALS recommendForA...

2017-05-09 Thread mpjlu
Github user mpjlu commented on the issue:

https://github.com/apache/spark/pull/17742
  
Thanks, I will run some tests based on BLAS level 3.





[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17711
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76723/
Test PASSed.





[GitHub] spark issue #17931: [SPARK-12837][CORE][FOLLOWUP] getting name should not fa...

2017-05-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17931
  
We keep the `SQLMetrics` sent from executors as UI data, while the actual 
accumulator registered for `SQLMetrics` may be garbage collected once the 
`SparkPlan` linked to it is garbage collected.

Task-context accumulators don't have this problem, as we always keep the 
registered accumulators in the DAGScheduler.
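The lifecycle described above can be sketched in Python with a weak reference (the names are hypothetical, not Spark's actual classes): the UI keeps only a snapshot of the metric, so the accumulator itself is free to be collected once its owner goes away.

```python
import gc
import weakref


class Accumulator:
    """Toy stand-in for a metric accumulator owned by a plan node."""
    def __init__(self, name):
        self.name = name


acc = Accumulator("numOutputRows")
ui_data = {"metric": acc.name, "value": 42}   # snapshot kept for the UI
ref = weakref.ref(acc)                        # does not keep acc alive

del acc          # the owning plan-like object is dropped
gc.collect()

assert ref() is None            # the accumulator was garbage collected
assert ui_data["value"] == 42   # but the UI snapshot survives
```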





[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17711
  
**[Test build #76723 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76723/testReport)**
 for PR 17711 at commit 
[`8890b94`](https://github.com/apache/spark/commit/8890b94189eb087bf51da5c3dd0880c33b8a1f20).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17887: [SPARK-20399][SQL] Add a config to fallback strin...

2017-05-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17887#discussion_r115660739
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -196,6 +196,14 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val ESCAPED_STRING_LITERALS = 
buildConf("spark.sql.parser.escapedStringLiterals")
+.internal()
+.doc("When true, string literals (including regex patterns) remains 
escaped in our SQL " +
--- End diff --

Nit: `remains` -> `remain`





[GitHub] spark pull request #17711: [SPARK-19951][SQL] Add string concatenate operato...

2017-05-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17711#discussion_r115660393
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/operator.sql ---
@@ -32,3 +32,11 @@ select 1 - 2;
 select 2 * 5;
 select 5 % 3;
 select pmod(-7, 3);
+
+-- check operator precedence
--- End diff --

Nit: rename it to `operators.sql`?





[GitHub] spark issue #17887: [SPARK-20399][SQL] Add a config to fallback string liter...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17887
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #17711: [SPARK-19951][SQL] Add string concatenate operato...

2017-05-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17711#discussion_r115660303
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
 ---
@@ -290,4 +290,15 @@ class SparkSqlParserSuite extends PlanTest {
   basePlan,
   numPartitions = newConf.numShufflePartitions)))
   }
+
+  test("pipeline concatenation") {
+val concat = Concat(
+  Concat(UnresolvedAttribute("a") :: UnresolvedAttribute("b") :: Nil) 
::
+  UnresolvedAttribute("c") ::
+  Nil
+)
+assertEqual(
+  "SELECT a || b || c FROM t",
--- End diff --

Yes. I prefer the simpler code.





[GitHub] spark issue #17887: [SPARK-20399][SQL] Add a config to fallback string liter...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17887
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76722/
Test PASSed.





[GitHub] spark issue #17887: [SPARK-20399][SQL] Add a config to fallback string liter...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17887
  
**[Test build #76722 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76722/testReport)**
 for PR 17887 at commit 
[`9ce7eb0`](https://github.com/apache/spark/commit/9ce7eb0450249fdc25e19adf6bcfe35b274dd086).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17770
  
**[Test build #76728 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76728/testReport)**
 for PR 17770 at commit 
[`c313e35`](https://github.com/apache/spark/commit/c313e353104fc93ba72a2152a7044a6ea8c06311).





[GitHub] spark pull request #17917: [SPARK-20600][SS] KafkaRelation should be pretty ...

2017-05-09 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17917#discussion_r115659920
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaRelation.scala
 ---
@@ -143,4 +143,6 @@ private[kafka010] class KafkaRelation(
 validateTopicPartitions(partitions, partitionOffsets)
 }
   }
+
+  override def toString: String = "kafka"
 }
--- End diff --

How about giving some more information about the Kafka source, like topic and 
partition? Refer to 
https://github.com/jaceklaskowski/spark/blob/2ffe4476553cfe50eb6392d8e573545a92fef737/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala#L140
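The linked `JDBCRelation.toString` builds its description from the relation's parameters; a minimal Python sketch of the same idea (the class and field names here are hypothetical, not Spark's):

```python
class KafkaRelationInfo:
    """Toy stand-in showing a toString/__repr__ that surfaces
    source parameters instead of a bare source name."""
    def __init__(self, topics, starting_offsets):
        self.topics = topics
        self.starting_offsets = starting_offsets

    def __repr__(self):
        topic_list = ", ".join(self.topics)
        return f"kafka(topics=[{topic_list}], startingOffsets={self.starting_offsets})"


rel = KafkaRelationInfo(["events"], "earliest")
assert repr(rel) == "kafka(topics=[events], startingOffsets=earliest)"
```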





[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

2017-05-09 Thread debasish83
Github user debasish83 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17862#discussion_r115659479
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala ---
@@ -154,22 +159,23 @@ class LinearSVCSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defau
 
   test("linearSVC with sample weights") {
 def modelEquals(m1: LinearSVCModel, m2: LinearSVCModel): Unit = {
-  assert(m1.coefficients ~== m2.coefficients absTol 0.05)
+  assert(m1.coefficients ~== m2.coefficients absTol 0.07)
   assert(m1.intercept ~== m2.intercept absTol 0.05)
 }
-
-val estimator = new LinearSVC().setRegParam(0.01).setTol(0.01)
-val dataset = smallBinaryDataset
-MLTestingUtils.testArbitrarilyScaledWeights[LinearSVCModel, LinearSVC](
-  dataset.as[LabeledPoint], estimator, modelEquals)
-MLTestingUtils.testOutliersWithSmallWeights[LinearSVCModel, LinearSVC](
-  dataset.as[LabeledPoint], estimator, 2, modelEquals, outlierRatio = 
3)
-MLTestingUtils.testOversamplingVsWeighting[LinearSVCModel, LinearSVC](
-  dataset.as[LabeledPoint], estimator, modelEquals, 42L)
+LinearSVC.supportedOptimizers.foreach { opt =>
+  val estimator = new 
LinearSVC().setRegParam(0.02).setTol(0.01).setSolver(opt)
+  val dataset = smallBinaryDataset
+  MLTestingUtils.testArbitrarilyScaledWeights[LinearSVCModel, 
LinearSVC](
+dataset.as[LabeledPoint], estimator, modelEquals)
+  MLTestingUtils.testOutliersWithSmallWeights[LinearSVCModel, 
LinearSVC](
+dataset.as[LabeledPoint], estimator, 2, modelEquals, outlierRatio 
= 3)
+  MLTestingUtils.testOversamplingVsWeighting[LinearSVCModel, 
LinearSVC](
+dataset.as[LabeledPoint], estimator, modelEquals, 42L)
+}
   }
 
-  test("linearSVC comparison with R e1071 and scikit-learn") {
-val trainer1 = new LinearSVC()
+  test("linearSVC OWLQN comparison with R e1071 and scikit-learn") {
+val trainer1 = new LinearSVC().setSolver(LinearSVC.OWLQN)
   .setRegParam(0.2) // set regParam = 2.0 / datasize / c
--- End diff --

These slides also explain it; please see slide 32. The max can be replaced by 
a soft-max whose softness parameter lambda can be tuned. Log-sum-exp is a 
standard soft-max that can be used, similar to ReLU functions, and we can 
reuse it from MLP:
ftp://ftp.cs.wisc.edu/math-prog/talks/informs99ssv.ps
ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/99-03.pdf
I can add the formulation if there is interest. It needs some tuning of the 
soft-max parameter, but the convergence will be good with L-BFGS (OWLQN is not 
needed).
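A minimal Python sketch of the log-sum-exp soft-max mentioned above: it is a smooth upper bound on `max` whose tightness is controlled by the softness parameter lambda.

```python
import math


def log_sum_exp_max(xs, lam=10.0):
    """Smooth approximation of max(xs): (1/lam) * log(sum(exp(lam * x))).
    Larger lam gives a tighter (but less smooth) approximation."""
    m = max(xs)  # shift by the max for numerical stability
    return m + math.log(sum(math.exp(lam * (x - m)) for x in xs)) / lam


xs = [0.0, 1.0, 3.0]
# The soft-max upper-bounds the true max and tightens as lam grows.
assert log_sum_exp_max(xs, lam=1.0) >= max(xs)
assert log_sum_exp_max(xs, lam=50.0) - max(xs) < log_sum_exp_max(xs, lam=1.0) - max(xs)
assert abs(log_sum_exp_max(xs, lam=50.0) - 3.0) < 1e-3
```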





[GitHub] spark pull request #17913: [SPARK-20672][SS] Keep the `isStreaming` property...

2017-05-09 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17913#discussion_r115659132
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala
 ---
@@ -48,7 +48,7 @@ case class StreamingRelation(dataSource: DataSource, 
sourceName: String, output:
  * Used to link a streaming [[Source]] of data into a
  * [[org.apache.spark.sql.catalyst.plans.logical.LogicalPlan]].
  */
-case class StreamingExecutionRelation(source: Source, output: 
Seq[Attribute]) extends LeafNode {
+case class StreamingSourceRelation(source: Source, output: Seq[Attribute]) 
extends LeafNode {
   override def isStreaming: Boolean = true
--- End diff --

just one renaming





[GitHub] spark issue #17742: [Spark-11968][ML][MLLIB]Optimize MLLIB ALS recommendForA...

2017-05-09 Thread mengxr
Github user mengxr commented on the issue:

https://github.com/apache/spark/pull/17742
  
A single buffer doesn't lead to long GC pauses. If it requests a lot of 
memory, it might trigger GC to collect other objects, but it is itself a 
single object that can be easily GC'ed. The problem here is having many small 
long-lived objects, as in `output`.





[GitHub] spark issue #17913: [SPARK-20672][SS] Keep the `isStreaming` property in tri...

2017-05-09 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17913
  
cc @zsxwing 





[GitHub] spark issue #17929: [SPARK-20686][SQL] PropagateEmptyRelation incorrectly ha...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17929
  
**[Test build #76727 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76727/testReport)**
 for PR 17929 at commit 
[`fe8fe9a`](https://github.com/apache/spark/commit/fe8fe9a79f6a0228a08298c60942d6becfe20338).





[GitHub] spark issue #17929: [SPARK-20686][SQL] PropagateEmptyRelation incorrectly ha...

2017-05-09 Thread JoshRosen
Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/17929
  
jenkins retest this please





[GitHub] spark issue #17742: [Spark-11968][ML][MLLIB]Optimize MLLIB ALS recommendForA...

2017-05-09 Thread mpjlu
Github user mpjlu commented on the issue:

https://github.com/apache/spark/pull/17742
  
I don't think we should use BLAS level 3 here, because whether or not we use 
`output`, we need a big buffer to hold the BLAS result, which still causes GC 
problems.
I also want to test building a handwritten native gemm that does not return C 
(C = A*B) but only returns the top-K elements of each row. This may perform 
better than the current solution.
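A minimal Python sketch of that idea (illustrative only, not the proposed native gemm): compute each output row on the fly and keep only its top-K entries in a small heap, so the full result matrix C is never materialized.

```python
import heapq


def topk_rows_of_product(A, B, k):
    """For each row of A, return the top-k (value, column) pairs of the
    corresponding row of A @ B, without storing the full product."""
    n_inner = len(B)
    n_cols = len(B[0])
    out = []
    for row in A:
        heap = []  # size-k min-heap of (value, col)
        for j in range(n_cols):
            v = sum(row[i] * B[i][j] for i in range(n_inner))
            if len(heap) < k:
                heapq.heappush(heap, (v, j))
            elif v > heap[0][0]:
                heapq.heapreplace(heap, (v, j))
        out.append(sorted(heap, reverse=True))
    return out


A = [[1.0, 0.0], [0.0, 2.0]]
B = [[3.0, 1.0, 2.0], [1.0, 4.0, 0.0]]
# The full product would be [[3, 1, 2], [2, 8, 0]]; keep only top-2 per row.
assert topk_rows_of_product(A, B, 2) == [[(3.0, 0), (2.0, 2)], [(8.0, 1), (2.0, 0)]]
```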





[GitHub] spark issue #17929: [SPARK-20686][SQL] PropagateEmptyRelation incorrectly ha...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17929
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17929: [SPARK-20686][SQL] PropagateEmptyRelation incorrectly ha...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17929
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76718/
Test FAILed.





[GitHub] spark issue #17929: [SPARK-20686][SQL] PropagateEmptyRelation incorrectly ha...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17929
  
**[Test build #76718 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76718/testReport)**
 for PR 17929 at commit 
[`ce56596`](https://github.com/apache/spark/commit/ce565965f293c753d32a30a41eb624a9aabfa94e).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

2017-05-09 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/17862#discussion_r115657829
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala ---
@@ -223,6 +229,25 @@ class LinearSVCSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defau
 assert(model1.coefficients ~== coefficientsSK relTol 4E-3)
   }
 
+  test("linearSVC L-BFGS comparison with R e1071 and scikit-learn") {
+val trainer1 = new LinearSVC().setSolver(LinearSVC.LBFGS)
+  .setRegParam(0.3)
--- End diff --

Indeed. I could use your help here, since I cannot find the theoretical proof 
for this case.





[GitHub] spark issue #17916: [SPARK-20590][SQL] Use Spark internal datasource if mult...

2017-05-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17916
  
Thanks everyone.





[GitHub] spark issue #17916: [SPARK-20590][SQL] Use Spark internal datasource if mult...

2017-05-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17916
  
thanks, merging to master/2.2!





[GitHub] spark pull request #17916: [SPARK-20590][SQL] Use Spark internal datasource ...

2017-05-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17916





[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16989
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16989
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76721/
Test PASSed.





[GitHub] spark issue #17869: [SPARK-20609][CORE]Run the SortShuffleSuite unit tests h...

2017-05-09 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/17869
  
Shall we run the tests again?





[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16989
  
**[Test build #76721 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76721/testReport)**
 for PR 16989 at commit 
[`cfa54d7`](https://github.com/apache/spark/commit/cfa54d775f043f3019aad2e8535276d249dab64d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17909: [SPARK-20661][WIP] try to dump table names

2017-05-09 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17909
  
Ah, you are right. I'm going to fix the Python code.





[GitHub] spark issue #17928: [SPARK-20311][SQL] Support aliases for table value funct...

2017-05-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17928
  
LGTM, cc @gatorsmile to take another look





[GitHub] spark issue #17931: [SPARK-12837][CORE][FOLLOWUP] getting name should not fa...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17931
  
**[Test build #76726 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76726/testReport)**
 for PR 17931 at commit 
[`5ae4f6e`](https://github.com/apache/spark/commit/5ae4f6e147a97c4c4af5d82810597ad92b10ad1f).





[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

2017-05-09 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/17862#discussion_r115656752
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala ---
@@ -154,22 +159,23 @@ class LinearSVCSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defau
 
   test("linearSVC with sample weights") {
 def modelEquals(m1: LinearSVCModel, m2: LinearSVCModel): Unit = {
-  assert(m1.coefficients ~== m2.coefficients absTol 0.05)
+  assert(m1.coefficients ~== m2.coefficients absTol 0.07)
   assert(m1.intercept ~== m2.intercept absTol 0.05)
 }
-
-val estimator = new LinearSVC().setRegParam(0.01).setTol(0.01)
-val dataset = smallBinaryDataset
-MLTestingUtils.testArbitrarilyScaledWeights[LinearSVCModel, LinearSVC](
-  dataset.as[LabeledPoint], estimator, modelEquals)
-MLTestingUtils.testOutliersWithSmallWeights[LinearSVCModel, LinearSVC](
-  dataset.as[LabeledPoint], estimator, 2, modelEquals, outlierRatio = 
3)
-MLTestingUtils.testOversamplingVsWeighting[LinearSVCModel, LinearSVC](
-  dataset.as[LabeledPoint], estimator, modelEquals, 42L)
+LinearSVC.supportedOptimizers.foreach { opt =>
+  val estimator = new 
LinearSVC().setRegParam(0.02).setTol(0.01).setSolver(opt)
+  val dataset = smallBinaryDataset
+  MLTestingUtils.testArbitrarilyScaledWeights[LinearSVCModel, 
LinearSVC](
+dataset.as[LabeledPoint], estimator, modelEquals)
+  MLTestingUtils.testOutliersWithSmallWeights[LinearSVCModel, 
LinearSVC](
+dataset.as[LabeledPoint], estimator, 2, modelEquals, outlierRatio 
= 3)
+  MLTestingUtils.testOversamplingVsWeighting[LinearSVCModel, 
LinearSVC](
+dataset.as[LabeledPoint], estimator, modelEquals, 42L)
+}
   }
 
-  test("linearSVC comparison with R e1071 and scikit-learn") {
-val trainer1 = new LinearSVC()
+  test("linearSVC OWLQN comparison with R e1071 and scikit-learn") {
+val trainer1 = new LinearSVC().setSolver(LinearSVC.OWLQN)
   .setRegParam(0.2) // set regParam = 2.0 / datasize / c
--- End diff --

http://www.robots.ox.ac.uk/~az/lectures/ml/lect2.pdf 
Please refer to page 36.





[GitHub] spark issue #17931: [SPARK-12837][CORE][FOLLOWUP] getting name should not fa...

2017-05-09 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17931
  
What's the issue with SQL metrics?






[GitHub] spark issue #17931: [SPARK-12837][CORE][FOLLOWUP] getting name should not fa...

2017-05-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17931
  
cc @vanzin 





[GitHub] spark pull request #17931: [SPARK-12837][CORE][FOLLOWUP] getting name should...

2017-05-09 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/17931

[SPARK-12837][CORE][FOLLOWUP] getting name should not fail if accumulator 
is garbage collected

## What changes were proposed in this pull request?

After https://github.com/apache/spark/pull/17596 , we do not send internal 
accumulator name to executor side anymore, and always look up the accumulator 
name in `AccumulatorContext`.

This causes a regression if the accumulator has already been garbage collected; 
this PR fixes it by still sending the accumulator name for `SQLMetrics`.
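
The idea of the fix can be sketched in a small, self-contained way. This is not 
Spark's actual `AccumulatorContext` / `SQLMetrics` code: `NameRegistry`, `Metric`, 
and `evict` (which stands in for garbage collection here) are all illustrative 
assumptions.

```scala
import scala.collection.mutable

// A metric whose name lives only in a driver-side registry loses it once the
// accumulator is garbage collected; a metric that carries its name explicitly
// (as SQLMetrics will after this change) keeps it.
object NameRegistry {
  private val names = mutable.Map.empty[Long, String]
  def register(id: Long, name: String): Unit = names(id) = name
  def evict(id: Long): Unit = names.remove(id) // stands in for GC in this sketch
  def lookup(id: Long): Option[String] = names.get(id)
}

final case class Metric(id: Long, explicitName: Option[String]) {
  // Prefer the name shipped with the metric; fall back to the registry,
  // and never fail if both are gone.
  def name: Option[String] = explicitName.orElse(NameRegistry.lookup(id))
}

NameRegistry.register(1L, "number of output rows")
val sqlMetric = Metric(1L, explicitName = Some("number of output rows"))
val plainAcc = Metric(1L, explicitName = None)
NameRegistry.evict(1L) // the accumulator got garbage collected
```

After eviction, `sqlMetric.name` is still defined while `plainAcc.name` is empty, 
which mirrors why only `SQLMetrics` needs the explicit name.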

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark bug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17931.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17931


commit 5ae4f6e147a97c4c4af5d82810597ad92b10ad1f
Author: Wenchen Fan 
Date:   2017-05-10T05:26:32Z

getting name should not fail if accumulator is garbage collected







[GitHub] spark issue #17930: [SPARK-20688][SQL] correctly check analysis for scalar s...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17930
  
**[Test build #76725 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76725/testReport)**
 for PR 17930 at commit 
[`3ccff2e`](https://github.com/apache/spark/commit/3ccff2e552a42af4c69adaf4d8a1a430a98f85b0).





[GitHub] spark issue #17742: [Spark-11968][ML][MLLIB]Optimize MLLIB ALS recommendForA...

2017-05-09 Thread mpjlu
Github user mpjlu commented on the issue:

https://github.com/apache/spark/pull/17742
  
Thanks @mengxr, glad to meet you here. I am Meng Peng.
I have tested different blockSize, see 
https://issues.apache.org/jira/browse/SPARK-20443
I will test the other methods you mentioned. 





[GitHub] spark pull request #17930: [SPARK-20688][SQL] correctly check analysis for s...

2017-05-09 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/17930

[SPARK-20688][SQL] correctly check analysis for scalar sub-queries

## What changes were proposed in this pull request?

In `CheckAnalysis`, we should call `checkAnalysis` for `ScalarSubquery` at 
the beginning, as later we call `plan.output`, which is invalid if `plan` 
is not resolved.
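
The ordering pitfall can be illustrated with a minimal, self-contained sketch. 
This is not Spark's actual `CheckAnalysis` code: `Plan` and `checkScalarSubquery` 
are illustrative assumptions.

```scala
// `output` is only well-defined once a plan is resolved, so the checker must
// verify resolution before it inspects `plan.output` -- otherwise it throws
// instead of reporting a useful analysis error.
final case class Plan(name: String, resolved: Boolean, columns: Seq[String]) {
  def output: Seq[String] =
    if (resolved) columns
    else throw new IllegalStateException(s"output called on unresolved plan $name")
}

def checkScalarSubquery(plan: Plan): Either[String, Unit] =
  if (!plan.resolved) {
    // Check analysis first, so we fail with a useful error ...
    Left(s"unresolved plan: ${plan.name}")
  } else if (plan.output.length != 1) {
    // ... and only afterwards is it safe to touch plan.output.
    Left(s"scalar subquery must return one column, got ${plan.output.length}")
  } else {
    Right(())
  }
```

Flipping the two checks would make the unresolved case throw from `output` rather 
than return the intended error.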

## How was this patch tested?

new regression test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark tmp

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17930.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17930


commit 3ccff2e552a42af4c69adaf4d8a1a430a98f85b0
Author: Wenchen Fan 
Date:   2017-05-10T04:58:59Z

correctly check analysis for scalar sub-queries







[GitHub] spark issue #17930: [SPARK-20688][SQL] correctly check analysis for scalar s...

2017-05-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17930
  
cc @rxin @gatorsmile 





[GitHub] spark issue #17742: [Spark-11968][ML][MLLIB]Optimize MLLIB ALS recommendForA...

2017-05-09 Thread mengxr
Github user mengxr commented on the issue:

https://github.com/apache/spark/pull/17742
  
I think the problem is neither BLAS-3 ops nor the 256MB total memory. The `val 
output = new Array[(Int, (Int, Double))](m * n)` is not specialized, so each 
element holds two references. If `m=4096` and `n=4096`, in total we have 33.5 
million objects, which causes heavy GC. The implementation in this PR changed `n` 
to `k`, which significantly reduces the total number of temp objects. But that 
doesn't mean we should drop BLAS-3.

@mpjlu Could you test the following?

* change block size to 2048, which reduced the max possible 
* After `val ratings = srcFactors.transpose.multiply(dstFactors)`, do not 
construct `output`. There are two options:
** The most optimized version would do a quickselect on each row and 
select the top k.
** An easy-to-implement version would be:

~~~scala
Iterator.range(0, m).flatMap { i =>
  Iterator.range(0, n).map { j =>
    (srcIds(i), (dstIds(j), ratings(i, j)))
  }
}
~~~

The second option is just a quick test, sacrificing some performance. The 
temp objects created this way are very short-lived, and GC should be able to 
handle them. Then very likely we don't need to do top-k inside ALS, because the 
`topByKey` implementation does the same: 
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/rdd/MLPairRDDFunctions.scala#L42.
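
The first, "most optimized" option above can be sketched self-contained, using a 
size-k min-heap per row instead of quickselect (same O(n log k) per-row effect). 
`topKPerRow` and its parameter names are illustrative assumptions, not Spark's 
ALS API.

```scala
import scala.collection.mutable

// Instead of materializing all m * n (srcId, (dstId, rating)) tuples and then
// taking top-k, keep a size-k heap per row so only m * k candidates survive.
def topKPerRow(
    ratings: Array[Array[Double]],
    dstIds: Array[Int],
    k: Int): Array[Array[(Int, Double)]] = {
  val byRating: Ordering[(Int, Double)] = Ordering.by(_._2)
  ratings.map { row =>
    // Reversed ordering makes the queue a min-heap: `head` is the weakest
    // of the current top-k candidates.
    val heap = mutable.PriorityQueue.empty[(Int, Double)](byRating.reverse)
    var j = 0
    while (j < row.length) {
      if (heap.size < k) {
        heap.enqueue((dstIds(j), row(j)))
      } else if (row(j) > heap.head._2) {
        heap.dequeue()
        heap.enqueue((dstIds(j), row(j)))
      }
      j += 1
    }
    heap.dequeueAll.reverse.toArray // strongest rating first
  }
}

val top2 = topKPerRow(Array(Array(0.1, 0.9, 0.5), Array(0.7, 0.2, 0.4)),
  dstIds = Array(10, 11, 12), k = 2)
```

Only `m * k` tuples are retained per block, which is the same object-count 
reduction the PR achieves by changing `n` to `k`.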





[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17770
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17770
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76719/
Test FAILed.





[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17770
  
**[Test build #76719 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76719/testReport)**
 for PR 17770 at commit 
[`4629959`](https://github.com/apache/spark/commit/462995943b2ff9dd222d545265b4404b2297040e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17929: [SPARK-20686][SQL] PropagateEmptyRelation incorrectly ha...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17929
  
**[Test build #76724 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76724/testReport)**
 for PR 17929 at commit 
[`fe8fe9a`](https://github.com/apache/spark/commit/fe8fe9a79f6a0228a08298c60942d6becfe20338).





[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/17924
  
Yep. Since this approach adds a new dependency on Apache ORC, the 
non-vectorized PR will also need more support (or approval) from the 
committers. I'll wait for more opinions on the current status for a while.





[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17924
  
@dongjoon-hyun It works for me. It also reduces the size of this PR and 
eases the review work.





[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/17924
  
@cloud-fan and @viirya .

Shall we remove the vectorized part from this PR?
- The non-vectorized ORCFileFormat is mandatory, and its performance is 
better than the current one's.
- After merging the `sql/core` ORCFileFormat, many people (including @viirya 
and me) can work on it in parallel.

What do you think?





[GitHub] spark pull request #17924: [SPARK-20682][SQL] Support a new faster ORC data ...

2017-05-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17924#discussion_r115650693
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala ---
@@ -0,0 +1,415 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.orc
+
+import java.io.File
+
+import scala.util.Try
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.io.IntWritable
+import org.apache.hadoop.mapreduce.{JobID, TaskAttemptID, TaskID, TaskType}
+import org.apache.hadoop.mapreduce.lib.input.FileSplit
+import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
+import org.apache.orc.OrcFile
+import org.apache.orc.mapred.OrcStruct
+import org.apache.orc.mapreduce.OrcInputFormat
+import org.apache.orc.storage.ql.exec.vector.{BytesColumnVector, 
LongColumnVector}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.SparkSession
+import 
org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase
+import org.apache.spark.unsafe.types.UTF8String
+import org.apache.spark.util.{Benchmark, Utils}
+
+
+/**
+ * Benchmark to measure orc read performance.
+ *
+ * This is in `sql/hive` module in order to compare `sql/core` and 
`sql/hive` ORC data sources.
+ * After removing `sql/hive` ORC data sources, we need to move this into 
`sql/core` module
+ * like the other ORC test suites.
+ */
+object OrcReadBenchmark {
+  val conf = new SparkConf()
+  conf.set("orc.compression", "snappy")
+
+  private val spark = SparkSession.builder()
+.master("local[1]")
+.appName("OrcReadBenchmark")
+.config(conf)
+.getOrCreate()
+
+  // Set default configs. Individual cases will change them if necessary.
+  spark.conf.set(SQLConf.ORC_VECTORIZED_READER_ENABLED.key, "true")
+  spark.conf.set(SQLConf.ORC_FILTER_PUSHDOWN_ENABLED.key, "true")
+  spark.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true")
+
+  def withTempPath(f: File => Unit): Unit = {
+val path = Utils.createTempDir()
+path.delete()
+try f(path) finally Utils.deleteRecursively(path)
+  }
+
+  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
+try f finally tableNames.foreach(spark.catalog.dropTempView)
+  }
+
+  def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
+val (keys, values) = pairs.unzip
+val currentValues = keys.map(key => Try(spark.conf.get(key)).toOption)
+(keys, values).zipped.foreach(spark.conf.set)
+try f finally {
+  keys.zip(currentValues).foreach {
+case (key, Some(value)) => spark.conf.set(key, value)
+case (key, None) => spark.conf.unset(key)
+  }
+}
+  }
+
+  private val SQL_ORC_FILE_FORMAT = 
"org.apache.spark.sql.execution.datasources.orc.OrcFileFormat"
+  private val HIVE_ORC_FILE_FORMAT = 
"org.apache.spark.sql.hive.orc.OrcFileFormat"
--- End diff --

So, to avoid a datasource name conflict, we may change the Hive ORC 
datasource's shortName to something other than "orc".
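
The conflict can be illustrated with a small, self-contained sketch. This is not 
Spark's actual `DataSource` resolution code: `Provider` and `lookup` are 
illustrative assumptions.

```scala
// Lookup by short name must be unambiguous; if both ORC implementations
// register "orc", resolution fails, so one needs a distinct short name.
final case class Provider(shortName: String, className: String)

def lookup(providers: Seq[Provider], name: String): Either[String, Provider] =
  providers.filter(_.shortName == name) match {
    case Seq(p) => Right(p)                                  // exactly one match
    case Seq()  => Left(s"no datasource registered for '$name'")
    case many   => Left(
      s"ambiguous datasource '$name': ${many.map(_.className).mkString(", ")}")
  }

val registered = Seq(
  Provider("orc", "org.apache.spark.sql.execution.datasources.orc.OrcFileFormat"),
  Provider("hive-orc", "org.apache.spark.sql.hive.orc.OrcFileFormat"))
```

With distinct short names both formats coexist; a second `"orc"` entry would make 
the lookup ambiguous.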





[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17711
  
**[Test build #76723 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76723/testReport)**
 for PR 17711 at commit 
[`8890b94`](https://github.com/apache/spark/commit/8890b94189eb087bf51da5c3dd0880c33b8a1f20).





[GitHub] spark pull request #17924: [SPARK-20682][SQL] Support a new faster ORC data ...

2017-05-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/17924#discussion_r115650375
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala ---
@@ -0,0 +1,415 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.orc
+
+import java.io.File
+
+import scala.util.Try
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.io.IntWritable
+import org.apache.hadoop.mapreduce.{JobID, TaskAttemptID, TaskID, TaskType}
+import org.apache.hadoop.mapreduce.lib.input.FileSplit
+import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
+import org.apache.orc.OrcFile
+import org.apache.orc.mapred.OrcStruct
+import org.apache.orc.mapreduce.OrcInputFormat
+import org.apache.orc.storage.ql.exec.vector.{BytesColumnVector, 
LongColumnVector}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.SparkSession
+import 
org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase
+import org.apache.spark.unsafe.types.UTF8String
+import org.apache.spark.util.{Benchmark, Utils}
+
+
+/**
+ * Benchmark to measure orc read performance.
+ *
+ * This is in `sql/hive` module in order to compare `sql/core` and 
`sql/hive` ORC data sources.
+ * After removing `sql/hive` ORC data sources, we need to move this into 
`sql/core` module
+ * like the other ORC test suites.
+ */
+object OrcReadBenchmark {
+  val conf = new SparkConf()
+  conf.set("orc.compression", "snappy")
+
+  private val spark = SparkSession.builder()
+.master("local[1]")
+.appName("OrcReadBenchmark")
+.config(conf)
+.getOrCreate()
+
+  // Set default configs. Individual cases will change them if necessary.
+  spark.conf.set(SQLConf.ORC_VECTORIZED_READER_ENABLED.key, "true")
+  spark.conf.set(SQLConf.ORC_FILTER_PUSHDOWN_ENABLED.key, "true")
+  spark.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true")
+
+  def withTempPath(f: File => Unit): Unit = {
+val path = Utils.createTempDir()
+path.delete()
+try f(path) finally Utils.deleteRecursively(path)
+  }
+
+  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
+try f finally tableNames.foreach(spark.catalog.dropTempView)
+  }
+
+  def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
+val (keys, values) = pairs.unzip
+val currentValues = keys.map(key => Try(spark.conf.get(key)).toOption)
+(keys, values).zipped.foreach(spark.conf.set)
+try f finally {
+  keys.zip(currentValues).foreach {
+case (key, Some(value)) => spark.conf.set(key, value)
+case (key, None) => spark.conf.unset(key)
+  }
+}
+  }
+
+  private val SQL_ORC_FILE_FORMAT = 
"org.apache.spark.sql.execution.datasources.orc.OrcFileFormat"
+  private val HIVE_ORC_FILE_FORMAT = 
"org.apache.spark.sql.hive.orc.OrcFileFormat"
--- End diff --

We need to keep both versions before complete transition and for safety. 
Instead, we can make configurable which file format is used for `orc` data 
source string, e.g, `USING ORC`.





[GitHub] spark issue #17911: [SPARK-20668][SQL] Modify ScalaUDF to handle nullability...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17911
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17911: [SPARK-20668][SQL] Modify ScalaUDF to handle nullability...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17911
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76717/
Test PASSed.





[GitHub] spark issue #17911: [SPARK-20668][SQL] Modify ScalaUDF to handle nullability...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17911
  
**[Test build #76717 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76717/testReport)** for PR 17911 at commit [`948335c`](https://github.com/apache/spark/commit/948335c4617c82749f8c696b3d363fe193ce236d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17887: [SPARK-20399][SQL] Add a config to fallback string liter...

2017-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17887
  
**[Test build #76722 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76722/testReport)** for PR 17887 at commit [`9ce7eb0`](https://github.com/apache/spark/commit/9ce7eb0450249fdc25e19adf6bcfe35b274dd086).





[GitHub] spark pull request #17887: [SPARK-20399][SQL] Add a config to fallback strin...

2017-05-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17887#discussion_r115649694
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala
 ---
@@ -413,38 +428,102 @@ class ExpressionParserSuite extends PlanTest {
   }
 
   test("strings") {
--- End diff --

Sure.





[GitHub] spark pull request #17711: [SPARK-19951][SQL] Add string concatenate operato...

2017-05-09 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17711#discussion_r115649586
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
 ---
@@ -290,4 +290,15 @@ class SparkSqlParserSuite extends PlanTest {
   basePlan,
   numPartitions = newConf.numShufflePartitions)))
   }
+
+  test("pipeline concatenation") {
+val concat = Concat(
+  Concat(UnresolvedAttribute("a") :: UnresolvedAttribute("b") :: Nil) ::
+  UnresolvedAttribute("c") ::
+  Nil
+)
+assertEqual(
+  "SELECT a || b || c FROM t",
--- End diff --

aha, I see. WDYT, @gatorsmile ?
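
For reference, the nested shape asserted in the test above is what a left-associative fold over the `||` chain produces. A minimal standalone sketch (the `Expr`/`Attr` classes and `foldConcat` are stand-ins for illustration, not Catalyst's actual parser code):

```scala
// Minimal stand-in AST, not Catalyst's classes.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Concat(children: Seq[Expr]) extends Expr

// Fold a left-associative `||` chain:
// a || b || c  =>  Concat(Concat(a, b), c)
def foldConcat(parts: Seq[Expr]): Expr =
  parts.reduceLeft((l, r) => Concat(l :: r :: Nil))

// foldConcat(Seq(Attr("a"), Attr("b"), Attr("c"))) yields
// Concat(Concat(Attr("a") :: Attr("b") :: Nil) :: Attr("c") :: Nil),
// matching the tree asserted in the test.
```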





[GitHub] spark issue #17928: [SPARK-20311][SQL] Support aliases for table value funct...

2017-05-09 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17928
  
@cloud-fan ok, could you check again? Thanks!





[GitHub] spark pull request #17711: [SPARK-19951][SQL] Add string concatenate operato...

2017-05-09 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17711#discussion_r115649532
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/operator.sql ---
@@ -32,3 +32,11 @@ select 1 - 2;
 select 2 * 5;
 select 5 % 3;
 select pmod(-7, 3);
+
+-- check operator precedence
--- End diff --

ok





[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader

2017-05-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13775
  
Hmm. It seems the `Merge remote-tracking branch` commit confuses the rebase. Let me think about how to compare this.





[GitHub] spark pull request #17924: [SPARK-20682][SQL] Support a new faster ORC data ...

2017-05-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17924#discussion_r115649175
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala ---
@@ -0,0 +1,415 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.orc
+
+import java.io.File
+
+import scala.util.Try
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.io.IntWritable
+import org.apache.hadoop.mapreduce.{JobID, TaskAttemptID, TaskID, TaskType}
+import org.apache.hadoop.mapreduce.lib.input.FileSplit
+import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
+import org.apache.orc.OrcFile
+import org.apache.orc.mapred.OrcStruct
+import org.apache.orc.mapreduce.OrcInputFormat
+import org.apache.orc.storage.ql.exec.vector.{BytesColumnVector, LongColumnVector}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase
+import org.apache.spark.unsafe.types.UTF8String
+import org.apache.spark.util.{Benchmark, Utils}
+
+
+/**
+ * Benchmark to measure orc read performance.
+ *
+ * This is in `sql/hive` module in order to compare `sql/core` and `sql/hive` ORC data sources.
+ * After removing `sql/hive` ORC data sources, we need to move this into `sql/core` module
+ * like the other ORC test suites.
+ */
+object OrcReadBenchmark {
+  val conf = new SparkConf()
+  conf.set("orc.compression", "snappy")
+
+  private val spark = SparkSession.builder()
+.master("local[1]")
+.appName("OrcReadBenchmark")
+.config(conf)
+.getOrCreate()
+
+  // Set default configs. Individual cases will change them if necessary.
+  spark.conf.set(SQLConf.ORC_VECTORIZED_READER_ENABLED.key, "true")
+  spark.conf.set(SQLConf.ORC_FILTER_PUSHDOWN_ENABLED.key, "true")
+  spark.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true")
+
+  def withTempPath(f: File => Unit): Unit = {
+val path = Utils.createTempDir()
+path.delete()
+try f(path) finally Utils.deleteRecursively(path)
+  }
+
+  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
+try f finally tableNames.foreach(spark.catalog.dropTempView)
+  }
+
+  def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
+val (keys, values) = pairs.unzip
+val currentValues = keys.map(key => Try(spark.conf.get(key)).toOption)
+(keys, values).zipped.foreach(spark.conf.set)
+try f finally {
+  keys.zip(currentValues).foreach {
+case (key, Some(value)) => spark.conf.set(key, value)
+case (key, None) => spark.conf.unset(key)
+  }
+}
+  }
+
+  private val SQL_ORC_FILE_FORMAT =
+    "org.apache.spark.sql.execution.datasources.orc.OrcFileFormat"
+  private val HIVE_ORC_FILE_FORMAT =
+    "org.apache.spark.sql.hive.orc.OrcFileFormat"
--- End diff --

Will we keep the current Hive ORC data source even after this is in Spark?





[GitHub] spark issue #17911: [SPARK-20668][SQL] Modify ScalaUDF to handle nullability...

2017-05-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17911
  
LGTM pending test





[GitHub] spark issue #17909: [SPARK-20661][WIP] try to dump table names

2017-05-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17909
  
Two tables, `bucketed_table` and `sorted_bucketed_table`, are from the same file `readwriter.py`.





[GitHub] spark issue #17928: [SPARK-20311][SQL] Support aliases for table value funct...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17928
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76716/
Test PASSed.





[GitHub] spark issue #17928: [SPARK-20311][SQL] Support aliases for table value funct...

2017-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17928
  
Merged build finished. Test PASSed.




