[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...

2017-04-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17568#discussion_r111713040
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala ---
@@ -96,3 +98,22 @@ object CombineTypedFilters extends Rule[LogicalPlan] {
     }
   }
 }
+
+/**
+ * Removes MapObjects when the following conditions are satisfied:
+ *   1. MapObjects(e) where e is a LambdaVariable, which means the input and output
+ *      types are primitive types
+ *   2. no custom collection class is specified as the representation of the data
+ *      items, e.g. for back-to-back map operations
+ */
+object EliminateMapObjects extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case DeserializeToObject(Invoke(
+        MapObjects(_, _, _, _ : LambdaVariable, inputData, None),
--- End diff --

can we just replace `MapObjects` with its child? Seems the only reason you 
match the whole `DeserializeToObject` is to make sure the `returnType` is 
object type, but that's guaranteed if the `collectionClass` is None.
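The shape of the suggested simplification can be sketched with a self-contained toy expression tree. This is illustrative only: the node names are borrowed from the PR, but none of the classes below are Spark's actual Catalyst implementation.

```scala
// Toy stand-ins for the Catalyst nodes under discussion; not Spark code.
sealed trait Expr
case class LambdaVar(name: String) extends Expr
case class InputData(name: String) extends Expr
// collectionClass = None means no custom collection representation was requested.
case class MapObjs(loopBody: Expr, child: Expr, collectionClass: Option[String]) extends Expr
case class Invoke(target: Expr, method: String) extends Expr

// The simplification being discussed: when the loop body is just the lambda
// variable and no custom collection class is given, the map is an identity
// wrapper, so the node can be replaced by its child wherever it appears.
def eliminateMapObjects(e: Expr): Expr = e match {
  case MapObjs(_: LambdaVar, child, None) => eliminateMapObjects(child)
  case Invoke(target, m)                  => Invoke(eliminateMapObjects(target), m)
  case other                              => other
}

val before = Invoke(MapObjs(LambdaVar("x"), InputData("input"), None), "array")
val after  = eliminateMapObjects(before)
```

Here the rewrite fires inside the `Invoke`, which is the alternative kiszk raises further down in this thread: rewriting the expression in place rather than matching the whole `DeserializeToObject` plan node.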


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17655: [SPARK-20156] [SQL] [FOLLOW-UP] Java String toLow...

2017-04-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17655#discussion_r111713236
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -2295,5 +2295,27 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
     }
   }
 }
+
+    test(s"basic DDL using locale tr - caseSensitive $caseSensitive") {
+      withSQLConf(SQLConf.CASE_SENSITIVE.key -> s"$caseSensitive") {
+        withLocale("tr") {
+          val dbName = "DaTaBaSeI"
+          withDatabase(dbName) {
+            sql(s"CREATE DATABASE $dbName")
+            sql(s"USE $dbName")
+
+            val tabName = "tAbI"
+            withTable(tabName) {
+              sql(s"CREATE TABLE $tabName(c1 int) USING PARQUET")
+              sql(s"INSERT OVERWRITE TABLE $tabName SELECT 1")
+              checkAnswer(sql(s"SELECT c1 FROM $tabName"), Row(1) :: Nil)
+              sql(s"DROP TABLE $tabName")
--- End diff --

is this needed?





[GitHub] spark pull request #17655: [SPARK-20156] [SQL] [FOLLOW-UP] Java String toLow...

2017-04-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17655#discussion_r111713226
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -2295,5 +2295,27 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
     }
   }
 }
+
+    test(s"basic DDL using locale tr - caseSensitive $caseSensitive") {
+      withSQLConf(SQLConf.CASE_SENSITIVE.key -> s"$caseSensitive") {
+        withLocale("tr") {
+          val dbName = "DaTaBaSeI"
+          withDatabase(dbName) {
+            sql(s"CREATE DATABASE $dbName")
+            sql(s"USE $dbName")
+
+            val tabName = "tAbI"
+            withTable(tabName) {
+              sql(s"CREATE TABLE $tabName(c1 int) USING PARQUET")
+              sql(s"INSERT OVERWRITE TABLE $tabName SELECT 1")
+              checkAnswer(sql(s"SELECT c1 FROM $tabName"), Row(1) :: Nil)
+              sql(s"DROP TABLE $tabName")
+            }
+
+            sql(s"DROP DATABASE $dbName")
--- End diff --

is this needed?





[GitHub] spark pull request #17656: [SPARK-20354][CORE][REST-API]/api/v1/applications...

2017-04-17 Thread guoxiaolongzte
GitHub user guoxiaolongzte opened a pull request:

https://github.com/apache/spark/pull/17656

[SPARK-20354][CORE][REST-API]/api/v1/applications’ return sparkUser is 
null in REST API.

## What changes were proposed in this pull request?

When I request the 'http://ip:port/api/v1/applications' link, I get the JSON below. I need the specific value of the 'sparkUser' field because my Spark big-data management platform filters by this field to determine which user submitted each application, for administration and queries. But the returned JSON currently has an empty string there, so this feature cannot be implemented; that is, this REST API does not tell me who submitted a given application.

return json:
[ {
  "id" : "app-20170417152053-",
  "name" : "KafkaWordCount",
  "attempts" : [ {
"startTime" : "2017-04-17T07:20:51.395GMT",
"endTime" : "1969-12-31T23:59:59.999GMT",
"lastUpdated" : "2017-04-17T07:20:51.395GMT",
"duration" : 0,
"sparkUser" : "",
"completed" : false,
"endTimeEpoch" : -1,
"startTimeEpoch" : 1492413651395,
"lastUpdatedEpoch" : 1492413651395
  } ]
} ]

## How was this patch tested?

manual tests

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guoxiaolongzte/spark SPARK-20354

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17656.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17656


commit d383efba12c66addb17006dea107bb0421d50bc3
Author: 郭小龙 10207633 
Date:   2017-03-31T13:57:09Z

[SPARK-20177]Document about compression way has some little detail changes.

commit 3059013e9d2aec76def14eb314b6761bea0e7ca0
Author: 郭小龙 10207633 
Date:   2017-04-01T01:38:02Z

[SPARK-20177] event log add a space

commit 555cef88fe09134ac98fd0ad056121c7df2539aa
Author: guoxiaolongzte 
Date:   2017-04-02T00:16:08Z

'/applications/[app-id]/jobs' in rest api,status should be 
[running|succeeded|failed|unknown]

commit 46bb1ad3ddd9fb55b5607ac4f20213a90186cfe9
Author: 郭小龙 10207633 
Date:   2017-04-05T03:16:50Z

Merge branch 'master' of https://github.com/apache/spark into SPARK-20177

commit 0efb0dd9e404229cce638fe3fb0c966276784df7
Author: 郭小龙 10207633 
Date:   2017-04-05T03:47:53Z

[SPARK-20218]'/applications/[app-id]/stages' in REST API,add description.

commit 0e37fdeee28e31fc97436dabd001d3c85c5a7794
Author: 郭小龙 10207633 
Date:   2017-04-05T05:22:54Z

[SPARK-20218] '/applications/[app-id]/stages/[stage-id]' in REST API,remove 
redundant description.

commit 52641bb01e55b48bd9e8579fea217439d14c7dc7
Author: 郭小龙 10207633 
Date:   2017-04-07T06:24:58Z

Merge branch 'SPARK-20218'

commit d3977c9cab0722d279e3fae7aacbd4eb944c22f6
Author: 郭小龙 10207633 
Date:   2017-04-08T07:13:02Z

Merge branch 'master' of https://github.com/apache/spark

commit 137b90e5a85cde7e9b904b3e5ea0bb52518c4716
Author: 郭小龙 10207633 
Date:   2017-04-10T05:13:40Z

Merge branch 'master' of https://github.com/apache/spark

commit 0fe5865b8022aeacdb2d194699b990d8467f7a0a
Author: 郭小龙 10207633 
Date:   2017-04-10T10:25:22Z

Merge branch 'SPARK-20190' of https://github.com/guoxiaolongzte/spark

commit cf6f42ac84466960f2232c025b8faeb5d7378fe1
Author: 郭小龙 10207633 
Date:   2017-04-10T10:26:27Z

Merge branch 'master' of https://github.com/apache/spark

commit 685cd6b6e3799c7be65674b2670159ba725f0b8f
Author: 郭小龙 10207633 
Date:   2017-04-14T01:12:41Z

Merge branch 'master' of https://github.com/apache/spark

commit c716a9231e9ab117d2b03ba67a1c8903d8d9da93
Author: guoxiaolong 
Date:   2017-04-17T06:57:21Z

Merge branch 'master' of https://github.com/apache/spark

commit 9ecc1a0257964acca6aabddcfc26cbcecf5086e8
Author: guoxiaolong 
Date:   2017-04-17T07:25:36Z

[SPARK-20354]/api/v1/applications’ return sparkUser is null in REST API.







[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...

2017-04-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15398
  
LGTM





[GitHub] spark issue #17656: [SPARK-20354][CORE][REST-API]When I request access to th...

2017-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17656
  
Can one of the admins verify this patch?





[GitHub] spark pull request #17650: [SPARK-20350] Add optimization rules to apply Com...

2017-04-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17650#discussion_r111716124
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -153,6 +153,11 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
       case TrueLiteral Or _ => TrueLiteral
       case _ Or TrueLiteral => TrueLiteral
 
+      case a And b if Not(a).semanticEquals(b) => FalseLiteral
+      case a Or b if Not(a).semanticEquals(b) => TrueLiteral
+      case a And b if a.semanticEquals(Not(b)) => FalseLiteral
--- End diff --

do we need this? `Not(Not(a))` will be simplified to `a`
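The complementation laws in the diff can be illustrated with a tiny standalone expression tree. This is a sketch, not Catalyst: plain structural `==` stands in for `semanticEquals`, and both match directions are checked, mirroring the cases under discussion.

```scala
// Tiny stand-in boolean expression tree; illustrative only.
sealed trait BoolExpr
case class Attr(name: String) extends BoolExpr
case class Not(child: BoolExpr) extends BoolExpr
case class And(left: BoolExpr, right: BoolExpr) extends BoolExpr
case class Or(left: BoolExpr, right: BoolExpr) extends BoolExpr
case object TrueLit extends BoolExpr
case object FalseLit extends BoolExpr

// Complementation laws: a AND NOT a => false, a OR NOT a => true.
// Structural equality plays the role of Catalyst's semanticEquals here.
def applyComplementation(e: BoolExpr): BoolExpr = e match {
  case And(a, b) if Not(a) == b || a == Not(b) => FalseLit
  case Or(a, b)  if Not(a) == b || a == Not(b) => TrueLit
  case other                                   => other
}
```

For `And(Attr("a"), Not(Attr("a")))` the first guard (`Not(a) == b`) fires; for `Or(Not(Attr("a")), Attr("a"))` only the mirrored guard (`a == Not(b)`) matches structurally, which is the symmetry the review question is probing.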





[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...

2017-04-17 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17568#discussion_r111716155
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -368,6 +369,8 @@ case class NullPropagation(conf: SQLConf) extends Rule[LogicalPlan] {
       case EqualNullSafe(Literal(null, _), r) => IsNull(r)
       case EqualNullSafe(l, Literal(null, _)) => IsNull(l)
 
+      case AssertNotNull(c, _) if !c.nullable => c
--- End diff --

I agree with @cloud-fan. I have also checked the usages of `AssertNotNull`. 
IIUC, all of them are used for throwing a runtime exception.
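The intent of the added case can be sketched in isolation: a runtime null check becomes a no-op when the child is statically known to be non-nullable. The classes below are toy stand-ins, not Spark's `AssertNotNull` or Catalyst's `Expression`.

```scala
// Toy sketch: expressions carry a static nullability flag; AssertNotNull is a
// runtime null check that is redundant when the child can never be null.
sealed trait Expr { def nullable: Boolean }
case class Column(name: String, nullable: Boolean) extends Expr
case class AssertNotNull(child: Expr) extends Expr { def nullable = false }

// Mirrors the diff's `case AssertNotNull(c, _) if !c.nullable => c`.
def removeRedundantAsserts(e: Expr): Expr = e match {
  case AssertNotNull(child) if !child.nullable => child
  case other                                   => other
}
```

A nullable child keeps its assertion (so the runtime exception path kiszk mentions is preserved); a non-nullable child has the check stripped.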





[GitHub] spark pull request #17650: [SPARK-20350] Add optimization rules to apply Com...

2017-04-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17650#discussion_r111716305
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala ---
@@ -160,4 +166,12 @@ class BooleanSimplificationSuite extends PlanTest with PredicateHelper {
       testRelation.where('a > 2 || ('b > 3 && 'b < 5)))
     comparePlans(actual, expected)
   }
+
+  test("Complementation Laws") {
+    checkCondition('a && !'a, LocalRelation(testRelation.output, Seq.empty))
--- End diff --

just `checkCondition('a && !'a, testRelation)`?





[GitHub] spark issue #17650: [SPARK-20350] Add optimization rules to apply Complement...

2017-04-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17650
  
LGTM





[GitHub] spark issue #17623: [SPARK-20292][SQL] Clean up string representation of Tre...

2017-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17623
  
**[Test build #75851 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75851/testReport)** for PR 17623 at commit [`b3284c3`](https://github.com/apache/spark/commit/b3284c346f36c3020d65b2010913b588f38caa94).





[GitHub] spark issue #17646: [SPARK-20349] [SQL] ListFunctions returns duplicate func...

2017-04-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17646
  
LGTM





[GitHub] spark pull request #17646: [SPARK-20349] [SQL] ListFunctions returns duplica...

2017-04-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17646#discussion_r111717028
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -1202,15 +1203,25 @@ class SessionCatalog(
   def listFunctions(db: String, pattern: String): Seq[(FunctionIdentifier, String)] = {
     val dbName = formatDatabaseName(db)
     requireDbExists(dbName)
-    val dbFunctions = externalCatalog.listFunctions(dbName, pattern)
-      .map { f => FunctionIdentifier(f, Some(dbName)) }
-    val loadedFunctions = StringUtils.filterPattern(functionRegistry.listFunction(), pattern)
-      .map { f => FunctionIdentifier(f) }
+    val dbFunctions = externalCatalog.listFunctions(dbName, pattern).map { f =>
+      FunctionIdentifier(f, Some(dbName)) }
+    val loadedFunctions =
+      StringUtils.filterPattern(functionRegistry.listFunction(), pattern).map { f =>
+        // In functionRegistry, function names are stored as an unquoted format.
--- End diff --

shall we use `FunctionIdentifier` as the key in functionRegistry?
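The motivation behind this question can be shown with a minimal example. This is illustrative only: a local `FunctionIdentifier` case class is defined here as a stand-in, and the names are hypothetical, not the real registry contents.

```scala
// Stand-in for Catalyst's FunctionIdentifier (funcName plus optional database).
case class FunctionIdentifier(funcName: String, database: Option[String] = None)

// With plain-string keys, a qualified and an unqualified reference to the same
// function look like two distinct entries -- the kind of duplication the
// SPARK-20349 discussion is about.
val stringKeyed = Set("myfunc", "default.myfunc")

// With a structured key, equal identifiers collapse into one entry.
val structured = Set(
  FunctionIdentifier("myfunc", Some("default")),
  FunctionIdentifier("myfunc", Some("default")) // same key, deduplicated by Set
)
```

Case-class equality makes the structured set hold a single element, while the string set keeps both spellings.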





[GitHub] spark issue #17652: [SPARK-20335] [SQL] [BACKPORT-2.1] Children expressions ...

2017-04-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17652
  
thanks, merging to 2.1!





[GitHub] spark issue #17191: [SPARK-14471][SQL] Aliases in SELECT could be used in GR...

2017-04-17 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17191
  
ping





[GitHub] spark pull request #17623: [SPARK-20292][SQL] Clean up string representation...

2017-04-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17623#discussion_r111717917
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
@@ -422,40 +422,62 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
   def nodeName: String = getClass.getSimpleName.replaceAll("Exec$", "")
 
   /**
+   * Returns a user-facing string representation of this node's name. By default it's `nodeName`.
+   */
+  def prettyName: String = nodeName
+
+  /**
    * The arguments that should be included in the arg string.  Defaults to the `productIterator`.
    */
   protected def stringArgs: Iterator[Any] = productIterator
 
   private lazy val allChildren: Set[TreeNode[_]] = (children ++ innerChildren).toSet[TreeNode[_]]
 
+  /** Converts a node to string. Subclasses can override this to use other string representation. */
+  protected def argToString(arg: Any): String = arg match {
+    case tn: TreeNode[_] => tn.verboseString
+    case _ => arg.toString
+  }
+
   /** Returns a string representing the arguments to this node, minus any children */
   def argString: String = stringArgs.flatMap {
     case tn: TreeNode[_] if allChildren.contains(tn) => Nil
     case Some(tn: TreeNode[_]) if allChildren.contains(tn) => Nil
-    case Some(tn: TreeNode[_]) => tn.simpleString :: Nil
-    case tn: TreeNode[_] => tn.simpleString :: Nil
+    case Some(tn: TreeNode[_]) => tn :: Nil
+    case tn: TreeNode[_] => tn :: Nil
     case seq: Seq[Any] if seq.toSet.subsetOf(allChildren.asInstanceOf[Set[Any]]) => Nil
     case iter: Iterable[_] if iter.isEmpty => Nil
-    case seq: Seq[_] => Utils.truncatedString(seq, "[", ", ", "]") :: Nil
-    case set: Set[_] => Utils.truncatedString(set.toSeq, "{", ", ", "}") :: Nil
+    case seq: Seq[_] => Utils.truncatedString(seq.map(argToString), "[", ", ", "]") :: Nil
+    case set: Set[_] => Utils.truncatedString(set.toSeq.map(argToString), "{", ", ", "}") :: Nil
     case array: Array[_] if array.isEmpty => Nil
-    case array: Array[_] => Utils.truncatedString(array, "[", ", ", "]") :: Nil
+    case array: Array[_] => Utils.truncatedString(array.map(argToString), "[", ", ", "]") :: Nil
     case null => Nil
    case None => Nil
     case Some(null) => Nil
     case Some(any) => any :: Nil
     case other => other :: Nil
-  }.mkString(", ")
+  }.map(argToString).mkString(", ")
 
-  /** ONE line description of this node. */
-  def simpleString: String = s"$nodeName $argString".trim
+  /** ONE line description of this node, not including any arguments and children information */
+  def simpleString: String = prettyName
 
-  /** ONE line description of this node with more information */
-  def verboseString: String
+  /**
+   * ONE line description of this node with more information, without any children information.
+   * By default, it includes the arguments to this node, minus any children.
+   * It is mainly called by `generateTreeString`, when constructing the string representation
+   * of the nodes in this tree. Subclasses can override it to provide more user-friendly
+   * representation.
+   */
+  def verboseString: String = if (argString != "") {
+    s"$prettyName $argString".trim
+  } else {
+    simpleString
+  }
 
-  /** ONE line description of this node with some suffix information */
+  /** ONE line description of this node by adding some suffix information to `verboseString` */
   def verboseStringWithSuffix: String = verboseString
--- End diff --

can we list all the string related interfaces defined in `TreeNode` in the 
PR description? e.g. `nodeName`, `prettyName`, `stringArgs`, etc. This can be 
very helpful to review this PR and a good reference for future contributors.





[GitHub] spark issue #17400: [SPARK-19981][SQL] Update output partitioning info. in P...

2017-04-17 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17400
  
@allengeorge yeah, we could do that there. But I think we should first make sure how to fix this issue; I'm not sure the approach of this PR is the best. cc: @gatorsmile





[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...

2017-04-17 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17568#discussion_r111719111
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala ---
@@ -96,3 +98,22 @@ object CombineTypedFilters extends Rule[LogicalPlan] {
     }
   }
 }
+
+/**
+ * Removes MapObjects when the following conditions are satisfied:
+ *   1. MapObjects(e) where e is a LambdaVariable, which means the input and output
+ *      types are primitive types
+ *   2. no custom collection class is specified as the representation of the data
+ *      items, e.g. for back-to-back map operations
+ */
+object EliminateMapObjects extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case DeserializeToObject(Invoke(
+        MapObjects(_, _, _, _ : LambdaVariable, inputData, None),
--- End diff --

Replacing `MapObjects` with its child would be a `LogicalPlan => Expression` transformation, while this method requires `LogicalPlan => LogicalPlan`.
Is it fine to replace `Invoke(MapObjects(..., inputData, None)...)` with `Invoke(inputData, ...)`?





[GitHub] spark pull request #17657: [TEST][MINOR] Replace repartitionBy with distribu...

2017-04-17 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/17657

[TEST][MINOR] Replace repartitionBy with distribute in 
CollapseRepartitionSuite

## What changes were proposed in this pull request?

Replace non-existent `repartitionBy` with `distribute` in 
`CollapseRepartitionSuite`.

## How was this patch tested?

local build and `catalyst/testOnly *CollapseRepartitionSuite`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark CollapseRepartitionSuite

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17657.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17657


commit 427924ab99332f475ed0b46b261eb55eee560c4a
Author: Jacek Laskowski 
Date:   2017-04-17T08:24:38Z

[TEST][MINOR] Replace repartitionBy with distribute in 
CollapseRepartitionSuite







[GitHub] spark issue #17657: [TEST][MINOR] Replace repartitionBy with distribute in C...

2017-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17657
  
**[Test build #75852 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75852/testReport)** for PR 17657 at commit [`427924a`](https://github.com/apache/spark/commit/427924ab99332f475ed0b46b261eb55eee560c4a).





[GitHub] spark pull request #17623: [SPARK-20292][SQL] Clean up string representation...

2017-04-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17623#discussion_r111719916
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
@@ -422,40 +422,62 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
   def nodeName: String = getClass.getSimpleName.replaceAll("Exec$", "")
 
   /**
+   * Returns a user-facing string representation of this node's name. By default it's `nodeName`.
+   */
+  def prettyName: String = nodeName
+
+  /**
    * The arguments that should be included in the arg string.  Defaults to the `productIterator`.
    */
   protected def stringArgs: Iterator[Any] = productIterator
 
   private lazy val allChildren: Set[TreeNode[_]] = (children ++ innerChildren).toSet[TreeNode[_]]
 
+  /** Converts a node to string. Subclasses can override this to use other string representation. */
+  protected def argToString(arg: Any): String = arg match {
+    case tn: TreeNode[_] => tn.verboseString
+    case _ => arg.toString
+  }
+
   /** Returns a string representing the arguments to this node, minus any children */
   def argString: String = stringArgs.flatMap {
     case tn: TreeNode[_] if allChildren.contains(tn) => Nil
     case Some(tn: TreeNode[_]) if allChildren.contains(tn) => Nil
-    case Some(tn: TreeNode[_]) => tn.simpleString :: Nil
-    case tn: TreeNode[_] => tn.simpleString :: Nil
+    case Some(tn: TreeNode[_]) => tn :: Nil
+    case tn: TreeNode[_] => tn :: Nil
     case seq: Seq[Any] if seq.toSet.subsetOf(allChildren.asInstanceOf[Set[Any]]) => Nil
     case iter: Iterable[_] if iter.isEmpty => Nil
-    case seq: Seq[_] => Utils.truncatedString(seq, "[", ", ", "]") :: Nil
-    case set: Set[_] => Utils.truncatedString(set.toSeq, "{", ", ", "}") :: Nil
+    case seq: Seq[_] => Utils.truncatedString(seq.map(argToString), "[", ", ", "]") :: Nil
+    case set: Set[_] => Utils.truncatedString(set.toSeq.map(argToString), "{", ", ", "}") :: Nil
     case array: Array[_] if array.isEmpty => Nil
-    case array: Array[_] => Utils.truncatedString(array, "[", ", ", "]") :: Nil
+    case array: Array[_] => Utils.truncatedString(array.map(argToString), "[", ", ", "]") :: Nil
     case null => Nil
     case None => Nil
     case Some(null) => Nil
     case Some(any) => any :: Nil
     case other => other :: Nil
-  }.mkString(", ")
+  }.map(argToString).mkString(", ")
 
-  /** ONE line description of this node. */
-  def simpleString: String = s"$nodeName $argString".trim
+  /** ONE line description of this node, not including any arguments and children information */
+  def simpleString: String = prettyName
 
-  /** ONE line description of this node with more information */
-  def verboseString: String
+  /**
+   * ONE line description of this node with more information, without any children information.
+   * By default, it includes the arguments to this node, minus any children.
+   * It is mainly called by `generateTreeString`, when constructing the string representation
+   * of the nodes in this tree. Subclasses can override it to provide more user-friendly
+   * representation.
+   */
+  def verboseString: String = if (argString != "") {
+    s"$prettyName $argString".trim
+  } else {
+    simpleString
+  }
 
-  /** ONE line description of this node with some suffix information */
+  /** ONE line description of this node by adding some suffix information to `verboseString` */
   def verboseStringWithSuffix: String = verboseString
--- End diff --

Ok. Let me update this description later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17596: [SPARK-12837][CORE] Do not send the accumulator n...

2017-04-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17596#discussion_r111721093
  
--- Diff: core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala ---
@@ -154,18 +154,22 @@ abstract class AccumulatorV2[IN, OUT] extends Serializable {
 
   // Called by Java when serializing an object
   final protected def writeReplace(): Any = {
-    if (atDriverSide) {
+    val acc = if (atDriverSide) {
       if (!isRegistered) {
         throw new UnsupportedOperationException(
           "Accumulator must be registered before send to executor")
       }
       val copyAcc = copyAndReset()
       assert(copyAcc.isZero, "copyAndReset must return a zero value copy")
-      copyAcc.metadata = metadata
       copyAcc
     } else {
-      this
+      val copyAcc = copy()
+      copyAcc.atDriverSide = false
+      copyAcc
     }
+    // Do not serialize the accumulator name and send it between executor and driver.
+    acc.metadata = metadata.copy(name = None)
+    acc
--- End diff --

I just took a look to help. This seems to be the cause; it throws an exception like the one below:

```
>>> from pyspark.accumulators import INT_ACCUMULATOR_PARAM
>>>
>>> acc1 = sc.accumulator(0, INT_ACCUMULATOR_PARAM)
>>> sc.parallelize(xrange(100), 20).foreach(lambda x: acc1.add(x))
17/04/17 17:10:39 ERROR DAGScheduler: Failed to update accumulators for task 2
java.lang.ClassCastException: org.apache.spark.util.CollectionAccumulator cannot be cast to org.apache.spark.api.python.PythonAccumulatorV2
    at org.apache.spark.api.python.PythonAccumulatorV2.merge(PythonRDD.scala:903)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1105)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1097)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1097)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1173)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1716)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1674)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1663)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
```

It seems to be because `copy()` here returns a `CollectionAccumulator` from a `PythonAccumulatorV2`.
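As a toy illustration of the failure mode (hypothetical minimal classes in Python, not Spark's actual accumulators): when a subclass inherits `copy()` from its parent, the copy comes back as the parent type, and a later downcast fails, mirroring the `ClassCastException` above.

```python
class CollectionAcc:
    def copy(self):
        # Parent's copy() constructs the parent type. A subclass that
        # does not override this gets back a CollectionAcc.
        return CollectionAcc()

class PythonAcc(CollectionAcc):
    def merge(self, other):
        # Mirrors a merge() that assumes `other` is the subclass type.
        if not isinstance(other, PythonAcc):
            raise TypeError("CollectionAcc cannot be cast to PythonAcc")

acc = PythonAcc()
copied = acc.copy()           # runtime type: CollectionAcc, not PythonAcc
print(type(copied).__name__)  # CollectionAcc
```

The fix sketched in the PR discussion is for each subclass to return its own type from `copy()`.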





[GitHub] spark pull request #17596: [SPARK-12837][CORE] Do not send the accumulator n...

2017-04-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17596#discussion_r111721127
  
--- Diff: core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala ---
@@ -154,18 +154,22 @@ abstract class AccumulatorV2[IN, OUT] extends Serializable {
 
   // Called by Java when serializing an object
   final protected def writeReplace(): Any = {
-    if (atDriverSide) {
+    val acc = if (atDriverSide) {
       if (!isRegistered) {
         throw new UnsupportedOperationException(
           "Accumulator must be registered before send to executor")
       }
       val copyAcc = copyAndReset()
       assert(copyAcc.isZero, "copyAndReset must return a zero value copy")
-      copyAcc.metadata = metadata
       copyAcc
     } else {
-      this
+      val copyAcc = copy()
--- End diff --

I just took a look to help. This seems to be the cause; it throws an exception like the one below:

```
>>> from pyspark.accumulators import INT_ACCUMULATOR_PARAM
>>>
>>> acc1 = sc.accumulator(0, INT_ACCUMULATOR_PARAM)
>>> sc.parallelize(xrange(100), 20).foreach(lambda x: acc1.add(x))
17/04/17 17:10:39 ERROR DAGScheduler: Failed to update accumulators for task 2
java.lang.ClassCastException: org.apache.spark.util.CollectionAccumulator cannot be cast to org.apache.spark.api.python.PythonAccumulatorV2
    at org.apache.spark.api.python.PythonAccumulatorV2.merge(PythonRDD.scala:903)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1105)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1097)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1097)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1173)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1716)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1674)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1663)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
```

It seems to be because `copy()` here returns a `CollectionAccumulator` from a `PythonAccumulatorV2`.





[GitHub] spark pull request #17596: [SPARK-12837][CORE] Do not send the accumulator n...

2017-04-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17596#discussion_r111722114
  
--- Diff: core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala ---
@@ -154,18 +154,22 @@ abstract class AccumulatorV2[IN, OUT] extends Serializable {
 
   // Called by Java when serializing an object
   final protected def writeReplace(): Any = {
-    if (atDriverSide) {
+    val acc = if (atDriverSide) {
       if (!isRegistered) {
         throw new UnsupportedOperationException(
           "Accumulator must be registered before send to executor")
       }
       val copyAcc = copyAndReset()
       assert(copyAcc.isZero, "copyAndReset must return a zero value copy")
-      copyAcc.metadata = metadata
       copyAcc
     } else {
-      this
+      val copyAcc = copy()
--- End diff --

Do you know if it is okay to remove that validation? Judging from the comments, it sounds unneeded. I just ran `AccumulatorV2Suite` and `python run-tests.py --module=pyspark-core` and the failures went away.





[GitHub] spark pull request #17557: [SPARK-20208][R][DOCS] Document R fpGrowth suppor...

2017-04-17 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17557#discussion_r111722945
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -906,6 +910,37 @@ predicted <- predict(model, df)
 head(predicted)
 ```
 
+ FP-growth
+
+`spark.fpGrowth` executes FP-growth algorithm to mine frequent itemsets on a `SparkDataFrame`. `itemsCol` should be an array of values.
+
+```{r}
+items <- selectExpr(createDataFrame(data.frame(items = c(
+  "T,R,U", "T,S", "V,R", "R,U,T,V", "R,S", "V,S,U", "U,R", "S,T", "V,R", "V,U,S",
+  "T,V,U", "R,V", "T,S", "T,S", "S,T", "S,U", "T,R", "V,R", "S,V", "T,S,U"
+))), "split(items, ',') AS items")
--- End diff --

@felixcheung  Updated. 

BTW There is a JIRA tracking SQL functions parity, isn't there?





[GitHub] spark pull request #17620: [SPARK-20305][Spark Core]Master may keep in the s...

2017-04-17 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17620#discussion_r111723278
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -561,6 +561,11 @@ private[deploy] class Master(
 state = RecoveryState.ALIVE
 schedule()
 logInfo("Recovery complete - resuming operations!")
+   } catch {
--- End diff --

At the least, the spelling and spacing in this PR need fixing





[GitHub] spark issue #17656: [SPARK-20354][CORE][REST-API]When I request access to th...

2017-04-17 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/17656
  
@squito do you have an opinion?





[GitHub] spark pull request #17655: [SPARK-20156] [SQL] [FOLLOW-UP] Java String toLow...

2017-04-17 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17655#discussion_r111724069
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -114,14 +114,14 @@ class SessionCatalog(
    * Format table name, taking into account case sensitivity.
    */
   protected[this] def formatTableName(name: String): String = {
-    if (conf.caseSensitiveAnalysis) name else name.toLowerCase
+    if (conf.caseSensitiveAnalysis) name else name.toLowerCase(Locale.ROOT)
--- End diff --

The problem, I think, is that this affects user apps, and we were trying to avoid changes like this. The change was only about internal strings.

I would imagine the fix is in a test, not the main code?
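For context, a small self-contained Java example of the locale sensitivity that `Locale.ROOT` guards against (standard JDK `String.toLowerCase(Locale)` behavior, not Spark code; the class name is illustrative): under a Turkish locale, `'I'` lowercases to the dotless `'ı'` (U+0131), which would silently change an identifier such as a table name.

```java
import java.util.Locale;

public class LowerCaseDemo {
    public static void main(String[] args) {
        String name = "MY_TITLE";
        // Locale-sensitive lowercasing maps 'I' -> 'ı' in Turkish.
        System.out.println(name.toLowerCase(new Locale("tr"))); // my_tıtle
        // Locale.ROOT gives stable, locale-independent results.
        System.out.println(name.toLowerCase(Locale.ROOT));      // my_title
    }
}
```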





[GitHub] spark issue #17651: [SPARK-20343][BUILD] Force Avro 1.7.7 in sbt build to re...

2017-04-17 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/17651
  
I'd be interested to know if this resolves the issue, and the only real way 
is to merge it. It's not wrong, but it is hacky. That said I don't have a 
better idea at the moment.





[GitHub] spark issue #17623: [SPARK-20292][SQL] Clean up string representation of Tre...

2017-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17623
  
**[Test build #75851 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75851/testReport)**
 for PR 17623 at commit 
[`b3284c3`](https://github.com/apache/spark/commit/b3284c346f36c3020d65b2010913b588f38caa94).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17623: [SPARK-20292][SQL] Clean up string representation of Tre...

2017-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17623
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17623: [SPARK-20292][SQL] Clean up string representation of Tre...

2017-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17623
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75851/
Test PASSed.





[GitHub] spark issue #17657: [TEST][MINOR] Replace repartitionBy with distribute in C...

2017-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17657
  
**[Test build #75852 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75852/testReport)**
 for PR 17657 at commit 
[`427924a`](https://github.com/apache/spark/commit/427924ab99332f475ed0b46b261eb55eee560c4a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17657: [TEST][MINOR] Replace repartitionBy with distribute in C...

2017-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17657
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75852/
Test PASSed.





[GitHub] spark issue #17657: [TEST][MINOR] Replace repartitionBy with distribute in C...

2017-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17657
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-04-17 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/16989
  
Jenkins, test this please





[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...

2017-04-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17568#discussion_r111730448
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -368,6 +369,8 @@ case class NullPropagation(conf: SQLConf) extends Rule[LogicalPlan] {
       case EqualNullSafe(Literal(null, _), r) => IsNull(r)
       case EqualNullSafe(l, Literal(null, _)) => IsNull(l)
 
+      case AssertNotNull(c, _) if !c.nullable => c
--- End diff --

I think the purpose of `AssertNotNull` is to give a proper exception at runtime when an expression (note: it can be a nullable or non-nullable expression) evaluates to a null value.

Maybe for `MapObjects` we can safely remove it, but I am not sure it is okay in other cases too.
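A toy sketch (in Python, not Catalyst) of the rule being discussed: the runtime null check is stripped only when the child expression is statically known to be non-nullable, so the assertion could never fire anyway.

```python
# Toy model of `case AssertNotNull(c, _) if !c.nullable => c`.
class Expr:
    def __init__(self, name, nullable):
        self.name, self.nullable = name, nullable

class AssertNotNull:
    def __init__(self, child):
        self.child = child

def null_propagation(node):
    if isinstance(node, AssertNotNull) and not node.child.nullable:
        return node.child  # safe: the child can never be null
    return node            # nullable child: keep the runtime check

kept = null_propagation(AssertNotNull(Expr("maybe_null", nullable=True)))
removed = null_propagation(AssertNotNull(Expr("id", nullable=False)))
print(type(kept).__name__, type(removed).__name__)  # AssertNotNull Expr
```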







[GitHub] spark pull request #17620: [SPARK-20305][Spark Core]Master may keep in the s...

2017-04-17 Thread lvdongr
Github user lvdongr commented on a diff in the pull request:

https://github.com/apache/spark/pull/17620#discussion_r111732189
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -561,6 +561,11 @@ private[deploy] class Master(
 state = RecoveryState.ALIVE
 schedule()
 logInfo("Recovery complete - resuming operations!")
+   } catch {
--- End diff --

Thank you very much, I've changed the commit; please see if there are any other problems.





[GitHub] spark issue #17647: [SPARK-20344][Scheduler] Duplicate call in FairSchedula...

2017-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17647
  
**[Test build #3663 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3663/testReport)**
 for PR 17647 at commit 
[`410df60`](https://github.com/apache/spark/commit/410df60dee822c36f1a924fbb141e0129259abd5).





[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16989
  
**[Test build #75853 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75853/testReport)**
 for PR 16989 at commit 
[`63f059d`](https://github.com/apache/spark/commit/63f059de847264f0ecc66bfb83a575e2ca928ae6).





[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16989
  
**[Test build #75855 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75855/testReport)**
 for PR 16989 at commit 
[`63f059d`](https://github.com/apache/spark/commit/63f059de847264f0ecc66bfb83a575e2ca928ae6).





[GitHub] spark issue #17557: [SPARK-20208][R][DOCS] Document R fpGrowth support

2017-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17557
  
**[Test build #75854 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75854/testReport)**
 for PR 17557 at commit 
[`4c02933`](https://github.com/apache/spark/commit/4c02933fb4e945940ddf3732d90082ec371ba5c7).





[GitHub] spark pull request #17650: [SPARK-20350] Add optimization rules to apply Com...

2017-04-17 Thread ptkool
Github user ptkool commented on a diff in the pull request:

https://github.com/apache/spark/pull/17650#discussion_r111733014
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala ---
@@ -160,4 +166,12 @@ class BooleanSimplificationSuite extends PlanTest with PredicateHelper {
       testRelation.where('a > 2 || ('b > 3 && 'b < 5)))
     comparePlans(actual, expected)
   }
+
+  test("Complementation Laws") {
+    checkCondition('a && !'a, LocalRelation(testRelation.output, Seq.empty))
--- End diff --

Yes, but I wanted to make sure it was clear that the resulting plan node 
was a LocalRelation node that produced no data.
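A minimal sketch (hypothetical, not Catalyst's rewrite engine) of the complementation laws the test exercises: a predicate conjoined with its own negation can never hold, so the filter collapses to an empty relation, while a predicate disjoined with its negation always holds.

```python
# Expressions as nested tuples: ("AND", a, b), ("OR", a, b), ("NOT", a).
def simplify(expr):
    op, left, right = expr
    if op == "AND" and (right == ("NOT", left) or left == ("NOT", right)):
        return False  # a && !a: never holds -> empty LocalRelation
    if op == "OR" and (right == ("NOT", left) or left == ("NOT", right)):
        return True   # a || !a: always holds -> filter dropped
    return expr       # no complementation pattern: leave unchanged

print(simplify(("AND", "a", ("NOT", "a"))))  # False
print(simplify(("OR", ("NOT", "b"), "b")))   # True
```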





[GitHub] spark issue #17651: [SPARK-20343][BUILD] Force Avro 1.7.7 in sbt build to re...

2017-04-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17651
  
Yes... I hope someone is able to reproduce this with the steps I took - https://github.com/apache/spark/pull/17477#issuecomment-294094092 - or confirm I did this wrong.

I am fine with leaving this open for some more days to see if anyone can at least confirm this fixes the issue, or has a better idea.





[GitHub] spark issue #17651: [SPARK-20343][BUILD] Force Avro 1.7.7 in sbt build to re...

2017-04-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17651
  
FWIW, I at least checked that this overrides the dependency (after manually mismatching the versions between pom and sbt), unless I have done something wrong.





[GitHub] spark issue #17557: [SPARK-20208][R][DOCS] Document R fpGrowth support

2017-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17557
  
**[Test build #75854 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75854/testReport)**
 for PR 17557 at commit 
[`4c02933`](https://github.com/apache/spark/commit/4c02933fb4e945940ddf3732d90082ec371ba5c7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16989: [WIP][SPARK-19659] Fetch big blocks to disk when ...

2017-04-17 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16989#discussion_r111734780
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -133,36 +135,53 @@ private[spark] class HighlyCompressedMapStatus private (
     private[this] var loc: BlockManagerId,
     private[this] var numNonEmptyBlocks: Int,
     private[this] var emptyBlocks: RoaringBitmap,
-    private[this] var avgSize: Long)
+    private[this] var avgSize: Long,
+    private[this] var hugeBlockSizes: HashMap[Int, Byte])
   extends MapStatus with Externalizable {
 
   // loc could be null when the default constructor is called during deserialization
   require(loc == null || avgSize > 0 || numNonEmptyBlocks == 0,
     "Average size can only be zero for map stages that produced no output")
 
-  protected def this() = this(null, -1, null, -1)  // For deserialization only
+  def this() = this(null, -1, null, -1, null)  // For deserialization only
--- End diff --

Remove the `protected` and make this visible for test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17557: [SPARK-20208][R][DOCS] Document R fpGrowth support

2017-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17557
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75854/
Test PASSed.





[GitHub] spark issue #17557: [SPARK-20208][R][DOCS] Document R fpGrowth support

2017-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17557
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16989
  
**[Test build #75855 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75855/testReport)**
 for PR 16989 at commit 
[`63f059d`](https://github.com/apache/spark/commit/63f059de847264f0ecc66bfb83a575e2ca928ae6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16989
  
**[Test build #75853 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75853/testReport)**
 for PR 16989 at commit 
[`63f059d`](https://github.com/apache/spark/commit/63f059de847264f0ecc66bfb83a575e2ca928ae6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16989
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75855/
Test FAILed.





[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16989
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16989
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16989
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75853/
Test FAILed.





[GitHub] spark issue #17647: [SPARK-20344][Scheduler] Duplicate call in FairSchedula...

2017-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17647
  
**[Test build #3663 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3663/testReport)**
 for PR 17647 at commit 
[`410df60`](https://github.com/apache/spark/commit/410df60dee822c36f1a924fbb141e0129259abd5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17658: [SPARK-20355] Add per application spark version o...

2017-04-17 Thread redsanket
GitHub user redsanket opened a pull request:

https://github.com/apache/spark/pull/17658

[SPARK-20355] Add per application spark version on the history server 
headerpage

## What changes were proposed in this pull request?

The Spark version for a specific application is not currently displayed on the 
history page. It would be nice to show the per-application Spark version on the 
UI when we click on a specific application.
There already seems to be a way, as SparkListenerLogStart records the 
application version, so it should be trivial to listen to this event and 
surface the version on the UI.
For example:
Screenshots: https://cloud.githubusercontent.com/assets/8295799/25092588/fd53325e-2353-11e7-9ac7-ba304f81ba1a.png
https://cloud.githubusercontent.com/assets/8295799/25092595/0549aace-2354-11e7-80a7-e044da2d5e0f.png


{"Event":"SparkListenerLogStart","Spark Version":"2.0.0"}
Modified the SparkUI for the history server to listen to the 
SparkListenerLogStart event, extract the version, and display it.

## How was this patch tested?
Manual testing of UI page. Attaching the UI screenshot changes here


Please review http://spark.apache.org/contributing.html before opening a 
pull request.
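For illustration only (not part of the patch): the version the PR surfaces can be read back out of an event-log line like the `SparkListenerLogStart` JSON quoted above. A minimal Python sketch, assuming the log is one JSON object per line:

```python
import json

def spark_version_from_event_log(lines):
    """Scan event-log lines for the SparkListenerLogStart event and
    return the recorded Spark version, or None if it is absent."""
    for line in lines:
        event = json.loads(line)
        if event.get("Event") == "SparkListenerLogStart":
            return event.get("Spark Version")
    return None

# The event format quoted in the PR description above:
log = ['{"Event":"SparkListenerLogStart","Spark Version":"2.0.0"}']
print(spark_version_from_event_log(log))  # -> 2.0.0
```

The history server's listener does the equivalent in Scala; the function name here is made up for the sketch.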


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/redsanket/spark SPARK-20355

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17658.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17658


commit 1f50b2750714bfcb2c77b9932ed7c5fca3d7cfa3
Author: Sanket 
Date:   2017-04-06T13:50:22Z

Add per application spark version on the history server headerpage







[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...

2017-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17658
  
Can one of the admins verify this patch?





[GitHub] spark issue #17640: [SPARK-17608][SPARKR]:Long type has incorrect serializat...

2017-04-17 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/17640
  
+1 on what @felixcheung said -- It'll be good to have more tests in 
test_Serde.R. Other than that the change looks fine





[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...

2017-04-17 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/17658
  
Jenkins, test this please





[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...

2017-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17658
  
**[Test build #75856 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75856/testReport)**
 for PR 17658 at commit 
[`1f50b27`](https://github.com/apache/spark/commit/1f50b2750714bfcb2c77b9932ed7c5fca3d7cfa3).





[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...

2017-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17658
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75856/
Test FAILed.





[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...

2017-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17658
  
**[Test build #75856 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75856/testReport)**
 for PR 17658 at commit 
[`1f50b27`](https://github.com/apache/spark/commit/1f50b2750714bfcb2c77b9932ed7c5fca3d7cfa3).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...

2017-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17658
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...

2017-04-17 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/14617
  
hi @jerryshao good points.  First, we should probably move this discussion 
to jira so it's more visible -- feel free to open two issues for these if you 
want, or first discuss on dev@.  (Sorry, it's my fault for starting a discussion 
here in the PR comments after the original change was merged ...)

On your first point, I like the idea of exposing more information on memory 
usage, but is there anything meaningful to report on execution memory in the 
rest api?  It doesn't really seem like there is.  Maybe we should rename the 
fields, but keep backwards compatibility in mind.

For the second point about storage memory limits -- it's a good question 
what it should report with the Unified Memory Manager.  I thought we'd 
just expose the portion of memory immune to eviction, 
`spark.memory.storageFraction`.  But perhaps I misunderstand what is going on 
now.

Again, probably better to have the design discussion on jiras.





[GitHub] spark issue #17653: [SPARK-19828][R][FOLLOWUP] Rename asJsonArray to as.json...

2017-04-17 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17653
  
merged to master. thanks





[GitHub] spark pull request #17653: [SPARK-19828][R][FOLLOWUP] Rename asJsonArray to ...

2017-04-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17653





[GitHub] spark pull request #17557: [SPARK-20208][R][DOCS] Document R fpGrowth suppor...

2017-04-17 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17557#discussion_r111764966
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -906,6 +910,37 @@ predicted <- predict(model, df)
 head(predicted)
 ```
 
+ FP-growth
+
+`spark.fpGrowth` executes the FP-growth algorithm to mine frequent itemsets on 
a `SparkDataFrame`. `itemsCol` should be an array of values.
+
+```{r}
+items <- selectExpr(createDataFrame(data.frame(items = c(
+  "T,R,U", "T,S", "V,R", "R,U,T,V", "R,S", "V,S,U", "U,R", "S,T", "V,R", 
"V,U,S",
+  "T,V,U", "R,V", "T,S", "T,S", "S,T", "S,U", "T,R", "V,R", "S,V", "T,S,U"
+))), "split(items, ',') AS items")
--- End diff --

do you mean making sure we have all the SQL functions in R?
we don't, actually, since it's an evolving task - there are constantly new 
functions being added.

I think you are referring to `split` - yes we should probably add that in R 
too
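For readers unfamiliar with the algorithm the vignette documents: FP-growth finds itemsets whose support (the fraction of transactions containing them) meets a threshold. A naive brute-force sketch in Python over the vignette's transactions -- purely illustrative of what is computed, not Spark's FP-tree implementation:

```python
from itertools import combinations

# The transactions from the vignette example, split on ',' as in the diff.
transactions = [set(t.split(",")) for t in (
    "T,R,U", "T,S", "V,R", "R,U,T,V", "R,S", "V,S,U", "U,R", "S,T", "V,R", "V,U,S",
    "T,V,U", "R,V", "T,S", "T,S", "S,T", "S,U", "T,R", "V,R", "S,V", "T,S,U")]

def frequent_itemsets(transactions, min_support=0.3):
    """Return every itemset whose support meets min_support.
    Brute force over all item combinations; FP-growth reaches the same
    answer without enumerating the whole itemset lattice."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            support = sum(set(combo) <= t for t in transactions) / n
            if support >= min_support:
                result[frozenset(combo)] = support
    return result

freq = frequent_itemsets(transactions)
# At 30% minimum support, all five single items are frequent and
# {S, T} is the only frequent pair (6 of 20 transactions).
```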





[GitHub] spark issue #17625: [SPARK-9103][WIP] Add Memory Tracking UI and track Netty...

2017-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17625
  
**[Test build #75857 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75857/testReport)**
 for PR 17625 at commit 
[`577d442`](https://github.com/apache/spark/commit/577d44211888232541a39026850ab7aa7b125e70).





[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16989
  
**[Test build #75858 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75858/testReport)**
 for PR 16989 at commit 
[`b6a8993`](https://github.com/apache/spark/commit/b6a8993ee96da8be336603e3699e29582c157130).





[GitHub] spark pull request #17557: [SPARK-20208][R][DOCS] Document R fpGrowth suppor...

2017-04-17 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17557#discussion_r111765227
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -906,6 +910,37 @@ predicted <- predict(model, df)
 head(predicted)
 ```
 
+ FP-growth
+
+`spark.fpGrowth` executes the FP-growth algorithm to mine frequent itemsets on 
a `SparkDataFrame`. `itemsCol` should be an array of values.
+
+```{r}
+items <- selectExpr(createDataFrame(data.frame(items = c(
+  "T,R,U", "T,S", "V,R", "R,U,T,V", "R,S", "V,S,U", "U,R", "S,T", "V,R", 
"V,U,S",
+  "T,V,U", "R,V", "T,S", "T,S", "S,T", "S,U", "T,R", "V,R", "S,V", "T,S,U"
+))), "split(items, ',') AS items")
--- End diff --

nit: could you please rename the dataframe to `df` like the other example 
you have too?





[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16989
  
**[Test build #75858 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75858/testReport)**
 for PR 16989 at commit 
[`b6a8993`](https://github.com/apache/spark/commit/b6a8993ee96da8be336603e3699e29582c157130).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16989
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16989
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75858/
Test FAILed.





[GitHub] spark issue #17625: [SPARK-9103][WIP] Add Memory Tracking UI and track Netty...

2017-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17625
  
**[Test build #75857 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75857/testReport)**
 for PR 17625 at commit 
[`577d442`](https://github.com/apache/spark/commit/577d44211888232541a39026850ab7aa7b125e70).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17625: [SPARK-9103][WIP] Add Memory Tracking UI and track Netty...

2017-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17625
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operat...

2017-04-17 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/17540#discussion_r111766089
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala ---
@@ -39,6 +39,32 @@ object SQLExecution {
 executionIdToQueryExecution.get(executionId)
   }
 
+  private val testing = sys.props.contains("spark.testing")
+
+  private[sql] def checkSQLExecutionId(sparkSession: SparkSession): Unit = 
{
--- End diff --

To keep this PR from growing too big, I want to just use it where I've 
removed `withNewExecutionId` to check for regressions. I'll follow up with 
another PR with more checks.





[GitHub] spark issue #17625: [SPARK-9103][WIP] Add Memory Tracking UI and track Netty...

2017-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17625
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75857/
Test FAILed.





[GitHub] spark issue #17557: [SPARK-20208][R][DOCS] Document R fpGrowth support

2017-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17557
  
**[Test build #75859 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75859/testReport)**
 for PR 17557 at commit 
[`ab251f1`](https://github.com/apache/spark/commit/ab251f1829906bb2be7ab3befb96fcacaa78f127).





[GitHub] spark pull request #17557: [SPARK-20208][R][DOCS] Document R fpGrowth suppor...

2017-04-17 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17557#discussion_r111767880
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -906,6 +910,37 @@ predicted <- predict(model, df)
 head(predicted)
 ```
 
+ FP-growth
+
+`spark.fpGrowth` executes the FP-growth algorithm to mine frequent itemsets on 
a `SparkDataFrame`. `itemsCol` should be an array of values.
+
+```{r}
+items <- selectExpr(createDataFrame(data.frame(items = c(
+  "T,R,U", "T,S", "V,R", "R,U,T,V", "R,S", "V,S,U", "U,R", "S,T", "V,R", 
"V,U,S",
+  "T,V,U", "R,V", "T,S", "T,S", "S,T", "S,U", "T,R", "V,R", "S,V", "T,S,U"
+))), "split(items, ',') AS items")
--- End diff --

I was pretty sure I've seen one :) `split` and `array`. There are of course 
name conflicts involved (`spark.array`?) but it would be really useful to have 
these.





[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-04-17 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/17540
  
Removed the failing SPARK-10548 test and rebased.





[GitHub] spark pull request #17652: [SPARK-20335] [SQL] [BACKPORT-2.1] Children expre...

2017-04-17 Thread gatorsmile
Github user gatorsmile closed the pull request at:

https://github.com/apache/spark/pull/17652





[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17540
  
**[Test build #75860 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75860/testReport)**
 for PR 17540 at commit 
[`901cec8`](https://github.com/apache/spark/commit/901cec88d278ea63072f32dd3236780c6bb2d449).





[GitHub] spark pull request #17646: [SPARK-20349] [SQL] ListFunctions returns duplica...

2017-04-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17646#discussion_r111768721
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -1202,15 +1203,25 @@ class SessionCatalog(
   def listFunctions(db: String, pattern: String): Seq[(FunctionIdentifier, 
String)] = {
 val dbName = formatDatabaseName(db)
 requireDbExists(dbName)
-val dbFunctions = externalCatalog.listFunctions(dbName, pattern)
-  .map { f => FunctionIdentifier(f, Some(dbName)) }
-val loadedFunctions = 
StringUtils.filterPattern(functionRegistry.listFunction(), pattern)
-  .map { f => FunctionIdentifier(f) }
+val dbFunctions = externalCatalog.listFunctions(dbName, pattern).map { 
f =>
+  FunctionIdentifier(f, Some(dbName)) }
+val loadedFunctions =
+  StringUtils.filterPattern(functionRegistry.listFunction(), 
pattern).map { f =>
+// In functionRegistry, function names are stored as an unquoted 
format.
--- End diff --

Yes. Will do it this way in the following refactoring.

We can first fix the issue and then backport it to the previous branches. 
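
For readers following along, the de-duplication under discussion can be sketched without Spark at all. Below is a minimal, hedged model of pattern filtering plus de-duplication, loosely based on what `StringUtils.filterPattern` does; its exact semantics ('*' as a wildcard, '|' separating alternatives, case-insensitive matching) are assumed here:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.regex.Pattern;

public class FunctionNameFilter {

    // Hedged sketch loosely modeled on Spark's StringUtils.filterPattern
    // (exact semantics assumed): '*' matches any character sequence, '|'
    // separates alternative patterns, and matching is case-insensitive.
    static List<String> filterPattern(List<String> names, String pattern) {
        List<Pattern> regexes = new ArrayList<Pattern>();
        for (String p : pattern.trim().split("\\|")) {
            regexes.add(Pattern.compile(p.trim().replace("*", ".*"),
                    Pattern.CASE_INSENSITIVE));
        }
        // LinkedHashSet drops duplicates (e.g. a function registered in both
        // the external catalog and the in-memory registry) but keeps order.
        LinkedHashSet<String> out = new LinkedHashSet<String>();
        for (String name : names) {
            for (Pattern r : regexes) {
                if (r.matcher(name).matches()) {
                    out.add(name);
                    break;
                }
            }
        }
        return new ArrayList<String>(out);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("upper", "lower", "lower", "current_date");
        System.out.println(filterPattern(names, "low*|current*"));
        // [lower, current_date]
    }
}
```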





[GitHub] spark pull request #17655: [SPARK-20156] [SQL] [FOLLOW-UP] Java String toLow...

2017-04-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17655#discussion_r111770874
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -114,14 +114,14 @@ class SessionCatalog(
* Format table name, taking into account case sensitivity.
*/
   protected[this] def formatTableName(name: String): String = {
-if (conf.caseSensitiveAnalysis) name else name.toLowerCase
+if (conf.caseSensitiveAnalysis) name else name.toLowerCase(Locale.ROOT)
--- End diff --

We have restrictions on database/table names. That is, the names may only 
contain letters, numbers, and _. 

Without the fix in this PR, users are not allowed to read/write/create a 
table whose name contains `I`, because `toLowerCase` will convert it to `ı` 
when the locale is `tr`. The names become illegal. Is my understanding right?
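
The pitfall is easy to reproduce with plain `java.lang.String`, independent of Spark (the table name below is hypothetical):

```java
import java.util.Locale;

public class LocaleLowerCaseDemo {
    public static void main(String[] args) {
        String name = "TABLE_I"; // hypothetical table identifier
        // Under the Turkish locale, 'I' lower-cases to dotless 'ı' (U+0131),
        // which falls outside the letters/numbers/_ restriction on names.
        String turkish = name.toLowerCase(new Locale("tr"));
        // Locale.ROOT is locale-neutral and keeps the ASCII 'i'.
        String root = name.toLowerCase(Locale.ROOT);
        System.out.println(turkish); // table_ı
        System.out.println(root);    // table_i
    }
}
```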





[GitHub] spark issue #17646: [SPARK-20349] [SQL] ListFunctions returns duplicate func...

2017-04-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17646
  
Thanks! Merging to master/2.1





[GitHub] spark pull request #17646: [SPARK-20349] [SQL] ListFunctions returns duplica...

2017-04-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17646





[GitHub] spark issue #17636: [SPARK-20334][SQL] Return a better error message when co...

2017-04-17 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/17636
  
cc @gatorsmile @hvanhovell @cloud-fan 





[GitHub] spark issue #17656: [SPARK-20354][CORE][REST-API]When I request access to th...

2017-04-17 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/17656
  
Yeah, looks like the right change; I think it was just overlooked in 
https://issues.apache.org/jira/browse/SPARK-14245

I'd ask that you add an assertion to this unit test: 
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala#L655

```scala
(attempts(0) \ "sparkUser").extract[String] should not be ("")
```

As an aside, I'm not sure why SPARK-14245 introduced a different way of getting 
the user from what the history server uses, but in any case I think this change 
is right: do the same thing as the UI, even if those internals should be 
changed.





[GitHub] spark issue #17375: [SPARK-19019][PYTHON][BRANCH-1.6] Fix hijacked `collecti...

2017-04-17 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17375
  
LGTM I'll merge this.





[GitHub] spark issue #17557: [SPARK-20208][R][DOCS] Document R fpGrowth support

2017-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17557
  
**[Test build #75859 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75859/testReport)**
 for PR 17557 at commit 
[`ab251f1`](https://github.com/apache/spark/commit/ab251f1829906bb2be7ab3befb96fcacaa78f127).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17557: [SPARK-20208][R][DOCS] Document R fpGrowth support

2017-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17557
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75859/
Test PASSed.





[GitHub] spark issue #17557: [SPARK-20208][R][DOCS] Document R fpGrowth support

2017-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17557
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17374: [SPARK-19019][PYTHON][BRANCH-2.0] Fix hijacked `collecti...

2017-04-17 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17374
  
LGTM I'll merge this today.





[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-04-17 Thread mallman
Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/17633#discussion_r111773633
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -589,18 +590,34 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 col.getType.startsWith(serdeConstants.CHAR_TYPE_NAME))
   .map(col => col.getName).toSet
 
-filters.collect {
-  case op @ BinaryComparison(a: Attribute, Literal(v, _: 
IntegralType)) =>
-s"${a.name} ${op.symbol} $v"
-  case op @ BinaryComparison(Literal(v, _: IntegralType), a: 
Attribute) =>
-s"$v ${op.symbol} ${a.name}"
-  case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
-  if !varcharKeys.contains(a.name) =>
-s"""${a.name} ${op.symbol} ${quoteStringLiteral(v.toString)}"""
-  case op @ BinaryComparison(Literal(v, _: StringType), a: Attribute)
-  if !varcharKeys.contains(a.name) =>
-s"""${quoteStringLiteral(v.toString)} ${op.symbol} ${a.name}"""
-}.mkString(" and ")
+def isFoldable(expr: Expression): Boolean =
+  (expr.dataType.isInstanceOf[IntegralType] || 
expr.dataType.isInstanceOf[StringType]) &&
--- End diff --

`IntegralType` encompasses all "integral" types, including `IntegerType`, 
`ByteType`, `ShortType`, etc.

I'm trying to be somewhat conservative in what we support here to ensure 
compatibility. Is there a particular type you'd like to see supported?
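
For readers outside the diff, the conversion being discussed can be sketched as follows. The expression classes below are hypothetical stand-ins for Catalyst's `Attribute`/`Literal`/`BinaryComparison`, not Spark code; only integral and string literals are turned into Hive filter strings, mirroring the conservative choice described above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HiveFilterSketch {

    // Hypothetical stand-in for a binary comparison between a partition
    // attribute and a literal; not Spark's actual Catalyst classes.
    static class Cmp {
        final String symbol;
        final String attr;
        final Object literal;
        final boolean literalOnLeft;

        Cmp(String symbol, String attr, Object literal, boolean literalOnLeft) {
            this.symbol = symbol;
            this.attr = attr;
            this.literal = literal;
            this.literalOnLeft = literalOnLeft;
        }
    }

    // Only integral and string literals are converted; anything else is
    // skipped rather than pushed down to the metastore.
    static String toHiveFilter(List<Cmp> filters) {
        List<String> parts = new ArrayList<String>();
        for (Cmp c : filters) {
            String lit;
            if (c.literal instanceof Integer || c.literal instanceof Long) {
                lit = c.literal.toString();
            } else if (c.literal instanceof String) {
                // Spark's quoteStringLiteral also escapes embedded quotes.
                lit = "\"" + c.literal + "\"";
            } else {
                continue; // unsupported literal type: not convertible
            }
            parts.add(c.literalOnLeft
                    ? lit + " " + c.symbol + " " + c.attr
                    : c.attr + " " + c.symbol + " " + lit);
        }
        return String.join(" and ", parts);
    }

    public static void main(String[] args) {
        List<Cmp> filters = Arrays.asList(
                new Cmp(">", "ds", 20170401, false),
                new Cmp("=", "region", "us-west", false));
        System.out.println(toHiveFilter(filters));
        // ds > 20170401 and region = "us-west"
    }
}
```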





[GitHub] spark issue #17375: [SPARK-19019][PYTHON][BRANCH-1.6] Fix hijacked `collecti...

2017-04-17 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17375
  
Merged into 1.6





[GitHub] spark issue #17374: [SPARK-19019][PYTHON][BRANCH-2.0] Fix hijacked `collecti...

2017-04-17 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17374
  
@jbloom22 is there any particular reason you're waiting on this in the 2.X 
branch?





[GitHub] spark issue #17374: [SPARK-19019][PYTHON][BRANCH-2.0] Fix hijacked `collecti...

2017-04-17 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17374
  
Merged into branch-2.0.

Thanks for doing this @HyukjinKwon, sorry for my hesitation with merging 
backports (these are the first pure backport PRs I've merged, rather than 
simultaneously merging into an old branch as well).





[GitHub] spark pull request #17655: [SPARK-20156] [SQL] [FOLLOW-UP] Java String toLow...

2017-04-17 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17655#discussion_r111781782
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -114,14 +114,14 @@ class SessionCatalog(
* Format table name, taking into account case sensitivity.
*/
   protected[this] def formatTableName(name: String): String = {
-if (conf.caseSensitiveAnalysis) name else name.toLowerCase
+if (conf.caseSensitiveAnalysis) name else name.toLowerCase(Locale.ROOT)
--- End diff --

Yes, you're correct then: if these identifiers only ever contain alphanumeric 
characters, there's no case where lower-casing the table name should be 
locale-sensitive.

Is this true of column names? 

It won't be true of data, and those are the cases I was trying to leave 
alone, along with user-supplied table and column names, but maybe the latter 
two aren't locale-sensitive.





[GitHub] spark issue #17640: [SPARK-17608][SPARKR]:Long type has incorrect serializat...

2017-04-17 Thread wangmiao1981
Github user wangmiao1981 commented on the issue:

https://github.com/apache/spark/pull/17640
  
I am adding more tests right now.

