[GitHub] spark pull request #17724: [SPARK-18127] Add hooks and extension points to S...

2017-04-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17724#discussion_r112804293
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import scala.collection.mutable
+
+import org.apache.spark.annotation.{DeveloperApi, Experimental, InterfaceStability}
+import org.apache.spark.sql.catalyst.parser.ParserInterface
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+
+/**
+ * :: Experimental ::
+ * Holder for injection points to the [[SparkSession]]. We make NO guarantee about the stability
+ * regarding binary compatibility and source compatibility of methods here.
+ *
+ * This currently provides the following extension points:
+ * - Analyzer Rules.
+ * - Check Analysis Rules.
+ * - Optimizer Rules.
+ * - Planning Strategies.
+ * - Customized Parser.
+ * - (External) Catalog listeners.
+ *
+ * The extensions can be used by calling withExtensions on the [[SparkSession.Builder]], for
+ * example:
+ * {{{
+ *   SparkSession.builder()
+ *     .master("...")
+ *     .conf("...", true)
+ *     .withExtensions { extensions =>
+ *       extensions.injectAnalyzerRule { session =>
--- End diff --

`injectAnalyzerRule` -> `buildResolutionRules`?
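
For readers following along, here is a rough sketch of what the usage described in the doc comment might look like end to end, assuming the builder method lands as `withExtensions` and the injection method keeps the `injectAnalyzerRule` name that the review comment above suggests renaming (the rule itself is a hypothetical no-op):

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical no-op rule, only to show the wiring of the extension point.
case class MyResolutionRule(session: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

val spark = SparkSession.builder()
  .master("local[*]")
  .withExtensions { extensions =>
    // The method name may end up as buildResolutionRules or similar per the
    // review comment above; injectAnalyzerRule follows the diff as posted.
    extensions.injectAnalyzerRule { session => MyResolutionRule(session) }
  }
  .getOrCreate()
```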





[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17725
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76054/
Test PASSed.





[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17725
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17725
  
**[Test build #76054 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76054/testReport)**
 for PR 17725 at commit 
[`0603686`](https://github.com/apache/spark/commit/06036867d96350a51e180565782ffee1515fbea4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17688: [MINOR][DOCS][PYTHON] Adding missing boolean type for re...

2017-04-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17688
  
LGTM too. I quickly checked for similar instances but could not find any; I also checked the R and Scala counterparts.





[GitHub] spark issue #17680: [SPARK-20364][SQL] Support Parquet predicate pushdown on...

2017-04-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17680
  
gentle ping @liancheng and @davies.





[GitHub] spark pull request #17724: [SPARK-18127] Add hooks and extension points to S...

2017-04-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17724#discussion_r112803968
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala 
---
@@ -848,6 +851,17 @@ object SparkSession {
 }
 
 /**
+ * Inject extensions into the [[SparkSession]]. This allows a user to 
add Analyzer rules,
+ * Optimizer rules, Planning Strategies or a customized parser.
+ *
+ * @since 2.3.0
--- End diff --

In the JIRA, the target version is 2.2. Do we still plan to backport it to 
2.2.0?





[GitHub] spark issue #17717: [SPARK-20430][SQL] Initialise RangeExec parameters in a ...

2017-04-21 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17717
  
LGTM pending Jenkins.






[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17712
  
**[Test build #76057 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76057/testReport)**
 for PR 17712 at commit 
[`dd182e4`](https://github.com/apache/spark/commit/dd182e4f1981305852041debed23324c5f689a47).





[GitHub] spark issue #17717: [SPARK-20430][SQL] Initialise RangeExec parameters in a ...

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17717
  
**[Test build #76056 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76056/testReport)**
 for PR 17717 at commit 
[`9b5bdc7`](https://github.com/apache/spark/commit/9b5bdc7199e0e5e3f9b3bf7cbaa79b698e5fe3f0).





[GitHub] spark pull request #17717: [SPARK-20430][SQL] Initialise RangeExec parameter...

2017-04-21 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17717#discussion_r112803448
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -1732,4 +1732,10 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
   .filter($"x1".isNotNull || !$"y".isin("a!"))
   .count
   }
+
+  test("SPARK-20430 Initialize Range parameters in a deriver side") {
--- End diff --

yea, will do





[GitHub] spark pull request #17717: [SPARK-20430][SQL] Initialise RangeExec parameter...

2017-04-21 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17717#discussion_r112803232
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -1732,4 +1732,10 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
   .filter($"x1".isNotNull || !$"y".isin("a!"))
   .count
   }
+
+  test("SPARK-20430 Initialize Range parameters in a deriver side") {
--- End diff --

driver





[GitHub] spark pull request #17717: [SPARK-20430][SQL] Initialise RangeExec parameter...

2017-04-21 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17717#discussion_r112803234
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -1732,4 +1732,10 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
   .filter($"x1".isNotNull || !$"y".isin("a!"))
   .count
   }
+
+  test("SPARK-20430 Initialize Range parameters in a deriver side") {
--- End diff --

also move this into dataframe range suite?






[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17712#discussion_r112803151
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -45,14 +45,33 @@ import org.apache.spark.sql.types.DataType
 case class UserDefinedFunction protected[sql] (
 f: AnyRef,
 dataType: DataType,
-inputTypes: Option[Seq[DataType]]) {
+inputTypes: Option[Seq[DataType]],
+name: Option[String]) {
+
+  // Optionally used for printing an UDF name in EXPLAIN
+  def withName(name: String): UserDefinedFunction = {
+UserDefinedFunction(f, dataType, inputTypes, Option(name))
+  }
 
   /**
* Returns an expression that invokes the UDF, using the given arguments.
*
* @since 1.3.0
*/
   def apply(exprs: Column*): Column = {
-Column(ScalaUDF(f, dataType, exprs.map(_.expr), 
inputTypes.getOrElse(Nil)))
+Column(ScalaUDF(f, dataType, exprs.map(_.expr), 
inputTypes.getOrElse(Nil), name))
+  }
+}
+
+object UserDefinedFunction {
--- End diff --

for now, I'll revert it...





[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17712#discussion_r112803097
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -45,14 +45,33 @@ import org.apache.spark.sql.types.DataType
 case class UserDefinedFunction protected[sql] (
 f: AnyRef,
 dataType: DataType,
-inputTypes: Option[Seq[DataType]]) {
+inputTypes: Option[Seq[DataType]],
+name: Option[String]) {
+
+  // Optionally used for printing an UDF name in EXPLAIN
+  def withName(name: String): UserDefinedFunction = {
+UserDefinedFunction(f, dataType, inputTypes, Option(name))
+  }
 
   /**
* Returns an expression that invokes the UDF, using the given arguments.
*
* @since 1.3.0
*/
   def apply(exprs: Column*): Column = {
-Column(ScalaUDF(f, dataType, exprs.map(_.expr), 
inputTypes.getOrElse(Nil)))
+Column(ScalaUDF(f, dataType, exprs.map(_.expr), 
inputTypes.getOrElse(Nil), name))
+  }
+}
+
+object UserDefinedFunction {
--- End diff --

ah ok - that sucks. that means this will break compatibility ...





[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17725
  
**[Test build #76055 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76055/testReport)**
 for PR 17725 at commit 
[`14b0d72`](https://github.com/apache/spark/commit/14b0d72d5fffb69694a2442ade6399161f99545c).





[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...

2017-04-21 Thread markgrover
Github user markgrover commented on the issue:

https://github.com/apache/spark/pull/17725
  
A few decisions that I made here:
* I considered whether just the `sun.java.command` property should be checked and redacted, but that seemed too specific and likely a band-aid for the current problem rather than a long-term solution, so I decided against it.
* Redaction for the `SparkListenerEnvironmentUpdate` event was only being done on `Spark Properties`, while `sun.java.command` is part of `System Properties`. I considered doing redaction for `System Properties` in addition to `Spark Properties` (that would have gone somewhere around [here](https://github.com/apache/spark/pull/17725/files#diff-e4a5a68c15eed95d038acfed84b0b66aL258)), but decided against it because that would have meant even more hardcoding, and I didn't see why these two kinds of properties are special enough to be redacted but not the rest. So I decided to redact information from all kinds of properties.
* One way to redact the property value would have been to redact the minimum possible portion of the value while keeping the rest of the value intact. For example, if the following were the unredacted case:
`"sun.java.command":"org.apache.spark.deploy.SparkSubmit ... --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password --conf spark.other.property=2"`
one option for the redacted output could have been:
`"sun.java.command":"org.apache.spark.deploy.SparkSubmit ... --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=*(redacted) --conf spark.other.property=2"`
However, such a redaction is very hard to maintain. For example, we would have had to take the current regex (which is `(?i)secret|password` by default) and wrap it in extra matchers, something like `"(" + SECRET_REDACTION_DEFAULT + "[^ ]*=)[^ ]*"`, so that we could replace just the matched portion. That all seemed very fragile, and even worse when the user supplies a non-default regex, so I decided it was easiest to simply replace the entire value, even though only a small part of it contained `secret` or `password`.
* One thing I didn't explicitly check was the performance implications of this change. I bring it up because previously we were comparing only keys with a regex; now, if the key doesn't match, we also match the value against the regex. So, in the worst case, we do twice as many regex matches as before. Also, before we were only doing regex matching on `Spark Properties`; now we do it on all properties - `Spark Properties`, `System Properties`, `JVM Properties` and `Classpath Properties`. I don't think this should have a big performance impact, so I didn't invest time in it; I'm mentioning it here in the interest of full disclosure.

Thanks in advance for reviewing.
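
To make the key-versus-value matching described above concrete, here is a minimal standalone sketch of the idea (the helper below is illustrative only and is not the actual redaction code in this PR):

```
import scala.util.matching.Regex

// Illustrative helper: redact a value when either its key or the value
// itself matches the configured regex.
def redact(regex: Regex, kvs: Seq[(String, String)]): Seq[(String, String)] = {
  val placeholder = "*********(redacted)"
  kvs.map { case (key, value) =>
    if (regex.findFirstIn(key).isDefined) {
      // Key looks sensitive (e.g. contains "password"): hide the whole value.
      (key, placeholder)
    } else if (regex.findFirstIn(value).isDefined) {
      // Key is opaque (e.g. sun.java.command) but the value embeds a secret:
      // replace the entire value rather than splicing out a single token.
      (key, placeholder)
    } else {
      (key, value)
    }
  }
}

// Example: the sun.java.command case from the description above.
val secretRegex = "(?i)secret|password".r
redact(secretRegex, Seq(
  "sun.java.command" ->
    "org.apache.spark.deploy.SparkSubmit --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password",
  "spark.other.property" -> "2"))
```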





[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17725
  
**[Test build #76054 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76054/testReport)**
 for PR 17725 at commit 
[`0603686`](https://github.com/apache/spark/commit/06036867d96350a51e180565782ffee1515fbea4).





[GitHub] spark pull request #17693: [SPARK-16548][SQL] Inconsistent error handling in...

2017-04-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17693#discussion_r112801838
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -149,7 +149,8 @@ case class GetJsonObject(json: Expression, path: 
Expression)
 
 if (parsed.isDefined) {
   try {
-Utils.tryWithResource(jsonFactory.createParser(jsonStr.getBytes)) 
{ parser =>
+Utils.tryWithResource(jsonFactory.createParser(new 
InputStreamReader(
--- End diff --

please add a comment saying that this is to avoid a bug in encoding detection, and that we explicitly specify the encoding (UTF-8) here.
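
For illustration, here is a minimal standalone example of the suggested pattern, i.e. handing Jackson a `Reader` with an explicit charset so its byte-level encoding auto-detection is bypassed (a sketch, not the code in the PR):

```
import java.io.{ByteArrayInputStream, InputStreamReader}
import java.nio.charset.StandardCharsets

import com.fasterxml.jackson.core.JsonFactory

// Parse bytes as UTF-8 explicitly instead of letting Jackson guess the
// encoding from the raw byte stream (the bug being avoided above).
val jsonFactory = new JsonFactory()
val bytes = """{"a": 1}""".getBytes(StandardCharsets.UTF_8)
val parser = jsonFactory.createParser(
  new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))
try {
  while (parser.nextToken() != null) {} // walk the tokens
} finally {
  parser.close()
}
```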





[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...

2017-04-21 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17693
  
LGTM





[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17712
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76053/
Test FAILed.





[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17712
  
**[Test build #76053 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76053/testReport)**
 for PR 17712 at commit 
[`8800c3b`](https://github.com/apache/spark/commit/8800c3b15048bad5926e2b2ed280b042fa5c9d47).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17712
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17725
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17725
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76051/
Test FAILed.





[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17712
  
**[Test build #76053 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76053/testReport)**
 for PR 17712 at commit 
[`8800c3b`](https://github.com/apache/spark/commit/8800c3b15048bad5926e2b2ed280b042fa5c9d47).





[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17725
  
**[Test build #76051 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76051/testReport)**
 for PR 17725 at commit 
[`2f5148a`](https://github.com/apache/spark/commit/2f5148a2e37d8d36006fb297b28a9e8c21a0026b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17712#discussion_r112801078
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -45,14 +45,33 @@ import org.apache.spark.sql.types.DataType
 case class UserDefinedFunction protected[sql] (
 f: AnyRef,
 dataType: DataType,
-inputTypes: Option[Seq[DataType]]) {
+inputTypes: Option[Seq[DataType]],
+name: Option[String]) {
+
+  // Optionally used for printing an UDF name in EXPLAIN
+  def withName(name: String): UserDefinedFunction = {
+UserDefinedFunction(f, dataType, inputTypes, Option(name))
+  }
 
   /**
* Returns an expression that invokes the UDF, using the given arguments.
*
* @since 1.3.0
*/
   def apply(exprs: Column*): Column = {
-Column(ScalaUDF(f, dataType, exprs.map(_.expr), 
inputTypes.getOrElse(Nil)))
+Column(ScalaUDF(f, dataType, exprs.map(_.expr), 
inputTypes.getOrElse(Nil), name))
+  }
+}
+
+object UserDefinedFunction {
--- End diff --

oh, it seems we couldn't add `unapply` there because:
```
[error] /Users/maropu/IdeaProjects/spark/spark-master/sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala:45: method unapply is defined twice
[error]   conflicting symbols both originated in file '/Users/maropu/IdeaProjects/spark/spark-master/sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala'
[error] case class UserDefinedFunction protected[sql] (
[error]        ^
```





[GitHub] spark issue #17672: [SPARK-20371][R] Add wrappers for collect_list and colle...

2017-04-21 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17672
  
Will probably be cleaner








[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17712#discussion_r112800829
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -45,14 +45,33 @@ import org.apache.spark.sql.types.DataType
 case class UserDefinedFunction protected[sql] (
 f: AnyRef,
 dataType: DataType,
-inputTypes: Option[Seq[DataType]]) {
+inputTypes: Option[Seq[DataType]],
+name: Option[String]) {
+
+  // Optionally used for printing an UDF name in EXPLAIN
+  def withName(name: String): UserDefinedFunction = {
+UserDefinedFunction(f, dataType, inputTypes, Option(name))
+  }
 
   /**
* Returns an expression that invokes the UDF, using the given arguments.
*
* @since 1.3.0
*/
   def apply(exprs: Column*): Column = {
-Column(ScalaUDF(f, dataType, exprs.map(_.expr), 
inputTypes.getOrElse(Nil)))
+Column(ScalaUDF(f, dataType, exprs.map(_.expr), 
inputTypes.getOrElse(Nil), name))
+  }
+}
+
+object UserDefinedFunction {
--- End diff --

ok. Is it okay to update the MiMa file?
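
If the signature change is intentional, the usual route is an exclusion entry in `project/MimaExcludes.scala`. A hedged sketch of what such an entry looks like; the exact problem type and member name have to come from the actual MiMa failure output, so the ones below are placeholders:

```
// project/MimaExcludes.scala (illustrative placeholder entry)
import com.typesafe.tools.mima.core._

lazy val udfNameExcludes = Seq(
  // [SPARK-20416] UserDefinedFunction gained a name field, which changes the
  // generated case-class methods; placeholder problem type and member shown.
  ProblemFilters.exclude[MissingMethodProblem](
    "org.apache.spark.sql.expressions.UserDefinedFunction.this")
)
```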





[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17712#discussion_r112800640
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -45,14 +45,33 @@ import org.apache.spark.sql.types.DataType
 case class UserDefinedFunction protected[sql] (
 f: AnyRef,
 dataType: DataType,
-inputTypes: Option[Seq[DataType]]) {
+inputTypes: Option[Seq[DataType]],
+name: Option[String]) {
+
+  // Optionally used for printing an UDF name in EXPLAIN
+  def withName(name: String): UserDefinedFunction = {
+UserDefinedFunction(f, dataType, inputTypes, Option(name))
+  }
 
   /**
* Returns an expression that invokes the UDF, using the given arguments.
*
* @since 1.3.0
*/
   def apply(exprs: Column*): Column = {
-Column(ScalaUDF(f, dataType, exprs.map(_.expr), 
inputTypes.getOrElse(Nil)))
+Column(ScalaUDF(f, dataType, exprs.map(_.expr), 
inputTypes.getOrElse(Nil), name))
+  }
+}
+
+object UserDefinedFunction {
--- End diff --

also need an unapply function





[GitHub] spark issue #17724: [SPARK-18127] Add hooks and extension points to Spark

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17724
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17724: [SPARK-18127] Add hooks and extension points to Spark

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17724
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76049/
Test PASSed.





[GitHub] spark issue #17724: [SPARK-18127] Add hooks and extension points to Spark

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17724
  
**[Test build #76049 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76049/testReport)**
 for PR 17724 at commit 
[`105962a`](https://github.com/apache/spark/commit/105962a4e22e7eb7a668bc0793e7a22965a7a041).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17712
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17712
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76052/
Test FAILed.





[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17712
  
**[Test build #76052 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76052/testReport)**
 for PR 17712 at commit 
[`96bc89d`](https://github.com/apache/spark/commit/96bc89d49456bcd00d863950dc0da1271153d186).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17712
  
**[Test build #76052 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76052/testReport)**
 for PR 17712 at commit 
[`96bc89d`](https://github.com/apache/spark/commit/96bc89d49456bcd00d863950dc0da1271153d186).





[GitHub] spark issue #17672: [SPARK-20371][R] Add wrappers for collect_list and colle...

2017-04-21 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/17672
  
Yeah, I have this feeling that it could be deliberate, but I cannot figure out what the purpose is. Removing `@exports` should be enough, shouldn't it?

I thought about cleaning this up, but I wonder if it is better to wait for 
[SPARK-16693](https://issues.apache.org/jira/browse/SPARK-16693).





[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17725
  
**[Test build #76051 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76051/testReport)**
 for PR 17725 at commit 
[`2f5148a`](https://github.com/apache/spark/commit/2f5148a2e37d8d36006fb297b28a9e8c21a0026b).





[GitHub] spark pull request #17725: [SPARK-20435][CORE] More thorough redaction of se...

2017-04-21 Thread markgrover
GitHub user markgrover opened a pull request:

https://github.com/apache/spark/pull/17725

[SPARK-20435][CORE] More thorough redaction of sensitive information

This change does a more thorough redaction of sensitive information from logs and the UI, and adds unit tests that ensure no regressions leak sensitive information to the logs.

Previously, the redaction logic only checked whether the key matched the secret regex pattern; if it did, its value was redacted. That worked for most cases. However, in the above case the key (`sun.java.command`) doesn't tell much, so the value needs to be searched. This PR expands the check to match values as well.


## How was this patch tested?

New unit tests added that ensure that no sensitive information is present 
in the event logs or the yarn logs.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/markgrover/spark spark-20435

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17725.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17725


commit 2f5148a2e37d8d36006fb297b28a9e8c21a0026b
Author: Mark Grover 
Date:   2017-04-22T00:24:30Z

[SPARK-20435][CORE] More thorough redaction of sensitive information from 
logs/UI, more unit tests







[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...

2017-04-21 Thread mgummelt
Github user mgummelt commented on the issue:

https://github.com/apache/spark/pull/17723
  
cc @vanzin @jerryshao @skonto 

BTW @vanzin, I decided to parameterize `HadoopFSCredentialProvider` with a 
new `HadoopAccessManager` object, for which YARN provides a custom 
`YARNHadoopAccessManager`.  I did this instead of conditioning on 
`SparkHadoopUtil.get.isYarnMode`, since I prefer functional parameterization 
over global values. 
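
A rough sketch of the parameterization pattern being described, with hypothetical interfaces (the names follow the comment above, but the actual traits and methods in the PR may differ):

```
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.security.UserGroupInformation

// Hypothetical abstraction: the credential provider asks this object for
// environment-specific answers instead of branching on isYarnMode.
trait HadoopAccessManager {
  def hadoopFilesystemsToAccess(conf: Configuration): Set[Path]
  def tokenRenewer(conf: Configuration): String
}

// Default behaviour for non-YARN deployments (illustrative only).
class DefaultHadoopAccessManager extends HadoopAccessManager {
  override def hadoopFilesystemsToAccess(conf: Configuration): Set[Path] = Set.empty
  override def tokenRenewer(conf: Configuration): String =
    UserGroupInformation.getCurrentUser.getShortUserName
}

// YARN would supply its own implementation (e.g. renewing as the RM principal),
// and the provider simply receives whichever one applies:
class HadoopFSCredentialProvider(accessManager: HadoopAccessManager) {
  def obtainTokens(conf: Configuration): Unit = {
    val renewer = accessManager.tokenRenewer(conf)
    val filesystems = accessManager.hadoopFilesystemsToAccess(conf)
    // ... obtain delegation tokens for `filesystems`, requesting `renewer` ...
  }
}
```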





[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17712
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76050/
Test FAILed.





[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17712
  
**[Test build #76050 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76050/testReport)**
 for PR 17712 at commit 
[`5d797a9`](https://github.com/apache/spark/commit/5d797a97fa936a4534ea381f6aa3cfd5545310ce).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17712
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17723
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17723
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76047/
Test PASSed.





[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17723
  
**[Test build #76047 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76047/testReport)**
 for PR 17723 at commit 
[`d6d21d1`](https://github.com/apache/spark/commit/d6d21d165a451ce7a285baa98387cbf341fb4739).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17672: [SPARK-20371][R] Add wrappers for collect_list and colle...

2017-04-21 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17672
  
Not really, it's just inconsistent handling.
Some of the comment changes could be deliberate, though.








[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17712
  
**[Test build #76050 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76050/testReport)**
 for PR 17712 at commit 
[`5d797a9`](https://github.com/apache/spark/commit/5d797a97fa936a4534ea381f6aa3cfd5545310ce).





[GitHub] spark issue #17672: [SPARK-20371][R] Add wrappers for collect_list and colle...

2017-04-21 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/17672
  
BTW @felixcheung - is there any deeper reason behind the current state of `generics.R`? I mean:

- Inconsistent usage of standard and `roxygen` comments.
- Marking functions which are not to be exported with `@export`.
- Slightly mixed up order (both in groups and between groups).
- Some minor inconsistencies (like marking `asc` as `@rdname 
columnfunctions`).







[GitHub] spark issue #17719: [SPARK-20431][SQL] Specify a schema by using a DDL-forma...

2017-04-21 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17719
  
cc: @gatorsmile 





[GitHub] spark issue #17717: [SPARK-20430][SQL] Initialise RangeExec parameters in a ...

2017-04-21 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17717
  
cc: @gatorsmile 





[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17712#discussion_r112795922
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -47,12 +47,20 @@ case class UserDefinedFunction protected[sql] (
 dataType: DataType,
 inputTypes: Option[Seq[DataType]]) {
 
+  // Optionally used for printing UDF names in EXPLAIN
+  private var nameOption: Option[String] = None
--- End diff --

okay, I'll recheck





[GitHub] spark issue #17724: [SPARK-18127] Add hooks and extension points to Spark

2017-04-21 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17724
  
LGTM





[GitHub] spark issue #17724: [SPARK-18127] Add hooks and extension points to Spark

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17724
  
**[Test build #76049 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76049/testReport)**
 for PR 17724 at commit 
[`105962a`](https://github.com/apache/spark/commit/105962a4e22e7eb7a668bc0793e7a22965a7a041).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17648: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...

2017-04-21 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17648
  
I was saying that, rather than implementing them, we could just rewrite them into an 
aggregate over the conditions and compare the result against the value.
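
A minimal sketch of that kind of rewrite, using only existing aggregates in the DataFrame API (the column names `k` and `x` here are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{max, min, when}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("a", 1), ("a", -2), ("b", 3)).toDF("k", "x")

// EVERY(x > 0): true iff the minimum of the 0/1-encoded condition is 1.
// ANY(x > 0):   true iff the maximum of the 0/1-encoded condition is 1.
df.groupBy($"k").agg(
  (min(when($"x" > 0, 1).otherwise(0)) === 1).as("every_x_pos"),
  (max(when($"x" > 0, 1).otherwise(0)) === 1).as("any_x_pos")
).show()
```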



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17723: [SPARK-20434] Move kerberos delegation token code...

2017-04-21 Thread mgummelt
Github user mgummelt commented on a diff in the pull request:

https://github.com/apache/spark/pull/17723#discussion_r112788867
  
--- Diff: core/pom.xml ---
@@ -357,6 +357,34 @@
   org.apache.commons
   commons-crypto
 
+
+
+
+  ${hive.group}
+  hive-exec
--- End diff --

I still don't know how to place these in the `test` scope, which is where 
they belong.  See my comment here: 
https://github.com/apache/spark/pull/17665/files#r112337820


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17713: [SPARK-20417][SQL] Move subquery error handling to check...

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17713
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17713: [SPARK-20417][SQL] Move subquery error handling to check...

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17713
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76043/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17713: [SPARK-20417][SQL] Move subquery error handling to check...

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17713
  
**[Test build #76043 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76043/testReport)**
 for PR 17713 at commit 
[`39e8cf7`](https://github.com/apache/spark/commit/39e8cf752f5bd3325edbb93e69ee09b92026242f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17724: [SPARK-18127] Add hooks and extension points to Spark

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17724
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17724: [SPARK-18127] Add hooks and extension points to Spark

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17724
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76048/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17724: [SPARK-18127] Add hooks and extension points to Spark

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17724
  
**[Test build #76048 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76048/testReport)**
 for PR 17724 at commit 
[`c83b4ee`](https://github.com/apache/spark/commit/c83b4ee67b9cf4506cc8ce1ce449055463b1bda9).
 * This patch **fails to generate documentation**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `.doc(\"Name of the class used to configure Spark Session 
extensions. The class should \" +`
  * `class SparkSessionExtensions `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17697: [SPARK-20414][MLLIB] avoid creating only 16 reduc...

2017-04-21 Thread yangyangyyy
Github user yangyangyyy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17697#discussion_r112783447
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/rdd/MLPairRDDFunctions.scala ---
@@ -39,8 +39,8 @@ class MLPairRDDFunctions[K: ClassTag, V: ClassTag](self: 
RDD[(K, V)]) extends Se
* @param ord the implicit ordering for T
* @return an RDD that contains the top k values for each key
*/
-  def topByKey(num: Int)(implicit ord: Ordering[V]): RDD[(K, Array[V])] = {
-self.aggregateByKey(new BoundedPriorityQueue[V](num)(ord))(
+  def topByKey(num: Int, bucketsCount: Int = 200)(implicit ord: 
Ordering[V]): RDD[(K, Array[V])] = {
--- End diff --

@HyukjinKwon yes, updated that way



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17724: [SPARK-18127] Add hooks and extension points to Spark

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17724
  
**[Test build #76048 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76048/testReport)**
 for PR 17724 at commit 
[`c83b4ee`](https://github.com/apache/spark/commit/c83b4ee67b9cf4506cc8ce1ce449055463b1bda9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17724: [SPARK-18127] Add hooks and extension points to Spark

2017-04-21 Thread sameeragarwal
Github user sameeragarwal commented on the issue:

https://github.com/apache/spark/pull/17724
  
cc @hvanhovell 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17724: [SPARK-18127] Add hooks and extension points to S...

2017-04-21 Thread sameeragarwal
GitHub user sameeragarwal opened a pull request:

https://github.com/apache/spark/pull/17724

[SPARK-18127] Add hooks and extension points to Spark

## What changes were proposed in this pull request?

This patch adds support for customizing the Spark session by injecting 
user-defined custom extensions. This allows a user to add custom analyzer 
rules/checks, optimizer rules, planning strategies, or even a customized parser.
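
A minimal sketch of what using such an extension point could look like (the `injectOptimizerRule` method name is assumed for illustration only; the actual injection API is defined by this patch and exercised in SparkSessionExtensionSuite):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// A no-op optimizer rule, standing in for a user-defined rewrite.
case class MyCustomRule(spark: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

val spark = SparkSession.builder()
  .master("local[*]")
  .withExtensions { extensions =>
    // Method name assumed for illustration.
    extensions.injectOptimizerRule(MyCustomRule)
  }
  .getOrCreate()
```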

## How was this patch tested?

Unit Tests in SparkSessionExtensionSuite

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sameeragarwal/spark session-extensions

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17724.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17724


commit c83b4ee67b9cf4506cc8ce1ce449055463b1bda9
Author: Sameer Agarwal 
Date:   2017-04-13T21:58:53Z

Add SparkSessionExtensions




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17723
  
**[Test build #76047 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76047/testReport)**
 for PR 17723 at commit 
[`d6d21d1`](https://github.com/apache/spark/commit/d6d21d165a451ce7a285baa98387cbf341fb4739).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...

2017-04-21 Thread ewasserman
Github user ewasserman commented on the issue:

https://github.com/apache/spark/pull/17693
  
Reverted from using toString on org.apache.spark.unsafe.types.UTF8String; the 
byte array is now run through a java.io.Reader instead. This still fixes the bug 
and is also more efficient on the JSON parser side, so it is a net performance 
win as well.
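
A minimal, standalone sketch of the general idea, assuming Jackson as the JSON parser: feed the UTF-8 bytes to the parser through a Reader, so the record never has to be materialized as a full java.lang.String first.

```scala
import java.io.{ByteArrayInputStream, InputStreamReader}
import java.nio.charset.StandardCharsets

import com.fasterxml.jackson.core.{JsonFactory, JsonToken}

val bytes = """{"a": 1, "b": "two"}""".getBytes(StandardCharsets.UTF_8)

val factory = new JsonFactory()
// Wrap the raw bytes in a Reader instead of calling new String(bytes):
// the parser decodes the input incrementally, with no intermediate String copy.
val parser = factory.createParser(
  new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))

var token = parser.nextToken()
while (token != null) {
  if (token == JsonToken.FIELD_NAME) println(parser.getCurrentName)
  token = parser.nextToken()
}
parser.close()
```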


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17723
  
**[Test build #76046 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76046/testReport)**
 for PR 17723 at commit 
[`a546aab`](https://github.com/apache/spark/commit/a546aab923520ccec7683c3b320a5b92dedc3f1e).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17723
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17723
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76046/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17723
  
**[Test build #76046 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76046/testReport)**
 for PR 17723 at commit 
[`a546aab`](https://github.com/apache/spark/commit/a546aab923520ccec7683c3b320a5b92dedc3f1e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17723
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17723
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76045/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17723
  
**[Test build #76045 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76045/testReport)**
 for PR 17723 at commit 
[`e15f1ab`](https://github.com/apache/spark/commit/e15f1abcd708d32d863523135ba9fe8690ba2d9c).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17723
  
**[Test build #76045 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76045/testReport)**
 for PR 17723 at commit 
[`e15f1ab`](https://github.com/apache/spark/commit/e15f1abcd708d32d863523135ba9fe8690ba2d9c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17723
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17723
  
**[Test build #76044 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76044/testReport)**
 for PR 17723 at commit 
[`ad4e33b`](https://github.com/apache/spark/commit/ad4e33b9f379538ddcbdb9468f4bb39cafc46057).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17723
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76044/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17723: [SPARK-20434] Move kerberos delegation token code from y...

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17723
  
**[Test build #76044 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76044/testReport)**
 for PR 17723 at commit 
[`ad4e33b`](https://github.com/apache/spark/commit/ad4e33b9f379538ddcbdb9468f4bb39cafc46057).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17723: [SPARK-20434] Move kerberos delegation token code...

2017-04-21 Thread mgummelt
GitHub user mgummelt opened a pull request:

https://github.com/apache/spark/pull/17723

[SPARK-20434] Move kerberos delegation token code from yarn to core

## What changes were proposed in this pull request?

Move kerberos delegation token code from yarn to core, so that other 
schedulers (such as Mesos) may use it.

## How was this patch tested?

unit tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mesosphere/spark SPARK-20434-refactor-kerberos

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17723.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17723


commit ce63a9b6399176b8fa2c59c1004d796ef77c3d71
Author: Dr. Stefan Schimanski 
Date:   2016-02-10T17:09:46Z

[Mesosphere SPARK-126] Move YarnSparkHadoopUtil token helpers into the 
generic SparkHadoopUtil class

commit 75d849a494519a5af97bf22df7676b336746ac92
Author: Dr. Stefan Schimanski 
Date:   2016-02-10T17:11:20Z

[Mesosphere SPARK-126] Add Mesos Kerberos support

commit 35002f2bd2e906bf1c6e6800f1f346e962edca75
Author: Michael Gummelt 
Date:   2017-04-17T22:31:25Z

Par down kerberos support

commit 13981c8fe7934a8cee53be4cfd59fb14c8d9b07c
Author: Michael Gummelt 
Date:   2017-04-17T22:57:51Z

cleanup

commit af4a3e4f53509ee1bee714d0846518d2696e0800
Author: Michael Gummelt 
Date:   2017-04-17T23:14:05Z

style

commit 5cc66dc91e7684c582b08a84b4901541dd60e38b
Author: Michael Gummelt 
Date:   2017-04-18T00:27:28Z

Add MesosSecurityManager

commit a47c9c04f61dce38f64e291c66793742239761b7
Author: Michael Gummelt 
Date:   2017-04-18T00:43:18Z

info logs

commit c8ec0496ca1c12e5eb43c530f08cb033a7c862fa
Author: Michael Gummelt 
Date:   2017-04-18T20:24:11Z

style

commit 954eeffda336bbbf6d5a588a38c95f092ecf1679
Author: Michael Gummelt 
Date:   2017-04-18T21:34:14Z

Re-add org.apache.spark.deploy.yarn.security.ServiceCredentialProvider for 
backwards compatibility

commit 2d769287edd2ac6867e9696798c116fdf9165411
Author: Michael Gummelt 
Date:   2017-04-18T21:43:56Z

move YARNHadoopFSCredentialProviderSuite

commit d8a968d66c577cc702d00e980c968a57c3f12565
Author: Michael Gummelt 
Date:   2017-04-19T17:35:03Z

Move hive test deps to the core module

commit b8093c863ce9af3eadc3fd2b371e1bafe4cf4a47
Author: Michael Gummelt 
Date:   2017-04-19T22:10:25Z

remove test scope

commit 25d508823d238d905b102196962f39900b5c526a
Author: Michael Gummelt 
Date:   2017-04-19T22:50:10Z

remove test scope

commit 4c387ebcb584732d0d67e83c0b9d5f4cfd1db247
Author: Michael Gummelt 
Date:   2017-04-20T22:15:51Z

Removed MesosSecurityManager, added RPC call, removed META-INF 
ServiceCredentialProvider from core

commit e32afeeac95883138751c060a3ebfaf309e3d22f
Author: Michael Gummelt 
Date:   2017-04-20T22:17:37Z

add InterfaceStability annotation to ServiceCredentialProvider

commit be69f5a639caad0abadafcae471e71847fc9f935
Author: Michael Gummelt 
Date:   2017-04-21T01:00:52Z

Add HadoopAccessManager

commit 55616da9f0fd15f1594233b5fe43b04ef1c901c8
Author: Michael Gummelt 
Date:   2017-04-21T19:28:43Z

Remove mesos code

commit 240df317dd42584349a3c4a0bf6f7d78a4fbe0e6
Author: Michael Gummelt 
Date:   2017-04-21T19:38:07Z

re-add mistakenly removed files

commit 810c6b26e3830e0f4e08e66df2d6a6f50cc65c7b
Author: Michael Gummelt 
Date:   2017-04-21T20:14:16Z

test ConfigurableCredentialManager.obtainUserTokens

commit ad4e33b9f379538ddcbdb9468f4bb39cafc46057
Author: Michael Gummelt 
Date:   2017-04-21T21:03:41Z

add tests




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17648: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...

2017-04-21 Thread ptkool
Github user ptkool commented on the issue:

https://github.com/apache/spark/pull/17648
  
@rxin I'm not sure where you're going with your proposal. These are 
aggregate functions, not scalar functions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17688: [MINOR][DOCS][PYTHON] Adding missing boolean type for re...

2017-04-21 Thread vundela
Github user vundela commented on the issue:

https://github.com/apache/spark/pull/17688
  
@holdenk Thanks for the review. Can you please let me know the line number 
where you think the list of types is missing? Is this for fillna or another API?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17683: [SPARK-20386][Spark Core]modify the log info if the bloc...

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17683
  
**[Test build #3672 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3672/testReport)**
 for PR 17683 at commit 
[`664dfb8`](https://github.com/apache/spark/commit/664dfb8848c38826886430700bfa926116ad28bf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17688: [MINOR][DOCS][PYTHON] Adding missing boolean type for re...

2017-04-21 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17688
  
We should also update the list of types a few lines up while we are fixing 
this. Thanks a lot for catching this, @vundela.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17694: [SPARK-12717][PYSPARK] Resolving race condition with pys...

2017-04-21 Thread vundela
Github user vundela commented on the issue:

https://github.com/apache/spark/pull/17694
  
Filed a PR to fix the issue in the spark 1.6 branch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-21 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112764788
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala
 ---
@@ -147,6 +153,17 @@ class KinesisSequenceRangeIterator(
   private var lastSeqNumber: String = null
   private var internalIterator: Iterator[Record] = null
 
+  // variable for kinesis wait time interval between next retry
+  private val kinesisWaitTimeMs = JavaUtils.timeStringAsMs(
+Try {sparkConf.get("spark.streaming.kinesis.retry.waitTime")}
--- End diff --

This complexity isn't necessary. You can achieve the same effect by using 
an alternate form of 
[```SparkConf.get()```](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkConf@get(key:String,defaultValue:String):String):

```scala
private val kinesisWaitTimeMs = JavaUtils.timeStringAsMs(
  sparkConf.get("spark.streaming.kinesis.retry.waitTime", 
MIN_RETRY_WAIT_TIME_MS))
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-21 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112764350
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala
 ---
@@ -112,7 +116,8 @@ class KinesisBackedBlockRDD[T: ClassTag](
   val credentials = kinesisCreds.provider.getCredentials
   partition.seqNumberRanges.ranges.iterator.flatMap { range =>
 new KinesisSequenceRangeIterator(credentials, endpointUrl, 
regionName,
-  range, retryTimeoutMs).map(messageHandler)
+  range, retryTimeoutMs, sparkConf
+).map(messageHandler)
--- End diff --

*nit:* Move this to end of previous line


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-21 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112765111
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala
 ---
@@ -17,21 +17,24 @@
 
 package org.apache.spark.streaming.kinesis
 
-import scala.collection.JavaConverters._
-import scala.reflect.ClassTag
-import scala.util.control.NonFatal
-
-import com.amazonaws.auth.{AWSCredentials, 
DefaultAWSCredentialsProviderChain}
+import com.amazonaws.auth.AWSCredentials
 import com.amazonaws.services.kinesis.AmazonKinesisClient
 import com.amazonaws.services.kinesis.clientlibrary.types.UserRecord
 import com.amazonaws.services.kinesis.model._
-
 import org.apache.spark._
 import org.apache.spark.internal.Logging
+import org.apache.spark.network.util.JavaUtils
 import org.apache.spark.rdd.{BlockRDD, BlockRDDPartition}
 import org.apache.spark.storage.BlockId
 import org.apache.spark.util.NextIterator
 
+import scala.collection.JavaConverters._
--- End diff --

Why change the ordering of this import group? I don't think this is 
consistent with the scalastyle for this project.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-21 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112765206
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala
 ---
@@ -17,21 +17,24 @@
 
 package org.apache.spark.streaming.kinesis
 
-import scala.collection.JavaConverters._
-import scala.reflect.ClassTag
-import scala.util.control.NonFatal
-
-import com.amazonaws.auth.{AWSCredentials, 
DefaultAWSCredentialsProviderChain}
+import com.amazonaws.auth.AWSCredentials
 import com.amazonaws.services.kinesis.AmazonKinesisClient
 import com.amazonaws.services.kinesis.clientlibrary.types.UserRecord
 import com.amazonaws.services.kinesis.model._
-
--- End diff --

I think this newline should be kept to be consistent with the project's 
scalastyle. Have you been running style checks when testing this change?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-21 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112766374
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala
 ---
@@ -147,6 +153,17 @@ class KinesisSequenceRangeIterator(
   private var lastSeqNumber: String = null
   private var internalIterator: Iterator[Record] = null
 
+  // variable for kinesis wait time interval between next retry
+  private val kinesisWaitTimeMs = JavaUtils.timeStringAsMs(
+Try {sparkConf.get("spark.streaming.kinesis.retry.waitTime")}
--- End diff --

It may also be useful to declare these keys as public constants in a 
sensible location such as the [companion object to 
```KinesisInputDStream```](https://github.com/apache/spark/blob/master/external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala#L84),
 e.g.:

```scala
object KinesisInputDStream {
...
  /**
   * Relevant doc
   */
  val RETRY_WAIT_TIME_KEY = "spark.streaming.kinesis.retry.waitTime"
 
  /**
   * Relevant doc
   */
  val RETRY_MAX_ATTEMPTS_KEY = "spark.streaming.kinesis.retry.maxAttempts"
...
```

This will make things a little less brittle for users who want to 
dynamically fill in SparkConf values in their apps. You would also be able to use 
these constants in the unit tests here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-21 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112766633
  
--- Diff: 
external/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDDSuite.scala
 ---
@@ -101,6 +101,37 @@ abstract class 
KinesisBackedBlockRDDTests(aggregateTestData: Boolean)
 }
   }
 
+  testIfEnabled("Basic reading from Kinesis with modified configurations") 
{
+// Add Kinesis retry configurations
+sc.conf.set("spark.streaming.kinesis.retry.waitTime", "1000ms")
+sc.conf.set("spark.streaming.kinesis.retry.maxAttempts", "5")
+
+// Verify all data using multiple ranges in a single RDD partition
+val receivedData1 = new KinesisBackedBlockRDD[Array[Byte]](sc, 
testUtils.regionName,
+  testUtils.endpointUrl, fakeBlockIds(1),
+  Array(SequenceNumberRanges(allRanges.toArray)),
+  sparkConf = sc.getConf).map { bytes => new String(bytes).toInt 
}.collect()
+assert(receivedData1.toSet === testData.toSet)
+
+// Verify all data using one range in each of the multiple RDD 
partitions
+val receivedData2 = new KinesisBackedBlockRDD[Array[Byte]](sc, 
testUtils.regionName,
+  testUtils.endpointUrl, fakeBlockIds(allRanges.size),
+  allRanges.map { range => SequenceNumberRanges(Array(range)) 
}.toArray,
+  sparkConf = sc.getConf).map { bytes => new String(bytes).toInt 
}.collect()
+assert(receivedData2.toSet === testData.toSet)
+
+// Verify ordering within each partition
+val receivedData3 = new KinesisBackedBlockRDD[Array[Byte]](sc, 
testUtils.regionName,
+  testUtils.endpointUrl, fakeBlockIds(allRanges.size),
+  allRanges.map { range => SequenceNumberRanges(Array(range)) 
}.toArray,
+  sparkConf = sc.getConf
+).map { bytes => new String(bytes).toInt }.collectPartitions()
--- End diff --

*nit:* move this to the end of previous line


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-21 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112764808
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala
 ---
@@ -147,6 +153,17 @@ class KinesisSequenceRangeIterator(
   private var lastSeqNumber: String = null
   private var internalIterator: Iterator[Record] = null
 
+  // variable for kinesis wait time interval between next retry
+  private val kinesisWaitTimeMs = JavaUtils.timeStringAsMs(
+Try {sparkConf.get("spark.streaming.kinesis.retry.waitTime")}
+  .getOrElse(MIN_RETRY_WAIT_TIME_MS)
+  )
+
+  // variable for kinesis max retry attempts
+  private val kinesisMaxRetries =
+Try {sparkConf.get("spark.streaming.kinesis.retry.maxAttempts")}
--- End diff --

See above


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-21 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112764344
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala
 ---
@@ -83,7 +86,8 @@ class KinesisBackedBlockRDD[T: ClassTag](
 @transient private val isBlockIdValid: Array[Boolean] = Array.empty,
 val retryTimeoutMs: Int = 1,
 val messageHandler: Record => T = 
KinesisInputDStream.defaultMessageHandler _,
-val kinesisCreds: SparkAWSCredentials = DefaultCredentials
+val kinesisCreds: SparkAWSCredentials = DefaultCredentials,
+val sparkConf: SparkConf = new SparkConf()
--- End diff --

Why does this need to be provided as a constructor parameter? You'll want 
to use the global ```SparkConf``` for the context via ```sc.getConf```. To 
avoid bringing ```sc``` into the serialized closure for the ```compute()``` 
method and raising an exception, you can alias it as a private field in this 
class:

```scala
private val sparkConf: SparkConf = sc.getConf
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17713: [SPARK-20417][SQL] Move subquery error handling to check...

2017-04-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17713
  
**[Test build #76043 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76043/testReport)**
 for PR 17713 at commit 
[`39e8cf7`](https://github.com/apache/spark/commit/39e8cf752f5bd3325edbb93e69ee09b92026242f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17672: [SPARK-20371][R] Add wrappers for collect_list and colle...

2017-04-21 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/17672
  
Thanks @felixcheung 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17688: [MINOR][DOCS][PYTHON] Adding missing boolean type for re...

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17688
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76042/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17688: [MINOR][DOCS][PYTHON] Adding missing boolean type for re...

2017-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17688
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   4   >