date:20170814

[GitHub] spark pull request #18920: [SPARK-19471][SQL]AggregationIterator does not in...

2017-08-14 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18920#discussion_r133002116
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -449,6 +451,49 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
 ).foreach(assertValuesDoNotChangeAfterCoalesceOrUnion(_))
   }
 
+  private def assertNoExceptions(c: Column): Unit = {
--- End diff --

Could you submit a follow-up PR to move this test case to 
`DataFrameAggregateSuite`? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18887: [SPARK-20642][core] Store FsHistoryProvider listi...

2017-08-14 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18887#discussion_r133001818
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/ApplicationHistoryProvider.scala ---
@@ -76,6 +76,14 @@ private[history] case class LoadedAppUI(
 private[history] abstract class ApplicationHistoryProvider {
 
   /**
+   * The number of applications available for listing. Separate method in case it's cheaper
+   * to get a count than to calculate the whole listing.
--- End diff --

Actually it doesn't seem like this is used anymore and I can remove it...



[GitHub] spark pull request #18887: [SPARK-20642][core] Store FsHistoryProvider listi...

2017-08-14 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18887#discussion_r133001417
  
--- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala ---
@@ -31,6 +33,9 @@ class ApplicationInfo private[spark](
 val memoryPerExecutorMB: Option[Int],
 val attempts: Seq[ApplicationAttemptInfo])
 
+@JsonIgnoreProperties(
+  value = Array("startTimeEpoch", "endTimeEpoch", "lastUpdatedEpoch"),
--- End diff --

No, this just avoids trying to deserialize them, which would cause an error 
because these properties have no setter.



[GitHub] spark pull request #18887: [SPARK-20642][core] Store FsHistoryProvider listi...

2017-08-14 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18887#discussion_r133001335
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -742,53 +698,145 @@ private[history] object FsHistoryProvider {
   private val APPL_END_EVENT_PREFIX = "{\"Event\":\"SparkListenerApplicationEnd\""
 
   private val LOG_START_EVENT_PREFIX = "{\"Event\":\"SparkListenerLogStart\""
+
+  private val CURRENT_VERSION = 1L
 }
 
 /**
- * Application attempt information.
- *
- * @param logPath path to the log file, or, for a legacy log, its directory
- * @param name application name
- * @param appId application ID
- * @param attemptId optional attempt ID
- * @param startTime start time (from playback)
- * @param endTime end time (from playback). -1 if the application is incomplete.
- * @param lastUpdated the modification time of the log file when this entry was built by replaying
- *the history.
- * @param sparkUser user running the application
- * @param completed flag to indicate whether or not the application has completed.
- * @param fileSize the size of the log file the last time the file was scanned for changes
+ * A KVStoreSerializer that provides Scala types serialization too, and uses the same options as
+ * the API serializer.
  */
-private class FsApplicationAttemptInfo(
+private class KVStoreScalaSerializer extends KVStoreSerializer {
+
+  mapper.registerModule(DefaultScalaModule)
+  mapper.setSerializationInclusion(JsonInclude.Include.NON_NULL)
+  mapper.setDateFormat(v1.JacksonMessageWriter.makeISODateFormat)
+
+}
+
+private[history] case class KVStoreMetadata(
+  val version: Long,
+  val logDir: String)
+
+private[history] case class LogInfo(
+  @KVIndexParam val logPath: String,
+  val fileSize: Long)
+
+private[history] class AttemptInfoWrapper(
+val info: v1.ApplicationAttemptInfo,
--- End diff --

Yes, I'm using this syntax because in many places there are conflicting 
type names in the API package and in other packages.
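To illustrate the naming style described here, the following is a minimal Scala sketch using only the standard library, where `scala.collection.mutable.Map` and the default immutable `Map` share a simple name. Importing the package and writing `mutable.Map` plays the same role as writing `v1.ApplicationAttemptInfo`; the `QualifiedNames` object is hypothetical and not part of Spark.

```scala
import scala.collection.mutable

// Hypothetical example object. Both the default (immutable) `Map` and
// `scala.collection.mutable.Map` exist; importing the package instead of the
// type lets the mutable one be written as `mutable.Map` without shadowing.
object QualifiedNames {
  // Immutable snapshot, using the default `Map`.
  val snapshot: Map[String, Long] = Map("app-1" -> 10L)

  // Mutable working copy, referenced through its package prefix.
  val working: mutable.Map[String, Long] =
    mutable.Map.empty[String, Long] ++= snapshot

  // Updates only the mutable copy; `snapshot` is left untouched.
  def record(appId: String, size: Long): Unit = working(appId) = size
}
```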



[GitHub] spark pull request #18887: [SPARK-20642][core] Store FsHistoryProvider listi...

2017-08-14 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18887#discussion_r133000455
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/ApplicationHistoryProvider.scala ---
@@ -76,6 +76,14 @@ private[history] case class LoadedAppUI(
 private[history] abstract class ApplicationHistoryProvider {
 
   /**
+   * The number of applications available for listing. Separate method in case it's cheaper
+   * to get a count than to calculate the whole listing.
--- End diff --

This is an interface, so this was added to allow implementations to 
override this method if that makes sense.

It just looks like I lost the override in one of my rebases, so let me add 
that back.
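A minimal sketch of the pattern described above, assuming hypothetical names (`HistoryProvider`, `getListing`, `getAppCount`, `IndexedProvider`) that only mirror the shape of `ApplicationHistoryProvider`: the abstract class supplies a default count derived from the full listing, and an implementation that can count more cheaply overrides it.

```scala
// Hypothetical stand-ins; these are not Spark's actual classes or methods.
final case class AppInfo(id: String, name: String)

abstract class HistoryProvider {
  /** Full listing; may be expensive to compute. */
  def getListing(): Iterator[AppInfo]

  /**
   * Number of applications available for listing. The default walks the
   * whole listing; implementations with a cheaper count can override it.
   */
  def getAppCount(): Int = getListing().size
}

class IndexedProvider(apps: Seq[AppInfo]) extends HistoryProvider {
  override def getListing(): Iterator[AppInfo] = apps.iterator

  // The backing sequence already knows its size, so no iteration is needed.
  override def getAppCount(): Int = apps.size
}
```

Forgetting the `override` (as happened in the rebase) would still compile, since the default is correct, but it would silently fall back to the slower path.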



[GitHub] spark issue #18918: [SPARK-21707][SQL]Improvement a special case for non-det...

2017-08-14 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18918
  
Yes. We should fix it in `object PhysicalOperation`.



[GitHub] spark pull request #18920: [SPARK-19471][SQL]AggregationIterator does not in...

2017-08-14 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18920



[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...

2017-08-14 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18920
  
Thanks! Merged to master.



[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...

2017-08-14 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18920
  
LGTM



[GitHub] spark pull request #18914: [MINOR][SQL][TEST]no uncache table in joinsuite t...

2017-08-14 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18914



[GitHub] spark issue #18940: YSPARK-734 Change CacheLoader to limit entries based on ...

2017-08-14 Thread dbolshak

Github user dbolshak commented on the issue:

https://github.com/apache/spark/pull/18940
  
LGTM. By the way, are there no unit tests for the change?



[GitHub] spark issue #18929: [MINOR][LAUNCHER]Reuse EXECUTOR_MEMORY and EXECUTOR_CORE...

2017-08-14 Thread vanzin

Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/18929
  
They're there mainly to declare the constant as a public API that must not 
change. (I'm not sure whether MiMa captures changes in constant values, since 
that's a binary breaking change, but that's the spirit of having these 
constants.)

I didn't change all of the usages when I introduced them because it would 
be really noisy. There are also a whole bunch of other constants that could be 
re-used throughout the code (basically all the constants declared in 
`SparkLauncher`). But I think there's no real need to change this - we can 
encourage new code to use the constants, but leave the old code there until it 
needs to be changed, to avoid noise.
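As a rough illustration of encouraging new code to use the launcher constants, the following sketch uses a hypothetical `LauncherConf` object in place of `SparkLauncher`, and the key strings are illustrative values chosen for the example. New code builds its settings from the constants rather than repeating string literals, while old call sites can be left alone.

```scala
// Hypothetical stand-in for a public constants holder such as SparkLauncher.
object LauncherConf {
  // `final val` bound to a literal: a constant that forms part of the public API.
  final val ExecutorMemory = "spark.executor.memory"
  final val ExecutorCores = "spark.executor.cores"
}

// Hypothetical new code that refers to the constants instead of literals.
object NewCode {
  def executorSettings(memory: String, cores: Int): Map[String, String] =
    Map(
      LauncherConf.ExecutorMemory -> memory,
      LauncherConf.ExecutorCores -> cores.toString
    )
}
```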



[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test

2017-08-14 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18914
  
Thanks! Merging to master.



[GitHub] spark pull request #18700: [SPARK-21499] [SQL] Support creating persistent f...

2017-08-14 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18700#discussion_r132996396
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -1096,8 +1099,42 @@ class SessionCatalog(
* This performs reflection to decide what type of [[Expression]] to return in the builder.
*/
   protected def makeFunctionBuilder(name: String, functionClassName: String): FunctionBuilder = {
-// TODO: at least support UDAFs here
-throw new UnsupportedOperationException("Use sqlContext.udf.register(...) instead.")
+makeFunctionBuilder(name, Utils.classForName(functionClassName))
+  }
+
+  /**
+   * Construct a [[FunctionBuilder]] based on the provided class that represents a function.
+   */
+  private def makeFunctionBuilder(name: String, clazz: Class[_]): FunctionBuilder = {
+// When we instantiate ScalaUDAF class, we may throw exception if the input
+// expressions don't satisfy the UDAF, such as type mismatch, input number
+// mismatch, etc. Here we catch the exception and throw AnalysisException instead.
+(children: Seq[Expression]) => {
+  try {
+val clsForUDAF =
+  Utils.classForName("org.apache.spark.sql.expressions.UserDefinedAggregateFunction")
--- End diff --

```Scala
/**
 * The base class for implementing user-defined aggregate functions (UDAF).
 *
 * @since 1.5.0
 */
@InterfaceStability.Stable
abstract class UserDefinedAggregateFunction
```

This interface has been marked as stable. Can we still move it? Or should we 
make a trait in Catalyst?
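One way to read the "trait in Catalyst" suggestion, sketched under stated assumptions: a hypothetical marker trait `AggregateFunctionMarker` stands in for a trait owned by Catalyst, which a stable class such as `UserDefinedAggregateFunction` could extend without being moved. The builder can then check `isAssignableFrom` against the trait instead of loading a class by its string name, and wrap construction or arity failures in a hypothetical `AnalysisError` (standing in for `AnalysisException`), as the comment in the diff describes. Children are modelled as strings to keep the sketch self-contained.

```scala
import scala.util.control.NonFatal

// Hypothetical marker trait that a lower-level module could own.
trait AggregateFunctionMarker {
  /** Number of input expressions this aggregate expects. */
  def arity: Int
}

/** Hypothetical stand-in for AnalysisException. */
final class AnalysisError(msg: String, cause: Throwable = null)
  extends Exception(msg, cause)

object FunctionBuilders {
  /** Children are plain strings here, not Catalyst expressions. */
  type FunctionBuilder = Seq[String] => AggregateFunctionMarker

  def makeFunctionBuilder(name: String, clazz: Class[_]): FunctionBuilder = {
    // Type check against the trait directly, with no class lookup by name.
    if (!classOf[AggregateFunctionMarker].isAssignableFrom(clazz)) {
      throw new AnalysisError(s"${clazz.getName} is not an aggregate function")
    }
    (children: Seq[String]) =>
      try {
        val agg = clazz.getDeclaredConstructor().newInstance()
          .asInstanceOf[AggregateFunctionMarker]
        // Mirrors the "input number mismatch" case from the diff comment.
        require(children.size == agg.arity,
          s"expected ${agg.arity} inputs, got ${children.size}")
        agg
      } catch {
        // Surface construction or validation failures as analysis errors.
        case NonFatal(e) =>
          throw new AnalysisError(s"Invalid arguments for function $name", e)
      }
  }
}

/** Hypothetical aggregate implementing the marker trait. */
class SumOfTwo extends AggregateFunctionMarker { def arity: Int = 2 }
```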



[GitHub] spark pull request #18700: [SPARK-21499] [SQL] Support creating persistent f...

2017-08-14 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18700#discussion_r132994080
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -1096,8 +1099,42 @@ class SessionCatalog(
* This performs reflection to decide what type of [[Expression]] to return in the builder.
*/
   protected def makeFunctionBuilder(name: String, functionClassName: String): FunctionBuilder = {
--- End diff --

The changes 
[here](https://github.com/apache/spark/pull/18700/files#diff-ca4533edbf148c89cc0c564ab6b0aeaa)
 are for `HiveSessionCatalog`. Also, we have a test case in 
`HiveUDAFSuite.scala` to verify it.
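The override relationship can be sketched as follows. `BaseCatalog` and `HiveAwareCatalog` are hypothetical stand-ins for `SessionCatalog` and `HiveSessionCatalog`, and the package-prefix check is a placeholder for whatever class inspection the real Hive catalog performs: the subclass handles the classes it knows about and defers to `super` for everything else.

```scala
// Hypothetical stand-ins; builders map argument names to a description string.
class BaseCatalog {
  /** Like the removed code in the diff, the base version simply rejects the class. */
  protected def makeFunctionBuilder(name: String, className: String): Seq[String] => String =
    throw new UnsupportedOperationException(s"No builder for $name ($className)")

  def registerAndBuild(name: String, className: String, args: Seq[String]): String =
    makeFunctionBuilder(name, className)(args)
}

class HiveAwareCatalog extends BaseCatalog {
  // Placeholder check: handle "Hive" classes here, delegate everything else.
  override protected def makeFunctionBuilder(
      name: String,
      className: String): Seq[String] => String =
    if (className.startsWith("org.apache.hadoop.hive.")) {
      (args: Seq[String]) => s"hive:$name(${args.mkString(", ")})"
    } else {
      super.makeFunctionBuilder(name, className)
    }
}
```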



[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-14 Thread byakuinss

Github user byakuinss commented on the issue:

https://github.com/apache/spark/pull/18895
  
Okay, I left a comment on the issue page.



[GitHub] spark issue #9518: [SPARK-11574][Core] Add metrics StatsD sink

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/9518
  
**[Test build #80638 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80638/testReport)**
 for PR 9518 at commit 
[`1ec9cc9`](https://github.com/apache/spark/commit/1ec9cc967ebb8789edb80bdae28d7c24b5d49a6c).



[GitHub] spark issue #18866: [WIP][SPARK-21649][SQL] Support writing data into hive b...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18866
  
**[Test build #80637 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80637/testReport)**
 for PR 18866 at commit 
[`6df2e78`](https://github.com/apache/spark/commit/6df2e7803a9769cd296a4b1b37756340504f6684).



[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-14 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18895
  
Could you leave a comment such as "Here is my JIRA account." in 
https://issues.apache.org/jira/browse/SPARK-21658, if you don't mind?



[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-14 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18895
  
Hmm, weird. I can't find your account on JIRA...



[GitHub] spark issue #18907: [SPARK-18464][SQL][followup] support old table which doe...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18907
  
Merged build finished. Test FAILed.



[GitHub] spark issue #18907: [SPARK-18464][SQL][followup] support old table which doe...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18907
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80630/
Test FAILed.



[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-14 Thread byakuinss

Github user byakuinss commented on the issue:

https://github.com/apache/spark/pull/18895
  
@HyukjinKwon 
Oh, do you mean my JIRA full name? It's `Chin Han Yu`.



[GitHub] spark issue #18907: [SPARK-18464][SQL][followup] support old table which doe...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18907
  
**[Test build #80630 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80630/testReport)**
 for PR 18907 at commit 
[`8f4bc08`](https://github.com/apache/spark/commit/8f4bc087df88cdb8c0308c6607d944f7bdf37019).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait CatalogRelation extends LeafNode `
  * `case class UnresolvedCatalogRelation(tableMeta: CatalogTable) extends CatalogRelation `
  * `case class HiveTableRelation(`



[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

2017-08-14 Thread nchammas

Github user nchammas commented on the issue:

https://github.com/apache/spark/pull/18926
  
To summarize the feedback from @HyukjinKwon and @gatorsmile, I think what I 
need to do is:
* Add a test for the mixed type case.
* Explicitly check for `long` in Python 2 and throw a `TypeError` from 
PySpark.
* Add a test for the `long` `TypeError` in Python 2.




[GitHub] spark issue #18940: YSPARK-734 Change CacheLoader to limit entries based on ...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18940
  
**[Test build #80636 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80636/testReport)**
 for PR 18940 at commit 
[`f23a4c7`](https://github.com/apache/spark/commit/f23a4c79b69fd1f8a77162da34b8821cb0cc1352).



[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18468
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80629/
Test PASSed.



[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18468
  
Merged build finished. Test PASSed.



[GitHub] spark pull request #18895: [SPARK-21658][SQL][PYSPARK] Add default None for ...

2017-08-14 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18895



[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-14 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18895
  
@byakuinss, by the way, do you mind if I ask for your JIRA ID? I want to 
assign this to you since you resolved it, but I can't find the ID.



[GitHub] spark issue #18940: YSPARK-734 Change CacheLoader to limit entries based on ...

2017-08-14 Thread tgravescs

Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/18940
  
ok to test



[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-14 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18895
  
Merged to master.



[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18468
  
**[Test build #80629 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80629/testReport)**
 for PR 18468 at commit 
[`a26dc15`](https://github.com/apache/spark/commit/a26dc150f6b95cc42558561cd2548de04a89f041).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18938: [SPARK-21363][SQL] Prevent name duplication in (global/l...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18938
  
**[Test build #80635 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80635/testReport)**
 for PR 18938 at commit 
[`b87562f`](https://github.com/apache/spark/commit/b87562f6e81c1696373b4413479f884520504345).



[GitHub] spark pull request #18930: [SPARK-21677][SQL] json_tuple throws NullPointExc...

2017-08-14 Thread jmchung

Github user jmchung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18930#discussion_r132984129
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -361,10 +361,18 @@ case class JsonTuple(children: Seq[Expression])
   // the fields to query are the remaining children
   @transient private lazy val fieldExpressions: Seq[Expression] = children.tail
 
+  // a field name given with constant null will be replaced with this pseudo field name
+  private val nullFieldName = "__NullFieldName"
--- End diff --

@HyukjinKwon @viirya  Yep, we've discarded the fake field name and use 
Option here. We made a slight revision to deal with the None in 
`foldableFieldNames` instead of creating a new function.
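Here is a small Java sketch of why an `Option` is safer than a sentinel name. With a pseudo field such as `__NullFieldName`, a JSON document that really contains that key would return its value. An empty `Optional` never performs a lookup at all. `JsonTupleSketch` and `extract` are hypothetical, and `foldableFieldNames` only borrows the name from the comment above. Both operate on an already-parsed map and are not the code in this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch (not Spark's JsonTuple): requested field names that were constant
// nulls are represented as Optional.empty() instead of a magic placeholder string.
class JsonTupleSketch {
    /** Wraps each requested field name; a null literal becomes an empty Optional. */
    static List<Optional<String>> foldableFieldNames(List<String> rawNames) {
        List<Optional<String>> names = new ArrayList<>();
        for (String raw : rawNames) {
            names.add(Optional.ofNullable(raw));
        }
        return names;
    }

    /** Looks up each field in an already-parsed JSON object; empty names and missing keys give null. */
    static List<String> extract(Map<String, String> parsedJson, List<Optional<String>> fieldNames) {
        List<String> row = new ArrayList<>();
        for (Optional<String> name : fieldNames) {
            // An empty Optional skips the lookup entirely, so no real key can collide with it.
            row.add(name.map(parsedJson::get).orElse(null));
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> json = new HashMap<>();
        json.put("a", "1");
        json.put("b", "2");
        System.out.println(extract(json, foldableFieldNames(Arrays.asList("a", null, "b"))));
        // prints [1, null, 2]
    }
}
```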



[GitHub] spark pull request #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating e...

2017-08-14 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/18488#discussion_r132983668
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java
 ---
@@ -79,7 +79,7 @@ public ExpressionInfo(
 assert name != null;
 assert arguments != null;
 assert examples != null;
-assert examples.isEmpty() || examples.startsWith("\n
Examples:");
+assert examples.isEmpty() || examples.startsWith(System.lineSeparator() + "Examples:");
--- End diff --

I don't think we support Windows for dev. This assertion should probably be 
weakened anyway but that's a separate issue from this PR.
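Below is one possible weakening, sketched in Java. It assumes the intent is only to require an `Examples:` header at the start of the text. The check tolerates any leading whitespace, so neither `\n` versus `System.lineSeparator()` nor indentation affects the result. `ExamplesCheck` is a hypothetical helper, not part of `ExpressionInfo`.

```java
// Hypothetical helper, not Spark's ExpressionInfo: accepts an empty string or text whose
// first non-whitespace content is "Examples:", whatever line separator precedes it.
final class ExamplesCheck {
    private ExamplesCheck() {}

    static boolean isValidExamples(String examples) {
        if (examples == null) {
            return false;
        }
        if (examples.isEmpty()) {
            return true; // an empty examples section stays allowed, as in the original assertion
        }
        // String.trim() strips leading newlines, carriage returns, spaces and tabs, so the
        // choice of line separator and the indentation no longer matter.
        return examples.trim().startsWith("Examples:");
    }

    public static void main(String[] args) {
        System.out.println(isValidExamples("\n    Examples:\n      > SELECT 1;")); // true
        System.out.println(isValidExamples("\r\n    Examples:"));               // true
        System.out.println(isValidExamples("\n    Usage: foo"));                // false
    }
}
```

This is looser than the original: it also accepts text that starts with `Examples:` and has no leading newline at all.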



[GitHub] spark pull request #18855: [SPARK-3151] [Block Manager] DiskStore.getBytes f...

2017-08-14 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18855#discussion_r132982799
  
--- Diff: project/SparkBuild.scala ---
@@ -790,7 +790,7 @@ object TestSettings {
 javaOptions in Test ++= System.getProperties.asScala.filter(_._1.startsWith("spark"))
   .map { case (k,v) => s"-D$k=$v" }.toSeq,
 javaOptions in Test += "-ea",
-javaOptions in Test ++= "-Xmx3g -Xss4096k"
+javaOptions in Test ++= "-Xmx6g -Xss4096k"
--- End diff --

I am +1 for separating it out if possible. Let's get the changes we are 
sure of into the code base first.



[GitHub] spark issue #18930: [SPARK-21677][SQL] json_tuple throws NullPointException ...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18930
  
**[Test build #80634 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80634/testReport)**
 for PR 18930 at commit 
[`5d71263`](https://github.com/apache/spark/commit/5d712637ba0710d9edda79c2097b4044adca75e0).



[GitHub] spark issue #18940: YSPARK-734 Change CacheLoader to limit entries based on ...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18940
  
Can one of the admins verify this patch?



[GitHub] spark pull request #18940: YSPARK-734 Change CacheLoader to limit entries ba...

2017-08-14 Thread redsanket

GitHub user redsanket opened a pull request:

https://github.com/apache/spark/pull/18940

YSPARK-734 Change CacheLoader to limit entries based on memory footprint

Right now the Spark shuffle service has a cache for index files. Its size is 
bounded by the number of files cached (spark.shuffle.service.index.cache.entries). 
This can cause issues for jobs with many reducers, because the size of each 
entry varies with the number of reducers.
We saw an issue with a job that had 17 reducers, and it caused the NM running the 
Spark shuffle service to use 700-800MB of memory by itself.
We should change this cache to be memory-based and only allow a bounded amount 
of memory to be used. By memory-based I mean the cache should have a limit 
of, say, 100MB.

https://issues.apache.org/jira/browse/SPARK-21501

Manual testing with 17 reducers was performed with the cache filled up 
to the default 100MB limit, with each shuffle index file 1.3MB in size. 
Eviction takes place as soon as the total cache size reaches the 100MB limit, 
and the evicted objects become eligible for garbage collection, thereby keeping 
the NM from crashing. No notable difference in runtime was observed.
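The proposal replaces an entry-count limit with a byte budget. Below is a minimal, dependency-free Java sketch of that idea. It is an LRU map that tracks the total estimated size of the cached indexes and evicts the least-recently-used entries once the budget is exceeded. It is not the patch itself. The class name, the `long[]` value type, and the 8-bytes-per-offset weight are illustrative assumptions. Guava's `CacheBuilder.maximumWeight` combined with a `Weigher` provides the same behavior off the shelf.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the patch: an LRU cache of shuffle index offsets whose bound is
// the total estimated size in bytes rather than the number of entries.
class MemoryBoundedIndexCache {
    private final long maxBytes;
    private long currentBytes = 0;
    // accessOrder = true: iteration goes from least- to most-recently accessed entry.
    private final LinkedHashMap<String, long[]> entries = new LinkedHashMap<>(16, 0.75f, true);

    MemoryBoundedIndexCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Assumed weight of one cached index: 8 bytes per long offset (object overhead ignored). */
    static long weightOf(long[] offsets) {
        return 8L * offsets.length;
    }

    synchronized long[] get(String indexFile) {
        return entries.get(indexFile); // a hit also marks the entry as most recently used
    }

    synchronized void put(String indexFile, long[] offsets) {
        long[] previous = entries.put(indexFile, offsets);
        if (previous != null) {
            currentBytes -= weightOf(previous);
        }
        currentBytes += weightOf(offsets);
        evictUntilWithinBudget();
    }

    // Removes least-recently-used entries until the total fits; an entry larger than the
    // whole budget is therefore evicted right away.
    private void evictUntilWithinBudget() {
        Iterator<Map.Entry<String, long[]>> it = entries.entrySet().iterator();
        while (currentBytes > maxBytes && it.hasNext()) {
            Map.Entry<String, long[]> eldest = it.next();
            currentBytes -= weightOf(eldest.getValue());
            it.remove();
        }
    }

    synchronized long totalBytes() {
        return currentBytes;
    }
}
```

Because the bound is in bytes, a job with many reducers (and therefore larger index files) simply keeps fewer files cached, instead of growing the shuffle service's memory footprint.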

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/redsanket/spark SPARK-21501

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18940.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18940


commit f23a4c79b69fd1f8a77162da34b8821cb0cc1352
Author: Sanket Chintapalli 
Date:   2017-07-27T14:59:40Z

YSPARK-734 Change CacheLoader to limit entries based on memory footprint





[GitHub] spark pull request #18855: [SPARK-3151] [Block Manager] DiskStore.getBytes f...

2017-08-14 Thread eyalfa

Github user eyalfa commented on a diff in the pull request:

https://github.com/apache/spark/pull/18855#discussion_r132978316
  
--- Diff: project/SparkBuild.scala ---
@@ -790,7 +790,7 @@ object TestSettings {
 javaOptions in Test ++= System.getProperties.asScala.filter(_._1.startsWith("spark"))
   .map { case (k,v) => s"-D$k=$v" }.toSeq,
 javaOptions in Test += "-ea",
-javaOptions in Test ++= "-Xmx3g -Xss4096k"
+javaOptions in Test ++= "-Xmx6g -Xss4096k"
--- End diff --

@cloud-fan, let's wait a few hours and see what the others CCed on this 
(the last ones to edit the build) have to say. If they are also 
worried, or do not comment, I'll revert this.

I must say I'm reluctant to revert these tests, as I personally believe the 
lack of such tests contributed to Spark's 2GB issues, including this one.
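Some context on the "2GB issues": a single Java `ByteBuffer` (like a `byte[]`) is indexed by `int`, so it can hold at most `Integer.MAX_VALUE` bytes. The Java sketch below shows the general workaround for larger files. It reads a file into a list of buffers, each capped at a chosen chunk size. This is not the code from this PR, and `ChunkedReader`, `readInChunks`, and the chunk size are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: read a file of any length into heap buffers of at most maxChunkSize bytes.
final class ChunkedReader {
    private ChunkedReader() {}

    static List<ByteBuffer> readInChunks(Path file, int maxChunkSize) throws IOException {
        if (maxChunkSize <= 0) {
            throw new IllegalArgumentException("maxChunkSize must be positive");
        }
        List<ByteBuffer> chunks = new ArrayList<>();
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long remaining = channel.size(); // a long, so files above 2GB are representable
            while (remaining > 0) {
                int chunkSize = (int) Math.min(remaining, maxChunkSize);
                ByteBuffer buf = ByteBuffer.allocate(chunkSize);
                // A single read() may return fewer bytes than requested, so loop until full.
                while (buf.hasRemaining()) {
                    if (channel.read(buf) < 0) {
                        throw new IOException("Unexpected end of file");
                    }
                }
                buf.flip(); // make the chunk readable from position 0
                chunks.add(buf);
                remaining -= chunkSize;
            }
        }
        return chunks;
    }
}
```

Each chunk's size fits in an `int`, while the running total is tracked as a `long`. Holding a multi-gigabyte file this way still requires that much heap. That is presumably why tests at this scale run into limits such as the `-Xmx` setting discussed in this thread.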



[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-14 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18895
  
LGTM too.



[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18895
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18895
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80633/
Test PASSed.



[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18895
  
**[Test build #80633 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80633/testReport)**
 for PR 18895 at commit 
[`d07d49a`](https://github.com/apache/spark/commit/d07d49aa9dbff1a87a947da1309612a355aaeac2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18913: [SPARK-21563][CORE] Fix race condition when serializing ...

2017-08-14 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18913
  
LGTM, merging to master/2.2!



[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-14 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18895
  
LGTM



[GitHub] spark pull request #18913: [SPARK-21563][CORE] Fix race condition when seria...

2017-08-14 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18913



[GitHub] spark issue #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating encoder ...

2017-08-14 Thread mike0sv

Github user mike0sv commented on the issue:

https://github.com/apache/spark/pull/18488
  
@srowen @HyukjinKwon it seems like it's all ok now



[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18895
  
**[Test build #80633 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80633/testReport)**
 for PR 18895 at commit 
[`d07d49a`](https://github.com/apache/spark/commit/d07d49aa9dbff1a87a947da1309612a355aaeac2).



[GitHub] spark pull request #18895: [SPARK-21658][SQL][PYSPARK] Add default None for ...

2017-08-14 Thread byakuinss

Github user byakuinss commented on a diff in the pull request:

https://github.com/apache/spark/pull/18895#discussion_r132968529
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1403,6 +1403,16 @@ def replace(self, to_replace, value=None, subset=None):
 |null|  null|null|
 +----+------+----+
 
+>>> df4.na.replace('Alice').show()
++----+------+----+
+| age|height|name|
++----+------+----+
+|  10|    80|null|
+|   5|  null| Bob|
+|null|  null| Tom|
+|null|  null|null|
++----+------+----+ 
--- End diff --

Thanks for the reminder! I'll remove them.



[GitHub] spark pull request #18895: [SPARK-21658][SQL][PYSPARK] Add default None for ...

2017-08-14 Thread byakuinss

Github user byakuinss commented on a diff in the pull request:

https://github.com/apache/spark/pull/18895#discussion_r132968408
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1837,8 +1847,8 @@ def fill(self, value, subset=None):
 
 fill.__doc__ = DataFrame.fillna.__doc__
 
-def replace(self, to_replace, value, subset=None):
-return self.df.replace(to_replace, value, subset)
+def replace(self, to_replace, value=None, subset=None):
+return self.df.replace(to_replace=to_replace, value=value, subset=subset)
--- End diff --

Got it, I'll change them back.
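The change gives `value` a default of `None`, so `df4.na.replace('Alice')` turns every matching cell into null. Java has no default arguments. A rough analogue is an overload that delegates with `null`, shown below. `ReplaceSketch` is hypothetical and works on a plain list rather than a DataFrame; it is not the PySpark API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of "value defaults to null" via overloading; not the PySpark API.
final class ReplaceSketch {
    private ReplaceSketch() {}

    /** Replaces every cell equal to toReplace with value (which may be null). */
    static List<String> replace(List<String> column, String toReplace, String value) {
        List<String> out = new ArrayList<>(column.size());
        for (String cell : column) {
            out.add(Objects.equals(cell, toReplace) ? value : cell);
        }
        return out;
    }

    /** Overload standing in for value=None: matching cells become null. */
    static List<String> replace(List<String> column, String toReplace) {
        return replace(column, toReplace, null);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Alice", "Bob", "Tom", null);
        System.out.println(replace(names, "Alice")); // prints [null, Bob, Tom, null]
    }
}
```

The printed list mirrors the `name` column in the doctest output above.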


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18895
  
**[Test build #80632 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80632/testReport)**
 for PR 18895 at commit 
[`abdef40`](https://github.com/apache/spark/commit/abdef40adc187f1a7b8b5e4db7601b517f893741).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18895
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80632/
Test FAILed.



[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18895
  
Merged build finished. Test FAILed.



[GitHub] spark issue #18895: [SPARK-21658][SQL][PYSPARK] Add default None for value i...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18895
  
**[Test build #80632 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80632/testReport)**
 for PR 18895 at commit 
[`abdef40`](https://github.com/apache/spark/commit/abdef40adc187f1a7b8b5e4db7601b517f893741).



[GitHub] spark issue #18939: [SPARK-21724][SQL][DOC] Adds since information in the do...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18939
  
**[Test build #80631 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80631/testReport)**
 for PR 18939 at commit 
[`1cf870c`](https://github.com/apache/spark/commit/1cf870c0a54649d2cc1e29b1b7b0be6d2daa739c).



[GitHub] spark issue #18939: [WIP][SPARK-21724][SQL][DOC] Adds since information in t...

2017-08-14 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18939
  
retest this please



[GitHub] spark pull request #18855: [SPARK-3151] [Block Manager] DiskStore.getBytes f...

2017-08-14 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18855#discussion_r132964212
  
--- Diff: project/SparkBuild.scala ---
@@ -790,7 +790,7 @@ object TestSettings {
 javaOptions in Test ++= System.getProperties.asScala.filter(_._1.startsWith("spark"))
   .map { case (k,v) => s"-D$k=$v" }.toSeq,
 javaOptions in Test += "-ea",
-javaOptions in Test ++= "-Xmx3g -Xss4096k"
+javaOptions in Test ++= "-Xmx6g -Xss4096k"
--- End diff --

I'm a little worried about this change. Since the change to 
`BlockManagerSuite` is not very related to this PR, can we revert it and revisit 
it in a follow-up PR? Then we can unblock this PR.



[GitHub] spark issue #18907: [SPARK-18464][SQL][followup] support old table which doe...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18907
  
**[Test build #80630 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80630/testReport)**
 for PR 18907 at commit 
[`8f4bc08`](https://github.com/apache/spark/commit/8f4bc087df88cdb8c0308c6607d944f7bdf37019).



[GitHub] spark pull request #18700: [SPARK-21499] [SQL] Support creating persistent f...

2017-08-14 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18700#discussion_r132961933
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -1096,8 +1099,42 @@ class SessionCatalog(
* This performs reflection to decide what type of [[Expression]] to return in the builder.
*/
   protected def makeFunctionBuilder(name: String, functionClassName: String): FunctionBuilder = {
--- End diff --

This will be overridden by `HiveSessionCatalog`; does that mean we cannot 
register a Spark UDAF if Hive support is enabled?
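Whether a Spark UDAF survives depends on whether the Hive-aware override falls back to the base implementation. Below is a Java sketch of that fallback pattern. It uses hypothetical types (`SparkUdaf`, `HiveUdf`, `BaseCatalog`, `HiveCatalog`, `MyUdaf`, `MyHiveUdf`) in place of Spark's and Hive's classes. A `Function<String, String>` stands in for `FunctionBuilder`.

```java
import java.util.function.Function;

// Hypothetical marker interfaces standing in for "Spark UDAF" and "Hive UDF" classes.
interface SparkUdaf {}
interface HiveUdf {}

class BaseCatalog {
    Function<String, String> makeFunctionBuilder(String name, Class<?> clazz) {
        if (SparkUdaf.class.isAssignableFrom(clazz)) {
            return arg -> "spark-udaf:" + name + "(" + arg + ")";
        }
        throw new UnsupportedOperationException("Unsupported function class: " + clazz.getName());
    }
}

class HiveCatalog extends BaseCatalog {
    @Override
    Function<String, String> makeFunctionBuilder(String name, Class<?> clazz) {
        if (HiveUdf.class.isAssignableFrom(clazz)) {
            return arg -> "hive-udf:" + name + "(" + arg + ")";
        }
        // Delegating to super keeps Spark UDAFs working when the Hive-aware catalog is used.
        return super.makeFunctionBuilder(name, clazz);
    }
}

// Hypothetical example function classes.
class MyUdaf implements SparkUdaf {}
class MyHiveUdf implements HiveUdf {}
```

If the override instead handled only Hive classes and never called `super`, `MyUdaf` would fail to resolve under `HiveCatalog`. That failure mode is what the question above is probing.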



[GitHub] spark pull request #18700: [SPARK-21499] [SQL] Support creating persistent f...

2017-08-14 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18700#discussion_r132961262
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -1096,8 +1099,42 @@ class SessionCatalog(
* This performs reflection to decide what type of [[Expression]] to return in the builder.
*/
   protected def makeFunctionBuilder(name: String, functionClassName: String): FunctionBuilder = {
-// TODO: at least support UDAFs here
-throw new UnsupportedOperationException("Use sqlContext.udf.register(...) instead.")
+makeFunctionBuilder(name, Utils.classForName(functionClassName))
+  }
+
+  /**
+   * Construct a [[FunctionBuilder]] based on the provided class that represents a function.
+   */
+  private def makeFunctionBuilder(name: String, clazz: Class[_]): FunctionBuilder = {
+// When we instantiate ScalaUDAF class, we may throw exception if the input
+// expressions don't satisfy the UDAF, such as type mismatch, input number
+// mismatch, etc. Here we catch the exception and throw AnalysisException instead.
+(children: Seq[Expression]) => {
+  try {
+val clsForUDAF =
+  Utils.classForName("org.apache.spark.sql.expressions.UserDefinedAggregateFunction")
--- End diff --

shall we move the UDAF interface to catalyst?
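The new builder resolves a class by name, checks it against the UDAF interface, and converts failures into an analysis error. Below is a plain-Java sketch of that reflection pattern. `AggregateFunction`, `AnalysisError`, `FunctionLoader`, and `CountAgg` are hypothetical stand-ins; Spark itself uses `Utils.classForName` and `AnalysisException`. The sketch also instantiates the class eagerly, rather than inside a builder closure.

```java
// Hypothetical stand-ins for the UDAF interface and Spark's AnalysisException.
interface AggregateFunction {
    String describe();
}

class AnalysisError extends RuntimeException {
    AnalysisError(String message, Throwable cause) {
        super(message, cause);
    }
}

final class FunctionLoader {
    private FunctionLoader() {}

    /** Loads className, checks that it implements AggregateFunction, and instantiates it. */
    static AggregateFunction load(String className) {
        Class<?> clazz;
        try {
            clazz = Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new AnalysisError("Cannot find function class " + className, e);
        }
        if (!AggregateFunction.class.isAssignableFrom(clazz)) {
            throw new AnalysisError(className + " is not an AggregateFunction", null);
        }
        try {
            // Requires an accessible no-arg constructor; reflective failures are re-thrown uniformly.
            return (AggregateFunction) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new AnalysisError("Cannot instantiate " + className, e);
        }
    }
}

// Hypothetical example implementation.
class CountAgg implements AggregateFunction {
    public CountAgg() {}

    @Override
    public String describe() {
        return "count";
    }
}
```

Moving the interface into catalyst, as suggested, would presumably let the catalog refer to it by type (as `AggregateFunction.class` does here) instead of loading it from a string class name.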



[GitHub] spark issue #18939: [WIP][SPARK-21724][SQL][DOC] Adds since information in t...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18939
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80626/
Test FAILed.



[GitHub] spark issue #18939: [WIP][SPARK-21724][SQL][DOC] Adds since information in t...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18939
  
Merged build finished. Test FAILed.



[GitHub] spark issue #18939: [WIP][SPARK-21724][SQL][DOC] Adds since information in t...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18939
  
**[Test build #80626 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80626/testReport)**
 for PR 18939 at commit 
[`1cf870c`](https://github.com/apache/spark/commit/1cf870c0a54649d2cc1e29b1b7b0be6d2daa739c).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18918: [SPARK-21707][SQL]Improvement a special case for non-det...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18918
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80628/
Test PASSed.



[GitHub] spark issue #18918: [SPARK-21707][SQL]Improvement a special case for non-det...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18918
  
Merged build finished. Test PASSed.


[GitHub] spark issue #18918: [SPARK-21707][SQL]Improvement a special case for non-det...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18918
  
**[Test build #80628 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80628/testReport)**
 for PR 18918 at commit 
[`bf81c45`](https://github.com/apache/spark/commit/bf81c45469e8554fc76eec0c97e2b5fc7f397f3f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18920
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80625/
Test PASSed.


[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18920
  
Merged build finished. Test PASSed.


[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18920
  
**[Test build #80625 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80625/testReport)**
 for PR 18920 at commit 
[`d58ffaa`](https://github.com/apache/spark/commit/d58ffaa434337ae19f4b1f59524c84943ff7934f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #18934: [SPARK-21721][SQL] Clear FileSystem deleteOnExit cache w...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18934
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80627/
Test PASSed.


[GitHub] spark issue #18934: [SPARK-21721][SQL] Clear FileSystem deleteOnExit cache w...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18934
  
Merged build finished. Test PASSed.


[GitHub] spark issue #18934: [SPARK-21721][SQL] Clear FileSystem deleteOnExit cache w...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18934
  
**[Test build #80627 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80627/testReport)**
 for PR 18934 at commit 
[`13defbb`](https://github.com/apache/spark/commit/13defbbd26a2ec4806c1fc94b890f6f43068d411).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18468
  
**[Test build #80629 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80629/testReport)**
 for PR 18468 at commit 
[`a26dc15`](https://github.com/apache/spark/commit/a26dc150f6b95cc42558561cd2548de04a89f041).


[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-08-14 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/18468
  
retest this please


[GitHub] spark issue #18938: [SPARK-21363][SQL] Prevent name duplication in (global/l...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18938
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80624/
Test FAILed.


[GitHub] spark issue #18938: [SPARK-21363][SQL] Prevent name duplication in (global/l...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18938
  
Merged build finished. Test FAILed.


[GitHub] spark issue #18938: [SPARK-21363][SQL] Prevent name duplication in (global/l...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18938
  
**[Test build #80624 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80624/testReport)**
 for PR 18938 at commit 
[`cf68f69`](https://github.com/apache/spark/commit/cf68f6960180817530ef3755edfb0b426cb6cb77).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request #18902: [SPARK-21690][ML] one-pass imputer

2017-08-14 Thread MLnick

Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/18902#discussion_r132939361
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala ---
@@ -133,23 +134,29 @@ class Imputer @Since("2.2.0") (@Since("2.2.0") override val uid: String)
   override def fit(dataset: Dataset[_]): ImputerModel = {
 transformSchema(dataset.schema, logging = true)
 val spark = dataset.sparkSession
-import spark.implicits._
-val surrogates = $(inputCols).map { inputCol =>
-  val ic = col(inputCol)
-  val filtered = dataset.select(ic.cast(DoubleType))
-.filter(ic.isNotNull && ic =!= $(missingValue) && !ic.isNaN)
-  if(filtered.take(1).length == 0) {
-throw new SparkException(s"surrogate cannot be computed. " +
-  s"All the values in $inputCol are Null, Nan or missingValue(${$(missingValue)})")
-  }
-  val surrogate = $(strategy) match {
-case Imputer.mean => filtered.select(avg(inputCol)).as[Double].first()
-case Imputer.median => filtered.stat.approxQuantile(inputCol, Array(0.5), 0.001).head
-  }
-  surrogate
+
+val selected = dataset.select($(inputCols).map(col(_).cast("double")): _*).rdd
+
+val summarizer = $(strategy) match {
+  case Imputer.mean =>
+new Imputer.MeanSummarizer($(inputCols).length, $(missingValue))
+  case Imputer.median =>
+new Imputer.MedianSummarizer($(inputCols).length, $(missingValue))
+}
+
+val summary = selected.treeAggregate(summarizer)(
+  seqOp = { case (sum, row) => sum.update(row) },
+  combOp = { case (sum1, sum2) => sum1.merge(sum2) }
+)
+
+val emptyCols = ($(inputCols) zip summary.counts).filter(_._2 == 0).map(_._1)
+if(emptyCols.nonEmpty) {
--- End diff --

Style: space between `if` and `(`
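
The diff above replaces the per-column `avg` and `approxQuantile` jobs with a single `treeAggregate` pass, where the summarizer's `update` serves as `seqOp` and `merge` as `combOp`. The bodies of `Imputer.MeanSummarizer` are not part of this hunk, so the following is only a hypothetical, Spark-free sketch of a mean summarizer with that shape. The class name `MeanSummarizerSketch` and the use of `None` to stand in for a SQL null are assumptions; its `counts` array plays the role of the `summary.counts` used for the `emptyCols` check.

```scala
// Hypothetical, Spark-free sketch of a one-pass mean summarizer.
// A SQL null is modelled as None; NaN and the missingValue marker are also skipped.
final class MeanSummarizerSketch(numCols: Int, missingValue: Double) extends Serializable {
  // Running per-column sum and count of valid values.
  val sums: Array[Double] = Array.fill(numCols)(0.0)
  val counts: Array[Long] = Array.fill(numCols)(0L)

  // seqOp: fold one row into this partial result.
  def update(row: Seq[Option[Double]]): MeanSummarizerSketch = {
    for (i <- 0 until numCols) {
      row(i) match {
        case Some(v) if !v.isNaN && v != missingValue =>
          sums(i) += v
          counts(i) += 1
        case _ => // null, NaN or missingValue: not counted
      }
    }
    this
  }

  // combOp: add another partition's partial result into this one.
  def merge(other: MeanSummarizerSketch): MeanSummarizerSketch = {
    for (i <- 0 until numCols) {
      sums(i) += other.sums(i)
      counts(i) += other.counts(i)
    }
    this
  }

  // Per-column mean, or NaN for a column with no valid values.
  def means: Array[Double] =
    Array.tabulate(numCols)(i => if (counts(i) == 0) Double.NaN else sums(i) / counts(i))
}

object MeanSummarizerDemo {
  def main(args: Array[String]): Unit = {
    // Two "partitions" of a two-column dataset, with -1.0 as missingValue.
    val part1: Seq[Seq[Option[Double]]] = Seq(Seq(Some(1.0), None), Seq(Some(3.0), Some(-1.0)))
    val part2: Seq[Seq[Option[Double]]] = Seq(Seq(Some(Double.NaN), Some(4.0)))
    val s1 = part1.foldLeft(new MeanSummarizerSketch(2, -1.0))((s, row) => s.update(row))
    val s2 = part2.foldLeft(new MeanSummarizerSketch(2, -1.0))((s, row) => s.update(row))
    println(s1.merge(s2).means.mkString(", ")) // prints "2.0, 4.0"
  }
}
```

Because `merge` only adds sums and counts, the partial results can be combined in any tree shape, which is what lets a single `treeAggregate` replace one job per column.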


[GitHub] spark pull request #18902: [SPARK-21690][ML] one-pass imputer

2017-08-14 Thread MLnick

Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/18902#discussion_r132939323
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala ---
@@ -133,23 +134,29 @@ class Imputer @Since("2.2.0") (@Since("2.2.0") override val uid: String)
   override def fit(dataset: Dataset[_]): ImputerModel = {
 transformSchema(dataset.schema, logging = true)
 val spark = dataset.sparkSession
-import spark.implicits._
-val surrogates = $(inputCols).map { inputCol =>
-  val ic = col(inputCol)
-  val filtered = dataset.select(ic.cast(DoubleType))
-.filter(ic.isNotNull && ic =!= $(missingValue) && !ic.isNaN)
-  if(filtered.take(1).length == 0) {
-throw new SparkException(s"surrogate cannot be computed. " +
-  s"All the values in $inputCol are Null, Nan or missingValue(${$(missingValue)})")
-  }
-  val surrogate = $(strategy) match {
-case Imputer.mean => filtered.select(avg(inputCol)).as[Double].first()
-case Imputer.median => filtered.stat.approxQuantile(inputCol, Array(0.5), 0.001).head
-  }
-  surrogate
+
+val selected = dataset.select($(inputCols).map(col(_).cast("double")): _*).rdd
+
+val summarizer = $(strategy) match {
+  case Imputer.mean =>
+new Imputer.MeanSummarizer($(inputCols).length, $(missingValue))
+  case Imputer.median =>
+new Imputer.MedianSummarizer($(inputCols).length, $(missingValue))
+}
+
+val summary = selected.treeAggregate(summarizer)(
+  seqOp = { case (sum, row) => sum.update(row) },
+  combOp = { case (sum1, sum2) => sum1.merge(sum2) }
+)
+
+val emptyCols = ($(inputCols) zip summary.counts).filter(_._2 == 0).map(_._1)
--- End diff --

Style: use dot notation here, not infix.
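
Both style notes on this PR concern the last lines of the hunk. A minimal sketch of the `emptyCols` check written with dot notation and with a space after `if`; `EmptyColsSketch` and its parameters are hypothetical stand-ins for `$(inputCols)` and `summary.counts`.

```scala
object EmptyColsSketch {
  // Hypothetical helper: names of the input columns whose valid-value count is zero.
  // `inputCols.zip(counts)` uses dot notation instead of `(inputCols zip counts)`.
  def emptyCols(inputCols: Array[String], counts: Array[Long]): Array[String] =
    inputCols.zip(counts).filter(_._2 == 0).map(_._1)

  def main(args: Array[String]): Unit = {
    val cols = emptyCols(Array("a", "b", "c"), Array(3L, 0L, 0L))
    if (cols.nonEmpty) { // space between `if` and `(`
      println(s"No valid values in: ${cols.mkString(", ")}") // prints "No valid values in: b, c"
    }
  }
}
```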


[GitHub] spark issue #18929: [MINOR][LAUNCHER]Reuse EXECUTOR_MEMORY and EXECUTOR_CORE...

2017-08-14 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18929
  
Maybe @vanzin can weigh in here, because the real question is whether these 
constants in the launcher module are meant to be _the_ single definition of 
them used throughout the code. core depends on launcher and uses these 
constants a little bit, but not consistently. Most of the other code doesn't 
seem to use it. That is, there are hundreds more changes like this you could 
make.

Consistency is good. In contrast, there are only about 8 usages of these 
constants outside the launcher module. Is it simpler to achieve some 
consistency by removing those usages? Then it seems like a small step backwards 
to not use them (and yet declare them), but is also much less change.

In the end, this is why I don't know if it's worth trying to standardize, 
because it is also hard to keep it standard anyway.
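
The question here is whether the launcher's constants should be the single definition of these configuration keys. A hypothetical sketch of that pattern, assuming a plain `Map[String, String]` in place of `SparkConf`; the `Keys` object and the helper names are made up, and only the key strings match `SparkLauncher.EXECUTOR_MEMORY` and `SparkLauncher.EXECUTOR_CORES`.

```scala
object ConfKeysSketch {
  // Hypothetical single home for configuration keys. The string values match
  // SparkLauncher.EXECUTOR_MEMORY and SparkLauncher.EXECUTOR_CORES.
  object Keys {
    final val ExecutorMemory = "spark.executor.memory"
    final val ExecutorCores = "spark.executor.cores"
  }

  // Callers look keys up through the shared constants instead of repeating string
  // literals, so a misspelled constant name fails to compile rather than silently
  // falling back to the default. The default of 1 is an illustrative assumption.
  def executorCores(conf: Map[String, String], default: Int = 1): Int =
    conf.get(Keys.ExecutorCores).map(_.toInt).getOrElse(default)

  def executorMemory(conf: Map[String, String]): Option[String] =
    conf.get(Keys.ExecutorMemory)

  def main(args: Array[String]): Unit = {
    val conf = Map(Keys.ExecutorCores -> "4", Keys.ExecutorMemory -> "2g")
    println(executorCores(conf))  // prints 4
    println(executorMemory(conf)) // prints Some(2g)
  }
}
```

The trade-off raised in the comment still applies: the pattern only pays off if every module reads the keys through the same object.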


[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18920
  
**[Test build #80625 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80625/testReport)**
 for PR 18920 at commit 
[`d58ffaa`](https://github.com/apache/spark/commit/d58ffaa434337ae19f4b1f59524c84943ff7934f).


[GitHub] spark issue #18939: [WIP][SPARK-21724][SQL][DOC] Adds since information in t...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18939
  
**[Test build #80626 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80626/testReport)**
 for PR 18939 at commit 
[`1cf870c`](https://github.com/apache/spark/commit/1cf870c0a54649d2cc1e29b1b7b0be6d2daa739c).


[GitHub] spark issue #18918: [SPARK-21707][SQL]Improvement a special case for non-det...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18918
  
**[Test build #80628 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80628/testReport)**
 for PR 18918 at commit 
[`bf81c45`](https://github.com/apache/spark/commit/bf81c45469e8554fc76eec0c97e2b5fc7f397f3f).


[GitHub] spark issue #18934: [SPARK-21721][SQL] Clear FileSystem deleteOnExit cache w...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18934
  
**[Test build #80627 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80627/testReport)**
 for PR 18934 at commit 
[`13defbb`](https://github.com/apache/spark/commit/13defbbd26a2ec4806c1fc94b890f6f43068d411).


[GitHub] spark issue #18938: [SPARK-21363][SQL] Prevent name duplication in (global/l...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18938
  
**[Test build #80624 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80624/testReport)**
 for PR 18938 at commit 
[`cf68f69`](https://github.com/apache/spark/commit/cf68f6960180817530ef3755edfb0b426cb6cb77).


[GitHub] spark issue #18937: [MINOR] Remove false comment from planStreamingAggregati...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18937
  
Merged build finished. Test PASSed.


[GitHub] spark issue #18937: [MINOR] Remove false comment from planStreamingAggregati...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18937
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80623/
Test PASSed.


[GitHub] spark issue #18937: [MINOR] Remove false comment from planStreamingAggregati...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18937
  
**[Test build #80623 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80623/testReport)**
 for PR 18937 at commit 
[`28919cc`](https://github.com/apache/spark/commit/28919cc9dee8408612d94e2e03be5e5fbbc076e7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #18933: [WIP][SPARK-21722][SQL][PYTHON] Enable timezone-aware ti...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18933
  
Merged build finished. Test PASSed.


[GitHub] spark issue #18933: [WIP][SPARK-21722][SQL][PYTHON] Enable timezone-aware ti...

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18933
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80622/
Test PASSed.


[GitHub] spark issue #18933: [WIP][SPARK-21722][SQL][PYTHON] Enable timezone-aware ti...

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18933
  
**[Test build #80622 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80622/testReport)**
 for PR 18933 at commit 
[`7df7ac9`](https://github.com/apache/spark/commit/7df7ac941da56ee9ae894ada3ae30661fddd4b03).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...

2017-08-14 Thread skonto

Github user skonto commented on the issue:

https://github.com/apache/spark/pull/18630
  
@vanzin I didn't forget; I didn't see agreement.


[GitHub] spark issue #18935: [SPARK-9104][CORE] Expose Netty memory metrics in Spark

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18935
  
Merged build finished. Test FAILed.


[GitHub] spark issue #18935: [SPARK-9104][CORE] Expose Netty memory metrics in Spark

2017-08-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18935
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80618/
Test FAILed.


[GitHub] spark pull request #18918: [SPARK-21707][SQL]Improvement a special case for ...

2017-08-14 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18918#discussion_r132921051
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -522,6 +522,8 @@ object ColumnPruning extends Rule[LogicalPlan] {
* so remove it.
*/
   private def removeProjectBeforeFilter(plan: LogicalPlan): LogicalPlan = plan transform {
+case p1 @ Project(_, _ @ Filter(condition, _ @ Project(_, _: LeafNode)))
+  if !condition.deterministic => p1
--- End diff --

I don't get it from your explanation. If I understand it correctly, when there
is a `Project` that selects a subset of the output from the `LeafNode` and we
remove it with the pattern below, we will retrieve all fields. Is that your purpose?
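
The question is what the new guarded case changes relative to the existing `Project`-`Filter`-`Project` rewrite. The toy model below is a hypothetical illustration, not Catalyst: `Plan`, `Leaf`, `Project` and `Filter` are made-up case classes, determinism is a plain Boolean, and only the root of the tree is matched (Catalyst's `transform` walks the whole plan). The second case assumes the existing rule only fires when the inner `Project` selects columns its child already outputs.

```scala
object ColumnPruningSketch {
  // Toy logical plan; hypothetical stand-ins, not Spark's Catalyst classes.
  sealed trait Plan { def output: Seq[String] }
  final case class Leaf(output: Seq[String]) extends Plan
  final case class Project(output: Seq[String], child: Plan) extends Plan
  final case class Filter(deterministic: Boolean, child: Plan) extends Plan {
    def output: Seq[String] = child.output
  }

  def removeProjectBeforeFilter(plan: Plan): Plan = plan match {
    // Shape of the new guard: a non-deterministic filter over Project(Leaf) is kept as is.
    case p1 @ Project(_, Filter(det, Project(_, _: Leaf))) if !det => p1
    // Otherwise drop the inner Project when it only selects columns the child has.
    case p1 @ Project(_, f @ Filter(_, p2 @ Project(_, child)))
        if p2.output.forall(child.output.contains) =>
      p1.copy(child = f.copy(child = child))
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val leaf = Leaf(Seq("a", "b", "c"))
    val det = Project(Seq("a"), Filter(deterministic = true, Project(Seq("a", "b"), leaf)))
    val nonDet = Project(Seq("a"), Filter(deterministic = false, Project(Seq("a", "b"), leaf)))
    println(removeProjectBeforeFilter(det))    // inner Project removed; Filter reads all leaf columns
    println(removeProjectBeforeFilter(nonDet)) // unchanged
  }
}
```

In the deterministic case the `Filter` ends up directly on the leaf, whose output is every column, which is the effect the comment asks about.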


[GitHub] spark issue #18935: [SPARK-9104][CORE] Expose Netty memory metrics in Spark

2017-08-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18935
  
**[Test build #80618 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80618/testReport)**
 for PR 18935 at commit 
[`05c1f4d`](https://github.com/apache/spark/commit/05c1f4de4f00639d5f1acf1b9c061e4894d8286d).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  public class TransportClientFactory implements Closeable `
  * `public class NettyMemoryMetrics implements MetricSet `


[GitHub] spark pull request #18918: [SPARK-21707][SQL]Improvement a special case for ...

2017-08-14 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18918#discussion_r132919858
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala ---
@@ -360,5 +360,34 @@ class ColumnPruningSuite extends PlanTest {
 comparePlans(optimized2, expected2.analyze)
   }
 
+  test("SPARK-21707 the condition of filter is not deterministic that split to two project ") {
--- End diff --

Actually I don't get what the test title tries to say. Can you try to 
rephrase it?


[GitHub] spark issue #18929: [MINOR][LAUNCHER]Reuse EXECUTOR_MEMORY and EXECUTOR_CORE...

2017-08-14 Thread heary-cao

Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/18929
  
@srowen @jerryshao 


