[GitHub] spark issue #16672: [SPARK-19329][SQL]insert data to a not exist location da...

2017-01-25 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/16672
  
@gatorsmile could you give some suggestions? Thanks very much!





[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16700
  
**[Test build #72017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72017/testReport)** for PR 16700 at commit [`40efce2`](https://github.com/apache/spark/commit/40efce2a607908ff06cce85c5e782ed3b1320546).





[GitHub] spark issue #16709: [SPARK-19333][SPARKR] Add Apache License headers to R fi...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16709
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16709: [SPARK-19333][SPARKR] Add Apache License headers to R fi...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16709
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72015/
Test PASSed.





[GitHub] spark issue #16709: [SPARK-19333][SPARKR] Add Apache License headers to R fi...

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16709
  
**[Test build #72015 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72015/testReport)** for PR 16709 at commit [`63a0356`](https://github.com/apache/spark/commit/63a035621934d99bd4cb475985090f08c46e1dfc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13300: [SPARK-15463][SQL] support creating dataframe out of Dat...

2017-01-25 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/13300
  
Yea, I also think the `json` and `csv` paths should be consistent; they'd 
better share the same code structure and behaviour, as @HyukjinKwon said. 
Since we do not have `DataFrameReader.json(Dataset[String])` in the current 
master, `functions.from_json` is the only interface for that conversion. If we 
add `csv(Dataset[String])` to `DataFrameReader`, do we also need to add 
`json(Dataset[String])` there even though we already have `functions.from_json`?





[GitHub] spark issue #16708: [SPARK-19366][SQL] add getNumPartitions to Dataset

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16708
  
**[Test build #72016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72016/testReport)** for PR 16708 at commit [`048759b`](https://github.com/apache/spark/commit/048759b23d9c4303dc7e7c9cd6d6d6e8eb4a3c21).





[GitHub] spark issue #13300: [SPARK-15463][SQL] support creating dataframe out of Dat...

2017-01-25 Thread xwu0226
Github user xwu0226 commented on the issue:

https://github.com/apache/spark/pull/13300
  
@HyukjinKwon Thanks! After your #16680 is merged, please submit a PR with the 
code you showed above, then.





[GitHub] spark issue #13300: [SPARK-15463][SQL] support creating dataframe out of Dat...

2017-01-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13300
  
Actually, this feature might not be urgent, as said above, but to be honest I 
like it. I guess the reason it was put on hold is that, IMHO, the fix did not 
look clean. 

I recently refactored this code path and have one PR left, 
https://github.com/apache/spark/pull/16680. After that is hopefully merged, 
there can be an easy, clean fix, consistent with the json one, within roughly 
ten additional lines. For example, something like the one below in `Dataset`...

```scala
def csv(csv: Dataset[String]): DataFrame = {
  val parsedOptions: CSVOptions = new CSVOptions(extraOptions.toMap)
  val caseSensitive = sparkSession.sessionState.conf.caseSensitive
  // Use the user-specified schema if given; otherwise infer it from the data.
  val schema = userSpecifiedSchema.getOrElse {
    InferSchema.infer(csv, caseSensitive, parsedOptions)
  }

  // Parse each partition of input lines with a per-partition Univocity parser.
  val parsed = csv.mapPartitions { iter =>
    val parser = new UnivocityParser(schema, caseSensitive, parsedOptions)
    iter.flatMap(parser.parse)
  }

  Dataset.ofRows(
    sparkSession,
    LogicalRDD(schema.toAttributes, parsed)(sparkSession))
}
```
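
For reference, a hypothetical usage sketch of such an API, assuming it ends up 
callable through `DataFrameReader` like the json counterpart and that a 
SparkSession `spark` is in scope (the call site is illustrative, not from this 
PR):

```scala
import spark.implicits._

// Parse an in-memory Dataset[String] of CSV lines directly, without files.
val lines: Dataset[String] = Seq("a,1", "b,2").toDS()
val df = spark.read.option("inferSchema", "true").csv(lines)
df.show()
```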

I remember there have been quite a few questions about this feature in 
spark-csv as a third-party package (and in spark-xml too).





[GitHub] spark pull request #16670: [SPARK-19324][SPARKR] Spark VJM stdout output is ...

2017-01-25 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16670#discussion_r97939920
  
--- Diff: R/pkg/inst/tests/testthat/test_Windows.R ---
@@ -20,7 +20,7 @@ test_that("sparkJars tag in SparkContext", {
   if (.Platform$OS.type != "windows") {
 skip("This test is only for Windows, skipped")
   }
-  testOutput <- launchScript("ECHO", "a/b/c", capture = TRUE)
+  testOutput <- launchScript("ECHO", "a/b/c", wait = TRUE)
--- End diff --

This is an example of what you see in an R IDE:
```
> head(p, 40)
Error in handleErrors(returnStatus, conn) :
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 70.0 failed 1 times, most recent failure: Lost task 0.0 in stage 70.0 (TID 115, localhost, executor driver): org.apache.spark.SparkException: Failed to execute user defined function($anonfun$4: (string) => double)
   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
   at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
   at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
   at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
   at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
```





[GitHub] spark issue #16709: [SPARK-19333][SPARKR] Add Apache License headers to R fi...

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16709
  
**[Test build #72015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72015/testReport)** for PR 16709 at commit [`63a0356`](https://github.com/apache/spark/commit/63a035621934d99bd4cb475985090f08c46e1dfc).





[GitHub] spark pull request #16709: [SPARK-19333][SPARKR] Add Apache License headers ...

2017-01-25 Thread felixcheung
GitHub user felixcheung opened a pull request:

https://github.com/apache/spark/pull/16709

[SPARK-19333][SPARKR] Add Apache License headers to R files

## What changes were proposed in this pull request?

add header

## How was this patch tested?

Manual run to check that the vignettes HTML is created properly


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixcheung/spark rfilelicense

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16709.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16709


commit 63a035621934d99bd4cb475985090f08c46e1dfc
Author: Felix Cheung 
Date:   2017-01-26T06:31:11Z

license







[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

2017-01-25 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/16700
  
1.
renamePartition:
A=1/B=2/C=3 -> A=4/B=5/C=6
path created by Hive after renamePartition:
/path/a=4/b=5/c=6
SparkSQL then renames it to /path/A=4/B=5/C=6, and this PR will delete 
`/path/a=4`.

2.
renamePartition:
a=1/B=2/C=3 -> a=4/B=5/C=6
path created by Hive after renamePartition:
/path/a=4/b=5/c=6
SparkSQL then renames it to /path/a=4/B=5/C=6, and this PR will delete 
`/path/a=4/b=5` (see the sketch below).
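
A minimal Scala sketch of the rule these two examples illustrate (a 
hypothetical helper, not the PR's actual code): the useless directory is the 
Hive-created all-lowercase path, up to the first segment whose casing differs 
from the path SparkSQL expects after the rename.

```scala
// Hypothetical illustration: given the path segments SparkSQL expects after
// the rename, compute the leftover Hive-created (lowercase) path to delete.
def extraPartPathCreatedByHive(expectedSegments: Seq[String]): Option[String] = {
  val hiveSegments = expectedSegments.map(_.toLowerCase)
  // First segment where SparkSQL's casing differs from Hive's lowercase path.
  val firstDiff = expectedSegments.zip(hiveSegments).indexWhere {
    case (expected, hive) => expected != hive
  }
  if (firstDiff == -1) None // identical paths, nothing left behind
  else Some(hiveSegments.take(firstDiff + 1).mkString("/"))
}

extraPartPathCreatedByHive(Seq("A=4", "B=5", "C=6")) // Some("a=4"), example 1
extraPartPathCreatedByHive(Seq("a=4", "B=5", "C=6")) // Some("a=4/b=5"), example 2
```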





[GitHub] spark pull request #13300: [SPARK-15463][SQL] support creating dataframe out...

2017-01-25 Thread xwu0226
Github user xwu0226 closed the pull request at:

https://github.com/apache/spark/pull/13300





[GitHub] spark issue #13300: [SPARK-15463][SQL] support creating dataframe out of Dat...

2017-01-25 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/13300
  
This pr seems stale and inactive. I know this kind of API change has lower 
priority now. So, how about closing this pr for now and setting `LATER` in 
the corresponding JIRA? Thoughts? cc: @rxin @xwu0226 @HyukjinKwon





[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

2017-01-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16700
  
Yeah, if we have three partition columns, does your solution resolve all the issues?





[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

2017-01-25 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/16700
  
the examples show A/B as two of the partition columns





[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

2017-01-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16700#discussion_r97938408
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala ---
@@ -120,6 +120,17 @@ object ExternalCatalogUtils {
   new Path(totalPath, nextPartPath)
 }
   }
+
+  def getExtraPartPathCreatedByHive(
--- End diff --

Happy Chinese New Year





[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

2017-01-25 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16700#discussion_r97938358
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala ---
@@ -120,6 +120,17 @@ object ExternalCatalogUtils {
   new Path(totalPath, nextPartPath)
 }
   }
+
+  def getExtraPartPathCreatedByHive(
--- End diff --

OK, thanks very much~ BTW, Happy Chinese New Year~





[GitHub] spark issue #16707: [SPARK-19338][SQL] Add UDF names in explain

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16707
  
**[Test build #72014 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72014/testReport)** for PR 16707 at commit [`836f0fb`](https://github.com/apache/spark/commit/836f0fbd8d54ea23bc662ac8f016c684a9a8dacb).





[GitHub] spark issue #16707: [SPARK-19338][SQL] Add UDF names in explain

2017-01-25 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16707
  
It seems the latest test failure is not related to this pr... I'll test again.





[GitHub] spark issue #16707: [SPARK-19338][SQL] Add UDF names in explain

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16707
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16707: [SPARK-19338][SQL] Add UDF names in explain

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16707
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72010/
Test FAILed.





[GitHub] spark issue #16707: [SPARK-19338][SQL] Add UDF names in explain

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16707
  
**[Test build #72010 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72010/testReport)** for PR 16707 at commit [`1b70561`](https://github.com/apache/spark/commit/1b705611925eefb30fbf5483ce1b976958192b01).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16707: [SPARK-19338][SQL] Add UDF names in explain

2017-01-25 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16707#discussion_r97937869
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -125,7 +125,7 @@ class UDFRegistration private[sql] (functionRegistry: FunctionRegistry) extends
   def register[RT: TypeTag](name: String, func: Function0[RT]): UserDefinedFunction = {
 val dataType = ScalaReflection.schemaFor[RT].dataType
 val inputTypes = Try(Nil).toOption
-def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes.getOrElse(Nil))
+def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes.getOrElse(Nil), Some(name))
--- End diff --

oh, yea, you're right. Thanks! I'll fix it.





[GitHub] spark pull request #16707: [SPARK-19338][SQL] Add UDF names in explain

2017-01-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16707#discussion_r97937744
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -125,7 +125,7 @@ class UDFRegistration private[sql] (functionRegistry: FunctionRegistry) extends
   def register[RT: TypeTag](name: String, func: Function0[RT]): UserDefinedFunction = {
 val dataType = ScalaReflection.schemaFor[RT].dataType
 val inputTypes = Try(Nil).toOption
-def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes.getOrElse(Nil))
+def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes.getOrElse(Nil), Some(name))
--- End diff --

Oh, I remember I was told that we should also remove or fix the comments 
above that are used to generate this code.





[GitHub] spark issue #16707: [SPARK-19338][SQL] Add UDF names in explain

2017-01-25 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16707
  
LGTM pending jenkins.






[GitHub] spark issue #16707: [SPARK-19338][SQL] Add UDF names in explain

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16707
  
**[Test build #72013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72013/testReport)** for PR 16707 at commit [`637a39d`](https://github.com/apache/spark/commit/637a39d6f96dc8fb8e3d8c522804da532235a41f).





[GitHub] spark issue #16308: [SPARK-18936][SQL] Infrastructure for session local time...

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16308
  
**[Test build #72012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72012/testReport)** for PR 16308 at commit [`6fa1d6a`](https://github.com/apache/spark/commit/6fa1d6a2e09bc103d190377b012158c8a37de5d3).





[GitHub] spark issue #16685: [SPARK-19335] Introduce insert, update, and upsert comma...

2017-01-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16685
  
Thank you! 

I prefer pushing the UPSERT workload down into the underlying DBMS, but not 
all JDBC sources support it. Thus, maybe we can provide users with both 
solutions at the same time and let them choose based on their usage 
scenarios. Also cc @srowen @JoshRosen

BTW, I did not carefully review the solution. It might still have some holes.
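
To make the pushdown idea concrete, here is a sketch of dialect-specific 
upsert SQL generation; the helper name and dialect strings are assumptions 
for illustration, not from the PR:

```scala
// Illustrative only: native upsert syntax differs per database, and some
// JDBC sources have none at all, which is why a fallback path is needed.
def upsertSql(table: String, cols: Seq[String], keys: Seq[String], dialect: String): String = {
  val colList = cols.mkString(", ")
  val placeholders = cols.map(_ => "?").mkString(", ")
  val nonKeys = cols.filterNot(keys.contains)
  dialect match {
    case "mysql" =>
      val updates = nonKeys.map(c => s"$c = VALUES($c)").mkString(", ")
      s"INSERT INTO $table ($colList) VALUES ($placeholders) ON DUPLICATE KEY UPDATE $updates"
    case "postgresql" => // PostgreSQL 9.5+
      val updates = nonKeys.map(c => s"$c = EXCLUDED.$c").mkString(", ")
      s"INSERT INTO $table ($colList) VALUES ($placeholders) " +
        s"ON CONFLICT (${keys.mkString(", ")}) DO UPDATE SET $updates"
    case other =>
      throw new UnsupportedOperationException(s"No native upsert for $other")
  }
}
```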





[GitHub] spark issue #16674: [SPARK-19331][SQL][TESTS] Improve the test coverage of S...

2017-01-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16674
  
Happy New Year! : )

I double-checked all the test case changes. The changes make sense to me. 
Could you resolve the conflicts? 

BTW, a general suggestion: could you please add comments when you move code 
around in a PR? It helps others review the changes.






[GitHub] spark issue #16708: [SPARK-19366][SQL] add getNumPartitions to Dataset

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16708
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16708: [SPARK-19366][SQL] add getNumPartitions to Dataset

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16708
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72011/
Test FAILed.





[GitHub] spark issue #16708: [SPARK-19366][SQL] add getNumPartitions to Dataset

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16708
  
**[Test build #72011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72011/testReport)** for PR 16708 at commit [`68cb3e2`](https://github.com/apache/spark/commit/68cb3e2f92b1aaa14dddfcdb311a7d685d209e97).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16706: [SPARK-19365][Core]Optimize RequestMessage serialization

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16706
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72008/
Test FAILed.





[GitHub] spark issue #16706: [SPARK-19365][Core]Optimize RequestMessage serialization

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16706
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16707: [SPARK-19338][SQL] Add UDF names in explain

2017-01-25 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16707
  
Aha, SGTM. I'll fix it.





[GitHub] spark issue #16706: [SPARK-19365][Core]Optimize RequestMessage serialization

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16706
  
**[Test build #72008 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72008/testReport)** for PR 16706 at commit [`b373c10`](https://github.com/apache/spark/commit/b373c103d623c985e03e5fc6e81d86a2c829bb0f).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16708: [SPARK-19366][SQL] add getNumPartitions to Dataset

2017-01-25 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16708
  
Actually - why do we need this? I worry it can be a confusing API due to 
optimizer behavior.






[GitHub] spark issue #16708: [SPARK-19366][SQL] add getNumPartitions to Dataset

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16708
  
**[Test build #72011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72011/testReport)** for PR 16708 at commit [`68cb3e2`](https://github.com/apache/spark/commit/68cb3e2f92b1aaa14dddfcdb311a7d685d209e97).





[GitHub] spark pull request #16708: [SPARK-19366][SQL] add getNumPartitions to Datase...

2017-01-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16708#discussion_r97935710
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2421,6 +2421,13 @@ class Dataset[T] private[sql](
   }
 
   /**
+   * Returns the number of partitions of this Dataset.
+   * @group basic
+   * @since 2.2.0
+   */
+  def getNumPartitions: Int = rdd.getNumPartitions()
--- End diff --

why is this not just numPartitions?






[GitHub] spark issue #16707: [SPARK-19338][SQL] Add UDF names in explain

2017-01-25 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16707
  
Maybe add a prefix so it is clear it's a UDF? e.g. `UDF:func_name(...)`
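
A hypothetical before/after of what such a prefix might look like in an 
explain plan (the output format here is assumed for illustration, not taken 
from the PR):

```scala
// spark.udf.register("plusOne", (x: Int) => x + 1)
// spark.range(3).selectExpr("plusOne(id)").explain()
//
// without the name:  Project [UDF(id#0L) AS UDF(id)#2]
// with the prefix:   Project [UDF:plusOne(id#0L) AS UDF:plusOne(id)#2]
```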





[GitHub] spark pull request #16708: [SPARK-19366][SQL] add getNumPartitions to Datase...

2017-01-25 Thread felixcheung
GitHub user felixcheung opened a pull request:

https://github.com/apache/spark/pull/16708

[SPARK-19366][SQL] add getNumPartitions to Dataset

## What changes were proposed in this pull request?

As suggested by @cloud-fan 
[here](https://github.com/apache/spark/pull/16668#discussion_r97254989), adding 
a simple wrapper in Scala can help avoid inefficiency in non-JVM cases.

## How was this patch tested?

unit tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixcheung/spark getnumpart

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16708.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16708


commit 68cb3e2f92b1aaa14dddfcdb311a7d685d209e97
Author: Felix Cheung 
Date:   2017-01-26T06:03:43Z

add getNumPartitions and test







[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2017-01-25 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r97935192
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -250,6 +252,8 @@ class Dataset[T] private[sql](
 val hasMoreData = takeResult.length > numRows
 val data = takeResult.take(numRows)
 
+lazy val timeZone = TimeZone.getTimeZone(sparkSession.sessionState.conf.sessionLocalTimeZone)
--- End diff --

I think the preferred behavior would be that the result of `show()` changes 
according to the session time zone (see the sketch below).
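
For illustration, a hedged sketch of how a session-local time zone would 
affect `show()`, assuming the `spark.sql.session.timeZone` config key proposed 
in SPARK-18936:

```scala
// The same internal instant (epoch 0) renders differently per session time zone.
spark.conf.set("spark.sql.session.timeZone", "UTC")
spark.sql("SELECT CAST(0 AS TIMESTAMP)").show() // 1970-01-01 00:00:00

spark.conf.set("spark.sql.session.timeZone", "Asia/Tokyo")
spark.sql("SELECT CAST(0 AS TIMESTAMP)").show() // 1970-01-01 09:00:00
```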





[GitHub] spark pull request #16688: [TESTS][SQL] Setup testdata at the beginning for ...

2017-01-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16688





[GitHub] spark issue #16688: [TESTS][SQL] Setup testdata at the beginning for tests t...

2017-01-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16688
  
Thanks! Merging to master.





[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2017-01-25 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r97934467
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala ---
@@ -177,180 +177,186 @@ class DateTimeUtilsSuite extends SparkFunSuite {
   }
 
   test("string to timestamp") {
-var c = Calendar.getInstance()
-c.set(1969, 11, 31, 16, 0, 0)
-c.set(Calendar.MILLISECOND, 0)
-assert(stringToTimestamp(UTF8String.fromString("1969-12-31 16:00:00")).get ===
-  c.getTimeInMillis * 1000)
-c.set(1, 0, 1, 0, 0, 0)
-c.set(Calendar.MILLISECOND, 0)
-assert(stringToTimestamp(UTF8String.fromString("0001")).get ===
-  c.getTimeInMillis * 1000)
-c = Calendar.getInstance()
-c.set(2015, 2, 1, 0, 0, 0)
-c.set(Calendar.MILLISECOND, 0)
-assert(stringToTimestamp(UTF8String.fromString("2015-03")).get ===
-  c.getTimeInMillis * 1000)
-c = Calendar.getInstance()
-c.set(2015, 2, 18, 0, 0, 0)
-c.set(Calendar.MILLISECOND, 0)
-assert(stringToTimestamp(UTF8String.fromString("2015-03-18")).get ===
-  c.getTimeInMillis * 1000)
-assert(stringToTimestamp(UTF8String.fromString("2015-03-18 ")).get ===
-  c.getTimeInMillis * 1000)
-assert(stringToTimestamp(UTF8String.fromString("2015-03-18T")).get ===
-  c.getTimeInMillis * 1000)
-
-c = Calendar.getInstance()
-c.set(2015, 2, 18, 12, 3, 17)
-c.set(Calendar.MILLISECOND, 0)
-assert(stringToTimestamp(UTF8String.fromString("2015-03-18 12:03:17")).get ===
-  c.getTimeInMillis * 1000)
-assert(stringToTimestamp(UTF8String.fromString("2015-03-18T12:03:17")).get ===
-  c.getTimeInMillis * 1000)
-
-c = Calendar.getInstance(TimeZone.getTimeZone("GMT-13:53"))
-c.set(2015, 2, 18, 12, 3, 17)
-c.set(Calendar.MILLISECOND, 0)
-assert(stringToTimestamp(
-  UTF8String.fromString("2015-03-18T12:03:17-13:53")).get === c.getTimeInMillis * 1000)
-
-c = Calendar.getInstance(TimeZone.getTimeZone("UTC"))
-c.set(2015, 2, 18, 12, 3, 17)
-c.set(Calendar.MILLISECOND, 0)
-assert(stringToTimestamp(UTF8String.fromString("2015-03-18T12:03:17Z")).get ===
-  c.getTimeInMillis * 1000)
-assert(stringToTimestamp(UTF8String.fromString("2015-03-18 12:03:17Z")).get ===
-  c.getTimeInMillis * 1000)
-
-c = Calendar.getInstance(TimeZone.getTimeZone("GMT-01:00"))
-c.set(2015, 2, 18, 12, 3, 17)
-c.set(Calendar.MILLISECOND, 0)
-assert(stringToTimestamp(UTF8String.fromString("2015-03-18T12:03:17-1:0")).get ===
-  c.getTimeInMillis * 1000)
-assert(stringToTimestamp(
-  UTF8String.fromString("2015-03-18T12:03:17-01:00")).get === c.getTimeInMillis * 1000)
-
-c = Calendar.getInstance(TimeZone.getTimeZone("GMT+07:30"))
-c.set(2015, 2, 18, 12, 3, 17)
-c.set(Calendar.MILLISECOND, 0)
-assert(stringToTimestamp(
-  UTF8String.fromString("2015-03-18T12:03:17+07:30")).get === c.getTimeInMillis * 1000)
-
-c = Calendar.getInstance(TimeZone.getTimeZone("GMT+07:03"))
-c.set(2015, 2, 18, 12, 3, 17)
-c.set(Calendar.MILLISECOND, 0)
-assert(stringToTimestamp(
-  UTF8String.fromString("2015-03-18T12:03:17+07:03")).get === c.getTimeInMillis * 1000)
-
-c = Calendar.getInstance()
-c.set(2015, 2, 18, 12, 3, 17)
-c.set(Calendar.MILLISECOND, 123)
-assert(stringToTimestamp(
-  UTF8String.fromString("2015-03-18 12:03:17.123")).get === c.getTimeInMillis * 1000)
-assert(stringToTimestamp(
-  UTF8String.fromString("2015-03-18T12:03:17.123")).get === c.getTimeInMillis * 1000)
-
-c = Calendar.getInstance(TimeZone.getTimeZone("UTC"))
-c.set(2015, 2, 18, 12, 3, 17)
-c.set(Calendar.MILLISECOND, 456)
-assert(stringToTimestamp(
-  UTF8String.fromString("2015-03-18T12:03:17.456Z")).get === c.getTimeInMillis * 1000)
-assert(stringToTimestamp(
-  UTF8String.fromString("2015-03-18 12:03:17.456Z")).get === c.getTimeInMillis * 1000)
-
-c = Calendar.getInstance(TimeZone.getTimeZone("GMT-01:00"))
-c.set(2015, 2, 18, 12, 3, 17)
-c.set(Calendar.MILLISECOND, 123)
-assert(stringToTimestamp(
-  UTF8String.fromString("2015-03-18T12:03:17.123-1:0")).get === c.getTimeInMillis * 1000)
-assert(stringToTimestamp(
-  UTF8String.fromString("2015-03-18T12:03:17.123-01:00")).get === c.getTimeInMillis * 1000)
-
-c = Calendar.getInstance(TimeZone.getTimeZone("GMT+07:30"))
-c.set(2015, 2, 18, 12, 3, 17)
-c.set(Calendar.MILLISECOND, 123)
-assert(stringToTimestamp(
-

[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2017-01-25 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r97934459
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
@@ -107,108 +109,119 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
   }
 
   test("cast string to timestamp") {
-checkEvaluation(Cast(Literal("123"), TimestampType), null)
-
-var c = Calendar.getInstance()
-c.set(2015, 0, 1, 0, 0, 0)
-c.set(Calendar.MILLISECOND, 0)
-checkEvaluation(Cast(Literal("2015"), TimestampType),
-  new Timestamp(c.getTimeInMillis))
-c = Calendar.getInstance()
-c.set(2015, 2, 1, 0, 0, 0)
-c.set(Calendar.MILLISECOND, 0)
-checkEvaluation(Cast(Literal("2015-03"), TimestampType),
-  new Timestamp(c.getTimeInMillis))
-c = Calendar.getInstance()
-c.set(2015, 2, 18, 0, 0, 0)
-c.set(Calendar.MILLISECOND, 0)
-checkEvaluation(Cast(Literal("2015-03-18"), TimestampType),
-  new Timestamp(c.getTimeInMillis))
-checkEvaluation(Cast(Literal("2015-03-18 "), TimestampType),
-  new Timestamp(c.getTimeInMillis))
-checkEvaluation(Cast(Literal("2015-03-18T"), TimestampType),
-  new Timestamp(c.getTimeInMillis))
-
-c = Calendar.getInstance()
-c.set(2015, 2, 18, 12, 3, 17)
-c.set(Calendar.MILLISECOND, 0)
-checkEvaluation(Cast(Literal("2015-03-18 12:03:17"), TimestampType),
-  new Timestamp(c.getTimeInMillis))
-checkEvaluation(Cast(Literal("2015-03-18T12:03:17"), TimestampType),
-  new Timestamp(c.getTimeInMillis))
-
-c = Calendar.getInstance(TimeZone.getTimeZone("UTC"))
-c.set(2015, 2, 18, 12, 3, 17)
-c.set(Calendar.MILLISECOND, 0)
-checkEvaluation(Cast(Literal("2015-03-18T12:03:17Z"), TimestampType),
-  new Timestamp(c.getTimeInMillis))
-checkEvaluation(Cast(Literal("2015-03-18 12:03:17Z"), TimestampType),
-  new Timestamp(c.getTimeInMillis))
-
-c = Calendar.getInstance(TimeZone.getTimeZone("GMT-01:00"))
-c.set(2015, 2, 18, 12, 3, 17)
-c.set(Calendar.MILLISECOND, 0)
-checkEvaluation(Cast(Literal("2015-03-18T12:03:17-1:0"), 
TimestampType),
-  new Timestamp(c.getTimeInMillis))
-checkEvaluation(Cast(Literal("2015-03-18T12:03:17-01:00"), 
TimestampType),
-  new Timestamp(c.getTimeInMillis))
-
-c = Calendar.getInstance(TimeZone.getTimeZone("GMT+07:30"))
-c.set(2015, 2, 18, 12, 3, 17)
-c.set(Calendar.MILLISECOND, 0)
-checkEvaluation(Cast(Literal("2015-03-18T12:03:17+07:30"), 
TimestampType),
-  new Timestamp(c.getTimeInMillis))
-
-c = Calendar.getInstance(TimeZone.getTimeZone("GMT+07:03"))
-c.set(2015, 2, 18, 12, 3, 17)
-c.set(Calendar.MILLISECOND, 0)
-checkEvaluation(Cast(Literal("2015-03-18T12:03:17+7:3"), 
TimestampType),
-  new Timestamp(c.getTimeInMillis))
-
-c = Calendar.getInstance()
-c.set(2015, 2, 18, 12, 3, 17)
-c.set(Calendar.MILLISECOND, 123)
-checkEvaluation(Cast(Literal("2015-03-18 12:03:17.123"), 
TimestampType),
-  new Timestamp(c.getTimeInMillis))
-checkEvaluation(Cast(Literal("2015-03-18T12:03:17.123"), 
TimestampType),
-  new Timestamp(c.getTimeInMillis))
-
-c = Calendar.getInstance(TimeZone.getTimeZone("UTC"))
-c.set(2015, 2, 18, 12, 3, 17)
-c.set(Calendar.MILLISECOND, 456)
-checkEvaluation(Cast(Literal("2015-03-18T12:03:17.456Z"), 
TimestampType),
-  new Timestamp(c.getTimeInMillis))
-checkEvaluation(Cast(Literal("2015-03-18 12:03:17.456Z"), 
TimestampType),
-  new Timestamp(c.getTimeInMillis))
-
-c = Calendar.getInstance(TimeZone.getTimeZone("GMT-01:00"))
-c.set(2015, 2, 18, 12, 3, 17)
-c.set(Calendar.MILLISECOND, 123)
-checkEvaluation(Cast(Literal("2015-03-18T12:03:17.123-1:0"), 
TimestampType),
-  new Timestamp(c.getTimeInMillis))
-checkEvaluation(Cast(Literal("2015-03-18T12:03:17.123-01:00"), 
TimestampType),
-  new Timestamp(c.getTimeInMillis))
-
-c = Calendar.getInstance(TimeZone.getTimeZone("GMT+07:30"))
-c.set(2015, 2, 18, 12, 3, 17)
-c.set(Calendar.MILLISECOND, 123)
-checkEvaluation(Cast(Literal("2015-03-18T12:03:17.123+07:30"), 
TimestampType),
-  new Timestamp(c.getTimeInMillis))
-
-c = Calendar.getInstance(TimeZone.getTimeZone("GMT+07:03"))
-c.set(2015, 2, 18, 12, 3, 17)
-c.set(Calendar.MILLISECOND, 123)
-checkEvaluation(Cast(Literal("2015-03-18T12:03:17.123+7:3"), 
TimestampType),
-  new Timestamp(c.getTimeInMillis))
-
-checkEvaluation(Cast(Literal("2015-03-18 123142"), TimestampType), 
null)
-

[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2017-01-25 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r97934309
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1056,7 +1057,8 @@ object DecimalAggregates extends Rule[LogicalPlan] {
   val newAggExpr = ae.copy(aggregateFunction = Average(UnscaledValue(e)))
   Cast(
 Divide(newAggExpr, Literal.create(math.pow(10.0, scale), DoubleType)),
-DecimalType(prec + 4, scale + 4))
+DecimalType(prec + 4, scale + 4),
+Option(conf.sessionLocalTimeZone))
--- End diff --

I'll remove it.





[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2017-01-25 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r97934297
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
@@ -490,7 +569,11 @@ abstract class UnixTime extends BinaryExpression with ExpectsInputTypes {
 
   private lazy val constFormat: UTF8String = right.eval().asInstanceOf[UTF8String]
   private lazy val formatter: SimpleDateFormat =
-Try(new SimpleDateFormat(constFormat.toString, Locale.US)).getOrElse(null)
+Try {
--- End diff --

I see, I'll replace the `Try`s with try-catch and add a method to 
`DateTimeUtils` that creates a `SimpleDateFormat` from a format string and a 
time zone (see the sketch below).
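
A sketch of what such a `DateTimeUtils` helper might look like; the shape and 
name are assumptions for illustration, not quoted from the PR:

```scala
import java.text.SimpleDateFormat
import java.util.{Locale, TimeZone}

// Hypothetical helper: build a SimpleDateFormat bound to a given time zone.
def newDateFormat(formatString: String, timeZone: TimeZone): SimpleDateFormat = {
  val sdf = new SimpleDateFormat(formatString, Locale.US)
  sdf.setTimeZone(timeZone)
  // Fail on malformed input instead of silently rolling fields over.
  sdf.setLenient(false)
  sdf
}
```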





[GitHub] spark issue #16603: [SPARK-19244][Core] Sort MemoryConsumers according to th...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16603
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72009/
Test PASSed.





[GitHub] spark issue #16603: [SPARK-19244][Core] Sort MemoryConsumers according to th...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16603
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2017-01-25 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r97934304
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1044,7 +1044,8 @@ object DecimalAggregates extends Rule[LogicalPlan] {
 we.copy(windowFunction = ae.copy(aggregateFunction = Average(UnscaledValue(e
   Cast(
 Divide(newAggExpr, Literal.create(math.pow(10.0, scale), DoubleType)),
-DecimalType(prec + 4, scale + 4))
+DecimalType(prec + 4, scale + 4),
+Option(conf.sessionLocalTimeZone))
--- End diff --

I'll remove it.


[GitHub] spark issue #16603: [SPARK-19244][Core] Sort MemoryConsumers according to th...

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16603
  
**[Test build #72009 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72009/testReport)**
 for PR 16603 at commit 
[`9f7c8ca`](https://github.com/apache/spark/commit/9f7c8ca8948b07f16327382975276078ea620cf8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

2017-01-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16700
  
What happens if there are more than two partitioning columns?


[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

2017-01-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16700#discussion_r97933750
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala ---
@@ -120,6 +120,17 @@ object ExternalCatalogUtils {
   new Path(totalPath, nextPartPath)
 }
   }
+
+  def getExtraPartPathCreatedByHive(
--- End diff --

A general suggestion: when a function name is not self-descriptive, write a 
function comment.

For example, here, please add comments for `generatePartitionPath` and 
`getExtraPartPathCreatedByHive` (see the sketch below).
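
As an illustration only (the wording below is hypothetical and should be 
replaced with the function's actual semantics), a Scaladoc comment for the 
new helper might look like:

```scala
/**
 * Returns the extra partition path that Hive creates under the table
 * location for the given partition spec.
 *
 * @param lowerCaseSpec partition spec with lower-cased column names
 * @param partitionColumnNames the table's partition columns, in order
 * @param tablePath the root location of the table
 */
```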


[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

2017-01-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16700#discussion_r97933674
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala ---
@@ -120,6 +120,17 @@ object ExternalCatalogUtils {
   new Path(totalPath, nextPartPath)
 }
   }
+
+  def getExtraPartPathCreatedByHive(
+ lowerCaseSpec: TablePartitionSpec,
+ partitionColumnNames: Seq[String],
+ tablePath: Path): Path = {
--- End diff --

Please fix the indent issues.


[GitHub] spark issue #16696: [SPARK-19350] [SQL] Cardinality estimation of Limit and ...

2017-01-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16696
  
Overall looks good to me. : ) Could you add a few more test cases? 

- One where the child has a smaller row count than the limit. 
- Another where the row count is zero but `sizeInBytes` is not.


[GitHub] spark pull request #16696: [SPARK-19350] [SQL] Cardinality estimation of Lim...

2017-01-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16696#discussion_r97933241
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/StatsEstimationSuite.scala ---
@@ -18,12 +18,41 @@
 package org.apache.spark.sql.catalyst.statsEstimation
 
 import org.apache.spark.sql.catalyst.CatalystConf
-import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap, 
AttributeReference}
-import org.apache.spark.sql.catalyst.plans.logical.{ColumnStat, 
LogicalPlan, Statistics}
+import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap, 
AttributeReference, Literal}
+import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.types.IntegerType
 
 
-class StatsConfSuite extends StatsEstimationTestBase {
+class StatsEstimationSuite extends StatsEstimationTestBase {
+  val (ar, colStat) = (attr("key"), ColumnStat(distinctCount = 10, min = 
Some(1), max = Some(10),
+nullCount = 0, avgLen = 4, maxLen = 4))
+
+  val plan = StatsTestPlan(
+outputList = Seq(ar),
+attributeStats = AttributeMap(Seq(ar -> colStat)),
+rowCount = 10,
+size = Some(10 * (8 + 4)))
+
+  test("limit estimation") {
+val localLimit = LocalLimit(Literal(2), plan)
+val globalLimit = GlobalLimit(Literal(2), plan)
+// LocalLimit and GlobalLimit share the same stats estimation logic.
+val expected = Statistics(sizeInBytes = 24, rowCount = Some(2))
+checkStats(localLimit, expected)
+checkStats(globalLimit, expected)
+  }
+
+  test("sample estimation") {
+val sample = Sample(0.0, 0.5, withReplacement = false, (math.random * 
1000).toLong, plan)()
+checkStats(sample, expected = Statistics(sizeInBytes = 60, rowCount = 
Some(5)))
+
+// Test if Sample's child doesn't have rowCount in stats
+val stats2 = Statistics(sizeInBytes = 120)
--- End diff --

The same here


[GitHub] spark pull request #16696: [SPARK-19350] [SQL] Cardinality estimation of Lim...

2017-01-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16696#discussion_r97933222
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/StatsEstimationSuite.scala ---
@@ -18,12 +18,41 @@
 package org.apache.spark.sql.catalyst.statsEstimation
 
 import org.apache.spark.sql.catalyst.CatalystConf
-import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap, 
AttributeReference}
-import org.apache.spark.sql.catalyst.plans.logical.{ColumnStat, 
LogicalPlan, Statistics}
+import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap, 
AttributeReference, Literal}
+import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.types.IntegerType
 
 
-class StatsConfSuite extends StatsEstimationTestBase {
+class StatsEstimationSuite extends StatsEstimationTestBase {
+  val (ar, colStat) = (attr("key"), ColumnStat(distinctCount = 10, min = 
Some(1), max = Some(10),
+nullCount = 0, avgLen = 4, maxLen = 4))
+
+  val plan = StatsTestPlan(
+outputList = Seq(ar),
+attributeStats = AttributeMap(Seq(ar -> colStat)),
+rowCount = 10,
+size = Some(10 * (8 + 4)))
+
+  test("limit estimation") {
+val localLimit = LocalLimit(Literal(2), plan)
+val globalLimit = GlobalLimit(Literal(2), plan)
+// LocalLimit and GlobalLimit share the same stats estimation logic.
+val expected = Statistics(sizeInBytes = 24, rowCount = Some(2))
+checkStats(localLimit, expected)
+checkStats(globalLimit, expected)
+  }
+
+  test("sample estimation") {
+val sample = Sample(0.0, 0.5, withReplacement = false, (math.random * 
1000).toLong, plan)()
+checkStats(sample, expected = Statistics(sizeInBytes = 60, rowCount = 
Some(5)))
+
+// Test if Sample's child doesn't have rowCount in stats
+val stats2 = Statistics(sizeInBytes = 120)
+val plan2 = DummyLogicalPlan(stats2, stats2)
--- End diff --

rename `plan2` to `childPlan`


[GitHub] spark pull request #16696: [SPARK-19350] [SQL] Cardinality estimation of Lim...

2017-01-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16696#discussion_r97933132
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/StatsEstimationSuite.scala ---
@@ -48,6 +77,14 @@ class StatsConfSuite extends StatsEstimationTestBase {
 // Return the simple statistics
 assert(plan.stats(conf.copy(cboEnabled = false)) == 
expectedDefaultStats)
   }
+
+  /** Check estimated stats which is the same when cbo is turned on/off. */
+  private def checkStats(plan: LogicalPlan, expected: Statistics): Unit = {
--- End diff --

This is a utility function; we can make it more general by taking two 
expected stats values, one for cbo enabled and one for cbo disabled (see the 
sketch below).
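
A minimal sketch of the generalized helper, assuming the suite's existing 
`conf` and `Statistics` are in scope (names are illustrative):

```scala
/** Check estimated stats with cbo turned on and off, respectively. */
private def checkStats(
    plan: LogicalPlan,
    expectedStatsCboOn: Statistics,
    expectedStatsCboOff: Statistics): Unit = {
  assert(plan.stats(conf.copy(cboEnabled = true)) == expectedStatsCboOn)
  assert(plan.stats(conf.copy(cboEnabled = false)) == expectedStatsCboOff)
}
```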


[GitHub] spark issue #16707: [SPARK-19338][SQL] Add UDF names in explain

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16707
  
**[Test build #72010 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72010/testReport)**
 for PR 16707 at commit 
[`1b70561`](https://github.com/apache/spark/commit/1b705611925eefb30fbf5483ce1b976958192b01).


[GitHub] spark pull request #16707: [SPARK-19338][SQL] Add UDF names in explain

2017-01-25 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/spark/pull/16707

[SPARK-19338][SQL] Add UDF names in explain

## What changes were proposed in this pull request?
This PR adds a variable for the UDF name in `ScalaUDF`.
Then, if the variable is filled, `DataFrame#explain` prints the name.
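
For illustration, a sketch of the intended effect, assuming a `SparkSession` 
named `spark` as in spark-shell (the exact plan output is hypothetical):

```scala
// Register a UDF under a name; with this change the name can be carried
// into ScalaUDF and surfaced by explain().
spark.udf.register("plusOne", (x: Long) => x + 1)
spark.range(3).selectExpr("plusOne(id)").explain()
// Without the name, the plan prints an anonymous UDF(id#0L); with it,
// the plan can print something like plusOne(id#0L) instead.
```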

## How was this patch tested?
Added a test in `UDFSuite`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/spark SPARK-19338

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16707.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16707


commit 1b705611925eefb30fbf5483ce1b976958192b01
Author: Takeshi YAMAMURO 
Date:   2017-01-24T15:21:07Z

Add UDF names in explain




[GitHub] spark pull request #16696: [SPARK-19350] [SQL] Cardinality estimation of Lim...

2017-01-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16696#discussion_r97933053
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/StatsEstimationSuite.scala ---
@@ -48,6 +77,14 @@ class StatsConfSuite extends StatsEstimationTestBase {
 // Return the simple statistics
 assert(plan.stats(conf.copy(cboEnabled = false)) == 
expectedDefaultStats)
--- End diff --

Could you replace the above three lines with `checkStats`?


[GitHub] spark pull request #16696: [SPARK-19350] [SQL] Cardinality estimation of Lim...

2017-01-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16696#discussion_r97932867
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/StatsEstimationSuite.scala ---
@@ -18,12 +18,41 @@
 package org.apache.spark.sql.catalyst.statsEstimation
 
 import org.apache.spark.sql.catalyst.CatalystConf
-import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap, 
AttributeReference}
-import org.apache.spark.sql.catalyst.plans.logical.{ColumnStat, 
LogicalPlan, Statistics}
+import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap, 
AttributeReference, Literal}
+import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.types.IntegerType
 
 
-class StatsConfSuite extends StatsEstimationTestBase {
+class StatsEstimationSuite extends StatsEstimationTestBase {
+  val (ar, colStat) = (attr("key"), ColumnStat(distinctCount = 10, min = 
Some(1), max = Some(10),
+nullCount = 0, avgLen = 4, maxLen = 4))
+
+  val plan = StatsTestPlan(
+outputList = Seq(ar),
+attributeStats = AttributeMap(Seq(ar -> colStat)),
+rowCount = 10,
+size = Some(10 * (8 + 4)))
--- End diff --

I still prefer adding a comment above this line:
```
  // rowCount * (overhead + column size)
```


[GitHub] spark issue #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn't fail...

2017-01-25 Thread samkum
Github user samkum commented on the issue:

https://github.com/apache/spark/pull/16387
  
OK, I will get back to you in the next couple of days.

-Sameer.

On Thu, Jan 26, 2017 at 3:56 AM, Marcelo Vanzin wrote:

> Nope, I didn't test it in isolation.
>
> Could you do that? To make sure that it's really caused by this change?



[GitHub] spark pull request #16696: [SPARK-19350] [SQL] Cardinality estimation of Lim...

2017-01-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16696#discussion_r97932053
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -727,37 +728,18 @@ case class GlobalLimit(limitExpr: Expression, child: 
LogicalPlan) extends UnaryN
   }
   override def computeStats(conf: CatalystConf): Statistics = {
 val limit = limitExpr.eval().asInstanceOf[Int]
-val sizeInBytes = if (limit == 0) {
-  // sizeInBytes can't be zero, or sizeInBytes of BinaryNode will also 
be zero
-  // (product of children).
-  1
-} else {
-  (limit: Long) * output.map(a => a.dataType.defaultSize).sum
-}
-child.stats(conf).copy(sizeInBytes = sizeInBytes)
+val childStats = child.stats(conf)
+// Don't propagate column stats, because we don't know the 
distribution after a limit operation
+Statistics(
+  sizeInBytes = EstimationUtils.getOutputSize(output, limit, 
childStats.attributeStats),
--- End diff --

I think @wzhfy is just keeping the existing code logic. Sure, we can 
improve it.


[GitHub] spark pull request #16696: [SPARK-19350] [SQL] Cardinality estimation of Lim...

2017-01-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16696#discussion_r97932007
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -727,37 +728,18 @@ case class GlobalLimit(limitExpr: Expression, child: 
LogicalPlan) extends UnaryN
   }
   override def computeStats(conf: CatalystConf): Statistics = {
 val limit = limitExpr.eval().asInstanceOf[Int]
--- End diff --

To make the stats more accurate, yes, we can use the smaller of 
`childStats.rowCount` and `limit` as the `outputRowCount` passed to 
`getOutputSize`.
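
A minimal sketch of that bound (illustrative name, `BigInt` row counts as in 
`Statistics`):

```scala
// The limit cannot output more rows than its child produces, so take the
// tighter of the two estimates when the child's row count is known.
def limitOutputRowCount(limit: BigInt, childRowCount: Option[BigInt]): BigInt =
  childRowCount.map(_.min(limit)).getOrElse(limit)
```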


[GitHub] spark pull request #16696: [SPARK-19350] [SQL] Cardinality estimation of Lim...

2017-01-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16696#discussion_r97931646
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -791,12 +773,14 @@ case class Sample(
 
   override def computeStats(conf: CatalystConf): Statistics = {
 val ratio = upperBound - lowerBound
-// BigInt can't multiply with Double
-var sizeInBytes = child.stats(conf).sizeInBytes * (ratio * 100).toInt 
/ 100
+val childStats = child.stats(conf)
+var sizeInBytes = 
EstimationUtils.ceil(BigDecimal(childStats.sizeInBytes) * ratio)
 if (sizeInBytes == 0) {
   sizeInBytes = 1
 }
-child.stats(conf).copy(sizeInBytes = sizeInBytes)
+val sampledNumber = childStats.rowCount.map(c => 
EstimationUtils.ceil(BigDecimal(c) * ratio))
--- End diff --

`sampledNumber` -> `sampledRowCount`
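
For context, a standalone sketch of the estimate computed in the diff above 
(plain Scala, outside Spark; the values are illustrative):

```scala
import scala.math.BigDecimal.RoundingMode

// ratio = upperBound - lowerBound; the sampled row count is the child's
// row count scaled by the ratio, rounded up to a whole number of rows.
val ratio = BigDecimal(0.5)
val childRowCount = BigInt(10)
val sampledRowCount =
  (BigDecimal(childRowCount) * ratio).setScale(0, RoundingMode.CEILING).toBigInt
// sampledRowCount == 5
```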


[GitHub] spark pull request #16696: [SPARK-19350] [SQL] Cardinality estimation of Lim...

2017-01-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16696#discussion_r97931570
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala ---
@@ -29,6 +31,8 @@ object EstimationUtils {
   def rowCountsExist(conf: CatalystConf, plans: LogicalPlan*): Boolean =
 plans.forall(_.stats(conf).rowCount.isDefined)
 
+  def ceil(bigDecimal: BigDecimal): BigInt = bigDecimal.setScale(0, 
RoundingMode.CEILING).toBigInt()
--- End diff --

`ceil` -> `ceiling`


[GitHub] spark pull request #16696: [SPARK-19350] [SQL] Cardinality estimation of Lim...

2017-01-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16696#discussion_r97928605
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -727,37 +728,18 @@ case class GlobalLimit(limitExpr: Expression, child: 
LogicalPlan) extends UnaryN
   }
   override def computeStats(conf: CatalystConf): Statistics = {
 val limit = limitExpr.eval().asInstanceOf[Int]
-val sizeInBytes = if (limit == 0) {
-  // sizeInBytes can't be zero, or sizeInBytes of BinaryNode will also 
be zero
-  // (product of children).
-  1
-} else {
-  (limit: Long) * output.map(a => a.dataType.defaultSize).sum
-}
-child.stats(conf).copy(sizeInBytes = sizeInBytes)
+val childStats = child.stats(conf)
+// Don't propagate column stats, because we don't know the 
distribution after a limit operation
+Statistics(
+  sizeInBytes = EstimationUtils.getOutputSize(output, limit, 
childStats.attributeStats),
--- End diff --

Why don't we use `childStats.rowCount`? If `childStats.rowCount` is less 
than the limit, I think we should use it instead of the limit.


[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...

2017-01-25 Thread titicaca
Github user titicaca commented on the issue:

https://github.com/apache/spark/pull/16689
  
I have modified the code and tests, including the existing tests at 
@test_sparkSQL.R#1280 and @test_sparkSQL.R#1282. 

As in local R, an NA column of a SparkDataFrame will now be collected as its 
corresponding type instead of as logical NA.




[GitHub] spark issue #16603: [SPARK-19244][Core] Sort MemoryConsumers according to th...

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16603
  
**[Test build #72009 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72009/testReport)**
 for PR 16603 at commit 
[`9f7c8ca`](https://github.com/apache/spark/commit/9f7c8ca8948b07f16327382975276078ea620cf8).


[GitHub] spark issue #16603: [SPARK-19244][Core] Sort MemoryConsumers according to th...

2017-01-25 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16603
  
retest this please.


[GitHub] spark issue #16603: [SPARK-19244][Core] Sort MemoryConsumers according to th...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16603
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72006/
Test FAILed.


[GitHub] spark issue #16603: [SPARK-19244][Core] Sort MemoryConsumers according to th...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16603
  
Merged build finished. Test FAILed.


[GitHub] spark issue #16603: [SPARK-19244][Core] Sort MemoryConsumers according to th...

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16603
  
**[Test build #72006 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72006/testReport)**
 for PR 16603 at commit 
[`9f7c8ca`](https://github.com/apache/spark/commit/9f7c8ca8948b07f16327382975276078ea620cf8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #15237: [SPARK-17663] [CORE] SchedulableBuilder should handle in...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15237
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72005/
Test PASSed.


[GitHub] spark issue #15237: [SPARK-17663] [CORE] SchedulableBuilder should handle in...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15237
  
Merged build finished. Test PASSed.


[GitHub] spark issue #15237: [SPARK-17663] [CORE] SchedulableBuilder should handle in...

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15237
  
**[Test build #72005 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72005/testReport)**
 for PR 15237 at commit 
[`a1b2924`](https://github.com/apache/spark/commit/a1b29247d1e94abc75aea21e9a6dc209d01d4ba5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request #16706: [SPARK-19365][Core]Optimize RequestMessage serial...

2017-01-25 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16706#discussion_r97921119
  
--- Diff: core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala ---
@@ -501,34 +498,105 @@ private[netty] class NettyRpcEndpointRef(
 out.defaultWriteObject()
   }
 
-  override def name: String = _name
+  override def name: String = endpointAddress.name
 
   override def ask[T: ClassTag](message: Any, timeout: RpcTimeout): 
Future[T] = {
-nettyEnv.ask(RequestMessage(nettyEnv.address, this, message), timeout)
+nettyEnv.ask(new RequestMessage(nettyEnv.address, this, message), 
timeout)
   }
 
   override def send(message: Any): Unit = {
 require(message != null, "Message is null")
-nettyEnv.send(RequestMessage(nettyEnv.address, this, message))
+nettyEnv.send(new RequestMessage(nettyEnv.address, this, message))
   }
 
-  override def toString: String = s"NettyRpcEndpointRef(${_address})"
-
-  def toURI: URI = new URI(_address.toString)
+  override def toString: String = 
s"NettyRpcEndpointRef(${endpointAddress})"
 
   final override def equals(that: Any): Boolean = that match {
-case other: NettyRpcEndpointRef => _address == other._address
+case other: NettyRpcEndpointRef => endpointAddress == 
other.endpointAddress
 case _ => false
   }
 
-  final override def hashCode(): Int = if (_address == null) 0 else 
_address.hashCode()
+  final override def hashCode(): Int =
+if (endpointAddress == null) 0 else endpointAddress.hashCode()
 }
 
 /**
  * The message that is sent from the sender to the receiver.
+ *
+ * @param senderAddress the sender address. It's `null` if this message is 
from a client
+ *  `NettyRpcEnv`.
+ * @param receiver the receiver of this message.
+ * @param content the message content.
  */
-private[netty] case class RequestMessage(
-senderAddress: RpcAddress, receiver: NettyRpcEndpointRef, content: Any)
+private[netty] class RequestMessage(
+@Nullable val senderAddress: RpcAddress,
+val receiver: NettyRpcEndpointRef, val content: Any) {
+
+  /** Manually serialize [[RequestMessage]] to minimize the size of bytes. 
*/
+  def serialize(nettyEnv: NettyRpcEnv): ByteBuffer = {
+val bos = new ByteBufferOutputStream()
+val out = new DataOutputStream(bos)
+try {
+  if (senderAddress == null) {
+out.writeBoolean(false)
+  } else {
+out.writeBoolean(true)
+out.writeUTF(senderAddress.host)
+out.writeInt(senderAddress.port)
+  }
+  val receiverAddress = receiver.endpointAddress
--- End diff --

Write `receiver.endpointAddress` rather than the whole `NettyRpcEndpointRef`, 
since we only need the address to recreate it.
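
For illustration, a minimal sketch of the matching read side (a hypothetical 
helper mirroring the fields written above):

```scala
import java.io.DataInputStream

// Recover the optional sender address written by serialize(): a boolean
// presence flag followed by the host (UTF) and port (int) when present.
def readSenderAddress(in: DataInputStream): Option[(String, Int)] =
  if (in.readBoolean()) Some((in.readUTF(), in.readInt())) else None
```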


[GitHub] spark pull request #16706: [SPARK-19365][Core]Optimize RequestMessage serial...

2017-01-25 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16706#discussion_r97920967
  
--- Diff: core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala ---
@@ -480,16 +480,13 @@ private[rpc] class NettyRpcEnvFactory extends 
RpcEnvFactory with Logging {
  */
 private[netty] class NettyRpcEndpointRef(
 @transient private val conf: SparkConf,
-endpointAddress: RpcEndpointAddress,
-@transient @volatile private var nettyEnv: NettyRpcEnv)
-  extends RpcEndpointRef(conf) with Serializable with Logging {
+val endpointAddress: RpcEndpointAddress,
+@transient @volatile private var nettyEnv: NettyRpcEnv) extends 
RpcEndpointRef(conf) {
 
   @transient @volatile var client: TransportClient = _
 
-  private val _address = if (endpointAddress.rpcAddress != null) 
endpointAddress else null
--- End diff --

Removed `_address` and `_name` to save some bytes.


[GitHub] spark issue #16706: [SPARK-19365][Core]Optimize RequestMessage serialization

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16706
  
**[Test build #72008 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72008/testReport)**
 for PR 16706 at commit 
[`b373c10`](https://github.com/apache/spark/commit/b373c103d623c985e03e5fc6e81d86a2c829bb0f).


[GitHub] spark pull request #16706: [SPARK-19365][Core]Optimize RequestMessage serial...

2017-01-25 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16706#discussion_r97920768
  
--- Diff: core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala ---
@@ -501,34 +498,105 @@ private[netty] class NettyRpcEndpointRef(
 out.defaultWriteObject()
   }
 
-  override def name: String = _name
+  override def name: String = endpointAddress.name
 
   override def ask[T: ClassTag](message: Any, timeout: RpcTimeout): 
Future[T] = {
-nettyEnv.ask(RequestMessage(nettyEnv.address, this, message), timeout)
+nettyEnv.ask(new RequestMessage(nettyEnv.address, this, message), 
timeout)
   }
 
   override def send(message: Any): Unit = {
 require(message != null, "Message is null")
-nettyEnv.send(RequestMessage(nettyEnv.address, this, message))
+nettyEnv.send(new RequestMessage(nettyEnv.address, this, message))
   }
 
-  override def toString: String = s"NettyRpcEndpointRef(${_address})"
-
-  def toURI: URI = new URI(_address.toString)
+  override def toString: String = 
s"NettyRpcEndpointRef(${endpointAddress})"
 
   final override def equals(that: Any): Boolean = that match {
-case other: NettyRpcEndpointRef => _address == other._address
+case other: NettyRpcEndpointRef => endpointAddress == 
other.endpointAddress
 case _ => false
   }
 
-  final override def hashCode(): Int = if (_address == null) 0 else 
_address.hashCode()
+  final override def hashCode(): Int =
+if (endpointAddress == null) 0 else endpointAddress.hashCode()
 }
 
 /**
  * The message that is sent from the sender to the receiver.
+ *
+ * @param senderAddress the sender address. It's `null` if this message is 
from a client
+ *  `NettyRpcEnv`.
+ * @param receiver the receiver of this message.
+ * @param content the message content.
  */
-private[netty] case class RequestMessage(
-senderAddress: RpcAddress, receiver: NettyRpcEndpointRef, content: Any)
+private[netty] class RequestMessage(
--- End diff --

Removed `case` to make `RequestMessage` non-serializable, to avoid 
accidentally using Java serialization.


[GitHub] spark pull request #16706: [SPARK-19365][Core]Optimize RequestMessage serial...

2017-01-25 Thread zsxwing
GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/16706

[SPARK-19365][Core]Optimize RequestMessage serialization

## What changes were proposed in this pull request?

Right now Netty RPC serializes `RequestMessage` using Java serialization, 
and the size of a single message (e.g., `RequestMessage(..., "hello!")`) is 
about 1kb.

This PR optimizes it by serializing `RequestMessage` manually, reducing the 
above message size to 100+ bytes.
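
As a standalone illustration (not Spark's code) of why manual framing is 
smaller: Java serialization writes class metadata with every object, while 
writing the fields directly does not.

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream, ObjectOutputStream}

case class Msg(host: String, port: Int, body: String)

def javaSerializedSize(m: Msg): Int = {
  val bos = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(bos)
  oos.writeObject(m)   // writes class descriptor and field metadata too
  oos.close()
  bos.size()
}

def manualSerializedSize(m: Msg): Int = {
  val bos = new ByteArrayOutputStream()
  val out = new DataOutputStream(bos)
  out.writeUTF(m.host) // only the raw field values are written
  out.writeInt(m.port)
  out.writeUTF(m.body)
  out.close()
  bos.size()
}

// javaSerializedSize is typically several times manualSerializedSize,
// mirroring the ~1kb vs. ~100-byte gap reported above.
```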

## How was this patch tested?

Jenkins

I did a simple test to measure the improvement:

Before
```
$ bin/spark-shell --master local-cluster[1,4,1024]
...
scala> for (i <- 1 to 10) {
 |   val start = System.nanoTime
 |   val s = sc.parallelize(1 to 100, 10 * 1000).count()
 |   val end = System.nanoTime
 |   println(s"$i\t" + ((end - start)/1000/1000))
 | }
1   6830
2   4353
3   3322
4   3107
5   3235
6   3139
7   3156
8   3166
9   3091
10  3029
```
After:
```
$ bin/spark-shell --master local-cluster[1,4,1024]
...
scala> for (i <- 1 to 10) {
 |   val start = System.nanoTime
 |   val s = sc.parallelize(1 to 100, 10 * 1000).count()
 |   val end = System.nanoTime
 |   println(s"$i\t" + ((end - start)/1000/1000))
 | }
1   6431
2   3643
3   2913
4   2679
5   2760
6   2710
7   2747
8   2793
9   2679
10  2651
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark rpc-opt

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16706.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16706


commit b373c103d623c985e03e5fc6e81d86a2c829bb0f
Author: Shixiong Zhu 
Date:   2017-01-25T23:47:01Z

Optimize RequestMessage serialization




[GitHub] spark pull request #16213: [SPARK-18020][Streaming][Kinesis] Checkpoint SHAR...

2017-01-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16213


[GitHub] spark issue #16213: [SPARK-18020][Streaming][Kinesis] Checkpoint SHARD_END t...

2017-01-25 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16213
  
great, haha! Many thanks!


[GitHub] spark issue #15237: [SPARK-17663] [CORE] SchedulableBuilder should handle in...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15237
  
Merged build finished. Test PASSed.


[GitHub] spark issue #15237: [SPARK-17663] [CORE] SchedulableBuilder should handle in...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15237
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72004/
Test PASSed.


[GitHub] spark issue #16213: [SPARK-18020][Streaming][Kinesis] Checkpoint SHARD_END t...

2017-01-25 Thread brkyvz
Github user brkyvz commented on the issue:

https://github.com/apache/spark/pull/16213
  
Thanks a lot! Merging to master! (May take a while, going to be my first 
merge!)


[GitHub] spark issue #15237: [SPARK-17663] [CORE] SchedulableBuilder should handle in...

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15237
  
**[Test build #72004 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72004/testReport)**
 for PR 15237 at commit 
[`d5597c3`](https://github.com/apache/spark/commit/d5597c37125a43b79ead38972f2df5a057dcc2b6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsR...

2017-01-25 Thread windpiger
Github user windpiger closed the pull request at:

https://github.com/apache/spark/pull/16545


[GitHub] spark issue #16213: [SPARK-18020][Streaming][Kinesis] Checkpoint SHARD_END t...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16213
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72007/
Test PASSed.


[GitHub] spark issue #16213: [SPARK-18020][Streaming][Kinesis] Checkpoint SHARD_END t...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16213
  
Merged build finished. Test PASSed.


[GitHub] spark issue #16213: [SPARK-18020][Streaming][Kinesis] Checkpoint SHARD_END t...

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16213
  
**[Test build #72007 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72007/testReport)**
 for PR 16213 at commit 
[`17a5c3a`](https://github.com/apache/spark/commit/17a5c3a0b74414d2f65cfdf33d813599b1697804).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #16213: [SPARK-18020][Streaming][Kinesis] Checkpoint SHARD_END t...

2017-01-25 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16213
  
Many thanks! Also, congrats on becoming a committer!


[GitHub] spark pull request #16702: [SPARK-18495][UI] Document meaning of green dot i...

2017-01-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16702


[GitHub] spark issue #16702: [SPARK-18495][UI] Document meaning of green dot in DAG v...

2017-01-25 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16702
  
Thanks - merging in master.



[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...

2017-01-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15396


[GitHub] spark issue #16685: [SPARK-19335] Introduce insert, update, and upsert comma...

2017-01-25 Thread ilganeli
Github user ilganeli commented on the issue:

https://github.com/apache/spark/pull/16685
  
@gatorsmile I'll submit a PR with just the UPDATE functionality. How do you 
suggest proceeding on the UPSERT front?

