[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127895248

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1912,6 +1913,26 @@ class Analyzer(
         nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
       }.copy(child = newChild)
+      case j: Join if j.condition.isDefined && !j.condition.get.deterministic =>
+        j match {
+          // We can push down non-deterministic joining keys.
+          // We can't push down non-deterministic conditions.
+          case ExtractEquiJoinKeys(_, leftKeys, rightKeys, conditions, _, _)
--- End diff --

Joining keys can only come from an equi-join. This is exactly the use case discussed on the dev mailing list, and it is genuinely useful there. A general non-deterministic join condition pushdown doesn't make much sense: predicates like `rand(1) > 0 && rand(11) < 0` are a serious concern, because the join results can differ before and after the pushdown.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127894772

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1912,6 +1913,26 @@ class Analyzer(
         nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
       }.copy(child = newChild)
+      case j: Join if j.condition.isDefined && !j.condition.get.deterministic =>
+        j match {
+          // We can push down non-deterministic joining keys.
--- End diff --

Even for an equi-join, how about `rand(a) = rand(b)`?
[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127894313

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1912,6 +1913,26 @@ class Analyzer(
         nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
       }.copy(child = newChild)
+      case j: Join if j.condition.isDefined && !j.condition.get.deterministic =>
+        j match {
+          // We can push down non-deterministic joining keys.
--- End diff --

IIUC, joining keys actually satisfy what you said: they are evaluated in the same order and the same number of times as when we don't push them down. I can't think of an example where that doesn't hold, so may I ask if you have one?
[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127893995

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1912,6 +1913,26 @@ class Analyzer(
         nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
       }.copy(child = newChild)
+      case j: Join if j.condition.isDefined && !j.condition.get.deterministic =>
+        j match {
+          // We can push down non-deterministic joining keys.
+          // We can't push down non-deterministic conditions.
+          case ExtractEquiJoinKeys(_, leftKeys, rightKeys, conditions, _, _)
--- End diff --

Supporting only equi-joins does not sound reasonable here; the join condition can be any predicate. How about adding a SQLConf flag to control it? We could simply push it down regardless of whether the semantics stay the same, to be consistent with Hive, and turn the flag off by default.
[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127893543

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1912,6 +1913,26 @@ class Analyzer(
         nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
       }.copy(child = newChild)
+      case j: Join if j.condition.isDefined && !j.condition.get.deterministic =>
+        j match {
+          // We can push down non-deterministic joining keys.
--- End diff --

The major point here is that a non-deterministic join condition push-down is safe only when the results are exactly the same before and after the push-down. Once we push it down, it will basically be evaluated for each row of that side. Would it be evaluated in the same order and the same number of times if we did not push it down? We can find many different scenarios that break this.
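The evaluation-count concern can be made concrete with a small plain-Scala sketch (no Spark; all names here are illustrative). A predicate evaluated on the join output runs once per equi-matching pair, while the pushed-down version runs once per row of one side:

```scala
// Plain-Scala sketch of why pushing a non-deterministic predicate below a
// join can change how often (and on which rows) it is evaluated.
var evalCount = 0
def nonDet(x: Int): Boolean = { evalCount += 1; x % 2 == 0 }

val left  = Seq(1, 2, 3)
val right = Seq(2, 3, 4)

// Without pushdown: the predicate runs once per equi-matching pair.
evalCount = 0
val joined = for (l <- left; r <- right; if l == r && nonDet(l)) yield (l, r)
val countOnJoin = evalCount

// With pushdown: the predicate runs once per row of the left side,
// including rows that never find a match.
evalCount = 0
val pushed = for (l <- left.filter(nonDet); r <- right; if l == r) yield (l, r)
val countPushed = evalCount
```

The stand-in predicate here is actually deterministic, so the two outputs happen to agree; a real `rand`-backed predicate would see three draws instead of two, which is exactly why the results can differ.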
[GitHub] spark issue #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties from s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18668

Can one of the admins verify this patch?
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
GitHub user yaooqinn opened a pull request: https://github.com/apache/spark/pull/18668

[SPARK-21451][SQL] Get `spark.hadoop.*` properties from sysProps to hiveconf

## What changes were proposed in this pull request?

Get `spark.hadoop.*` properties from sysProps to hiveconf.

## How was this patch tested?

UT

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yaooqinn/spark SPARK-21451

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18668.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18668

commit 89d9b86616196fde5d0b3a08fb284e6af6afe588
Author: Kent Yao
Date: 2017-07-18T06:41:24Z

    HiveConf in SparkSQLCLIDriver doesn't respect spark.hadoop.some.hive.variables
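The mechanics being proposed can be sketched independently of Hive: copy every `spark.hadoop.`-prefixed property into the Hadoop-side configuration with the prefix stripped. In this sketch a plain mutable map stands in for the real `HiveConf`, and the property names are made up for illustration:

```scala
import scala.collection.mutable

// Hypothetical stand-ins: sysProps mimics JVM system properties, hiveConf
// mimics a HiveConf. Only spark.hadoop.* entries are copied, prefix removed.
val sysProps = Map(
  "spark.hadoop.hive.exec.dynamic.partition" -> "true",
  "spark.app.name"                           -> "cli"
)
val hiveConf = mutable.Map.empty[String, String]
sysProps.foreach { case (k, v) =>
  if (k.startsWith("spark.hadoop.")) {
    hiveConf(k.stripPrefix("spark.hadoop.")) = v
  }
}
```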
[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127893174

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1912,6 +1913,26 @@ class Analyzer(
         nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
       }.copy(child = newChild)
+      case j: Join if j.condition.isDefined && !j.condition.get.deterministic =>
+        j match {
+          // We can push down non-deterministic joining keys.
--- End diff --

We use `ExtractEquiJoinKeys` to extract the joining keys. You can check it.
[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18555

@gatorsmile Could you please review this code again?
[GitHub] spark issue #18656: [SPARK-21441]Incorrect Codegen in SortMergeJoinExec resu...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18656

Will `CodegenFallback` be used in whole-stage codegen? I think it's not supported there.
[GitHub] spark issue #12646: [SPARK-14878][SQL] Trim characters string function suppo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12646

**[Test build #79697 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79697/testReport)** for PR 12646 at commit [`9bb80ea`](https://github.com/apache/spark/commit/9bb80eaf8e0b4339850d8c48e221c8ad1e477552).
[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127892847

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1912,6 +1913,26 @@ class Analyzer(
         nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
       }.copy(child = newChild)
+      case j: Join if j.condition.isDefined && !j.condition.get.deterministic =>
+        j match {
+          // We can push down non-deterministic joining keys.
--- End diff --

What is the join key? Any definition?
[GitHub] spark issue #18655: [SPARK-21440][SQL][PYSPARK] Refactor ArrowConverters and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18655

**[Test build #79696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79696/testReport)** for PR 18655 at commit [`8ffedda`](https://github.com/apache/spark/commit/8ffedda9f05d379d700aef95dca049a751374f87).
[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127891910

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1912,6 +1913,26 @@ class Analyzer(
         nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
       }.copy(child = newChild)
+      case j: Join if j.condition.isDefined && !j.condition.get.deterministic =>
+        j match {
+          // We can push down non-deterministic joining keys.
--- End diff --

For the different join types, I think the joining keys are used to find matching/non-matching rows. Currently I can't think of a case where we can't push down non-deterministic joining keys. Maybe you can show an example?
[GitHub] spark issue #18656: [SPARK-21441]Incorrect Codegen in SortMergeJoinExec resu...
Github user DonnyZone commented on the issue: https://github.com/apache/spark/pull/18656

Hi @cloud-fan, @vanzin, could you help to take a look?
[GitHub] spark issue #18620: [SPARK-21401][ML][MLLIB] add poll function for BoundedPr...
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18620

Hi @MLnick, @srowen. My tests show that `pq.poll` is not significantly faster than `pq.toArray.sortBy`, but it is significantly faster than `pq.toArray.sorted`. Since not every `pq.toArray.sorted` (such as the one used in `topByKey`) can be replaced by `pq.toArray.sortBy`, replacing `pq.toArray.sorted` with `pq.poll` is beneficial. You can compare the performance of `pq.sorted`, `pq.sortBy`, and `pq.poll` using: https://github.com/apache/spark/pull/18624 The performance of `pq.toArray.sortBy` is about the same as `pq.poll`, roughly a 20% improvement over `pq.toArray.sorted`.
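The two drain strategies being compared can be illustrated with `java.util.PriorityQueue` (a stand-in here for Spark's `BoundedPriorityQueue`): `poll()` emits elements in heap order directly, whereas `toArray` copies the backing array, which is in no particular order, and so needs a separate sort. This sketch only demonstrates that both produce the same ordering, not the timing difference:

```scala
import java.util.PriorityQueue

val elems = Seq(5, 1, 4, 2, 3)

// poll-based drain: heap pops come out already sorted, no extra sort pass.
val pq1 = new PriorityQueue[Integer]()
elems.foreach(x => pq1.add(x))
val byPoll = Iterator.continually(pq1.poll()).takeWhile(_ != null).map(_.intValue).toList

// toArray-based drain: copy the unordered backing array, then sort it.
val pq2 = new PriorityQueue[Integer]()
elems.foreach(x => pq2.add(x))
val bySort = pq2.toArray.map(_.asInstanceOf[Integer].intValue).toList.sorted
```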
[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18654

**[Test build #79695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79695/testReport)** for PR 18654 at commit [`d118d68`](https://github.com/apache/spark/commit/d118d685374242599a12d6536675ba7aeae4bfb7).
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127888746

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.test.SharedSQLContext
+
+class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
+
+  test("empty file should be skipped while write to file") {
+    withTempPath { dir =>
--- End diff --

Much clearer :) No need to actually create source files.
[GitHub] spark issue #18468: [SPARK-20873][SQL] Creat CachedBatchColumnVector to abst...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18468

`ArrowColumnVector` is also a wrapper for an Arrow vector, and it doesn't introduce the vector-type stuff.
[GitHub] spark issue #18468: [SPARK-20873][SQL] Enhance ColumnVector to support compr...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18468

@cloud-fan Thank you for your comments. Based on [this discussion](https://github.com/apache/spark/pull/18468#discussion_r125395003), I introduced `VectorType`. I have just seen @ueshin's `ArrowColumnVector` implementation. I will update `CachedBatchColumnVector` based on your comments and @ueshin's implementation.
[GitHub] spark pull request #18487: [SPARK-21243][Core] Limit no. of map outputs in a...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18487#discussion_r127885748

--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -277,11 +290,13 @@ final class ShuffleBlockFetcherIterator(
       } else if (size < 0) {
         throw new BlockException(blockId, "Negative block size " + size)
       }
-      if (curRequestSize >= targetRequestSize) {
+      if (curRequestSize >= targetRequestSize ||
+          curBlocks.size >= maxBlocksInFlightPerAddress) {
--- End diff --

We may end up with a lot of adjacent fetch requests in the queue; shall we shuffle the request queue before fetching?
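The concern can be sketched in a few lines (illustrative names only, not the real `ShuffleBlockFetcherIterator` types): once blocks per request are capped, a single remote address can contribute several back-to-back requests, and shuffling the queue spreads the in-flight requests across addresses without changing the total work:

```scala
import scala.util.Random

// Hypothetical stand-in for a shuffle fetch request targeting one executor.
case class FetchRequest(address: String, blockIds: Seq[String])

// After splitting by a maxBlocksInFlightPerAddress-style cap, requests for
// the same executor sit next to each other in the queue.
val requests = Seq(
  FetchRequest("exec-1", Seq("b1", "b2")),
  FetchRequest("exec-1", Seq("b3", "b4")),
  FetchRequest("exec-1", Seq("b5", "b6")),
  FetchRequest("exec-2", Seq("b7"))
)

// Shuffling preserves the set of requests but randomizes which address
// each in-flight slot targets.
val randomized = Random.shuffle(requests)
```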
[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18654

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79694/
[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18654

Merged build finished. Test PASSed.
[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18654

**[Test build #79694 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79694/testReport)** for PR 18654 at commit [`f7d7c09`](https://github.com/apache/spark/commit/f7d7c091fbf11dde9e1dde0dae574d477406f5ed).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18649: [SPARK-21395][SQL] Spark SQL hive-thriftserver doesn't r...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18649

cc @jerryshao
[GitHub] spark issue #18468: [SPARK-20873][SQL] Enhance ColumnVector to support compr...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18468

I don't think this PR has a good abstraction of the problem. For the table cache, our goal is not to make the compressed data a `ColumnVector`, but to have an efficient way to convert the compressed data (a byte array) to a `ColumnVector`. I think the most efficient way is to not do the conversion at all, but to have a wrapper, i.e. a `class CachedBatchColumnVector(data: Array[Byte])` that implements the various `getXXX` methods by doing the decompression. Then we don't need to introduce the `VectorType` concept or change `ColumnVector`. @kiszk what do you think?
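A minimal sketch of the wrapper idea, using an assumed run-length-encoded byte layout as a stand-in for Spark's real cached-batch format: the accessor decodes on demand instead of materializing a fully decompressed vector up front.

```scala
import java.nio.ByteBuffer

// Hypothetical wrapper: `data` holds repeated (runLength: Int, value: Int)
// pairs. getInt scans the runs and decompresses lazily, rather than
// converting the whole batch to a ColumnVector first.
class CachedBatchColumnVector(data: Array[Byte]) {
  def getInt(rowId: Int): Int = {
    val buf = ByteBuffer.wrap(data)
    var remaining = rowId
    while (buf.hasRemaining) {
      val run = buf.getInt()
      val value = buf.getInt()
      if (remaining < run) return value
      remaining -= run
    }
    throw new IndexOutOfBoundsException(s"row $rowId")
  }
}

// Encode the column [7, 7, 7, 9] as runs (3 x 7) and (1 x 9).
val bytes = ByteBuffer.allocate(16).putInt(3).putInt(7).putInt(1).putInt(9).array()
val vec = new CachedBatchColumnVector(bytes)
```

The per-call scan is obviously not how a production accessor would work (it would cache the decoder position), but it shows the shape of the `getXXX`-decompresses-on-read design.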
[GitHub] spark issue #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame to avoid...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18634

@cloud-fan @jiangxb1987 Thanks for the help! I will refine it and post the result of a manual test later today :)
[GitHub] spark pull request #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame t...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18634#discussion_r127882623

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLWindowFunctionSuite.scala ---
@@ -356,6 +356,42 @@ class SQLWindowFunctionSuite extends QueryTest with SharedSQLContext {
     spark.catalog.dropTempView("nums")
   }
+  test("window function: mutiple window expressions specified by range in a single expression") {
+    val nums = sparkContext.parallelize(1 to 10).map(x => (x, x % 2)).toDF("x", "y")
+    nums.createOrReplaceTempView("nums")
--- End diff --

Also, this test case doesn't cover the case where CurrentRow is not in the window frame. We'd better add that scenario.
[GitHub] spark issue #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame to avoid...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18634

@jinxing64 I think this patch is straightforward. Can you do a manual test that OOMs before and works after this PR? We can put the test in the PR description so that other people can try it out.
[GitHub] spark pull request #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame t...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18634#discussion_r127882430

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLWindowFunctionSuite.scala ---
@@ -356,6 +356,42 @@ class SQLWindowFunctionSuite extends QueryTest with SharedSQLContext {
     spark.catalog.dropTempView("nums")
   }
+  test("window function: mutiple window expressions specified by range in a single expression") {
+    val nums = sparkContext.parallelize(1 to 10).map(x => (x, x % 2)).toDF("x", "y")
+    nums.createOrReplaceTempView("nums")
--- End diff --

BTW, this test is not very related to this PR; it just adds test coverage for the range window frame.
[GitHub] spark pull request #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame t...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18634#discussion_r127882358

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLWindowFunctionSuite.scala ---
@@ -356,6 +356,42 @@ class SQLWindowFunctionSuite extends QueryTest with SharedSQLContext {
     spark.catalog.dropTempView("nums")
   }
+  test("window function: mutiple window expressions specified by range in a single expression") {
+    val nums = sparkContext.parallelize(1 to 10).map(x => (x, x % 2)).toDF("x", "y")
+    nums.createOrReplaceTempView("nums")
--- End diff --

Wrap your test with `withTempView`, which can drop the view automatically.
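Helpers like `withTempView` follow the standard loan pattern: run the test body, then clean up even if the body throws. A Spark-free sketch of the shape (in Spark the real helper lives in the test utilities and drops the views via the catalog; here a recording buffer stands in for that):

```scala
import scala.collection.mutable

// Stand-in for the catalog: record which views were dropped.
val dropped = mutable.ArrayBuffer.empty[String]

// Loan pattern: execute the body, then drop the named views regardless of
// whether the body succeeded.
def withTempView(viewNames: String*)(body: => Unit): Unit = {
  try body finally viewNames.foreach(dropped += _)
}

withTempView("nums") {
  // test body would create and query the temp view here
}
```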
[GitHub] spark issue #18649: [SPARK-21395][SQL] Spark SQL hive-thriftserver doesn't r...
Github user debugger87 commented on the issue: https://github.com/apache/spark/pull/18649 @cloud-fan Any suggestions?
[GitHub] spark issue #18655: [SPARK-21440][SQL][PYSPARK] Refactor ArrowConverters and...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18655 Thank you for your comments. I agree that we should split this into smaller PRs. I'll push another commit to remove `ArrowColumnVector` from this as soon as possible.
[GitHub] spark issue #18468: [SPARK-20873][SQL] Enhance ColumnVector to support compr...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18468 ping @ueshin @cloud-fan
[GitHub] spark issue #18655: [SPARK-21440][SQL][PYSPARK] Refactor ArrowConverters and...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18655 yea let's put `ArrowColumnVector` and its tests in a new PR and merge that first. `ArrowWriter` will also be used for pandas UDF, see https://issues.apache.org/jira/browse/SPARK-21190 for more details, so it makes sense to move it to a separate file.
[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/18660 Also merged to branch-2.2
[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/18660 thanks @cloud-fan
[GitHub] spark issue #18667: Fix the simpleString used in error messages
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18667 Can one of the admins verify this patch?
[GitHub] spark pull request #18667: Fix the simpleString used in error messages
GitHub user fxbonnet opened a pull request: https://github.com/apache/spark/pull/18667 Fix the simpleString used in error messages ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/fxbonnet/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18667.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18667 commit e31555ba0b297054c504d3e2eaac20befb10738d Author: Francois-Xavier Bonnet Date: 2017-07-18T04:19:17Z Fix the simpleString used in error messages
[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Tim...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r127879502 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala --- @@ -792,6 +793,76 @@ class ArrowConvertersSuite extends SharedSQLContext with BeforeAndAfterAll { collectAndValidate(df, json, "binaryData.json") } + test("date type conversion") { +val json = + s""" + |{ + | "schema" : { + |"fields" : [ { + | "name" : "date", + | "type" : { + |"name" : "date", + |"unit" : "DAY" + | }, + | "nullable" : true, + | "children" : [ ], + | "typeLayout" : { + |"vectors" : [ { + | "type" : "VALIDITY", + | "typeBitWidth" : 1 + |}, { + | "type" : "DATA", + | "typeBitWidth" : 32 + |} ] + | } + |} ] + | }, + | "batches" : [ { + |"count" : 4, + |"columns" : [ { + | "name" : "date", + | "count" : 4, + | "VALIDITY" : [ 1, 1, 1, 1 ], + | "DATA" : [ -1, 0, 16533, 16930 ] + |} ] + | } ] + |} + """.stripMargin + +val sdf = new SimpleDateFormat("-MM-dd HH:mm:ss.SSS z", Locale.US) +val d1 = new Date(-1) // "1969-12-31 13:10:15.000 UTC" +val d2 = new Date(0) // "1970-01-01 13:10:15.000 UTC" +val d3 = new Date(sdf.parse("2015-04-08 13:10:15.000 UTC").getTime) +val d4 = new Date(sdf.parse("2016-05-09 12:01:01.000 UTC").getTime) + +// Date is created unaware of timezone, but DateTimeUtils force defaultTimeZone() + assert(DateTimeUtils.toJavaDate(DateTimeUtils.fromJavaDate(d2)).getTime == d2.getTime) --- End diff -- We handle `DateType` value as days from `1970-01-01` internally. When converting from/to `Date` to/from internal value, we assume the `Date` instance contains the timestamp of `00:00:00` time of the day in `TimeZone.getDefault()` timezone, which is the offset of the timezone. e.g. 
in JST (GMT+09:00): ``` scala> TimeZone.setDefault(TimeZone.getTimeZone("JST")) scala> Date.valueOf("1970-01-01").getTime() res6: Long = -32400000 ``` whereas in PST (GMT-08:00): ``` scala> TimeZone.setDefault(TimeZone.getTimeZone("PST")) scala> Date.valueOf("1970-01-01").getTime() res8: Long = 28800000 ``` We use `DateTimeUtils.defaultTimeZone()` to adjust the offset.
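The offset adjustment described above can be reproduced without Spark. This is a sketch of the idea behind `DateTimeUtils.fromJavaDate`, assuming the same days-since-epoch representation; `fromJavaDate` here is an illustrative stand-in, not the actual Spark implementation:

```scala
import java.sql.Date
import java.util.TimeZone

object DateDaysSketch {
  // Convert java.sql.Date to days since 1970-01-01 by adding back the
  // default-timezone offset that Date.getTime bakes in, then dividing
  // by milliseconds per day (sketch of DateTimeUtils.fromJavaDate).
  def fromJavaDate(d: Date): Int = {
    val millisLocal = d.getTime + TimeZone.getDefault.getOffset(d.getTime)
    Math.floorDiv(millisLocal, 86400000L).toInt
  }

  def main(args: Array[String]): Unit = {
    // Once the offset is removed, the day number no longer depends on
    // the JVM default timezone.
    for (tz <- Seq("JST", "PST")) {
      TimeZone.setDefault(TimeZone.getTimeZone(tz))
      println(s"$tz: ${fromJavaDate(Date.valueOf("2015-04-08"))}") // 16533 in both
    }
  }
}
```

The value 16533 matches the `DATA` array in the quoted test JSON for `2015-04-08`, which is why the conversion round-trips regardless of whether the JVM runs in JST or PST.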
[GitHub] spark pull request #18660: [SPARK-21445] Make IntWrapper and LongWrapper in ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18660
[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18660 thanks, merging to master! @brkyvz I think it's fine, this bug is very obvious.
[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/18660 I couldn't write an easy reproduction for the bug :(
[GitHub] spark pull request #18583: [SPARK-21332][SQL] Incorrect result type inferred...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18583
[GitHub] spark issue #18583: [SPARK-21332][SQL] Incorrect result type inferred for so...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18583 Thanks! Merging to master/2.2/2.1/2.0
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127876852 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala --- @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.test.SharedSQLContext + +class FileFormatWriterSuite extends QueryTest with SharedSQLContext { + + test("empty file should be skipped while write to file") { +withTempPath { dir => --- End diff -- Could we maybe just do as below? ```scala withTempPath { path => spark.range(100).repartition(10).where("id = 50").write.parquet(path) val partFiles = path.listFiles() .filter(f => f.isFile && !f.getName.startsWith(".") && !f.getName.startsWith("_")) assert(partFiles.length === 2) } ```
[GitHub] spark issue #18662: [SPARK-21444] Be more defensive when removing broadcasts...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/18662 Merged to master. Thanks for the quick reviews.
[GitHub] spark pull request #18662: [SPARK-21444] Be more defensive when removing bro...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18662
[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127875986 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer( nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e) }.copy(child = newChild) + case j: Join if j.condition.isDefined && !j.condition.get.deterministic => +j match { + // We can push down non-deterministic joining keys. --- End diff -- I meant joining keys. I am not sure if `a = c && rand(b) < 0` is a joining key?
[GitHub] spark issue #18666: [SPARK-21449][SQL][Hive]Close HiveClient's SessionState ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18666 Can one of the admins verify this patch?
[GitHub] spark pull request #18666: [SPARK-21449][SQL][Hive]Close HiveClient's Sessio...
GitHub user yaooqinn opened a pull request: https://github.com/apache/spark/pull/18666 [SPARK-21449][SQL][Hive]Close HiveClient's SessionState to delete residual dirs ## What changes were proposed in this pull request? When sparkSession.stop() is called, close the hive client too. ## How was this patch tested? manually You can merge this pull request into a Git repository by running: $ git pull https://github.com/yaooqinn/spark SPARK-21449 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18666.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18666 commit cac9fe7a627911079e55d5704fcf1b49228c5147 Author: Kent Yao Date: 2017-07-18T03:22:17Z Hive client's SessionState was not closed properly in HiveExternalCatalog
[GitHub] spark issue #18663: [SPARK-20079][yarn] Fix client AM not allocating executo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18663 Merged build finished. Test PASSed.
[GitHub] spark issue #18663: [SPARK-20079][yarn] Fix client AM not allocating executo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18663 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79692/ Test PASSed.
[GitHub] spark issue #18663: [SPARK-20079][yarn] Fix client AM not allocating executo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18663 **[Test build #79692 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79692/testReport)** for PR 18663 at commit [`1496b78`](https://github.com/apache/spark/commit/1496b78d2bcd2003b23307f767c57c0dc2818e16). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18305: [SPARK-20988][ML] Logistic regression uses aggreg...
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18305#discussion_r127874833 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala --- @@ -32,40 +34,45 @@ private[ml] trait DifferentiableRegularization[T] extends DiffFunction[T] { } /** - * A Breeze diff function for computing the L2 regularized loss and gradient of an array of + * A Breeze diff function for computing the L2 regularized loss and gradient of a vector of * coefficients. * * @param regParam The magnitude of the regularization. * @param shouldApply A function (Int => Boolean) indicating whether a given index should have *regularization applied to it. - * @param featuresStd Option indicating whether the regularization should be scaled by the standard - *deviation of the features. + * @param applyFeaturesStd Option for a function which maps coefficient index (column major) to the + * feature standard deviation. If `None`, no standardization is applied. 
*/ private[ml] class L2Regularization( -val regParam: Double, +override val regParam: Double, shouldApply: Int => Boolean, -featuresStd: Option[Array[Double]]) extends DifferentiableRegularization[Array[Double]] { +applyFeaturesStd: Option[Int => Double]) extends DifferentiableRegularization[Vector] { - override def calculate(coefficients: Array[Double]): (Double, Array[Double]) = { -var sum = 0.0 -val gradient = new Array[Double](coefficients.length) -coefficients.indices.filter(shouldApply).foreach { j => - val coef = coefficients(j) - featuresStd match { -case Some(stds) => - val std = stds(j) - if (std != 0.0) { -val temp = coef / (std * std) -sum += coef * temp -gradient(j) = regParam * temp - } else { -0.0 + override def calculate(coefficients: Vector): (Double, Vector) = { +coefficients match { + case dv: DenseVector => +var sum = 0.0 +val gradient = new Array[Double](dv.size) +dv.values.indices.filter(shouldApply).foreach { j => + val coef = coefficients(j) + applyFeaturesStd match { +case Some(getStd) => + val std = getStd(j) + if (std != 0.0) { +val temp = coef / (std * std) +sum += coef * temp +gradient(j) = regParam * temp + } else { +0.0 + } +case None => + sum += coef * coef + gradient(j) = coef * regParam --- End diff -- Trivial, to match `regParam * temp` above, how about using `regParam * coef`?
[GitHub] spark issue #18665: [SPARK-21446] [SQL] Fix setAutoCommit never executed
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18665 Can one of the admins verify this patch?
[GitHub] spark pull request #18665: [SPARK-21446] [SQL] Fix setAutoCommit never execu...
GitHub user DFFuture opened a pull request: https://github.com/apache/spark/pull/18665 [SPARK-21446] [SQL] Fix setAutoCommit never executed ## What changes were proposed in this pull request? JIRA Issue: https://issues.apache.org/jira/browse/SPARK-21446 options.asConnectionProperties can not have fetchsize, because fetchsize belongs to Spark-only options, and Spark-only options have been excluded in connection properties. So change properties of beforeFetch from options.asConnectionProperties.asScala.toMap to options.asProperties.asScala.toMap ## How was this patch tested? You can merge this pull request into a Git repository by running: $ git pull https://github.com/DFFuture/spark sparksql_pg Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18665.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18665 commit 9ba431a838a16a8371b3d3f6ef028158576f85d2 Author: DFFuture Date: 2017-07-18T00:36:06Z asConnectionProperties can not have fetchsize, change it to asProperties
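The distinction the PR relies on — Spark-only options such as `fetchsize` being stripped before properties are handed to the JDBC driver — can be sketched as follows. The option names and the filtering logic are an assumed simplification for illustration, not the actual `JDBCOptions` implementation:

```scala
object JdbcOptionsSketch {
  // Options consumed by Spark itself and therefore excluded from the
  // properties handed to the JDBC driver (assumed, simplified subset).
  val sparkOnlyOptions: Set[String] = Set("url", "dbtable", "fetchsize", "numpartitions")

  // Sketch of asConnectionProperties: drops Spark-only keys.
  def asConnectionProperties(opts: Map[String, String]): Map[String, String] =
    opts.filter { case (k, _) => !sparkOnlyOptions.contains(k.toLowerCase) }

  // Sketch of asProperties: keeps every option as-is.
  def asProperties(opts: Map[String, String]): Map[String, String] = opts

  def main(args: Array[String]): Unit = {
    val opts = Map("url" -> "jdbc:postgresql://host/db", "fetchsize" -> "1000", "user" -> "u")
    // fetchsize survives only in the unfiltered map, which is why code
    // reading it must use asProperties rather than asConnectionProperties.
    println(asConnectionProperties(opts).contains("fetchsize")) // false
    println(asProperties(opts).contains("fetchsize"))           // true
  }
}
```

Under this sketch, a `beforeFetch` hook that reads `fetchsize` from the filtered map never sees it, matching the bug the PR describes.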
[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127874260 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer( nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e) }.copy(child = newChild) + case j: Join if j.condition.isDefined && !j.condition.get.deterministic => +j match { + // We can push down non-deterministic joining keys. --- End diff -- The join type also matters. For example, are we able to push it to the left side for the right outer join?
[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18652#discussion_r127874213 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1912,6 +1913,26 @@ class Analyzer( nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e) }.copy(child = newChild) + case j: Join if j.condition.isDefined && !j.condition.get.deterministic => +j match { + // We can push down non-deterministic joining keys. --- End diff -- `a = c && rand(3) * b < 0` Are we able to push down the second one?
[GitHub] spark issue #18662: [SPARK-21444] Be more defensive when removing broadcasts...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18662 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79691/ Test PASSed.
[GitHub] spark issue #18662: [SPARK-21444] Be more defensive when removing broadcasts...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18662 Merged build finished. Test PASSed.
[GitHub] spark issue #18662: [SPARK-21444] Be more defensive when removing broadcasts...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18662 **[Test build #79691 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79691/testReport)** for PR 18662 at commit [`a5ebcac`](https://github.com/apache/spark/commit/a5ebcac4ceb14eb8342ce085965b370186b4aba9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18305: [SPARK-20988][ML] Logistic regression uses aggreg...
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/18305#discussion_r127873828 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -598,8 +598,23 @@ class LogisticRegression @Since("1.2.0") ( val regParamL2 = (1.0 - $(elasticNetParam)) * $(regParam) val bcFeaturesStd = instances.context.broadcast(featuresStd) -val costFun = new LogisticCostFun(instances, numClasses, $(fitIntercept), - $(standardization), bcFeaturesStd, regParamL2, multinomial = isMultinomial, +val getAggregatorFunc = new LogisticAggregator(bcFeaturesStd, numClasses, $(fitIntercept), + multinomial = isMultinomial)(_) +val getFeaturesStd = (j: Int) => if (j >= 0 && j < numCoefficientSets * numFeatures) { + featuresStd(j / numCoefficientSets) +} else { + 0.0 +} + +val regularization = if (regParamL2 != 0.0) { + val shouldApply = (idx: Int) => idx >= 0 && idx < numFeatures * numCoefficientSets --- End diff -- It seems that the `regularization` contains `intercept`, right? However, the comment in [LogisticRegression.scala: 1903L](https://github.com/apache/spark/pull/18305/files#diff-3734f1689cb8a80b07974eb93de0795dL1903) is: > // We do not apply regularization to the intercepts
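Under the flattened coefficient layout the diff suggests (all feature weights first, intercepts appended at the end when an intercept is fit), the `shouldApply` predicate in the diff does exclude the intercept slots. A small self-contained sketch — the layout is an assumption read off the diff, not verified against the full file:

```scala
object ShouldApplySketch {
  val numFeatures = 3
  val numCoefficientSets = 2
  val fitIntercept = true

  // Assumed flattened layout: numCoefficientSets * numFeatures feature
  // weights first, then numCoefficientSets intercepts when fitIntercept.
  val totalSize: Int =
    numCoefficientSets * numFeatures + (if (fitIntercept) numCoefficientSets else 0)

  // Same predicate as in the diff: regularize only feature-weight slots.
  val shouldApply: Int => Boolean =
    idx => idx >= 0 && idx < numFeatures * numCoefficientSets

  def main(args: Array[String]): Unit = {
    val regularized = (0 until totalSize).filter(shouldApply)
    // Feature slots 0..5 are regularized; intercept slots 6 and 7 are skipped.
    println(regularized.mkString(","))
  }
}
```

So under this assumed layout the predicate is consistent with the quoted comment "We do not apply regularization to the intercepts": indices at or beyond `numFeatures * numCoefficientSets` (the intercepts) fail the check.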
[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18654 **[Test build #79694 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79694/testReport)** for PR 18654 at commit [`f7d7c09`](https://github.com/apache/spark/commit/f7d7c091fbf11dde9e1dde0dae574d477406f5ed).
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127872988
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import java.io.{File, FilenameFilter}
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.test.SharedSQLContext
+
+class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
+
+  test("empty file should be skipped while write to file") {
+    withTempDir { dir =>
+      dir.delete()
+      spark.range(1).repartition(10).write.parquet(dir.toString)
+      val df = spark.read.parquet(dir.toString)
+      val allFiles = dir.listFiles(new FilenameFilter {
+        override def accept(dir: File, name: String): Boolean = {
+          !name.startsWith(".") && !name.startsWith("_")
+        }
+      })
+      assert(allFiles.length == 10)
+
+      withTempDir { dst_dir =>
+        dst_dir.delete()
+        df.where("id = 50").write.parquet(dst_dir.toString)
+        val allFiles = dst_dir.listFiles(new FilenameFilter {
+          override def accept(dir: File, name: String): Boolean = {
+            !name.startsWith(".") && !name.startsWith("_")
+          }
+        })
+        // First partition file and the data file
--- End diff --
Can't agree more. At first I tried to implement it like this, but `FileFormatWriter.write` can only see each task's own iterator.
[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18660 Merged build finished. Test PASSed.
[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18660 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79689/ Test PASSed.
[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18660 **[Test build #79689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79689/testReport)** for PR 18660 at commit [`d220290`](https://github.com/apache/spark/commit/d2202903518b3dfa0f4a719a0b9cb5431088ed66).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * ` public static class LongWrapper implements Serializable `
  * ` public static class IntWrapper implements Serializable `
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18664 **[Test build #79693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79693/testReport)** for PR 18664 at commit [`69e1e21`](https://github.com/apache/spark/commit/69e1e21bf4bebc7bea6bd9322e4300df71a90b18).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18664 Merged build finished. Test FAILed.
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18664 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79693/ Test FAILed.
[GitHub] spark pull request #18661: [SPARK-21409][SS] Follow up PR to allow different...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18661
[GitHub] spark issue #18661: [SPARK-21409][SS] Follow up PR to allow different types ...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/18661 Merging to master.
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127869754 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import java.io.{File, FilenameFilter} + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.test.SharedSQLContext + +class FileFormatWriterSuite extends QueryTest with SharedSQLContext { + + test("empty file should be skipped while write to file") { +withTempDir { dir => + dir.delete() + spark.range(1).repartition(10).write.parquet(dir.toString) + val df = spark.read.parquet(dir.toString) + val allFiles = dir.listFiles(new FilenameFilter { +override def accept(dir: File, name: String): Boolean = { + !name.startsWith(".") && !name.startsWith("_") +} + }) + assert(allFiles.length == 10) + + withTempDir { dst_dir => +dst_dir.delete() +df.where("id = 50").write.parquet(dst_dir.toString) --- End diff -- I mean.. for example, if we happen to have a single partition in the `df` in any event, I guess this test will become invalid ... 
[GitHub] spark issue #18661: [SPARK-21409][SS] Follow up PR to allow different types ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18661 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79690/ Test PASSed.
[GitHub] spark issue #18661: [SPARK-21409][SS] Follow up PR to allow different types ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18661 Merged build finished. Test PASSed.
[GitHub] spark issue #18661: [SPARK-21409][SS] Follow up PR to allow different types ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18661 **[Test build #79690 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79690/testReport)** for PR 18661 at commit [`351c207`](https://github.com/apache/spark/commit/351c20704e5ba2577bd18a5a9dd2f577141c453a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait StateStoreCustomMetric `
  * `case class StateStoreCustomSizeMetric(name: String, desc: String) extends StateStoreCustomMetric`
  * `case class StateStoreCustomTimingMetric(name: String, desc: String) extends StateStoreCustomMetric`
[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18654 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79687/ Test PASSed.
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127868378 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import java.io.{File, FilenameFilter} + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.test.SharedSQLContext + +class FileFormatWriterSuite extends QueryTest with SharedSQLContext { + + test("empty file should be skipped while write to file") { +withTempDir { dir => + dir.delete() + spark.range(1).repartition(10).write.parquet(dir.toString) + val df = spark.read.parquet(dir.toString) + val allFiles = dir.listFiles(new FilenameFilter { +override def accept(dir: File, name: String): Boolean = { + !name.startsWith(".") && !name.startsWith("_") +} + }) + assert(allFiles.length == 10) + + withTempDir { dst_dir => +dst_dir.delete() +df.where("id = 50").write.parquet(dst_dir.toString) --- End diff -- I was thinking just in order to make sure the (previous) number of files written out. 
[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18654 Merged build finished. Test PASSed.
[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18654 **[Test build #79687 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79687/testReport)** for PR 18654 at commit [`6153001`](https://github.com/apache/spark/commit/6153001bc42deee197030ad91fbb4f72bd1aa5d3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18631: [SPARK-21410][CORE] Create less partitions for Ra...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18631
[GitHub] spark issue #18631: [SPARK-21410][CORE] Create less partitions for RangePart...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18631 thanks, merging to master!
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127867549 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.datasources + +import java.io.{File, FilenameFilter} + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.test.SharedSQLContext + +class FileFormatWriterSuite extends QueryTest with SharedSQLContext { + + test("empty file should be skipped while write to file") { +withTempDir { dir => + dir.delete() + spark.range(1).repartition(10).write.parquet(dir.toString) + val df = spark.read.parquet(dir.toString) + val allFiles = dir.listFiles(new FilenameFilter { +override def accept(dir: File, name: String): Boolean = { + !name.startsWith(".") && !name.startsWith("_") +} + }) + assert(allFiles.length == 10) + + withTempDir { dst_dir => +dst_dir.delete() +df.where("id = 50").write.parquet(dst_dir.toString) +val allFiles = dst_dir.listFiles(new FilenameFilter { + override def accept(dir: File, name: String): Boolean = { +!name.startsWith(".") && !name.startsWith("_") + } +}) +// First partition file and the data file --- End diff -- Ideally we only need the first partition file if all other partitions are empty, but this is hard to do right now.
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127867486 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import java.io.{File, FilenameFilter} + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.test.SharedSQLContext + +class FileFormatWriterSuite extends QueryTest with SharedSQLContext { + + test("empty file should be skipped while write to file") { +withTempDir { dir => + dir.delete() + spark.range(1).repartition(10).write.parquet(dir.toString) + val df = spark.read.parquet(dir.toString) + val allFiles = dir.listFiles(new FilenameFilter { +override def accept(dir: File, name: String): Boolean = { + !name.startsWith(".") && !name.startsWith("_") +} + }) + assert(allFiles.length == 10) + + withTempDir { dst_dir => +dst_dir.delete() +df.where("id = 50").write.parquet(dst_dir.toString) --- End diff -- Why do we need the repartition?
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127867380 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import java.io.{File, FilenameFilter} + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.test.SharedSQLContext + +class FileFormatWriterSuite extends QueryTest with SharedSQLContext { + + test("empty file should be skipped while write to file") { +withTempDir { dir => + dir.delete() + spark.range(1).repartition(10).write.parquet(dir.toString) + val df = spark.read.parquet(dir.toString) + val allFiles = dir.listFiles(new FilenameFilter { --- End diff -- +1 for the shorter one
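The concrete "shorter one" being endorsed is not quoted in this thread, so the following is only an assumed sketch of what a more compact data-file filter could look like in Scala: list a directory and keep only files whose names start with neither `.` nor `_`, i.e. skip hidden files and Spark metadata files such as `_SUCCESS`.

```scala
import java.io.File

// Hypothetical compact replacement for the anonymous FilenameFilter in the
// test: keep only data files, dropping hidden (".") and metadata ("_") files.
def dataFiles(dir: File): Array[File] =
  dir.listFiles.filter(f => !f.getName.startsWith(".") && !f.getName.startsWith("_"))
```

Functionally this matches the test's `accept` predicate while avoiding the anonymous-class boilerplate.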
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127867341 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import java.io.{File, FilenameFilter} + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.test.SharedSQLContext + +class FileFormatWriterSuite extends QueryTest with SharedSQLContext { + + test("empty file should be skipped while write to file") { +withTempDir { dir => --- End diff -- +1
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127867290
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ---
@@ -236,7 +236,10 @@ object FileFormatWriter extends Logging {
     committer.setupTask(taskAttemptContext)

     val writeTask =
-      if (description.partitionColumns.isEmpty && description.bucketIdExpression.isEmpty) {
+      if (sparkPartitionId != 0 && !iterator.hasNext) {
--- End diff --
cc @hvanhovell
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127867254
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ---
@@ -236,7 +236,10 @@ object FileFormatWriter extends Logging {
     committer.setupTask(taskAttemptContext)

     val writeTask =
-      if (description.partitionColumns.isEmpty && description.bucketIdExpression.isEmpty) {
+      if (sparkPartitionId != 0 && !iterator.hasNext) {
--- End diff --
This is a little hacky, but I think it is the simplest fix.
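The condition in the hunk reads as a small pure predicate. A minimal sketch under assumed names (this is not the real `FileFormatWriter` API): task 0 always writes its file, so even a completely empty result still produces at least one output file and the written dataset stays readable, while every other task writes only when it actually has rows.

```scala
// Sketch of the skip rule above: skip the write task entirely when this is
// not the first partition and the task's iterator has no rows.
def shouldSkipWrite(sparkPartitionId: Int, rows: Iterator[Any]): Boolean =
  sparkPartitionId != 0 && !rows.hasNext
```

Note that `hasNext` only peeks; it does not consume rows, so a non-skipped task can still hand its full iterator to the write path.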
[GitHub] spark pull request #18632: [SPARK-21412][SQL] Reset BufferHolder while initi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18632#discussion_r127866899 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java --- @@ -51,6 +51,7 @@ public UnsafeRowWriter(BufferHolder holder, int numFields) { this.nullBitsSize = UnsafeRow.calculateBitSetWidthInBytes(numFields); this.fixedSize = nullBitsSize + 8 * numFields; this.startingOffset = holder.cursor; +holder.reset(); --- End diff -- I'm not very sure about this, but what if this writer is for an inner struct? Then the buffer holder is shared between many writers, and we should only reset it once.
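The concern above can be illustrated with a toy model: when a nested struct gets its own writer, that writer shares the parent's buffer holder, and each writer records where it starts via the holder's cursor. Calling `reset()` inside every writer constructor would rewind the cursor and clobber what the outer writer already wrote. This sketch uses hypothetical names, not the actual BufferHolder/UnsafeRowWriter API:

```scala
// Toy model of a grow-only buffer shared between an outer row writer and a
// nested struct writer. Names are illustrative only.
class SharedBuffer {
  var cursor: Int = 0
  def reset(): Unit = { cursor = 0 }
  def write(numBytes: Int): Unit = { cursor += numBytes }
}

class StructWriter(holder: SharedBuffer) {
  // If the constructor unconditionally called holder.reset() here, a nested
  // writer created mid-row would rewind the cursor to 0 and the outer
  // writer's data would be overwritten.
  val startingOffset: Int = holder.cursor
}
```

With a shared holder, only the top-level writer should reset; a nested writer must inherit the current cursor (e.g. `startingOffset == 16` after the outer writer has emitted 16 bytes).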
[GitHub] spark issue #18633: [SPARK-21411][YARN] Lazily create FS within kerberized U...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18633 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79684/ Test PASSed.
[GitHub] spark issue #18633: [SPARK-21411][YARN] Lazily create FS within kerberized U...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18633 Merged build finished. Test PASSed.
[GitHub] spark issue #18632: [SPARK-21412][SQL] Reset BufferHolder while initialize a...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18632 OK to test
[GitHub] spark issue #18633: [SPARK-21411][YARN] Lazily create FS within kerberized U...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18633 **[Test build #79684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79684/testReport)** for PR 18633 at commit [`95988c1`](https://github.com/apache/spark/commit/95988c112905018d20c6d78a2ab688164735ede6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r127866465 --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaUDFSuite.java --- @@ -121,4 +122,29 @@ public void udf6Test() { Row result = spark.sql("SELECT returnOne()").head(); Assert.assertEquals(1, result.getInt(0)); } + + public static class randUDFTest implements UDF1 { --- End diff -- `RandUDFTest`?
[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r127866406 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -103,4 +110,19 @@ case class UserDefinedFunction protected[sql] ( udf } } + + /** + * Updates UserDefinedFunction to non-deterministic. + * + * @since 2.3.0 + */ + def nonDeterministic(): UserDefinedFunction = { --- End diff -- not a big deal, let's keep it.
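The `nonDeterministic()` method in the diff above follows a copy-based builder style: marking a UDF as non-deterministic returns an updated value rather than mutating the original, so a deterministic UDF reference stays unchanged. A minimal sketch of that pattern, using a hypothetical stand-in rather than the real `UserDefinedFunction` class:

```scala
// Sketch of the builder-style API from the diff: nonDeterministic() returns
// an updated copy. MyUDF is a hypothetical stand-in, not the Spark class.
case class MyUDF(name: String, deterministic: Boolean = true) {
  // Returns a new instance flagged non-deterministic; `this` is untouched.
  def nonDeterministic(): MyUDF = copy(deterministic = false)
}
```

Because the original is left intact, the same UDF can be registered in both deterministic and non-deterministic forms without either affecting the other.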
[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r127866355 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLContextSuite.scala --- @@ -69,7 +69,7 @@ class SQLContextSuite extends SparkFunSuite with SharedSparkContext { // UDF should not be shared def myadd(a: Int, b: Int): Int = a + b -session1.udf.register[Int, Int, Int]("myadd", myadd) +session1.udf.register[Int, Int, Int]("myadd", myadd _) --- End diff -- this sounds like a source code compatibility issue, can we look into it?
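The change from `myadd` to `myadd _` in the diff above is about eta-expansion: the trailing underscore explicitly converts the method into a function value. When the compiler can no longer infer the expansion on its own (for example because the target method has several applicable overloads), callers that used to pass a bare method name must add the underscore, which is why this reads as a source compatibility concern. A small self-contained illustration with hypothetical names (not the actual `udf.register` signatures):

```scala
// Illustrative sketch of explicit eta-expansion. Registry and its overloads
// are hypothetical, standing in for an overloaded register method.
object Registry {
  def register(name: String, f: (Int, Int) => Int): String = name + "/2"
  def register(name: String, f: Int => Int): String = name + "/1"
}

object Udfs {
  def myadd(a: Int, b: Int): Int = a + b

  // `myadd _` eta-expands the method to an (Int, Int) => Int function value,
  // which lets overload resolution pick the two-argument register variant.
  val registered: String = Registry.register("myadd", Udfs.myadd _)
}
```

Pre-existing callers written as `register("myadd", myadd)` relied on the compiler performing this expansion implicitly; changing the API so the underscore becomes mandatory breaks them at compile time, which is the compatibility issue being raised.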
[GitHub] spark issue #18632: [SPARK-21412][SQL] Reset BufferHolder while initialize a...
Github user gczsjdy commented on the issue: https://github.com/apache/spark/pull/18632 @cloud-fan @viirya @gatorsmile Could you please help me review this?
[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/18654#discussion_r127865091 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import java.io.{File, FilenameFilter} + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.test.SharedSQLContext + +class FileFormatWriterSuite extends QueryTest with SharedSQLContext { + + test("empty file should be skipped while write to file") { +withTempDir { dir => + dir.delete() + spark.range(1).repartition(10).write.parquet(dir.toString) + val df = spark.read.parquet(dir.toString) + val allFiles = dir.listFiles(new FilenameFilter { +override def accept(dir: File, name: String): Boolean = { + !name.startsWith(".") && !name.startsWith("_") +} + }) + assert(allFiles.length == 10) --- End diff -- OK, I'll remove this assert and leave a note.
[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/18654 Yep, an empty result dir needs this meta; otherwise reading it will throw this exception: ``` org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.; at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$9.apply(DataSource.scala:188) at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$9.apply(DataSource.scala:188) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:187) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:381) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:190) at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:571) at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:555) ... 48 elided ```