[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16867 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/16867 [SPARK-16929] Improve performance when check speculatable tasks. ## What changes were proposed in this pull request? When checking speculatable tasks in `TaskSetManager`, the current code scans all task infos and sorts the durations of successful tasks, which takes O(N log N) time. Since `TaskSchedulerImpl`'s synchronized lock is held during the checking process, this can cause performance degradation when checking a large-scale task set, say hundreds of thousands of tasks. This change uses a `TreeSet` to cache the successful task infos and compares the median duration with running tasks, avoiding a scan of all task infos. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jinxing64/spark SPARK-16929 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16867.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16867 commit 1169d118662a9bfdabe88238352fe834a28aee14 Author: jinxing Date: 2017-02-07T02:35:10Z [SPARK-16929] Improve performance when check speculatable tasks.
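The key observation in the PR above is that the speculation check only needs the median of the successful tasks' durations, so re-sorting everything on each check is wasted work. The PR itself caches task infos in a `TreeSet`; as an illustration of the same idea using only the JDK, here is a hypothetical two-heap structure (the class name and API are invented for this sketch, not Spark's actual `TaskSetManager` code) that maintains a running median with O(log N) insertion and O(1) lookup:

```java
import java.util.Collections;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch: keep successful-task durations split into two heaps so
 * the median needed for a speculation check is available in O(1), with
 * O(log N) per insertion, instead of sorting all durations on every check.
 */
public class RunningMedian {
    // lower half of the durations (max-heap) and upper half (min-heap)
    private final PriorityQueue<Long> lower = new PriorityQueue<>(Collections.reverseOrder());
    private final PriorityQueue<Long> upper = new PriorityQueue<>();

    public void insert(long duration) {
        if (lower.isEmpty() || duration <= lower.peek()) {
            lower.add(duration);
        } else {
            upper.add(duration);
        }
        // rebalance so the halves differ in size by at most one
        if (lower.size() > upper.size() + 1) {
            upper.add(lower.poll());
        } else if (upper.size() > lower.size()) {
            lower.add(upper.poll());
        }
    }

    /** Median of all inserted durations; lower half holds the extra element, if any. */
    public long median() {
        return lower.peek();
    }

    public static void main(String[] args) {
        RunningMedian m = new RunningMedian();
        for (long d : new long[]{30, 10, 50, 20, 40}) {
            m.insert(d);
        }
        System.out.println(m.median()); // prints 30
    }
}
```

A scheduler holding its lock would then insert each task's duration once on success and read the median cheaply on every check, e.g. comparing running tasks against `median * speculationMultiplier` (the multiplier name here is illustrative).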
[GitHub] spark issue #16809: [SPARK-19463][SQL]refresh cache after the InsertIntoHado...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/16809 I just found that table refresh is tied to table insertion in `DataFrameWriter.saveAsTable` with overwrite mode and in `InsertIntoHiveTable`. Does `InsertIntoHadoopFsRelation` need to refresh the table?
[GitHub] spark issue #16862: [SPARK-19520][streaming] Do not encrypt data written to ...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/16862 @liancheng FYI
[GitHub] spark issue #16859: [SPARK-17714][Core][test-maven][test-hadoop2.6]Avoid usi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16859 **[Test build #3570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3570/testReport)** for PR 16859 at commit [`1c88474`](https://github.com/apache/spark/commit/1c8847494c29d4b51182ecfeebb5cc85e000e7a1).
* This patch **fails from timeout after a configured wait of `250m`**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `public class TransportChannelHandler extends ChannelInboundHandlerAdapter`
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16776 **[Test build #72634 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72634/testReport)** for PR 16776 at commit [`4db82b4`](https://github.com/apache/spark/commit/4db82b45ce061a131ece96f1ca554bc9e5423d46).
[GitHub] spark issue #16866: [SPARK-19529] TransportClientFactory.createClient() shou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16866 **[Test build #72633 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72633/testReport)** for PR 16866 at commit [`c1c4553`](https://github.com/apache/spark/commit/c1c4553e32826453ed39eaaefd1cd92ef0e36382).
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...
Github user watermen commented on the issue: https://github.com/apache/spark/pull/16677 @viirya We'd better not modify the API; `TaskMetrics` already has `resultSize`, so we can add `resultNum` alongside it.
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16715 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72630/ Test PASSed.
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16715 Merged build finished. Test PASSed.
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16715 **[Test build #72630 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72630/testReport)** for PR 16715 at commit [`1b70b91`](https://github.com/apache/spark/commit/1b70b919edea26321f21220f11d520d4f4f98ede).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16866: [SPARK-19529] TransportClientFactory.createClient...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/16866 [SPARK-19529] TransportClientFactory.createClient() shouldn't call awaitUninterruptibly() ## What changes were proposed in this pull request? This patch replaces a single `awaitUninterruptibly()` call with a plain `await()` call in Spark's common network layer in order to fix a bug which may cause tasks to be uncancellable. In Spark's Netty RPC layer, `TransportClientFactory.createClient()` calls `awaitUninterruptibly()` on a Netty future while waiting for a connection to be established. This creates a problem when a Spark task is interrupted while blocking in this call (which can happen in the event of a slow connection that will eventually time out). This has a bad impact on task cancellation when `interruptOnCancel = true`. As an example of the impact of this problem, I experienced significant numbers of uncancellable "zombie tasks" on a production cluster where several tasks were blocked trying to connect to a dead shuffle server and then continued running as zombies after I cancelled the associated Spark stage.
The zombie tasks ran for several minutes with the following stack:

```
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:460)
io.netty.util.concurrent.DefaultPromise.await0(DefaultPromise.java:607)
io.netty.util.concurrent.DefaultPromise.awaitUninterruptibly(DefaultPromise.java:301)
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:224)
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179) => holding Monitor(java.lang.Object@1849476028)
org.apache.spark.network.shuffle.ExternalShuffleClient$1.createAndStart(ExternalShuffleClient.java:105)
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120)
org.apache.spark.network.shuffle.ExternalShuffleClient.fetchBlocks(ExternalShuffleClient.java:114)
org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:169)
org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchUpToMaxBytes(ShuffleBlockFetcherIterator.scala:350)
org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:286)
org.apache.spark.storage.ShuffleBlockFetcherIterator.<init>(ShuffleBlockFetcherIterator.scala:120)
org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:45)
org.apache.spark.sql.execution.ShuffledRowRDD.compute(ShuffledRowRDD.scala:169)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
[...]
```

As far as I can tell, `awaitUninterruptibly()` might have been used in order to avoid having to declare that methods throw `InterruptedException` (this code is written in Java, hence the need to use checked exceptions). This patch simply replaces this with a regular, interruptible `await()` call.
This required several interface changes to declare a new checked exception (these are internal interfaces, though, and this change doesn't significantly impact binary compatibility). An alternative approach would be to wrap `InterruptedException` in `IOException` in order to avoid having to change interfaces. The problem with this approach is that the `network-shuffle` project's `RetryingBlockFetcher` code treats `IOException`s as transient failures when deciding whether to retry fetches, so throwing a wrapped `IOException` might cause an interrupted shuffle fetch to be retried, further prolonging the lifetime of a cancelled zombie task. ## How was this patch tested? Manually. You can merge this pull request into a Git repository by running: $ git pull https://github.com/JoshRosen/spark SPARK-19529 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16866.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16866 commit c1c4553e32826453ed39eaaefd1cd92ef0e36382 Author: Josh Rosen Date: 2017-02-09T07:25:29Z Use await() instead of awaitUninterruptibly()
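The behavioral difference the patch targets can be reproduced with the plain JDK. This sketch (the class and helper method are invented for illustration; a `CountDownLatch` stands in for the Netty promise) mimics the semantics of Netty's `awaitUninterruptibly()`: the interrupt is remembered but the thread keeps blocking, so a "cancelled" task stays stuck until some external event unblocks it.

```java
import java.util.concurrent.CountDownLatch;

/**
 * Minimal sketch of why an uninterruptible wait produces zombie tasks:
 * interrupting the waiting thread does not wake it up.
 */
public class AwaitDemo {
    /** Mimics awaitUninterruptibly(): swallow interrupts, restore the flag at the end. */
    static void awaitUninterruptibly(CountDownLatch latch) {
        boolean interrupted = false;
        for (;;) {
            try {
                latch.await();      // an interruptible wait would throw here and return control
                break;
            } catch (InterruptedException e) {
                interrupted = true; // remember the interrupt and keep waiting
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch never = new CountDownLatch(1);
        Thread zombie = new Thread(() -> awaitUninterruptibly(never));
        zombie.start();
        zombie.interrupt();   // "cancel" the task
        zombie.join(500);     // give it a chance to exit
        System.out.println("still blocked: " + zombie.isAlive()); // prints "still blocked: true"
        never.countDown();    // only an external event (e.g. the connection completing) unblocks it
        zombie.join();
    }
}
```

Switching the loop body to a single interruptible `latch.await()` that propagates `InterruptedException`, as the patch does for Netty's `await()`, lets the interrupt terminate the wait immediately.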
[GitHub] spark pull request #16865: [SPARK-19530][SQL] Use guava weigher for code cac...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16865#discussion_r100245982

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -1004,7 +1016,8 @@ object CodeGenerator extends Logging {
    * weak keys/values and thus does not respond to memory pressure.
    */
   private val cache = CacheBuilder.newBuilder()
-    .maximumSize(100)
+    .maximumWeight(10 * 1024 * 1024)
--- End diff --

Not sure if this is a proper number.
[GitHub] spark issue #16865: [SPARK-19530][SQL] Use guava weigher for code cache evic...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16865 cc @davies
[GitHub] spark issue #16865: [SPARK-19530][SQL] Use guava weigher for code cache evic...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16865 **[Test build #72632 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72632/testReport)** for PR 16865 at commit [`e6e2a8d`](https://github.com/apache/spark/commit/e6e2a8dd95512047346b939fc305dfaaef67f592).
[GitHub] spark pull request #16865: [SPARK-19530][SQL] Use guava weigher for code cac...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/16865 [SPARK-19530][SQL] Use guava weigher for code cache eviction ## What changes were proposed in this pull request? We use a guava cache to cache compiled code for codegen. Currently we use the number of entries in the cache (with a maximum of 100) to decide when to evict older entries. However, the entry count doesn't track the actual memory usage of the cache entries. As we now rely heavily on codegen and the generated code can be large, we shouldn't cap the cache by entry count. This patch switches to guava's `Weigher`, using the size of the bytecode as the weight of an entry. ## How was this patch tested? Jenkins tests. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 use-weight-for-code-cache Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16865.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16865 commit e6e2a8dd95512047346b939fc305dfaaef67f592 Author: Liang-Chi Hsieh Date: 2017-02-09T07:18:23Z Use weight for code cache.
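To illustrate what weight-based eviction buys over a flat entry count, here is a JDK-only sketch (the class and its API are invented for this example; Guava's `CacheBuilder.maximumWeight(...).weigher(...)` implements the same policy with proper concurrency and statistics). Each entry costs its bytecode size rather than 1, and least-recently-used entries are dropped until the total weight fits the budget:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical weight-bounded LRU cache for generated bytecode. */
public class WeightedCodeCache {
    private final long maxWeight;
    private long totalWeight = 0;
    // access-order LinkedHashMap iterates least-recently-used entries first
    private final LinkedHashMap<String, byte[]> cache =
        new LinkedHashMap<>(16, 0.75f, true);

    public WeightedCodeCache(long maxWeight) {
        this.maxWeight = maxWeight;
    }

    public void put(String key, byte[] bytecode) {
        byte[] old = cache.put(key, bytecode);
        if (old != null) {
            totalWeight -= old.length;
        }
        totalWeight += bytecode.length;
        // evict least-recently-used entries until we're back under budget
        Iterator<Map.Entry<String, byte[]>> it = cache.entrySet().iterator();
        while (totalWeight > maxWeight && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next();
            if (eldest.getValue() == bytecode) continue; // never evict the fresh entry
            totalWeight -= eldest.getValue().length;
            it.remove();
        }
    }

    public boolean contains(String key) {
        return cache.containsKey(key);
    }

    public static void main(String[] args) {
        WeightedCodeCache c = new WeightedCodeCache(100);
        c.put("small", new byte[40]);
        c.put("big", new byte[70]);   // 40 + 70 > 100, so "small" is evicted
        System.out.println(c.contains("small") + " " + c.contains("big")); // prints "false true"
    }
}
```

With a count-based cap, two entries would both fit regardless of size; weighing by bytecode size is what lets the cache respond to the actual memory footprint of generated code.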
[GitHub] spark issue #16857: [SPARK-19517][SS] KafkaSource fails to initialize partit...
Github user vitillo commented on the issue: https://github.com/apache/spark/pull/16857 @zsxwing Since I can't access the build results, could you please tell me why the patch fails to build?
[GitHub] spark issue #16785: [SPARK-19443][SQL] The function to generate constraints ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16785 Merged build finished. Test PASSed.
[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16630 Could somebody help review this PR? I think this will make gathering the estimation results in Scala much easier. This will also be helpful in constructing the tests. For example, the GLM tests with weights can be simplified a lot if we have all results in arrays and SEs etc are aligned with coefficients (current GLM tests with weight force no intercept to avoid this nuisance). @sethah @imatiach-msft @felixcheung
[GitHub] spark issue #16785: [SPARK-19443][SQL] The function to generate constraints ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16785 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72625/ Test PASSed.
[GitHub] spark issue #16785: [SPARK-19443][SQL][WIP] The function to generate constra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16785 **[Test build #72625 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72625/testReport)** for PR 16785 at commit [`8c98a5c`](https://github.com/apache/spark/commit/8c98a5c3ab1477408988c8cb682733e65dd554fc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON parsing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16750 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72626/ Test PASSed.
[GitHub] spark issue #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON parsing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16750 Merged build finished. Test PASSed.
[GitHub] spark issue #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON parsing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16750 **[Test build #72626 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72626/testReport)** for PR 16750 at commit [`ffc4912`](https://github.com/apache/spark/commit/ffc4912e17cc900fc9d7ceefd0f66461109728e9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/16787#discussion_r100241493

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -776,20 +778,21 @@ private[hive] class HiveClientImpl(
       client.dropDatabase(db, true, false, true)
     }
   }
+}
+
+private[hive] object HiveClientImpl {
+  private lazy val shimForHiveExecution = IsolatedClientLoader.hiveVersion(
--- End diff --

let me remove it, thanks!
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions be...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72631/testReport)** for PR 16787 at commit [`99d5bb2`](https://github.com/apache/spark/commit/99d5bb20a3f98220e8370c94b3620e9b2c6c61f2).
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions be...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/16787 thanks! @gatorsmile
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions be...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72627/ Test PASSed.
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions be...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Merged build finished. Test PASSed.
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions be...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72627/testReport)** for PR 16787 at commit [`b20d14f`](https://github.com/apache/spark/commit/b20d14fb6e70aaf6c4e09c644dd8ec6b8b5569dd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 OK. I'll try it immediately. Thank U very much!
[GitHub] spark pull request #16674: [SPARK-19331][SQL][TESTS] Improve the test covera...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16674#discussion_r100238713

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSQLViewSuite.scala ---
@@ -0,0 +1,190 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import org.apache.spark.sql.{AnalysisException, Row, SaveMode, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
+import org.apache.spark.sql.execution.SQLViewSuite
+import org.apache.spark.sql.hive.test.{TestHive, TestHiveSingleton}
+import org.apache.spark.sql.types.StructType
+
+/**
+ * A test suite for Hive view related functionality.
+ */
+class HiveSQLViewSuite extends SQLViewSuite with TestHiveSingleton {
+  protected override val spark: SparkSession = TestHive.sparkSession
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    // Create a simple table with two columns: id and id1
+    spark.range(1, 10).selectExpr("id", "id id1").write.format("json").saveAsTable("jt")
+  }
+
+  override def afterAll(): Unit = {
+    try {
+      spark.sql(s"DROP TABLE IF EXISTS jt")
+    } finally {
+      super.afterAll()
+    }
+  }
+
+  import testImplicits._
+
+  test("create a permanent/temp view using a hive, built-in, and permanent user function") {
+    val permanentFuncName = "myUpper"
+    val permanentFuncClass =
+      classOf[org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper].getCanonicalName
+    val builtInFuncNameInLowerCase = "abs"
+    val builtInFuncNameInMixedCase = "aBs"
+    val hiveFuncName = "histogram_numeric"
+
+    withUserDefinedFunction(permanentFuncName -> false) {
+      sql(s"CREATE FUNCTION $permanentFuncName AS '$permanentFuncClass'")
+      withTable("tab1") {
+        (1 to 10).map(i => (s"$i", i)).toDF("str", "id").write.saveAsTable("tab1")
+        Seq("VIEW", "TEMPORARY VIEW").foreach { viewMode =>
+          withView("view1") {
+            sql(
+              s"""
+                |CREATE $viewMode view1
+                |AS SELECT
+                |$permanentFuncName(str),
+                |$builtInFuncNameInLowerCase(id),
+                |$builtInFuncNameInMixedCase(id) as aBs,
+                |$hiveFuncName(id, 5) over()
+                |FROM tab1
+              """.stripMargin)
+            checkAnswer(sql("select count(*) FROM view1"), Row(10))
+          }
+        }
+      }
+    }
+  }
+
+  test("create a permanent/temp view using a temporary function") {
+    val tempFunctionName = "temp"
+    val functionClass =
+      classOf[org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper].getCanonicalName
+    withUserDefinedFunction(tempFunctionName -> true) {
+      sql(s"CREATE TEMPORARY FUNCTION $tempFunctionName AS '$functionClass'")
+      withView("view1", "tempView1") {
+        withTable("tab1") {
+          (1 to 10).map(i => s"$i").toDF("id").write.saveAsTable("tab1")
+
+          // temporary view
+          sql(s"CREATE TEMPORARY VIEW tempView1 AS SELECT $tempFunctionName(id) from tab1")
+          checkAnswer(sql("select count(*) FROM tempView1"), Row(10))
+
+          // permanent view
+          val e = intercept[AnalysisException] {
+            sql(s"CREATE VIEW view1 AS SELECT $tempFunctionName(id) from tab1")
+          }.getMessage
+          assert(e.contains("Not allowed to create a permanent view `view1` by referencing " +
+            s"a temporary function `$tempFunctionName`"))
+        }
+      }
+    }
+  }
+
+  test("create hive view for json table") {
+    // json table is not hive-compatible, make sure the new flag fix it.
+    withView("testView") {
[GitHub] spark issue #16854: [SPARK-15463][SQL] Add an API to load DataFrame from Dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16854 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72623/ Test PASSed.
[GitHub] spark issue #16854: [SPARK-15463][SQL] Add an API to load DataFrame from Dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16854 Merged build finished. Test PASSed.
[GitHub] spark issue #16809: [SPARK-19463][SQL]refresh cache after the InsertIntoHado...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16809 Where do we refresh the table for table insertion? Will we refresh twice (table and path)?
[GitHub] spark issue #16854: [SPARK-15463][SQL] Add an API to load DataFrame from Dat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16854 **[Test build #72623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72623/testReport)** for PR 16854 at commit [`a7e8c2b`](https://github.com/apache/spark/commit/a7e8c2bfaf98c27885907caa21cce7e93d4afd1b). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class UnivocityParser(`
[GitHub] spark issue #16672: [SPARK-19329][SQL]insert data to a not exist location da...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16672 ping @gatorsmile
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16638 You might be able to make it work by force-pushing the new changes with `git push -f origin NEW_BRANCH:REMOTE_BRANCH`
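The `NEW_BRANCH:REMOTE_BRANCH` refspec above can be sketched with local repositories so it runs anywhere; the repository, branch, and user names here are hypothetical stand-ins for the contributor's fork and the branch the PR tracks, not anything from the thread.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Bare repo standing in for the contributor's fork on GitHub (hypothetical).
git init -q --bare -b master fork.git

# A local clone with one commit on a freshly created branch.
git clone -q fork.git work
cd work
git -c user.name=dev -c user.email=dev@example.com commit --allow-empty -qm "reworked change"
git checkout -q -b new-branch

# Force-push the local branch onto the remote branch name the PR tracks,
# using the NEW_BRANCH:REMOTE_BRANCH refspec form from the comment above.
git push -q -f origin new-branch:pr-branch

# The fork now has refs/heads/pr-branch pointing at the new commit.
git ls-remote origin pr-branch | cut -f2
```

Because `-f` overwrites whatever the remote branch pointed at, the PR picks up the rewritten history without opening a new pull request.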
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16715 **[Test build #72630 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72630/testReport)** for PR 16715 at commit [`1b70b91`](https://github.com/apache/spark/commit/1b70b919edea26321f21220f11d520d4f4f98ede).
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16638 No worries, open/submit a new PR. : )
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions be...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16787 It's late on the east coast. Will review it tomorrow. : )
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 Oh, I see, I missed a step: `git remote add upstream ...`. But now, I have deleted my repository in my profile, so this PR can't know which repository it should be associated with. Do you have a way to help me recover from this problem?
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16715 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72628/ Test FAILed.
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16715 Merged build finished. Test FAILed.
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16715 **[Test build #72628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72628/testReport)** for PR 16715 at commit [`b45ec0a`](https://github.com/apache/spark/commit/b45ec0ab118545383526ffa80fa873a4ccc33307). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16638 You do not need to do step 1 every time. You might have missed the following two steps when you want to resolve your conflicts:
> git fetch upstream
> git merge upstream/master
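The fetch/merge sync above can be simulated end-to-end with local repositories; the `upstream` remote, directory names, and user identity here are hypothetical stand-ins for apache/spark and a contributor's clone.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Bare repo standing in for apache/spark (the "upstream" remote, hypothetical).
git init -q --bare -b master upstream.git

# Seed upstream with an initial commit via a throwaway clone.
git clone -q upstream.git seed
git -C seed -c user.name=dev -c user.email=dev@example.com commit --allow-empty -qm "base"
git -C seed push -q origin HEAD:master

# The contributor's clone, made before upstream moved on.
git clone -q upstream.git work
git -C work remote add upstream "$tmp/upstream.git"

# Upstream gains another commit after the clone, so "work" is now behind.
git -C seed -c user.name=dev -c user.email=dev@example.com commit --allow-empty -qm "upstream change"
git -C seed push -q origin HEAD:master

# The two sync steps from the comment above bring it in (a fast-forward merge).
git -C work fetch -q upstream
git -C work merge -q upstream/master
echo "commits after sync: $(git -C work rev-list --count HEAD)"
```

After the fetch/merge pair the local branch is even with upstream, so resolving conflicts against it (and then pushing to the fork) becomes possible.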
[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16664#discussion_r100236493

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       bucketSpec = getBucketSpec,
       options = extraOptions.toMap)
-    dataSource.write(mode, df)
+    val destination = source match {
+      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
+      case _ => extraOptions.get("path")
--- End diff --

Actually all the "magic keys" in the options used by `DataFrameWriter` are public APIs; they are not going to change, and users need to know about them.
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 Here's how I create a PR:
1. Fork the master of Apache;
2. Create a new branch from my master branch;
3. Select my new branch and create a new PR;
4. Edit my new branch code;
5. Commit and push.
Can you point out the missing or mistaken steps for me? Thank you for your guidance! @gatorsmile
[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16664#discussion_r100236345

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       bucketSpec = getBucketSpec,
       options = extraOptions.toMap)
-    dataSource.write(mode, df)
+    val destination = source match {
+      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
+      case _ => extraOptions.get("path")
--- End diff --

> e.g. calling the save method adds a "path" key to the option map, but is that key name a public API?

Yes, it is. E.g. in `df.write.format("parquet").option("path", some_path).save()`, the `path` is a "magic key" and we've exposed it to users, so `path` is a public API and if we change it, we will break existing applications.
[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16787#discussion_r100235734

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -815,7 +819,20 @@ private[hive] class HiveClientImpl(
     Option(hc.getComment).map(field.withComment).getOrElse(field)
   }

-  private def toHiveTable(table: CatalogTable): HiveTable = {
+  private def toInputFormat(name: String) =
+    Utils.classForName(name).asInstanceOf[Class[_ <: org.apache.hadoop.mapred.InputFormat[_, _]]]
+
+  private def toOutputFormat(name: String) =
+    Utils.classForName(name)
+      .asInstanceOf[Class[_ <: org.apache.hadoop.hive.ql.io.HiveOutputFormat[_, _]]]
+
+  /** Converts the native table metadata representation format CatalogTable to Hive's Table.
--- End diff --

style:
```
/**
 * doc
 */
```
[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16787#discussion_r100235680

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -776,20 +778,21 @@ private[hive] class HiveClientImpl(
       client.dropDatabase(db, true, false, true)
     }
   }
+}
+
+private[hive] object HiveClientImpl {
+  private lazy val shimForHiveExecution = IsolatedClientLoader.hiveVersion(
--- End diff --

is this still needed?
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16638 You might not be familiar with GitHub/Git. How about submitting a new PR? : )
[GitHub] spark pull request #16850: [SPARK-19413][SS] MapGroupsWithState for arbitrar...
Github user tdas closed the pull request at: https://github.com/apache/spark/pull/16850
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 My master branch is not synchronized with Apache's master. I did the pull operation, but my master branch was still not synchronized, and finally I removed my remote repository. Now I do not know how to associate a new branch with this PR. I think I made a mistake. @gatorsmile
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/16736 @gatorsmile @cloud-fan thank you for the time and efforts you've put in reviewing this!
[GitHub] spark pull request #16674: [SPARK-19331][SQL][TESTS] Improve the test covera...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16674#discussion_r100235360

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSQLViewSuite.scala ---
@@ -0,0 +1,190 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import org.apache.spark.sql.{AnalysisException, Row, SaveMode, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
+import org.apache.spark.sql.execution.SQLViewSuite
+import org.apache.spark.sql.hive.test.{TestHive, TestHiveSingleton}
+import org.apache.spark.sql.types.StructType
+
+/**
+ * A test suite for Hive view related functionality.
+ */
+class HiveSQLViewSuite extends SQLViewSuite with TestHiveSingleton {
+  protected override val spark: SparkSession = TestHive.sparkSession
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    // Create a simple table with two columns: id and id1
+    spark.range(1, 10).selectExpr("id", "id id1").write.format("json").saveAsTable("jt")
+  }
+
+  override def afterAll(): Unit = {
+    try {
+      spark.sql(s"DROP TABLE IF EXISTS jt")
+    } finally {
+      super.afterAll()
+    }
+  }
+
+  import testImplicits._
+
+  test("create a permanent/temp view using a hive, built-in, and permanent user function") {
+    val permanentFuncName = "myUpper"
+    val permanentFuncClass =
+      classOf[org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper].getCanonicalName
+    val builtInFuncNameInLowerCase = "abs"
+    val builtInFuncNameInMixedCase = "aBs"
+    val hiveFuncName = "histogram_numeric"
+
+    withUserDefinedFunction(permanentFuncName -> false) {
+      sql(s"CREATE FUNCTION $permanentFuncName AS '$permanentFuncClass'")
+      withTable("tab1") {
+        (1 to 10).map(i => (s"$i", i)).toDF("str", "id").write.saveAsTable("tab1")
+        Seq("VIEW", "TEMPORARY VIEW").foreach { viewMode =>
+          withView("view1") {
+            sql(
+              s"""
+                |CREATE $viewMode view1
+                |AS SELECT
+                |$permanentFuncName(str),
+                |$builtInFuncNameInLowerCase(id),
+                |$builtInFuncNameInMixedCase(id) as aBs,
+                |$hiveFuncName(id, 5) over()
+                |FROM tab1
+              """.stripMargin)
+            checkAnswer(sql("select count(*) FROM view1"), Row(10))
+          }
+        }
+      }
+    }
+  }
+
+  test("create a permanent/temp view using a temporary function") {
+    val tempFunctionName = "temp"
+    val functionClass =
+      classOf[org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper].getCanonicalName
+    withUserDefinedFunction(tempFunctionName -> true) {
+      sql(s"CREATE TEMPORARY FUNCTION $tempFunctionName AS '$functionClass'")
+      withView("view1", "tempView1") {
+        withTable("tab1") {
+          (1 to 10).map(i => s"$i").toDF("id").write.saveAsTable("tab1")
+
+          // temporary view
+          sql(s"CREATE TEMPORARY VIEW tempView1 AS SELECT $tempFunctionName(id) from tab1")
+          checkAnswer(sql("select count(*) FROM tempView1"), Row(10))
+
+          // permanent view
+          val e = intercept[AnalysisException] {
+            sql(s"CREATE VIEW view1 AS SELECT $tempFunctionName(id) from tab1")
+          }.getMessage
+          assert(e.contains("Not allowed to create a permanent view `view1` by referencing " +
+            s"a temporary function `$tempFunctionName`"))
+        }
+      }
+    }
+  }
+
+  test("create hive view for json table") {
+    // json table is not hive-compatible, make sure the new flag fix it.
+    withView("testView") {
[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16795 At least, `spark-master-test-maven-hadoop-2.6` goes green.
[GitHub] spark pull request #16736: [SPARK-19265][SQL][Follow-up] Configurable `table...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16736
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16736 LGTM Thanks! Merging to master.
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16638 Your master is clean (i.e., exactly identical to the upstream/master), right?
[GitHub] spark pull request #16837: [SPARK-19359][SQL] renaming partition should not ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16837
[GitHub] spark issue #16803: [SPARK-19458][BUILD]load hive jars from local repo which...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/16803 @dongjoon-hyun @srowen Could you help review this? Thanks very much!
[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16837 Thanks! Merging to master.
[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16775 ping @mengxr @jkbradley @liancheng @MLnick Could you take a look at this? Thanks.
[GitHub] spark issue #16803: [SPARK-19458][BUILD]load hive jars from local repo which...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16803 **[Test build #72629 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72629/testReport)** for PR 16803 at commit [`51b8f5e`](https://github.com/apache/spark/commit/51b8f5e4f75fcba524df8240c2384ff204fe93cc).
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/16760 Many thanks @gatorsmile
[GitHub] spark pull request #16760: [SPARK-18872][SQL][TESTS] New test cases for EXIS...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16760
[GitHub] spark issue #16803: [SPARK-19458][BUILD]load hive jars from local repo which...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/16803 If we do not set ivy.jars.repos, it will use the default ${user.home}/.m2 repo; and if we set ivy.jars.path to a path where the jars have already been downloaded, it can also load them from that path.
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16760 Thanks! Merging to master.
[GitHub] spark issue #16785: [SPARK-19443][SQL][WIP] The function to generate constra...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16785 Since this change is related to SQL, cc @cloud-fan @hvanhovell
[GitHub] spark issue #16785: [SPARK-19443][SQL][WIP] The function to generate constra...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16785 I haven't found a way to improve `getAliasedConstraints` significantly by rewriting its logic. The current way to improve its performance is to use a parallel collection to do the transformation in parallel. That cuts the running time roughly in half (see the benchmark in the PR description), but the running time (13.5 secs) is still too long compared with 1.6. We may consider #16775, which is another solution that fixes this issue by checkpointing datasets for pipelines with long stages, or adopt both of them.
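The idea in the comment above can be illustrated with a toy, plain-Scala sketch (no Catalyst involved; `rewrite` is a hypothetical stand-in for the per-expression work done in `getAliasedConstraints`): split an expensive per-constraint transformation across threads and gather the results.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Stand-in for the per-constraint transformation (hypothetical).
def rewrite(c: String): String = c.replace("a", "x")

val constraints = (1 to 8).map(i => s"a$i > 0")

// Sequential baseline.
val sequential = constraints.map(rewrite)

// Parallel variant: one Future per constraint, then gather the results.
// Order is preserved, so the outputs are identical to the sequential run.
val parallel = Await.result(
  Future.sequence(constraints.map(c => Future(rewrite(c)))),
  10.seconds)
```

As the benchmark in the PR description suggests, this kind of parallelism only shaves a constant factor off the running time; it does not change the asymptotic cost of constraint generation.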
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16715 **[Test build #72628 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72628/testReport)** for PR 16715 at commit [`b45ec0a`](https://github.com/apache/spark/commit/b45ec0ab118545383526ffa80fa873a4ccc33307).
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16715 Jenkins retest this please
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions be...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Merged build finished. Test FAILed.
[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r100232773 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/GetStructField2.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.planning + +import org.apache.spark.sql.catalyst.expressions.{Expression, GetStructField} +import org.apache.spark.sql.types.StructField + +/** + * A Scala extractor that extracts the child expression and struct field from a [[GetStructField]]. + * This is in contrast to the [[GetStructField]] case class extractor which returns the field + * ordinal instead of the field itself. + */ +private[planning] object GetStructField2 { --- End diff -- `GetStructFieldObject` or `GetStructFieldExtractor`?
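The distinction the extractor above draws can be shown with a minimal plain-Scala sketch (all names hypothetical, not the PR's Catalyst code): a case-class pattern exposes the field *ordinal*, while a custom `unapply` can surface the field object itself.

```scala
// Toy stand-ins for Catalyst's StructField and GetStructField.
case class Field(name: String, dataType: String)
case class GetField(child: String, ordinal: Int, fields: Seq[Field])

// Custom extractor (mirroring the `GetStructFieldObject` naming suggestion):
// yields the Field at the ordinal rather than the ordinal itself.
object GetFieldObject {
  def unapply(g: GetField): Option[(String, Field)] =
    Some((g.child, g.fields(g.ordinal)))
}

val expr = GetField("person", 1, Seq(Field("id", "int"), Field("age", "int")))

// Pattern matching now binds the field directly, no ordinal lookup needed.
val (child, field) = expr match { case GetFieldObject(c, f) => (c, f) }
```

This is why a distinct extractor object is useful even when a case class already provides `unapply`: the two can expose different views of the same node.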
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions be...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72616/ Test FAILed.
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions be...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72616/testReport)** for PR 16787 at commit [`bf09f15`](https://github.com/apache/spark/commit/bf09f15ca7c90138312eb73b819131adf16ac040). * This patch **fails from timeout after a configured wait of \`250m\`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16664#discussion_r100232628 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { bucketSpec = getBucketSpec, options = extraOptions.toMap) -dataSource.write(mode, df) +val destination = source match { + case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME) + case _ => extraOptions.get("path") --- End diff -- In Spark SQL, for metadata-like info, we store it as a key-value map. For example, [MetadataBuilder](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala) is used for this purpose. So far, the solution proposed in this PR does not look good to me. I do not think it is a good design. Even if we add a structured type, it could still change in the future. If you want to introduce an external public interface (like our data source APIs), we need a careful design. This should be done in a separate PR.
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 @watermen Thanks for the review. What is the advantage of adding it in `TaskMetrics` instead of `MapStatus`?
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions be...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72627 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72627/testReport)** for PR 16787 at commit [`b20d14f`](https://github.com/apache/spark/commit/b20d14fb6e70aaf6c4e09c644dd8ec6b8b5569dd).
[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16386 Merged build finished. Test PASSed.
[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16386 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72620/ Test PASSed.
[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16386 **[Test build #72620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72620/testReport)** for PR 16386 at commit [`f71a465`](https://github.com/apache/spark/commit/f71a465cf07fb9c043b2ccd86fa57e8e8ea9dc00). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/16664#discussion_r100231056 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { bucketSpec = getBucketSpec, options = extraOptions.toMap) -dataSource.write(mode, df) +val destination = source match { + case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME) + case _ => extraOptions.get("path") --- End diff -- > is like metadata It is metadata, but that doesn't mean it doesn't have meaning and thus doesn't need structure. Some of the metadata currently models "where" the data is being written. Internally it doesn't really matter much how it's handled (it's an "implementation detail"), but for someone building an application that uses this information, knowing that a particular key means "where the data will end up" *is* very important, and a structured type with proper, documented fields helps with that. We just happen to want that information, and we could use it either way, but that's beside the point. I'm arguing that there's value in exposing this data in a more structured manner than just an opaque map.
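The design contrast being debated in this thread can be sketched in plain Scala (all names here are hypothetical illustrations, not Spark APIs): an opaque key-value map forces consumers to know magic keys, while a small structured type documents "where the data ends up" as a typed field.

```scala
// Map-based representation: a listener must know that "path" (or
// "dbtable", or something else) is the key that means "destination".
val asMap: Map[String, String] =
  Map("path" -> "/data/out", "format" -> "parquet")
val where1 = asMap.getOrElse("path", "unknown")

// Structured alternative (hypothetical): the destination has a
// documented type, and a match is exhaustive over the known cases.
sealed trait WriteDestination
final case class FileDestination(path: String) extends WriteDestination
final case class JdbcDestination(table: String) extends WriteDestination

def describe(d: WriteDestination): String = d match {
  case FileDestination(p) => s"file: $p"
  case JdbcDestination(t) => s"jdbc: $t"
}

val where2 = describe(FileDestination("/data/out"))
```

Both representations carry the same information; the structured form trades flexibility for a compiler-checked, documented contract, which is the crux of the disagreement above.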
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16736 Merged build finished. Test PASSed.
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16736 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72619/ Test PASSed.
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16736 **[Test build #72619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72619/testReport)** for PR 16736 at commit [`f29c9d7`](https://github.com/apache/spark/commit/f29c9d77a683c1a63abac92f19210eadcb68682e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16760 Merged build finished. Test PASSed.
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16760 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72621/ Test PASSed.
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16760 **[Test build #72621 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72621/testReport)** for PR 16760 at commit [`2473e0c`](https://github.com/apache/spark/commit/2473e0c440a9d1cd761ae6d704d0aa02c63afd83). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16664#discussion_r100229660 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { bucketSpec = getBucketSpec, options = extraOptions.toMap) -dataSource.write(mode, df) +val destination = source match { + case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME) + case _ => extraOptions.get("path") --- End diff -- Based on my understanding, the extra information we pass to QueryExecutionListener is like metadata. It is just for helping users understand the context. I still do not understand why we need to define a class/trait for it. This extra class/trait looks weird for this goal, unless you have some applications that are built on this class/trait.
[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r100229358 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/GetStructField2.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.planning + +import org.apache.spark.sql.catalyst.expressions.{Expression, GetStructField} +import org.apache.spark.sql.types.StructField + +/** + * A Scala extractor that extracts the child expression and struct field from a [[GetStructField]]. + * This is in contrast to the [[GetStructField]] case class extractor which returns the field + * ordinal instead of the field itself. + */ +private[planning] object GetStructField2 { --- End diff -- How about `GetStructFieldObject`? Or `GetStructFieldRef`?
[GitHub] spark pull request #16854: [WIP][SPARK-15463][SQL] Add an API to load DataFr...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16854#discussion_r100229312 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -361,6 +362,41 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { } /** + * Loads a `Dataset[String]` storing CSV rows and returns the result as a `DataFrame`. + * + * Unless the schema is specified using the `schema` function, this function goes through the + * input once to determine the input schema. + * + * @param csvDataset input Dataset with one CSV row per record + * @since 2.2.0 + */ + def csv(csvDataset: Dataset[String]): DataFrame = { +val parsedOptions: CSVOptions = new CSVOptions(extraOptions.toMap) --- End diff -- Just to help review, there is a similar code path in https://github.com/apache/spark/blob/3d314d08c9420e74b4bb687603cdd11394eccab5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala#L105-L125
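The doc comment in the diff above says the function "goes through the input once to determine the input schema". A toy plain-Scala sketch (no Spark; `inferType` and `widen` are hypothetical simplifications of Spark's real CSV inference) shows what such a one-pass inference looks like: infer a per-cell type for each row, then fold the rows together, widening column types as needed.

```scala
// Classify a single cell value (simplified: int, double, or string).
def inferType(v: String): String =
  if (v.matches("-?\\d+")) "int"
  else if (v.matches("-?\\d+\\.\\d+")) "double"
  else "string"

// Widen two candidate types for the same column.
def widen(a: String, b: String): String =
  if (a == b) a
  else if (Set(a, b) == Set("int", "double")) "double"
  else "string"

val rows = Seq("1,foo,3.0", "2,bar,4", "3,baz,5.5")

// One pass: per-row cell types, folded column-wise with widening.
val schema = rows
  .map(_.split(",").map(inferType))
  .reduce((x, y) => x.zip(y).map { case (a, b) => widen(a, b) })
```

Here column 3 is seen as `double`, `int`, `double` across the rows, so it widens to `double`; this is why the overload must scan the whole input before it can fix a schema.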
[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r100229300 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/SelectedField.scala --- @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.planning + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types._ + +/** + * A Scala extractor that builds a [[StructField]] from a Catalyst complex type + * extractor. This is like the opposite of [[ExtractValue#apply]]. + */ +object SelectedField { + def unapply(expr: Expression): Option[StructField] = { +// If this expression is an alias, work on its child instead +val unaliased = expr match { + case Alias(child, _) => child + case expr => expr +} +selectField(unaliased, None) + } + + /** + * Converts some chain of complex type extractors into a [[StructField]]. 
+ * + * @param expr the top-level complex type extractor + * @param fieldOpt the subfield of [[expr]], where relevant + */ + private def selectField(expr: Expression, fieldOpt: Option[StructField]): Option[StructField] = +expr match { + case AttributeReference(name, _, nullable, _) => +fieldOpt.map(field => StructField(name, StructType(Array(field)), nullable)) + case GetArrayItem(GetStructField2(child, field @ StructField(name, + ArrayType(_, arrayNullable), fieldNullable, _)), _) => +val childField = fieldOpt.map(field => StructField(name, ArrayType( + StructType(Array(field)), arrayNullable), fieldNullable)).getOrElse(field) +selectField(child, Some(childField)) + case GetArrayStructFields(child, --- End diff -- I've spent some time this week developing a few different solutions to this problem; however, none of them are very easy to understand or verify. I'm going to spend some more time working on a simpler solution before posting something back.
[GitHub] spark issue #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON parsing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16750 **[Test build #72626 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72626/testReport)** for PR 16750 at commit [`ffc4912`](https://github.com/apache/spark/commit/ffc4912e17cc900fc9d7ceefd0f66461109728e9).
[GitHub] spark issue #16785: [SPARK-19443][SQL][WIP] The function to generate constra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16785 **[Test build #72625 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72625/testReport)** for PR 16785 at commit [`8c98a5c`](https://github.com/apache/spark/commit/8c98a5c3ab1477408988c8cb682733e65dd554fc).
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16715 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72624/ Test FAILed.
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16715 Merged build finished. Test FAILed.
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Merged build finished. Test FAILed.
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72622/ Test FAILed.