date:20160606

[GitHub] spark issue #13535: [SPARK-15792][SQL] Allows operator to change the verbosi...

2016-06-06 Thread liancheng

Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13535
  
LGTM, merging to master and branch-2.0. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13444: [SPARK-15530][SQL] Set #parallelism for file listing in ...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13444
  
**[Test build #60106 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60106/consoleFull)**
 for PR 13444 at commit 
[`395d9e4`](https://github.com/apache/spark/commit/395d9e4155f3c4ebe2a45a91b54fa87ec7ca394b).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13444: [SPARK-15530][SQL] Set #parallelism for file listing in ...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13444
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60106/
Test FAILed.



[GitHub] spark issue #13444: [SPARK-15530][SQL] Set #parallelism for file listing in ...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13444
  
Merged build finished. Test FAILed.



[GitHub] spark issue #13439: [SPARK-15701][SQL] Constant ColumnVector only needs to p...

2016-06-06 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/13439
  
Oh, you are right. IMHO, it is too complex to introduce new implementation 
classes only for a column vector with the same value in all of the rows.
Introducing compression schemes, as implemented in `CachedBatch`, may 
be a more generic solution if we introduce new implementation classes.



[GitHub] spark issue #13444: [SPARK-15530][SQL] Set #parallelism for file listing in ...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13444
  
**[Test build #60106 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60106/consoleFull)**
 for PR 13444 at commit 
[`395d9e4`](https://github.com/apache/spark/commit/395d9e4155f3c4ebe2a45a91b54fa87ec7ca394b).



[GitHub] spark pull request #13529: [SPARK-15632][SQL]Typed Filter should NOT change ...

2016-06-06 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13529



[GitHub] spark issue #13518: [WIP][SPARK-15472][SQL] Add support for writing in `csv`...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13518
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60104/
Test FAILed.



[GitHub] spark issue #13518: [WIP][SPARK-15472][SQL] Add support for writing in `csv`...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13518
  
**[Test build #60104 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60104/consoleFull)**
 for PR 13518 at commit 
[`b64afc6`](https://github.com/apache/spark/commit/b64afc64d3121479eb5c3f8c8b5663b6e05349b7).
 * This patch **fails executing the `dev/run-tests` script**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13518: [WIP][SPARK-15472][SQL] Add support for writing in `csv`...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13518
  
Merged build finished. Test FAILed.



[GitHub] spark issue #13529: [SPARK-15632][SQL]Typed Filter should NOT change the Dat...

2016-06-06 Thread liancheng

Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13529
  
Merging to master and branch-2.0. Thanks!



[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-06 Thread shivaram

Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/12836
  
I don't know what could cause this. Do we have the beginning of the string? 
My guess is that `MapPartitions` or one of the nodes in the plan is calling 
`toString` on a byte array that contains some R object, and that generates all 
those numbers.

One possibility could be that using a case class for 
`MapPartitionsRWrapper` or `FlatMapGroupsInR` means that the `toString` method 
auto-generated by Scala will try to print all the members of the case class.
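
If a node's string form does turn out to include a serialized payload, one way to keep it out is to override `toString` on the case class. The following is only a minimal sketch: `RFuncNode` and its fields are hypothetical stand-ins, not Spark's actual `MapPartitionsRWrapper` or `FlatMapGroupsInR` definitions.

```scala
// Hypothetical stand-in for a plan node that carries a serialized R closure.
// The class and field names are illustrative, not Spark's actual definitions.
final case class RFuncNode(name: String, func: Array[Byte]) {
  // Summarize the payload instead of rendering every byte.
  override def toString: String = s"RFuncNode($name, <${func.length} bytes>)"
}

object RFuncNodeDemo {
  def main(args: Array[String]): Unit = {
    val node = RFuncNode("dapply", Array.fill[Byte](4096)(0))
    println(node) // RFuncNode(dapply, <4096 bytes>)
  }
}
```

Because the case class defines `toString` explicitly, the compiler does not generate one, so the payload never reaches the printed form.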



[GitHub] spark issue #13518: [WIP][SPARK-15472][SQL] Add support for writing in `csv`...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13518
  
**[Test build #60104 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60104/consoleFull)**
 for PR 13518 at commit 
[`b64afc6`](https://github.com/apache/spark/commit/b64afc64d3121479eb5c3f8c8b5663b6e05349b7).



[GitHub] spark issue #13444: [SPARK-15530][SQL] Set #parallelism for file listing in ...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13444
  
**[Test build #60105 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60105/consoleFull)**
 for PR 13444 at commit 
[`7c4a33f`](https://github.com/apache/spark/commit/7c4a33ff73fbee8ac1503bfb4733a4885059f58c).



[GitHub] spark issue #13285: [Spark-15129][R][DOC]R API changes in ML

2016-06-06 Thread GayathriMurali

Github user GayathriMurali commented on the issue:

https://github.com/apache/spark/pull/13285
  
@yanboliang Please let me know if there is anything else I can do to get 
this merged.



[GitHub] spark pull request #13176: [SPARK-15100][DOC] Modified user guide and exampl...

2016-06-06 Thread GayathriMurali

Github user GayathriMurali commented on a diff in the pull request:

https://github.com/apache/spark/pull/13176#discussion_r66011880
  
--- Diff: docs/ml-features.md ---
@@ -1092,14 +1095,11 @@ for more details on the API.
 ## QuantileDiscretizer
 
 `QuantileDiscretizer` takes a column with continuous features and outputs a column with binned
-categorical features.
-The bin ranges are chosen by taking a sample of the data and dividing it into roughly equal parts.
-The lower and upper bin bounds will be `-Infinity` and `+Infinity`, covering all real values.
-This attempts to find `numBuckets` partitions based on a sample of the given input data, but it may
-find fewer depending on the data sample values.
-
-Note that the result may be different every time you run it, since the sample strategy behind it is
-non-deterministic.
+categorical features. The number of bins is set by the `numBuckets` parameter.
+The bin ranges are chosen using an approximate algorithm (see the documentation for [approxQuantile](api/scala/index.html#org.apache.spark.sql.DataFrameStatFunctions.scala)
+for a detailed description). The precision of the approximation can be controlled with the
+`relativeError` parameter. When set to zero, exact quantiles are calculated (**Note:** Computing exact quantiles is an expensive operation). The default value of `relativeError` is 0.01.
--- End diff --

@MLnick What do you think?



[GitHub] spark issue #13526: [SPARK-15780][SQL] Support mapValues on KeyValueGroupedD...

2016-06-06 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13526
  
OK now I agree this is a useful API.

For performance, I would expect that 
`ds.groupByKey(_._1).mapValues(_._2).mapGroups { case (k, vs) => (k, vs.sum) }` 
should be at least as fast as `ds.groupByKey(_._1).mapGroups { case (k, vs) => 
(k, vs.map(_._2).sum) }`. But it looks like the current implementation is not?

I'll take a closer look tomorrow, and let's discuss what's the best way to 
do it.
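
The two formulations compared above are meant to produce the same result. A minimal sketch of that equivalence, using plain Scala collections as a stand-in for the Dataset API (the object and method names are hypothetical, and this says nothing about Spark's actual performance):

```scala
object GroupSumSketch {
  // Variant 1: project the values first, then aggregate each group.
  def sumAfterMapValues(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.groupBy(_._1)
      .map { case (k, kvs) => k -> kvs.map(_._2) }
      .map { case (k, vs) => k -> vs.sum }

  // Variant 2: aggregate each group directly, projecting inside the function.
  def sumInsideMapGroups(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.groupBy(_._1)
      .map { case (k, kvs) => k -> kvs.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    val pairs = Seq("a" -> 1, "b" -> 2, "a" -> 3)
    println(sumAfterMapValues(pairs) == sumInsideMapGroups(pairs)) // true
  }
}
```

In variant 2 the user function sees whole key-value rows, while in variant 1 it sees only the values. That is why one would hope the `mapValues` form is no slower.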



[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-06 Thread NarineK

Github user NarineK commented on the issue:

https://github.com/apache/spark/pull/12836
  
Do you know what exactly caused this?



[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-06 Thread NarineK

Github user NarineK commented on the issue:

https://github.com/apache/spark/pull/12836
  
Hi @shivaram, hi @sun-rui,
Surprisingly, the `dataframe.queryExecution.toString` output for both dapply 
and gapply is prepended by a huge array, which I'm not able to understand. It 
seems that recent commits cause this.
I've added the following code snippet in mapPartitionsInR:
```
print(df.queryExecution)
print("this was dapply")
```

And this is what I see :(


[ ...  0, 9, 0, 0, 0, 0, 0, 4, 0, 9, 0, 0, 0, 1, 44, 0, 4, 0, 9, 0, 0, 0, 
0, 0, 4, 0, 9, 0, 0, 0, 6, 115, 99, 104, 101, 109, 97, 0, 4, 0, 9, 0, 0, 0, 1, 
41, 0, 4, 0, 9, 0, 0, 0, 0, 0, 4, 0, 9, 0, 0, 0, 0, 0, 4, 0, 9, 0, 0, 0, 0, 0, 
4, 0, 9, 0, 0, 0, 36, 35, 32, 78, 117, 109, 98, 101, 114, 32, 111, 102, 32, 
112, 97, 114, 116, 105, 116, 105, 111, 110, 115, 32, 105, 115, 32, 101, 113, 
117, 97, 108, 32, 116, 111, 32, 50, 0, 4, 0, 9, 0, 0, 0, 12, 101, 120, 112, 
101, 99, 116, 95, 101, 113, 117, 97, 108, 0, 4, 0, 9, 0, 0, 0, 1, 40, 0, 4, 0, 
9, 0, 0, 0, 0, 0, 4, 0, 9, 0, 0, 0, 4, 110, 114, 111, 119, 0, 4, 0, 9, 0, 0, 0, 
1, 40, 0, 4, 0, 9, 0, 0, 0, 0, 0, 4, 0, 9, 0, 0, 0, 3, 100, 102, 49, 0, 4, 0, 
9, 0, 0, 0, 1, 41, 0, 4, 0, 9, 0, 0, 0, 0, 0, 4, 0, 9, 0, 0, 0, 0, 0, 4, 0, 9, 
0, 0, 0, 1, 44, 0, 4, 0, 9, 0, 0, 0, 1, 50, 0, 4, 0, 9, 0, 0, 0, 0, 0, 4, 0, 9, 
0, 0, 0, 1, 41, 0, 4, 0, 9, 0, 0, 0, 0, 0, 4, 0, 9, 0, 0, 0, 1, 125, 0, 4, 0, 
9, 0, 0, 0, 0, 0, 4, 0, 9, 0, 0, 0, 1, 41, 0, 4, 0, 9, 0, 0, 
0, 0, 0, 4, 0, 9, 0, 0, 0, 6, 117, 110, 108, 105, 110, 107, 0, 4, 0, 9, 0, 
0, 0, 1, 40, 0, 4, 0, 9, 0, 0, 0, 0, 0, 4, 0, 9, 0, 0, 0, 11, 112, 97, 114, 
113, 117, 101, 116, 80, 97, 116, 104, 0, 4, 0, 9, 0, 0, 0, 1, 41, 0, 4, 0, 9, 
0, 0, 0, 0, 0, 4, 0, 9, 0, 0, 0, 0, 0, 4, 0, 9, 0, 0, 0, 6, 117, 110, 108, 105, 
110, 107, 0, 4, 0, 9, 0, 0, 0, 1, 40, 0, 4, 0, 9, 0, 0, 0, 0, 0, 4, 0, 9, 0, 0, 
0, 8, 106, 115, 111, 110, 80, 97, 116, 104, 0, 4, 0, 9, 0, 0, 0, 1, 41, 0, 4, 
0, 9, 0, 0, 0, 0, 0, 4, 0, 9, 0, 0, 0, 0, 0, 4, 0, 9, 0, 0, 0, 6, 117, 110, 
108, 105, 110, 107, 0, 4, 0, 9, 0, 0, 0, 1, 40, 0, 4, 0, 9, 0, 0, 0, 0, 0, 4, 
0, 9, 0, 0, 0, 10, 106, 115, 111, 110, 80, 97, 116, 104, 78, 97, 0, 4, 0, 9, 0, 
0, 0, 1, 41, 0, 4, 0, 9, 0, 0, 0, 0, 0, 4, 0, 9, 0, 0, 0, 0, 0, 0, 4, 2, 0, 0, 
9, -1, 0, 0, 0, 16, 0, 0, 0, 1, 0, 4, 0, 9, 0, 0, 0, 9, 112, 97, 114, 115, 101, 
68, 97, 116, 97, 0, 0, 0, -2, 0, 0, 0, -2, 0, 0, 4, 2, 0, 0, 9, -1, 0, 0, 0, 
16, 0, 0, 0, 2, 0, 4, 0, 9, 0, 0, 0, 11, 115, 114, 99, 102, 
105, 108, 101, 99, 111, 112, 121, 0, 4, 0, 9, 0, 0, 0, 7, 115, 114, 99, 102, 
105, 108, 101, 0, 0, 0, -2, 0, 0, 4, 2, 0, 0, 9, -1, 0, 0, 0, 16, 0, 0, 0, 1, 
0, 4, 0, 9, 0, 0, 0, 6, 115, 114, 99, 114, 101, 102, 0, 0, 0, -2, 0, 0, 0, -2, 
0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, -3, 0, 0, 0, -2, 0, 0, 0, 19, 0, 0, 0, 29, 0, 
0, 0, -2, 0, 0, 0, -2, 0, 0, 0, -2, 0, 0, 0, -2, 0, 0, 0, -2, 0, 0, 0, -2, 0, 
0, 0, -2, 0, 0, 0, -2, 0, 0, 0, -2, 0, 0, 0, -2, 0, 0, 0, -2, 0, 0, 0, -2, 0, 
0, 0, -2, 0, 0, 0, -2, 0, 0, 0, -2, 0, 0, 0, -2, 0, 0, 0, -2, 0, 0, 0, -2, 0, 
0, 0, -2, 0, 0, 0, -2, 0, 0, 0, -2, 0, 0, 0, -2, 0, 0, 0, -2, 0, 0, 0, -2, 0, 
0, 0, -2, 0, 0, 0, -2, 0, 0, 0, -2, 0, 0, 0, -2, 0, 0, 0, -2, 0, 0, 0, -2, 0, 
0, 4, 2, 0, 0, 0, 1, 0, 4, 0, 9, 0, 0, 0, 1, 120, 0, 0, 0, -5, 0, 0, 0, -2, 0, 
0, 2, 6, 0, 0, 4, 2, 0, 0, 1, -1, 0, 0, 0, 19, 0, 0, 0, 2, 0, 0, 3, 13, 0, 0, 
0, 8, 0, 0, 8, 84, 0, 0, 0, 17, 0, 0, 8, 84, 0, 0, 0, 17, 0, 0, 0, 17, 0, 0, 0, 
17, 0, 0, 8, 84, 0, 0, 8, 84, 0, 0, 4, 2, 0, 0, 2, -1, 
 0, 0, 3, -1, 0, 0, 4, 2, 0, 0, 9, -1, 0, 0, 0, 16, 0, 0, 0, 1, 0, 4, 0, 9, 0, 
0, 0, 6, 115, 114, 99, 114, 101, 102, 0, 0, 0, -2, 0, 0, 3, 13, 0, 0, 0, 8, 0, 
0, 8, 85, 0, 0, 0, 7, 0, 0, 8, 85, 0, 0, 0, 42, 0, 0, 0, 7, 0, 0, 0, 42, 0, 0, 
8, 85, 0, 0, 8, 85, 0, 0, 4, 2, 0, 0, 2, -1, 0, 0, 3, -1, 0, 0, 4, 2, 0, 0, 9, 
-1, 0, 0, 0, 16, 0, 0, 0, 1, 0, 4, 0, 9, 0, 0, 0, 6, 115, 114, 99, 114, 101, 
102, 0, 0, 0, -2, 0, 0, 4, 2, 0, 0, 2, -1, 0, 0, 3, -1, 0, 0, 4, 2, 0, 0, 0, 1, 
0, 4, 0, 9, 0, 0, 0, 11, 119, 104, 111, 108, 101, 83, 114, 99, 114, 101, 102, 
0, 0, 3, 13, 0, 0, 0, 8, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 8, 86, 0, 0, 0, 5, 0, 0, 
0, 0, 0, 0, 0, 5, 0, 0, 0, 1, 0, 0, 8, 86, 0, 0, 4, 2, 0, 0, 2, -1, 0, 0, 3, 
-1, 0, 0, 4, 2, 0, 0, 9, -1, 0, 0, 0, 16, 0, 0, 0, 1, 0, 4, 0, 9, 0, 0, 0, 6, 
115, 114, 99, 114, 101, 102, 0, 0, 0, -2, 0, 0, 0, -2, 0, 0, 0, 1, 0, 4, 0, 9, 
0, 0, 0, 1, 123, 0, 0, 0, 2, 0, 0, 0, 6, 0, 0, 0, 1, 0, 4, 0, 9, 0, 0, 0, 2, 
60, 45, 0, 0, 0, 2, 0, 0, 0, 1, 0, 4, 0, 9, 0, 0, 0, 1, 
 121, 0, 0, 0, 2, 0, 0, 0, 6, 0, 0, 0, 1, 0, 4, 0, 9, 0, 0, 0, 1, 40, 0, 0, 0, 
2, 0, 0, 0, 6, 0, 0, 0, 1, 0, 4, 0, 9, 0, 0, 0, 10, 100, 97, 116, 97, 46, 102, 
114, 97, 109, 101, 0, 0, 0, 2, 0, 0, 0, 6, 0, 0, 0, 1, 0, 4, 0, 9, 0, 0, 0, 1, 
91, 0, 0, 0, 2, 0, 0, 0, 6, 0, 0, 0, 1, 0, 4, 0, 9, 0, 0, 0, 1, 36, 0, 0, 0, 2, 
0, 0, 17, -1, 0, 0, 0, 2, 0, 0, 0, 1, 0, 4, 0, 9, 0, 0, 0, 1, 97, 0, 0, 0, -2, 
0, 0, 0, 2, 0, 0, 0, 14, 0, 0, 0, 1, 63,
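
Runs of values in the printable ASCII range in the dump above decode to R identifiers and source text, which is consistent with the array being a serialized R object. A minimal sketch of decoding a few runs copied from the dump (the object and method names are hypothetical); in each case the value just before the run equals its length:

```scala
object DumpDecodeSketch {
  // Interpret a run of byte values as ASCII characters.
  def decode(bytes: Seq[Int]): String = bytes.map(_.toChar).mkString

  def main(args: Array[String]): Unit = {
    // Runs copied from the dump; each is preceded there by its length (6, 12, 10).
    println(decode(Seq(115, 99, 104, 101, 109, 97))) // schema
    println(decode(Seq(101, 120, 112, 101, 99, 116, 95, 101, 113, 117, 97, 108))) // expect_equal
    println(decode(Seq(100, 97, 116, 97, 46, 102, 114, 97, 109, 101))) // data.frame
  }
}
```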

[GitHub] spark issue #13518: [WIP][SPARK-15472][SQL] Add support for writing in `csv`...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13518
  
**[Test build #60103 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60103/consoleFull)**
 for PR 13518 at commit 
[`2ead307`](https://github.com/apache/spark/commit/2ead307d01d8f908951fbc059b8e49bbc77947b1).
 * This patch **fails executing the `dev/run-tests` script**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13518: [WIP][SPARK-15472][SQL] Add support for writing in `csv`...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13518
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60103/
Test FAILed.



[GitHub] spark issue #13518: [WIP][SPARK-15472][SQL] Add support for writing in `csv`...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13518
  
Merged build finished. Test FAILed.



[GitHub] spark issue #13415: [SPARK-15676] [SQL] Disallow Column Names as Partition C...

2016-06-06 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13415
  
LGTM, cc @andrewor14 for final sign off



[GitHub] spark issue #13518: [WIP][SPARK-15472][SQL] Add support for writing in `csv`...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13518
  
**[Test build #60103 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60103/consoleFull)**
 for PR 13518 at commit 
[`2ead307`](https://github.com/apache/spark/commit/2ead307d01d8f908951fbc059b8e49bbc77947b1).



[GitHub] spark issue #13518: [WIP][SPARK-15472][SQL] Add support for writing in `csv`...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13518
  
Merged build finished. Test FAILed.



[GitHub] spark issue #13518: [WIP][SPARK-15472][SQL] Add support for writing in `csv`...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13518
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60102/
Test FAILed.



[GitHub] spark issue #13518: [WIP][SPARK-15472][SQL] Add support for writing in `csv`...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13518
  
**[Test build #60102 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60102/consoleFull)**
 for PR 13518 at commit 
[`7fca579`](https://github.com/apache/spark/commit/7fca579d9fec2589f13403b0eb0f2d5f5e6bd52a).
 * This patch **fails executing the `dev/run-tests` script**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13518: [WIP][SPARK-15472][SQL] Add support for writing in `csv`...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13518
  
**[Test build #60102 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60102/consoleFull)**
 for PR 13518 at commit 
[`7fca579`](https://github.com/apache/spark/commit/7fca579d9fec2589f13403b0eb0f2d5f5e6bd52a).



[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

2016-06-06 Thread JustinPihony

Github user JustinPihony commented on a diff in the pull request:

https://github.com/apache/spark/pull/12601#discussion_r66009900
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala ---
@@ -96,7 +97,16 @@ private[sql] case class JDBCRelation(
 
   override val needConversion: Boolean = false
 
-  override val schema: StructType = JDBCRDD.resolveTable(url, table, properties)
+  override val schema: StructType = {
+    val resolvedSchema = JDBCRDD.resolveTable(url, table, properties)
+    providedSchemaOption match {
+      case Some(providedSchema) =>
+        if (providedSchema.sql.toLowerCase == resolvedSchema.sql.toLowerCase) resolvedSchema
--- End diff --

This is the only area I'm unsure about. I'd like a second opinion on 
whether this seems ok, or if I need to build something more custom for schema 
comparison.
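
For reference, a more custom comparison could walk the fields instead of comparing lowercased SQL strings. This is a minimal sketch under stated assumptions: `Field` is a hypothetical, simplified stand-in for a schema field (not Spark's `StructField`), and ignoring nullability is a design choice made here for illustration only.

```scala
object SchemaCompareSketch {
  // Hypothetical, simplified stand-in for a schema field; not Spark's StructField.
  final case class Field(name: String, dataType: String, nullable: Boolean = true)

  // Schemas match when they have the same fields in the same order,
  // comparing names case-insensitively and ignoring nullability.
  def sameIgnoringCase(a: Seq[Field], b: Seq[Field]): Boolean =
    a.length == b.length && a.zip(b).forall { case (x, y) =>
      x.name.equalsIgnoreCase(y.name) && x.dataType == y.dataType
    }

  def main(args: Array[String]): Unit = {
    val provided = Seq(Field("ID", "int"), Field("Name", "string"))
    val resolved = Seq(Field("id", "int", nullable = false), Field("name", "string"))
    println(sameIgnoringCase(provided, resolved)) // true
  }
}
```

Unlike lowercasing the whole SQL string, a field-wise check keeps the comparison rule for names separate from the rule for types, so each can be tightened or relaxed independently.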



[GitHub] spark issue #13447: [SPARK-15706] [SQL] Fix Wrong Answer when using IF NOT E...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13447
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13447: [SPARK-15706] [SQL] Fix Wrong Answer when using IF NOT E...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13447
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60098/
Test PASSed.



[GitHub] spark issue #13447: [SPARK-15706] [SQL] Fix Wrong Answer when using IF NOT E...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13447
  
**[Test build #60098 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60098/consoleFull)**
 for PR 13447 at commit 
[`bbaad66`](https://github.com/apache/spark/commit/bbaad666250532d80c8ce57a33b6b94433bcef76).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13518: [WIP][SPARK-15472][SQL] Add support for writing in `csv`...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13518
  
Merged build finished. Test FAILed.



[GitHub] spark issue #13518: [WIP][SPARK-15472][SQL] Add support for writing in `csv`...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13518
  
**[Test build #60101 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60101/consoleFull)**
 for PR 13518 at commit 
[`1e9af1c`](https://github.com/apache/spark/commit/1e9af1cef892706de8b07728c192dd8ca5e5851e).
 * This patch **fails executing the `dev/run-tests` script**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13518: [WIP][SPARK-15472][SQL] Add support for writing in `csv`...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13518
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60101/
Test FAILed.



[GitHub] spark issue #13415: [SPARK-15676] [SQL] Disallow Column Names as Partition C...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13415
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13415: [SPARK-15676] [SQL] Disallow Column Names as Partition C...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13415
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60099/
Test PASSed.



[GitHub] spark issue #13415: [SPARK-15676] [SQL] Disallow Column Names as Partition C...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13415
  
**[Test build #60099 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60099/consoleFull)**
 for PR 13415 at commit 
[`f4207e3`](https://github.com/apache/spark/commit/f4207e3c185a13a7f2866b0f12fcde5b28b8d948).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13518: [WIP][SPARK-15472][SQL] Add support for writing in `csv`...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13518
  
**[Test build #60101 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60101/consoleFull)**
 for PR 13518 at commit 
[`1e9af1c`](https://github.com/apache/spark/commit/1e9af1cef892706de8b07728c192dd8ca5e5851e).



[GitHub] spark pull request #12370: [SPARK-14599][ML] BaggedPoint should support samp...

2016-06-06 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/12370#discussion_r66007915
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tree/impl/BaggedPoint.scala ---
@@ -33,13 +33,20 @@ import org.apache.spark.util.random.XORShiftRandom
  * this datum has 1 copy, 0 copies, and 4 copies in the 3 subsamples, 
respectively.
  *
  * @param datum  Data instance
- * @param subsampleWeights  Weight of this instance in each subsampled 
dataset.
- *
- * TODO: This does not currently support (Double) weighted instances.  
Once MLlib has weighted
- *   dataset support, update.  (We store subsampleWeights as Double 
for this future extension.)
+ * @param subsampleCounts  Number of samples of this instance in each 
subsampled dataset.
+ * @param sampleWeight The weight of this instance.
  */
-private[spark] class BaggedPoint[Datum](val datum: Datum, val 
subsampleWeights: Array[Double])
-  extends Serializable
+private[spark] class BaggedPoint[Datum](
+val datum: Datum,
+val subsampleCounts: Array[Int],
+val sampleWeight: Double) extends Serializable {
+
+  /**
+   * Subsample counts weighted by the sample weight.
+   */
+  def weightedCounts: Array[Double] = subsampleCounts.map(_ * sampleWeight)
--- End diff --

I added this as a convenience method. If we make it a val then we add 
storage overhead in the class which is redundant. If preferable, we could 
remove it entirely.
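
The trade-off can be sketched as follows. This is a simplified stand-in for the class in the diff (the name `BaggedPointSketch` is made up for illustration), assuming the constructor parameters shown above:

```scala
// Simplified stand-in for BaggedPoint, for illustration only.
class BaggedPointSketch[Datum](
    val datum: Datum,
    val subsampleCounts: Array[Int],
    val sampleWeight: Double) extends Serializable {

  // `def`: recomputed on each call, so no extra field is stored in (or
  // serialized with) every instance. A `val` here would instead hold a
  // second array per instance.
  def weightedCounts: Array[Double] = subsampleCounts.map(_ * sampleWeight)
}

val point = new BaggedPointSketch("row", Array(1, 0, 4), 0.5)
val weighted = point.weightedCounts // Array(0.5, 0.0, 2.0)
```

The cost of the `def` is a fresh array allocation per call, so callers in a hot loop may want to hold on to the result locally.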



[GitHub] spark issue #12258: [SPARK-14485][CORE] ignore task finished for executor lo...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12258
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13518: [WIP][SPARK-15472][SQL] Add support for writing in `csv`...

2016-06-06 Thread lw-lin

Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/13518
  
Jenkins retest this please



[GitHub] spark issue #12258: [SPARK-14485][CORE] ignore task finished for executor lo...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12258
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60096/
Test PASSed.



[GitHub] spark pull request #13518: [WIP][SPARK-15472][SQL] Add support for writing i...

2016-06-06 Thread lw-lin

GitHub user lw-lin reopened a pull request:

https://github.com/apache/spark/pull/13518

[WIP][SPARK-15472][SQL] Add support for writing in `csv`, `json`, `text` 
formats in Structured Streaming

## What changes were proposed in this pull request?

This patch adds support for writing in `csv`, `json`, `text` formats in 
Structured Streaming:

**1. At a high level, this patch forms the following hierarchy** (`text` as 
an example):
```

                  ↑
         TextOutputWriterBase
           ↑              ↑
BatchTextOutputWriter   StreamingTextOutputWriter
```
```

            ↑                          ↑
BatchTextOutputWriterFactory   StreamingOutputWriterFactory
                                        ↑
                          StreamingTextOutputWriterFactory
```
The `StreamingTextOutputWriter` and other 'streaming' output writers would 
write data **without** using an `OutputCommitter`. This was the same approach 
taken by [SPARK-14716](https://github.com/apache/spark/pull/12409).

**2. to support compression, this patch attaches an extension to the path 
assigned by `FileStreamSink`**, which is slightly different from 
[SPARK-14716](https://github.com/apache/spark/pull/12409). For example, if we 
are writing out using the `gzip` compression and `FileStreamSink` assigns path 
`${uuid}` to a text writer, then in the end the file written out will be 
`${uuid}.txt.gz` -- so that when we read the file back, we'll correctly 
interpret it as `gzip` compressed.
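
As a rough sketch of the naming rule only (not the patch's actual code): `outputPath` below is a hypothetical helper, and giving uncompressed output a bare `.txt` suffix is an assumption, since the description only spells out the `gzip` case.

```scala
// Hypothetical helper: derives the final file name from the path the sink
// assigned, the format's extension, and an optional compression codec.
def outputPath(assignedPath: String, formatExt: String, codec: Option[String]): String = {
  val codecExt = codec.map(_.toLowerCase) match {
    case Some("gzip") => ".gz"
    case Some(other) =>
      throw new IllegalArgumentException(s"codec not covered by this sketch: $other")
    case None => "" // assumption: no codec means no extra suffix
  }
  s"$assignedPath.$formatExt$codecExt"
}

val gzPath = outputPath("1f0c", "txt", Some("gzip")) // "1f0c.txt.gz"
val plainPath = outputPath("1f0c", "txt", None)      // "1f0c.txt"
```

A reader that picks a decompressor from the file suffix can then recognize the `.gz` file without any extra metadata.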

## How was this patch tested?

`FileStreamSinkSuite` is expanded substantially to cover the added `csv`, 
`json`, and `text` formats:

```scala
test(" csv - unpartitioned data - codecs: none/gzip")
test("json - unpartitioned data - codecs: none/gzip")
test("text - unpartitioned data - codecs: none/gzip")

test(" csv - partitioned data - codecs: none/gzip")
test("json - partitioned data - codecs: none/gzip")
test("text - partitioned data - codecs: none/gzip")

test(" csv - unpartitioned writing and batch reading - codecs: none/gzip")
test("json - unpartitioned writing and batch reading - codecs: none/gzip")
test("text - unpartitioned writing and batch reading - codecs: none/gzip")

test(" csv - partitioned writing and batch reading - codecs: none/gzip")
test("json - partitioned writing and batch reading - codecs: none/gzip")
test("text - partitioned writing and batch reading - codecs: none/gzip")
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lw-lin/spark add-csv-json-text-for-ss

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13518.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13518


commit 97034f9aeb092b10e1606e60a8e6b4878ebd54cf
Author: Liwei Lin 
Date:   2016-06-05T09:03:04Z

Add csv, json, text

commit 2035b597b44aa519d8da3b155036446f88b3050e
Author: Liwei Lin 
Date:   2016-06-05T09:03:15Z

Fix parquet extension

commit 4737361489fd680405b291ec498ab91374685ffe
Author: Liwei Lin 
Date:   2016-06-05T11:52:14Z

Fix style

commit 90d02c4a10c14af83bbed985e36ef99a1edaa48b
Author: Liwei Lin 
Date:   2016-06-06T08:02:32Z

Fix tests

commit daec480bd16ed52137d32f332debe3806953f4d2
Author: Liwei Lin 
Date:   2016-06-06T09:03:08Z

Revert "Fix tests"

This reverts commit 90d02c4a10c14af83bbed985e36ef99a1edaa48b.

commit 43b68d426e9b64061095eca7a1db0e762843adef
Author: Liwei Lin 
Date:   2016-06-06T09:09:10Z

Fix tests

commit 56dbb9b4f0f7e2bf76935e0d1d2fc6c6cdf141ff
Author: Liwei Lin 
Date:   2016-06-06T12:34:07Z

Investigate test

commit 91e51aed5caf663d5068057e5cf28f21eb768310
Author: Liwei Lin 
Date:   2016-06-06T12:43:39Z

Investigate test

commit eb2090ce9fa04efbd23370f7d3e6cb98fd0b4c74
Author: Liwei Lin 
Date:   2016-06-06T13:00:26Z

Update run-tests

commit 1e9af1cef892706de8b07728c192dd8ca5e5851e
Author: Liwei Lin 
Date:   2016-06-07T04:10:34Z

Investigate test





[GitHub] spark issue #12258: [SPARK-14485][CORE] ignore task finished for executor lo...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12258
  
**[Test build #60096 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60096/consoleFull)**
 for PR 12258 at commit 
[`830fb37`](https://github.com/apache/spark/commit/830fb37f545499e4315b4cd2495aac0fd783a00e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13400: [SPARK-15655] [SQL] Fix Wrong Partition Column Order whe...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13400
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60100/
Test PASSed.



[GitHub] spark issue #13400: [SPARK-15655] [SQL] Fix Wrong Partition Column Order whe...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13400
  
**[Test build #60100 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60100/consoleFull)**
 for PR 13400 at commit 
[`5bc8996`](https://github.com/apache/spark/commit/5bc89966765e1ec37b7c8d167ac6156988a9a720).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13400: [SPARK-15655] [SQL] Fix Wrong Partition Column Order whe...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13400
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types as option in ...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13517
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60092/
Test PASSed.



[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types as option in ...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13517
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types as option in ...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13517
  
**[Test build #60092 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60092/consoleFull)**
 for PR 13517 at commit 
[`eb0031a`](https://github.com/apache/spark/commit/eb0031af138ed40a8ede9329e10d5fe0ea7e7e8c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13373: [SPARK-15616] [SQL] Metastore relation should fallback t...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13373
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60094/
Test FAILed.



[GitHub] spark issue #13373: [SPARK-15616] [SQL] Metastore relation should fallback t...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13373
  
Merged build finished. Test FAILed.



[GitHub] spark issue #13373: [SPARK-15616] [SQL] Metastore relation should fallback t...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13373
  
**[Test build #60094 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60094/consoleFull)**
 for PR 13373 at commit 
[`fc25f72`](https://github.com/apache/spark/commit/fc25f7269adb554ba2e01a92c19eb2ff3bf1ee93).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13537: [SPARK-15794] Should truncate toString() of very wide pl...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13537
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60093/
Test FAILed.



[GitHub] spark issue #13537: [SPARK-15794] Should truncate toString() of very wide pl...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13537
  
Merged build finished. Test FAILed.



[GitHub] spark issue #13537: [SPARK-15794] Should truncate toString() of very wide pl...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13537
  
**[Test build #60093 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60093/consoleFull)**
 for PR 13537 at commit 
[`17f98d7`](https://github.com/apache/spark/commit/17f98d76aec40bc7c6b8c46925d4013f9bccd639).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13259: [SPARK-15480][UI][Streaming]show missed InputInfo in str...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13259
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60095/
Test PASSed.



[GitHub] spark issue #13259: [SPARK-15480][UI][Streaming]show missed InputInfo in str...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13259
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13259: [SPARK-15480][UI][Streaming]show missed InputInfo in str...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13259
  
**[Test build #60095 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60095/consoleFull)**
 for PR 13259 at commit 
[`ffd1787`](https://github.com/apache/spark/commit/ffd178779c09b31e6e09e92b1b611d23fe90df4a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13526: [SPARK-15780][SQL] Support mapValues on KeyValueGroupedD...

2016-06-06 Thread koertkuipers

Github user koertkuipers commented on the issue:

https://github.com/apache/spark/pull/13526
  
can you explain a bit what is inefficient and would need an optimizer rule? 
is it mapValues being called twice? once for the key and then for the new 
values?
thanks!



[GitHub] spark issue #13526: [SPARK-15780][SQL] Support mapValues on KeyValueGroupedD...

2016-06-06 Thread koertkuipers

Github user koertkuipers commented on the issue:

https://github.com/apache/spark/pull/13526

see this conversation:

https://mail-archives.apache.org/mod_mbox/spark-user/201602.mbox/%3ccaaswr-7kqfmxd_cpr-_wdygafh+rarecm9olm5jkxfk14fc...@mail.gmail.com%3E

mapGroups is not a very interesting API: without support for secondary
sort (and hence no need for fold operations), pushing all the values into
the reducer never really makes sense. so the interesting APIs are reduce (once
it's fixed to be efficient and not use mapGroups) and agg.
how do you transform the values before they go into reduce? you cannot do
this currently, which is why we need something like mapValues. with Aggregators
you can indeed do something similar inside the Aggregator (since the input type
is not equal to the buffer type), but this leads to all Aggregators currently
taking in some kind of input transform function, which hints at a suboptimal
API and a pattern that should be generalized and extracted.

i am curious to know why appending a column is inefficient. i am open to
different designs.

about this being a rare case: i would argue the opposite. i expect to see a
lot of key-value datasets (```Dataset[(K, V)]```) in our codebase, and on those
a lot of operations like ```ds.groupByKey(_._1).mapValues(_._2).reduce(...)```,
since this is the most natural translation of many RDD algos.
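
to make the intended semantics concrete, here is a minimal sketch that uses plain Scala collections as a stand-in for ```Dataset[(K, V)]``` (no Spark involved; the sample data and names are made up for illustration):

```scala
// Plain Scala collections standing in for a key-value dataset.
val pairs = Seq(("a", 1), ("b", 10), ("a", 2), ("b", 20))

val reduced: Map[String, Int] =
  pairs
    .groupBy(_._1)                                 // groups whole (key, value) tuples by key
    .map { case (k, kvs) => k -> kvs.map(_._2) }   // the "mapValues" step: keep only the values
    .map { case (k, vs) => k -> vs.reduce(_ + _) } // reduce now combines values, not tuples
// reduced == Map("a" -> 3, "b" -> 30)
```

without the middle step, the reduce function would have to take and return whole ```(K, V)``` tuples, which is the awkwardness described above.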


[GitHub] spark issue #13539: [SPARK-15795] [SQL] Enable more optimizations in whole s...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13539
  
Can one of the admins verify this patch?



[GitHub] spark pull request #13539: [SPARK-15795] [SQL] Enable more optimizations in ...

2016-06-06 Thread inouehrs

GitHub user inouehrs opened a pull request:

https://github.com/apache/spark/pull/13539

[SPARK-15795] [SQL] Enable more optimizations in whole stage codegen when 
isNull is a compile-time constant

## What changes were proposed in this pull request?
Whole-stage codegen often creates an `isNull` variable initialized with the 
constant _false_, like
`boolean mapelements_isNull = false || false;`

If there is no further assignment to this `isNull` variable, whole-stage 
codegen can do more optimizations by treating `isNull` as a compile-time 
constant.

In the example below, which is generated for a dataset map operation, 
`mapelements_isNull` defined at line 115 can be assumed to be a compile-time 
constant (false). 
By treating it as a constant, whole-stage codegen eliminates 
`zeroOutNullBytes` at line 119 and the if-statement at line 121.
In addition to improving the readability of the generated code, 
eliminating `zeroOutNullBytes` gives a performance advantage, since it is 
difficult for the Java JIT compiler to remove.

without this patch
```
/* 107 */   // CONSUME: Project [id#0L AS l#3L]
/* 108 */   // CONSUME: DeserializeToObject l#3: bigint, obj#16: bigint
/* 109 */   // CONSUME: MapElements , obj#17: bigint
/* 110 */   // CONSUME: SerializeFromObject [input[0, bigint, true] AS value#18L]
/* 111 */   // .apply
/* 112 */   Object mapelements_obj = ((Expression) references[1]).eval(null);
/* 113 */   scala.Function1 mapelements_value1 = (scala.Function1) mapelements_obj;
/* 114 */
/* 115 */   boolean mapelements_isNull = false || false;
/* 116 */   final long mapelements_value = mapelements_isNull ? -1L : (Long) mapelements_value1.apply(range_value);
/* 117 */
/* 118 */   // CONSUME: WholeStageCodegen
/* 119 */   serializefromobject_rowWriter.zeroOutNullBytes();
/* 120 */
/* 121 */   if (mapelements_isNull) {
/* 122 */     serializefromobject_rowWriter.setNullAt(0);
/* 123 */   } else {
/* 124 */     serializefromobject_rowWriter.write(0, mapelements_value);
/* 125 */   }
/* 126 */   append(serializefromobject_result);
```

with this patch
```
/* 107 */   // CONSUME: Project [id#0L AS l#3L]
/* 108 */   // CONSUME: DeserializeToObject l#3: bigint, obj#9: bigint
/* 109 */   // CONSUME: MapElements , obj#10: bigint
/* 110 */   // CONSUME: SerializeFromObject [input[0, bigint, true] AS value#11L]
/* 111 */   // .apply
/* 112 */   Object mapelements_obj = ((Expression) references[1]).eval(null);
/* 113 */   scala.Function1 mapelements_value1 = (scala.Function1) mapelements_obj;
/* 114 */
/* 115 */   final boolean mapelements_isNull = false || false;
/* 116 */   final long mapelements_value = mapelements_isNull ? -1L : (Long) mapelements_value1.apply(range_value);
/* 117 */
/* 118 */   // CONSUME: WholeStageCodegen
/* 119 */   serializefromobject_rowWriter.write(0, mapelements_value);
/* 120 */   append(serializefromobject_result);
```
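
The difference between the two listings can be sketched as a small string-based generator. This is only an illustration of the idea, not Spark's codegen: the class `NullCheckCodegenSketch` and the method `genWrite` are hypothetical, and the sketch assumes the generator has already established that the null flag is the literal `false`.

```java
// Hypothetical sketch of the idea; these are not Spark's actual codegen classes.
class NullCheckCodegenSketch {
    // Emits the Java source that writes one field.
    // `isNull` is the generated Java expression for the field's null flag.
    static String genWrite(String writer, int ordinal, String isNull, String value) {
        String write = writer + ".write(" + ordinal + ", " + value + ");";
        if (isNull.equals("false")) {
            // The flag is known to be constant false: emit the write with no null branch.
            return write;
        }
        // Otherwise keep the runtime null check.
        return "if (" + isNull + ") {\n"
                + "  " + writer + ".setNullAt(" + ordinal + ");\n"
                + "} else {\n"
                + "  " + write + "\n"
                + "}";
    }

    public static void main(String[] args) {
        // Constant flag: a single write statement.
        System.out.println(genWrite("rowWriter", 0, "false", "mapelements_value"));
        // Non-constant flag: the if/else form from the "without this patch" listing.
        System.out.println(genWrite("rowWriter", 0, "mapelements_isNull", "mapelements_value"));
    }
}
```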


## How was this patch tested?

by unit tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/inouehrs/spark dev_nullcheck_opt

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13539.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13539


commit fb6f3b5a6c5fb80adc249fbb54a8c2ed884c7dbb
Author: Hiroshi Inoue 
Date:   2016-06-06T17:31:04Z

enable null check elimination based on generated code

commit 32f158ce5da29f9562c9aa3b4751d2241c4898ca
Author: Hiroshi Inoue 
Date:   2016-06-06T19:41:32Z

Merge branch 'apache/master' into dev_nullcheck_opt

commit 60f582dc3e75db6ff5fe642b692f22f5d7bc7ab2
Author: Hiroshi Inoue 
Date:   2016-06-07T01:53:33Z

make definition of isNull final





[GitHub] spark issue #13415: [SPARK-15676] [SQL] Disallow Column Names as Partition C...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13415
  
**[Test build #60099 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60099/consoleFull)** for PR 13415 at commit [`f4207e3`](https://github.com/apache/spark/commit/f4207e3c185a13a7f2866b0f12fcde5b28b8d948).



[GitHub] spark issue #13400: [SPARK-15655] [SQL] Fix Wrong Partition Column Order whe...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13400
  
**[Test build #60100 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60100/consoleFull)** for PR 13400 at commit [`5bc8996`](https://github.com/apache/spark/commit/5bc89966765e1ec37b7c8d167ac6156988a9a720).



[GitHub] spark issue #13447: [SPARK-15706] [SQL] Fix Wrong Answer when using IF NOT E...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13447
  
**[Test build #60098 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60098/consoleFull)** for PR 13447 at commit [`bbaad66`](https://github.com/apache/spark/commit/bbaad666250532d80c8ce57a33b6b94433bcef76).



[GitHub] spark issue #13400: [SPARK-15655] [SQL] Fix Wrong Partition Column Order whe...

2016-06-06 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13400
  
retest this please



[GitHub] spark issue #13415: [SPARK-15676] [SQL] Disallow Column Names as Partition C...

2016-06-06 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13415
  
retest this please



[GitHub] spark issue #13447: [SPARK-15706] [SQL] Fix Wrong Answer when using IF NOT E...

2016-06-06 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13447
  
retest this please



[GitHub] spark issue #13538: [MINOR] fix typo in documents

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13538
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13538: [MINOR] fix typo in documents

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13538
  
**[Test build #60097 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60097/consoleFull)** for PR 13538 at commit [`8af1bf8`](https://github.com/apache/spark/commit/8af1bf8f0d1a59aea35633780ca439f6c459bb78).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13538: [MINOR] fix typo in documents

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13538
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60097/



[GitHub] spark issue #13536: [SPARK-15793][ML] Add maxSentenceLength for ml.Word2Vec

2016-06-06 Thread yinxusen

Github user yinxusen commented on the issue:

https://github.com/apache/spark/pull/13536
  
retest it please



[GitHub] spark issue #13439: [SPARK-15701][SQL] Constant ColumnVector only needs to p...

2016-06-06 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13439
  
@kiszk BTW, we can't simply do this by calling `ColumnarBatch.allocate` with 
`maxRows=1`, because we still need to take care of element access. In other 
words, from the outside, the vector must look like it has the same number of 
elements as the other columns, not just 1 row.
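
A minimal sketch of that requirement, under the assumption that a constant vector stores one value but reports the batch's full row count. The class `ConstantLongVectorSketch` is hypothetical and is not Spark's `ColumnVector` API.

```java
// Hypothetical class for illustration; not Spark's ColumnVector API.
class ConstantLongVectorSketch {
    private final long value;   // the single stored value
    private final int numRows;  // logical number of rows exposed to callers

    ConstantLongVectorSketch(long value, int numRows) {
        this.value = value;
        this.numRows = numRows;
    }

    // Reports the logical size, matching the other columns in the batch.
    int numRows() {
        return numRows;
    }

    // Every valid row id maps to the same stored value.
    long getLong(int rowId) {
        if (rowId < 0 || rowId >= numRows) {
            throw new IndexOutOfBoundsException("rowId: " + rowId);
        }
        return value;
    }

    public static void main(String[] args) {
        ConstantLongVectorSketch v = new ConstantLongVectorSketch(7L, 3);
        System.out.println(v.numRows() + " rows, row 2 = " + v.getLong(2)); // prints 3 rows, row 2 = 7
    }
}
```

Storage stays at one element while element access behaves as if the vector had one value per row, which a batch allocated with a single row would not give.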



[GitHub] spark issue #13538: [MINOR] fix typo in documents

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13538
  
**[Test build #60097 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60097/consoleFull)** for PR 13538 at commit [`8af1bf8`](https://github.com/apache/spark/commit/8af1bf8f0d1a59aea35633780ca439f6c459bb78).



[GitHub] spark issue #13529: [SPARK-15632][SQL]Typed Filter should NOT change the Dat...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13529
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60091/



[GitHub] spark issue #13529: [SPARK-15632][SQL]Typed Filter should NOT change the Dat...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13529
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13529: [SPARK-15632][SQL]Typed Filter should NOT change the Dat...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13529
  
**[Test build #60091 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60091/consoleFull)** for PR 13529 at commit [`1281131`](https://github.com/apache/spark/commit/12811316e023a0c0427975db643f268a5b3175ec).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #13538: [MINOR] fix typo in documents

2016-06-06 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request:

https://github.com/apache/spark/pull/13538

[MINOR] fix typo in documents

## What changes were proposed in this pull request?

I used spell-check tools to find typos in the Spark documentation and fixed them.

## How was this patch tested?

N/A


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WeichenXu123/spark fix_doc_typo

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13538.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13538


commit 8af1bf8f0d1a59aea35633780ca439f6c459bb78
Author: WeichenXu 
Date:   2016-06-06T14:23:10Z

fix typo in documents





[GitHub] spark issue #12258: [SPARK-14485][CORE] ignore task finished for executor lo...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12258
  
**[Test build #60096 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60096/consoleFull)** for PR 12258 at commit [`830fb37`](https://github.com/apache/spark/commit/830fb37f545499e4315b4cd2495aac0fd783a00e).



[GitHub] spark issue #13259: [SPARK-15480][UI][Streaming]show missed InputInfo in str...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13259
  
**[Test build #60095 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60095/consoleFull)** for PR 13259 at commit [`ffd1787`](https://github.com/apache/spark/commit/ffd178779c09b31e6e09e92b1b611d23fe90df4a).



[GitHub] spark issue #13373: [SPARK-15616] [SQL] Metastore relation should fallback t...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13373
  
**[Test build #60094 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60094/consoleFull)** for PR 13373 at commit [`fc25f72`](https://github.com/apache/spark/commit/fc25f7269adb554ba2e01a92c19eb2ff3bf1ee93).



[GitHub] spark issue #13259: [SPARK-15480][UI][Streaming]show missed InputInfo in str...

2016-06-06 Thread mwws

Github user mwws commented on the issue:

https://github.com/apache/spark/pull/13259
  
@zsxwing Yes sir, a unit test has been added. 



[GitHub] spark pull request #13527: [SPARK-15782] [CORE] Set spark.jars system proper...

2016-06-06 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13527#discussion_r66000637
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -444,6 +444,7 @@ object SparkSubmit {
       OptionAssigner(args.deployMode, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
         sysProp = "spark.submit.deployMode"),
       OptionAssigner(args.name, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES, sysProp = "spark.app.name"),
+      OptionAssigner(args.jars, ALL_CLUSTER_MGRS, CLIENT, sysProp = "spark.jars"),
--- End diff --

> How should Spark behave when a user both sets spark.jars and
> spark.yarn.dist.jars? Do we want to set union them?

Yes, I think that should be the expected behavior. There might be some 
weird semantics that need to be addressed, though:

- user sets `spark.jars` in conf
- SparkSubmitArguments sets `args.jars` to value of `spark.jars`
- YARN-specific code sets `spark.yarn.dist.jars` to value of `args.jars`
- Now both `spark.jars` and `spark.yarn.dist.jars` have the same value.

Perhaps when the YARN translation happens above, `spark.jars` should be 
explicitly unset.

> In SparkConf there is a setJars() method that sets spark.jars

I think it's ok to leave that as is. It could be done as an optimization, 
but it's not necessarily related to this bug so let's leave that for a 
different change, if desired.

> Does implementing option 2 resolve the issue completely?

I did some quick testing and it seems to work. You just need to make sure 
the code works regardless of which option is set. The classpath for the repl 
should be the concatenation of both options.

Optimally the repl would only look at the YARN-specific options when 
running in YARN mode, but I think that code runs too early in the process for 
Spark to already know whether it will use YARN or not.

Also remember you need to change both the 2.10 and 2.11 repl code. Might be 
easier if you make it a helper method in `Utils.scala`, for example.
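
As a rough sketch of what such a helper might compute, the following merges two comma-separated jar lists, skips missing values and drops duplicates (which covers the case above where both options end up with the same value). The class `JarListSketch` and the method `unionJars` are hypothetical names, not existing Spark utilities, and the sketch is in Java rather than Scala only to stay self-contained.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration; not an existing method in Spark's Utils.
class JarListSketch {
    // Union of two comma-separated jar lists, preserving first-seen order.
    // Null or empty inputs are skipped.
    static String unionJars(String sparkJars, String yarnDistJars) {
        Set<String> jars = new LinkedHashSet<>();
        for (String list : new String[] {sparkJars, yarnDistJars}) {
            if (list == null) {
                continue;
            }
            for (String jar : list.split(",")) {
                String trimmed = jar.trim();
                if (!trimmed.isEmpty()) {
                    jars.add(trimmed);
                }
            }
        }
        return String.join(",", jars);
    }

    public static void main(String[] args) {
        System.out.println(unionJars("a.jar,b.jar", "b.jar,c.jar")); // prints a.jar,b.jar,c.jar
    }
}
```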



[GitHub] spark pull request #12258: [SPARK-14485][CORE] ignore task finished for exec...

2016-06-06 Thread zhonghaihua

Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/12258#discussion_r66000194
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -343,17 +343,31 @@ private[spark] class TaskSchedulerImpl(
         }
         taskIdToTaskSetManager.get(tid) match {
           case Some(taskSet) =>
+            var executorId: String = null
             if (TaskState.isFinished(state)) {
               taskIdToTaskSetManager.remove(tid)
               taskIdToExecutorId.remove(tid).foreach { execId =>
+                executorId = execId
                 if (executorIdToTaskCount.contains(execId)) {
                   executorIdToTaskCount(execId) -= 1
                 }
               }
             }
             if (state == TaskState.FINISHED) {
+              // In some case, executor has already removed by driver for heartbeats timeout, but
+              // at sometime, before executor killed  by cluster, the task of running on this
+              // executor is finished and return task success state to driver. However, this kinds
+              // of task should be ignored, because the task on this executor is already re-queued
+              // by driver. For more details, can check in SPARK-14485.
               taskSet.removeRunningTask(tid)
-              taskResultGetter.enqueueSuccessfulTask(taskSet, tid, serializedData)
+              if (executorId != null && !executorIdToTaskCount.contains(executorId)) {
+                logInfo(
+                  ("Ignoring update with state %s for TID %s because its executor has already " +
--- End diff --

Yeah, thanks a lot.



[GitHub] spark pull request #12258: [SPARK-14485][CORE] ignore task finished for exec...

2016-06-06 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12258#discussion_r65999829
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -343,17 +343,31 @@ private[spark] class TaskSchedulerImpl(
         }
         taskIdToTaskSetManager.get(tid) match {
           case Some(taskSet) =>
+            var executorId: String = null
             if (TaskState.isFinished(state)) {
               taskIdToTaskSetManager.remove(tid)
               taskIdToExecutorId.remove(tid).foreach { execId =>
+                executorId = execId
                 if (executorIdToTaskCount.contains(execId)) {
                   executorIdToTaskCount(execId) -= 1
                 }
               }
             }
             if (state == TaskState.FINISHED) {
+              // In some case, executor has already removed by driver for heartbeats timeout, but
+              // at sometime, before executor killed  by cluster, the task of running on this
+              // executor is finished and return task success state to driver. However, this kinds
+              // of task should be ignored, because the task on this executor is already re-queued
+              // by driver. For more details, can check in SPARK-14485.
               taskSet.removeRunningTask(tid)
-              taskResultGetter.enqueueSuccessfulTask(taskSet, tid, serializedData)
+              if (executorId != null && !executorIdToTaskCount.contains(executorId)) {
+                logInfo(
+                  ("Ignoring update with state %s for TID %s because its executor has already " +
--- End diff --

There are lots of places where we could use interpolation instead of format 
strings. But there's no need to go around and modify existing, working code; 
let's first just try to enforce that rule for new code.
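
For readers following the quoted diff, the check it adds can be sketched with two plain maps. This is a simplified illustration in Java, not the scheduler's real code: the class `LostExecutorSketch` and the method `acceptFinished` are hypothetical, and only the two maps are named after fields that appear in the diff.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified illustration of the check in the quoted diff; not TaskSchedulerImpl itself.
class LostExecutorSketch {
    final Map<Long, String> taskIdToExecutorId = new HashMap<>();
    final Map<String, Integer> executorIdToTaskCount = new HashMap<>();

    // Returns true if a FINISHED update for `tid` should be processed,
    // false if it should be ignored because its executor was already removed.
    boolean acceptFinished(long tid) {
        String execId = taskIdToExecutorId.remove(tid);
        if (execId == null) {
            return true; // no recorded executor: mirrors the diff's null check
        }
        if (!executorIdToTaskCount.containsKey(execId)) {
            return false; // executor already removed, so the task was re-queued
        }
        executorIdToTaskCount.merge(execId, -1, Integer::sum); // one fewer running task
        return true;
    }

    public static void main(String[] args) {
        LostExecutorSketch s = new LostExecutorSketch();
        s.taskIdToExecutorId.put(1L, "exec-1");
        s.executorIdToTaskCount.put("exec-1", 1);
        s.taskIdToExecutorId.put(2L, "exec-2"); // exec-2 has no entry: treated as removed
        System.out.println(s.acceptFinished(1L)); // prints true
        System.out.println(s.acceptFinished(2L)); // prints false
    }
}
```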



[GitHub] spark issue #13535: [SPARK-15792][SQL] Allows operator to change the verbosi...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13535
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60088/



[GitHub] spark issue #13535: [SPARK-15792][SQL] Allows operator to change the verbosi...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13535
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13535: [SPARK-15792][SQL] Allows operator to change the verbosi...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13535
  
**[Test build #60088 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60088/consoleFull)** for PR 13535 at commit [`1ed295f`](https://github.com/apache/spark/commit/1ed295f12b3da10c0a2ffed69d19ed1ba2f0e010).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13537: [SPARK-15794] Should truncate toString() of very wide sc...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13537
  
**[Test build #60093 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60093/consoleFull)** for PR 13537 at commit [`17f98d7`](https://github.com/apache/spark/commit/17f98d76aec40bc7c6b8c46925d4013f9bccd639).



[GitHub] spark issue #13537: [SPARK-15794] Should truncate toString() of very wide sc...

2016-06-06 Thread ericl

Github user ericl commented on the issue:

https://github.com/apache/spark/pull/13537
  
cc @JoshRosen



[GitHub] spark pull request #13537: [SPARK-15794] Should truncate toString() of very ...

2016-06-06 Thread ericl

GitHub user ericl opened a pull request:

https://github.com/apache/spark/pull/13537

[SPARK-15794] Should truncate toString() of very wide schemas

## What changes were proposed in this pull request?

With very wide tables, e.g. thousands of fields, the output is unreadable
and often causes OOMs due to inefficient string processing. This truncates all
struct and operator field lists to a user-configurable threshold to limit the
performance and readability impact.

It would also be nice to optimize string generation to avoid this sort of
O(n^2) slowdown entirely (i.e. use StringBuilder everywhere, including
expressions), but this is probably too large a change for 2.0 at this point.
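
As a rough illustration of the truncation idea described above (not the code in this patch), the following Python sketch joins a field list and elides everything past a threshold. The function name `truncated_string`, its parameters, the default of 25, and the wording of the elision marker are all hypothetical and chosen only for this example.

```python
def truncated_string(fields, start="[", sep=", ", end="]", max_fields=25):
    """Join `fields` into one string, eliding entries past `max_fields`.

    Hypothetical helper for illustration only; the names and the marker
    text are assumptions, not Spark's actual API.
    """
    items = [str(f) for f in fields]
    if len(items) > max_fields:
        # Keep one slot for the marker so at most `max_fields` entries print.
        kept = max(max_fields - 1, 0)
        dropped = len(items) - kept
        items = items[:kept] + [f"... {dropped} more fields"]
    # A single join builds the result in one pass instead of repeatedly
    # concatenating ever-longer intermediate strings.
    return start + sep.join(items) + end


print(truncated_string(range(10), max_fields=3))
```

With a cap in place, the cost of rendering a schema is bounded by the threshold rather than by the number of fields, which is presumably why the benchmark below scales with `numFields`.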

## How was this patch tested?

Added a microbenchmark that covers this case particularly well. I also ran 
the microbenchmark while varying the truncation threshold.

```
numFields = 5
wide shallowly nested struct field r/w:  Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
2000 wide x 50 rows (write in-mem)             2336 / 2558         0.0       23364.4       0.1X

numFields = 25
wide shallowly nested struct field r/w:  Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
2000 wide x 50 rows (write in-mem)             4237 / 4465         0.0       42367.9       0.1X

numFields = 100
wide shallowly nested struct field r/w:  Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
2000 wide x 50 rows (write in-mem)           10458 / 11223         0.0      104582.0       0.0X

numFields = Infinity
wide shallowly nested struct field r/w:  Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
[info]   java.lang.OutOfMemoryError: Java heap space
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ericl/spark truncated-string

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13537.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13537


commit d16e0f3e22287a7f3779ed24239d84179602e30a
Author: Eric Liang 
Date:   2016-06-07T00:56:06Z

truncate strings

commit f4f4368d3550b864c6286ce04770990b41c6741c
Author: Eric Liang 
Date:   2016-06-07T01:37:13Z

Mon Jun  6 18:37:13 PDT 2016

commit 17f98d76aec40bc7c6b8c46925d4013f9bccd639
Author: Eric Liang 
Date:   2016-06-07T01:43:24Z

Mon Jun  6 18:43:24 PDT 2016





[GitHub] spark issue #13529: [SPARK-15632][SQL]Typed Filter should NOT change the Dat...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13529
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60087/
Test PASSed.



[GitHub] spark issue #13529: [SPARK-15632][SQL]Typed Filter should NOT change the Dat...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13529
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13529: [SPARK-15632][SQL]Typed Filter should NOT change the Dat...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13529
  
**[Test build #60087 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60087/consoleFull)** for PR 13529 at commit [`5e899f6`](https://github.com/apache/spark/commit/5e899f61790dcfaafcfe896c00cb01c486dd57d0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types as option in ...

2016-06-06 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13517
  
**[Test build #60092 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60092/consoleFull)** for PR 13517 at commit [`eb0031a`](https://github.com/apache/spark/commit/eb0031af138ed40a8ede9329e10d5fe0ea7e7e8c).



[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types as option in ...

2016-06-06 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13517
  
retest this please



[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types as option in ...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13517
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60090/
Test FAILed.



[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types as option in ...

2016-06-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13517
  
Merged build finished. Test FAILed.


