[GitHub] spark issue #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should not coll...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18924
  
**[Test build #82506 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82506/testReport)**
 for PR 18924 at commit 
[`a81dae5`](https://github.com/apache/spark/commit/a81dae574f2085ec390effd1b9b1962970f00239).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19082
  
**[Test build #82502 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82502/testReport)**
 for PR 19082 at commit 
[`1880dfd`](https://github.com/apache/spark/commit/1880dfdfedbdef11d39cb092202a6bc7db95e374).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19082
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19082
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82502/
Test PASSed.


---




[GitHub] spark issue #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on Window...

2017-10-06 Thread jsnowacki
Github user jsnowacki commented on the issue:

https://github.com/apache/spark/pull/19370
  
@HyukjinKwon Commits squashed into one as you requested.


---




[GitHub] spark issue #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on Window...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19370
  
**[Test build #82510 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82510/testReport)**
 for PR 19370 at commit 
[`aec49a0`](https://github.com/apache/spark/commit/aec49a0f3027a7e2c0c83339232a37926db1d2dc).


---




[GitHub] spark issue #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on Window...

2017-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19370
  
Yup, it looks like it is triggering fine - 
https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1822-master
- although I wonder why the check mark does not appear. I think it is not specific 
to this PR but rather AppVeyor itself.


---




[GitHub] spark issue #19440: [SPARK-21871][SQL] Fix infinite loop when bytecode size ...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19440
  
Thanks! Merged to master.


---




[GitHub] spark issue #19442: [SPARK-8515][ML][WIP] Improve ML Attribute API

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19442
  
**[Test build #82500 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82500/testReport)**
 for PR 19442 at commit 
[`de0aa76`](https://github.com/apache/spark/commit/de0aa76199141255258d9d5b12a0d31b1758c6f1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19442: [SPARK-8515][ML][WIP] Improve ML Attribute API

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19442
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82500/
Test FAILed.


---




[GitHub] spark issue #19442: [SPARK-8515][ML][WIP] Improve ML Attribute API

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19442
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18732
  
**[Test build #82501 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82501/testReport)**
 for PR 18732 at commit 
[`20fb1fe`](https://github.com/apache/spark/commit/20fb1fe9cbf033d73ecf2851f9cb1dc94f41fb3e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18732
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82501/
Test PASSed.


---




[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18732
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #19444: [SPARK-22214][SQL] Refactor the list hive partiti...

2017-10-06 Thread jiangxb1987
GitHub user jiangxb1987 opened a pull request:

https://github.com/apache/spark/pull/19444

[SPARK-22214][SQL] Refactor the list hive partitions code

## What changes were proposed in this pull request?

In this PR we make a few changes to the list-Hive-partitions code, to make 
it more extensible. The following changes are made:
1. In `HiveClientImpl.getPartitions()`, call `client.getPartitions` instead 
of `shim.getAllPartitions` when `spec` is empty;
2. In `HiveTableScanExec`, we previously always called `listPartitionsByFilter` 
when the config `metastorePartitionPruning` is enabled, but we should instead 
call `listPartitions` when `partitionPruningPred` is empty;
3. In `HiveTableScanExec`, use `sessionCatalog` instead of 
`SharedState.externalCatalog`.
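The dispatch logic in points 1 and 2 can be sketched roughly as follows. This is a minimal Python illustration with hypothetical names; the actual Spark code is Scala and differs in detail, and `FakeCatalog` exists only to show which call is chosen.

```python
def list_partitions(catalog, table, pruning_preds, prune_enabled=True):
    """Pick the cheapest partition-listing call (illustrative sketch only).

    `catalog`, `list_partitions`, and `list_partitions_by_filter` are
    assumed stand-ins, not actual Spark APIs.
    """
    if prune_enabled and pruning_preds:
        # Push the predicates down only when there is something to push.
        return catalog.list_partitions_by_filter(table, pruning_preds)
    # No predicates: a plain listing avoids the filter machinery entirely.
    return catalog.list_partitions(table)


class FakeCatalog:
    """Tiny stand-in catalog used to demonstrate which call is chosen."""

    def __init__(self):
        self.calls = []

    def list_partitions(self, table):
        self.calls.append("all")
        return ["p=1", "p=2"]

    def list_partitions_by_filter(self, table, preds):
        self.calls.append("filtered")
        return ["p=1"]
```

An empty predicate list falls through to the plain listing even when pruning is enabled, which is exactly the behaviour point 2 asks for.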

## How was this patch tested?

Tested by existing test cases; since this is a code refactor, no regression or 
behavior change is expected.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jiangxb1987/spark hivePartitions

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19444.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19444


commit 8f50c7c47934a8dca662e8e2d5eacbc0b394eaa5
Author: Xingbo Jiang 
Date:   2017-10-06T11:04:29Z

refactor list hive partitions.




---




[GitHub] spark issue #19444: [SPARK-22214][SQL] Refactor the list hive partitions cod...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19444
  
**[Test build #82509 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82509/testReport)**
 for PR 19444 at commit 
[`8f50c7c`](https://github.com/apache/spark/commit/8f50c7c47934a8dca662e8e2d5eacbc0b394eaa5).


---




[GitHub] spark pull request #19445: Dataset select all columns

2017-10-06 Thread sohum2002
GitHub user sohum2002 opened a pull request:

https://github.com/apache/spark/pull/19445

Dataset select all columns

The two proposed functions help select all the columns in a Dataset except 
for the given columns.
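The proposed behaviour amounts to filtering a column list; a minimal Python sketch (the helper name below is made up, not the proposed Dataset API):

```python
def select_all_except(columns, exclude):
    """Return `columns` with every name in `exclude` removed, order kept."""
    excluded = set(exclude)
    return [c for c in columns if c not in excluded]
```

In PySpark the same effect is already available as `df.select([c for c in df.columns if c not in excluded])`.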

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sohum2002/spark dataset_selectAllColumns

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19445.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19445


commit d35a1268d784a268e6137eff54eb8f83c981a289
Author: Burak Yavuz 
Date:   2017-02-01T00:52:53Z

[SPARK-19378][SS] Ensure continuity of stateOperator and eventTime metrics 
even if there is no new data in trigger

In Structured Streaming, if a trigger was skipped because no new data 
arrived, we suddenly report nothing for the `stateOperator` metrics. We could, 
however, easily report the metrics from `lastExecution` to ensure continuity of 
metrics.

Regression test in `StreamingQueryStatusAndProgressSuite`

Author: Burak Yavuz 

Closes #16716 from brkyvz/state-agg.

(cherry picked from commit 081b7addaf9560563af0ce25912972e91a78cee6)
Signed-off-by: Tathagata Das 

commit 61cdc8c7cc8cfc57646a30da0e0df874a14e3269
Author: Zheng RuiFeng 
Date:   2017-02-01T13:27:20Z

[SPARK-19410][DOC] Fix broken links in ml-pipeline and ml-tuning

## What changes were proposed in this pull request?
Fix broken links in ml-pipeline and ml-tuning
``  ->   ``

## How was this patch tested?
manual tests

Author: Zheng RuiFeng 

Closes #16754 from zhengruifeng/doc_api_fix.

(cherry picked from commit 04ee8cf633e17b6bf95225a8dd77bf2e06980eb3)
Signed-off-by: Sean Owen 

commit f946464155bb907482dc8d8a1b0964a925d04081
Author: Devaraj K 
Date:   2017-02-01T20:55:11Z

[SPARK-19377][WEBUI][CORE] Killed tasks should have the status as KILLED

## What changes were proposed in this pull request?

Copying of the killed status was missing when building the newTaskInfo 
object, which drops unnecessary details to reduce memory usage. This patch 
copies the killed status into the newTaskInfo object, which corrects the 
displayed status to KILLED in the Web UI.

## How was this patch tested?

Current behaviour of displaying tasks in the stage UI page:

| Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | GC Time | Input Size / Records | Write Time | Shuffle Write Size / Records | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 143 | 10 | 0 | SUCCESS | NODE_LOCAL | 6 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |
| 156 | 11 | 0 | SUCCESS | NODE_LOCAL | 5 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |

Web UI display after applying the patch:

| Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | GC Time | Input Size / Records | Write Time | Shuffle Write Size / Records | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 143 | 10 | 0 | KILLED | NODE_LOCAL | 6 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |
| 156 | 11 | 0 | KILLED | NODE_LOCAL | 5 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |

Author: Devaraj K 

Closes #16725 from devaraj-kavali/SPARK-19377.

(cherry picked from commit df4a27cc5cae8e251ba2a883bcc5f5ce9282f649)
Signed-off-by: Shixiong Zhu 

commit 7c23bd49e826fc2b7f132ffac2e55a71905abe96
Author: Shixiong Zhu 
Date:   2017-02-02T05:39:21Z

[SPARK-19432][CORE] Fix an unexpected failure when connecting timeout

## What changes were proposed in this pull request?

When connecting timeout, `ask` may fail with a confusing message:

```
17/02/01 23:15:19 INFO Worker: Connecting to master ...
java.lang.IllegalArgumentException: requirement failed: TransportClient has not yet been set.
at scala.Predef$.require(Predef.scala:224)
at
```

[GitHub] spark pull request #19445: Dataset select all columns

2017-10-06 Thread sohum2002
Github user sohum2002 closed the pull request at:

https://github.com/apache/spark/pull/19445


---




[GitHub] spark pull request #19444: [SPARK-22214][SQL] Refactor the list hive partiti...

2017-10-06 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19444#discussion_r143168926
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -638,12 +638,14 @@ private[hive] class HiveClientImpl(
   table: CatalogTable,
   spec: Option[TablePartitionSpec]): Seq[CatalogTablePartition] = 
withHiveState {
 val hiveTable = toHiveTable(table, Some(userName))
-val parts = spec match {
-  case None => shim.getAllPartitions(client, 
hiveTable).map(fromHivePartition)
--- End diff --

After this change, `HiveShim.getAllPartitions` is only used to support 
`HiveShim.getPartitionsByFilter` for Hive 0.12; we may consider completely 
removing the method in the future.


---




[GitHub] spark pull request #19446: Dataset optimization

2017-10-06 Thread sohum2002
GitHub user sohum2002 opened a pull request:

https://github.com/apache/spark/pull/19446

Dataset optimization

The two proposed functions help select all the columns in a Dataset except 
for the given columns.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sohum2002/spark dataset_optimization

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19446.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19446


commit 0e80ecae300f3e2033419b2d98da8bf092c105bb
Author: Wenchen Fan 
Date:   2017-07-10T05:53:27Z

[SPARK-21100][SQL][FOLLOWUP] cleanup code and add more comments for 
Dataset.summary

## What changes were proposed in this pull request?

Some code cleanup and adding comments to make the code more readable. 
Changed the way to generate result rows, to be more clear.

## How was this patch tested?

existing tests

Author: Wenchen Fan 

Closes #18570 from cloud-fan/summary.

commit 96d58f285bc98d4c2484150eefe7447db4784a86
Author: Eric Vandenberg 
Date:   2017-07-10T06:40:20Z

[SPARK-21219][CORE] Task retry occurs on same executor due to race 
condition with blacklisting

## What changes were proposed in this pull request?

There's a race condition in the current TaskSetManager where a failed task 
is added for retry (addPendingTask) and can asynchronously be assigned to an 
executor *prior* to the blacklist state being updated 
(updateBlacklistForFailedTask); as a result, the task might re-execute on the 
same executor. This is particularly problematic if the executor is shutting 
down, since the retry task immediately becomes a lost task 
(ExecutorLostFailure). Another side effect is that the actual failure reason 
gets obscured by the retry task, which never actually executed. There are 
sample logs showing the issue in 
https://issues.apache.org/jira/browse/SPARK-21219

The fix is to change the ordering of the addPendingTask and 
updateBlacklistForFailedTask calls in TaskSetManager.handleFailedTask.
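The ordering issue can be illustrated with a toy scheduler (hypothetical names; not the real TaskSetManager): blacklisting the failed executor before re-queuing the task guarantees the retry cannot be assigned back to it.

```python
class ToyScheduler:
    """Minimal sketch of the fixed call ordering described above."""

    def __init__(self):
        self.blacklisted = set()
        self.pending = []

    def handle_failed_task(self, task, executor):
        # Fix: update the blacklist *before* the task becomes schedulable
        # again, closing the window where a retry could land on `executor`.
        self.blacklisted.add(executor)
        self.pending.append(task)

    def assign(self, executor):
        # Hand out the next pending task unless the executor is blacklisted.
        if executor in self.blacklisted or not self.pending:
            return None
        return self.pending.pop(0)
```

With the old ordering (append first, blacklist second) an assignment arriving between the two calls could hand the retry to the same executor.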

## How was this patch tested?

Implemented a unit test that verifies the task is blacklisted before it is 
added to the pending task list. The unit test fails without the fix and passes 
with it.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.

Author: Eric Vandenberg 

Closes #18427 from ericvandenbergfb/blacklistFix.

commit c444d10868c808f4ae43becd5506bf944d9c2e9b
Author: Dongjoon Hyun 
Date:   2017-07-10T06:46:47Z

[MINOR][DOC] Remove obsolete `ec2-scripts.md`

## What changes were proposed in this pull request?

Since this document became obsolete, we had better remove this for Apache 
Spark 2.3.0. The original document is removed via SPARK-12735 on January 2016, 
and currently it's just redirection page. The only reference in Apache Spark 
website will go directly to the destination in 
https://github.com/apache/spark-website/pull/54.

## How was this patch tested?

N/A. This is a removal of documentation.

Author: Dongjoon Hyun 

Closes #18578 from dongjoon-hyun/SPARK-REMOVE-EC2.

commit 647963a26a2d4468ebd9b68111ebe68bee501fde
Author: Takeshi Yamamuro 
Date:   2017-07-10T07:58:34Z

[SPARK-20460][SQL] Make it more consistent to handle column name duplication

## What changes were proposed in this pull request?
This PR makes it more consistent to handle column name duplication. In the 
current master, error handling differs when hitting column name 
duplication:
```
// json
scala> val schema = StructType(StructField("a", IntegerType) :: StructField("a", IntegerType) :: Nil)
scala> Seq("""{"a":1, "a":1}""").toDF().coalesce(1).write.mode("overwrite").text("/tmp/data")
scala> spark.read.format("json").schema(schema).load("/tmp/data").show
org.apache.spark.sql.AnalysisException: Reference 'a' is ambiguous, could be: a#12, a#13.;
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:287)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:181)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1.apply(LogicalPlan.scala:153)

scala> spark.read.format("json").load("/tmp/data").show
org.apache.spark.sql.AnalysisException: Duplicate column(s) : "a" found, cannot save to JSON format;
  at org.apache.spark.sql.execution.datasources.json.JsonDataSource.checkConstraints(JsonDataSource.scala:81)
```

[GitHub] spark pull request #19446: Dataset optimization

2017-10-06 Thread sohum2002
Github user sohum2002 closed the pull request at:

https://github.com/apache/spark/pull/19446


---




[GitHub] spark issue #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should not coll...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18924
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should not coll...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18924
  
**[Test build #82505 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82505/testReport)**
 for PR 18924 at commit 
[`f181496`](https://github.com/apache/spark/commit/f1814965885e0c82a71287f5e5912e11b126b8a4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should not coll...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18924
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82505/
Test PASSed.


---




[GitHub] spark issue #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on Window...

2017-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19370
  
@jsnowacki, would you mind squashing those commits into a single one so that 
we can check whether the squashed commit, containing the changes in 
`appveyor.yml` and `*.cmd`, actually triggers the AppVeyor test?


---




[GitHub] spark issue #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on Window...

2017-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19370
  
Otherwise, looks good to me.


---




[GitHub] spark pull request #19399: [SPARK-22175][WEB-UI] Add status column to histor...

2017-10-06 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19399#discussion_r143114423
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -487,8 +487,10 @@ private[history] class FsHistoryProvider(conf: 
SparkConf, clock: Clock)
   protected def mergeApplicationListing(fileStatus: FileStatus): Unit = {
 val eventsFilter: ReplayEventsFilter = { eventString =>
   eventString.startsWith(APPL_START_EVENT_PREFIX) ||
-eventString.startsWith(APPL_END_EVENT_PREFIX) ||
-eventString.startsWith(LOG_START_EVENT_PREFIX)
+  eventString.startsWith(APPL_END_EVENT_PREFIX) ||
+  eventString.startsWith(LOG_START_EVENT_PREFIX) ||
+  eventString.startsWith(JOB_START_EVENT_PREFIX) ||
+  eventString.startsWith(JOB_END_EVENT_PREFIX)
--- End diff --

Actually I have not done any benchmark testing for this modification, but it 
has been tested on our production cluster.
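The diff above extends a prefix-based replay filter; the idea can be sketched as a plain predicate. The prefix strings below are assumptions about the JSON event-log format, not taken from the patch:

```python
# Prefixes of the JSON event-log lines worth replaying for the listing
# (assumed values; the real constants live in FsHistoryProvider).
EVENT_PREFIXES = (
    '{"Event":"SparkListenerApplicationStart"',
    '{"Event":"SparkListenerApplicationEnd"',
    '{"Event":"SparkListenerLogStart"',
    '{"Event":"SparkListenerJobStart"',
    '{"Event":"SparkListenerJobEnd"',
)


def events_filter(event_string):
    # str.startswith accepts a tuple, so one call covers all prefixes;
    # everything else (tasks, stages, ...) is skipped to keep replay cheap.
    return event_string.startswith(EVENT_PREFIXES)
```

The benchmark question in the thread is about how much adding two more prefixes widens the set of replayed lines on large logs.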


---




[GitHub] spark issue #19399: [SPARK-22175][WEB-UI] Add status column to history page

2017-10-06 Thread caneGuy
Github user caneGuy commented on the issue:

https://github.com/apache/spark/pull/19399
  
OK, I will wait for SPARK-18085 and think about the log status more 
carefully. @squito @ajbozarth Thanks.


---




[GitHub] spark issue #19442: [SPARK-8515][ML][WIP] Improve ML Attribute API

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19442
  
**[Test build #82495 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82495/testReport)**
 for PR 19442 at commit 
[`77ced95`](https://github.com/apache/spark/commit/77ced957e7be2169ac0c59c76f60ab9d4fcac3ef).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19442: [SPARK-8515][ML][WIP] Improve ML Attribute API

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19442
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19440: [SPARK-21871][SQL] Fix infinite loop when bytecode size ...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19440
  
**[Test build #82494 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82494/testReport)**
 for PR 19440 at commit 
[`b8eb6a0`](https://github.com/apache/spark/commit/b8eb6a0e45ceb9592fbbf32a236aa17cd3e5dac0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-06 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19082
  
sure, I will look into this.


---




[GitHub] spark issue #19442: [SPARK-8515][ML][WIP] Improve ML Attribute API

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19442
  
**[Test build #82500 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82500/testReport)**
 for PR 19442 at commit 
[`de0aa76`](https://github.com/apache/spark/commit/de0aa76199141255258d9d5b12a0d31b1758c6f1).


---




[GitHub] spark pull request #19250: [SPARK-12297] Table timezone correction for Times...

2017-10-06 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19250#discussion_r143122396
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -1015,6 +1020,10 @@ object DateTimeUtils {
 guess
   }
 
+  def convertTz(ts: SQLTimestamp, fromZone: String, toZone: String): 
SQLTimestamp = {
+convertTz(ts, getTimeZone(fromZone), getTimeZone(toZone))
--- End diff --

performance is going to suck here
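The concern is that resolving a timezone ID from a string on every call is expensive in a per-row code path. One common remedy, sketched here in Python with assumed names (the actual fix would live in Spark's Scala `DateTimeUtils`), is to memoize the string-to-timezone lookup:

```python
from datetime import datetime
from functools import lru_cache
from zoneinfo import ZoneInfo


@lru_cache(maxsize=None)
def get_time_zone(zone_id):
    # Cache the comparatively expensive ID -> timezone resolution so that
    # per-row conversions do not redo it for every value.
    return ZoneInfo(zone_id)


def convert_tz(ts, from_zone, to_zone):
    # Interpret the naive timestamp in `from_zone`, re-render it in
    # `to_zone`, and return it as a naive timestamp again.
    localized = ts.replace(tzinfo=get_time_zone(from_zone))
    return localized.astimezone(get_time_zone(to_zone)).replace(tzinfo=None)
```

With the cache, repeated calls with the same zone IDs hit a dictionary lookup instead of re-parsing the ID each time.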


---




[GitHub] spark pull request #19250: [SPARK-12297] Table timezone correction for Times...

2017-10-06 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19250#discussion_r143122317
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -1213,6 +1213,71 @@ case class ToUTCTimestamp(left: Expression, right: 
Expression)
 }
 
 /**
+ * This modifies a timestamp to show how the display time changes going 
from one timezone to
+ * another, for the same instant in time.
+ *
+ * We intentionally do not provide an ExpressionDescription as this is not 
meant to be exposed to
+ * users, its only used for internal conversions.
+ */
+private[spark] case class TimestampTimezoneCorrection(
--- End diff --

Do we need a whole expression for this? Can't we just reuse existing 
expressions? It's just simple arithmetic, isn't it?


---




[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2017-10-06 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19340
  
I'm kind of neutral given the complexity of adding this, but maybe it's the 
least complexity you can get away with. @hhbyyh was adding something related: 
https://issues.apache.org/jira/browse/SPARK-22195


---




[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparision should respect case-...

2017-10-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/18460
  
Thank you for the review, @gatorsmile. The following is a result from Hive 
1.2.2.
```sql
hive> CREATE TABLE T AS SELECT named_struct('a',1);
hive> CREATE TABLE S AS SELECT named_struct('A',1);
hive> SELECT * FROM T UNION ALL SELECT * FROM S;
{"a":1}
{"a":1}
```


---




[GitHub] spark issue #19442: [SPARK-8515][ML][WIP] Improve ML Attribute API

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19442
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19442: [SPARK-8515][ML][WIP] Improve ML Attribute API

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19442
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82503/
Test PASSed.


---




[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparision should respect case-...

2017-10-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/18460
  
@gatorsmile, I updated the previous comment with more examples.


---




[GitHub] spark issue #19442: [SPARK-8515][ML][WIP] Improve ML Attribute API

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19442
  
**[Test build #82503 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82503/testReport)**
 for PR 19442 at commit 
[`de0aa76`](https://github.com/apache/spark/commit/de0aa76199141255258d9d5b12a0d31b1758c6f1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should not coll...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18924
  
**[Test build #82505 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82505/testReport)**
 for PR 18924 at commit 
[`f181496`](https://github.com/apache/spark/commit/f1814965885e0c82a71287f5e5912e11b126b8a4).


---




[GitHub] spark issue #19443: [SPARK-22212][SQL][PySpark] Some SQL functions in Python...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19443
  
**[Test build #82507 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82507/testReport)**
 for PR 19443 at commit 
[`9e52c63`](https://github.com/apache/spark/commit/9e52c6380ae8787d20e3442cfaf42cfb70caf4dc).


---




[GitHub] spark issue #19443: [SPARK-22212][SQL][PySpark] Some SQL functions in Python...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19443
  
**[Test build #82507 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82507/testReport)**
 for PR 19443 at commit 
[`9e52c63`](https://github.com/apache/spark/commit/9e52c6380ae8787d20e3442cfaf42cfb70caf4dc).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19443: [SPARK-22212][SQL][PySpark] Some SQL functions in Python...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19443
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82507/
Test FAILed.


---




[GitHub] spark issue #19443: [SPARK-22212][SQL][PySpark] Some SQL functions in Python...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19443
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on Window...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19370
  
**[Test build #82508 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82508/testReport)**
 for PR 19370 at commit 
[`5f52c79`](https://github.com/apache/spark/commit/5f52c791cda81323ac985ce18796ea4131c30923).


---




[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-06 Thread akopich
Github user akopich commented on a diff in the pull request:

https://github.com/apache/spark/pull/18924#discussion_r143159334
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -462,31 +463,60 @@ final class OnlineLDAOptimizer extends LDAOptimizer {
 val expElogbetaBc = batch.sparkContext.broadcast(expElogbeta)
 val alpha = this.alpha.asBreeze
 val gammaShape = this.gammaShape
-
-val stats: RDD[(BDM[Double], List[BDV[Double]])] = batch.mapPartitions { docs =>
+val optimizeDocConcentration = this.optimizeDocConcentration
+// If and only if optimizeDocConcentration is set true,
+// we calculate logphat in the same pass as other statistics.
+// No calculation of loghat happens otherwise.
+val logphatPartOptionBase = () => if (optimizeDocConcentration) {
+Some(BDV.zeros[Double](k))
+  } else {
+None
+  }
+
+val stats: RDD[(BDM[Double], Option[BDV[Double]], Long)] = batch.mapPartitions { docs =>
   val nonEmptyDocs = docs.filter(_._2.numNonzeros > 0)
 
   val stat = BDM.zeros[Double](k, vocabSize)
-  var gammaPart = List[BDV[Double]]()
+  val logphatPartOption = logphatPartOptionBase()
+  var nonEmptyDocCount : Long = 0L
   nonEmptyDocs.foreach { case (_, termCounts: Vector) =>
+nonEmptyDocCount += 1
val (gammad, sstats, ids) = OnlineLDAOptimizer.variationalTopicInference(
   termCounts, expElogbetaBc.value, alpha, gammaShape, k)
-stat(::, ids) := stat(::, ids).toDenseMatrix + sstats
-gammaPart = gammad :: gammaPart
+stat(::, ids) := stat(::, ids) + sstats
logphatPartOption.foreach(_ += LDAUtils.dirichletExpectation(gammad))
   }
-  Iterator((stat, gammaPart))
-}.persist(StorageLevel.MEMORY_AND_DISK)
-val statsSum: BDM[Double] = stats.map(_._1).treeAggregate(BDM.zeros[Double](k, vocabSize))(
-  _ += _, _ += _)
-val gammat: BDM[Double] = breeze.linalg.DenseMatrix.vertcat(
-  stats.map(_._2).flatMap(list => list).collect().map(_.toDenseMatrix): _*)
-stats.unpersist()
+  Iterator((stat, logphatPartOption, nonEmptyDocCount))
+}
+
+val elementWiseSum = (u : (BDM[Double], Option[BDV[Double]], Long),
v : (BDM[Double], Option[BDV[Double]], Long)) => {
--- End diff --

I see now. Thank you. But it seems the style guide suggests moving both 
of the parameters to the new line. 


---




[GitHub] spark issue #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should not coll...

2017-10-06 Thread akopich
Github user akopich commented on the issue:

https://github.com/apache/spark/pull/18924
  
So shall we ping @jkbradley, then?


---




[GitHub] spark issue #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should not coll...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18924
  
**[Test build #82506 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82506/testReport)**
 for PR 18924 at commit 
[`a81dae5`](https://github.com/apache/spark/commit/a81dae574f2085ec390effd1b9b1962970f00239).


---




[GitHub] spark pull request #19443: [SPARK-22212][SQL][PySpark] Some SQL functions in...

2017-10-06 Thread jsnowacki
GitHub user jsnowacki opened a pull request:

https://github.com/apache/spark/pull/19443

[SPARK-22212][SQL][PySpark] Some SQL functions in Python fail with string 
column name

## What changes were proposed in this pull request?

The issue in JIRA: 
[SPARK-22212](https://issues.apache.org/jira/browse/SPARK-22212)

Most of the functions in `pyspark.sql.functions` accept both a column name 
string and a `Column` object. But some functions, like `trim`, 
require passing only a `Column`. See the code below for an explanation.

```
>>> import pyspark.sql.functions as func
>>> df = spark.createDataFrame([tuple(l) for l in "abcde"], ["text"])
>>> df.select(func.trim(df["text"])).show()
+----------+
|trim(text)|
+----------+
|         a|
|         b|
|         c|
|         d|
|         e|
+----------+
>>> df.select(func.trim("text")).show()
[...]
Py4JError: An error occurred while calling z:org.apache.spark.sql.functions.trim. Trace:
py4j.Py4JException: Method trim([class java.lang.String]) does not exist
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:339)
        at py4j.Gateway.invoke(Gateway.java:274)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:748)
```

This is because most of the Python functions map a column name string to a 
`Column` in their Python wrappers, but functions created via 
`_create_function` pass the argument through as is if it is not a `Column`. To 
address this, the few functions that require a column name have been moved to 
`functions_by_column_name` and are created by 
`_create_function_by_column_name`.

Note that this is only a Python-side fix. Some Scala functions still do not 
have a method to call them by string column name.
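The mapping mechanism can be sketched with plain-Python stand-ins (the `Column` class and `_to_java_column` helper below are simplified hypotheticals for illustration, not pyspark's actual internals):

```python
class Column:
    """Minimal stand-in for pyspark.sql.Column."""
    def __init__(self, name):
        self.name = name

def _to_java_column(col):
    # mimic pyspark: convert a column name string into a Column
    # before the call crosses into the JVM
    return col if isinstance(col, Column) else Column(col)

def _create_function(name):
    # functions built this way route every argument through
    # _to_java_column, so both a name string and a Column work
    def _(col):
        return "{}({})".format(name, _to_java_column(col).name)
    return _

trim = _create_function("trim")
assert trim("text") == "trim(text)"
assert trim(Column("text")) == "trim(text)"
```

Routing every argument through a `_to_java_column`-style conversion is what lets a single function definition serve both call styles.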

## How was this patch tested?

Additional Python tests were written to accommodate this. It was tested 
via `UnitTest` in an IDE and the overall `python\run_tests` script.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jsnowacki/spark-1 fix_func_str_to_col

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19443.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19443


commit c5dbd50361a37e9833708dc8985345fbf537e8d9
Author: Jakub Nowacki 
Date:   2017-10-03T07:50:50Z

[SPARK-22212] Fixing string to column mapping in Python functions

commit 9e52c6380ae8787d20e3442cfaf42cfb70caf4dc
Author: Jakub Nowacki 
Date:   2017-10-06T09:07:26Z

[SPARK-22212] Calling functions by string column name fixed and tested




---




[GitHub] spark issue #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on Window...

2017-10-06 Thread jsnowacki
Github user jsnowacki commented on the issue:

https://github.com/apache/spark/pull/19370
  
I've added `- bin/*.cmd` to the AppVeyor file. Please let me know if this 
is sufficient.


---




[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-06 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/18732#discussion_r143313284
  
--- Diff: python/pyspark/sql/group.py ---
@@ -192,7 +193,69 @@ def pivot(self, pivot_col, values=None):
 jgd = self._jgd.pivot(pivot_col)
 else:
 jgd = self._jgd.pivot(pivot_col, values)
-return GroupedData(jgd, self.sql_ctx)
+return GroupedData(jgd, self._df)
+
+@since(2.3)
+def apply(self, udf):
+"""
+Maps each group of the current :class:`DataFrame` using a pandas 
udf and returns the result
+as a :class:`DataFrame`.
+
+The user-defined function should take a `pandas.DataFrame` and 
return another
+`pandas.DataFrame`. For each group, all columns are passed 
together as a `pandas.DataFrame`
+to the user-function and the returned `pandas.DataFrame` are 
combined as a
+:class:`DataFrame`. The returned `pandas.DataFrame` can be 
arbitrary length and its schema
+must match the returnType of the pandas udf.
+
+:param udf: A wrapped udf function returned by 
:meth:`pyspark.sql.functions.pandas_udf`
+
+>>> from pyspark.sql.functions import pandas_udf
+>>> df = spark.createDataFrame(
+... [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
+... ("id", "v"))
+>>> @pandas_udf(returnType=df.schema)
+... def normalize(pdf):
+... v = pdf.v
+... return pdf.assign(v=(v - v.mean()) / v.std())
+>>> df.groupby('id').apply(normalize).show()  # doctest: +SKIP
++---+-------------------+
+| id|                  v|
++---+-------------------+
+|  1|-0.7071067811865475|
+|  1| 0.7071067811865475|
+|  2|-0.8320502943378437|
+|  2|-0.2773500981126146|
+|  2| 1.1094003924504583|
++---+-------------------+
+
+.. seealso:: :meth:`pyspark.sql.functions.pandas_udf`
+
+"""
+from pyspark.sql.functions import pandas_udf
+
+# Columns are special because hasattr always return True
+if isinstance(udf, Column) or not hasattr(udf, 'func') or not udf.vectorized:
+raise ValueError("The argument to apply must be a pandas_udf")
+if not isinstance(udf.returnType, StructType):
+raise ValueError("The returnType of the pandas_udf must be a StructType")
+
+df = self._df
+func = udf.func
+returnType = udf.returnType
--- End diff --

Is it necessary to make all these copies? I could understand copying 
`func` and `columns` because they are used in the wrapped function, but I'm not 
sure `df` and `returnType` need to be copied.
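For context, copying instance attributes into locals before building a closure keeps `self` (and everything hanging off it) out of the function that has to be serialized. A minimal, pyspark-free sketch (class and attribute names are illustrative only):

```python
class Wrapper:
    """Stand-in for an object holding both heavy state and a small attribute."""
    def __init__(self):
        self.big_state = list(range(100000))  # pretend this is costly to ship
        self.return_type = "struct<id:long,v:double>"

    def closure_over_self(self):
        # capturing `self` drags the whole object into the closure
        return lambda: self.return_type

    def closure_over_local(self):
        return_type = self.return_type  # copy the attribute to a local first
        return lambda: return_type

w = Wrapper()
bad, good = w.closure_over_self(), w.closure_over_local()
# the first closure's cell holds the whole Wrapper; the second, just the string
assert any(isinstance(c.cell_contents, Wrapper) for c in bad.__closure__)
assert all(isinstance(c.cell_contents, str) for c in good.__closure__)
assert bad() == good() == "struct<id:long,v:double>"
```

Under that reading, copying `func` and `returnType` to locals is about what gets captured and serialized, not about correctness of the values themselves.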


---




[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19082
  
Sure, I totally agree. We need to know the advantages and possible 
impacts, if any, of merging this PR and #18931. It would be good if @kiszk and 
@rednaxelafx can help review this PR and #18931.


---




[GitHub] spark issue #18931: [SPARK-21717][SQL] Decouple consume functions of physica...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18931
  
**[Test build #82531 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82531/testReport)**
 for PR 18931 at commit 
[`601c225`](https://github.com/apache/spark/commit/601c2251c397b30f2ea9a42f6a23e3636129d5bc).


---




[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19394
  
**[Test build #82532 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82532/testReport)**
 for PR 19394 at commit 
[`56089f5`](https://github.com/apache/spark/commit/56089f5ba65f1d7d9e11b76673bcde3df37cd240).


---




[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18732
  
retest this please


---




[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19082
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82499/
Test FAILed.


---




[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19082
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19082
  
**[Test build #82502 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82502/testReport)**
 for PR 19082 at commit 
[`1880dfd`](https://github.com/apache/spark/commit/1880dfdfedbdef11d39cb092202a6bc7db95e374).


---




[GitHub] spark issue #19452: [SPARK-22136][SS] Evaluate one-sided conditions early in...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19452
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82533/
Test PASSed.


---




[GitHub] spark issue #19452: [SPARK-22136][SS] Evaluate one-sided conditions early in...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19452
  
**[Test build #82533 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82533/testReport)**
 for PR 19452 at commit 
[`8c2a39f`](https://github.com/apache/spark/commit/8c2a39fcb3e425a91d25505ae9d29ba8ac670e0e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class JoinConditionSplitPredicates(`


---




[GitHub] spark issue #19452: [SPARK-22136][SS] Evaluate one-sided conditions early in...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19452
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19443: [SPARK-22212][SQL][PySpark] Some SQL functions in Python...

2017-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19443
  
This might look okay on the Python side, because the fix is minimal and 
does not actually increase complexity much; however, I think we focus on API 
consistency across languages in general. In this sense, we tend 
to avoid adding the variants with string parameters on the Scala side; please see 
https://github.com/apache/spark/pull/18144#issuecomment-304960488, 
https://github.com/apache/spark/pull/18144#issuecomment-304926567 and 
https://github.com/apache/spark/pull/18144#issuecomment-304955155. I am -0 on 
this because the workaround is simple anyway.


---




[GitHub] spark issue #19294: [SPARK-21549][CORE] Respect OutputFormats with no output...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19294
  
**[Test build #82504 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82504/testReport)**
 for PR 19294 at commit 
[`e41abc6`](https://github.com/apache/spark/commit/e41abc65c3ffeaec8c03c0d093a5c5efcd30c17e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19294: [SPARK-21549][CORE] Respect OutputFormats with no output...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19294
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19294: [SPARK-21549][CORE] Respect OutputFormats with no output...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19294
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82504/
Test PASSed.


---




[GitHub] spark issue #19294: [SPARK-21549][CORE] Respect OutputFormats with no output...

2017-10-06 Thread szhem
Github user szhem commented on the issue:

https://github.com/apache/spark/pull/19294
  
@mridulm SQL-related tests were removed.


---




[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-10-06 Thread icexelloss
Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/18664
  
Thanks @gatorsmile for the constructive feedback!

I don't want to make this more complicated, but I also want to make sure we 
are aware that there is also a difference between the Arrow and non-Arrow versions 
when treating array and struct types:

Array:
```
non-Arrow:
In [47]: type(df2.toPandas().array[0])
Out[47]: list

Arrow:
In [45]: type(df2.toPandas().array[0])
Out[45]: numpy.ndarray
```

Struct:
```
Arrow:
In [35]: type(df.toPandas().struct[0])
Out[35]: pyspark.sql.types.Row

non-Arrow:
In [37]: type(df.toPandas().struct[0])
Out[37]: dict
```

I think there should be a high-level doc capturing all the differences between 
the Arrow and non-Arrow versions. 

Unfortunately I cannot commit much time until November, but I am happy to help 
with review and discussion.


---




[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-10-06 Thread icexelloss
Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/18664
  
cc @ueshin


---



