date:20170614

[GitHub] spark issue #18310: [SPARK-21103][SQL] QueryPlanConstraints should be part o...

2017-06-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18310
  
**[Test build #78087 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78087/testReport)**
 for PR 18310 at commit 
[`7040877`](https://github.com/apache/spark/commit/704087798dd4e451fb3bb3caab0cdadd72ae19e5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-14 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18301
  
Because `SQLMetric` stores only a long value, I was using a trick: multiply 
the avg probe by 1000 to get a long.

When preparing the values for the UI, divide the long by 1000 to get a 
float back.

So it's a workaround for the long-based `SQLMetric`, but in the end I didn't 
use it.

Doesn't it sound too hacky to you?
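
For readers following the thread, here is a minimal sketch of the 
multiply-by-1000 trick described above. It is plain Scala with no Spark 
dependency; `FixedPointMetric`, `Scale`, `encode` and `decode` are hypothetical 
names chosen for illustration, not anything in `SQLMetric` or in the PR.

```scala
// Hypothetical helper, not part of Spark: stores a fractional metric
// in a Long-only slot by scaling it to a fixed number of decimals.
object FixedPointMetric {
  // Scale factor from the comment above: three decimal digits.
  val Scale: Long = 1000L

  // Encode the average as a Long, rounding to the nearest 1/1000.
  def encode(avg: Double): Long = math.round(avg * Scale)

  // Decode the stored Long back to a fractional value for display.
  def decode(stored: Long): Double = stored.toDouble / Scale

  def main(args: Array[String]): Unit = {
    val stored = encode(1.9)
    println(s"stored=$stored shown=${decode(stored)}")
  }
}
```

Anything finer than 1/1000 is lost in the round trip, and every reader of the 
stored value has to know the scale factor, which is probably what makes the 
trick feel hacky.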



[GitHub] spark pull request #18308: [SPARK-21099][Spark Core] INFO Log Message Using ...

2017-06-14 Thread jerryshao

Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18308#discussion_r122129884
  
--- Diff: 
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -432,8 +432,10 @@ private[spark] class ExecutorAllocationManager(
 if (testing || executorsRemoved.nonEmpty) {
   executorsRemoved.foreach { removedExecutorId =>
 newExecutorTotal -= 1
+val hasCachedBlocks = 
SparkEnv.get.blockManager.master.hasCachedBlocks(executorId);
--- End diff --

Btw, there is a chance that by the time we query the executor from 
BlockManager, the executor/block manager has already been removed, so we will 
potentially get `false`.



[GitHub] spark pull request #18310: [SPARK-21103][SQL] QueryPlanConstraints should be...

2017-06-14 Thread rxin

GitHub user rxin reopened a pull request:

https://github.com/apache/spark/pull/18310

[SPARK-21103][SQL] QueryPlanConstraints should be part of LogicalPlan

## What changes were proposed in this pull request?
QueryPlanConstraints should be part of LogicalPlan, rather than QueryPlan, 
since the constraint framework is only used for query plan rewriting and not 
for physical planning.

## How was this patch tested?
Should be covered by existing tests, since it is a simple refactoring.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-21103

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18310.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18310


commit 90cff62eaffbbc7859c4c05bd53e58149bff499a
Author: Reynold Xin 
Date:   2017-06-14T07:19:13Z

[SPARK-21092][SQL] Wire SQLConf in logical plan and expressions

commit b03af7ffe0650811aeaa376b639ac16a17e8c4a8
Author: Reynold Xin 
Date:   2017-06-14T07:22:56Z

More docs

commit ea091643ae48183b2a9294a6bc37ee3b991c9226
Author: Reynold Xin 
Date:   2017-06-14T21:35:25Z

Merge with master

commit 14f2b41642cb9adfc04a5aa95a0d0fc231205598
Author: Reynold Xin 
Date:   2017-06-14T23:25:19Z

Update PruneFilter rule.

commit a032106b8c98bd8a0e823d6576114dbdcb6da032
Author: Reynold Xin 
Date:   2017-06-14T23:28:30Z

Properly unset configs.

commit cec78b5cf1ced8322c8cd8e599a3197c50ed49c0
Author: Reynold Xin 
Date:   2017-06-15T01:50:07Z

Update OuterJoinEliminationSuite

commit 40de35c3ac6bdbf2bd2c43b9deca04a5cbdbc4ef
Author: Reynold Xin 
Date:   2017-06-15T02:08:56Z

[SPARK-21103][SQL] QueryPlanConstraints should be part of LogicalPlan





[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-14 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18301
  
also the avg probe probably shouldn't be an integer. at least we should 
show something like 1.9?



[GitHub] spark issue #12147: [SPARK-14361][SQL]Window function exclude clause

2017-06-14 Thread xwu0226

Github user xwu0226 commented on the issue:

https://github.com/apache/spark/pull/12147
  
Since other PRs have made a lot of changes over the last year, many changes 
in this PR may no longer be applicable. I am rebasing and doing the necessary 
rework on this PR. Will update soon.



[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-14 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18301
  
yes but i just feel it is getting very long and verbose ..




[GitHub] spark pull request #18308: [SPARK-21099][Spark Core] INFO Log Message Using ...

2017-06-14 Thread jerryshao

Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18308#discussion_r122129357
  
--- Diff: 
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -432,8 +432,10 @@ private[spark] class ExecutorAllocationManager(
 if (testing || executorsRemoved.nonEmpty) {
   executorsRemoved.foreach { removedExecutorId =>
 newExecutorTotal -= 1
+val hasCachedBlocks = 
SparkEnv.get.blockManager.master.hasCachedBlocks(executorId);
--- End diff --

And the final semicolon ";" is not necessary.



[GitHub] spark pull request #18308: [SPARK-21099][Spark Core] INFO Log Message Using ...

2017-06-14 Thread jerryshao

Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18308#discussion_r122129269
  
--- Diff: 
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -432,8 +432,10 @@ private[spark] class ExecutorAllocationManager(
 if (testing || executorsRemoved.nonEmpty) {
   executorsRemoved.foreach { removedExecutorId =>
 newExecutorTotal -= 1
+val hasCachedBlocks = 
SparkEnv.get.blockManager.master.hasCachedBlocks(executorId);
--- End diff --

The variable `executorId` is not defined; it should be changed to 
`removedExecutorId`.



[GitHub] spark issue #18308: [SPARK-21099][Spark Core] INFO Log Message Using Incorre...

2017-06-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18308
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78086/
Test FAILed.



[GitHub] spark issue #18308: [SPARK-21099][Spark Core] INFO Log Message Using Incorre...

2017-06-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18308
  
Merged build finished. Test FAILed.



[GitHub] spark issue #18308: [SPARK-21099][Spark Core] INFO Log Message Using Incorre...

2017-06-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18308
  
**[Test build #78086 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78086/testReport)**
 for PR 18308 at commit 
[`00a42e7`](https://github.com/apache/spark/commit/00a42e7c3d08ba01265db8c7329f0ba2148ce41a).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-14 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18301
  
So just get the global average of the avg hash probe metrics of all tasks? 
If there's skew, wouldn't we like to see min, med, max?
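
To make the skew argument concrete, here is a small sketch with made-up 
numbers. `ProbeStats`, `Summary` and `summarize` are hypothetical names, and 
the "global avg" is assumed to be the unweighted mean of the per-task averages; 
the PR may aggregate differently.

```scala
// Hypothetical illustration, not Spark code: compares a single global
// average with min/med/max over per-task average probe counts.
object ProbeStats {
  final case class Summary(min: Double, med: Double, max: Double, avg: Double)

  def summarize(perTaskAvg: Seq[Double]): Summary = {
    require(perTaskAvg.nonEmpty, "need at least one task")
    val sorted = perTaskAvg.toArray
    java.util.Arrays.sort(sorted)
    // Lower median for even-sized input; an arbitrary choice for this sketch.
    val med = sorted((sorted.length - 1) / 2)
    Summary(sorted.head, med, sorted.last, sorted.sum / sorted.length)
  }

  def main(args: Array[String]): Unit = {
    // Made-up numbers: nine well-behaved tasks and one skewed task.
    val perTask = Seq.fill(9)(1.0) :+ 11.0
    println(summarize(perTask))
  }
}
```

With these numbers the global average is 2.0, which looks harmless, while the 
max of 11.0 is what points at the skewed task.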



[GitHub] spark issue #18308: [SPARK-21099][Spark Core] INFO Log Message Using Incorre...

2017-06-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18308
  
**[Test build #78086 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78086/testReport)**
 for PR 18308 at commit 
[`00a42e7`](https://github.com/apache/spark/commit/00a42e7c3d08ba01265db8c7329f0ba2148ce41a).



[GitHub] spark issue #18313: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-06-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18313
  
**[Test build #78085 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78085/testReport)**
 for PR 18313 at commit 
[`0fc43e1`](https://github.com/apache/spark/commit/0fc43e19c29f67c847749fb7ea0cf21ac47eb69f).



[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-14 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18301
  
I'd shorten it to "avg hash probe". Also do we really need min, med, max? 
Maybe just a single global avg?




[GitHub] spark pull request #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-14 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18301#discussion_r122128307
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
 ---
@@ -573,8 +586,11 @@ private[execution] final class LongToUnsafeRowMap(val 
mm: TaskMemoryManager, cap
   private def updateIndex(key: Long, address: Long): Unit = {
 var pos = firstSlot(key)
 assert(numKeys < array.length / 2)
+numKeyLookups += 1
--- End diff --

Ain't you on a beach somewhere?!



[GitHub] spark pull request #17758: [SPARK-20460][SQL] Make it more consistent to han...

2017-06-14 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17758#discussion_r122128317
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/create.sql ---
@@ -0,0 +1,9 @@
+-- Check name duplication in a regular case
+CREATE TABLE t (c STRING, c INT) USING parquet;
+
+-- Check multiple name duplication
+CREATE TABLE t (c0 STRING, c1 INT, c1 DOUBLE, c0 INT) USING parquet;
+
+-- Catch case-insensitive name duplication
+SET spark.sql.caseSensitive=false;
+CREATE TABLE t (ab STRING, cd INT, ef DOUBLE, Ab INT) USING parquet;
--- End diff --

ok



[GitHub] spark pull request #17758: [SPARK-20460][SQL] Make it more consistent to han...

2017-06-14 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17758#discussion_r122128225
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -466,13 +467,15 @@ case class AlterTableRenamePartitionCommand(
   oldPartition,
   table.partitionColumnNames,
   table.identifier.quotedString,
-  sparkSession.sessionState.conf.resolver)
+  sparkSession.sessionState.conf.resolver,
+  sparkSession.sessionState.conf.caseSensitiveAnalysis)
 
 val normalizedNewPartition = PartitioningUtils.normalizePartitionSpec(
   newPartition,
   table.partitionColumnNames,
   table.identifier.quotedString,
-  sparkSession.sessionState.conf.resolver)
+  sparkSession.sessionState.conf.resolver,
+  sparkSession.sessionState.conf.caseSensitiveAnalysis)
--- End diff --

ok, I'll fix it to do so.



[GitHub] spark pull request #17758: [SPARK-20460][SQL] Make it more consistent to han...

2017-06-14 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17758#discussion_r122128029
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -741,14 +742,7 @@ object JdbcUtils extends Logging {
 val nameEquality = df.sparkSession.sessionState.conf.resolver
 
 // checks duplicate columns in the user specified column types.
-userSchema.fieldNames.foreach { col =>
-  val duplicatesCols = userSchema.fieldNames.filter(nameEquality(_, 
col))
-  if (duplicatesCols.size >= 2) {
-throw new AnalysisException(
-  "Found duplicate column(s) in createTableColumnTypes option 
value: " +
-duplicatesCols.mkString(", "))
-  }
-}
+SchemaUtils.checkSchemaColumnNameDuplication(userSchema, 
"createTableColumnTypes option value")
--- End diff --

oh, sorry, my bad. I'll fix this.



[GitHub] spark pull request #18313: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-06-14 Thread hhbyyh

GitHub user hhbyyh opened a pull request:

https://github.com/apache/spark/pull/18313

[SPARK-21087] [ML] CrossValidator, TrainValidationSplit should preserve all 
models after fitting: Scala

## What changes were proposed in this pull request?

Allow `CrossValidatorModel` and `TrainValidationSplitModel` to preserve the 
full list of fitted models.

Add a new string param `modelPath`. If set, all the models fitted during 
training will be preserved under the specified directory path. By default 
the models will not be saved.

Save the models during the training to avoid expensive memory consumption 
for caching the models.

Sample for cross validation models:
file name pattern: paramMap-split#-metric

![image](https://user-images.githubusercontent.com/7981698/27168014-9418186c-5158-11e7-8c96-43a01df0b8b9.png)

Sample for train validation split:
file name pattern: paramMap-metric

![image](https://user-images.githubusercontent.com/7981698/27168063-cfbff6a0-5158-11e7-8e31-b6cb192a861c.png)


## How was this patch tested?

new unit tests and local test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hhbyyh/spark saveModels

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18313.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18313


commit f6dfc6624bee4dd3ab4048510b30517047fb979a
Author: Yuhao Yang 
Date:   2017-06-15T05:53:13Z

save all models

commit 4c51912e0166a7b105533687b04a991e0e35257f
Author: Yuhao Yang 
Date:   2017-06-15T06:06:34Z

precision rounding

commit 0fc43e19c29f67c847749fb7ea0cf21ac47eb69f
Author: Yuhao Yang 
Date:   2017-06-15T06:12:35Z

comment update





[GitHub] spark issue #18308: [SPARK-21099][Spark Core] INFO Log Message Using Incorre...

2017-06-14 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18308
  
Jenkins test this please



[GitHub] spark pull request #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-14 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18301#discussion_r122127953
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
 ---
@@ -573,8 +586,11 @@ private[execution] final class LongToUnsafeRowMap(val 
mm: TaskMemoryManager, cap
   private def updateIndex(key: Long, address: Long): Unit = {
 var pos = firstSlot(key)
 assert(numKeys < array.length / 2)
+numKeyLookups += 1
--- End diff --

Yeah. OK. I think you're right. We should also care about collisions 
when searching keys in the join operator. I'll update this in the next commit.
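
As a rough illustration of counting probes on both the insert path and the 
search path, here is a toy open-addressing map. It is not Spark's 
`LongToUnsafeRowMap`: `CountingLongMap`, `findSlot` and `avgProbesPerLookup` 
are hypothetical names, and only `firstSlot`, `numKeys` and `numKeyLookups` 
echo identifiers visible in the diff above.

```scala
// Toy linear-probing map, a sketch only: every put and get counts one
// key lookup plus one probe per slot visited.
final class CountingLongMap(capacity: Int) {
  require(capacity > 0 && (capacity & (capacity - 1)) == 0,
    "capacity must be a power of two")

  private val keys = new Array[Long](capacity)
  private val values = new Array[Long](capacity)
  private val used = new Array[Boolean](capacity)
  private val mask = capacity - 1
  private var numKeys = 0
  private var numKeyLookups = 0L
  private var numProbes = 0L

  private def firstSlot(key: Long): Int = key.hashCode & mask

  // Walks the probe sequence until it finds `key` or an empty slot.
  private def findSlot(key: Long): Int = {
    var pos = firstSlot(key)
    numKeyLookups += 1
    numProbes += 1
    while (used(pos) && keys(pos) != key) {
      pos = (pos + 1) & mask
      numProbes += 1
    }
    pos
  }

  def put(key: Long, value: Long): Unit = {
    val pos = findSlot(key)
    if (!used(pos)) {
      // Stay at most half full so a probe sequence always reaches an empty slot.
      require(numKeys < capacity / 2, "map is full")
      used(pos) = true
      keys(pos) = key
      numKeys += 1
    }
    values(pos) = value
  }

  def get(key: Long): Option[Long] = {
    val pos = findSlot(key)
    if (used(pos)) Some(values(pos)) else None
  }

  def avgProbesPerLookup: Double =
    if (numKeyLookups == 0L) 0.0 else numProbes.toDouble / numKeyLookups
}
```

With a capacity of 8, keys 0 and 8 land in the same first slot, so looking up 
key 8 costs two probes; counting only on the insert path would miss that cost 
on the join side.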



[GitHub] spark pull request #17758: [SPARK-20460][SQL] Make it more consistent to han...

2017-06-14 Thread wzhfy

Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17758#discussion_r122127202
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -466,13 +467,15 @@ case class AlterTableRenamePartitionCommand(
   oldPartition,
   table.partitionColumnNames,
   table.identifier.quotedString,
-  sparkSession.sessionState.conf.resolver)
+  sparkSession.sessionState.conf.resolver,
+  sparkSession.sessionState.conf.caseSensitiveAnalysis)
 
 val normalizedNewPartition = PartitioningUtils.normalizePartitionSpec(
   newPartition,
   table.partitionColumnNames,
   table.identifier.quotedString,
-  sparkSession.sessionState.conf.resolver)
+  sparkSession.sessionState.conf.resolver,
+  sparkSession.sessionState.conf.caseSensitiveAnalysis)
--- End diff --

seems to me `sparkSession.sessionState.conf.caseSensitiveAnalysis` and 
`sparkSession.sessionState.conf.resolver` are kind of redundant, can we just 
use `resolver` to detect duplication?
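
A sketch of what resolver-only duplicate detection could look like, for 
discussion. `ColumnNameCheck` and `duplicates` are hypothetical, and the two 
resolvers are local stand-ins defined here, under the assumption that a 
resolver is just a `(String, String) => Boolean` name-equality function.

```scala
// Hypothetical sketch, not the PR's code: finds duplicate column names
// using nothing but a name-equality function.
object ColumnNameCheck {
  type Resolver = (String, String) => Boolean

  // Local stand-ins for case-sensitive and case-insensitive resolution.
  val caseSensitiveResolution: Resolver = (a, b) => a == b
  val caseInsensitiveResolution: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Returns every name that matches an earlier name under `resolver`.
  // Pairwise comparison, so O(n^2): a bare equality function cannot be hashed.
  def duplicates(names: Seq[String], resolver: Resolver): Seq[String] =
    names.zipWithIndex.collect {
      case (name, i) if names.take(i).exists(resolver(_, name)) => name
    }
}
```

For example, `duplicates(Seq("ab", "cd", "ef", "Ab"), caseInsensitiveResolution)` 
flags `Ab`, while the case-sensitive resolver flags nothing. The trade-off is 
the pairwise comparison; a boolean flag would instead allow normalizing the 
names once and grouping them.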



[GitHub] spark pull request #17758: [SPARK-20460][SQL] Make it more consistent to han...

2017-06-14 Thread wzhfy

Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17758#discussion_r122126852
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/create.sql ---
@@ -0,0 +1,9 @@
+-- Check name duplication in a regular case
+CREATE TABLE t (c STRING, c INT) USING parquet;
+
+-- Check multiple name duplication
+CREATE TABLE t (c0 STRING, c1 INT, c1 DOUBLE, c0 INT) USING parquet;
+
+-- Catch case-insensitive name duplication
+SET spark.sql.caseSensitive=false;
+CREATE TABLE t (ab STRING, cd INT, ef DOUBLE, Ab INT) USING parquet;
--- End diff --

We only need two tests here, with case sensitivity explicitly set to true or false.



[GitHub] spark pull request #17758: [SPARK-20460][SQL] Make it more consistent to han...

2017-06-14 Thread wzhfy

Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17758#discussion_r122126261
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -741,14 +742,7 @@ object JdbcUtils extends Logging {
 val nameEquality = df.sparkSession.sessionState.conf.resolver
 
 // checks duplicate columns in the user specified column types.
-userSchema.fieldNames.foreach { col =>
-  val duplicatesCols = userSchema.fieldNames.filter(nameEquality(_, 
col))
-  if (duplicatesCols.size >= 2) {
-throw new AnalysisException(
-  "Found duplicate column(s) in createTableColumnTypes option 
value: " +
-duplicatesCols.mkString(", "))
-  }
-}
+SchemaUtils.checkSchemaColumnNameDuplication(userSchema, 
"createTableColumnTypes option value")
--- End diff --

Why? In `parseUserSpecifiedCreateTableColumnTypes`, apparently there is 
case-sensitive checking.



[GitHub] spark issue #18311: Branch 2.0

2017-06-14 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18311
  
@yhqairqq close this



[GitHub] spark pull request #18312: [SPARK-20980][DOCS] update doc to reflect multiLi...

2017-06-14 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18312



[GitHub] spark issue #18312: [SPARK-20980][DOCS] update doc to reflect multiLine chan...

2017-06-14 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/18312
  
thx, merged to master/2.2



[GitHub] spark issue #18312: [SPARK-20980][DOCS] update doc to reflect multiLine chan...

2017-06-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18312
  
**[Test build #78084 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78084/testReport)**
 for PR 18312 at commit 
[`a9aeb68`](https://github.com/apache/spark/commit/a9aeb686faf9652c28d3aca775f5964e282c7663).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18312: [SPARK-20980][DOCS] update doc to reflect multiLine chan...

2017-06-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18312
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18312: [SPARK-20980][DOCS] update doc to reflect multiLine chan...

2017-06-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18312
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78084/
Test PASSed.



[GitHub] spark issue #18312: [SPARK-20980][DOCS] update doc to reflect multiLine chan...

2017-06-14 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18312
  
LGTM



[GitHub] spark pull request #18075: [SPARK-18016][SQL][CATALYST] Code Generation: Con...

2017-06-14 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18075



[GitHub] spark issue #18312: [SPARK-20980][DOCS] update doc to reflect multiLine chan...

2017-06-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18312
  
**[Test build #78084 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78084/testReport)** for PR 18312 at commit [`a9aeb68`](https://github.com/apache/spark/commit/a9aeb686faf9652c28d3aca775f5964e282c7663).



[GitHub] spark issue #18202: [SPARK-20980] [SQL] Rename `wholeFile` to `multiLine` fo...

2017-06-14 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/18202
  
opened #18312



[GitHub] spark issue #18075: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-06-14 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18075
  
thanks, merging to master! you can address the remaining comments in your other PRs



[GitHub] spark pull request #18312: [SPARK-20980][DOCS] update doc to reflect multiLi...

2017-06-14 Thread felixcheung

GitHub user felixcheung opened a pull request:

https://github.com/apache/spark/pull/18312

[SPARK-20980][DOCS] update doc to reflect multiLine change

## What changes were proposed in this pull request?

doc only change

## How was this patch tested?

manually

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixcheung/spark sqljsonwholefiledoc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18312.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18312


commit a9aeb686faf9652c28d3aca775f5964e282c7663
Author: Felix Cheung 
Date:   2017-06-15T05:44:23Z

rename





[GitHub] spark pull request #18075: [SPARK-18016][SQL][CATALYST] Code Generation: Con...

2017-06-14 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18075#discussion_r122122867
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala 
---
@@ -93,7 +93,7 @@ private[sql] trait ColumnarBatchScan extends 
CodegenSupport {
 }
 
 val nextBatch = ctx.freshName("nextBatch")
--- End diff --

let's keep it as it was



[GitHub] spark pull request #18075: [SPARK-18016][SQL][CATALYST] Code Generation: Con...

2017-06-14 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18075#discussion_r122122817
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/GeneratedProjectionSuite.scala
 ---
@@ -62,13 +62,63 @@ class GeneratedProjectionSuite extends SparkFunSuite {
 val result = safeProj(unsafe)
 // Can't compare GenericInternalRow with JoinedRow directly
 (0 until N).foreach { i =>
-  val r = i + 1
-  val s = UTF8String.fromString((i + 1).toString)
-  assert(r === result.getInt(i + 2))
+  val s = UTF8String.fromString(i.toString)
+  assert(i === result.getInt(i + 2))
   assert(s === result.getUTF8String(i + 2 + N))
-  assert(r === result.getStruct(0, N * 2).getInt(i))
+  assert(i === result.getStruct(0, N * 2).getInt(i))
   assert(s === result.getStruct(0, N * 2).getUTF8String(i + N))
-  assert(r === result.getStruct(1, N * 2).getInt(i))
+  assert(i === result.getStruct(1, N * 2).getInt(i))
+  assert(s === result.getStruct(1, N * 2).getUTF8String(i + N))
+}
+
+// test generated MutableProjection
+val exprs = nestedSchema.fields.zipWithIndex.map { case (f, i) =>
+  BoundReference(i, f.dataType, true)
+}
+val mutableProj = GenerateMutableProjection.generate(exprs)
+val row1 = mutableProj(result)
+assert(result === row1)
+val row2 = mutableProj(result)
+assert(result === row2)
+  }
+
+  test("SPARK-18016: generated projections on wider table requiring class-splitting") {
+val N = 4000
+val wideRow1 = new GenericInternalRow((0 until N).toArray[Any])
--- End diff --

ditto



[GitHub] spark pull request #18075: [SPARK-18016][SQL][CATALYST] Code Generation: Con...

2017-06-14 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18075#discussion_r122122765
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/GeneratedProjectionSuite.scala
 ---
@@ -33,10 +33,10 @@ class GeneratedProjectionSuite extends SparkFunSuite {
 
   test("generated projections on wider table") {
 val N = 1000
-val wideRow1 = new GenericInternalRow((1 to N).toArray[Any])
+val wideRow1 = new GenericInternalRow((0 until N).toArray[Any])
--- End diff --

nit: can be `new GenericInternalRow(N)`



[GitHub] spark pull request #18306: [SPARK-21029][SS] All StreamingQuery should be st...

2017-06-14 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18306#discussion_r122122675
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryManagerSuite.scala
 ---
@@ -239,6 +237,40 @@ class StreamingQueryManagerSuite extends StreamTest 
with BeforeAndAfter {
 }
   }
 
+  test("stopAllQueries") {
+val datasets = Seq.fill(5)(makeDataset._2)
+withQueriesOn(datasets: _*) { queries =>
+  assert(queries.forall(_.isActive))
+  spark.streams.stopAllQueries()
+  assert(queries.forall(_.isActive == false), "Queries are still running")
+}
+  }
+
+  test("stop session stops all queries") {
+val inputData = MemoryStream[Int]
+val mapped = inputData.toDS.map(6 / _)
+var query: StreamingQuery = null
+try {
+  query = mapped.toDF.writeStream
+.format("memory")
+.queryName(s"queryInNewSession")
+.outputMode("append")
+.start()
+  assert(query.isActive)
+  spark.stop()
+  assert(spark.sparkContext.isStopped)
+  assert(query.isActive == false, "Query is still running")
+} catch {
+  case NonFatal(e) =>
+if (query != null) query.stop()
+throw e
--- End diff --

Why still use try/catch to stop the query? Since this test covers this 
specific behavior, wouldn't the test already fail if the query isn't stopped 
or if it throws?

More importantly, why ignore the NonFatal exception thrown?



[GitHub] spark issue #18046: [SPARK-20749][SQL] Built-in SQL Function Support - all v...

2017-06-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18046
  
**[Test build #78083 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78083/testReport)** for PR 18046 at commit [`1e8cbcd`](https://github.com/apache/spark/commit/1e8cbcd26fa8905d1db3e41d8695a34b706d5dca).



[GitHub] spark pull request #18075: [SPARK-18016][SQL][CATALYST] Code Generation: Con...

2017-06-14 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18075#discussion_r122122445
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -233,10 +222,118 @@ class CodegenContext {
   // The collection of sub-expression result resetting methods that need 
to be called on each row.
   val subexprFunctions = mutable.ArrayBuffer.empty[String]
 
-  def declareAddedFunctions(): String = {
-addedFunctions.map { case (funcName, funcCode) => funcCode 
}.mkString("\n")
+  val outerClassName = "OuterClass"
--- End diff --

nit: `private val`



[GitHub] spark issue #18284: [SPARK-21072][SQL] `TreeNode.mapChildren` should only ap...

2017-06-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18284
  
**[Test build #78082 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78082/testReport)** for PR 18284 at commit [`9a9d8af`](https://github.com/apache/spark/commit/9a9d8afee0e4ef59bfd4faca66e106851d51822e).



[GitHub] spark issue #18202: [SPARK-20980] [SQL] Rename `wholeFile` to `multiLine` fo...

2017-06-14 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/18202
  
oops - found this

https://github.com/apache/spark/blame/ae33abf71b353c638487948b775e966c7127cd46/docs/sql-programming-guide.md#L1001



[GitHub] spark issue #18284: [SPARK-21072][SQL] `TreeNode.mapChildren` should only ap...

2017-06-14 Thread ConeyLiu

Github user ConeyLiu commented on the issue:

https://github.com/apache/spark/pull/18284
  
@cloud-fan thanks for reviewing, the code has been updated.



[GitHub] spark issue #18284: [SPARK-21072][SQL] `TreeNode.mapChildren` should only ap...

2017-06-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18284
  
**[Test build #78081 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78081/testReport)** for PR 18284 at commit [`92d4a80`](https://github.com/apache/spark/commit/92d4a80929783cca9284b9b74c37fbfc8717dc96).



[GitHub] spark issue #18202: [SPARK-20980] [SQL] Rename `wholeFile` to `multiLine` fo...

2017-06-14 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18202
  
ah lucky :)  merging to master/2.2!



[GitHub] spark pull request #18202: [SPARK-20980] [SQL] Rename `wholeFile` to `multiL...

2017-06-14 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18202



[GitHub] spark pull request #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-14 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/18300#discussion_r122121031
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1764,6 +1765,58 @@ class Dataset[T] private[sql](
   }
 
   /**
+   * Returns a new Dataset containing union of rows in this Dataset and 
another Dataset.
+   *
+   * This is different from both `UNION ALL` and `UNION DISTINCT` in SQL. 
To do a SQL-style set
+   * union (that does deduplication of elements), use this function 
followed by a [[distinct]].
+   *
+   * The difference between this function and [[union]] is that this 
function
+   * resolves columns by name (not by position):
+   *
+   * {{{
+   *   val df1 = Seq((1, 2, 3)).toDF("col0", "col1", "col2")
+   *   val df2 = Seq((4, 5, 6)).toDF("col1", "col2", "col0")
+   *   df1.unionByName(df2).show
+   *
+   *   // output:
+   *   // +----+----+----+
+   *   // |col0|col1|col2|
+   *   // +----+----+----+
+   *   // |   1|   2|   3|
+   *   // |   6|   4|   5|
+   *   // +----+----+----+
+   * }}}
+   *
+   * @group typedrel
+   * @since 2.3.0
+   */
+  def unionByName(other: Dataset[T]): Dataset[T] = withSetOperator {
+// Creates a `Union` node and resolves it first to reorder output 
attributes in `other` by name
+val unionPlan = 
sparkSession.sessionState.executePlan(Union(logicalPlan, other.logicalPlan))
+unionPlan.assertAnalyzed()
+val Seq(left, right) = unionPlan.analyzed.children
+
+// Builds a project list for `other` based on `logicalPlan` output 
names
+val resolver = sparkSession.sessionState.analyzer.resolver
+val rightProjectList = mutable.ArrayBuffer.empty[Attribute]
+val rightOutputAttrs = right.output
+for (lattr <- left.output) {
+  // To handle duplicate names, we first compute diff between 
`rightOutputAttrs` and
+  // already-found attrs in `rightProjectList`.
+  rightOutputAttrs.diff(rightProjectList).find { rattr => 
resolver(lattr.name, rattr.name)}
--- End diff --

It seems this logic cannot catch duplicate column names on the left side.
How about checking for column name duplication first, and then building the 
right project list?
```

// Check column name duplication on both sides first
val leftOutputAttrs = left.output
val rightOutputAttrs = right.output
val caseSensitiveAnalysis = sparkSession.sessionState.conf.caseSensitiveAnalysis
SchemaUtils.checkColumnNameDuplication(
  leftOutputAttrs.map(_.name), "left column names", caseSensitiveAnalysis)
SchemaUtils.checkColumnNameDuplication(
  rightOutputAttrs.map(_.name), "right column names", caseSensitiveAnalysis)

// Then, build a project list for `other` based on `logicalPlan` output names
val resolver = sparkSession.sessionState.analyzer.resolver
val rightProjectList = leftOutputAttrs.map { lattr =>
  val foundAttrs = rightOutputAttrs.filter { rattr => resolver(lattr.name, rattr.name) }
  // Duplicates were rejected above, so at most one attribute can match
  assert(foundAttrs.size <= 1)
  foundAttrs.headOption.getOrElse {
    throw new AnalysisException(s"""Cannot resolve column name "${lattr.name}" among """ +
      s"""(${rightOutputAttrs.map(_.name).mkString(", ")})""")
  }
}
```

(I used `SchemaUtils` here, implemented in #17758: 
https://github.com/apache/spark/pull/17758/files#diff-dc9b15e4af298799d788b59d2baf96a9R29)
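
To make the "check duplicates first, then resolve by name" idea concrete outside of Spark, here is a minimal stand-alone sketch. It is only an illustration under stated assumptions: columns are plain strings rather than attributes, and `UnionByNameSketch`, `checkDuplication`, and `rightProjection` are hypothetical names that are not part of this PR or of Spark.

```scala
// Hypothetical stand-alone sketch: column names are plain strings, not Spark attributes.
object UnionByNameSketch {
  // Case-insensitive name comparison, chosen here only for illustration.
  val resolver: (String, String) => Boolean = (a, b) => a.equalsIgnoreCase(b)

  // Fails fast when one side contains two names that differ only by case.
  def checkDuplication(names: Seq[String], side: String): Unit = {
    val dups = names.groupBy(_.toLowerCase).collect { case (k, vs) if vs.size > 1 => k }
    require(dups.isEmpty, s"Found duplicate column(s) in $side: ${dups.mkString(", ")}")
  }

  // For each left column, returns the index of the matching right column.
  def rightProjection(left: Seq[String], right: Seq[String]): Seq[Int] = {
    checkDuplication(left, "left column names")
    checkDuplication(right, "right column names")
    left.map { l =>
      val idx = right.indexWhere(r => resolver(l, r))
      require(idx >= 0, s"""Cannot resolve column name "$l" among (${right.mkString(", ")})""")
      idx
    }
  }
}
```

With the column names from the Scaladoc example, `rightProjection(Seq("col0", "col1", "col2"), Seq("col1", "col2", "col0"))` yields the indices used to reorder the right-hand row `(4, 5, 6)` into `(6, 4, 5)`.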



[GitHub] spark pull request #18299: [SPARK-21092][SQL] Wire SQLConf in logical plan a...

2017-06-14 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18299



[GitHub] spark pull request #18284: [SPARK-21072][SQL] `TreeNode.mapChildren` should ...

2017-06-14 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18284#discussion_r122120954
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala
 ---
@@ -146,6 +154,23 @@ class TreeNodeSuite extends SparkFunSuite {
 assert(actual === Dummy(None))
   }
 
+  test("mapChildren should only works on children") {
+val children = Seq((Literal(1), Literal(2)))
+val notChildren = Seq((Literal(3), Literal(4)))
+val before = SeqTupleExpression(children, notChildren)
+val toZero: PartialFunction[Expression, Expression] = { case 
Literal(_, _) => Literal(0) }
+val expect = SeqTupleExpression(Seq((Literal(0), Literal(0))), 
notChildren)
+
+var actual = before transformDown toZero
+assert(actual === expect)
+
+actual = before transformUp toZero
+assert(actual === expect)
+
+actual = before transform toZero
--- End diff --

or we can test `mapChildren` directly



[GitHub] spark pull request #18284: [SPARK-21072][SQL] `TreeNode.mapChildren` should ...

2017-06-14 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18284#discussion_r122120916
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala
 ---
@@ -146,6 +154,23 @@ class TreeNodeSuite extends SparkFunSuite {
 assert(actual === Dummy(None))
   }
 
+  test("mapChildren should only works on children") {
+val children = Seq((Literal(1), Literal(2)))
+val notChildren = Seq((Literal(3), Literal(4)))
+val before = SeqTupleExpression(children, notChildren)
+val toZero: PartialFunction[Expression, Expression] = { case 
Literal(_, _) => Literal(0) }
+val expect = SeqTupleExpression(Seq((Literal(0), Literal(0))), 
notChildren)
+
+var actual = before transformDown toZero
+assert(actual === expect)
+
+actual = before transformUp toZero
+assert(actual === expect)
+
+actual = before transform toZero
--- End diff --

I think testing `transform` is good enough



[GitHub] spark pull request #18284: [SPARK-21072][SQL] `TreeNode.mapChildren` should ...

2017-06-14 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18284#discussion_r122120851
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala
 ---
@@ -61,6 +61,14 @@ case class ExpressionInMap(map: Map[String, Expression]) 
extends Expression with
   override lazy val resolved = true
 }
 
+case class SeqTupleExpression(sons: Seq[(Expression, Expression)],
+notsons: Seq[(Expression, Expression)]) extends Expression with 
Unevaluable {
--- End diff --

nit: `nonSons`



[GitHub] spark issue #18231: [SPARK-20994] Remove redundant characters in OpenBlocks ...

2017-06-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18231
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78079/
Test PASSed.



[GitHub] spark pull request #18284: [SPARK-21072][SQL] `TreeNode.mapChildren` should ...

2017-06-14 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18284#discussion_r122120866
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala
 ---
@@ -146,6 +154,23 @@ class TreeNodeSuite extends SparkFunSuite {
 assert(actual === Dummy(None))
   }
 
+  test("mapChildren should only works on children") {
+val children = Seq((Literal(1), Literal(2)))
+val notChildren = Seq((Literal(3), Literal(4)))
--- End diff --

nit: `nonChildren`



[GitHub] spark issue #18231: [SPARK-20994] Remove redundant characters in OpenBlocks ...

2017-06-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18231
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18299: [SPARK-21092][SQL] Wire SQLConf in logical plan and expr...

2017-06-14 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18299
  
Merging in master.




[GitHub] spark pull request #18284: [SPARK-21072][SQL] `TreeNode.mapChildren` should ...

2017-06-14 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18284#discussion_r122120741
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala 
---
@@ -340,11 +340,21 @@ abstract class TreeNode[BaseType <: 
TreeNode[BaseType]] extends Product {
   arg
 }
   case tuple@(arg1: TreeNode[_], arg2: TreeNode[_]) =>
-val newChild1 = f(arg1.asInstanceOf[BaseType])
-val newChild2 = f(arg2.asInstanceOf[BaseType])
+val newChild1 = if (containsChild(arg1)) {
+  f(arg1.asInstanceOf[BaseType])
+} else {
+  arg1
--- End diff --

we can call `arg1.asInstanceOf[BaseType]` here, to avoid [this 
change](https://github.com/apache/spark/pull/18284/files#diff-eac5b02bb450a235fef5e902a2671254R357)
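For illustration, the guard in the diff above can be sketched on a toy tree. `Node`, `Leaf`, `PairNode` and `mapPair` are hypothetical names, not Spark's `TreeNode` API; the sketch only assumes that `containsChild` amounts to a membership test on the node's children.

```scala
// Hypothetical miniature of a tree node; this is not Spark's TreeNode.
sealed trait Node { def children: Seq[Node] }
case class Leaf(value: Int) extends Node { val children: Seq[Node] = Nil }
// `pair` is an arbitrary constructor argument; only the halves that are also
// listed in `children` count as real children of the node.
case class PairNode(pair: (Node, Node), children: Seq[Node]) extends Node

object MapChildrenSketch {
  // Applies `f` to each half of the tuple only if that half is a registered
  // child; other halves are returned untouched.
  def mapPair(node: PairNode)(f: Node => Node): (Node, Node) = {
    val (a, b) = node.pair
    val newA = if (node.children.contains(a)) f(a) else a
    val newB = if (node.children.contains(b)) f(b) else b
    (newA, newB)
  }
}
```

With `children = Seq(Leaf(1))` and `pair = (Leaf(1), Leaf(3))`, only the first half is transformed.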



[GitHub] spark pull request #18310: [SPARK-21103][SQL] QueryPlanConstraints should be...

2017-06-14 Thread rxin

Github user rxin closed the pull request at:

https://github.com/apache/spark/pull/18310



[GitHub] spark pull request #18284: [SPARK-21072][SQL] `TreeNode.mapChildren` should ...

2017-06-14 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18284#discussion_r122120790
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala
 ---
@@ -61,6 +61,14 @@ case class ExpressionInMap(map: Map[String, Expression]) 
extends Expression with
   override lazy val resolved = true
 }
 
+case class SeqTupleExpression(sons: Seq[(Expression, Expression)],
--- End diff --

`children` and `nonChildren` may be better names



[GitHub] spark issue #18231: [SPARK-20994] Remove redundant characters in OpenBlocks ...

2017-06-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18231
  
**[Test build #78079 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78079/testReport)**
 for PR 18231 at commit 
[`6677bc9`](https://github.com/apache/spark/commit/6677bc9164ca3c04988fab943e0ce0f0bbed5b10).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18310: [SPARK-21103][SQL] QueryPlanConstraints should be part o...

2017-06-14 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18310
  
Closing for now, since @sameeragarwal said it might be useful in physical 
planning in the future.




[GitHub] spark issue #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18106
  
**[Test build #78080 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78080/testReport)**
 for PR 18106 at commit 
[`b391b6a`](https://github.com/apache/spark/commit/b391b6a3e51229e501982fe184685d7c2e185172).



[GitHub] spark issue #18309: [SPARK-21079] [SQL] Calculate total size of a partition ...

2017-06-14 Thread wzhfy

Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/18309
  
Can you add a test case? In the test, we can add partitions with different 
paths using the ALTER TABLE SET LOCATION command. I think that can reproduce your 
scenario, right?



[GitHub] spark issue #18310: [SPARK-21103][SQL] QueryPlanConstraints should be part o...

2017-06-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18310
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78078/
Test PASSed.



[GitHub] spark issue #18310: [SPARK-21103][SQL] QueryPlanConstraints should be part o...

2017-06-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18310
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18310: [SPARK-21103][SQL] QueryPlanConstraints should be part o...

2017-06-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18310
  
**[Test build #78078 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78078/testReport)**
 for PR 18310 at commit 
[`40de35c`](https://github.com/apache/spark/commit/40de35c3ac6bdbf2bd2c43b9deca04a5cbdbc4ef).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18309: [SPARK-21079] [SQL] Calculate total size of a partition ...

2017-06-14 Thread wzhfy

Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/18309
  
Please explain the failing scenario in the PR description, including why our test 
cases cannot catch it and why we need to test it manually.



[GitHub] spark pull request #18309: [SPARK-21079] [SQL] Calculate total size of a par...

2017-06-14 Thread wzhfy

Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18309#discussion_r122116932
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala
 ---
@@ -81,6 +83,19 @@ case class AnalyzeTableCommand(
 object AnalyzeTableCommand extends Logging {
 
   def calculateTotalSize(sessionState: SessionState, catalogTable: 
CatalogTable): Long = {
+if (catalogTable.partitionColumnNames.isEmpty) {
+  calculateTotalSize(sessionState, catalogTable.identifier, 
catalogTable.storage.locationUri)
--- End diff --

Rename this to `calculateLocationSize`?
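
A rough sketch of the split this naming suggests, using local files instead of Hadoop paths. Only the name `calculateLocationSize` and the empty-partition check come from the thread. The `LocationSize` object, the `java.io.File` arguments and the partitioned branch (summing over each partition's own location) are assumptions made for illustration, since the diff above does not show that branch.

```scala
import java.io.File

// Hypothetical local-filesystem stand-in; the real code works on catalog
// tables and Hadoop file systems.
object LocationSize {
  // Recursively sums file lengths under `location`; a missing path counts as 0.
  def calculateLocationSize(location: File): Long =
    if (location.isFile) location.length()
    else Option(location.listFiles()).map(_.map(calculateLocationSize).sum).getOrElse(0L)

  // Unpartitioned: measure the table location. Partitioned (assumed here):
  // measure each partition location, which may live outside the table root.
  def calculateTotalSize(tableLocation: File, partitionLocations: Seq[File]): Long =
    if (partitionLocations.isEmpty) calculateLocationSize(tableLocation)
    else partitionLocations.map(calculateLocationSize).sum
}
```

The point of the sketch is that a partition moved to a different path is still counted, because the sum runs over partition locations rather than over the table directory.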



[GitHub] spark pull request #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-14 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18300#discussion_r122116812
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1764,6 +1765,58 @@ class Dataset[T] private[sql](
   }
 
   /**
+   * Returns a new Dataset containing union of rows in this Dataset and 
another Dataset.
+   *
+   * This is different from both `UNION ALL` and `UNION DISTINCT` in SQL. 
To do a SQL-style set
+   * union (that does deduplication of elements), use this function 
followed by a [[distinct]].
+   *
+   * The difference between this function and [[union]] is that this 
function
+   * resolves columns by name (not by position):
+   *
+   * {{{
+   *   val df1 = Seq((1, 2, 3)).toDF("col0", "col1", "col2")
+   *   val df2 = Seq((4, 5, 6)).toDF("col1", "col2", "col0")
+   *   df1.unionByName(df2).show
+   *
+   *   // output:
+   *   // +----+----+----+
+   *   // |col0|col1|col2|
+   *   // +----+----+----+
+   *   // |   1|   2|   3|
+   *   // |   6|   4|   5|
+   *   // +----+----+----+
+   * }}}
+   *
+   * @group typedrel
+   * @since 2.3.0
+   */
+  def unionByName(other: Dataset[T]): Dataset[T] = withSetOperator {
+// Creates a `Union` node and resolves it first to reorder output 
attributes in `other` by name
+val unionPlan = 
sparkSession.sessionState.executePlan(Union(logicalPlan, other.logicalPlan))
+unionPlan.assertAnalyzed()
+val Seq(left, right) = unionPlan.analyzed.children
+
+// Builds a project list for `other` based on `logicalPlan` output 
names
+val resolver = sparkSession.sessionState.analyzer.resolver
+val rightProjectList = mutable.ArrayBuffer.empty[Attribute]
+val rightOutputAttrs = right.output
+for (lattr <- left.output) {
+  // To handle duplicate names, we first compute diff between 
`rightOutputAttrs` and
+  // already-found attrs in `rightProjectList`.
+  rightOutputAttrs.diff(rightProjectList).find { rattr => 
resolver(lattr.name, rattr.name)}
--- End diff --

Inside the `map`, we can find the matching columns by using `filter` + 
`resolver`. 

- If more than one column is found, throw an error for duplicate names. 
- If the number is zero, throw an error. 
- If the number is one, return the right-side attribute. 
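
The three cases above can be sketched on plain column names. `ResolveByName`, `resolveOne` and the case-insensitive resolver are hypothetical stand-ins; in the PR the resolver comes from the analyzer and the items are attributes, not strings.

```scala
object ResolveByName {
  // A resolver decides whether two column names refer to the same column.
  type Resolver = (String, String) => Boolean

  // Assumed for illustration: a case-insensitive name comparison.
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Returns the single right-side name matching `leftName`, or an error message
  // when there is no match or more than one match.
  def resolveOne(
      leftName: String,
      rightNames: Seq[String],
      resolver: Resolver): Either[String, String] =
    rightNames.filter(r => resolver(leftName, r)) match {
      case Seq(single) => Right(single)
      case Seq()       => Left(s"Cannot resolve column name '$leftName'")
      case many        => Left(s"Duplicate column name '$leftName': ${many.mkString(", ")}")
    }
}
```

Returning `Either` keeps the sketch free of exceptions; the comment above suggests throwing instead, which would be a one-line change in each `Left` branch.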



[GitHub] spark pull request #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-14 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18301#discussion_r122116294
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
 ---
@@ -573,8 +586,11 @@ private[execution] final class LongToUnsafeRowMap(val 
mm: TaskMemoryManager, cap
   private def updateIndex(key: Long, address: Long): Unit = {
 var pos = firstSlot(key)
 assert(numKeys < array.length / 2)
+numKeyLookups += 1
--- End diff --

IMO we should. The number of required probes differs per key, and also 
depends on the order in which the map was constructed. If you combine 
this with some skew and missing keys, the number of probes can be much higher 
than expected.

You could even argue that we do not really care about the number of probes 
when building the map.
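
To make the argument concrete, here is a minimal linear-probing map that counts lookups and probes on both writes and reads. `ProbeCountingMap` and its members are hypothetical and much simpler than `LongToUnsafeRowMap`; only the counter name `numKeyLookups` is taken from the diff.

```scala
// Hypothetical open-addressing map with linear probing; not Spark's code.
final class ProbeCountingMap(capacity: Int) {
  require(capacity > 0 && (capacity & (capacity - 1)) == 0, "capacity must be a power of two")

  private val keys = new Array[Long](capacity)
  private val values = new Array[Long](capacity)
  private val used = new Array[Boolean](capacity)
  private var size = 0

  var numKeyLookups = 0L
  var numProbes = 0L

  // Walks the probe sequence for `key`, counting one lookup and every slot visited.
  private def findSlot(key: Long): Int = {
    numKeyLookups += 1
    numProbes += 1
    var pos = (key & (capacity - 1)).toInt
    while (used(pos) && keys(pos) != key) {
      pos = (pos + 1) & (capacity - 1)
      numProbes += 1
    }
    pos
  }

  def put(key: Long, value: Long): Unit = {
    val pos = findSlot(key)
    if (!used(pos)) {
      // Keeping the map at most half full guarantees that probing terminates.
      require(size < capacity / 2, "map is too full")
      used(pos) = true
      keys(pos) = key
      size += 1
    }
    values(pos) = value
  }

  // Reads are counted too, so missing keys and skewed keys show up in the metric.
  def get(key: Long): Option[Long] = {
    val pos = findSlot(key)
    if (used(pos)) Some(values(pos)) else None
  }

  def avgProbesPerLookup: Double =
    if (numKeyLookups == 0) 0.0 else numProbes.toDouble / numKeyLookups
}
```

In this sketch a lookup for a missing key that hashes into an occupied run walks the whole run before it reaches an empty slot, which is the case that counting only inserts would never record.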



[GitHub] spark issue #18290: [SPARK-20989][Core] Fail to start multiple workers on on...

2017-06-14 Thread jiangxb1987

Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/18290
  
@jerryshao Thank you for your suggestion. Since this PR aims to resolve 
the conflict in the Spark configuration, I think the user behavior you described is 
out of scope here.



[GitHub] spark pull request #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-14 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/18300#discussion_r122116160
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1764,6 +1765,58 @@ class Dataset[T] private[sql](
   }
 
   /**
+   * Returns a new Dataset containing union of rows in this Dataset and 
another Dataset.
+   *
+   * This is different from both `UNION ALL` and `UNION DISTINCT` in SQL. 
To do a SQL-style set
+   * union (that does deduplication of elements), use this function 
followed by a [[distinct]].
+   *
+   * The difference between this function and [[union]] is that this 
function
+   * resolves columns by name (not by position):
+   *
+   * {{{
+   *   val df1 = Seq((1, 2, 3)).toDF("col0", "col1", "col2")
+   *   val df2 = Seq((4, 5, 6)).toDF("col1", "col2", "col0")
+   *   df1.unionByName(df2).show
+   *
+   *   // output:
+   *   // +----+----+----+
+   *   // |col0|col1|col2|
+   *   // +----+----+----+
+   *   // |   1|   2|   3|
+   *   // |   6|   4|   5|
+   *   // +----+----+----+
+   * }}}
+   *
+   * @group typedrel
+   * @since 2.3.0
+   */
+  def unionByName(other: Dataset[T]): Dataset[T] = withSetOperator {
+// Creates a `Union` node and resolves it first to reorder output 
attributes in `other` by name
+val unionPlan = 
sparkSession.sessionState.executePlan(Union(logicalPlan, other.logicalPlan))
+unionPlan.assertAnalyzed()
+val Seq(left, right) = unionPlan.analyzed.children
+
+// Builds a project list for `other` based on `logicalPlan` output 
names
+val resolver = sparkSession.sessionState.analyzer.resolver
+val rightProjectList = mutable.ArrayBuffer.empty[Attribute]
+val rightOutputAttrs = right.output
+for (lattr <- left.output) {
--- End diff --

Aha, ok.



[GitHub] spark issue #18299: [SPARK-21092][SQL] Wire SQLConf in logical plan and expr...

2017-06-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18299
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18299: [SPARK-21092][SQL] Wire SQLConf in logical plan and expr...

2017-06-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18299
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78077/
Test PASSed.



[GitHub] spark pull request #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-14 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18300#discussion_r122116053
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1764,6 +1765,58 @@ class Dataset[T] private[sql](
   }
 
   /**
+   * Returns a new Dataset containing union of rows in this Dataset and 
another Dataset.
+   *
+   * This is different from both `UNION ALL` and `UNION DISTINCT` in SQL. 
To do a SQL-style set
+   * union (that does deduplication of elements), use this function 
followed by a [[distinct]].
+   *
+   * The difference between this function and [[union]] is that this 
function
+   * resolves columns by name (not by position):
+   *
+   * {{{
+   *   val df1 = Seq((1, 2, 3)).toDF("col0", "col1", "col2")
+   *   val df2 = Seq((4, 5, 6)).toDF("col1", "col2", "col0")
+   *   df1.unionByName(df2).show
+   *
+   *   // output:
+   *   // +----+----+----+
+   *   // |col0|col1|col2|
+   *   // +----+----+----+
+   *   // |   1|   2|   3|
+   *   // |   6|   4|   5|
+   *   // +----+----+----+
+   * }}}
+   *
+   * @group typedrel
+   * @since 2.3.0
+   */
+  def unionByName(other: Dataset[T]): Dataset[T] = withSetOperator {
+// Creates a `Union` node and resolves it first to reorder output 
attributes in `other` by name
+val unionPlan = 
sparkSession.sessionState.executePlan(Union(logicalPlan, other.logicalPlan))
+unionPlan.assertAnalyzed()
+val Seq(left, right) = unionPlan.analyzed.children
+
+// Builds a project list for `other` based on `logicalPlan` output 
names
+val resolver = sparkSession.sessionState.analyzer.resolver
+val rightProjectList = mutable.ArrayBuffer.empty[Attribute]
+val rightOutputAttrs = right.output
+for (lattr <- left.output) {
--- End diff --

Since `left` and `right` always have the same number of columns (after 
L1796 `assertAnalyzed()`), we do not need the `ArrayBuffer` if we use `map` 
to build the `Project` of the right side. For example: 
```Scala
  left.output.map { lattr =>
```
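
A self-contained sketch of the same idea on plain column names and one row of values, with no mutable buffer. `ReorderByName`, `rightIndices` and `reorder` are hypothetical names, and the case-insensitive comparison stands in for the analyzer's resolver. The sketch assumes both sides carry the same duplicate-free column names.

```scala
object ReorderByName {
  // For each left column, finds the index of the same-named right column.
  // Assumes both sides have the same, duplicate-free set of column names.
  def rightIndices(left: Seq[String], right: Seq[String]): Seq[Int] =
    left.map { l =>
      val i = right.indexWhere(_.equalsIgnoreCase(l))
      require(i >= 0, s"Cannot resolve column name '$l' among (${right.mkString(", ")})")
      i
    }

  // Reorders one right-side row into the left-side column order.
  def reorder[A](left: Seq[String], right: Seq[String], row: Seq[A]): Seq[A] =
    rightIndices(left, right).map(i => row(i))
}
```

Applied to the Scaladoc example in the diff, `reorder(Seq("col0", "col1", "col2"), Seq("col1", "col2", "col0"), Seq(4, 5, 6))` yields `Seq(6, 4, 5)`, matching the second row of the documented output.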



[GitHub] spark issue #18299: [SPARK-21092][SQL] Wire SQLConf in logical plan and expr...

2017-06-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18299
  
**[Test build #78077 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78077/testReport)**
 for PR 18299 at commit 
[`cec78b5`](https://github.com/apache/spark/commit/cec78b5cf1ced8322c8cd8e599a3197c50ed49c0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-14 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18301#discussion_r122114832
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
 ---
@@ -573,8 +586,11 @@ private[execution] final class LongToUnsafeRowMap(val 
mm: TaskMemoryManager, cap
   private def updateIndex(key: Long, address: Long): Unit = {
 var pos = firstSlot(key)
 assert(numKeys < array.length / 2)
+numKeyLookups += 1
--- End diff --

Should we? It seems to me that we should only care about the hash collisions 
that happen when inserting data into the hash map.



[GitHub] spark pull request #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-14 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18301#discussion_r122114390
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
 ---
@@ -573,8 +586,11 @@ private[execution] final class LongToUnsafeRowMap(val 
mm: TaskMemoryManager, cap
   private def updateIndex(key: Long, address: Long): Unit = {
 var pos = firstSlot(key)
 assert(numKeys < array.length / 2)
+numKeyLookups += 1
--- End diff --

You should also add this code to the `get` and `getValue` methods.



[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-14 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18301
  
The screenshot of ShuffledHashJoin:

https://user-images.githubusercontent.com/68855/27163988-86e82492-51bd-11e7-866e-199467c35817.png




[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-14 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18301
  
The screenshot of HashAggregate:

https://user-images.githubusercontent.com/68855/27163956-5a0b5e6c-51bd-11e7-9159-f4962ac1bc36.png




[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-14 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18301
  
The screenshot of BroadcastHashJoin:

https://user-images.githubusercontent.com/68855/27163920-324127ea-51bd-11e7-8bee-924a20e161b8.png




[GitHub] spark issue #17723: [SPARK-20434][YARN][CORE] Move Hadoop delegation token c...

2017-06-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17723
  
Merged build finished. Test PASSed.



[GitHub] spark issue #17723: [SPARK-20434][YARN][CORE] Move Hadoop delegation token c...

2017-06-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17723
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78072/
Test PASSed.



[GitHub] spark issue #17723: [SPARK-20434][YARN][CORE] Move Hadoop delegation token c...

2017-06-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17723
  
**[Test build #78072 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78072/testReport)**
 for PR 17723 at commit 
[`c4149dd`](https://github.com/apache/spark/commit/c4149ddb940c32285fde8b1c08a5b212fced5d66).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18284: [SPARK-21072][SQL] `TreeNode.mapChildren` should only ap...

2017-06-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18284
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18284: [SPARK-21072][SQL] `TreeNode.mapChildren` should only ap...

2017-06-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18284
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78075/
Test PASSed.



[GitHub] spark issue #18284: [SPARK-21072][SQL] `TreeNode.mapChildren` should only ap...

2017-06-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18284
  
**[Test build #78075 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78075/testReport)** for PR 18284 at commit [`c3622bb`](https://github.com/apache/spark/commit/c3622bbe3293c81dd3dc433e01af4107d5208a6d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18300
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78076/
Test PASSed.



[GitHub] spark issue #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18300
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18300
  
**[Test build #78076 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78076/testReport)** for PR 18300 at commit [`5b41430`](https://github.com/apache/spark/commit/5b414305be32d85e210c9e0d8e2f0efdf84672cc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #18306: [SPARK-21029][SS] All StreamingQuery should be st...

2017-06-14 Thread aray

Github user aray commented on a diff in the pull request:

https://github.com/apache/spark/pull/18306#discussion_r122112073
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -690,6 +690,7 @@ class SparkSession private(
* @since 2.0.0
*/
   def stop(): Unit = {
+streams.stopAllQueries()
--- End diff --

I did look into that method. It does not work because the [application end event is generated](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L1897) after the [SparkContext stopped flag is set](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L1888), which prevents the queries from stopping properly. We could add a new event type that gets called before the stopped flag is set in the close method. Do you think that would be an appropriate solution?
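
To make the ordering problem concrete, here is a minimal sketch in plain Scala. It is a toy model under stated assumptions, not Spark's code: `ToyContext`, `ToyEvent`, `BeforeStop`, and `ToyDemo` are hypothetical names invented for illustration, and `BeforeStop` stands in for the proposed new event type. The sketch only shows that a listener reacting to the end event runs after the stopped flag is set, while a listener reacting to an earlier event still sees a usable context.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy model only: none of these types are Spark's real classes.
sealed trait ToyEvent
case object BeforeStop extends ToyEvent      // hypothetical new event type
case object ApplicationEnd extends ToyEvent  // stands in for the existing end event

class ToyContext {
  private var stopped = false
  private val listeners = ArrayBuffer.empty[ToyEvent => Unit]

  def isStopped: Boolean = stopped
  def addListener(f: ToyEvent => Unit): Unit = listeners += f
  private def post(e: ToyEvent): Unit = listeners.foreach(f => f(e))

  def stop(): Unit = {
    post(BeforeStop)      // posted while the context is still usable
    stopped = true        // after this point, work that needs the context fails
    post(ApplicationEnd)  // listeners run too late for a clean query shutdown
  }
}

object ToyDemo {
  // Returns true if a listener triggered by `stopOn` runs before the flag is set,
  // i.e. if it would still be able to stop queries cleanly.
  def run(stopOn: ToyEvent): Boolean = {
    val ctx = new ToyContext
    var cleanlyStopped = false
    ctx.addListener { e =>
      if (e == stopOn) cleanlyStopped = !ctx.isStopped
    }
    ctx.stop()
    cleanlyStopped
  }

  def main(args: Array[String]): Unit = {
    println(s"stop on ApplicationEnd: clean = ${run(ApplicationEnd)}")
    println(s"stop on BeforeStop: clean = ${run(BeforeStop)}")
  }
}
```

In this toy model, reacting to `ApplicationEnd` yields `false` and reacting to `BeforeStop` yields `true`. Whether a new event type is preferable to calling `streams.stopAllQueries()` directly in `stop()` is the design question raised above.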



[GitHub] spark issue #18311: Branch 2.0

2017-06-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18311
  
Can one of the admins verify this patch?



[GitHub] spark issue #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18300
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78073/
Test PASSed.



[GitHub] spark issue #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18300
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18300
  
**[Test build #78073 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78073/testReport)** for PR 18300 at commit [`b0fd2ac`](https://github.com/apache/spark/commit/b0fd2ac9dfacbc93df40a6dc1a71dce4f607f13d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #18311: Branch 2.0

2017-06-14 Thread yhqairqq

GitHub user yhqairqq opened a pull request:

https://github.com/apache/spark/pull/18311

Branch 2.0

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18311.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18311


commit e355ca8e828629455228b6a346d64638ab639cfa
Author: Christian Kadner 
Date:   2016-10-06T21:28:49Z

[SPARK-17803][TESTS] Upgrade docker-client dependency

[SPARK-17803: Docker integration tests don't run with "Docker for Mac"](https://issues.apache.org/jira/browse/SPARK-17803)

## What changes were proposed in this pull request?

This PR upgrades the [docker-client](https://mvnrepository.com/artifact/com.spotify/docker-client) dependency from [3.6.6](https://mvnrepository.com/artifact/com.spotify/docker-client/3.6.6) to [5.0.2](https://mvnrepository.com/artifact/com.spotify/docker-client/5.0.2) to enable _Docker for Mac_ users to run the `docker-integration-tests` out of the box.

The very latest docker-client version is [6.0.0](https://mvnrepository.com/artifact/com.spotify/docker-client/6.0.0) but that has one additional dependency and no usage yet.

## How was this patch tested?

The code change was tested on Mac OS X Yosemite with both _Docker Toolbox_ and _Docker for Mac_, and on Linux Ubuntu 14.04.

```
$ build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests clean package

$ build/mvn -Pdocker-integration-tests -Pscala-2.11 -pl :spark-docker-integration-tests_2.11 clean compile test
```

Author: Christian Kadner 

Closes #15378 from ckadner/SPARK-17803_Docker_for_Mac.

(cherry picked from commit 49d11d49983fbe270f4df4fb1e34b5fbe854c5ec)
Signed-off-by: Josh Rosen 

commit b1a9c41e8c41c90dd15ee6f635356dd1a5bbf395
Author: Dongjoon Hyun 
Date:   2016-10-06T23:09:45Z

[SPARK-17750][SQL][BACKPORT-2.0] Fix CREATE VIEW with INTERVAL arithmetic

## What changes were proposed in this pull request?

Currently, Spark raises `RuntimeException` when creating a view that uses timestamp arithmetic with INTERVAL, as in the following example. The root cause is that the arithmetic expression `TimeAdd` was transformed into a `timeadd` function call in the VIEW definition. This PR fixes the SQL definition of the `TimeAdd` and `TimeSub` expressions.

```scala
scala> sql("CREATE TABLE dates (ts TIMESTAMP)")

scala> sql("CREATE VIEW view1 AS SELECT ts + INTERVAL 1 DAY FROM dates")
java.lang.RuntimeException: Failed to analyze the canonicalized SQL: ...
```

## How was this patch tested?

Pass Jenkins with a new testcase.

Author: Dongjoon Hyun 

Closes #15383 from dongjoon-hyun/SPARK-17750-BACK.

commit 594a2cf6f7c74c54127b8c3947aadbe0052b404c
Author: sethah 
Date:   2016-10-07T04:10:17Z

[SPARK-17792][ML] L-BFGS solver for linear regression does not accept general numeric label column types

## What changes were proposed in this pull request?

Before, we computed `instances` in LinearRegression in two spots, even 
though they did the same thing. One of them did not cast the label column to 
`DoubleType`. This patch consolidates the computation and always casts the 
label column to `DoubleType`.

## How was this patch tested?

Added a unit test to check all solvers. This test failed before this patch.

Author: sethah 

Closes #15364 from sethah/linreg_numeric_type.

(cherry picked from commit 3713bb199142c5e06e2e527c99650f02f41f47b1)
Signed-off-by: Yanbo Liang 

commit 380b099fcfe6f70b978300ea208faf630855471a
Author: Dongjoon Hyun 
Date:   2016-10-07T05:27:20Z

[SPARK-17612][SQL][BRANCH-2.0] Support `DESCRIBE table PARTITION` SQL syntax

## What changes were proposed in this pull request?

This is a backport of SPARK-17612. It restores the `DESCRIBE table PARTITION` SQL syntax, which was supported until Spark 1.6.2 but was dropped in 2.0.0.

**Spark 1.6.2**
```scala
scala> sql("CREATE TABLE partitioned_table (a STRING, b INT) PARTITIONED BY 
(c STRING, d STRING)")
res1: org.apache.spark.sql.DataFrame = [result: string]

scala> sql("ALTER TABLE part

[GitHub] spark issue #18231: [SPARK-20994] Remove redundant characters in OpenBlocks ...

2017-06-14 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18231
  
**[Test build #78079 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78079/testReport)** for PR 18231 at commit [`6677bc9`](https://github.com/apache/spark/commit/6677bc9164ca3c04988fab943e0ce0f0bbed5b10).


