[GitHub] spark issue #16753: [SPARK-19296][SQL] Deduplicate url and table in JdbcUtil...

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16753
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16753: [SPARK-19296][SQL] Deduplicate url and table in JdbcUtil...

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16753
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72199/
Test PASSed.





[GitHub] spark issue #16753: [SPARK-19296][SQL] Deduplicate url and table in JdbcUtil...

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16753
  
**[Test build #72199 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72199/testReport)**
 for PR 16753 at commit 
[`6b2841a`](https://github.com/apache/spark/commit/6b2841a183825aea1d37287b8530bcb37cdee2c5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

2017-01-31 Thread salilsurendran
Github user salilsurendran commented on the issue:

https://github.com/apache/spark/pull/16664
  
@yhuai @marmbrus @liancheng Could someone please review my PR? Thanks.





[GitHub] spark pull request #16757: [SPARK-18609][SQL] Fix redundant Alias removal in...

2017-01-31 Thread hvanhovell
GitHub user hvanhovell opened a pull request:

https://github.com/apache/spark/pull/16757

[SPARK-18609][SQL] Fix redundant Alias removal in the optimizer

## What changes were proposed in this pull request?
The optimizer tries to remove redundant alias-only projections from the 
query plan using the `RemoveAliasOnlyProject` rule. The current rule identifies 
and removes such a project, then rewrites the project's attributes in the **entire** 
tree. This causes problems when parts of the tree are duplicated (for instance, 
a self join on a temporary view/CTE) and the duplicated part contains the 
alias-only project; in this case the rewrite will break the tree. 

[Solution] TODO

It was difficult to control the blacklisted attributes and the 
transformation of the tree while keeping the rewrite local to a node's 
parents.  I have made a few changes to `TreeNode`, `QueryPlan` and 
`LogicalPlan` to open up the transformation logic, which gives us the more 
fine-grained control over tree transformations that we need.

## How was this patch tested?
Added a test to `RemoveRedundantAliasAndProjectSuite` and ran the existing tests. I 
will add some more integration tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hvanhovell/spark SPARK-18609

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16757.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16757


commit 6c89a15ed8eb868b23237bba07498fb2053f4643
Author: Herman van Hovell 
Date:   2017-01-30T12:11:46Z

Open-up TreeNode's transform logic.

commit dac7ec99075ce98ebea92e108ad66b05537de396
Author: Herman van Hovell 
Date:   2017-01-31T16:03:57Z

Split RemoveAliasOnlyProject into RemoveRedundantAliases and 
RemoveRedundantProject.







[GitHub] spark pull request #16746: [SPARK-15648][SQL] Add teradataDialect for JDBC c...

2017-01-31 Thread klinvill
Github user klinvill commented on a diff in the pull request:

https://github.com/apache/spark/pull/16746#discussion_r98710451
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala ---
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.sql.Types
--- End diff --

Thanks! Fixed in latest commit.





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16043
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72200/
Test FAILed.





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16043
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16043
  
**[Test build #72200 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72200/testReport)**
 for PR 16043 at commit 
[`1da58aa`](https://github.com/apache/spark/commit/1da58aa799eac582d2ec2d7980fa3c27b6de8180).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #9759: [SPARK-11753][SQL][test-hadoop2.2] Make allowNonNumericNu...

2017-01-31 Thread limansky
Github user limansky commented on the issue:

https://github.com/apache/spark/pull/9759
  
Hi all. There are security issues in jackson-dataformat-xml versions prior to 2.7.4 
and 2.8.0. Here are the links: FasterXML/jackson-dataformat-xml#199, 
FasterXML/jackson-dataformat-xml#190. Even though Spark itself doesn't use this 
module, this dependency forces Spark users to use an affected version in order to keep a 
consistent set of jackson libraries.





[GitHub] spark issue #16756: [SPARK-19411][SQL] Remove the metadata used to mark opti...

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16756
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #16746: [SPARK-15648][SQL] Add teradataDialect for JDBC c...

2017-01-31 Thread klinvill
Github user klinvill commented on a diff in the pull request:

https://github.com/apache/spark/pull/16746#discussion_r98706364
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala ---
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.sql.Types
+import org.apache.spark.sql.types._
+
+
+private case object TeradataDialect extends JdbcDialect {
+
+  override def canHandle(url: String): Boolean = { 
url.startsWith("jdbc:teradata") }
+
+  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
+case StringType => Some(JdbcType("VARCHAR(255)", 
java.sql.Types.VARCHAR))
+case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR))
+case _ => None
+  }
--- End diff --

Hi @dongjoon-hyun,
Teradata still doesn't support LIMIT (it uses TOP instead), but the Spark 
code that originally used LIMIT has been changed to use "WHERE 1=0" 
instead.

```
  /**
   * Get the SQL query that should be used to find if the given table exists. Dialects can
   * override this method to return a query that works best in a particular database.
   * @param table  The name of the table.
   * @return The SQL query to use for checking the table.
   */
  def getTableExistsQuery(table: String): String = {
    s"SELECT * FROM $table WHERE 1=0"
  }

  /**
   * The SQL query that should be used to discover the schema of a table. It only needs to
   * ensure that the result set has the same schema as the table, such as by calling
   * "SELECT * ...". Dialects can override this method to return a query that works best in a
   * particular database.
   * @param table The name of the table.
   * @return The SQL query to use for discovering the schema.
   */
  @Since("2.1.0")
  def getSchemaQuery(table: String): String = {
    s"SELECT * FROM $table WHERE 1=0"
  }
```
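
For illustration only, here is a minimal, self-contained sketch of why a zero-row probe avoids the LIMIT/TOP difference. It does not use Spark's real `JdbcDialect` class: `ProbeDialect`, `DefaultDialect`, `TopBasedDialect` and `ProbeDemo` are hypothetical names, and the TOP-based override is an assumption about what a dialect could do, not something the PR does.

```scala
// Hypothetical stand-in for a JDBC dialect; this is NOT Spark's JdbcDialect.
trait ProbeDialect {
  // Zero-row probe: the predicate is always false, so the query needs no
  // row-limiting syntax (LIMIT, TOP, ...) and stays portable across databases.
  def getTableExistsQuery(table: String): String =
    s"SELECT * FROM $table WHERE 1=0"
}

// Uses the portable default unchanged.
object DefaultDialect extends ProbeDialect

// Assumption for illustration: a dialect that prefers TOP can still override.
object TopBasedDialect extends ProbeDialect {
  override def getTableExistsQuery(table: String): String =
    s"SELECT TOP 1 * FROM $table"
}

object ProbeDemo {
  def main(args: Array[String]): Unit = {
    println(DefaultDialect.getTableExistsQuery("people"))
    println(TopBasedDialect.getTableExistsQuery("people"))
  }
}
```

With the default in place, a dialect such as the one in this PR would only need an override if the database rejected the `WHERE 1=0` form.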





[GitHub] spark issue #16756: [SPARK-19411][SQL] Remove the metadata used to mark opti...

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16756
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72197/
Test PASSed.





[GitHub] spark issue #16756: [SPARK-19411][SQL] Remove the metadata used to mark opti...

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16756
  
**[Test build #72197 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72197/testReport)**
 for PR 16756 at commit 
[`ec2bbbf`](https://github.com/apache/spark/commit/ec2bbbf55a99f0fa8fba39569b959e17d24b3243).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16620: [SPARK-19263] DAGScheduler should avoid sending c...

2017-01-31 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/16620#discussion_r98699067
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -1212,8 +1223,9 @@ class DAGScheduler(
 
   clearCacheLocs()
 
-  if (!shuffleStage.isAvailable) {
-// Some tasks had failed; let's resubmit this shuffleStage
+  if (!shuffleStage.isAvailable && noActiveTaskSetManager) {
--- End diff --

You need to update this for mapStageJobs -- the `else` branch will now run 
if the shuffleStage is not available, but there is an active task set manager, 
which we don't want.  Also calling `submitWaitingChildStages(shuffleStage)` is 
confusing (though it seems to be correct).

(or use the other version I suggested)
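
The case analysis behind this comment can be sketched as below. Only the boolean condition is taken from the diff; `BranchCheck`, `branch`, and the branch labels are hypothetical names used for illustration.

```scala
object BranchCheck {
  // Reports which branch runs under the condition proposed in the diff:
  //   if (!shuffleStage.isAvailable && noActiveTaskSetManager) ... else ...
  def branch(isAvailable: Boolean, noActiveTaskSetManager: Boolean): String =
    if (!isAvailable && noActiveTaskSetManager) "resubmit" else "else"

  def main(args: Array[String]): Unit = {
    // Enumerate all four combinations of the two flags.
    for (avail <- Seq(true, false); noActive <- Seq(true, false))
      println(s"isAvailable=$avail noActiveTaskSetManager=$noActive -> ${branch(avail, noActive)}")
  }
}
```

The combination to look at is `isAvailable = false` with `noActiveTaskSetManager = false`: the stage is not available, yet control falls into the `else` branch.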





[GitHub] spark pull request #16620: [SPARK-19263] DAGScheduler should avoid sending c...

2017-01-31 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/16620#discussion_r98703486
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Stage.scala ---
@@ -68,6 +68,12 @@ private[scheduler] abstract class Stage(
   /** Set of jobs that this stage belongs to. */
   val jobIds = new HashSet[Int]
 
+  /**
+   * Partitions which there is not yet a task succeeded on. Note that for 
[[ShuffleMapStage]]
+   * pendingPartitions.size() == 0 doesn't mean the stage is available. 
Because the succeeded
+   * task can be bogus which is out of date and task's epoch is older than 
corresponding
+   * executor's failed epoch in [[DAGScheduler]].
+   */
--- End diff --

How about:

Partitions the DAGScheduler is waiting on before it tries to mark the stage 
/ job as completed and continue.  Most commonly, this is the set of tasks that 
are not successful in the active taskset for this stage, but not always.  In 
particular, when there are multiple attempts for a stage, then this will 
include late task completions from earlier attempts.  Finally, note that when 
this is empty, it does not *necessarily* mean that the stage is completed -- we 
may have lost some of the map output from that stage.  But the 
DAGScheduler will check for this condition and resubmit the stage if necessary.
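
The last two sentences of that wording can be illustrated with a toy model. This is a sketch under stated assumptions, not Spark's `Stage` or `DAGScheduler` code: `ToyShuffleStage`, `availableOutputs`, `PendingDemo` and `nextStep` are hypothetical names.

```scala
// Toy model, NOT Spark's Stage class; all names here are assumptions.
final case class ToyShuffleStage(
    numPartitions: Int,
    pendingPartitions: Set[Int], // partitions the scheduler is still waiting on
    availableOutputs: Set[Int]   // partitions whose map output is currently registered
) {
  // The stage is usable by child stages only if every partition has output.
  def isAvailable: Boolean = availableOutputs.size == numPartitions
}

object PendingDemo {
  // What the scheduler would do after processing a task completion.
  def nextStep(stage: ToyShuffleStage): String =
    if (stage.pendingPartitions.nonEmpty) "wait"
    else if (stage.isAvailable) "submit child stages"
    else "resubmit stage" // nothing pending, but some map output was lost

  def main(args: Array[String]): Unit = {
    // Nothing pending, yet partition 1's output is missing.
    println(nextStep(ToyShuffleStage(2, Set.empty, Set(0))))
  }
}
```

An empty `pendingPartitions` therefore only ends the waiting; whether the stage counts as complete is decided by the separate availability check.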





[GitHub] spark pull request #16620: [SPARK-19263] DAGScheduler should avoid sending c...

2017-01-31 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/16620#discussion_r98703683
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/SchedulerIntegrationSuite.scala 
---
@@ -648,4 +660,70 @@ class BasicSchedulerIntegrationSuite extends 
SchedulerIntegrationSuite[SingleCor
 }
 assertDataStructuresEmpty(noFailure = false)
   }
+
+  testScheduler("[SPARK-19263] DAGScheduler shouldn't resubmit active 
taskSet.") {
+val a = new MockRDD(sc, 2, Nil)
+val b = shuffle(2, a)
+val shuffleId = b.shuffleDeps.head.shuffleId
+
+def runBackend(): Unit = {
+  val (taskDescription, task) = backend.beginTask()
+  task.stageId match {
+// ShuffleMapTask
+case 0 =>
+  val stageAttempt = task.stageAttemptId
+  val partitionId = task.partitionId
+  (stageAttempt, partitionId) match {
+case (0, 0) =>
+  val fetchFailed = FetchFailed(
+DAGSchedulerSuite.makeBlockManagerId("hostA"), shuffleId, 
0, 0, "ignored")
+  backend.taskFailed(taskDescription, fetchFailed)
+case (0, 1) =>
+  // Wait until stage resubmission caused by FetchFailed is 
finished.
+  waitForCondition(taskScheduler.runningTaskSets.size==2, 5000,
--- End diff --

nit: spaces around `==`





[GitHub] spark issue #15009: [SPARK-17443][SPARK-11035] Stop Spark Application if lau...

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15009
  
**[Test build #72201 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72201/consoleFull)**
 for PR 15009 at commit 
[`6a7ba5b`](https://github.com/apache/spark/commit/6a7ba5bfdd2cb165956992907f681ab3ad85154e).





[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2

2017-01-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/16751
  
Thank you for reviewing and merging, @viirya, @srowen, and @rxin!





[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...

2017-01-31 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/16620
  
Hi @jinxing64 
sorry to go back and forth on this numerous times -- I think I have another 
alternative, see https://github.com/squito/spark/tree/SPARK-19263_alternate

It's mostly your changes, but with one main difference: when we encounter 
the condition where there are no pending partitions, but there is an active 
taskset, we just mark that taskset as inactive and continue as before (see 
https://github.com/squito/spark/commit/bec061c8486a681dc16e8b92e553f79e486924e9).
I think this makes it easier to follow, as there are fewer states to keep 
track of.  It can also potentially improve performance, since you may submit 
downstream stages more quickly, rather than waiting for all tasks in the active 
taskset to complete.  I also think it fixes a bug in your version with 
mapStageJobs (I'll point it out in the code).

This passes all tests in `o.a.s.scheduler.*`, including your new test case. 
(I did come across a race in `SchedulerIntegrationSuite`, which I fixed in 
https://github.com/squito/spark/commit/9125e6738269df4e0d7e6292726bad2a294c86c0; 
it is not directly related to these changes.)

Do you see any problems w/ that approach?  





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16043
  
**[Test build #72200 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72200/testReport)**
 for PR 16043 at commit 
[`1da58aa`](https://github.com/apache/spark/commit/1da58aa799eac582d2ec2d7980fa3c27b6de8180).





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-01-31 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/16043
  
retest this please





[GitHub] spark issue #16753: [SPARK-19296][SQL] Deduplicate url and table in JdbcUtil...

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16753
  
**[Test build #72199 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72199/testReport)**
 for PR 16753 at commit 
[`6b2841a`](https://github.com/apache/spark/commit/6b2841a183825aea1d37287b8530bcb37cdee2c5).





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-01-31 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16043
  
I am just interested in it :). Yes, this one does not look related again.





[GitHub] spark issue #16755: [MESOS] Support constraints in spark-dispatcher

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16755
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72198/
Test PASSed.





[GitHub] spark issue #16755: [MESOS] Support constraints in spark-dispatcher

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16755
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16755: [MESOS] Support constraints in spark-dispatcher

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16755
  
**[Test build #72198 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72198/testReport)**
 for PR 16755 at commit 
[`551a593`](https://github.com/apache/spark/commit/551a593949475abcb40414e03d7b01e04c5932f3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16753: [SPARK-19296][SQL] Deduplicate url and table in JdbcUtil...

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16753
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16753: [SPARK-19296][SQL] Deduplicate url and table in JdbcUtil...

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16753
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72194/
Test FAILed.





[GitHub] spark issue #16753: [SPARK-19296][SQL] Deduplicate url and table in JdbcUtil...

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16753
  
**[Test build #72194 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72194/testReport)**
 for PR 16753 at commit 
[`cfe258b`](https://github.com/apache/spark/commit/cfe258b283941c8a3a55a111092ce511682fdd1a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16753: [SPARK-19296][SQL] Deduplicate url and table in JdbcUtil...

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16753
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16753: [SPARK-19296][SQL] Deduplicate url and table in JdbcUtil...

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16753
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72193/
Test FAILed.





[GitHub] spark issue #16753: [SPARK-19296][SQL] Deduplicate url and table in JdbcUtil...

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16753
  
**[Test build #72193 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72193/testReport)**
 for PR 16753 at commit 
[`d78a7d0`](https://github.com/apache/spark/commit/d78a7d0de980e3af330b95eeb6a9020dfece2ec9).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16755: [MESOS] Support constraints in spark-dispatcher

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16755
  
**[Test build #72198 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72198/testReport)**
 for PR 16755 at commit 
[`551a593`](https://github.com/apache/spark/commit/551a593949475abcb40414e03d7b01e04c5932f3).





[GitHub] spark issue #16756: [SPARK-19411][SQL] Remove the metadata used to mark opti...

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16756
  
**[Test build #72197 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72197/testReport)**
 for PR 16756 at commit 
[`ec2bbbf`](https://github.com/apache/spark/commit/ec2bbbf55a99f0fa8fba39569b959e17d24b3243).





[GitHub] spark issue #16603: [SPARK-19244][Core] Sort MemoryConsumers according to th...

2017-01-31 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16603
  
@mridulm Ok. Thanks for the review.





[GitHub] spark issue #16756: [SPARK-19411][SQL] Remove the metadata used to mark opti...

2017-01-31 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16756
  
cc @rxin @liancheng @cloud-fan 





[GitHub] spark pull request #15120: [SPARK-4563][core] Allow driver to advertise a di...

2017-01-31 Thread sumitvashistha
Github user sumitvashistha commented on a diff in the pull request:

https://github.com/apache/spark/pull/15120#discussion_r98669126
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/ConfigProvider.scala ---
@@ -66,7 +66,7 @@ private[spark] class SparkConfigProvider(conf: 
JMap[String, String]) extends Con
 findEntry(key) match {
   case e: ConfigEntryWithDefault[_] => Option(e.defaultValueString)
   case e: ConfigEntryWithDefaultString[_] => 
Option(e.defaultValueString)
-  case e: FallbackConfigEntry[_] => defaultValueString(e.fallback.key)
+  case e: FallbackConfigEntry[_] => get(e.fallback.key)
--- End diff --

We are facing this issue with Spark 1.6. Are we going to backport this?
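
For readers following the diff above, here is a minimal sketch of why resolving a fallback entry through `get` rather than through the default-only lookup could matter. It is a toy model, not Spark's `ConfigProvider`: the `Entry`, `WithDefault`, `Fallback` and `Provider` names and the `driver.host` / `driver.bindAddress` keys are hypothetical, and the assumption is that `get` consults explicitly set values before defaults.

```scala
// Toy model of config entries; these names are illustrative, not Spark's real classes.
sealed trait Entry { def key: String }
final case class WithDefault(key: String, default: String) extends Entry
final case class Fallback(key: String, fallback: Entry) extends Entry

final class Provider(settings: Map[String, String], entries: Map[String, Entry]) {
  // Look up an explicitly set value first, then consult the entry definition.
  def get(key: String): Option[String] =
    settings.get(key).orElse(defaultFor(key))

  private def defaultFor(key: String): Option[String] =
    entries.get(key) match {
      case Some(WithDefault(_, d)) => Some(d)
      // Going through get (not defaultFor) lets a user-set fallback value win.
      case Some(Fallback(_, fb))   => get(fb.key)
      case None                    => None
    }
}

object FallbackDemo {
  val host = WithDefault("driver.host", "localhost")
  val bind = Fallback("driver.bindAddress", host)
  val entries: Map[String, Entry] = Map(host.key -> host, bind.key -> bind)

  // Only the fallback key is set explicitly; the dependent key should inherit it.
  val provider = new Provider(Map("driver.host" -> "10.0.0.5"), entries)
  val resolved: Option[String] = provider.get("driver.bindAddress")
}
```

In this sketch, if the `Fallback` case called `defaultFor(fb.key)` instead, the dependent key would resolve to the static default and ignore the value the user set on the fallback key.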





[GitHub] spark pull request #16756: [SPARK-19411][SQL] Remove the metadata used to ma...

2017-01-31 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/16756

[SPARK-19411][SQL] Remove the metadata used to mark optional columns in 
merged Parquet schema for filter predicate pushdown

## What changes were proposed in this pull request?

Metadata was introduced earlier to mark the optional columns in the 
merged Parquet schema for filter predicate pushdown. As we upgrade to Parquet 
1.8.2, which includes the fix for the pushdown of optional columns, we no 
longer need this metadata.

## How was this patch tested?

Jenkins tests.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 remove-optional-metadata

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16756.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16756


commit ec2bbbf55a99f0fa8fba39569b959e17d24b3243
Author: Liang-Chi Hsieh 
Date:   2017-01-31T13:40:20Z

Remove the metadata used to mark optional columns for merged Parquet schema.







[GitHub] spark pull request #16755: [MESOS] Support constraints in spark-dispatcher

2017-01-31 Thread philipphoffmann
GitHub user philipphoffmann opened a pull request:

https://github.com/apache/spark/pull/16755

[MESOS] Support constraints in spark-dispatcher

The `MesosClusterScheduler` doesn't handle the `spark.mesos.constraints`
setting (as opposed to `MesosCoarseGrainedSchedulerBackend`).

## What changes were proposed in this pull request?

This commit introduces the necessary changes to handle the offer
constraints.
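
As a rough illustration of what handling offer constraints involves, the sketch below parses a constraint string and checks an offer's attributes against it. It is not the code from this pull request: `ConstraintSketch`, `parse` and `matches` are hypothetical names, the `key:v1,v2;key2:v` format is assumed from the `spark.mesos.constraints` documentation, and offer attributes are simplified to plain strings.

```scala
object ConstraintSketch {
  // Parse "os:centos7;zone:a,b" into Map("os" -> Set("centos7"), "zone" -> Set("a", "b")).
  // A key with no value (e.g. "gpu") maps to an empty set, meaning "attribute must be present".
  def parse(spec: String): Map[String, Set[String]] =
    spec.split(';').iterator.map(_.trim).filter(_.nonEmpty).map { part =>
      part.split(":", 2) match {
        case Array(k, v) => k.trim -> v.split(',').map(_.trim).filter(_.nonEmpty).toSet
        case Array(k)    => k.trim -> Set.empty[String]
      }
    }.toMap

  // An offer matches when every constrained attribute is present and,
  // if values were listed, the offer's value is one of them.
  def matches(constraints: Map[String, Set[String]], offer: Map[String, String]): Boolean =
    constraints.forall { case (key, allowed) =>
      offer.get(key).exists(v => allowed.isEmpty || allowed.contains(v))
    }
}
```

A scheduler following this idea would decline any offer for which `matches` returns false before trying to launch a driver on it.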

## How was this patch tested?

unit test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/philipphoffmann/spark 
fix-dispatcher-constraints

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16755.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16755


commit 551a593949475abcb40414e03d7b01e04c5932f3
Author: Philipp Hoffmann 
Date:   2017-01-31T13:42:04Z

[MESOS] Support constraints in spark-dispatcher

The `MesosClusterScheduler` doesn't handle the `spark.mesos.constraints`
setting (as opposed to `MesosCoarseGrainedSchedulerBackend`).

This commit introduces the necessary changes to handle the offer
constraints.







[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-01-31 Thread eyalfa
Github user eyalfa commented on the issue:

https://github.com/apache/spark/pull/16043
  
@HyukjinKwon, @hvanhovell, are you familiar with this build failure? It 
seems to be unrelated to my specific build...





[GitHub] spark issue #16754: [SPARK-19410][DOC] Fix brokens links in ml-pipeline and ...

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16754
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72196/
Test PASSed.





[GitHub] spark issue #16754: [SPARK-19410][DOC] Fix brokens links in ml-pipeline and ...

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16754
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16754: [SPARK-19410][DOC] Fix brokens links in ml-pipeline and ...

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16754
  
**[Test build #72196 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72196/testReport)**
 for PR 16754 at commit 
[`6bbe357`](https://github.com/apache/spark/commit/6bbe357715ffc274988b06131d91a3ca153ab3e9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16754: [SPARK-19410][DOC] Fix brokens links in ml-pipeline and ...

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16754
  
**[Test build #72196 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72196/testReport)**
 for PR 16754 at commit 
[`6bbe357`](https://github.com/apache/spark/commit/6bbe357715ffc274988b06131d91a3ca153ab3e9).





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16043
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72195/
Test FAILed.





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16043
  
**[Test build #72195 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72195/testReport)**
 for PR 16043 at commit 
[`1da58aa`](https://github.com/apache/spark/commit/1da58aa799eac582d2ec2d7980fa3c27b6de8180).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16043
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #16754: [SPARK-19410][DOC] Fix brokens links in ml-pipeli...

2017-01-31 Thread zhengruifeng
GitHub user zhengruifeng opened a pull request:

https://github.com/apache/spark/pull/16754

[SPARK-19410][DOC] Fix brokens links in ml-pipeline and ml-tuning

## What changes were proposed in this pull request?
Fix broken links in ml-pipeline and ml-tuning
``  ->   ``

## How was this patch tested?
manual tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhengruifeng/spark doc_api_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16754.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16754


commit 6bbe357715ffc274988b06131d91a3ca153ab3e9
Author: Zheng RuiFeng 
Date:   2017-01-31T12:50:19Z

create pr







[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-01-31 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16043
  
(Ugh, that -9 again. As far as I know, the cause is unknown. I have 
mentioned this before.)





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16043
  
**[Test build #72195 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72195/testReport)**
 for PR 16043 at commit 
[`1da58aa`](https://github.com/apache/spark/commit/1da58aa799eac582d2ec2d7980fa3c27b6de8180).





[GitHub] spark pull request #16735: [SPARK-19228][SQL] Introduce tryParseDate method ...

2017-01-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16735#discussion_r98661043
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala
 ---
@@ -140,12 +137,21 @@ private[csv] object CSVInferSchema {
 }
   }
 
+  private def tryParseDate(field: String, options: CSVOptions): DataType = 
{
+// This case infers a custom `dateFormat` is set.
+if ((allCatch opt options.dateFormat.parse(field)).isDefined) {
+  DateType
+} else {
+  tryParseTimestamp(field, options)
+}
+  }
+
   private def tryParseTimestamp(field: String, options: CSVOptions): 
DataType = {
-// This case infers a custom `dataFormat` is set.
+// This case infers a custom `timestampFormat` is set.
 if ((allCatch opt options.timestampFormat.parse(field)).isDefined) {
   TimestampType
 } else if ((allCatch opt DateTimeUtils.stringToTime(field)).isDefined) 
{
-  // We keep this for backwords competibility.
+  // We keep this for backwards compatibility.
   TimestampType
 } else {
   tryParseBoolean(field, options)
--- End diff --

(Maybe, you meant L136)
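
The chain in the diff above (try a date, fall back to a timestamp, then to the next type) can be sketched in isolation as follows. This is a simplified stand-in, not `CSVInferSchema`: the `InferSketch` object and its result types are hypothetical, and it swaps the configurable `options.dateFormat` / `options.timestampFormat` for the ISO parsers in `java.time`, which reject trailing text.

```scala
import java.time.{LocalDate, LocalDateTime}
import scala.util.control.Exception.allCatch

object InferSketch {
  sealed trait Inferred
  case object DateT extends Inferred
  case object TimestampT extends Inferred
  case object StringT extends Inferred

  // Try the narrower type first; any parse failure falls through to the next candidate.
  def tryParseDate(field: String): Inferred =
    if ((allCatch opt LocalDate.parse(field)).isDefined) DateT
    else tryParseTimestamp(field)

  // LocalDateTime.parse expects ISO-8601 text such as "2017-01-31T13:40:20".
  def tryParseTimestamp(field: String): Inferred =
    if ((allCatch opt LocalDateTime.parse(field)).isDefined) TimestampT
    else StringT
}
```

One design point worth noting: the order only works if the date parser refuses input with a trailing time part. A parser that accepts a matching prefix would classify timestamps as dates.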





[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...

2017-01-31 Thread eyalfa
Github user eyalfa commented on a diff in the pull request:

https://github.com/apache/spark/pull/16043#discussion_r98660310
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala
 ---
@@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+
+/**
+* push down operations into [[CreateNamedStructLike]].
+*/
+object SimplifyCreateStructOps extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+plan.transformExpressionsUp {
+  // push down field extraction
+  case GetStructField(createNamedStructLike: CreateNamedStructLike, 
ordinal, _) =>
+createNamedStructLike.valExprs(ordinal)
+}
+  }
+}
+
+/**
+* push down operations into [[CreateArray]].
+*/
+object SimplifyCreateArrayOps extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+plan.transformExpressionsUp {
+  // push down field selection (array of structs)
+  case GetArrayStructFields(CreateArray(elems), field, ordinal, 
numFields, containsNull) =>
+// instead f selecting the field on the entire array,
+// select it from each member of the array.
+// pushing down the operation this way open other optimizations 
opportunities
+// (i.e. struct(...,x,...).x)
+CreateArray(elems.map(GetStructField(_, ordinal, 
Some(field.name
+  // push down item selection.
+  case ga @ GetArrayItem(CreateArray(elems), IntegerLiteral(idx)) =>
+// instead of creating the array and then selecting one row,
+// remove array creation altgether.
+if (idx >= 0 && idx < elems.size) {
+  // valid index
+  elems(idx)
+} else {
+  // out of bounds, mimic the runtime behavior and return null
+  Cast(Literal(null), ga.dataType)
--- End diff --

yep
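
For context, the item-selection rewrite in the diff above can be illustrated on a toy expression tree. This is only a sketch: `Lit`, `Col`, `MkArray` and `ItemAt` are hypothetical stand-ins for Catalyst's `Literal`, attribute references, `CreateArray` and `GetArrayItem`, and the out-of-bounds case returns a plain null literal instead of a typed cast.

```scala
object ArraySimplifySketch {
  sealed trait Expr
  final case class Lit(value: Any) extends Expr
  final case class Col(name: String) extends Expr
  final case class MkArray(elems: Seq[Expr]) extends Expr
  final case class ItemAt(child: Expr, index: Expr) extends Expr

  // Rewrite bottom-up so nested array accesses are simplified first.
  def simplify(e: Expr): Expr = e match {
    case ItemAt(child, index) =>
      (simplify(child), simplify(index)) match {
        // Selecting a literal index from a freshly built array: skip building the array.
        case (MkArray(elems), Lit(i: Int)) =>
          if (i >= 0 && i < elems.size) elems(i) else Lit(null)
        case (c, ix) => ItemAt(c, ix)
      }
    case MkArray(elems) => MkArray(elems.map(simplify))
    case other => other
  }
}
```

For example, selecting index 1 from an array built from columns `a` and `b` reduces to column `b` alone, so the array is never materialized.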





[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...

2017-01-31 Thread eyalfa
Github user eyalfa commented on a diff in the pull request:

https://github.com/apache/spark/pull/16043#discussion_r98660085
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -293,6 +293,12 @@ object SimplifyConditionals extends Rule[LogicalPlan] 
with PredicateHelper {
 // from that. Note that CaseWhen.branches should never be empty, 
and as a result the
 // headOption (rather than head) added above is just an extra (and 
unnecessary) safeguard.
 branches.head._2
+
+  case e @ CaseWhen(branches, _) if branches.exists(_._1 == 
Literal(true)) =>
+// a branc with a TRue condition eliminates all following branches,
+// these branches can be pruned away
+val (h, t) = branches.span(_._1 != Literal(true))
+CaseWhen( h :+ t.head, None)
--- End diff --

sorry, please explain
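
To make the `span`-based pruning in the diff above easier to discuss, here is a small standalone sketch. It is a toy model, not the optimizer rule: `CaseWhenSketch` and `Branch` are hypothetical, conditions are reduced to `Some(true)` / `Some(false)` for literals and `None` for anything non-literal, and values are plain strings.

```scala
object CaseWhenSketch {
  // Each branch is (condition, value); Some(b) is a literal condition, None is non-literal.
  type Branch = (Option[Boolean], String)

  // Keep every branch up to and including the first literally-true one.
  // Branches after it can never be reached, so they are dropped.
  def prune(branches: Seq[Branch]): Seq[Branch] = {
    val (before, fromTrue) = branches.span(_._1 != Some(true))
    before ++ fromTrue.take(1)
  }
}
```

`span` splits the sequence at the first element that fails the predicate, so `fromTrue` starts with the always-true branch when one exists. Under the same reasoning, an else value would also be unreachable once such a branch is present.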





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-01-31 Thread eyalfa
Github user eyalfa commented on the issue:

https://github.com/apache/spark/pull/16043
  
@hvanhovell can you figure out what fails the build? It seems all tests passed.





[GitHub] spark issue #16603: [SPARK-19244][Core] Sort MemoryConsumers according to th...

2017-01-31 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/16603
  
LGTM, will wait for @vanzin's comments before committing in case he has any.





[GitHub] spark issue #16753: [SPARK-19296][SQL] Deduplicate url and table in JdbcUtil...

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16753
  
**[Test build #72194 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72194/testReport)**
 for PR 16753 at commit 
[`cfe258b`](https://github.com/apache/spark/commit/cfe258b283941c8a3a55a111092ce511682fdd1a).





[GitHub] spark issue #16753: [SPARK-19296][SQL] Deduplicate url and table in JdbcUtil...

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16753
  
**[Test build #72193 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72193/testReport)**
 for PR 16753 at commit 
[`d78a7d0`](https://github.com/apache/spark/commit/d78a7d0de980e3af330b95eeb6a9020dfece2ec9).





[GitHub] spark issue #16753: [SPARK-19296][SQL] Deduplicate url and table in JdbcUtil...

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16753
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72192/
Test PASSed.





[GitHub] spark issue #16753: [SPARK-19296][SQL] Deduplicate url and table in JdbcUtil...

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16753
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16753: [SPARK-19296][SQL] Deduplicate url and table in JdbcUtil...

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16753
  
**[Test build #72192 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72192/testReport)**
 for PR 16753 at commit 
[`9be8f84`](https://github.com/apache/spark/commit/9be8f84756fb7e5d2a4fe31c08603688edaf998c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16753: [SPARK-19296][SQL] Deduplicate arguments in JdbcUtils.sa...

2017-01-31 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16753
  
@srowen, I see. Let me try making them consistent to show whether it looks 
good.





[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

2017-01-31 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16747
  
I am OK with this, but I remember there was some discussion about whether this 
type should be exposed, and I could not track down the conclusion.





[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

2017-01-31 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16747
  
@HyukjinKwon is this OK by you?





[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16751
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72191/
Test PASSed.





[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16751
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16751
  
**[Test build #72191 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72191/testReport)**
 for PR 16751 at commit 
[`92dc3e5`](https://github.com/apache/spark/commit/92dc3e50f136be088357aa7b477ffd79f138be0e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8....

2017-01-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16751





[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2

2017-01-31 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16751
  
Merging in master.






[GitHub] spark issue #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON parsing

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16750
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON parsing

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16750
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72190/
Test PASSed.





[GitHub] spark issue #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON parsing

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16750
  
**[Test build #72190 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72190/testReport)**
 for PR 16750 at commit 
[`551cff9`](https://github.com/apache/spark/commit/551cff99785927be3ef68c4393dca4dabb3c2ba0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2

2017-01-31 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16751
  
LGTM too.





[GitHub] spark issue #16753: [SPARK-19296][SQL] Deduplicate arguments in JdbcUtils.sa...

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16753
  
**[Test build #72192 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72192/testReport)**
 for PR 16753 at commit 
[`9be8f84`](https://github.com/apache/spark/commit/9be8f84756fb7e5d2a4fe31c08603688edaf998c).





[GitHub] spark issue #16753: [SPARK-19296][SQL] Deduplicate arguments in JdbcUtils.sa...

2017-01-31 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16753
  
It's true, though I wonder if it's still by design, that these methods take 
url and table as important first-class arguments, and then also other options, 
even though the options also contain the same arguments.

Or, could the other methods like tableExists reasonably also not have to 
take these arguments? Consistency is probably more important.





[GitHub] spark issue #16753: [SPARK-19296][SQL] Deduplicate arguments in JdbcUtils.sa...

2017-01-31 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16753
  
Hi @gatorsmile, could you take a look at this one, please? (It might not 
need a JIRA, but one happened to be opened by someone.)





[GitHub] spark pull request #16753: [SPARK-19296][SQL] Deduplicate arguments in JdbcU...

2017-01-31 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/16753

[SPARK-19296][SQL] Deduplicate arguments in JdbcUtils.saveTable

## What changes were proposed in this pull request?

This PR deduplicates the `url` and `table` arguments of `JdbcUtils.saveTable`.

```diff
   def saveTable(
       df: DataFrame,
-      url: String,
-      table: String,
       tableSchema: Option[StructType],
       isCaseSensitive: Boolean,
       options: JDBCOptions): Unit = {
+    val url = options.url
+    val table = options.table
```

This method seems to be called only in `JdbcRelationProvider`, where both 
`url` and `table` originate from `JDBCOptions`.

## How was this patch tested?

Ran the unit tests in `JdbcSuite`/`JDBCWriteSuite`.

Built with Scala 2.10 as below:


```
./dev/change-scala-version.sh 2.10
./build/mvn -Pyarn -Phadoop-2.4 -Dscala-2.10 -DskipTests clean package
```


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-19296

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16753.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16753


commit 9be8f84756fb7e5d2a4fe31c08603688edaf998c
Author: hyukjinkwon 
Date:   2017-01-31T08:09:14Z

Deduplicate arguments in JdbcUtils.saveTable







[GitHub] spark pull request #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON p...

2017-01-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16750#discussion_r98624418
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -329,7 +332,17 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
    * @since 1.4.0
    */
   def json(jsonRDD: RDD[String]): DataFrame = {
-    val parsedOptions: JSONOptions = new JSONOptions(extraOptions.toMap)
+    val optionsWithTimeZone = {
--- End diff --

Could we just pass the timezone into `JSONOptions` as a default, or handle it 
the way `columnNameOfCorruptRecord` is handled in `JSONOptions` below?

It seems the same logic is duplicated several times here, and logic to set 
default values is introduced in the tests, which might be unnecessary or 
could be removed.
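
A minimal sketch of the suggestion, written in Python for brevity. `ParserOptions`, its constructor arguments, and the `"timeZone"` key handling are hypothetical stand-ins and not Spark's `JSONOptions`; the point is only that the default is resolved once inside the options class instead of at every call site.

```python
class ParserOptions:
    """Hypothetical options holder that resolves the time zone default itself."""

    def __init__(self, parameters, default_time_zone):
        # Use the caller-supplied option if present, else fall back to the
        # default passed in once (e.g. a session-level setting).
        self.time_zone = parameters.get("timeZone", default_time_zone)


# Call sites no longer need to build a modified options map first.
implicit = ParserOptions({}, default_time_zone="UTC")
explicit = ParserOptions({"timeZone": "Asia/Seoul"}, default_time_zone="UTC")
```

With this shape, tests can construct the options directly with a default instead of repeating the defaulting logic.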





[GitHub] spark pull request #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON p...

2017-01-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16750#discussion_r98629735
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -329,7 +332,17 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
    * @since 1.4.0
    */
   def json(jsonRDD: RDD[String]): DataFrame = {
-    val parsedOptions: JSONOptions = new JSONOptions(extraOptions.toMap)
+    val optionsWithTimeZone = {
--- End diff --

It seems the same comment also applies to `CSVOptions`.





[GitHub] spark pull request #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON p...

2017-01-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16750#discussion_r98625766
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala ---
@@ -161,12 +163,3 @@ private[csv] class CSVOptions(@transient private val parameters: CaseInsensitive
     settings
   }
 }
-
-object CSVOptions {
--- End diff --

Do you mind if I ask the reason for removing this, which apparently requires 
fixing many tests in CSV?





[GitHub] spark pull request #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON p...

2017-01-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16750#discussion_r98623217
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -297,7 +300,7 @@ def text(self, paths):
     def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=None,
             comment=None, header=None, inferSchema=None, ignoreLeadingWhiteSpace=None,
             ignoreTrailingWhiteSpace=None, nullValue=None, nanValue=None, positiveInf=None,
-            negativeInf=None, dateFormat=None, timestampFormat=None, maxColumns=None,
+            negativeInf=None, dateFormat=None, timestampFormat=None, timeZone=None, maxColumns=None,
--- End diff --
(Hi @ueshin, to my knowledge, this should be added at the end to avoid 
breaking existing code that passes these options as positional arguments.)
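
A minimal sketch of the concern, using shortened hypothetical functions rather than the real `DataFrameReader.csv` signature: inserting a keyword parameter in the middle silently shifts every positional argument after it, while appending it at the end leaves existing calls unchanged.

```python
# Hypothetical, shortened stand-ins for the reader method; these are not the
# real pyspark signatures.

def csv_before(path, dateFormat=None, timestampFormat=None, maxColumns=None):
    return {"timestampFormat": timestampFormat, "timeZone": None,
            "maxColumns": maxColumns}


def csv_inserted(path, dateFormat=None, timestampFormat=None,
                 timeZone=None, maxColumns=None):
    # New parameter placed in the middle of the list.
    return {"timestampFormat": timestampFormat, "timeZone": timeZone,
            "maxColumns": maxColumns}


def csv_appended(path, dateFormat=None, timestampFormat=None,
                 maxColumns=None, timeZone=None):
    # New parameter placed at the end of the list.
    return {"timestampFormat": timestampFormat, "timeZone": timeZone,
            "maxColumns": maxColumns}


# An existing caller that passes every option positionally.
args = ("data.csv", "yyyy-MM-dd", "yyyy-MM-dd HH:mm:ss", 20480)

old = csv_before(*args)
shifted = csv_inserted(*args)   # 20480 now lands in timeZone
safe = csv_appended(*args)      # behaves exactly like the old signature
```

Callers that already use keyword arguments are unaffected either way; only positional callers see the shift.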





[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2

2017-01-31 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16751
  
The dependency change looks clear.





[GitHub] spark pull request #16752: Branch 2.0

2017-01-31 Thread kishorbp
Github user kishorbp closed the pull request at:

https://github.com/apache/spark/pull/16752





[GitHub] spark issue #16752: Branch 2.0

2017-01-31 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16752
  
Hi @kishorbp, this seems to have been opened by mistake. Would you please close it?





[GitHub] spark issue #16752: Branch 2.0

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16752
  
Can one of the admins verify this patch?





[GitHub] spark pull request #16752: Branch 2.0

2017-01-31 Thread kishorbp
GitHub user kishorbp opened a pull request:

https://github.com/apache/spark/pull/16752

Branch 2.0

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16752.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16752


commit b25a8e6e167717fbe92e6a9b69a8a2510bf926ca
Author: frreiss 
Date:   2016-09-22T09:31:15Z

[SPARK-17421][DOCS] Documenting the current treatment of MAVEN_OPTS.

## What changes were proposed in this pull request?

Modified the documentation to clarify that `build/mvn` and `pom.xml` always 
add Java 7-specific parameters to `MAVEN_OPTS`, and that developers can safely 
ignore warnings about `-XX:MaxPermSize` that may result from compiling or 
running tests with Java 8.

## How was this patch tested?

Rebuilt HTML documentation, made sure that building-spark.html displays 
correctly in a browser.

Author: frreiss 

Closes #15005 from frreiss/fred-17421a.

(cherry picked from commit 646f383465c123062cbcce288a127e23984c7c7f)
Signed-off-by: Sean Owen 

commit f14f47f072a392df0ebe908f1c57b6eb858105b7
Author: Shivaram Venkataraman 
Date:   2016-09-22T18:52:42Z

Skip building R vignettes if Spark is not built

## What changes were proposed in this pull request?

When we build the docs separately we don't have the JAR files from the 
Spark build in
the same tree. As the SparkR vignettes need to launch a SparkContext to be 
built, we skip building them if JAR files don't exist

## How was this patch tested?

To test this we can run the following:
```
build/mvn -DskipTests -Psparkr clean
./R/create-docs.sh
```
You should see a line `Skipping R vignettes as Spark JARs not found` at the 
end

Author: Shivaram Venkataraman 

Closes #15200 from shivaram/sparkr-vignette-skip.

(cherry picked from commit 9f24a17c59b1130d97efa7d313c06577f7344338)
Signed-off-by: Reynold Xin 

commit 243bdb11d89ee379acae1ea1ed78df10797e86d1
Author: Burak Yavuz 
Date:   2016-09-22T20:05:41Z

[SPARK-17613] S3A base paths with no '/' at the end return empty DataFrames

Consider you have a bucket as `s3a://some-bucket`
and under it you have files:
```
s3a://some-bucket/file1.parquet
s3a://some-bucket/file2.parquet
```
Getting the parent path of `s3a://some-bucket/file1.parquet` yields
`s3a://some-bucket/` and the ListingFileCatalog uses this as the key in the 
hash map.

When catalog.allFiles is called, we use `s3a://some-bucket` (no slash at 
the end) to get the list of files, and we're left with an empty list!

This PR fixes this by adding a `/` at the end of the `URI` iff the given 
`Path` doesn't have a parent, i.e. is the root. This is a no-op if the path 
already had a `/` at the end, and is handled through the Hadoop Path, path 
merging semantics.
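
The key mismatch described above can be sketched with plain strings. The helpers `parent` and `normalize_root` below are hypothetical and only imitate the described behavior; they are not Hadoop's `Path` API or Spark's `ListingFileCatalog`.

```python
def parent(path):
    """Parent of a file URI; a bucket root keeps its trailing slash."""
    head = path[: path.rfind("/")]
    _, _, rest = head.partition("://")
    # "scheme://bucket" with nothing after it is the root.
    return head + "/" if "/" not in rest else head


def normalize_root(path):
    """Append '/' only when the path is a bare root such as 's3a://bucket'."""
    _, sep, rest = path.partition("://")
    if sep and rest and "/" not in rest:
        return path + "/"
    return path  # already has a slash or is not a root: no-op


files = ["s3a://some-bucket/file1.parquet", "s3a://some-bucket/file2.parquet"]

# Files are indexed by their parent path, which is "s3a://some-bucket/".
catalog = {}
for f in files:
    catalog.setdefault(parent(f), []).append(f)

missing = catalog.get("s3a://some-bucket", [])                 # key mismatch
found = catalog.get(normalize_root("s3a://some-bucket"), [])   # key matches
```

Normalizing only the root keeps lookups for nested directories untouched.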

Unit test in `FileCatalogSuite`.

Author: Burak Yavuz 

Closes #15169 from brkyvz/SPARK-17613.

(cherry picked from commit 85d609cf25c1da2df3cd4f5d5aeaf3cbcf0d674c)
Signed-off-by: Josh Rosen 

commit 47fc0b9f40d814bc8e19f86dad591d4aed467222
Author: Shixiong Zhu 
Date:   2016-09-22T21:26:45Z

[SPARK-17638][STREAMING] Stop JVM StreamingContext when the Python process 
is dead

## What changes were proposed in this pull request?

When the Python process is dead, the JVM StreamingContext is still running. 
Hence we will see a lot of Py4jException before the JVM process exits. It's 
better to stop the JVM StreamingContext to avoid those annoying logs.

## How was this patch tested?

Jenkins

Author: Shixiong Zhu 

Closes #15201 from zsxwing/stop-jvm-ssc.

(cherry picked from commit 3cdae0ff2f45643df7bc198cb48623526c7eb1a6)
Signed-off-by: Shixiong Zhu 

commit 0a593db360b3b7771f45f482cf45e8500f0faa76
Author: Herman van Hovell 
Date:   2016-09-22T21:29:27Z


[GitHub] spark issue #16281: [SPARK-13127][SQL] Update Parquet to 1.9.0

2017-01-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/16281
  
Hi, all.
Now, I'm trying to upgrade Apache Parquet in Spark to 1.8.2.





[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16751
  
**[Test build #72191 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72191/testReport)**
 for PR 16751 at commit 
[`92dc3e5`](https://github.com/apache/spark/commit/92dc3e50f136be088357aa7b477ffd79f138be0e).





[GitHub] spark pull request #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8....

2017-01-31 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/16751

[SPARK-19409][BUILD] Bump parquet version to 1.8.2

## What changes were proposed in this pull request?

Apache Parquet 1.8.2 was officially released last week, on 26 Jan.


https://lists.apache.org/thread.html/af0c813f1419899289a336d96ec02b3bbeecaea23aa6ef69f435c142@%3Cdev.parquet.apache.org%3E

This PR only bumps the Parquet version to 1.8.2. It does not touch any other 
code.

## How was this patch tested?

Passes the existing tests; also checked manually by running 
`./dev/test-dependencies.sh`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-19409

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16751.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16751


commit 92dc3e50f136be088357aa7b477ffd79f138be0e
Author: Dongjoon Hyun 
Date:   2017-01-31T08:41:46Z

[SPARK-19409][BUILD] Bump parquet version to 1.8.2







[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...

2017-01-31 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16689#discussion_r98615769
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1138,6 +1138,11 @@ setMethod("collect",
                 if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
                   vec <- do.call(c, col)
                   stopifnot(class(vec) != "list")
+                  class(vec) <-
+                    if (colType == "timestamp")
+                      c("POSIXct", "POSIXt")
--- End diff --
Should `PRIMITIVE_TYPES[["timestamp"]]` be changed then?
https://github.com/apache/spark/blob/master/R/pkg/R/types.R#L32





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16043
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72189/
Test FAILed.





[GitHub] spark issue #14412: [SPARK-15355] [CORE] Proactive block replication

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14412
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16043
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14412: [SPARK-15355] [CORE] Proactive block replication

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14412
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72187/
Test FAILed.





[GitHub] spark issue #13932: [SPARK-15354] [CORE] Topology aware block replication st...

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13932
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72188/
Test FAILed.





[GitHub] spark issue #13932: [SPARK-15354] [CORE] Topology aware block replication st...

2017-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13932
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON parsing

2017-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16750
  
**[Test build #72190 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72190/testReport)**
 for PR 16750 at commit 
[`551cff9`](https://github.com/apache/spark/commit/551cff99785927be3ef68c4393dca4dabb3c2ba0).





[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...

2017-01-31 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16043#discussion_r98613798
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
@@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+
+/**
+* push down operations into [[CreateNamedStructLike]].
+*/
+object SimplifyCreateStructOps extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformExpressionsUp {
+      // push down field extraction
+      case GetStructField(createNamedStructLike: CreateNamedStructLike, ordinal, _) =>
+        createNamedStructLike.valExprs(ordinal)
+    }
+  }
+}
+
+/**
+* push down operations into [[CreateArray]].
+*/
+object SimplifyCreateArrayOps extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformExpressionsUp {
+      // push down field selection (array of structs)
+      case GetArrayStructFields(CreateArray(elems), field, ordinal, numFields, containsNull) =>
+        // instead of selecting the field on the entire array,
+        // select it from each member of the array.
+        // pushing down the operation this way opens other optimization opportunities
+        // (i.e. struct(...,x,...).x)
+        CreateArray(elems.map(GetStructField(_, ordinal, Some(field.name))))
+      // push down item selection.
+      case ga @ GetArrayItem(CreateArray(elems), IntegerLiteral(idx)) =>
+        // instead of creating the array and then selecting one row,
+        // remove array creation altogether.
+        if (idx >= 0 && idx < elems.size) {
+          // valid index
+          elems(idx)
+        } else {
+          // out of bounds, mimic the runtime behavior and return null
+          Cast(Literal(null), ga.dataType)
--- End diff --

`Literal(null, ga.dataType)`?
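
To make the suggestion concrete, here is a minimal sketch of the item-selection rewrite using toy expression classes. `Lit`, `MakeArray`, `ArrayItem` and `simplify` are hypothetical stand-ins invented for illustration, not the Catalyst classes, and the element type is passed in as a plain string. The out-of-bounds branch builds the typed null literal directly instead of wrapping an untyped null in a cast.

```scala
object ArrayItemSimplifySketch {
  // Toy stand-ins for Catalyst expressions; these are NOT the Spark classes.
  sealed trait Expr
  case class Lit(value: Any, dataType: String) extends Expr
  case class MakeArray(elems: Seq[Expr]) extends Expr
  case class ArrayItem(child: Expr, index: Expr) extends Expr

  // Rewrites array(e0, e1, ...)[i] to e_i when i is a literal Int index,
  // and to a typed null literal when i is out of bounds.
  // Any other expression is returned unchanged.
  def simplify(expr: Expr, elemType: String): Expr = expr match {
    case ArrayItem(MakeArray(elems), Lit(idx: Int, _)) =>
      if (idx >= 0 && idx < elems.size) elems(idx) else Lit(null, elemType)
    case other => other
  }
}
```

With this shape the null carries its data type from the start, so no later pass is needed to fold the cast away.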





[GitHub] spark pull request #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON p...

2017-01-31 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/16750

[SPARK-18937][SQL] Timezone support in CSV/JSON parsing

## What changes were proposed in this pull request?

This is a follow-up PR to #16308.

This PR enables timezone support in CSV/JSON parsing.

We should introduce a `timeZone` option for the CSV/JSON datasources (the default 
value of the option is the session local timezone).

The datasources should use the `timeZone` option when formatting timestamp 
values on write and when parsing them on read.
Note that while reading, if the `timestampFormat` includes timezone info, 
the `timeZone` option is not used, because we should respect the timezone in the 
values.

For example, if you have the timestamp `"2016-01-01 00:00:00"` in `GMT`, the 
values written with the default timezone option (which is `"GMT"` here, because 
the session local timezone is `"GMT"`) are:

```scala
scala> spark.conf.set("spark.sql.session.timeZone", "GMT")

scala> val df = Seq(new java.sql.Timestamp(1451606400000L)).toDF("ts")
df: org.apache.spark.sql.DataFrame = [ts: timestamp]

scala> df.show()
+-------------------+
|                 ts|
+-------------------+
|2016-01-01 00:00:00|
+-------------------+


scala> df.write.json("/path/to/gmtjson")
```

```sh
$ cat /path/to/gmtjson/part-*
{"ts":"2016-01-01T00:00:00.000Z"}
```

whereas with the option set to `"PST"`, they are:

```scala
scala> df.write.option("timeZone", "PST").json("/path/to/pstjson")
```

```sh
$ cat /path/to/pstjson/part-*
{"ts":"2015-12-31T16:00:00.000-08:00"}
```
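
The write-side behavior can be reproduced outside Spark with a small JDK-only sketch. It assumes `java.text.SimpleDateFormat` with the pattern `yyyy-MM-dd'T'HH:mm:ss.SSSXXX`; this is not the formatter the datasources use internally, and `render` is a hypothetical helper named only for this example. The same instant is rendered once per timezone ID.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale, TimeZone}

object TimeZoneFormatSketch {
  // Hypothetical helper: renders an instant (epoch milliseconds) in the given
  // timezone, using an ISO 8601 pattern that includes the zone offset.
  def render(epochMillis: Long, zoneId: String): String = {
    val fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX", Locale.US)
    fmt.setTimeZone(TimeZone.getTimeZone(zoneId))
    fmt.format(new Date(epochMillis))
  }
}
```

Calling `TimeZoneFormatSketch.render(1451606400000L, "GMT")` and then the same call with `"PST"` gives two different strings for one instant, which is the difference between the two JSON files above.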

We can properly read these files even if the timezone option is wrong 
because the timestamp values have timezone info:

```scala
scala> import org.apache.spark.sql.types._

scala> val schema = new StructType().add("ts", TimestampType)
schema: org.apache.spark.sql.types.StructType = StructType(StructField(ts,TimestampType,true))

scala> spark.read.schema(schema).json("/path/to/gmtjson").show()
+-------------------+
|                 ts|
+-------------------+
|2016-01-01 00:00:00|
+-------------------+

scala> spark.read.schema(schema).option("timeZone", "PST").json("/path/to/gmtjson").show()
+-------------------+
|                 ts|
+-------------------+
|2016-01-01 00:00:00|
+-------------------+
```

And even if the `timestampFormat` doesn't contain timezone info, we can properly 
read the values by setting the correct timezone option:

```scala
scala> df.write.option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss").option("timeZone", "JST").json("/path/to/jstjson")
```

```sh
$ cat /path/to/jstjson/part-*
{"ts":"2016-01-01T09:00:00"}
```

```scala
// wrong result
scala> spark.read.schema(schema).option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss").json("/path/to/jstjson").show()
+-------------------+
|                 ts|
+-------------------+
|2016-01-01 09:00:00|
+-------------------+

// correct result
scala> spark.read.schema(schema).option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss").option("timeZone", "JST").json("/path/to/jstjson").show()
+-------------------+
|                 ts|
+-------------------+
|2016-01-01 00:00:00|
+-------------------+
```
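
The read-side rule (an offset in the value wins, otherwise the configured timezone is used) can be sketched with the JDK alone. This assumes `java.text.SimpleDateFormat` rather than the parser the datasources use internally, and `parseMillis` is a hypothetical helper named only for this example.

```scala
import java.text.SimpleDateFormat
import java.util.{Locale, TimeZone}

object TimeZoneParseSketch {
  // Hypothetical helper: parses `text` with `pattern` and returns epoch
  // milliseconds. `zoneId` is only the fallback timezone; it applies when the
  // text itself carries no offset.
  def parseMillis(text: String, pattern: String, zoneId: String): Long = {
    val fmt = new SimpleDateFormat(pattern, Locale.US)
    fmt.setTimeZone(TimeZone.getTimeZone(zoneId))
    fmt.parse(text).getTime
  }
}
```

Parsing `"2016-01-01T09:00:00"` with the pattern `yyyy-MM-dd'T'HH:mm:ss` yields 1451606400000 under `"JST"` but an instant nine hours later under `"GMT"`, while a value such as `"2015-12-31T16:00:00.000-08:00"` parsed with `yyyy-MM-dd'T'HH:mm:ss.SSSXXX` yields 1451606400000 whichever fallback timezone is passed.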

This PR also makes `JsonToStruct` and `StructToJson` 
`TimeZoneAwareExpression`s so that they can evaluate values using the timezone option.

## How was this patch tested?

Existing tests, plus some added tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-18937

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16750.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16750


commit aa052f4d11929192b749752f4b73772664d0460c
Author: Takuya UESHIN 
Date:   2017-01-05T09:29:42Z

Add timeZone option to JSONOptions.

commit 890879e24b3f63509a000585e18b288961a4e5cf
Author: Takuya UESHIN 
Date:   2017-01-06T05:11:41Z

Apply timeZone option to JSON datasources.

commit f08b78c16ac444550e7ea0857d0275b9a91b7561
Author: Takuya UESHIN 
Date:   2017-01-06T06:03:34Z

Apply timeZone option to CSV datasources.

commit 551cff99785927be3ef68c4393dca4dabb3c2ba0
Author: Takuya UESHIN 
Date:   2017-01-06T08:39:26Z

Modify python files.



