[GitHub] spark pull request #13531: [SPARK-15654] [SQL] fix non-splitable files for t...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/13531#discussion_r66538048

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala ---
@@ -340,6 +340,40 @@ class FileSourceStrategySuite extends QueryTest with SharedSQLContext with Predi
   }
 }
+  test("SPARK-15654 do not split non-splittable files") {
--- End diff --

Should we also test bin-packing of the gzipped files?

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
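A test along the lines the reviewer suggests might look roughly like this. This is only a sketch: the `createTable` and `checkPartitionCount` helpers and the exact conf key semantics are assumptions modeled on the suite's style, not the suite's actual API.

```scala
// Sketch of the extra test being asked for: gzipped (non-splittable) files
// must each stay whole, but should still be bin-packed together into tasks.
// `createTable` and `checkPartitionCount` are hypothetical helpers.
test("SPARK-15654 bin-pack non-splittable gzipped files") {
  val table = createTable(files = Seq(
    "a.csv.gz" -> 10, // file name -> size in bytes
    "b.csv.gz" -> 10,
    "c.csv.gz" -> 10))
  withSQLConf("spark.sql.files.maxPartitionBytes" -> "25") {
    // Two gzipped files fit in one 25-byte bin and the third spills into a
    // second bin, but no single file is ever split across partitions.
    checkPartitionCount(table, expected = 2)
  }
}
```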
[GitHub] spark issue #10412: [SPARK-12447][YARN] Only update the states when executor...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/10412 **[Test build #60250 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60250/consoleFull)** for PR 10412 at commit [`fba6e5c`](https://github.com/apache/spark/commit/fba6e5c3cb2e03dc0a552cacecd6c986a10b75ef).
[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13371 Is this a bug fix or performance fix? Sorry, I don't really understand after reading your description.
[GitHub] spark issue #13581: [SPARK-14321][SQL] Reduce date format cost and string-to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13581 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60245/
[GitHub] spark issue #13581: [SPARK-14321][SQL] Reduce date format cost and string-to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13581 Merged build finished. Test PASSed.
[GitHub] spark pull request #13342: [SPARK-15593][SQL]Add DataFrameWriter.foreach to ...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/13342#discussion_r66537635

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/ForeachWriter.scala ---
+  def open(partitionId: Long, version: Long): Boolean
+
+  /**
+   * Called to process the data in the executor side.
--- End diff --

Also say: this method will be called only when `open` returns true.
[GitHub] spark issue #13581: [SPARK-14321][SQL] Reduce date format cost and string-to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13581 **[Test build #60245 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60245/consoleFull)** for PR 13581 at commit [`a086128`](https://github.com/apache/spark/commit/a0861285ccffe362a1a5758f9c99a36e6f7b2f5d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13342: [SPARK-15593][SQL]Add DataFrameWriter.foreach to ...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/13342#discussion_r66537560

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/ForeachWriter.scala ---
+  /**
+   * Called when stopping to process one partition of new data in the executor side. This is
+   * guaranteed to be called when a `Throwable` is thrown during processing data. However,
+   * `close` won't be called in the following cases:
+   *  - JVM crashes without throwing a `Throwable`
+   *  - `open` throws a `Throwable`.
+   *
+   * @param errorOrNull the error thrown during processing data or null if nothing is thrown.
--- End diff --

"if nothing is thrown" --> "if there was no error".
[GitHub] spark issue #13582: [SPARK-15850][SQL] Remove function grouping in SparkSess...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13582 **[Test build #60249 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60249/consoleFull)** for PR 13582 at commit [`a675b03`](https://github.com/apache/spark/commit/a675b03cb71097564320874d693e4ceaa323f1f1).
[GitHub] spark pull request #13342: [SPARK-15593][SQL]Add DataFrameWriter.foreach to ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13342#discussion_r66537102

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -401,6 +381,53 @@ final class DataFrameWriter private[sql](df: DataFrame) {
   }

   /**
+   * :: Experimental ::
+   * Starts the execution of the streaming query, which will continually send results to the given
+   * [[ForeachWriter]] as new data arrives. The returned [[ContinuousQuery]] object can be used to
+   * interact with the stream.
+   *
+   * @since 2.0.0
+   */
+  @Experimental
+  def foreach(writer: ForeachWriter[T]): ContinuousQuery = {
+    assertNotBucketed("foreach")
+    assertStreaming(
+      "foreach() on streaming Datasets and DataFrames can only be called on continuous queries")
+
+    val queryName = extraOptions.getOrElse("queryName", StreamExecution.nextName)
+    val sink = new ForeachSink[T](ds.sparkSession.sparkContext.clean(writer))(ds.exprEnc)
+    df.sparkSession.sessionState.continuousQueryManager.startQuery(
+      queryName,
+      getCheckpointLocation(queryName, required = false),
+      df,
+      sink,
+      outputMode,
+      trigger)
+  }
+
+  /**
+   * Returns the checkpointLocation for a query. If `required` is `true` but the checkpoint
+   * location is not set, [[AnalysisException]] will be thrown. If `required` is `false`, a temp
+   * folder will be created if the checkpoint location is not set.
+   */
+  private def getCheckpointLocation(queryName: String, required: Boolean): String = {
+    extraOptions.get("checkpointLocation").map { userSpecified =>
+      new Path(userSpecified).toUri.toString
+    }.orElse {
+      df.sparkSession.conf.get(SQLConf.CHECKPOINT_LOCATION).map { location =>
+        new Path(location, queryName).toUri.toString
+      }
+    }.getOrElse {
+      if (required) {
+        throw new AnalysisException("checkpointLocation must be specified either " +
+          "through option() or SQLConf")
--- End diff --

SQLConf is actually not a user-facing concept. Just say `SparkSession.conf.set`.
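From the user's side, the resolution chain quoted above amounts to two ways of supplying a checkpoint location, with the per-query option winning. A sketch, where the paths and the `dataset`/`myWriter` values are placeholders, and the conf key is assumed to be the one backing `SQLConf.CHECKPOINT_LOCATION`:

```scala
// Session-wide default: each query gets a subdirectory named after it.
spark.conf.set("spark.sql.streaming.checkpointLocation", "/tmp/checkpoints")

// Per-query option: takes precedence over the session-wide conf.
val query = dataset.write
  .option("checkpointLocation", "/tmp/checkpoints/my-query")
  .foreach(myWriter)
```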
[GitHub] spark issue #13577: [Minor][Doc] Improve SQLContext Documentation and Fix Sp...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13577 Actually, thanks for the PR. It got me to do this one: https://github.com/apache/spark/pull/13582
[GitHub] spark pull request #13582: [SPARK-15850][SQL] Remove function grouping in Sp...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/13582

[SPARK-15850][SQL] Remove function grouping in SparkSession

## What changes were proposed in this pull request?

SparkSession does not have that many functions due to better namespacing, and as a result we probably don't need the function grouping. This patch removes the grouping. Closes #13577.

## How was this patch tested?

N/A - this is a documentation change.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-15850

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13582.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13582

commit a675b03cb71097564320874d693e4ceaa323f1f1
Author: Reynold Xin
Date: 2016-06-09T22:56:23Z

    [SPARK-15850][SQL] Remove function grouping in SparkSession
[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13371 Merged build finished. Test FAILed.
[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13371 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60246/
[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13371 **[Test build #60246 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60246/consoleFull)** for PR 13371 at commit [`077f7f8`](https://github.com/apache/spark/commit/077f7f8813a76d38c8a6d898ec54e401c91b6014). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13393: [SPARK-14615][ML][FOLLOWUP] Fix Python examples to use t...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13393 Hi @yanboliang, could you maybe take a quick look please?
[GitHub] spark issue #13572: [SPARK-15838] [SQL] CACHE TABLE AS SELECT should not rep...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13572 Are we breaking past behavior here? I would say we shouldn't change this, since this is a weird command anyway. I'm actually surprised we have this command.
[GitHub] spark pull request #13342: [SPARK-15593][SQL]Add DataFrameWriter.foreach to ...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/13342#discussion_r66534481

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/ForeachWriter.scala ---
+   * @param partitionId the partition id.
+   * @param version a unique id for data deduplication.
+   * @return a flat that indicates if the data should be processed.
--- End diff --

Return true if the corresponding partition and version id should be processed.
[GitHub] spark issue #13571: [SPARK-15369][WIP][RFC][PySpark][SQL] Expose potential t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13571 Merged build finished. Test FAILed.
[GitHub] spark issue #13571: [SPARK-15369][WIP][RFC][PySpark][SQL] Expose potential t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13571 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60244/
[GitHub] spark issue #13571: [SPARK-15369][WIP][RFC][PySpark][SQL] Expose potential t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13571 **[Test build #60244 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60244/consoleFull)** for PR 13571 at commit [`1ca895a`](https://github.com/apache/spark/commit/1ca895a085c79734bd5a51031b60b2553717c256). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13342: [SPARK-15593][SQL]Add DataFrameWriter.foreach to ...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/13342#discussion_r66533974

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
+   * Starts the execution of the streaming query, which will continually send results to the given
+   * [[ForeachWriter]] as new data arrives. The returned [[ContinuousQuery]] object can be used to
--- End diff --

Also, can you add a simple example showing the use of ForeachWriter:

```
datasetOfString.write.foreach(new ForeachWriter[String] {

  def open(partitionId: Long, version: Long): Boolean = {
    // open connection
  }

  def process(record: String) = {
    // write string to connection
  }

  def close(errorOrNull: Throwable): Unit = {
    // close the connection
  }
})
```
[GitHub] spark pull request #13342: [SPARK-15593][SQL]Add DataFrameWriter.foreach to ...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/13342#discussion_r66533645

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
+   * Starts the execution of the streaming query, which will continually send results to the given
+   * [[ForeachWriter]] as new data arrives. The returned [[ContinuousQuery]] object can be used to
--- End diff --

"...as new data arrives. The [[ForeachWriter]] can be used to send the data generated by the DataFrame/Dataset to an external system. The returned..."
[GitHub] spark pull request #13531: [SPARK-15654] [SQL] fix non-splitable files for t...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/13531#discussion_r66533473

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -298,6 +309,28 @@ trait FileFormat {
 }

 /**
+ * The base class for file formats that are based on text files.
+ */
+abstract class TextBasedFileFormat extends FileFormat {
+  private var codecFactory: CompressionCodecFactory = null
+
+  override def isSplitable(
+      sparkSession: SparkSession,
+      options: Map[String, String],
+      path: Path): Boolean = {
+    if (codecFactory == null) {
+      synchronized {
--- End diff --

I am not sure whether we need "synchronized" here; do we want to ensure FileFormat is thread safe?
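If thread safety is the goal, an idiomatic alternative to the check-then-synchronize pattern quoted above is a `lazy val`, which the Scala compiler guards with a lock for once-only initialization. A sketch of that alternative (not the PR's code; the `buildConf` helper and the simplified `isSplitable` signature are assumptions for illustration):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.compress.{CompressionCodecFactory, SplittableCompressionCodec}

abstract class TextBasedFileFormatSketch {
  // Hypothetical hook for building the Hadoop configuration.
  protected def buildConf(): Configuration = new Configuration()

  // Initialized on first access; the compiler emits the synchronization.
  private lazy val codecFactory: CompressionCodecFactory =
    new CompressionCodecFactory(buildConf())

  def isSplitable(path: Path): Boolean = {
    val codec = codecFactory.getCodec(path)
    // A null codec means uncompressed text, which is splittable; a
    // SplittableCompressionCodec (e.g. bzip2) is also splittable.
    codec == null || codec.isInstanceOf[SplittableCompressionCodec]
  }
}
```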
[GitHub] spark pull request #13342: [SPARK-15593][SQL]Add DataFrameWriter.foreach to ...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/13342#discussion_r66533244

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
+  @Experimental
+  def foreach(writer: ForeachWriter[T]): ContinuousQuery = {
+    assertNotBucketed("foreach")
+    assertStreaming(
+      "foreach() on streaming Datasets and DataFrames can only be called on continuous queries")
--- End diff --

"foreach() can only be called on streaming Datasets/DataFrames."
[GitHub] spark issue #11746: [SPARK-13602][CORE] Add shutdown hook to DriverRunner to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11746 **[Test build #60248 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60248/consoleFull)** for PR 11746 at commit [`0375da7`](https://github.com/apache/spark/commit/0375da7e2e4c7cfecd9b26c19401ebc31a5e6c69).
[GitHub] spark pull request #13342: [SPARK-15593][SQL]Add DataFrameWriter.foreach to ...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/13342#discussion_r66532833 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/ForeachWriter.scala --- @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import org.apache.spark.annotation.Experimental +import org.apache.spark.sql.streaming.ContinuousQuery + +/** + * :: Experimental :: + * A writer to consume data generated by a [[ContinuousQuery]]. Each partition will use a new --- End diff -- A writer to **write** data generated
[GitHub] spark pull request #13342: [SPARK-15593][SQL]Add DataFrameWriter.foreach to ...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/13342#discussion_r66532767 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/ForeachWriter.scala --- @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import org.apache.spark.annotation.Experimental +import org.apache.spark.sql.streaming.ContinuousQuery + +/** + * :: Experimental :: + * A writer to consume data generated by a [[ContinuousQuery]]. Each partition will use a new + * deserialized instance, so you usually should do the initialization work in the `open` method. --- End diff -- usually should --> should do all the initialization (e.g. opening a connection or initiating a transaction) in the `open` method
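A sketch of the lifecycle this review comment describes: each partition receives a fresh deserialized `ForeachWriter` instance, so a connection has to be opened in `open()` rather than in the constructor. `JdbcWriter`, the JDBC URL, and the table name are hypothetical, not from the PR.

```scala
// Illustrative ForeachWriter following the reviewer's guidance: do all
// initialization (e.g. opening a connection) in open(), release it in close().
import java.sql.{Connection, DriverManager}
import org.apache.spark.sql.ForeachWriter

class JdbcWriter(url: String) extends ForeachWriter[String] {
  @transient private var conn: Connection = _

  // Called once per partition; the instance is deserialized fresh, so the
  // connection must be created here, not in the constructor.
  override def open(partitionId: Long, version: Long): Boolean = {
    conn = DriverManager.getConnection(url)
    true // returning false would tell Spark to skip this partition
  }

  override def process(record: String): Unit = {
    val stmt = conn.prepareStatement("INSERT INTO lines(value) VALUES (?)")
    stmt.setString(1, record)
    stmt.executeUpdate()
    stmt.close()
  }

  // Called when the partition finishes, even on failure; clean up here.
  override def close(errorOrNull: Throwable): Unit = {
    if (conn != null) conn.close()
  }
}
```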
[GitHub] spark pull request #10412: [SPARK-12447][YARN] Only update the states when e...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/10412#discussion_r66532615 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -399,40 +402,61 @@ private[yarn] class YarnAllocator( */ private def runAllocatedContainers(containersToUse: ArrayBuffer[Container]): Unit = { for (container <- containersToUse) { - numExecutorsRunning += 1 - assert(numExecutorsRunning <= targetNumExecutors) + executorIdCounter += 1 val executorHostname = container.getNodeId.getHost val containerId = container.getId - executorIdCounter += 1 val executorId = executorIdCounter.toString - assert(container.getResource.getMemory >= resource.getMemory) - logInfo("Launching container %s for on host %s".format(containerId, executorHostname)) - executorIdToContainer(executorId) = container - containerIdToExecutorId(container.getId) = executorId - - val containerSet = allocatedHostToContainersMap.getOrElseUpdate(executorHostname, -new HashSet[ContainerId]) - - containerSet += containerId - allocatedContainerToHostMap.put(containerId, executorHostname) - - val executorRunnable = new ExecutorRunnable( -container, -conf, -sparkConf, -driverUrl, -executorId, -executorHostname, -executorMemory, -executorCores, -appAttemptId.getApplicationId.toString, -securityMgr) + + def updateInternalState(): Unit = synchronized { +numExecutorsRunning += 1 +assert(numExecutorsRunning <= targetNumExecutors) +executorIdToContainer(executorId) = container +containerIdToExecutorId(container.getId) = executorId + +val containerSet = allocatedHostToContainersMap.getOrElseUpdate(executorHostname, + new HashSet[ContainerId]) +containerSet += containerId +allocatedContainerToHostMap.put(containerId, executorHostname) + } + if (launchContainers) { logInfo("Launching ExecutorRunnable. 
driverUrl: %s, executorHostname: %s".format( driverUrl, executorHostname)) -launcherPool.execute(executorRunnable) + +val future = Future { --- End diff -- I see, I don't have a strong opinion on either, let me change it.
[GitHub] spark pull request #13342: [SPARK-15593][SQL]Add DataFrameWriter.foreach to ...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/13342#discussion_r66532405 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -401,6 +381,53 @@ final class DataFrameWriter private[sql](df: DataFrame) { } /** + * :: Experimental :: + * Starts the execution of the streaming query, which will continually send results to the given + * [[ForeachWriter]] as new data arrives. The returned [[ContinuousQuery]] object can be used to + * interact with the stream. + * + * @since 2.0.0 + */ + @Experimental + def foreach(writer: ForeachWriter[T]): ContinuousQuery = { +assertNotBucketed("foreach") +assertStreaming( + "foreach() on streaming Datasets and DataFrames can only be called on continuous queries") + +val queryName = extraOptions.getOrElse("queryName", StreamExecution.nextName) +val sink = new ForeachSink[T](ds.sparkSession.sparkContext.clean(writer))(ds.exprEnc) +df.sparkSession.sessionState.continuousQueryManager.startQuery( + queryName, + getCheckpointLocation(queryName, required = false), + df, + sink, + outputMode, + trigger) + } + + /** + * Returns the checkpointLocation for a query. If `required` is `true` but the checkpoint + * location is not set, [[AnalysisException]] will be thrown. If `required` is `false`, a temp + * folder will be created if the checkpoint location is not set. + */ + private def getCheckpointLocation(queryName: String, required: Boolean): String = { --- End diff -- The semantics of this method are very confusing. `required` implies that it will throw an error if there is no checkpoint location set. It's not intuitive that when it is not required it creates a temp checkpoint dir. Furthermore, it creates a temp one named on memory. Does not make sense. Cleaner for this to return an `Option[String]` and rename `required` --> `failIfNotSet`.
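One way to read the cleanup suggested above, sketched as code. This is the reviewer's proposal, not the merged implementation; `failIfNotSet` is the reviewer's proposed name, and the surrounding members (`extraOptions`, `df`, `SQLConf.CHECKPOINT_LOCATION`) are as shown in the diff.

```scala
// Hedged sketch of the suggested refactor: return Option[String] and let the
// caller decide whether to fall back to a temp directory when None.
private def getCheckpointLocation(queryName: String, failIfNotSet: Boolean): Option[String] = {
  val location = extraOptions.get("checkpointLocation").map { userSpecified =>
    new Path(userSpecified).toUri.toString
  }.orElse {
    df.sparkSession.conf.get(SQLConf.CHECKPOINT_LOCATION).map { l =>
      new Path(l, queryName).toUri.toString
    }
  }
  if (location.isEmpty && failIfNotSet) {
    throw new AnalysisException(
      "checkpointLocation must be specified either through " +
        """option("checkpointLocation", ...) or SQLConf""")
  }
  location // None means "not set"; the caller may create a temp dir explicitly
}
```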
[GitHub] spark pull request #10412: [SPARK-12447][YARN] Only update the states when e...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/10412#discussion_r66532257 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -399,40 +402,61 @@ private[yarn] class YarnAllocator( */ private def runAllocatedContainers(containersToUse: ArrayBuffer[Container]): Unit = { for (container <- containersToUse) { - numExecutorsRunning += 1 - assert(numExecutorsRunning <= targetNumExecutors) + executorIdCounter += 1 val executorHostname = container.getNodeId.getHost val containerId = container.getId - executorIdCounter += 1 val executorId = executorIdCounter.toString - assert(container.getResource.getMemory >= resource.getMemory) - logInfo("Launching container %s for on host %s".format(containerId, executorHostname)) - executorIdToContainer(executorId) = container - containerIdToExecutorId(container.getId) = executorId - - val containerSet = allocatedHostToContainersMap.getOrElseUpdate(executorHostname, -new HashSet[ContainerId]) - - containerSet += containerId - allocatedContainerToHostMap.put(containerId, executorHostname) - - val executorRunnable = new ExecutorRunnable( -container, -conf, -sparkConf, -driverUrl, -executorId, -executorHostname, -executorMemory, -executorCores, -appAttemptId.getApplicationId.toString, -securityMgr) + + def updateInternalState(): Unit = synchronized { +numExecutorsRunning += 1 +assert(numExecutorsRunning <= targetNumExecutors) +executorIdToContainer(executorId) = container +containerIdToExecutorId(container.getId) = executorId + +val containerSet = allocatedHostToContainersMap.getOrElseUpdate(executorHostname, + new HashSet[ContainerId]) +containerSet += containerId +allocatedContainerToHostMap.put(containerId, executorHostname) + } + if (launchContainers) { logInfo("Launching ExecutorRunnable. 
driverUrl: %s, executorHostname: %s".format( driverUrl, executorHostname)) -launcherPool.execute(executorRunnable) + +val future = Future { --- End diff -- It's not a potential problem, it just seems like less code to use the Java executor instead.
[GitHub] spark pull request #10412: [SPARK-12447][YARN] Only update the states when e...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/10412#discussion_r66532121 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -399,40 +402,61 @@ private[yarn] class YarnAllocator( */ private def runAllocatedContainers(containersToUse: ArrayBuffer[Container]): Unit = { for (container <- containersToUse) { - numExecutorsRunning += 1 - assert(numExecutorsRunning <= targetNumExecutors) + executorIdCounter += 1 val executorHostname = container.getNodeId.getHost val containerId = container.getId - executorIdCounter += 1 val executorId = executorIdCounter.toString - assert(container.getResource.getMemory >= resource.getMemory) - logInfo("Launching container %s for on host %s".format(containerId, executorHostname)) - executorIdToContainer(executorId) = container - containerIdToExecutorId(container.getId) = executorId - - val containerSet = allocatedHostToContainersMap.getOrElseUpdate(executorHostname, -new HashSet[ContainerId]) - - containerSet += containerId - allocatedContainerToHostMap.put(containerId, executorHostname) - - val executorRunnable = new ExecutorRunnable( -container, -conf, -sparkConf, -driverUrl, -executorId, -executorHostname, -executorMemory, -executorCores, -appAttemptId.getApplicationId.toString, -securityMgr) + + def updateInternalState(): Unit = synchronized { +numExecutorsRunning += 1 +assert(numExecutorsRunning <= targetNumExecutors) +executorIdToContainer(executorId) = container +containerIdToExecutorId(container.getId) = executorId + +val containerSet = allocatedHostToContainersMap.getOrElseUpdate(executorHostname, + new HashSet[ContainerId]) +containerSet += containerId +allocatedContainerToHostMap.put(containerId, executorHostname) + } + if (launchContainers) { logInfo("Launching ExecutorRunnable. 
driverUrl: %s, executorHostname: %s".format( driverUrl, executorHostname)) -launcherPool.execute(executorRunnable) + +val future = Future { --- End diff -- @vanzin , what is the potential problem of using Scala Future and implicit value?
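The two alternatives being debated in this thread can be sketched side by side. This is an illustration of the trade-off, not the PR's code: `launcherPool` mirrors the existing thread pool, and the bodies are placeholders.

```scala
// Sketch of the Future-vs-Java-executor discussion: a Scala Future needs an
// ExecutionContext wrapping the pool, while the Java executor is used directly.
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

val launcherPool = Executors.newCachedThreadPool()

// Option A: Scala Future over an ExecutionContext derived from the pool.
implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(launcherPool)
Future {
  // launch the executor container here
}.onComplete {
  case Success(_) => // updateInternalState(): bump counters, record the container
  case Failure(e) => // roll back counters and log the launch failure
}

// Option B (vanzin's point: less code): hand a Runnable to the Java pool.
launcherPool.execute(new Runnable {
  override def run(): Unit =
    try { /* launch the executor container; updateInternalState() */ }
    catch { case e: Exception => /* roll back counters and log the failure */ }
})
```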
[GitHub] spark pull request #13342: [SPARK-15593][SQL]Add DataFrameWriter.foreach to ...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/13342#discussion_r66532056 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -401,6 +381,53 @@ final class DataFrameWriter private[sql](df: DataFrame) { } /** + * :: Experimental :: + * Starts the execution of the streaming query, which will continually send results to the given + * [[ForeachWriter]] as new data arrives. The returned [[ContinuousQuery]] object can be used to + * interact with the stream. + * + * @since 2.0.0 + */ + @Experimental + def foreach(writer: ForeachWriter[T]): ContinuousQuery = { +assertNotBucketed("foreach") +assertStreaming( + "foreach() on streaming Datasets and DataFrames can only be called on continuous queries") + +val queryName = extraOptions.getOrElse("queryName", StreamExecution.nextName) +val sink = new ForeachSink[T](ds.sparkSession.sparkContext.clean(writer))(ds.exprEnc) +df.sparkSession.sessionState.continuousQueryManager.startQuery( + queryName, + getCheckpointLocation(queryName, required = false), + df, + sink, + outputMode, + trigger) + } + + /** + * Returns the checkpointLocation for a query. If `required` is `true` but the checkpoint + * location is not set, [[AnalysisException]] will be thrown. If `required` is `false`, a temp + * folder will be created if the checkpoint location is not set. 
+ */ + private def getCheckpointLocation(queryName: String, required: Boolean): String = { +extraOptions.get("checkpointLocation").map { userSpecified => + new Path(userSpecified).toUri.toString +}.orElse { + df.sparkSession.conf.get(SQLConf.CHECKPOINT_LOCATION).map { location => +new Path(location, queryName).toUri.toString + } +}.getOrElse { + if (required) { +throw new AnalysisException("checkpointLocation must be specified either " + + "through option() or SQLConf") --- End diff -- `option()` --> `option("checkpointLocation", ...)` `SQLConf` --> `sqlContext.conf..` (complete it) Makes it easier for the user
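Spelled out, the friendlier error message asked for above would point users at the two concrete settings the helper already resolves. The paths below are examples, and the string conf key is my reading of `SQLConf.CHECKPOINT_LOCATION`, so treat both as assumptions.

```scala
// Two ways a user can satisfy the checkpoint requirement.
// Per-query, via the writer option checked first by the helper:
val query = streamingDs.write
  .option("checkpointLocation", "/tmp/checkpoints/my-query")
  .foreach(writer)

// ...or globally, via the SQL conf the helper falls back to, which is then
// combined with the query name to build a per-query subdirectory:
spark.conf.set("spark.sql.streaming.checkpointLocation", "/tmp/checkpoints")
```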
[GitHub] spark issue #10412: [SPARK-12447][YARN] Only update the states when executor...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/10412 Sure, I will do it.
[GitHub] spark pull request #13577: [Minor][Doc] Improve SQLContext Documentation and...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13577#discussion_r66529939 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -346,15 +346,68 @@ class SQLContext private[sql](val sparkSession: SparkSession) sparkSession.createDataFrame(rowRDD, schema, needsConversion) } - + /** + * Creates a [[Dataset]] from a local Seq of data of a given type. This method requires an + * encoder (to convert a JVM object of type `T` to and from the internal Spark SQL representation) + * that is generally created automatically through implicits from a `SparkSession`, or can be + * created explicitly by calling static methods on [[Encoders]]. + * + * == Example == --- End diff -- what does this look like when it is rendered?
[GitHub] spark issue #13549: [SPARK-15812][SQL][Streaming] Added support for sorting a...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/13549 @marmbrus could you take a look once again.
[GitHub] spark issue #13581: [SPARK-14321][SQL] Reduce date format cost and string-to...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13581 LGTM, pending jenkins
[GitHub] spark issue #13581: [SPARK-14321][SQL] Reduce date format cost and string-to...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13581 > in the PR description @hvanhovell not PR title...
[GitHub] spark issue #13581: [SPARK-14321][SQL] Reduce date format cost and string-to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13581 **[Test build #60247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60247/consoleFull)** for PR 13581 at commit [`0ca9f99`](https://github.com/apache/spark/commit/0ca9f99e8336494e866708363b8ef415953ed339).
[GitHub] spark pull request #13581: [SPARK-14321][SQL] Reduce date format cost and st...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13581#discussion_r66527669 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -434,24 +436,21 @@ abstract class UnixTime extends BinaryExpression with ExpectsInputTypes { left.dataType match { case StringType if right.foldable => val sdf = classOf[SimpleDateFormat].getName -val fString = if (constFormat == null) null else constFormat.toString -val formatter = ctx.freshName("formatter") -if (fString == null) { +if (formatter == null) { ev.copy(code = s""" boolean ${ev.isNull} = true; ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};""") } else { + val formatterName = ctx.addReferenceObj("formatter", formatter, sdf) --- End diff -- nit: maybe we can just use `formatter` as name
[GitHub] spark issue #13581: [SPARK-14321][SQL] Reduce date format cost and string-to...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/13581 Tip: if you put "closes #13522" in the PR description then the subsumed PR will be automatically closed once this PR is merged.
[GitHub] spark issue #13581: [SPARK-14321][SQL] Reduce date format cost and string-to...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13581 LGTM except a minor comment
[GitHub] spark pull request #13581: [SPARK-14321][SQL] Reduce date format cost and st...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13581#discussion_r66525197 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -549,21 +552,22 @@ case class FromUnixTime(sec: Expression, format: Expression) override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val sdf = classOf[SimpleDateFormat].getName if (format.foldable) { - if (constFormat == null) { + if (formatter == null) { ev.copy(code = s""" --- End diff -- same here
[GitHub] spark pull request #13581: [SPARK-14321][SQL] Reduce date format cost and st...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13581#discussion_r66525010 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -434,24 +436,21 @@ abstract class UnixTime extends BinaryExpression with ExpectsInputTypes { left.dataType match { case StringType if right.foldable => val sdf = classOf[SimpleDateFormat].getName -val fString = if (constFormat == null) null else constFormat.toString -val formatter = ctx.freshName("formatter") -if (fString == null) { +if (formatter == null) { ev.copy(code = s""" --- End diff -- this can be: `ev.copy(code = "", isNull = "false", value = ctx.defaultValue(dataType))`
[GitHub] spark issue #7786: [SPARK-9468][Yarn][Core] Avoid scheduling tasks on preemp...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/7786 @steveloughran > I suspect that if you get told you are being pre-empted, you aren't likely to get containers elsewhere That's very possible. But I'm just trying to point out that the current change doesn't really make things better. Without killing the executor, you'll still be holding on to resources, except now you wouldn't be using them. So might as well keep using them for as long as you can, or give them back as soon as possible (i.e. as soon as the executor becomes idle).
[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13371 **[Test build #60246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60246/consoleFull)** for PR 13371 at commit [`077f7f8`](https://github.com/apache/spark/commit/077f7f8813a76d38c8a6d898ec54e401c91b6014).
[GitHub] spark issue #13581: [SPARK-14321][SQL] Reduce date format cost and string-to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13581 **[Test build #60245 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60245/consoleFull)** for PR 13581 at commit [`a086128`](https://github.com/apache/spark/commit/a0861285ccffe362a1a5758f9c99a36e6f7b2f5d).
[GitHub] spark issue #13581: [SPARK-14321][SQL] Reduce date format cost and string-to...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/13581 cc @cloud-fan
[GitHub] spark pull request #13581: [SPARK-14321][SQL] Reduce date format cost and st...
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/13581 [SPARK-14321][SQL] Reduce date format cost and string-to-date cost in date functions ## What changes were proposed in this pull request? The current implementations of `UnixTime` and `FromUnixTime` do not cache their parser/formatter as much as they could. This PR resolves that issue. This PR is a take-over from https://github.com/apache/spark/pull/13522 and further optimizes the re-use of the parser/formatter. It also improves exception handling (catching the actual exception instead of `Throwable`). All credits for this work should go to @rajeshbalamohan. ## How was this patch tested? Current tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/hvanhovell/spark SPARK-14321 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13581.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13581 commit 602d4a70ba845df3160a07c2c9afe2d5c3c574c4 Author: Rajesh Balamohan Date: 2016-06-06T12:54:02Z [SPARK-14321][SQL] Reduce date format cost and string-to-date cost in date functions commit 425aa7ea30f1f92143fc9f539c7b928255dee1c9 Author: Rajesh Balamohan Date: 2016-06-07T08:17:20Z [SPARK-14321][SQL] Reduce date format cost and string-to-date cost in date functions commit a0861285ccffe362a1a5758f9c99a36e6f7b2f5d Author: Herman van Hovell Date: 2016-06-09T21:07:49Z Polishing code gen and exceptions
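The caching the PR describes can be sketched outside of Spark. This is a hedged illustration, not the actual `UnixTime`/`FromUnixTime` code: `SimpleDateFormat` is expensive to construct and not thread-safe, so one cached instance per thread avoids rebuilding it per row, and the parse failure caught is the concrete `ParseException` rather than `Throwable`. The object and method names here are invented for the example.

```scala
import java.text.SimpleDateFormat
import java.util.{Locale, TimeZone}

// Hypothetical sketch of the optimization: SimpleDateFormat construction is
// expensive and the instance is not thread-safe, so each thread lazily caches
// one formatter instead of building a new one for every row.
object CachedDateFormat {
  private val cache = new ThreadLocal[SimpleDateFormat] {
    override def initialValue(): SimpleDateFormat = {
      val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US)
      fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
      fmt
    }
  }

  // Parse a timestamp string to epoch seconds, returning None instead of
  // throwing; note the catch is the concrete ParseException, not Throwable.
  def toUnixTime(s: String): Option[Long] =
    try Some(cache.get().parse(s).getTime / 1000L)
    catch { case _: java.text.ParseException => None }
}
```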
[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13371 ping @yhuai @rxin @cloud-fan
[GitHub] spark issue #13580: Revert "[SPARK-14485][CORE] ignore task finished for exe...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/13580 I don't have strong feelings, but I still believe both cases hurt performance in different ways and in rare cases, and if that is going to happen I prefer the version that yells less at the user. In any case, if you feel strongly, no worries on my side.
[GitHub] spark issue #13571: [SPARK-15369][WIP][RFC][PySpark][SQL] Expose potential t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13571 **[Test build #60244 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60244/consoleFull)** for PR 13571 at commit [`1ca895a`](https://github.com/apache/spark/commit/1ca895a085c79734bd5a51031b60b2553717c256).
[GitHub] spark issue #13549: [SPARK-15812][SQ][Streaming] Added support for sorting a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13549 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60243/ Test PASSed.
[GitHub] spark issue #13549: [SPARK-15812][SQ][Streaming] Added support for sorting a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13549 Merged build finished. Test PASSed.
[GitHub] spark issue #13549: [SPARK-15812][SQ][Streaming] Added support for sorting a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13549 **[Test build #60243 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60243/consoleFull)** for PR 13549 at commit [`810b802`](https://github.com/apache/spark/commit/810b802060792da78b61c44ad9b656ce3e02b4a3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScala...
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r66509510 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object RewriteCorrelatedScalarSubquery extends Rule[LogicalPlan] { } /** + * Statically evaluate an expression containing zero or more placeholders, given a set + * of bindings for placeholder values. + */ + private def evalExpr(expr : Expression, bindings : Map[Long, Option[Any]]) : Option[Any] = { +val rewrittenExpr = expr transform { + case r @ AttributeReference(_, dataType, _, _) => +bindings(r.exprId.id) match { + case Some(v) => Literal.create(v, dataType) + case None => Literal.default(NullType) +} +} +Option(rewrittenExpr.eval()) + } + + /** + * Statically evaluate an expression containing one or more aggregates on an empty input. + */ + private def evalAggOnZeroTups(expr : Expression) : Option[Any] = { +// AggregateExpressions are Unevaluable, so we need to replace all aggregates +// in the expression with the value they would return for zero input tuples. +val rewrittenExpr = expr transform { + case a @ AggregateExpression(aggFunc, _, _, resultId) => +aggFunc.defaultResult.getOrElse(Literal.default(NullType)) +} +Option(rewrittenExpr.eval()) + } + + /** + * Statically evaluate a scalar subquery on an empty input. + * + * WARNING: This method only covers subqueries that pass the checks under + * [[org.apache.spark.sql.catalyst.analysis.CheckAnalysis]]. If the checks in + * CheckAnalysis become less restrictive, this method will need to change. + */ + private def evalSubqueryOnZeroTups(plan: LogicalPlan) : Option[Any] = { +// Inputs to this method will start with a chain of zero or more SubqueryAlias +// and Project operators, followed by an optional Filter, followed by an +// Aggregate. Traverse the operators recursively. 
+def evalPlan(lp : LogicalPlan) : Map[Long, Option[Any]] = { + lp match { +case SubqueryAlias(_, child) => evalPlan(child) +case Filter(condition, child) => + val bindings = evalPlan(child) + if (bindings.size == 0) bindings + else { +val exprResult = evalExpr(condition, bindings).getOrElse(false) + .asInstanceOf[Boolean] +if (exprResult) bindings else Map() + } + +case Project(projectList, child) => + val bindings = evalPlan(child) + if (bindings.size == 0) { +bindings + } else { +projectList.map(ne => (ne.exprId.id, evalExpr(ne, bindings))).toMap + } + +case Aggregate(_, aggExprs, _) => + // Some of the expressions under the Aggregate node are the join columns + // for joining with the outer query block. Fill those expressions in with + // nulls and statically evaluate the remainder. + aggExprs.map(ne => ne match { +case AttributeReference(_, _, _, _) => (ne.exprId.id, None) +case Alias(AttributeReference(_, _, _, _), _) => (ne.exprId.id, None) +case _ => (ne.exprId.id, evalAggOnZeroTups(ne)) + }).toMap + +case _ => sys.error(s"Unexpected operator in scalar subquery: $lp") + } +} + +val resultMap = evalPlan(plan) + +// By convention, the scalar subquery result is the leftmost field. +resultMap(plan.output.head.exprId.id) + } + + /** + * Split the plan for a scalar subquery into the parts above the Aggregate node + * (first part of returned value) and the parts below the Aggregate node, including + * the Aggregate (second part of returned value) + */ + private def splitSubquery(plan : LogicalPlan) : Tuple2[Seq[LogicalPlan], Aggregate] = { +var topPart = List[LogicalPlan]() --- End diff -- Fixed in my local copy.
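The `evalAggOnZeroTups` idea quoted in the diff above (replacing each aggregate with the value it would return on zero input rows, mirroring `defaultResult`) can be illustrated with a toy model. This sketch uses invented types, not Catalyst's classes: the key fact is that COUNT of an empty input is 0 while SUM/MAX of an empty input are NULL (modeled as `None`).

```scala
// Toy model (not Catalyst) of evaluating aggregates on zero input tuples:
// each aggregate knows the value it yields when it sees no rows at all,
// the role played by AggregateFunction.defaultResult in the real code.
sealed trait Agg { def onZeroTuples: Option[Any] }
case object Count extends Agg { def onZeroTuples: Option[Any] = Some(0L) } // COUNT(*) of nothing is 0
case object Sum   extends Agg { def onZeroTuples: Option[Any] = None }     // SUM of nothing is NULL
case object Max   extends Agg { def onZeroTuples: Option[Any] = None }     // MAX of nothing is NULL

// A scalar subquery whose plan is an Aggregate over an empty relation can be
// folded to literals by asking each output aggregate for its zero-row value.
def evalOnEmptyInput(aggs: Seq[Agg]): Seq[Option[Any]] = aggs.map(_.onZeroTuples)
```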
[GitHub] spark pull request #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScala...
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r66509444 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object RewriteCorrelatedScalarSubquery extends Rule[LogicalPlan] { } /** + * Statically evaluate an expression containing zero or more placeholders, given a set + * of bindings for placeholder values. + */ + private def evalExpr(expr : Expression, bindings : Map[Long, Option[Any]]) : Option[Any] = { +val rewrittenExpr = expr transform { + case r @ AttributeReference(_, dataType, _, _) => +bindings(r.exprId.id) match { + case Some(v) => Literal.create(v, dataType) + case None => Literal.default(NullType) +} +} +Option(rewrittenExpr.eval()) + } + + /** + * Statically evaluate an expression containing one or more aggregates on an empty input. + */ + private def evalAggOnZeroTups(expr : Expression) : Option[Any] = { +// AggregateExpressions are Unevaluable, so we need to replace all aggregates +// in the expression with the value they would return for zero input tuples. +val rewrittenExpr = expr transform { + case a @ AggregateExpression(aggFunc, _, _, resultId) => +aggFunc.defaultResult.getOrElse(Literal.default(NullType)) +} +Option(rewrittenExpr.eval()) + } + + /** + * Statically evaluate a scalar subquery on an empty input. + * + * WARNING: This method only covers subqueries that pass the checks under + * [[org.apache.spark.sql.catalyst.analysis.CheckAnalysis]]. If the checks in + * CheckAnalysis become less restrictive, this method will need to change. + */ + private def evalSubqueryOnZeroTups(plan: LogicalPlan) : Option[Any] = { +// Inputs to this method will start with a chain of zero or more SubqueryAlias +// and Project operators, followed by an optional Filter, followed by an +// Aggregate. Traverse the operators recursively. 
+def evalPlan(lp : LogicalPlan) : Map[Long, Option[Any]] = { + lp match { +case SubqueryAlias(_, child) => evalPlan(child) +case Filter(condition, child) => + val bindings = evalPlan(child) + if (bindings.size == 0) bindings + else { +val exprResult = evalExpr(condition, bindings).getOrElse(false) + .asInstanceOf[Boolean] +if (exprResult) bindings else Map() --- End diff -- Fixed in my local copy
[GitHub] spark pull request #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScala...
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r66509405 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object RewriteCorrelatedScalarSubquery extends Rule[LogicalPlan] { } /** + * Statically evaluate an expression containing zero or more placeholders, given a set + * of bindings for placeholder values. + */ + private def evalExpr(expr : Expression, bindings : Map[Long, Option[Any]]) : Option[Any] = { +val rewrittenExpr = expr transform { + case r @ AttributeReference(_, dataType, _, _) => +bindings(r.exprId.id) match { + case Some(v) => Literal.create(v, dataType) + case None => Literal.default(NullType) +} +} +Option(rewrittenExpr.eval()) + } + + /** + * Statically evaluate an expression containing one or more aggregates on an empty input. + */ + private def evalAggOnZeroTups(expr : Expression) : Option[Any] = { +// AggregateExpressions are Unevaluable, so we need to replace all aggregates +// in the expression with the value they would return for zero input tuples. +val rewrittenExpr = expr transform { + case a @ AggregateExpression(aggFunc, _, _, resultId) => +aggFunc.defaultResult.getOrElse(Literal.default(NullType)) +} +Option(rewrittenExpr.eval()) + } + + /** + * Statically evaluate a scalar subquery on an empty input. + * + * WARNING: This method only covers subqueries that pass the checks under + * [[org.apache.spark.sql.catalyst.analysis.CheckAnalysis]]. If the checks in + * CheckAnalysis become less restrictive, this method will need to change. + */ + private def evalSubqueryOnZeroTups(plan: LogicalPlan) : Option[Any] = { +// Inputs to this method will start with a chain of zero or more SubqueryAlias +// and Project operators, followed by an optional Filter, followed by an +// Aggregate. Traverse the operators recursively. 
+def evalPlan(lp : LogicalPlan) : Map[Long, Option[Any]] = { + lp match { +case SubqueryAlias(_, child) => evalPlan(child) +case Filter(condition, child) => + val bindings = evalPlan(child) + if (bindings.size == 0) bindings --- End diff -- Fixed in my local copy
[GitHub] spark issue #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScalarSubque...
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/13155 @rxin I'll have an updated set of changes in tonight
[GitHub] spark pull request #13573: [SPARK-15839] Fix Maven doc-jar generation when J...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13573
[GitHub] spark issue #13580: Revert "[SPARK-14485][CORE] ignore task finished for exe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13580 Merged build finished. Test PASSed.
[GitHub] spark issue #13580: Revert "[SPARK-14485][CORE] ignore task finished for exe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13580 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60241/ Test PASSed.
[GitHub] spark issue #13573: [SPARK-15839] Fix Maven doc-jar generation when JAVA_7_H...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/13573 I am going to trigger a snapshot build.
[GitHub] spark issue #13580: Revert "[SPARK-14485][CORE] ignore task finished for exe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13580 **[Test build #60241 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60241/consoleFull)** for PR 13580 at commit [`912ce5a`](https://github.com/apache/spark/commit/912ce5a66fae9db2cf84956375b24de994568a9e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13573: [SPARK-15839] Fix Maven doc-jar generation when JAVA_7_H...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/13573 Merging to master and branch 2.0.
[GitHub] spark issue #13573: [SPARK-15839] Fix Maven doc-jar generation when JAVA_7_H...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/13573 lgtm
[GitHub] spark issue #13549: [SPARK-15812][SQ][Streaming] Added support for sorting a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13549 **[Test build #60243 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60243/consoleFull)** for PR 13549 at commit [`810b802`](https://github.com/apache/spark/commit/810b802060792da78b61c44ad9b656ce3e02b4a3).
[GitHub] spark pull request #13543: [SPARK-15806] [Documentation] update doc for SPAR...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/13543#discussion_r66501686 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala --- @@ -20,18 +20,24 @@ package org.apache.spark.deploy.master import scala.annotation.tailrec import org.apache.spark.SparkConf +import org.apache.spark.internal.Logging import org.apache.spark.util.{IntParam, Utils} /** * Command-line parser for the master. */ -private[master] class MasterArguments(args: Array[String], conf: SparkConf) { +private[master] class MasterArguments(args: Array[String], conf: SparkConf) extends Logging { var host = Utils.localHostName() var port = 7077 var webUiPort = 8080 var propertiesFile: String = null // Check for settings in environment variables + if (System.getenv("SPARK_MASTER_IP") != null) { +logWarning("SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST") +host = System.getenv("SPARK_MASTER_IP") + } + if (System.getenv("SPARK_MASTER_HOST") != null) { --- End diff -- The script sets --host to the value of SPARK_MASTER_HOST though, and I think that's considered the only valid way to start the master. It is a little behavior change but I was wondering if it's best to go ahead and move any unofficial use case away from env variables anyway.
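The precedence being discussed in the diff above (honor the deprecated `SPARK_MASTER_IP` with a warning, but let `SPARK_MASTER_HOST` win when both are set) can be modeled as a small pure function. This is an illustrative sketch with invented names, not the actual `MasterArguments` code.

```scala
// Illustrative model of the env-var precedence under review (not the real
// MasterArguments): SPARK_MASTER_IP is deprecated but still honored, and a
// warning is emitted whenever it is set; SPARK_MASTER_HOST overrides it.
def resolveMasterHost(env: Map[String, String],
                      default: String,
                      warn: String => Unit): String = {
  val deprecated = env.get("SPARK_MASTER_IP")
  deprecated.foreach(_ =>
    warn("SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST"))
  // The new variable wins over the deprecated one; fall back to the default
  // (Utils.localHostName() in the real code) when neither is set.
  env.get("SPARK_MASTER_HOST").orElse(deprecated).getOrElse(default)
}
```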
[GitHub] spark issue #13549: [SPARK-15812][SQ][Streaming] Added support for sorting a...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/13549 test this please
[GitHub] spark issue #13436: [SPARK-15696][SQL] Improve `crosstab` to have a consiste...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13436 Hi, @rxin. Could you review this PR and give your opinion when you have some time?
[GitHub] spark issue #13549: [SPARK-15812][SQ][Streaming] Added support for sorting a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13549 **[Test build #60242 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60242/consoleFull)** for PR 13549 at commit [`810b802`](https://github.com/apache/spark/commit/810b802060792da78b61c44ad9b656ce3e02b4a3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13549: [SPARK-15812][SQ][Streaming] Added support for sorting a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13549 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60242/ Test FAILed.
[GitHub] spark issue #13549: [SPARK-15812][SQ][Streaming] Added support for sorting a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13549 Merged build finished. Test FAILed.
[GitHub] spark issue #13579: [SPARK-15844] [core] HistoryServer doesn't come up if sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13579 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60238/ Test PASSed.
[GitHub] spark issue #13569: [SPARK-15791] Fix NPE in ScalarSubquery
Github user ericl commented on the issue: https://github.com/apache/spark/pull/13569 Yeah, I'm looking into how to reproduce this in a unit test. It seems we have to go through `DeserializeToObjectExec` to reproduce the NPE:
```
Execution 'q9-v1.4' failed: Job aborted due to stage failure: Task 0 in stage 146.0 failed 4 times, most recent failure:
Lost task 0.3 in stage 146.0 (TID 48828, 10.0.206.208): java.lang.NullPointerException
  at org.apache.spark.sql.execution.ScalarSubquery.dataType(subquery.scala:45)
  at org.apache.spark.sql.catalyst.expressions.CaseWhenBase.dataType(conditionalExpressions.scala:103)
  at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:165)
  at org.apache.spark.sql.execution.ProjectExec$$anonfun$output$1.apply(basicPhysicalOperators.scala:33)
  at org.apache.spark.sql.execution.ProjectExec$$anonfun$output$1.apply(basicPhysicalOperators.scala:33)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
  at scala.collection.AbstractTraversable.map(Traversable.scala:105)
  at org.apache.spark.sql.execution.ProjectExec.output(basicPhysicalOperators.scala:33)
  at org.apache.spark.sql.execution.WholeStageCodegenExec.output(WholeStageCodegenExec.scala:291)
  at org.apache.spark.sql.execution.DeserializeToObjectExec$$anonfun$2.apply(objects.scala:85)
  at org.apache.spark.sql.execution.DeserializeToObjectExec$$anonfun$2.apply(objects.scala:84)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:775)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:775)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
  at org.apache.spark.scheduler.Task.run(Task.scala:85)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
```
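The trace above shows `ScalarSubquery.dataType` throwing before execution. A hedged toy model of that failure mode, using invented class names rather than the real `ScalarSubquery`: if `dataType` is derived from a result field that is only populated when the subquery runs, calling it at planning time dereferences null; deriving the type from static plan information avoids the dependency on the runtime value.

```scala
// Minimal model of the failure mode (invented classes, not Spark's):
// dataType is computed from a lazily-populated result, so asking for it
// before the subquery has executed dereferences null.
class BrokenScalarSubquery {
  private var result: java.lang.Long = null            // filled in at execution time
  def dataType: String = result.getClass.getSimpleName // NPE before execution
}

// One possible fix direction: carry the type as static plan information so
// dataType never touches the runtime value.
class FixedScalarSubquery(planOutputType: String) {
  private var result: java.lang.Long = null            // still filled in later
  def dataType: String = planOutputType                // safe at planning time
}
```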
[GitHub] spark issue #13579: [SPARK-15844] [core] HistoryServer doesn't come up if sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13579 Merged build finished. Test PASSed.
[GitHub] spark issue #13579: [SPARK-15844] [core] HistoryServer doesn't come up if sp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13579 **[Test build #60238 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60238/consoleFull)** for PR 13579 at commit [`26d1ad2`](https://github.com/apache/spark/commit/26d1ad243c9c6db227a13303c3781546634794db).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13573: [SPARK-15839] Fix Maven doc-jar generation when JAVA_7_H...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13573 Merged build finished. Test PASSed.
[GitHub] spark issue #13573: [SPARK-15839] Fix Maven doc-jar generation when JAVA_7_H...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13573 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60239/ Test PASSed.
[GitHub] spark issue #13573: [SPARK-15839] Fix Maven doc-jar generation when JAVA_7_H...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13573

**[Test build #60239 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60239/consoleFull)** for PR 13573 at commit [`523487b`](https://github.com/apache/spark/commit/523487bb0be24c698ee66d8c09430d1909eff81c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #13543: [SPARK-15806] [Documentation] update doc for SPAR...
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/13543#discussion_r66493098

```diff
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala ---
@@ -20,18 +20,24 @@ package org.apache.spark.deploy.master
 
 import scala.annotation.tailrec
 
 import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
 import org.apache.spark.util.{IntParam, Utils}
 
 /**
  * Command-line parser for the master.
  */
-private[master] class MasterArguments(args: Array[String], conf: SparkConf) {
+private[master] class MasterArguments(args: Array[String], conf: SparkConf) extends Logging {
   var host = Utils.localHostName()
   var port = 7077
   var webUiPort = 8080
   var propertiesFile: String = null
 
   // Check for settings in environment variables
+  if (System.getenv("SPARK_MASTER_IP") != null) {
+    logWarning("SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST")
+    host = System.getenv("SPARK_MASTER_IP")
+  }
+  if (System.getenv("SPARK_MASTER_HOST") != null) {
```

--- End diff -- MasterArguments.scala is used by Master.scala's main() method, so there is a way to use `SPARK_MASTER_HOST`.
[GitHub] spark issue #13569: [SPARK-15791] Fix NPE in ScalarSubquery
Github user davies commented on the issue: https://github.com/apache/spark/pull/13569 LGTM
[GitHub] spark issue #13400: [SPARK-15655] [SQL] Fix Wrong Partition Column Order whe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13400 Merged build finished. Test PASSed.
[GitHub] spark issue #13400: [SPARK-15655] [SQL] Fix Wrong Partition Column Order whe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13400 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60240/ Test PASSed.
[GitHub] spark issue #13400: [SPARK-15655] [SQL] Fix Wrong Partition Column Order whe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13400

**[Test build #60240 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60240/consoleFull)** for PR 13400 at commit [`5bc8996`](https://github.com/apache/spark/commit/5bc89966765e1ec37b7c8d167ac6156988a9a720).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #13564: [SPARK-15827][BUILD] Publish Spark's forked sbt-p...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13564
[GitHub] spark pull request #13549: [SPARK-15812][SQ][Streaming] Added support for so...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/13549#discussion_r66491058

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala ---
@@ -43,6 +43,41 @@ object UnsupportedOperationChecker {
         "Queries without streaming sources cannot be executed with write.startStream()")(plan)
     }
 
+    // Disallow multiple streaming aggregations
+    val aggregates = plan.collect { case a @ Aggregate(_, _, _) if a.isStreaming => a }
+
+    if (aggregates.size > 1) {
+      throwError(
+        "Multiple streaming aggregations are not supported with " +
+          "streaming DataFrames/Datasets")(plan)
+    }
+
+    // Disallow some output modes
+    outputMode match {
+      case InternalOutputModes.Append if aggregates.nonEmpty =>
+        throwError(
+          s"$outputMode output mode not supported when there are streaming aggregations on " +
+            s"streaming DataFrames/DataSets")(plan)
+
+      case InternalOutputModes.Complete | InternalOutputModes.Update if aggregates.isEmpty =>
+        throwError(
+          s"$outputMode output mode not supported when there are no streaming aggregations on " +
+            s"streaming DataFrames/Datasets")(plan)
+
+      case _ =>
+    }
+
+    /**
+     * Whether the subplan will contain complete data or incremental data in every incremental
+     * execution. Some operations may be allowed only when the child logical plan gives complete
+     * data.
+     */
+    def containsCompleteData(subplan: LogicalPlan): Boolean = {
+      val aggs = plan.collect { case a @ Aggregate(_, _, _) if a.isStreaming => a }
```

--- End diff -- I completely agree. This whole file needs to be restructured in future. This is really a 2.0 interim fix.
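The validation rule quoted in the diff above can be illustrated with a small standalone sketch. The plan node, output modes, and helper names below are simplified stand-ins, not Spark's real catalyst types: a plan is rejected when it has more than one streaming aggregation, when Append mode is combined with an aggregation, or when Complete/Update mode is used without one.

```scala
// Simplified stand-ins for Spark's OutputMode values (illustrative only).
sealed trait OutputMode
case object Append extends OutputMode
case object Complete extends OutputMode
case object Update extends OutputMode

// A toy logical plan: each node only records whether it is a streaming aggregate.
case class PlanNode(isStreamingAggregate: Boolean, children: Seq[PlanNode] = Nil) {
  def collectAggregates: Seq[PlanNode] =
    (if (isStreamingAggregate) Seq(this) else Nil) ++ children.flatMap(_.collectAggregates)
}

// Mirrors the structure of the check in the diff: Left(error) plays the role of throwError.
def checkOutputMode(plan: PlanNode, mode: OutputMode): Either[String, Unit] = {
  val aggregates = plan.collectAggregates
  if (aggregates.size > 1)
    Left("Multiple streaming aggregations are not supported")
  else mode match {
    case Append if aggregates.nonEmpty =>
      Left("Append output mode not supported when there are streaming aggregations")
    case Complete | Update if aggregates.isEmpty =>
      Left(s"$mode output mode not supported when there are no streaming aggregations")
    case _ => Right(())
  }
}
```

For example, `checkOutputMode(PlanNode(isStreamingAggregate = true), Append)` yields a `Left`, while the same plan under `Complete` passes.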
[GitHub] spark issue #13549: [SPARK-15812][SQ][Streaming] Added support for sorting a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13549 **[Test build #60242 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60242/consoleFull)** for PR 13549 at commit [`810b802`](https://github.com/apache/spark/commit/810b802060792da78b61c44ad9b656ce3e02b4a3).
[GitHub] spark issue #13564: [SPARK-15827][BUILD] Publish Spark's forked sbt-pom-read...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/13564 This is now sync'd to Maven Central and has passed tests, so I'm going to merge it into master, branch-2.0, and branch-1.6.
[GitHub] spark pull request #13543: [SPARK-15806] [Documentation] update doc for SPAR...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/13543#discussion_r66490508

```diff
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala ---
@@ -20,18 +20,24 @@ package org.apache.spark.deploy.master
 
 import scala.annotation.tailrec
 
 import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
 import org.apache.spark.util.{IntParam, Utils}
 
 /**
  * Command-line parser for the master.
  */
-private[master] class MasterArguments(args: Array[String], conf: SparkConf) {
+private[master] class MasterArguments(args: Array[String], conf: SparkConf) extends Logging {
   var host = Utils.localHostName()
   var port = 7077
   var webUiPort = 8080
   var propertiesFile: String = null
 
   // Check for settings in environment variables
+  if (System.getenv("SPARK_MASTER_IP") != null) {
+    logWarning("SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST")
+    host = System.getenv("SPARK_MASTER_IP")
+  }
+  if (System.getenv("SPARK_MASTER_HOST") != null) {
```

--- End diff -- Yeah, the env param already controls the value of `--host` from the script, so I think this is redundant. Right? Although I don't feel strongly about the second point, I thought it might be tidier to handle the env param in one place rather than two. Env variables do feel like something more from bash-land than Scala-land.
[GitHub] spark issue #13481: [SPARK-15738][PYSPARK][ML] Adding Pyspark ml RFormula __...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/13481 Can this be merged @MLnick ?
[GitHub] spark pull request #13543: [SPARK-15806] [Documentation] update doc for SPAR...
Github user bomeng commented on a diff in the pull request: https://github.com/apache/spark/pull/13543#discussion_r66488409

```diff
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala ---
@@ -20,18 +20,24 @@ package org.apache.spark.deploy.master
 
 import scala.annotation.tailrec
 
 import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
 import org.apache.spark.util.{IntParam, Utils}
 
 /**
  * Command-line parser for the master.
  */
-private[master] class MasterArguments(args: Array[String], conf: SparkConf) {
+private[master] class MasterArguments(args: Array[String], conf: SparkConf) extends Logging {
   var host = Utils.localHostName()
   var port = 7077
   var webUiPort = 8080
   var propertiesFile: String = null
 
   // Check for settings in environment variables
+  if (System.getenv("SPARK_MASTER_IP") != null) {
+    logWarning("SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST")
+    host = System.getenv("SPARK_MASTER_IP")
+  }
+  if (System.getenv("SPARK_MASTER_HOST") != null) {
```

--- End diff -- The code here is just to set its initial values and it may be changed by the `--host` configuration. I think we should keep it there for now. For the warning message, we kind of always use the logger; not sure it is a good idea to put it into the script. I am open to your decision.
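The pattern being debated in this thread (prefer a new environment variable, fall back to a deprecated one with a warning) can be sketched in isolation. The function name and the use of `Console.err` instead of Spark's `logWarning` are illustrative assumptions, not the PR's actual code:

```scala
// Hypothetical sketch of deprecated-env-var fallback, as discussed above.
// Prefers SPARK_MASTER_HOST; falls back to SPARK_MASTER_IP with a warning.
def resolveHost(env: Map[String, String], default: String): String =
  env.get("SPARK_MASTER_HOST").orElse {
    env.get("SPARK_MASTER_IP").map { ip =>
      // Stand-in for logWarning in the real MasterArguments class.
      Console.err.println("SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST")
      ip
    }
  }.getOrElse(default)
```

Note this resolves the value in one place, which is the "tidier" single-location handling srowen suggests; the PR under review instead checks each variable separately so that `SPARK_MASTER_HOST` wins when both are set.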
[GitHub] spark issue #13580: Revert "[SPARK-14485][CORE] ignore task finished for exe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13580 **[Test build #60241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60241/consoleFull)** for PR 13580 at commit [`912ce5a`](https://github.com/apache/spark/commit/912ce5a66fae9db2cf84956375b24de994568a9e).
[GitHub] spark pull request #13580: Revert "[SPARK-14485][CORE] ignore task finished ...
GitHub user kayousterhout opened a pull request: https://github.com/apache/spark/pull/13580 Revert "[SPARK-14485][CORE] ignore task finished for executor lost an…

## What changes were proposed in this pull request?

This reverts commit 695dbc816a6d70289abeb145cb62ff4e62b3f49b. This change is being reverted because it hurts performance of some jobs, and only helps in a narrow set of cases. For more discussion, refer to the JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kayousterhout/spark-1 revert-SPARK-14485

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13580.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13580

commit 912ce5a66fae9db2cf84956375b24de994568a9e
Author: Kay Ousterhout
Date: 2016-06-09T17:07:32Z

Revert "[SPARK-14485][CORE] ignore task finished for executor lost and removed by driver"

This reverts commit 695dbc816a6d70289abeb145cb62ff4e62b3f49b. This change is being reverted because it hurts performance of some jobs, and only helps in a narrow set of cases. For more discussion, refer to the JIRA.
[GitHub] spark issue #13572: [SPARK-15838] [SQL] CACHE TABLE AS SELECT should not rep...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/13572 LGTM @liancheng anything to add?
[GitHub] spark issue #13477: [SPARK-15739][GraphX] Expose aggregateMessagesWithActive...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13477 Can one of the admins verify this patch?
[GitHub] spark pull request #13540: [SPARK-15788][PYSPARK][ML] PySpark IDFModel missi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13540
[GitHub] spark issue #13400: [SPARK-15655] [SQL] Fix Wrong Partition Column Order whe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13400 **[Test build #60240 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60240/consoleFull)** for PR 13400 at commit [`5bc8996`](https://github.com/apache/spark/commit/5bc89966765e1ec37b7c8d167ac6156988a9a720).
[GitHub] spark pull request #13555: [SPARK-15804][SQL]Include metadata in the toStruc...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13555
[GitHub] spark pull request #13531: [SPARK-15654] [SQL] fix non-splitable files for t...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13531#discussion_r66478617

```diff
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -298,6 +309,28 @@ trait FileFormat {
 }
 
 /**
+ * The base class file format that is based on text file.
+ */
+abstract class TextBasedFileFormat extends FileFormat {
+  private var codecFactory: CompressionCodecFactory = null
+
+  override def isSplitable(
+      sparkSession: SparkSession,
+      options: Map[String, String],
+      path: Path): Boolean = {
+    if (codecFactory == null) {
+      synchronized {
+        if (codecFactory == null) {
+          codecFactory = new CompressionCodecFactory(
```

--- End diff -- There could be "io.compression.codecs" in options, it's used by getCodec(). Since all other APIs have that, it's better to have that here too.
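The diff above hand-rolls double-checked locking to initialize `codecFactory` once. As a point of comparison, Scala's `lazy val` provides the same thread-safe once-only initialization without the manual null checks. The sketch below uses a stand-in class in place of Hadoop's `CompressionCodecFactory`, and the `.gz`/`.bz2` suffix check is a simplification of real codec lookup, not Spark's actual logic:

```scala
// Stand-in for Hadoop's CompressionCodecFactory (illustrative only).
class CodecLookup(options: Map[String, String]) {
  // Real codec detection consults registered codecs; a suffix check suffices here.
  def findCodec(path: String): Option[String] =
    Seq(".gz", ".snappy", ".lz4").find(path.endsWith)
}

abstract class TextBasedFormatSketch(options: Map[String, String]) {
  // Initialized at most once, on first access, under the JVM's built-in
  // monitor -- equivalent to the diff's double-checked locking on codecFactory.
  private lazy val codecLookup = new CodecLookup(options)

  // A file is splittable only if no (non-splittable) compression codec matches.
  def isSplitable(path: String): Boolean = codecLookup.findCodec(path).isEmpty
}
```

This also shows where davies' point fits: the `options` map is threaded into the factory's constructor, so a user-supplied `"io.compression.codecs"` entry could influence codec lookup, as suggested in the comment.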