[GitHub] spark issue #21136: [SPARK-24061][SS]Add TypedFilter support for continuous ...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21136
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21136: [SPARK-24061][SS]Add TypedFilter support for continuous ...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21136
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89912/
Test PASSed.


---




[GitHub] spark issue #21136: [SPARK-24061][SS]Add TypedFilter support for continuous ...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21136
  
**[Test build #89912 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89912/testReport)** for PR 21136 at commit [`77ee1c7`](https://github.com/apache/spark/commit/77ee1c7c9aaa3e3200f2fd029c2f0896c155cdcc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-26 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21143#discussion_r184598059
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala ---
@@ -56,7 +56,7 @@ case class DataSourceV2Relation(
 
   lazy val (
--- End diff --

// afterScanFilters: predicates that need to be evaluated after the scan.
// pushedFilters: predicates that will be pushed down and only evaluated in the underlying data sources.
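
For illustration, a toy sketch of the split this comment describes; `Filter` and `supportedByDataSource` are hypothetical stand-ins, not Spark APIs:

```
// Split predicates into the two buckets the suggested comment names.
case class Filter(expr: String)
def supportedByDataSource(f: Filter): Boolean = !f.expr.contains("udf")

val filters = Seq(Filter("a > 1"), Filter("udf(b) = 2"))
val (pushedFilters, afterScanFilters) = filters.partition(supportedByDataSource)
// pushedFilters: pushed down and evaluated only in the underlying data source
// afterScanFilters: evaluated by Spark after the scan
```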


---




[GitHub] spark pull request #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully ...

2018-04-26 Thread Ngone51
Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21175#discussion_r184597197
  
--- Diff: core/src/test/scala/org/apache/spark/io/ChunkedByteBufferSuite.scala ---
@@ -56,6 +56,12 @@ class ChunkedByteBufferSuite extends SparkFunSuite {
     assert(chunkedByteBuffer.getChunks().head.position() === 0)
   }
 
+  test("writeFully() can write buffer which is larger than bufferWriteChunkSize correctly") {
+    val chunkedByteBuffer = new ChunkedByteBuffer(Array(ByteBuffer.allocate(80*1024*1024)))
--- End diff --

nit: add spaces around `*`.


---




[GitHub] spark pull request #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully ...

2018-04-26 Thread Ngone51
Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21175#discussion_r184596199
  
--- Diff: core/src/test/scala/org/apache/spark/io/ChunkedByteBufferSuite.scala ---
@@ -56,6 +56,12 @@ class ChunkedByteBufferSuite extends SparkFunSuite {
     assert(chunkedByteBuffer.getChunks().head.position() === 0)
   }
 
+  test("writeFully() can write buffer which is larger than bufferWriteChunkSize correctly") {
+    val chunkedByteBuffer = new ChunkedByteBuffer(Array(ByteBuffer.allocate(80*1024*1024)))
+    chunkedByteBuffer.writeFully(new ByteArrayWritableChannel(chunkedByteBuffer.size.toInt))
+    assert(chunkedByteBuffer.getChunks().head.position() === 0)
--- End diff --

This assert is unnecessary for this PR's change. Please replace it with an assert on the channel's length here.
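
A sketch of the suggested assertion, assuming `ByteArrayWritableChannel` exposes the number of bytes written via a `length` accessor:

```
// Assert on how many bytes actually reached the channel, which is
// what this PR's fix changes, instead of the buffer's position.
val channel = new ByteArrayWritableChannel(chunkedByteBuffer.size.toInt)
chunkedByteBuffer.writeFully(channel)
assert(channel.length === chunkedByteBuffer.size.toInt)
```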


---




[GitHub] spark pull request #21169: [SPARK-23715][SQL] the input of to/from_utc_times...

2018-04-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21169#discussion_r184597402
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1805,12 +1805,13 @@ working with timestamps in `pandas_udf`s to get the best performance, see
 
   - Since Spark 2.4, Spark maximizes the usage of a vectorized ORC reader for ORC files by default. To do that, `spark.sql.orc.impl` and `spark.sql.orc.filterPushdown` change their default values to `native` and `true` respectively.
   - In PySpark, when Arrow optimization is enabled, previously `toPandas` just failed when Arrow optimization is unable to be used whereas `createDataFrame` from Pandas DataFrame allowed the fallback to non-optimization. Now, both `toPandas` and `createDataFrame` from Pandas DataFrame allow the fallback by default, which can be switched off by `spark.sql.execution.arrow.fallback.enabled`.
- - Since Spark 2.4, writing an empty dataframe to a directory launches at least one write task, even if physically the dataframe has no partition. This introduces a small behavior change that for self-describing file formats like Parquet and Orc, Spark creates a metadata-only file in the target directory when writing a 0-partition dataframe, so that schema inference can still work if users read that directory later. The new behavior is more reasonable and more consistent regarding writing empty dataframe.
- - Since Spark 2.4, expression IDs in UDF arguments do not appear in column names. For example, an column name in Spark 2.4 is not `UDF:f(col0 AS colA#28)` but ``UDF:f(col0 AS `colA`)``.
- - Since Spark 2.4, writing a dataframe with an empty or nested empty schema using any file formats (parquet, orc, json, text, csv etc.) is not allowed. An exception is thrown when attempting to write dataframes with empty schema.
- - Since Spark 2.4, Spark compares a DATE type with a TIMESTAMP type after promotes both sides to TIMESTAMP. To set `false` to `spark.sql.hive.compareDateTimestampInTimestamp` restores the previous behavior. This option will be removed in Spark 3.0.
- - Since Spark 2.4, creating a managed table with nonempty location is not allowed. An exception is thrown when attempting to create a managed table with nonempty location. To set `true` to `spark.sql.allowCreatingManagedTableUsingNonemptyLocation` restores the previous behavior. This option will be removed in Spark 3.0.
- - Since Spark 2.4, the type coercion rules can automatically promote the argument types of the variadic SQL functions (e.g., IN/COALESCE) to the widest common type, no matter how the input arguments order. In prior Spark versions, the promotion could fail in some specific orders (e.g., TimestampType, IntegerType and StringType) and throw an exception.
+  - Since Spark 2.4, writing an empty dataframe to a directory launches at least one write task, even if physically the dataframe has no partition. This introduces a small behavior change that for self-describing file formats like Parquet and Orc, Spark creates a metadata-only file in the target directory when writing a 0-partition dataframe, so that schema inference can still work if users read that directory later. The new behavior is more reasonable and more consistent regarding writing empty dataframe.
+  - Since Spark 2.4, expression IDs in UDF arguments do not appear in column names. For example, an column name in Spark 2.4 is not `UDF:f(col0 AS colA#28)` but ``UDF:f(col0 AS `colA`)``.
+  - Since Spark 2.4, writing a dataframe with an empty or nested empty schema using any file formats (parquet, orc, json, text, csv etc.) is not allowed. An exception is thrown when attempting to write dataframes with empty schema.
--- End diff --

Like `new StructType("empty", new StructType())`: the table has a column, and the column is a struct type with 0 fields. This schema is invalid to write out.

Anyway, this is an existing comment and I just fixed its indentation.
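
A minimal sketch of such a "nested empty schema" using Spark's `StructType` builder API (the constructor shorthand in the comment above is illustrative, not the literal API):

```
import org.apache.spark.sql.types.StructType

// One top-level column named "empty" whose struct type has 0 fields;
// Spark 2.4 refuses to write a dataframe with this schema.
val nestedEmpty = new StructType().add("empty", new StructType())
```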


---




[GitHub] spark pull request #21169: [SPARK-23715][SQL] the input of to/from_utc_times...

2018-04-26 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21169#discussion_r184596334
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1805,12 +1805,13 @@ working with timestamps in `pandas_udf`s to get the best performance, see
 
   - Since Spark 2.4, Spark maximizes the usage of a vectorized ORC reader for ORC files by default. To do that, `spark.sql.orc.impl` and `spark.sql.orc.filterPushdown` change their default values to `native` and `true` respectively.
   - In PySpark, when Arrow optimization is enabled, previously `toPandas` just failed when Arrow optimization is unable to be used whereas `createDataFrame` from Pandas DataFrame allowed the fallback to non-optimization. Now, both `toPandas` and `createDataFrame` from Pandas DataFrame allow the fallback by default, which can be switched off by `spark.sql.execution.arrow.fallback.enabled`.
- - Since Spark 2.4, writing an empty dataframe to a directory launches at least one write task, even if physically the dataframe has no partition. This introduces a small behavior change that for self-describing file formats like Parquet and Orc, Spark creates a metadata-only file in the target directory when writing a 0-partition dataframe, so that schema inference can still work if users read that directory later. The new behavior is more reasonable and more consistent regarding writing empty dataframe.
- - Since Spark 2.4, expression IDs in UDF arguments do not appear in column names. For example, an column name in Spark 2.4 is not `UDF:f(col0 AS colA#28)` but ``UDF:f(col0 AS `colA`)``.
- - Since Spark 2.4, writing a dataframe with an empty or nested empty schema using any file formats (parquet, orc, json, text, csv etc.) is not allowed. An exception is thrown when attempting to write dataframes with empty schema.
- - Since Spark 2.4, Spark compares a DATE type with a TIMESTAMP type after promotes both sides to TIMESTAMP. To set `false` to `spark.sql.hive.compareDateTimestampInTimestamp` restores the previous behavior. This option will be removed in Spark 3.0.
- - Since Spark 2.4, creating a managed table with nonempty location is not allowed. An exception is thrown when attempting to create a managed table with nonempty location. To set `true` to `spark.sql.allowCreatingManagedTableUsingNonemptyLocation` restores the previous behavior. This option will be removed in Spark 3.0.
- - Since Spark 2.4, the type coercion rules can automatically promote the argument types of the variadic SQL functions (e.g., IN/COALESCE) to the widest common type, no matter how the input arguments order. In prior Spark versions, the promotion could fail in some specific orders (e.g., TimestampType, IntegerType and StringType) and throw an exception.
+  - Since Spark 2.4, writing an empty dataframe to a directory launches at least one write task, even if physically the dataframe has no partition. This introduces a small behavior change that for self-describing file formats like Parquet and Orc, Spark creates a metadata-only file in the target directory when writing a 0-partition dataframe, so that schema inference can still work if users read that directory later. The new behavior is more reasonable and more consistent regarding writing empty dataframe.
+  - Since Spark 2.4, expression IDs in UDF arguments do not appear in column names. For example, an column name in Spark 2.4 is not `UDF:f(col0 AS colA#28)` but ``UDF:f(col0 AS `colA`)``.
+  - Since Spark 2.4, writing a dataframe with an empty or nested empty schema using any file formats (parquet, orc, json, text, csv etc.) is not allowed. An exception is thrown when attempting to write dataframes with empty schema.
--- End diff --

what's a nested empty schema?



---




[GitHub] spark issue #21174: [SPARK-24085][SQL] Query returns UnsupportedOperationExc...

2018-04-26 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/21174
  
@maropu So with the fix, if the query predicate contains a scalar subquery expression, then that expression is not considered for partition pruning. For example, if the predicate is `part_key1 = (select ...) AND part_key2 = 5`, then only the second conjunct is considered for pruning purposes and the first becomes a regular filter.
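
A toy sketch of that split; `Predicate` is a hypothetical stand-in for catalyst expressions, not a Spark API:

```
case class Predicate(expr: String, hasScalarSubquery: Boolean)

val conjuncts = Seq(
  Predicate("part_key1 = (select ...)", hasScalarSubquery = true),
  Predicate("part_key2 = 5", hasScalarSubquery = false))

// Predicates containing a scalar subquery are kept out of pruning.
val (regularFilters, pruningCandidates) = conjuncts.partition(_.hasScalarSubquery)
// pruningCandidates -> Seq(part_key2 = 5): used for partition pruning
// regularFilters    -> Seq(part_key1 = (select ...)): evaluated as a normal filter
```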


---




[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...

2018-04-26 Thread advancedxy
Github user advancedxy commented on the issue:

https://github.com/apache/spark/pull/21165
  
> We should not do these two things together, and to me the second one is way simpler to get in, so we should do it first.

Agreed. For the scope of this PR, let's get killed tasks' accumulators into metrics first. After that we can discuss the possibility of exposing the ability at users' request.

> but please make sure this patch only touches internal accumulators that are used for metrics reporting.

After a second look, this part is already handled by Task's collectAccumulatorUpdates:
```
def collectAccumulatorUpdates(taskFailed: Boolean = false): Seq[AccumulatorV2[_, _]] = {
  if (context != null) {
    // Note: internal accumulators representing task metrics always count failed values
    context.taskMetrics.nonZeroInternalAccums() ++
      // zero value external accumulators may still be useful, e.g. SQLMetrics, we should not
      // filter them out.
      context.taskMetrics.externalAccums.filter(a => !taskFailed || a.countFailedValues)
  } else {
    Seq.empty
  }
}
```


---




[GitHub] spark pull request #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully ...

2018-04-26 Thread manbuyun
Github user manbuyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/21175#discussion_r184594822
  
--- Diff: core/src/test/scala/org/apache/spark/io/ChunkedByteBufferSuite.scala ---
@@ -56,6 +56,12 @@ class ChunkedByteBufferSuite extends SparkFunSuite {
     assert(chunkedByteBuffer.getChunks().head.position() === 0)
   }
 
+  test("writeFully() does not affect original buffer's position") {
--- End diff --

Done. Thanks


---




[GitHub] spark issue #21176: [SPARK-24109] Remove class SnappyOutputStreamWrapper

2018-04-26 Thread manbuyun
Github user manbuyun commented on the issue:

https://github.com/apache/spark/pull/21176
  
@maropu Ok. It sounds reasonable to me


---




[GitHub] spark pull request #21176: [SPARK-24109] Remove class SnappyOutputStreamWrap...

2018-04-26 Thread manbuyun
Github user manbuyun closed the pull request at:

https://github.com/apache/spark/pull/21176


---




[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...

2018-04-26 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21165
  
> For example, a user may want to record CPU time for every task and get the total CPU time for the application.

The problem is, shall we allow end users to collect metrics via accumulators? Currently only Spark can do that, via internal accumulators which count failed tasks. We need a careful API design for how to expose this ability to end users.

In the meantime, since we already count failed tasks, it makes sense to also count killed tasks for internal metrics collection.

We should not do these two things together, and to me the second one is way simpler to get in, so we should do it first.


---




[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21088
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21088
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89909/
Test PASSed.


---




[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21088
  
**[Test build #89909 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89909/testReport)** for PR 21088 at commit [`932b7d1`](https://github.com/apache/spark/commit/932b7d197d12f0fe714f369d27482d402ac4a695).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21169: [SPARK-23715][SQL] the input of to/from_utc_timestamp ca...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21169
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2709/
Test PASSed.


---




[GitHub] spark issue #21169: [SPARK-23715][SQL] the input of to/from_utc_timestamp ca...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21169
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21169: [SPARK-23715][SQL] the input of to/from_utc_timestamp ca...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21169
  
**[Test build #89915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89915/testReport)** for PR 21169 at commit [`d248d4c`](https://github.com/apache/spark/commit/d248d4c56d12a8d287c32a84ddca2b4037a89208).


---




[GitHub] spark pull request #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully ...

2018-04-26 Thread Ngone51
Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21175#discussion_r184590989
  
--- Diff: core/src/test/scala/org/apache/spark/io/ChunkedByteBufferSuite.scala ---
@@ -56,6 +56,12 @@ class ChunkedByteBufferSuite extends SparkFunSuite {
     assert(chunkedByteBuffer.getChunks().head.position() === 0)
   }
 
+  test("writeFully() does not affect original buffer's position") {
--- End diff --

Hi @manbuyun. You should add a new unit test to cover your own change, for example "writeFully() can write buffer which is larger than `bufferWriteChunkSize` correctly", and update the test code.


---




[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21088
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21088
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89908/
Test PASSed.


---




[GitHub] spark issue #21174: [SPARK-24085][SQL] Query returns UnsupportedOperationExc...

2018-04-26 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21174
  
One question: is there any risk that this fix makes us miss partition-pruning opportunities?


---




[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21088
  
**[Test build #89908 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89908/testReport)** for PR 21088 at commit [`27cc29c`](https://github.com/apache/spark/commit/27cc29cd62b704318ceddde82e279f7bd7f6e15f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21169: [SPARK-23715][SQL] the input of to/from_utc_times...

2018-04-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21169#discussion_r184590539
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1805,12 +1805,13 @@ working with timestamps in `pandas_udf`s to get the best performance, see
 
   - Since Spark 2.4, Spark maximizes the usage of a vectorized ORC reader for ORC files by default. To do that, `spark.sql.orc.impl` and `spark.sql.orc.filterPushdown` change their default values to `native` and `true` respectively.
   - In PySpark, when Arrow optimization is enabled, previously `toPandas` just failed when Arrow optimization is unable to be used whereas `createDataFrame` from Pandas DataFrame allowed the fallback to non-optimization. Now, both `toPandas` and `createDataFrame` from Pandas DataFrame allow the fallback by default, which can be switched off by `spark.sql.execution.arrow.fallback.enabled`.
- - Since Spark 2.4, writing an empty dataframe to a directory launches at least one write task, even if physically the dataframe has no partition. This introduces a small behavior change that for self-describing file formats like Parquet and Orc, Spark creates a metadata-only file in the target directory when writing a 0-partition dataframe, so that schema inference can still work if users read that directory later. The new behavior is more reasonable and more consistent regarding writing empty dataframe.
- - Since Spark 2.4, expression IDs in UDF arguments do not appear in column names. For example, an column name in Spark 2.4 is not `UDF:f(col0 AS colA#28)` but ``UDF:f(col0 AS `colA`)``.
- - Since Spark 2.4, writing a dataframe with an empty or nested empty schema using any file formats (parquet, orc, json, text, csv etc.) is not allowed. An exception is thrown when attempting to write dataframes with empty schema.
- - Since Spark 2.4, Spark compares a DATE type with a TIMESTAMP type after promotes both sides to TIMESTAMP. To set `false` to `spark.sql.hive.compareDateTimestampInTimestamp` restores the previous behavior. This option will be removed in Spark 3.0.
- - Since Spark 2.4, creating a managed table with nonempty location is not allowed. An exception is thrown when attempting to create a managed table with nonempty location. To set `true` to `spark.sql.allowCreatingManagedTableUsingNonemptyLocation` restores the previous behavior. This option will be removed in Spark 3.0.
- - Since Spark 2.4, the type coercion rules can automatically promote the argument types of the variadic SQL functions (e.g., IN/COALESCE) to the widest common type, no matter how the input arguments order. In prior Spark versions, the promotion could fail in some specific orders (e.g., TimestampType, IntegerType and StringType) and throw an exception.
+  - Since Spark 2.4, writing an empty dataframe to a directory launches at least one write task, even if physically the dataframe has no partition. This introduces a small behavior change that for self-describing file formats like Parquet and Orc, Spark creates a metadata-only file in the target directory when writing a 0-partition dataframe, so that schema inference can still work if users read that directory later. The new behavior is more reasonable and more consistent regarding writing empty dataframe.
--- End diff --

fix indentation


---




[GitHub] spark issue #21175: [SPARK-24107] ChunkedByteBuffer.writeFully method has no...

2018-04-26 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21175
  
Plz add `[CORE]` in the title.


---




[GitHub] spark issue #21176: [SPARK-24109] Remove class SnappyOutputStreamWrapper

2018-04-26 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21176
  
Have you checked @srowen's comment?
https://github.com/apache/spark/pull/18949#issuecomment-323354674


---




[GitHub] spark issue #21169: [SPARK-23715][SQL] the input of to/from_utc_timestamp ca...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21169
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2708/
Test PASSed.


---




[GitHub] spark issue #21131: [SPARK-23433][CORE] Late zombie task completions update ...

2018-04-26 Thread Ngone51
Github user Ngone51 commented on the issue:

https://github.com/apache/spark/pull/21131
  
LGTM, and nice UT.


---




[GitHub] spark issue #21169: [SPARK-23715][SQL] the input of to/from_utc_timestamp ca...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21169
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #21169: [SPARK-23715][SQL] the input of to/from_utc_times...

2018-04-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21169#discussion_r184589060
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -782,6 +782,22 @@ object TypeCoercion {
   // Skip nodes who's children have not been resolved yet.
   case e if !e.childrenResolved => e
 
+  // Special rules for `to/from_utc_timestamp`. `to/from_utc_timestamp` assumes its input is
+  // in UTC timezone, and if input is string, it should not contain timezone.
+  // TODO: We should move the type coercion logic to expressions instead of a central
+  // place to put all the rules.
+  case e: FromUTCTimestamp if e.left.dataType == StringType =>
--- End diff --

Catalyst assumes rules in the same batch are order-insensitive. Since this rule must run before the implicit type cast, we'd better put them in the same rule to guarantee the order.


---




[GitHub] spark issue #21169: [SPARK-23715][SQL] the input of to/from_utc_timestamp ca...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21169
  
**[Test build #89914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89914/testReport)** for PR 21169 at commit [`cbe37f2`](https://github.com/apache/spark/commit/cbe37f2e8f1f50aaa95b103d7b2f06e0ab756838).


---




[GitHub] spark issue #21175: [SPARK-24107] ChunkedByteBuffer.writeFully method has no...

2018-04-26 Thread Ngone51
Github user Ngone51 commented on the issue:

https://github.com/apache/spark/pull/21175
  
@manbuyun you need to add the unit test into `ChunkedByteBufferSuite.scala` 
and push a new commit.


---




[GitHub] spark issue #21176: [SPARK-24109] Remove class SnappyOutputStreamWrapper

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21176
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21176: [SPARK-24109] Remove class SnappyOutputStreamWrapper

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21176
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #21176: [SPARK-24109] Remove class SnappyOutputStreamWrap...

2018-04-26 Thread manbuyun
GitHub user manbuyun opened a pull request:

https://github.com/apache/spark/pull/21176

[SPARK-24109] Remove class SnappyOutputStreamWrapper

JIRA: 
https://issues.apache.org/jira/browse/SPARK-24109?jql=text%20~%20%22SnappyOutputStreamWrapper%22

The wrapper over `SnappyOutputStream` guards against write-after-close and double-close issues (see SPARK-7660 for more details). Its doc says the wrapping can be removed once we upgrade to a version of snappy-java that contains the fix for https://github.com/xerial/snappy-java/issues/107.

snappy-java 1.1.2+ fixed that bug, so the wrapper can now be removed.
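
For context, a hedged sketch of what the removal enables: returning `SnappyOutputStream` directly instead of wrapping it. The `compressedOutputStream` shape is an assumption modeled on Spark's `CompressionCodec`, not a literal excerpt:

```
import java.io.OutputStream
import org.xerial.snappy.SnappyOutputStream

// With snappy-java 1.1.2+, double-close is handled inside the library,
// so the defensive wrapper class is no longer needed.
def compressedOutputStream(s: OutputStream, blockSize: Int): OutputStream =
  new SnappyOutputStream(s, blockSize)
```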

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manbuyun/spark refactor-SnappyOutputStreamWrapper

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21176.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21176


commit 8b93ca15bf4a5d56cd0d4c6694a7f723877a147a
Author: WangJinhai02 
Date:   2018-04-27T03:40:11Z

remove SnappyOutputStreamWrapper

commit 06c8c2f60d5665da00234868b1186283e0fcd894
Author: WangJinhai02 
Date:   2018-04-27T03:42:34Z

Merge branch 'master' of https://github.com/apache/spark into refactor-SnappyOutputStreamWrapper




---




[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...

2018-04-26 Thread advancedxy
Github user advancedxy commented on the issue:

https://github.com/apache/spark/pull/21165
  
> However, I don't agree that user-side accumulators should get updates from killed tasks; that changes the semantics of accumulators. And I don't think end-users need to care about killed tasks. Similarly, when we implement task metrics, we need to count failed tasks, but user-side accumulators still skip failed tasks. I think we should also follow that approach.

I don't agree that end users don't care about killed tasks. For example, a user may want to record CPU time for every task and get the total CPU time for the application. However, the default behaviour should keep backward compatibility with the existing behaviour.

```
private[spark] case class AccumulatorMetadata(
    id: Long,
    name: Option[String],
    countFailedValues: Boolean) extends Serializable
```
The metadata has a `countFailedValues` field; we could reuse it or add a new field?

However, we don't currently expose this field to end users...
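
A hypothetical shape for such a field, purely as a sketch of the idea; `countKilledValues` is invented for illustration and is not an actual Spark API:

```
private[spark] case class AccumulatorMetadata(
    id: Long,
    name: Option[String],
    countFailedValues: Boolean,
    countKilledValues: Boolean = false) // hypothetical: opt in to killed-task updates
  extends Serializable
```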


---




[GitHub] spark issue #20522: [SPARK-23355][SQL] convertMetastore should not ignore ta...

2018-04-26 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20522
  
Thank you for review and merge, @cloud-fan !


---




[GitHub] spark issue #21175: [SPARK-24107] ChunkedByteBuffer.writeFully method has no...

2018-04-26 Thread manbuyun
Github user manbuyun commented on the issue:

https://github.com/apache/spark/pull/21175
  
  test("writeFully() does not affect original buffer's position") {
val chunkedByteBuffer = new 
ChunkedByteBuffer(Array(ByteBuffer.allocate(80*1024*1024)))
chunkedByteBuffer.writeFully(new 
ByteArrayWritableChannel(chunkedByteBuffer.size.toInt))
assert(chunkedByteBuffer.getChunks().head.position() === 0)
  }


---




[GitHub] spark issue #21175: [SPARK-24107] ChunkedByteBuffer.writeFully method has no...

2018-04-26 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21175
  
Would it be possible to add a unit test?


---




[GitHub] spark issue #21168: [SPARK-23830][YARN] added check to ensure main method is...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21168
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21168: [SPARK-23830][YARN] added check to ensure main method is...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21168
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89913/
Test PASSed.


---




[GitHub] spark issue #21168: [SPARK-23830][YARN] added check to ensure main method is...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21168
  
**[Test build #89913 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89913/testReport)** for PR 21168 at commit [`10a6232`](https://github.com/apache/spark/commit/10a623268bf67fcc34f6dc7adac696f7052ca712).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...

2018-04-26 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/17086
  
Overall good. @jkbradley Would you mind taking a look?


---




[GitHub] spark pull request #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use...

2018-04-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17086#discussion_r184584878
  
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala ---
@@ -55,44 +60,128 @@ class MulticlassMetricsSuite extends SparkFunSuite with MLlibTestSparkContext {
     val f2measure1 = (1 + 2 * 2) * precision1 * recall1 / (2 * 2 * precision1 + recall1)
     val f2measure2 = (1 + 2 * 2) * precision2 * recall2 / (2 * 2 * precision2 + recall2)
 
-    assert(metrics.confusionMatrix.toArray.sameElements(confusionMatrix.toArray))
-    assert(math.abs(metrics.truePositiveRate(0.0) - tpRate0) < delta)
-    assert(math.abs(metrics.truePositiveRate(1.0) - tpRate1) < delta)
-    assert(math.abs(metrics.truePositiveRate(2.0) - tpRate2) < delta)
-    assert(math.abs(metrics.falsePositiveRate(0.0) - fpRate0) < delta)
-    assert(math.abs(metrics.falsePositiveRate(1.0) - fpRate1) < delta)
-    assert(math.abs(metrics.falsePositiveRate(2.0) - fpRate2) < delta)
-    assert(math.abs(metrics.precision(0.0) - precision0) < delta)
-    assert(math.abs(metrics.precision(1.0) - precision1) < delta)
-    assert(math.abs(metrics.precision(2.0) - precision2) < delta)
-    assert(math.abs(metrics.recall(0.0) - recall0) < delta)
-    assert(math.abs(metrics.recall(1.0) - recall1) < delta)
-    assert(math.abs(metrics.recall(2.0) - recall2) < delta)
-    assert(math.abs(metrics.fMeasure(0.0) - f1measure0) < delta)
-    assert(math.abs(metrics.fMeasure(1.0) - f1measure1) < delta)
-    assert(math.abs(metrics.fMeasure(2.0) - f1measure2) < delta)
-    assert(math.abs(metrics.fMeasure(0.0, 2.0) - f2measure0) < delta)
-    assert(math.abs(metrics.fMeasure(1.0, 2.0) - f2measure1) < delta)
-    assert(math.abs(metrics.fMeasure(2.0, 2.0) - f2measure2) < delta)
+    assert(metrics.confusionMatrix.asML ~== confusionMatrix.asML relTol delta)
+    assert(metrics.truePositiveRate(0.0) ~== tpRate0 absTol delta)
+    assert(metrics.truePositiveRate(1.0) ~== tpRate1 absTol delta)
+    assert(metrics.truePositiveRate(2.0) ~== tpRate2 absTol delta)
+    assert(metrics.falsePositiveRate(0.0) ~== fpRate0 absTol delta)
+    assert(metrics.falsePositiveRate(1.0) ~== fpRate1 absTol delta)
+    assert(metrics.falsePositiveRate(2.0) ~== fpRate2 absTol delta)
+    assert(metrics.precision(0.0) ~== precision0 absTol delta)
+    assert(metrics.precision(1.0) ~== precision1 absTol delta)
+    assert(metrics.precision(2.0) ~== precision2 absTol delta)
+    assert(metrics.recall(0.0) ~== recall0 absTol delta)
+    assert(metrics.recall(1.0) ~== recall1 absTol delta)
+    assert(metrics.recall(2.0) ~== recall2 absTol delta)
+    assert(metrics.fMeasure(0.0) ~== f1measure0 absTol delta)
+    assert(metrics.fMeasure(1.0) ~== f1measure1 absTol delta)
+    assert(metrics.fMeasure(2.0) ~== f1measure2 absTol delta)
+    assert(metrics.fMeasure(0.0, 2.0) ~== f2measure0 absTol delta)
+    assert(metrics.fMeasure(1.0, 2.0) ~== f2measure1 absTol delta)
+    assert(metrics.fMeasure(2.0, 2.0) ~== f2measure2 absTol delta)
+
+    assert(metrics.accuracy ~==
+      (2.0 + 3.0 + 1.0) / ((2 + 3 + 1) + (1 + 1 + 1)) absTol delta)
+    assert(metrics.accuracy ~== metrics.precision absTol delta)
+    assert(metrics.accuracy ~== metrics.recall absTol delta)
+    assert(metrics.accuracy ~== metrics.fMeasure absTol delta)
+    assert(metrics.accuracy ~== metrics.weightedRecall absTol delta)
+    val weight0 = 4.0 / 9
+    val weight1 = 4.0 / 9
+    val weight2 = 1.0 / 9
+    assert(metrics.weightedTruePositiveRate ~==
+      (weight0 * tpRate0 + weight1 * tpRate1 + weight2 * tpRate2) absTol delta)
+    assert(metrics.weightedFalsePositiveRate ~==
+      (weight0 * fpRate0 + weight1 * fpRate1 + weight2 * fpRate2) absTol delta)
+    assert(metrics.weightedPrecision ~==
+      (weight0 * precision0 + weight1 * precision1 + weight2 * precision2) absTol delta)
+    assert(metrics.weightedRecall ~==
+      (weight0 * recall0 + weight1 * recall1 + weight2 * recall2) absTol delta)
+    assert(metrics.weightedFMeasure ~==
+      (weight0 * f1measure0 + weight1 * f1measure1 + weight2 * f1measure2) absTol delta)
+    assert(metrics.weightedFMeasure(2.0) ~==
+      (weight0 * f2measure0 + weight1 * f2measure1 + weight2 * f2measure2) absTol delta)
+    assert(metrics.labels === labels)
+  }
+
+  test("Multiclass evaluation metrics with weights") {
+    /*
+     * Confusion matrix for 3-class classification with total 9 instances with 2 weights:
+     * | 2 * w1 | 1 * w2          | 1 * w1 | true class0 (4 instances)
+     * | 1 * w2 | 2 * w1 + 1 * w2 | 0      | true class1 (4 instances)
+     * | 0      | 0               | 1 * w2 | true class2

[GitHub] spark issue #21168: [SPARK-23830][YARN] added check to ensure main method is...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21168
  
**[Test build #89913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89913/testReport)** for PR 21168 at commit [`10a6232`](https://github.com/apache/spark/commit/10a623268bf67fcc34f6dc7adac696f7052ca712).


---




[GitHub] spark issue #21175: [SPARK-24107] ChunkedByteBuffer.writeFully method has no...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21175
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...

2018-04-26 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21165
  
I do agree the task-killed event should carry a metrics update, as it's reasonable to count killed tasks for something like how many bytes were read from files.

However, I don't agree that user-side accumulators should get updates from killed tasks; that changes the semantics of accumulators. And I don't think end-users need to care about killed tasks.

I haven't read the PR yet, but please make sure this patch only touches 
internal accumulators that are used for metrics reporting.


---




[GitHub] spark issue #21175: [SPARK-24107] ChunkedByteBuffer.writeFully method has no...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21175
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #21175: [SPARK-24107] ChunkedByteBuffer.writeFully method...

2018-04-26 Thread manbuyun
GitHub user manbuyun opened a pull request:

https://github.com/apache/spark/pull/21175

[SPARK-24107] ChunkedByteBuffer.writeFully method has not reset the limit value

JIRA Issue: https://issues.apache.org/jira/browse/SPARK-24107?jql=text%20~%20%22ChunkedByteBuffer%22

`ChunkedByteBuffer.writeFully` does not reset the buffer's limit value. When a chunk is larger than `bufferWriteChunkSize`, e.g. an 80 * 1024 * 1024 byte chunk against `config.BUFFER_WRITE_CHUNK_SIZE` (64 * 1024 * 1024), the write loop runs only once and the remaining 16 * 1024 * 1024 bytes are lost.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manbuyun/spark bugfix-ChunkedByteBuffer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21175.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21175


commit fae181433ca1eda6be0ad450223d73c6eb5f3f35
Author: WangJinhai02 
Date:   2018-04-26T14:43:44Z

restore bytes limit value




---




[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...

2018-04-26 Thread advancedxy
Github user advancedxy commented on the issue:

https://github.com/apache/spark/pull/21165
  
I added a note for the accumulator update. Please comment if more documentation is needed.


---




[GitHub] spark pull request #20522: [SPARK-23355][SQL] convertMetastore should not ig...

2018-04-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20522


---




[GitHub] spark issue #20522: [SPARK-23355][SQL] convertMetastore should not ignore ta...

2018-04-26 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20522
  
LGTM, merging to master!


---




[GitHub] spark issue #21174: [SPARK-24085] Query returns UnsupportedOperationExceptio...

2018-04-26 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21174
  
plz add `[SQL]` in the title.


---




[GitHub] spark issue #21174: [SPARK-24085] Query returns UnsupportedOperationExceptio...

2018-04-26 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21174
  
Ah, ok. Thanks!


---




[GitHub] spark issue #21168: [SPARK-23830][CORE] added check to ensure main method is...

2018-04-26 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/21168
  
Also this is a yarn, not core, change.


---




[GitHub] spark issue #21136: [SPARK-24061][SS]Add TypedFilter support for continuous ...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21136
  
**[Test build #89912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89912/testReport)** for PR 21136 at commit [`77ee1c7`](https://github.com/apache/spark/commit/77ee1c7c9aaa3e3200f2fd029c2f0896c155cdcc).


---




[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21088
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21088
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89904/
Test PASSed.


---




[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21088
  
**[Test build #89904 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89904/testReport)** for PR 21088 at commit [`1758577`](https://github.com/apache/spark/commit/17585773104fa0169f6e162b656873a3ed11786a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21136: [SPARK-24061][SS]Add TypedFilter support for cont...

2018-04-26 Thread yanlin-Lynn
Github user yanlin-Lynn commented on a diff in the pull request:

https://github.com/apache/spark/pull/21136#discussion_r184576733
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala ---
@@ -771,6 +778,16 @@ class UnsupportedOperationsSuite extends SparkFunSuite {
     }
   }
 
+  /** Assert that the logical plan is not supportd for continuous procsssing mode */
--- End diff --

ah, you got it!


---




[GitHub] spark issue #21174: [SPARK-24085] Query returns UnsupportedOperationExceptio...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21174
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89906/
Test PASSed.


---




[GitHub] spark issue #21174: [SPARK-24085] Query returns UnsupportedOperationExceptio...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21174
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21174: [SPARK-24085] Query returns UnsupportedOperationExceptio...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21174
  
**[Test build #89906 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89906/testReport)** for PR 21174 at commit [`38c7692`](https://github.com/apache/spark/commit/38c769274fca2931d0b0147e5e666b9cd7c99f59).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89903/
Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21068
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21068
  
**[Test build #89903 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89903/testReport)** for PR 21068 at commit [`17bbbee`](https://github.com/apache/spark/commit/17bbbee0cf952a32e44fd0767bba08814e351da2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21068: [SPARK-16630][YARN] Blacklist a node if executors...

2018-04-26 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21068#discussion_r184575258
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
@@ -170,8 +170,7 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
     if (executorDataMap.contains(executorId)) {
       executorRef.send(RegisterExecutorFailed("Duplicate executor ID: " + executorId))
       context.reply(true)
-    } else if (scheduler.nodeBlacklist != null &&
-      scheduler.nodeBlacklist.contains(hostname)) {
+    } else if (scheduler.nodeBlacklistWithExpiryTimes.contains(hostname)) {
--- End diff --

This change seems to be causing all the test failures. It's because there are tests that use a mock TaskSchedulerImpl, and so `scheduler.nodeBlacklistWithExpiryTimes` returns null from the mock.


---




[GitHub] spark issue #18717: [SPARK-21510] [SQL] Add isMaterialized() and eager persi...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18717
  
Build finished. Test FAILed.


---




[GitHub] spark issue #18717: [SPARK-21510] [SQL] Add isMaterialized() and eager persi...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18717
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89907/
Test FAILed.


---




[GitHub] spark issue #18717: [SPARK-21510] [SQL] Add isMaterialized() and eager persi...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18717
  
**[Test build #89907 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89907/testReport)** for PR 18717 at commit [`31bf797`](https://github.com/apache/spark/commit/31bf79742a219ca13cca2a6774da783b061864d1).
 * This patch **fails PySpark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21168: [SPARK-23830][CORE] added check to ensure main method is...

2018-04-26 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21168
  
The change fails to build; please fix it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21173: [SPARK-23856][SQL] Add an option `queryTimeout` in JDBCO...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21173
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21173: [SPARK-23856][SQL] Add an option `queryTimeout` in JDBCO...

2018-04-26 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21173
  
@gatorsmile 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21173: [SPARK-23856][SQL] Add an option `queryTimeout` in JDBCO...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21173
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89905/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21173: [SPARK-23856][SQL] Add an option `queryTimeout` in JDBCO...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21173
  
**[Test build #89905 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89905/testReport)** for PR 21173 at commit [`f134548`](https://github.com/apache/spark/commit/f134548bd6b7b9f2bc2c508698404a61eb9ea43e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21168: [SPARK-23830][CORE] added check to ensure main method is...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21168
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21168: [SPARK-23830][CORE] added check to ensure main method is...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21168
  
**[Test build #89911 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89911/testReport)** for PR 21168 at commit [`3d90a5e`](https://github.com/apache/spark/commit/3d90a5e61b69d6483dd90ce60324436bbbf2c42d).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21168: [SPARK-23830][CORE] added check to ensure main method is...

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21168
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89911/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21168: [SPARK-23830][CORE] added check to ensure main method is...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21168
  
**[Test build #89911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89911/testReport)** for PR 21168 at commit [`3d90a5e`](https://github.com/apache/spark/commit/3d90a5e61b69d6483dd90ce60324436bbbf2c42d).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21169: [SPARK-23715][SQL] the input of to/from_utc_times...

2018-04-26 Thread bersprockets
Github user bersprockets commented on a diff in the pull request:

https://github.com/apache/spark/pull/21169#discussion_r184569738
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -782,6 +782,22 @@ object TypeCoercion {
   // Skip nodes who's children have not been resolved yet.
   case e if !e.childrenResolved => e
 
+  // Special rules for `to/from_utc_timestamp`. `to/from_utc_timestamp` assumes its input is
+  // in UTC timezone, and if input is string, it should not contain timezone.
+  // TODO: We should move the type coercion logic to expressions instead of a central
+  // place to put all the rules.
+  case e: FromUTCTimestamp if e.left.dataType == StringType =>
--- End diff --

Should these checks go in their own rule that runs before ImplicitTypeCasts?
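For illustration, one way such a standalone rule could be packaged (a sketch only: the object name is hypothetical, and the `Cast` mirrors the coercion in the diff above rather than the PR's final shape):

```scala
import org.apache.spark.sql.catalyst.expressions.{Cast, FromUTCTimestamp, ToUTCTimestamp}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.{StringType, TimestampType}

// Hypothetical rule split out of the central TypeCoercion list so it can be
// ordered before ImplicitTypeCasts: cast string inputs of
// from_utc_timestamp/to_utc_timestamp to timestamp up front, so the generic
// rules never see a string operand.
object UtcTimestampInputCoercion extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    case e: FromUTCTimestamp if e.left.dataType == StringType =>
      e.copy(left = Cast(e.left, TimestampType))
    case e: ToUTCTimestamp if e.left.dataType == StringType =>
      e.copy(left = Cast(e.left, TimestampType))
  }
}
```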


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21168: added check to ensure main method is found [SPARK-23830]

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21168
  
**[Test build #89910 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89910/testReport)** for PR 21168 at commit [`571ae40`](https://github.com/apache/spark/commit/571ae400a226af53e0ccd7e5d9e3ba837f2f8ad1).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21168: added check to ensure main method is found [SPARK-23830]

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21168
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89910/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21168: added check to ensure main method is found [SPARK-23830]

2018-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21168
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21168: added check to ensure main method is found [SPARK-23830]

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21168
  
**[Test build #89910 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89910/testReport)** for PR 21168 at commit [`571ae40`](https://github.com/apache/spark/commit/571ae400a226af53e0ccd7e5d9e3ba837f2f8ad1).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
Please go ahead if there's another approach that avoids the removal but still fixes it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21168: added check to ensure main method is found [SPARK-23830]

2018-04-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21168
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21107: [SPARK-24044][PYTHON] Explicitly print out skipped tests...

2018-04-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21107
  
Thank you for reviewing this @bersprockets, @viirya, @BryanCutler, 
@icexelloss and @felixcheung.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21169: [SPARK-23715][SQL] the input of to/from_utc_times...

2018-04-26 Thread bersprockets
Github user bersprockets commented on a diff in the pull request:

https://github.com/apache/spark/pull/21169#discussion_r184565093
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/datetime.sql ---
@@ -27,3 +27,8 @@ select current_date = current_date(), current_timestamp = current_timestamp(), a
 select a, b from ttf2 order by a, current_date;
 
 select weekday('2007-02-03'), weekday('2009-07-30'), weekday('2017-05-27'), weekday(null), weekday('1582-10-15 13:10:15');
+
--- End diff --

Does it matter if there are no success cases for from_utc_timestamp and 
to_utc_timestamp in here (that is, cases that don't return null)?
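For illustration, success cases could look like the following (a sketch run from spark-shell; the expected values in the comments assume PST = UTC-8 and are not taken from the PR):

```scala
// Interpret a timezone-free string as UTC and render it in PST.
spark.sql("select from_utc_timestamp('2018-01-01 00:00:00', 'PST')").show()
// expected: 2017-12-31 16:00:00

// Interpret the string as PST wall-clock time and shift it back to UTC.
spark.sql("select to_utc_timestamp('2018-01-01 00:00:00', 'PST')").show()
// expected: 2018-01-01 08:00:00
```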


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21169: [SPARK-23715][SQL] the input of to/from_utc_times...

2018-04-26 Thread bersprockets
Github user bersprockets commented on a diff in the pull request:

https://github.com/apache/spark/pull/21169#discussion_r184561518
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
@@ -296,10 +296,27 @@ object DateTimeUtils {
* `T[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us]+[h]h:[m]m`
*/
   def stringToTimestamp(s: UTF8String): Option[SQLTimestamp] = {
-stringToTimestamp(s, defaultTimeZone())
+stringToTimestamp(s, defaultTimeZone(), forceTimezone = false)
   }
 
   def stringToTimestamp(s: UTF8String, timeZone: TimeZone): Option[SQLTimestamp] = {
+stringToTimestamp(s, timeZone, forceTimezone = false)
+  }
+
+  /**
+   * Converts a timestamp string to microseconds from the unix epoch, w.r.t. the given timezone.
+   * Returns None if the input string is not a valid timestamp format.
+   *
+   * @param s the input timestamp string.
+   * @param timeZone the timezone of the timestamp string, will be ignored if the timestamp string
+   * already contains timezone information and `forceTimezone` is false.
+   * @param forceTimezone if true, force to apply the given timezone to the timestamp string. If the
+   *  timestamp string already contains timezone, return None.
+   */
+  def stringToTimestamp(
+  s: UTF8String,
+  timeZone: TimeZone,
+  forceTimezone: Boolean): Option[SQLTimestamp] = {
--- End diff --

It seems more like `rejectTzInString` or `rejectStringTZ`.
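Whatever the final name, a minimal usage sketch of the contract described in the scaladoc above (the behavior shown is my reading of that doc, not a verified run):

```scala
import java.util.TimeZone
import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.unsafe.types.UTF8String

val utc = TimeZone.getTimeZone("UTC")

// No timezone in the string: the given timezone is applied -> Some(...)
DateTimeUtils.stringToTimestamp(
  UTF8String.fromString("2018-01-01 00:00:00"), utc, forceTimezone = true)

// The string already carries a timezone: per the scaladoc, the call
// returns None rather than silently preferring one of the two zones.
DateTimeUtils.stringToTimestamp(
  UTF8String.fromString("2018-01-01 00:00:00+08:00"), utc, forceTimezone = true)
```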


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21088
  
**[Test build #89909 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89909/testReport)** for PR 21088 at commit [`932b7d1`](https://github.com/apache/spark/commit/932b7d197d12f0fe714f369d27482d402ac4a695).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...

2018-04-26 Thread devaraj-kavali
Github user devaraj-kavali commented on the issue:

https://github.com/apache/spark/pull/21088
  
For standalone clusters (`DriverRunner.scala:182`) there is a driver ID we can use here, and for Mesos clusters (`MesosRestServer.scala:107`) there is a submission ID, but no app ID is available in either case. In k8s mode it would require some change (updating the existing conf with the substituted value), and I'm not sure whether that is worth doing.
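As a rough sketch of the substitution being discussed (the placeholder names and helper function are illustrative, not the PR's code):

```scala
// Replace {{KEY}} placeholders in executor options with whatever IDs the
// cluster manager actually knows at launch time: app ID on YARN, driver ID
// in standalone mode, submission ID on Mesos.
def substituteVariables(opts: String, ids: Map[String, String]): String =
  ids.foldLeft(opts) { case (s, (k, v)) => s.replace(s"{{$k}}", v) }

val opts = "-Dlog.file=/var/log/spark/{{APP_ID}}-{{EXECUTOR_ID}}.log"
substituteVariables(opts, Map("APP_ID" -> "app-20180426", "EXECUTOR_ID" -> "1"))
// -> "-Dlog.file=/var/log/spark/app-20180426-1.log"
```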


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21088: [SPARK-24003][CORE] Add support to provide spark.executo...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21088
  
**[Test build #89908 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89908/testReport)** for PR 21088 at commit [`27cc29c`](https://github.com/apache/spark/commit/27cc29cd62b704318ceddde82e279f7bd7f6e15f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use...

2018-04-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17086#discussion_r184566012
  
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala ---
@@ -95,4 +95,95 @@ class MulticlassMetricsSuite extends SparkFunSuite with MLlibTestSparkContext {
   ((4.0 / 9) * f2measure0 + (4.0 / 9) * f2measure1 + (1.0 / 9) * f2measure2)) < delta)
 assert(metrics.labels.sameElements(labels))
   }
+
+  test("Multiclass evaluation metrics with weights") {
+/*
+ * Confusion matrix for 3-class classification with total 9 instances with 2 weights:
+ * |2 * w1|1 * w2 |1 * w1| true class0 (4 instances)
+ * |1 * w2|2 * w1 + 1 * w2|0 | true class1 (4 instances)
+ * |0 |0  |1 * w2| true class2 (1 instance)
+ */
+val w1 = 2.2
+val w2 = 1.5
+val tw = 2.0 * w1 + 1.0 * w2 + 1.0 * w1 + 1.0 * w2 + 2.0 * w1 + 1.0 * w2 + 1.0 * w2
+val confusionMatrix = Matrices.dense(3, 3,
+  Array(2 * w1, 1 * w2, 0, 1 * w2, 2 * w1 + 1 * w2, 0, 1 * w1, 0, 1 * w2))
+val labels = Array(0.0, 1.0, 2.0)
+val predictionAndLabelsWithWeights = sc.parallelize(
+  Seq((0.0, 0.0, w1), (0.0, 1.0, w2), (0.0, 0.0, w1), (1.0, 0.0, w2),
+(1.0, 1.0, w1), (1.0, 1.0, w2), (1.0, 1.0, w1), (2.0, 2.0, w2),
+(2.0, 0.0, w1)), 2)
+val metrics = new MulticlassMetrics(predictionAndLabelsWithWeights)
+val delta = 0.001
+val tpRate0 = (2.0 * w1) / (2.0 * w1 + 1.0 * w2 + 1.0 * w1)
+val tpRate1 = (2.0 * w1 + 1.0 * w2) / (2.0 * w1 + 1.0 * w2 + 1.0 * w2)
+val tpRate2 = (1.0 * w2) / (1.0 * w2 + 0)
+val fpRate0 = (1.0 * w2) / (tw - (2.0 * w1 + 1.0 * w2 + 1.0 * w1))
+val fpRate1 = (1.0 * w2) / (tw - (1.0 * w2 + 2.0 * w1 + 1.0 * w2))
+val fpRate2 = (1.0 * w1) / (tw - (1.0 * w2))
+val precision0 = (2.0 * w1) / (2 * w1 + 1 * w2)
+val precision1 = (2.0 * w1 + 1.0 * w2) / (2.0 * w1 + 1.0 * w2 + 1.0 * w2)
+val precision2 = (1.0 * w2) / (1 * w1 + 1 * w2)
+val recall0 = (2.0 * w1) / (2.0 * w1 + 1.0 * w2 + 1.0 * w1)
+val recall1 = (2.0 * w1 + 1.0 * w2) / (2.0 * w1 + 1.0 * w2 + 1.0 * w2)
+val recall2 = (1.0 * w2) / (1.0 * w2 + 0)
+val f1measure0 = 2 * precision0 * recall0 / (precision0 + recall0)
+val f1measure1 = 2 * precision1 * recall1 / (precision1 + recall1)
+val f1measure2 = 2 * precision2 * recall2 / (precision2 + recall2)
+val f2measure0 = (1 + 2 * 2) * precision0 * recall0 / (2 * 2 * precision0 + recall0)
+val f2measure1 = (1 + 2 * 2) * precision1 * recall1 / (2 * 2 * precision1 + recall1)
+val f2measure2 = (1 + 2 * 2) * precision2 * recall2 / (2 * 2 * precision2 + recall2)
+
+assert(metrics.confusionMatrix.toArray.sameElements(confusionMatrix.toArray))
--- End diff --

Oh, that's because you use the `Matrices` from `mllib`; change it to the one in `ml`, i.e., `import org.apache.spark.ml.linalg.Matrices`.
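That is, for the test above (a sketch of the suggested change):

```scala
// Import the ml.linalg version (not org.apache.spark.mllib.linalg.Matrices)
// so the matrix type matches what the assertion compares against.
import org.apache.spark.ml.linalg.Matrices

val (w1, w2) = (2.2, 1.5)  // weights from the test above
val confusionMatrix = Matrices.dense(3, 3,
  Array(2 * w1, 1 * w2, 0, 1 * w2, 2 * w1 + 1 * w2, 0, 1 * w1, 0, 1 * w2))
```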


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21174: [SPARK-24085] Query returns UnsupportedOperationExceptio...

2018-04-26 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/21174
  
@maropu Thanks for your response. ORC has CONVERT_METASTORE_ORC set to false by default, so it's not converted to a file-based datasource. If we set it to true, we would see the same issue for ORC as well. I have added a test to cover that case.
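For reference, a quick way to see the difference (assuming the usual conf key for that flag; the table name is hypothetical):

```scala
// With conversion off (the default) the query goes through the Hive ORC
// SerDe path and works; with it on, the table is read through the
// file-based datasource path, which is where the reported failure appears.
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")
spark.sql("select * from some_orc_table").show()
```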


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21174: [SPARK-24085] Query returns UnsupportedOperationExceptio...

2018-04-26 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21174
  
Why does the ORC case work fine?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18717: [SPARK-21510] [SQL] Add isMaterialized() and eager persi...

2018-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18717
  
**[Test build #89907 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89907/testReport)** for PR 18717 at commit [`31bf797`](https://github.com/apache/spark/commit/31bf79742a219ca13cca2a6774da783b061864d1).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21151: [SPARK-24083][YARN] Log stacktrace for uncaught e...

2018-04-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21151


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21151: [SPARK-24083][YARN] Log stacktrace for uncaught exceptio...

2018-04-26 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/21151
  
Merging to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


