[GitHub] spark issue #20490: [SPARK-23323][SQL]: Support commit coordinator for DataS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20490 Merged build finished. Test PASSed.
[GitHub] spark issue #20549: SPARK-18844[MLLIB] Add more binary classification metric...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20549 Can one of the admins verify this patch?
[GitHub] spark issue #20490: [SPARK-23323][SQL]: Support commit coordinator for DataS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20490 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/722/ Test PASSed.
[GitHub] spark issue #20490: [SPARK-23323][SQL]: Support commit coordinator for DataS...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20490 @cloud-fan, please have another look. I fixed the problems you spotted. I haven't added support for the streaming side. It is different enough that I think we should do it in a follow-up. More details are on the thread.
[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Support commit coordinator fo...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20490#discussion_r167011220

--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java ---
@@ -78,10 +78,11 @@ default void onDataWriterCommit(WriterCommitMessage message) {}
  * failed, and {@link #abort(WriterCommitMessage[])} would be called. The state of the destination
  * is undefined and @{@link #abort(WriterCommitMessage[])} may not be able to deal with it.
  *
- * Note that, one partition may have multiple committed data writers because of speculative tasks.
- * Spark will pick the first successful one and get its commit message. Implementations should be
- * aware of this and handle it correctly, e.g., have a coordinator to make sure only one data
- * writer can commit, or have a way to clean up the data of already-committed writers.
+ * Note that speculative execution may cause multiple tasks to run for a partition. By default,
+ * Spark uses the OutputCommitCoordinator to allow only one attempt to commit.
+ * {@link DataWriterFactory} implementations can disable this behavior. If disabled, multiple
--- End diff --

I clarified this and added a note about how to do it.
[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Support commit coordinator fo...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20490#discussion_r167011250

--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java ---
@@ -78,10 +78,11 @@ default void onDataWriterCommit(WriterCommitMessage message) {}
  * failed, and {@link #abort(WriterCommitMessage[])} would be called. The state of the destination
  * is undefined and @{@link #abort(WriterCommitMessage[])} may not be able to deal with it.
  *
- * Note that, one partition may have multiple committed data writers because of speculative tasks.
- * Spark will pick the first successful one and get its commit message. Implementations should be
- * aware of this and handle it correctly, e.g., have a coordinator to make sure only one data
- * writer can commit, or have a way to clean up the data of already-committed writers.
+ * Note that speculative execution may cause multiple tasks to run for a partition. By default,
+ * Spark uses the OutputCommitCoordinator to allow only one attempt to commit.
--- End diff --

Fixed.
[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Support commit coordinator fo...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20490#discussion_r167011291

--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java ---
@@ -32,6 +32,16 @@
 @InterfaceStability.Evolving
 public interface DataWriterFactory extends Serializable {
 
+  /**
+   * Returns whether Spark should use the OutputCommitCoordinator to ensure that only one attempt
+   * for each task commits.
+   *
+   * @return true if commit coordinator should be used, false otherwise.
+   */
+  default boolean useCommitCoordinator() {
--- End diff --

I moved it. Good idea.
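[Editor's note] For context on the mechanism under discussion, a minimal sketch of a source opting out of coordination by overriding the default method shown in the diff above. The class name is hypothetical, and per the comment above ("I moved it") the method's final location may differ from what the diff shows:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.sources.v2.writer.{DataWriter, DataWriterFactory}

// Hypothetical factory whose writers commit idempotently (e.g. each attempt
// writes to a deterministic, overwritable location), so Spark's commit
// coordination can be safely disabled.
class IdempotentWriterFactory extends DataWriterFactory[InternalRow] {
  override def useCommitCoordinator(): Boolean = false

  override def createDataWriter(partitionId: Int, attemptNumber: Int): DataWriter[InternalRow] = {
    // return a writer whose commit is safe to run more than once
    ???
  }
}
```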
[GitHub] spark issue #20549: SPARK-18844[MLLIB] Add more binary classification metric...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20549 [SPARK-18844.zip](https://github.com/apache/spark/files/1708136/SPARK-18844.zip)
[GitHub] spark pull request #20549: SPARK-18844[MLLIB] Add more binary classification...
GitHub user sandecho reopened a pull request:

https://github.com/apache/spark/pull/20549

SPARK-18844[MLLIB] Add more binary classification metrics to BinaryClassificationMetrics

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandecho/spark new_branch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20549.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #20549

commit 9f33d677586043fe7c75ac1930c51c138f281a49
Author: Sandeep Kumar Choudhary
Date: 2018-02-08T16:49:13Z

Add more binary classification metrics to BinaryClassificationMetrics

commit d7144f63a99e575d5c996fd7919bdbe44266620f
Author: Sandeep Kumar Choudhary
Date: 2018-02-08T17:20:52Z

SPARK-18844
[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20521 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87219/ Test FAILed.
[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20521 Merged build finished. Test FAILed.
[GitHub] spark issue #20546: [SPARK-20659][Core] Removing sc.getExecutorStorageStatus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20546 Merged build finished. Test FAILed.
[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20521

**[Test build #87219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87219/testReport)** for PR 20521 at commit [`87fd406`](https://github.com/apache/spark/commit/87fd406813d5bc263d07b8aa87a87d6b360133de).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20546: [SPARK-20659][Core] Removing sc.getExecutorStorageStatus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20546 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87216/ Test FAILed.
[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Support commit coordinator fo...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20490#discussion_r167009143

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala ---
@@ -117,20 +118,43 @@ object DataWritingSparkTask extends Logging {
       writeTask: DataWriterFactory[InternalRow],
       context: TaskContext,
       iter: Iterator[InternalRow]): WriterCommitMessage = {
-    val dataWriter = writeTask.createDataWriter(context.partitionId(), context.attemptNumber())
+    val stageId = context.stageId()
+    val partId = context.partitionId()
+    val attemptId = context.attemptNumber()
+    val dataWriter = writeTask.createDataWriter(partId, attemptId)
 
     // write the data and commit this writer.
     Utils.tryWithSafeFinallyAndFailureCallbacks(block = {
       iter.foreach(dataWriter.write)
-      logInfo(s"Writer for partition ${context.partitionId()} is committing.")
-      val msg = dataWriter.commit()
-      logInfo(s"Writer for partition ${context.partitionId()} committed.")
+
+      val msg = if (writeTask.useCommitCoordinator) {
--- End diff --

Is it a good idea to add streaming to this commit? The changes differ significantly. It isn't clear how commit coordination happens for streaming writes. The OutputCommitCoordinator's `canCommit` method takes stage, partition, and attempt ids, not epochs. Either the other components aren't ready to have commit coordination, or I'm not familiar enough with how it is done for streaming. I think we can keep the two separate, and I'm happy to open a follow-up issue for the streaming side.
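[Editor's note] For reference, a condensed sketch of the batch-side coordination the diff above introduces, including the deny branch the quoted hunk truncates. `writeTask`, `dataWriter`, `stageId`, `partId`, and `attemptId` come from the surrounding task body; this is an outline of the control flow, not the exact merged code:

```scala
import org.apache.spark.SparkEnv
import org.apache.spark.executor.CommitDeniedException

val msg = if (writeTask.useCommitCoordinator) {
  // ask the driver-side OutputCommitCoordinator whether this attempt may commit
  val coordinator = SparkEnv.get.outputCommitCoordinator
  if (coordinator.canCommit(stageId, partId, attemptId)) {
    dataWriter.commit()
  } else {
    // another attempt for this partition was already authorized to commit
    throw new CommitDeniedException(
      s"attempt $attemptId denied to commit for stage $stageId, partition $partId",
      stageId, partId, attemptId)
  }
} else {
  // source opted out: commit unconditionally, as before this change
  dataWriter.commit()
}
```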
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20537

**[Test build #87223 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87223/testReport)** for PR 20537 at commit [`2c1a258`](https://github.com/apache/spark/commit/2c1a2582c04a5b9cb7d011892343ca0a07ddb854).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20537 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87223/ Test PASSed.
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20537 Merged build finished. Test PASSed.
[GitHub] spark issue #20546: [SPARK-20659][Core] Removing sc.getExecutorStorageStatus...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20546

**[Test build #87216 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87216/testReport)** for PR 20546 at commit [`3f5dba6`](https://github.com/apache/spark/commit/3f5dba62cfa8b07d602639e9f88f3fd8cec92831).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #20549: Add more binary classification metrics to BinaryC...
Github user sandecho closed the pull request at: https://github.com/apache/spark/pull/20549
[GitHub] spark pull request #20549: Add more binary classification metrics to BinaryC...
GitHub user sandecho reopened a pull request:

https://github.com/apache/spark/pull/20549

Add more binary classification metrics to BinaryClassificationMetrics

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandecho/spark new_branch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20549.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #20549

commit 9f33d677586043fe7c75ac1930c51c138f281a49
Author: Sandeep Kumar Choudhary
Date: 2018-02-08T16:49:13Z

Add more binary classification metrics to BinaryClassificationMetrics

commit d7144f63a99e575d5c996fd7919bdbe44266620f
Author: Sandeep Kumar Choudhary
Date: 2018-02-08T17:20:52Z

SPARK-18844
[GitHub] spark issue #20549: Add more binary classification metrics to BinaryClassifi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20549 Can one of the admins verify this patch?
[GitHub] spark pull request #20549: Add more binary classification metrics to BinaryC...
Github user sandecho closed the pull request at: https://github.com/apache/spark/pull/20549
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20362#discussion_r167005865

--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -599,8 +598,11 @@ class ALSSuite
       (ex, act) =>
         ex.userFactors.first().getSeq[Float](1) === act.userFactors.first.getSeq[Float](1)
     } {
       (ex, act, _) =>
-        ex.transform(_: DataFrame).select("prediction").first.getDouble(0) ~==
-          act.transform(_: DataFrame).select("prediction").first.getDouble(0) absTol 1e-6
+        testTransformerByGlobalCheckFunc[Float](_: DataFrame, act, "prediction") {
+          case actRows: Seq[Row] =>
+            ex.transform(_: DataFrame).select("prediction").first.getDouble(0) ~==
+              actRows(0).getDouble(0) absTol 1e-6
+        }
--- End diff --

Whoa, I didn't check the original functionality in such depth. This is really dead code in the runtime environment. Fixed.
[GitHub] spark pull request #20549: Add more binary classification metrics to BinaryC...
GitHub user sandecho opened a pull request:

https://github.com/apache/spark/pull/20549

Add more binary classification metrics to BinaryClassificationMetrics

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandecho/spark new_branch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20549.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #20549

commit 9f33d677586043fe7c75ac1930c51c138f281a49
Author: Sandeep Kumar Choudhary
Date: 2018-02-08T16:49:13Z

Add more binary classification metrics to BinaryClassificationMetrics
[GitHub] spark pull request #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable l...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r167005107

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala ---
@@ -17,17 +17,130 @@
 package org.apache.spark.sql.execution.datasources.v2
 
+import java.util.UUID
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.sql.{AnalysisException, SaveMode}
 import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
-import org.apache.spark.sql.catalyst.expressions.AttributeReference
-import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, Statistics}
-import org.apache.spark.sql.sources.v2.reader._
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression}
+import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+import org.apache.spark.sql.sources.{DataSourceRegister, Filter}
+import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport, ReadSupportWithSchema, WriteSupport}
+import org.apache.spark.sql.sources.v2.reader.{DataSourceReader, SupportsPushDownCatalystFilters, SupportsPushDownFilters, SupportsPushDownRequiredColumns, SupportsReportStatistics}
+import org.apache.spark.sql.sources.v2.writer.DataSourceWriter
+import org.apache.spark.sql.types.StructType
 
 case class DataSourceV2Relation(
-    output: Seq[AttributeReference],
-    reader: DataSourceReader)
-  extends LeafNode with MultiInstanceRelation with DataSourceReaderHolder {
+    source: DataSourceV2,
+    options: Map[String, String],
+    projection: Option[Seq[AttributeReference]] = None,
+    filters: Option[Seq[Expression]] = None,
+    userSchema: Option[StructType] = None) extends LeafNode with MultiInstanceRelation {
+
+  override def simpleString: String = {
+    s"DataSourceV2Relation(source=$sourceName, " +
+      s"schema=[${output.map(a => s"$a ${a.dataType.simpleString}").mkString(", ")}], " +
+      s"filters=[${pushedFilters.mkString(", ")}], options=$options)"
+  }
+
+  override lazy val schema: StructType = reader.readSchema()
+
+  override lazy val output: Seq[AttributeReference] = {
+    projection match {
+      case Some(attrs) =>
+        // use the projection attributes to avoid assigning new ids. fields that are not projected
+        // will be assigned new ids, which is okay because they are not projected.
+        val attrMap = attrs.map(a => a.name -> a).toMap
+        schema.map(f => attrMap.getOrElse(f.name,
+          AttributeReference(f.name, f.dataType, f.nullable, f.metadata)()))
+      case _ =>
+        schema.toAttributes
+    }
+  }
+
+  private lazy val v2Options: DataSourceOptions = {
+    // ensure path and table options are set correctly
+    val updatedOptions = new mutable.HashMap[String, String]
+    updatedOptions ++= options
+
+    new DataSourceOptions(options.asJava)
+  }
+
+  private val sourceName: String = {
+    source match {
+      case registered: DataSourceRegister =>
+        registered.shortName()
+      case _ =>
+        source.getClass.getSimpleName
+    }
+  }
+
+  lazy val (
+      reader: DataSourceReader,
+      unsupportedFilters: Seq[Expression],
+      pushedFilters: Seq[Expression]) = {
+    val newReader = userSchema match {
+      case Some(s) =>
+        asReadSupportWithSchema.createReader(s, v2Options)
+      case _ =>
+        asReadSupport.createReader(v2Options)
+    }
+
+    projection.foreach { attrs =>
+      DataSourceV2Relation.pushRequiredColumns(newReader, attrs.toStructType)
+    }
+
+    val (remainingFilters, pushedFilters) = filters match {
+      case Some(filterSeq) =>
+        DataSourceV2Relation.pushFilters(newReader, filterSeq)
+      case _ =>
+        (Nil, Nil)
+    }
+
+    (newReader, remainingFilters, pushedFilters)
+  }
 
-  override def canEqual(other: Any): Boolean = other.isInstanceOf[DataSourceV2Relation]
+  def writer(dfSchema: StructType, mode: SaveMode): Option[DataSourceWriter] = {
--- End diff --

If you want this removed to get this commit in, please say so.
[GitHub] spark pull request #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable l...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r167004846

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala ---
@@ -17,17 +17,130 @@
 package org.apache.spark.sql.execution.datasources.v2
 
+import java.util.UUID
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.sql.{AnalysisException, SaveMode}
 import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
-import org.apache.spark.sql.catalyst.expressions.AttributeReference
-import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, Statistics}
-import org.apache.spark.sql.sources.v2.reader._
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression}
+import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+import org.apache.spark.sql.sources.{DataSourceRegister, Filter}
+import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport, ReadSupportWithSchema, WriteSupport}
+import org.apache.spark.sql.sources.v2.reader.{DataSourceReader, SupportsPushDownCatalystFilters, SupportsPushDownFilters, SupportsPushDownRequiredColumns, SupportsReportStatistics}
+import org.apache.spark.sql.sources.v2.writer.DataSourceWriter
+import org.apache.spark.sql.types.StructType
 
 case class DataSourceV2Relation(
-    output: Seq[AttributeReference],
-    reader: DataSourceReader)
-  extends LeafNode with MultiInstanceRelation with DataSourceReaderHolder {
+    source: DataSourceV2,
+    options: Map[String, String],
+    projection: Option[Seq[AttributeReference]] = None,
+    filters: Option[Seq[Expression]] = None,
+    userSchema: Option[StructType] = None) extends LeafNode with MultiInstanceRelation {
+
+  override def simpleString: String = {
+    s"DataSourceV2Relation(source=$sourceName, " +
+      s"schema=[${output.map(a => s"$a ${a.dataType.simpleString}").mkString(", ")}], " +
+      s"filters=[${pushedFilters.mkString(", ")}], options=$options)"
+  }
+
+  override lazy val schema: StructType = reader.readSchema()
+
+  override lazy val output: Seq[AttributeReference] = {
--- End diff --

That commit was not reverted when I rebased. The test is still present and passing: https://github.com/apache/spark/blob/181946d1f1c5889661544830a77bd23c4b4f685a/sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala#L320-L336
[GitHub] spark pull request #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable l...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r167001183

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala ---
@@ -17,17 +17,130 @@
 package org.apache.spark.sql.execution.datasources.v2
 
+import java.util.UUID
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.sql.{AnalysisException, SaveMode}
 import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
-import org.apache.spark.sql.catalyst.expressions.AttributeReference
-import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, Statistics}
-import org.apache.spark.sql.sources.v2.reader._
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression}
+import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+import org.apache.spark.sql.sources.{DataSourceRegister, Filter}
+import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport, ReadSupportWithSchema, WriteSupport}
+import org.apache.spark.sql.sources.v2.reader.{DataSourceReader, SupportsPushDownCatalystFilters, SupportsPushDownFilters, SupportsPushDownRequiredColumns, SupportsReportStatistics}
+import org.apache.spark.sql.sources.v2.writer.DataSourceWriter
+import org.apache.spark.sql.types.StructType
 
 case class DataSourceV2Relation(
-    output: Seq[AttributeReference],
-    reader: DataSourceReader)
-  extends LeafNode with MultiInstanceRelation with DataSourceReaderHolder {
+    source: DataSourceV2,
+    options: Map[String, String],
+    projection: Option[Seq[AttributeReference]] = None,
+    filters: Option[Seq[Expression]] = None,
+    userSchema: Option[StructType] = None) extends LeafNode with MultiInstanceRelation {
+
+  override def simpleString: String = {
+    s"DataSourceV2Relation(source=$sourceName, " +
+      s"schema=[${output.map(a => s"$a ${a.dataType.simpleString}").mkString(", ")}], " +
+      s"filters=[${pushedFilters.mkString(", ")}], options=$options)"
+  }
+
+  override lazy val schema: StructType = reader.readSchema()
+
+  override lazy val output: Seq[AttributeReference] = {
+    projection match {
+      case Some(attrs) =>
+        // use the projection attributes to avoid assigning new ids. fields that are not projected
+        // will be assigned new ids, which is okay because they are not projected.
+        val attrMap = attrs.map(a => a.name -> a).toMap
+        schema.map(f => attrMap.getOrElse(f.name,
+          AttributeReference(f.name, f.dataType, f.nullable, f.metadata)()))
+      case _ =>
+        schema.toAttributes
+    }
+  }
+
+  private lazy val v2Options: DataSourceOptions = {
+    // ensure path and table options are set correctly
+    val updatedOptions = new mutable.HashMap[String, String]
+    updatedOptions ++= options
+
+    new DataSourceOptions(options.asJava)
--- End diff --

Also, keep in mind that this is a lazy val. It is only referenced when creating a reader or writer.
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20537 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87222/ Test FAILed.
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20537

**[Test build #87222 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87222/testReport)** for PR 20537 at commit [`f6b5d28`](https://github.com/apache/spark/commit/f6b5d2868c3ca7c8c2cc2bfb6e7a06ce7c01998c).

* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20537 Merged build finished. Test FAILed.
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20537 **[Test build #87223 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87223/testReport)** for PR 20537 at commit [`2c1a258`](https://github.com/apache/spark/commit/2c1a2582c04a5b9cb7d011892343ca0a07ddb854).
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20537 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/721/ Test PASSed.
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20537 Merged build finished. Test PASSed.
[GitHub] spark pull request #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable l...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r166998163

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala ---
@@ -17,17 +17,130 @@
 package org.apache.spark.sql.execution.datasources.v2
 
+import java.util.UUID
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.sql.{AnalysisException, SaveMode}
 import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
-import org.apache.spark.sql.catalyst.expressions.AttributeReference
-import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, Statistics}
-import org.apache.spark.sql.sources.v2.reader._
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression}
+import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+import org.apache.spark.sql.sources.{DataSourceRegister, Filter}
+import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport, ReadSupportWithSchema, WriteSupport}
+import org.apache.spark.sql.sources.v2.reader.{DataSourceReader, SupportsPushDownCatalystFilters, SupportsPushDownFilters, SupportsPushDownRequiredColumns, SupportsReportStatistics}
+import org.apache.spark.sql.sources.v2.writer.DataSourceWriter
+import org.apache.spark.sql.types.StructType
 
 case class DataSourceV2Relation(
-    output: Seq[AttributeReference],
-    reader: DataSourceReader)
-  extends LeafNode with MultiInstanceRelation with DataSourceReaderHolder {
+    source: DataSourceV2,
+    options: Map[String, String],
+    projection: Option[Seq[AttributeReference]] = None,
+    filters: Option[Seq[Expression]] = None,
+    userSchema: Option[StructType] = None) extends LeafNode with MultiInstanceRelation {
+
+  override def simpleString: String = {
+    s"DataSourceV2Relation(source=$sourceName, " +
+      s"schema=[${output.map(a => s"$a ${a.dataType.simpleString}").mkString(", ")}], " +
+      s"filters=[${pushedFilters.mkString(", ")}], options=$options)"
+  }
+
+  override lazy val schema: StructType = reader.readSchema()
+
+  override lazy val output: Seq[AttributeReference] = {
--- End diff --

I'll have a look. I didn't realize you'd committed that one already.
[GitHub] spark pull request #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable l...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r166997697

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala ---
@@ -17,17 +17,130 @@
 package org.apache.spark.sql.execution.datasources.v2
 
+import java.util.UUID
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.sql.{AnalysisException, SaveMode}
 import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
-import org.apache.spark.sql.catalyst.expressions.AttributeReference
-import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, Statistics}
-import org.apache.spark.sql.sources.v2.reader._
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression}
+import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+import org.apache.spark.sql.sources.{DataSourceRegister, Filter}
+import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport, ReadSupportWithSchema, WriteSupport}
+import org.apache.spark.sql.sources.v2.reader.{DataSourceReader, SupportsPushDownCatalystFilters, SupportsPushDownFilters, SupportsPushDownRequiredColumns, SupportsReportStatistics}
+import org.apache.spark.sql.sources.v2.writer.DataSourceWriter
+import org.apache.spark.sql.types.StructType
 
 case class DataSourceV2Relation(
-    output: Seq[AttributeReference],
-    reader: DataSourceReader)
-  extends LeafNode with MultiInstanceRelation with DataSourceReaderHolder {
+    source: DataSourceV2,
+    options: Map[String, String],
+    projection: Option[Seq[AttributeReference]] = None,
+    filters: Option[Seq[Expression]] = None,
+    userSchema: Option[StructType] = None) extends LeafNode with MultiInstanceRelation {
+
+  override def simpleString: String = {
+    s"DataSourceV2Relation(source=$sourceName, " +
+      s"schema=[${output.map(a => s"$a ${a.dataType.simpleString}").mkString(", ")}], " +
+      s"filters=[${pushedFilters.mkString(", ")}], options=$options)"
+  }
+
+  override lazy val schema: StructType = reader.readSchema()
+
+  override lazy val output: Seq[AttributeReference] = {
+    projection match {
+      case Some(attrs) =>
+        // use the projection attributes to avoid assigning new ids. fields that are not projected
+        // will be assigned new ids, which is okay because they are not projected.
+        val attrMap = attrs.map(a => a.name -> a).toMap
+        schema.map(f => attrMap.getOrElse(f.name,
+          AttributeReference(f.name, f.dataType, f.nullable, f.metadata)()))
+      case _ =>
+        schema.toAttributes
+    }
+  }
+
+  private lazy val v2Options: DataSourceOptions = {
+    // ensure path and table options are set correctly
+    val updatedOptions = new mutable.HashMap[String, String]
+    updatedOptions ++= options
+
+    new DataSourceOptions(options.asJava)
+  }
+
+  private val sourceName: String = {
+    source match {
+      case registered: DataSourceRegister =>
+        registered.shortName()
+      case _ =>
+        source.getClass.getSimpleName
+    }
+  }
+
+  lazy val (
+      reader: DataSourceReader,
+      unsupportedFilters: Seq[Expression],
+      pushedFilters: Seq[Expression]) = {
+    val newReader = userSchema match {
+      case Some(s) =>
+        asReadSupportWithSchema.createReader(s, v2Options)
+      case _ =>
+        asReadSupport.createReader(v2Options)
+    }
+
+    projection.foreach { attrs =>
+      DataSourceV2Relation.pushRequiredColumns(newReader, attrs.toStructType)
+    }
+
+    val (remainingFilters, pushedFilters) = filters match {
+      case Some(filterSeq) =>
+        DataSourceV2Relation.pushFilters(newReader, filterSeq)
+      case _ =>
+        (Nil, Nil)
+    }
+
+    (newReader, remainingFilters, pushedFilters)
+  }
 
-  override def canEqual(other: Any): Boolean = other.isInstanceOf[DataSourceV2Relation]
+  def writer(dfSchema: StructType, mode: SaveMode): Option[DataSourceWriter] = {
--- End diff --

No, it isn't. But a relation should be able to return a writer. This is going to be needed as we improve the logical plans used by v2.
[GitHub] spark pull request #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable l...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r166997241

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala ---
@@ -17,17 +17,130 @@
 package org.apache.spark.sql.execution.datasources.v2
 
+import java.util.UUID
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.sql.{AnalysisException, SaveMode}
 import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
-import org.apache.spark.sql.catalyst.expressions.AttributeReference
-import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, Statistics}
-import org.apache.spark.sql.sources.v2.reader._
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression}
+import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+import org.apache.spark.sql.sources.{DataSourceRegister, Filter}
+import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport, ReadSupportWithSchema, WriteSupport}
+import org.apache.spark.sql.sources.v2.reader.{DataSourceReader, SupportsPushDownCatalystFilters, SupportsPushDownFilters, SupportsPushDownRequiredColumns, SupportsReportStatistics}
+import org.apache.spark.sql.sources.v2.writer.DataSourceWriter
+import org.apache.spark.sql.types.StructType
 
 case class DataSourceV2Relation(
-    output: Seq[AttributeReference],
-    reader: DataSourceReader)
-  extends LeafNode with MultiInstanceRelation with DataSourceReaderHolder {
+    source: DataSourceV2,
+    options: Map[String, String],
+    projection: Option[Seq[AttributeReference]] = None,
+    filters: Option[Seq[Expression]] = None,
+    userSchema: Option[StructType] = None) extends LeafNode with MultiInstanceRelation {
+
+  override def simpleString: String = {
+    s"DataSourceV2Relation(source=$sourceName, " +
+      s"schema=[${output.map(a => s"$a ${a.dataType.simpleString}").mkString(", ")}], " +
+      s"filters=[${pushedFilters.mkString(", ")}], options=$options)"
+  }
+
+  override lazy val schema: StructType = reader.readSchema()
+
+  override lazy val output: Seq[AttributeReference] = {
+    projection match {
+      case Some(attrs) =>
+        // use the projection attributes to avoid assigning new ids. fields that are not projected
+        // will be assigned new ids, which is okay because they are not projected.
+        val attrMap = attrs.map(a => a.name -> a).toMap
+        schema.map(f => attrMap.getOrElse(f.name,
+          AttributeReference(f.name, f.dataType, f.nullable, f.metadata)()))
+      case _ =>
+        schema.toAttributes
+    }
+  }
+
+  private lazy val v2Options: DataSourceOptions = {
+    // ensure path and table options are set correctly
+    val updatedOptions = new mutable.HashMap[String, String]
+    updatedOptions ++= options
+
+    new DataSourceOptions(options.asJava)
--- End diff --

As we've already discussed at length, I think it is a bad idea to create `DataSourceOptions` and pass it to the relation.
[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/20525 @gatorsmile Thanks. I will create a doc PR and address it.
[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Support commit coordinator fo...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20490#discussion_r166995080

--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java ---
@@ -78,10 +78,11 @@ default void onDataWriterCommit(WriterCommitMessage message) {}
  * failed, and {@link #abort(WriterCommitMessage[])} would be called. The state of the destination
  * is undefined and @{@link #abort(WriterCommitMessage[])} may not be able to deal with it.
  *
- * Note that, one partition may have multiple committed data writers because of speculative tasks.
- * Spark will pick the first successful one and get its commit message. Implementations should be
- * aware of this and handle it correctly, e.g., have a coordinator to make sure only one data
- * writer can commit, or have a way to clean up the data of already-committed writers.
+ * Note that speculative execution may cause multiple tasks to run for a partition. By default,
+ * Spark uses the OutputCommitCoordinator to allow only one attempt to commit.
+ * {@link DataWriterFactory} implementations can disable this behavior. If disabled, multiple
--- End diff --

It says that already: "DataWriterFactory implementations can disable this behavior."
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20537

**[Test build #87221 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87221/testReport)** for PR 20537 at commit [`304666a`](https://github.com/apache/spark/commit/304666ad089d497d666de25476955da52aae5395).

* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20537 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87221/ Test FAILed.
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20537 Merged build finished. Test FAILed.
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20537 **[Test build #87222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87222/testReport)** for PR 20537 at commit [`f6b5d28`](https://github.com/apache/spark/commit/f6b5d2868c3ca7c8c2cc2bfb6e7a06ce7c01998c).
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20537 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/720/ Test PASSed.
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20537 Merged build finished. Test PASSed.
[GitHub] spark issue #20408: [SPARK-23189][Core][Web UI] Reflect stage level blacklis...
Github user squito commented on the issue: https://github.com/apache/spark/pull/20408 Just a quick note -- I realized I was confused about one part of the inner workings of the history server, which I want to confirm before I merge this, but I got sick and now have a bit of a backlog. I'll get back to this next week. Sorry for the delay.
[GitHub] spark issue #20546: [SPARK-20659][Core] Remove StorageStatus, or make it pri...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20546 This won't go into 2.3. Also, please don't copy & paste the bug title in your PR. Explain what you're doing instead. The current title does not explain what the change does.
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20537 **[Test build #87221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87221/testReport)** for PR 20537 at commit [`304666a`](https://github.com/apache/spark/commit/304666ad089d497d666de25476955da52aae5395).
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20537 Merged build finished. Test PASSed.
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20537 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/719/ Test PASSed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Merged build finished. Test FAILed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20477

**[Test build #87220 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87220/testReport)** for PR 20477 at commit [`4bff16d`](https://github.com/apache/spark/commit/4bff16deb6debda55ae7bdcbe56fbc527ff73d23).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87220/ Test FAILed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20477 **[Test build #87220 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87220/testReport)** for PR 20477 at commit [`4bff16d`](https://github.com/apache/spark/commit/4bff16deb6debda55ae7bdcbe56fbc527ff73d23).
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/718/ Test PASSed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Merged build finished. Test PASSed.
[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20521 **[Test build #87219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87219/testReport)** for PR 20521 at commit [`87fd406`](https://github.com/apache/spark/commit/87fd406813d5bc263d07b8aa87a87d6b360133de).
[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20521 Merged build finished. Test PASSed.
[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20521 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/717/ Test PASSed.
[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/20537#discussion_r166975718

--- Diff: python/pyspark/sql/types.py ---
@@ -1730,7 +1730,28 @@ def _check_series_convert_timestamps_internal(s, timezone):
     # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
     if is_datetime64_dtype(s.dtype):
         tz = timezone or 'tzlocal()'
-        return s.dt.tz_localize(tz).dt.tz_convert('UTC')
+        """
+        tz_localize with ambiguous=False has the same behavior of pytz.localize
+        >>> import datetime
+        >>> import pandas as pd
+        >>> import pytz
+        >>>
+        >>> t = datetime.datetime(2015, 11, 1, 1, 23, 24)
+        >>> ts = pd.Series([t])
+        >>> tz = pytz.timezone('America/New_York')
+        >>>
+        >>> ts.dt.tz_localize(tz, ambiguous=False)
+        >>> 0   2015-11-01 01:23:24-05:00
+        >>> dtype: datetime64[ns, America/New_York]
+        >>>
+        >>> ts.dt.tz_localize(tz, ambiguous=True)
+        >>> 0   2015-11-01 01:23:24-04:00
+        >>> dtype: datetime64[ns, America/New_York]
+        >>>
+        >>> str(tz.localize(t))
+        >>> '2015-11-01 01:23:24-05:00'
+        """
+        return s.dt.tz_localize(tz, ambiguous=False).dt.tz_convert('UTC')
--- End diff --

Yes, will create a new one for `pandas_udf`. It seems `ambiguous=False` is undocumented in the method doc. @jreback, can you please confirm this usage is correct?
[GitHub] spark pull request #19077: [SPARK-21860][core]Improve memory reuse for heap ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19077
[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/20537#discussion_r166974650

--- Diff: python/pyspark/sql/types.py ---
@@ -1730,7 +1730,28 @@ def _check_series_convert_timestamps_internal(s, timezone):
     # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
     if is_datetime64_dtype(s.dtype):
         tz = timezone or 'tzlocal()'
-        return s.dt.tz_localize(tz).dt.tz_convert('UTC')
+        """
+        tz_localize with ambiguous=False has the same behavior of pytz.localize
+        >>> import datetime
+        >>> import pandas as pd
+        >>> import pytz
+        >>>
+        >>> t = datetime.datetime(2015, 11, 1, 1, 23, 24)
+        >>> ts = pd.Series([t])
+        >>> tz = pytz.timezone('America/New_York')
+        >>>
+        >>> ts.dt.tz_localize(tz, ambiguous=False)
+        >>> 0   2015-11-01 01:23:24-05:00
+        >>> dtype: datetime64[ns, America/New_York]
+        >>>
+        >>> ts.dt.tz_localize(tz, ambiguous=True)
+        >>> 0   2015-11-01 01:23:24-04:00
+        >>> dtype: datetime64[ns, America/New_York]
+        >>>
+        >>> str(tz.localize(t))
+        >>> '2015-11-01 01:23:24-05:00'
--- End diff --

Yeah, let me clean up the format...
[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/20537#discussion_r166974367

--- Diff: python/pyspark/sql/types.py ---
@@ -1730,7 +1730,28 @@ def _check_series_convert_timestamps_internal(s, timezone):
     # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
     if is_datetime64_dtype(s.dtype):
         tz = timezone or 'tzlocal()'
-        return s.dt.tz_localize(tz).dt.tz_convert('UTC')
+        """
+        tz_localize with ambiguous=False has the same behavior of pytz.localize
--- End diff --

Oh, definitely not doctest... Let me change it to comments.
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19077 thanks, merging to master!
[GitHub] spark issue #20548: [SPARK-23316][SQL] AnalysisException after max iteration...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20548 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/716/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][external/...
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/19431 I mean the difference between `test("use backpressure.initialRate with backpressure")` and `test("backpressure.initialRate should honor maxRatePerPartition")` is 3 numbers. Wrapping the common code into one function and making 2 function calls would be better. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
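In other words, parameterize the shared setup. A hypothetical sketch, in Python for brevity (the real tests are Scala, and only the three numbers vary; the 150 in the second call follows from a 300 msg/s cap over a 0.5 s batch):

```python
def run_initial_rate_test(max_rate_per_partition, initial_rate, expected_messages):
    # Hypothetical shared helper: build the conf with the two rates, start
    # the direct stream, and assert on maxMessagesPerPartition. The common
    # setup is elided here; see the Scala tests quoted later in this thread.
    pass


def test_use_initial_rate_with_backpressure():
    run_initial_rate_test(max_rate_per_partition=1000, initial_rate=500,
                          expected_messages=250)


def test_initial_rate_honors_max_rate_per_partition():
    run_initial_rate_test(max_rate_per_partition=300, initial_rate=1000,
                          expected_messages=150)
```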
[GitHub] spark issue #20548: [SPARK-23316][SQL] AnalysisException after max iteration...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20548 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20548: [SPARK-23316][SQL] AnalysisException after max iteration...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20548 **[Test build #87218 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87218/testReport)** for PR 20548 at commit [`367c70b`](https://github.com/apache/spark/commit/367c70bd3aa9cf82358462deb624b7634567f0c9). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20548: [SPARK-23316][SQL] AnalysisException after max it...
GitHub user bogdanrdc opened a pull request: https://github.com/apache/spark/pull/20548 [SPARK-23316][SQL] AnalysisException after max iteration reached for IN query ## What changes were proposed in this pull request? Added a flag ignoreNullability to DataType.equalsStructurally. The previous semantics correspond to ignoreNullability=false. When ignoreNullability=true, equalsStructurally ignores the nullability of contained types (map key types, value types, array element types, structure field types). In.checkInputTypes calls equalsStructurally to check whether the child types match. They should match regardless of nullability (which is just a hint), so it is now called with ignoreNullability=true. ## How was this patch tested? New test in SubquerySuite. You can merge this pull request into a Git repository by running: $ git pull https://github.com/bogdanrdc/spark SPARK-23316 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20548.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20548 commit 367c70bd3aa9cf82358462deb624b7634567f0c9 Author: Bogdan Raducanu Date: 2018-02-08T15:19:34Z fix + test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
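To make the proposed flag concrete, here is a hypothetical Python sketch of the semantics described above (the real implementation is Scala in `DataType.equalsStructurally`; the types and names below are invented for illustration only):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructField:
    dtype: object
    nullable: bool = True


@dataclass
class StructType:
    fields: List[StructField] = field(default_factory=list)


def equals_structurally(a, b, ignore_nullability=False):
    # Structures match when their fields match pairwise; with
    # ignore_nullability=True a nullable/non-nullable mismatch is tolerated,
    # since nullability is just a hint.
    if isinstance(a, StructType) and isinstance(b, StructType):
        return len(a.fields) == len(b.fields) and all(
            (ignore_nullability or f1.nullable == f2.nullable)
            and equals_structurally(f1.dtype, f2.dtype, ignore_nullability)
            for f1, f2 in zip(a.fields, b.fields)
        )
    return a == b  # leaf types: plain equality


# What In.checkInputTypes needs: the same shape, nullability ignored.
left = StructType([StructField("int", nullable=False)])
right = StructType([StructField("int", nullable=True)])
assert not equals_structurally(left, right)
assert equals_structurally(left, right, ignore_nullability=True)
```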
[GitHub] spark pull request #20547: [SPARK-23316][SQL] AnalysisException after max it...
Github user bogdanrdc closed the pull request at: https://github.com/apache/spark/pull/20547 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20545 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87215/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20545 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20547: [SPARK-23316][SQL] AnalysisException after max iteration...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20547 **[Test build #87217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87217/testReport)** for PR 20547 at commit [`79e2593`](https://github.com/apache/spark/commit/79e2593b90ce33788e012ee28fc4cbd3bf6e4264). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20547: [SPARK-23316][SQL] AnalysisException after max iteration...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20547 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/715/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20545 **[Test build #87215 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87215/testReport)** for PR 20545 at commit [`664a62c`](https://github.com/apache/spark/commit/664a62c7da9ba5da2007d40ef9c157f7e82938c5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20547: [SPARK-23316][SQL] AnalysisException after max iteration...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20547 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][external/...
Github user akonopko commented on the issue: https://github.com/apache/spark/pull/19431 > Related the doc I thought it's kafka specific but it's not so fine like that Yes, it was implemented only in Kafka Streams, but the doc doesn't limit the usage of this parameter to Kafka. > good to merge the common functionalities Not sure I understood you correctly here. You mean in the tests? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20547: [SPARK-23316][SQL] AnalysisException after max it...
GitHub user bogdanrdc opened a pull request: https://github.com/apache/spark/pull/20547 [SPARK-23316][SQL] AnalysisException after max iteration reached for IN query ## What changes were proposed in this pull request? Added flag ignoreNullability to DataType.equalsStructurally. The previous semantic is for ignoreNullability=false. When ignoreNullability=true equalsStructurally ignores nullability of contained types (map key types, value types, array element types, structure field types). In.checkInputTypes calls equalsStructurally to check if the children types match. They should match regardless of nullability (which is just a hint), so it is now called with ignoreNullability=true. ## How was this patch tested? New test in SubquerySuite. You can merge this pull request into a Git repository by running: $ git pull https://github.com/bogdanrdc/spark SPARK-23316 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20547.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20547 commit 03a4281751e02acd2b97ceff6cf8e1621e83eb93 Author: Bogdan Raducanu Date: 2017-04-20T10:59:49Z fix + test commit 72cf1d117890abe45aa30c6b91a7e2c527fc4969 Author: Bogdan Raducanu Date: 2017-04-20T11:01:40Z reverted mistake commit commit 2c96a8d65059db3b808e05241b870ccd17937095 Author: Bogdan Raducanu Date: 2017-05-12T15:24:57Z erge remote-tracking branch 'upstream/master' commit fa11b0b97b38bb98b599a8edf1d43e01b067a926 Author: Bogdan Raducanu Date: 2017-05-23T11:28:34Z Merge remote-tracking branch 'upstream/master' commit 21ad3aa4468b58aa4e552e2922e1bceda61097f7 Author: Bogdan Raducanu Date: 2017-05-23T11:28:57Z Merge remote-tracking branch 'upstream/master' commit 7f78cce9d6869371e2e28ce5d9fc4766d7dbc3de Author: Bogdan Raducanu Date: 2017-06-06T11:19:35Z Merge remote-tracking branch 'upstream/master' commit eea2e5d466558aa3f2f6232024e8150dc246ba8a Author: Bogdan Raducanu Date: 2017-06-07T10:35:37Z Merge remote-tracking branch 'upstream/master' commit b30788eac9df5e76863393826230481e23e52550 Author: Bogdan Raducanu Date: 2017-06-16T10:49:48Z Merge remote-tracking branch 'upstream/master' commit 38a0347e3079a3e56d70b77f5e25994497eabe41 Author: Bogdan Raducanu Date: 2017-06-27T10:33:10Z Merge remote-tracking branch 'upstream/master' commit 1057abe6262353093ccf9b75ed24ed54fdfc0095 Author: Bogdan Raducanu Date: 2017-06-28T12:35:12Z Merge remote-tracking branch 'upstream/master' commit 3f7bf43fab830c7cc6473b654ca290b23a9886be Author: Bogdan Raducanu Date: 2017-07-07T11:08:05Z Merge remote-tracking branch 'upstream/master' commit 0a51b0f8640236da4054a25ca50bb8d19ba73b70 Author: Bogdan Raducanu Date: 2017-07-09T11:08:41Z Merge remote-tracking branch 'upstream/master' commit 3d11dca76380c7c53710141114aa768b1477b893 Author: Bogdan Raducanu Date: 2017-07-10T10:18:46Z Merge remote-tracking branch 'upstream/master' commit 015c84b6beb29c0b275b01f7774a2b7d8aa8d180 Author: Bogdan Raducanu Date: 2017-07-12T10:21:48Z Merge remote-tracking branch 'upstream/master' commit 597ddf2265149427f796eed1a43539b13e1516d9 Author: Bogdan Raducanu Date: 2017-08-03T12:45:36Z Merge remote-tracking branch 'upstream/master' commit 38549ba22681ba6622c4f1cd9c1b97592c5a34a5 Author: Bogdan Raducanu Date: 2017-08-04T09:17:09Z Merge remote-tracking branch 'upstream/master' commit 89b86c0d1e4616eae2902da79be308e0573ab5e3 Author: Bogdan Raducanu Date: 2017-09-10T19:23:22Z Merge remote-tracking branch 
'upstream/master' commit edd1fbf107501bc9a0bdbf4f712577b9fe1fd3f6 Author: Bogdan Raducanu Date: 2017-10-30T14:18:09Z Merge remote-tracking branch 'upstream/master' commit 6ead465cb00fc36869d152693a4cd1318fa005b9 Author: Bogdan Raducanu Date: 2017-12-21T15:17:45Z Merge remote-tracking branch 'upstream/master' commit d83a3adfba5d790270214887f615ab95ba50a2f9 Author: Bogdan Raducanu Date: 2018-01-10T11:53:31Z Merge remote-tracking branch 'upstream/master' commit 4b79fd683102137401ed4e77ee351439c0d254b5 Author: Bogdan Raducanu Date: 2018-01-11T04:01:12Z Merge remote-tracking branch 'upstream/master' commit ffa5debdd3b6a7e0bd70d01e640f048883b23440 Author: Bogdan Raducanu Date: 2018-01-18T12:44:12Z Merge remote-tracking branch 'upstream/master' commit aec35c98a4ea0db2cee52785e51130243cdc0b61 Author: Bogdan Raducanu Date: 2018-01-24T17:23:37Z Merge remote-tracking branch 'upstream/master' commit 87d8649777d7481beca7a04f195e04bab5b059a0 Author: Bogdan Raducanu Date: 2018-02-02T11:02:31Z Merge remote-tracking branch 'upstream/master' commit 4d6b42a3a7cbc7254f7fa159a7d04b898c8b65e6 Author: Bogdan Raducanu Date: 2018-02-08T14:57:57Z fix and test commit
[GitHub] spark issue #20544: [SPARK-23358][CORE]When the number of partitions is grea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20544 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20544: [SPARK-23358][CORE]When the number of partitions is grea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20544 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87213/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20544: [SPARK-23358][CORE]When the number of partitions is grea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20544 **[Test build #87213 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87213/testReport)** for PR 20544 at commit [`9ad0bfd`](https://github.com/apache/spark/commit/9ad0bfdc00373a9732338e17ca2dfa05b0c28cfb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][ex...
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/19431#discussion_r166952412 --- Diff: external/kafka-0-8/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -91,9 +91,16 @@ class DirectKafkaInputDStream[ private val maxRateLimitPerPartition: Long = context.sparkContext.getConf.getLong( "spark.streaming.kafka.maxRatePerPartition", 0) + private val initialRate = context.sparkContext.getConf.getLong( +"spark.streaming.backpressure.initialRate", 0) + protected[streaming] def maxMessagesPerPartition( offsets: Map[TopicAndPartition, Long]): Option[Map[TopicAndPartition, Long]] = { -val estimatedRateLimit = rateController.map(_.getLatestRate()) + +val estimatedRateLimit = rateController.map(x => { + val lr = x.getLatestRate() + if (lr > 0) lr else initialRate --- End diff -- Same applies here. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
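The control flow under discussion is small enough to sketch. A hypothetical Python rendering of the fallback and of how it interacts with the per-partition cap (names invented; the real code is the Scala diff above, and the numbers mirror the tests quoted later in this thread):

```python
def estimated_rate_limit(latest_rate, initial_rate):
    # Fall back to spark.streaming.backpressure.initialRate while the rate
    # controller has no estimate yet (getLatestRate() is non-positive).
    return latest_rate if latest_rate > 0 else initial_rate


def max_messages_per_partition(rate, max_rate_per_partition, batch_seconds):
    # The backpressure rate is still capped by
    # spark.streaming.kafka.maxRatePerPartition when that is set (> 0).
    effective = min(rate, max_rate_per_partition) if max_rate_per_partition > 0 else rate
    return int(effective * batch_seconds)


# First batch, 500 ms interval: initialRate=500 under maxRatePerPartition=1000
# allows 250 messages; initialRate=1000 capped at 300 allows only 150.
assert max_messages_per_partition(estimated_rate_limit(0, 500), 1000, 0.5) == 250
assert max_messages_per_partition(estimated_rate_limit(0, 1000), 300, 0.5) == 150
```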
[GitHub] spark pull request #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][ex...
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/19431#discussion_r166953420 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/DirectKafkaStreamSuite.scala --- @@ -551,6 +551,76 @@ class DirectKafkaStreamSuite Map(new TopicPartition(topic, 0) -> 5L, new TopicPartition(topic, 1) -> 10L)) } + test("use backpressure.initialRate with backpressure") { +val topic = "backpressureInitialRate" +val kafkaParams = getKafkaParams("auto.offset.reset" -> "earliest") +val sparkConf = new SparkConf() + // Safe, even with streaming, because we're using the direct API. + // Using 1 core is useful to make the test more predictable. + .setMaster("local[1]") + .setAppName(this.getClass.getSimpleName) + .set("spark.streaming.backpressure.enabled", "true") + .set("spark.streaming.kafka.maxRatePerPartition", "1000") + .set("spark.streaming.backpressure.initialRate", "500") + +val messages = Map("foo" -> 5000) +kafkaTestUtils.sendMessages(topic, messages) + +ssc = new StreamingContext(sparkConf, Milliseconds(500)) + +val kafkaStream = withClue("Error creating direct stream") { + new DirectKafkaInputDStream[String, String]( +ssc, +preferredHosts, +ConsumerStrategies.Subscribe[String, String](List(topic), kafkaParams.asScala), +new DefaultPerPartitionConfig(sparkConf) + ) +} +kafkaStream.start() + +val input = Map(new TopicPartition(topic, 0) -> 1000L) + + assert(kafkaStream.maxMessagesPerPartition(input).flatMap(_.headOption).contains( --- End diff -- Just caught this could be a bit simplified: > assert(kafkaStream.maxMessagesPerPartition(input).get == > Map(new TopicPartition(topic, 0) -> 250)) // we run for half a second --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][ex...
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/19431#discussion_r166953742 --- Diff: external/kafka-0-8/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala --- @@ -387,6 +387,89 @@ class DirectKafkaStreamSuite Map(TopicAndPartition(topic, 0) -> 10L, TopicAndPartition(topic, 1) -> 10L)) } + test("use backpressure.initialRate with backpressure") { +val topic = "backpressureInitialRate" +val topicPartitions = Set(TopicAndPartition(topic, 0)) +kafkaTestUtils.createTopic(topic, 1) +val kafkaParams = Map( + "metadata.broker.list" -> kafkaTestUtils.brokerAddress, + "auto.offset.reset" -> "smallest" +) + +val sparkConf = new SparkConf() + // Safe, even with streaming, because we're using the direct API. + // Using 1 core is useful to make the test more predictable. + .setMaster("local[1]") + .setAppName(this.getClass.getSimpleName) + .set("spark.streaming.backpressure.enabled", "true") + .set("spark.streaming.backpressure.initialRate", "500") + +val messages = Map("foo" -> 5000) +kafkaTestUtils.sendMessages(topic, messages) + +ssc = new StreamingContext(sparkConf, Milliseconds(500)) + +val kafkaStream = withClue("Error creating direct stream") { + val kc = new KafkaCluster(kafkaParams) + val messageHandler = (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message) + val m = kc.getEarliestLeaderOffsets(topicPartitions) +.fold(e => Map.empty[TopicAndPartition, Long], m => m.mapValues(lo => lo.offset)) + + new DirectKafkaInputDStream[String, String, StringDecoder, StringDecoder, (String, String)]( +ssc, kafkaParams, m, messageHandler) +} +kafkaStream.start() + +val input = Map(new TopicAndPartition(topic, 0) -> 1000L) + + assert(kafkaStream.maxMessagesPerPartition(input).flatMap(_.headOption).contains( + new TopicAndPartition(topic, 0) -> 250)) // we run for half a second + +kafkaStream.stop() + } + + test("backpressure.initialRate should honor maxRatePerPartition") { +val topic = "backpressureInitialRate2" +val topicPartitions = Set(TopicAndPartition(topic, 0)) +kafkaTestUtils.createTopic(topic, 1) +val kafkaParams = Map( + "metadata.broker.list" -> kafkaTestUtils.brokerAddress, + "auto.offset.reset" -> "smallest" +) + +val sparkConf = new SparkConf() + // Safe, even with streaming, because we're using the direct API. + // Using 1 core is useful to make the test more predictable. + .setMaster("local[1]") + .setAppName(this.getClass.getSimpleName) + .set("spark.streaming.backpressure.enabled", "true") + .set("spark.streaming.kafka.maxRatePerPartition", "300") + .set("spark.streaming.backpressure.initialRate", "1000") + +val messages = Map("foo" -> 5000) +kafkaTestUtils.sendMessages(topic, messages) + +ssc = new StreamingContext(sparkConf, Milliseconds(500)) + +val kafkaStream = withClue("Error creating direct stream") { + val kc = new KafkaCluster(kafkaParams) + val messageHandler = (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message) + val m = kc.getEarliestLeaderOffsets(topicPartitions) +.fold(e => Map.empty[TopicAndPartition, Long], m => m.mapValues(lo => lo.offset)) + + new DirectKafkaInputDStream[String, String, StringDecoder, StringDecoder, (String, String)]( +ssc, kafkaParams, m, messageHandler) +} +kafkaStream.start() + +val input = Map(new TopicAndPartition(topic, 0) -> 1000L) + + assert(kafkaStream.maxMessagesPerPartition(input).flatMap(_.headOption).contains( --- End diff -- Same simplification here. 
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][ex...
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/19431#discussion_r166953894 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/DirectKafkaStreamSuite.scala --- @@ -551,6 +551,76 @@ class DirectKafkaStreamSuite Map(new TopicPartition(topic, 0) -> 5L, new TopicPartition(topic, 1) -> 10L)) } + test("use backpressure.initialRate with backpressure") { +val topic = "backpressureInitialRate" +val kafkaParams = getKafkaParams("auto.offset.reset" -> "earliest") +val sparkConf = new SparkConf() + // Safe, even with streaming, because we're using the direct API. + // Using 1 core is useful to make the test more predictable. + .setMaster("local[1]") + .setAppName(this.getClass.getSimpleName) + .set("spark.streaming.backpressure.enabled", "true") + .set("spark.streaming.kafka.maxRatePerPartition", "1000") + .set("spark.streaming.backpressure.initialRate", "500") + +val messages = Map("foo" -> 5000) +kafkaTestUtils.sendMessages(topic, messages) + +ssc = new StreamingContext(sparkConf, Milliseconds(500)) + +val kafkaStream = withClue("Error creating direct stream") { + new DirectKafkaInputDStream[String, String]( +ssc, +preferredHosts, +ConsumerStrategies.Subscribe[String, String](List(topic), kafkaParams.asScala), +new DefaultPerPartitionConfig(sparkConf) + ) +} +kafkaStream.start() + +val input = Map(new TopicPartition(topic, 0) -> 1000L) + + assert(kafkaStream.maxMessagesPerPartition(input).flatMap(_.headOption).contains( + new TopicPartition(topic, 0) -> 250)) // we run for half a second + +kafkaStream.stop() + } + + test("backpressure.initialRate should honor maxRatePerPartition") { +val topic = "backpressureInitialRate" +val kafkaParams = getKafkaParams("auto.offset.reset" -> "earliest") +val sparkConf = new SparkConf() + // Safe, even with streaming, because we're using the direct API. + // Using 1 core is useful to make the test more predictable. + .setMaster("local[1]") + .setAppName(this.getClass.getSimpleName) + .set("spark.streaming.backpressure.enabled", "true") + .set("spark.streaming.kafka.maxRatePerPartition", "300") + .set("spark.streaming.backpressure.initialRate", "1000") + +val messages = Map("foo" -> 5000) +kafkaTestUtils.sendMessages(topic, messages) + +ssc = new StreamingContext(sparkConf, Milliseconds(500)) + +val kafkaStream = withClue("Error creating direct stream") { + new DirectKafkaInputDStream[String, String]( +ssc, +preferredHosts, +ConsumerStrategies.Subscribe[String, String](List(topic), kafkaParams.asScala), +new DefaultPerPartitionConfig(sparkConf) + ) +} +kafkaStream.start() + +val input = Map(new TopicPartition(topic, 0) -> 1000L) + + assert(kafkaStream.maxMessagesPerPartition(input).flatMap(_.headOption).contains( --- End diff -- Same simplification here. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][ex...
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/19431#discussion_r166953800 --- Diff: external/kafka-0-8/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala --- @@ -387,6 +387,89 @@ class DirectKafkaStreamSuite Map(TopicAndPartition(topic, 0) -> 10L, TopicAndPartition(topic, 1) -> 10L)) } + test("use backpressure.initialRate with backpressure") { +val topic = "backpressureInitialRate" +val topicPartitions = Set(TopicAndPartition(topic, 0)) +kafkaTestUtils.createTopic(topic, 1) +val kafkaParams = Map( + "metadata.broker.list" -> kafkaTestUtils.brokerAddress, + "auto.offset.reset" -> "smallest" +) + +val sparkConf = new SparkConf() + // Safe, even with streaming, because we're using the direct API. + // Using 1 core is useful to make the test more predictable. + .setMaster("local[1]") + .setAppName(this.getClass.getSimpleName) + .set("spark.streaming.backpressure.enabled", "true") + .set("spark.streaming.backpressure.initialRate", "500") + +val messages = Map("foo" -> 5000) +kafkaTestUtils.sendMessages(topic, messages) + +ssc = new StreamingContext(sparkConf, Milliseconds(500)) + +val kafkaStream = withClue("Error creating direct stream") { + val kc = new KafkaCluster(kafkaParams) + val messageHandler = (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message) + val m = kc.getEarliestLeaderOffsets(topicPartitions) +.fold(e => Map.empty[TopicAndPartition, Long], m => m.mapValues(lo => lo.offset)) + + new DirectKafkaInputDStream[String, String, StringDecoder, StringDecoder, (String, String)]( +ssc, kafkaParams, m, messageHandler) +} +kafkaStream.start() + +val input = Map(new TopicAndPartition(topic, 0) -> 1000L) + + assert(kafkaStream.maxMessagesPerPartition(input).flatMap(_.headOption).contains( --- End diff -- Same simplification here. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][ex...
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/19431#discussion_r166952263 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala --- @@ -126,7 +129,10 @@ private[spark] class DirectKafkaInputDStream[K, V]( protected[streaming] def maxMessagesPerPartition( offsets: Map[TopicPartition, Long]): Option[Map[TopicPartition, Long]] = { -val estimatedRateLimit = rateController.map(_.getLatestRate()) +val estimatedRateLimit = rateController.map(x => { + val lr = x.getLatestRate() + if (lr > 0) lr else initialRate --- End diff -- Thanks for the info. My concern was the `LatestRate = 0` case, where the limit can be lost. In the meantime I have taken a look at the `PIDRateEstimator`, which could not produce a 0 rate because of this: ` val newRate = (latestRate - proportional * error - integral * historicalError - derivative * dError).max(minRate) ` and minRate is limited: ` require( minRate > 0, s"Minimum rate in PIDRateEstimator should be > 0") ` I'm fine with this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
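The two quoted snippets combine into one short invariant. A hypothetical Python sketch (the real code is `PIDRateEstimator` in Spark Streaming):

```python
def compute_new_rate(latest_rate, error, historical_error, d_error,
                     proportional, integral, derivative, min_rate):
    # Mirrors the quoted Scala: the .max(minRate) floor, together with the
    # constructor requirement minRate > 0, means the estimator can never
    # publish a rate of 0 -- so the `if (lr > 0)` fallback above only fires
    # before the first estimate exists.
    assert min_rate > 0, "Minimum rate in PIDRateEstimator should be > 0"
    new_rate = (latest_rate
                - proportional * error
                - integral * historical_error
                - derivative * d_error)
    return max(new_rate, min_rate)
```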
[GitHub] spark issue #20520: [SPARK-23344][PYTHON][ML] Add distanceMeasure param to K...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20520 cc @BryanCutler @holdenk @jkbradley @srowen --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20546: [SPARK-20659][Core] Remove StorageStatus, or make it pri...
Github user attilapiros commented on the issue: https://github.com/apache/spark/pull/20546 If this change goes into the 2.3 branch, then MimaExcludes.scala should be changed accordingly. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20503: [SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for R...
Github user ashashwat commented on the issue: https://github.com/apache/spark/pull/20503 @HyukjinKwon Should I add more tests covering Unicode? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20546: [SPARK-20659][Core] Remove StorageStatus, or make it pri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20546 **[Test build #87216 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87216/testReport)** for PR 20546 at commit [`3f5dba6`](https://github.com/apache/spark/commit/3f5dba62cfa8b07d602639e9f88f3fd8cec92831). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20546: [SPARK-20659][Core] Remove StorageStatus, or make it pri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20546 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20546: [SPARK-20659][Core] Remove StorageStatus, or make...
GitHub user attilapiros opened a pull request: https://github.com/apache/spark/pull/20546 [SPARK-20659][Core] Remove StorageStatus, or make it private. ## What changes were proposed in this pull request? In this PR, StorageStatus is made private and simplified a bit; moreover, the SparkContext.getExecutorStorageStatus method is removed. The reason for keeping StorageStatus is that it is used by SparkContext.getRDDStorageInfo. Instead of the SparkContext.getExecutorStorageStatus method, executor infos are extended with additional memory metrics such as usedOnHeapStorageMemory, usedOffHeapStorageMemory, totalOnHeapStorageMemory, totalOffHeapStorageMemory. ## How was this patch tested? By running existing unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/attilapiros/spark SPARK-20659 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20546.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20546 commit 3f5dba62cfa8b07d602639e9f88f3fd8cec92831 Author: “attilapiros” Date: 2018-02-07T13:41:31Z initial version --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20531: [SPARK-23352][PYTHON] Explicitly specify supported types...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20531 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87214/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20531: [SPARK-23352][PYTHON] Explicitly specify supported types...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20531 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20531: [SPARK-23352][PYTHON] Explicitly specify supported types...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20531 **[Test build #87214 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87214/testReport)** for PR 20531 at commit [`36617e4`](https://github.com/apache/spark/commit/36617e4bd864e0fbca5c617d009de45a8231a5d6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19077 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19077 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87212/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19077 **[Test build #87212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87212/testReport)** for PR 19077 at commit [`78ede3f`](https://github.com/apache/spark/commit/78ede3fae243c5379eaea1c86584e200ce697c19). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org