[GitHub] [spark] SparkQA commented on issue #23531: [SPARK-24497][SQL] Support recursive SQL query

2019-08-30 Thread GitBox
SparkQA commented on issue #23531: [SPARK-24497][SQL] Support recursive SQL 
query
URL: https://github.com/apache/spark/pull/23531#issuecomment-526573618
 
 
   **[Test build #109944 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109944/testReport)** for PR 23531 at commit [`f35a784`](https://github.com/apache/spark/commit/f35a78495732d03baf671cb9465ed4b00c2d05a3).





[GitHub] [spark] AmplabJenkins commented on issue #23531: [SPARK-24497][SQL] Support recursive SQL query

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #23531: [SPARK-24497][SQL] Support recursive 
SQL query
URL: https://github.com/apache/spark/pull/23531#issuecomment-526575217
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #23531: [SPARK-24497][SQL] Support recursive SQL query

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #23531: [SPARK-24497][SQL] Support recursive 
SQL query
URL: https://github.com/apache/spark/pull/23531#issuecomment-526575225
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14970/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #23531: [SPARK-24497][SQL] Support recursive SQL query

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #23531: [SPARK-24497][SQL] Support 
recursive SQL query
URL: https://github.com/apache/spark/pull/23531#issuecomment-526575217
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #23531: [SPARK-24497][SQL] Support recursive SQL query

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #23531: [SPARK-24497][SQL] Support 
recursive SQL query
URL: https://github.com/apache/spark/pull/23531#issuecomment-526575225
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14970/
   Test PASSed.





[GitHub] [spark] cloud-fan commented on a change in pull request #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-30 Thread GitBox
cloud-fan commented on a change in pull request #25299: [SPARK-27651][Core] 
Avoid the network when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#discussion_r319474398
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
 ##
 @@ -206,6 +206,8 @@ private[spark] class BlockManager(
 new BlockManager.RemoteBlockDownloadFileManager(this)
   private val maxRemoteBlockToMem = 
conf.get(config.MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM)
 
+  private val executorIdToLocalDirsCache = new mutable.HashMap[String, 
Array[String]]()
 
 Review comment:
   when do we update it? e.g. what if an executor is down.





[GitHub] [spark] cloud-fan commented on a change in pull request #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-30 Thread GitBox
cloud-fan commented on a change in pull request #25299: [SPARK-27651][Core] 
Avoid the network when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#discussion_r319469667
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/SparkContext.scala
 ##
 @@ -2851,6 +2851,9 @@ object SparkContext extends Logging {
   memoryPerSlaveInt, sc.executorMemory))
 }
 
+// for local cluster mode the SHUFFLE_HOST_LOCAL_DISK_READING_ENABLED 
defaults to false
+sc.conf.setIfMissing(config.SHUFFLE_HOST_LOCAL_DISK_READING_ENABLED, 
false)
 
 Review comment:
   why is this necessary?





[GitHub] [spark] cloud-fan commented on a change in pull request #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-30 Thread GitBox
cloud-fan commented on a change in pull request #25299: [SPARK-27651][Core] 
Avoid the network when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#discussion_r319473468
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/network/BlockDataManager.scala
 ##
 @@ -22,16 +22,22 @@ import scala.reflect.ClassTag
 import org.apache.spark.TaskContext
 import org.apache.spark.network.buffer.ManagedBuffer
 import org.apache.spark.network.client.StreamCallbackWithID
-import org.apache.spark.storage.{BlockId, StorageLevel}
+import org.apache.spark.storage.{BlockId, ShuffleBlockId, StorageLevel}
 
 private[spark]
 trait BlockDataManager {
 
+  /**
+   * Interface to get host-local shuffle block data. Throws an exception if 
the block cannot be
 
 Review comment:
   The block manager keeps RDD blocks as well; shall we support them? I think it's better to have a `def getHostLocalBlockData(blockId: BlockId, dirs: Array[String])`, to be consistent with `getLocalBlockData`. We can add an assert in `getHostLocalBlockData` to make sure the `blockId` is a `ShuffleBlockId`, if we don't want to support RDD blocks now.
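   A rough sketch of the shape I mean, for illustration only (the trait name is hypothetical and the body is intentionally left unimplemented; only the assert reflects the suggestion):
   
   ```scala
   import org.apache.spark.network.buffer.ManagedBuffer
   import org.apache.spark.storage.{BlockId, ShuffleBlockId}
   
   // Hypothetical sketch, not the PR's code: a host-local variant shaped like
   // getLocalBlockData, restricted to shuffle blocks via an assert for now.
   trait HostLocalBlockDataSketch {
     def getLocalBlockData(blockId: BlockId): ManagedBuffer
   
     def getHostLocalBlockData(blockId: BlockId, dirs: Array[String]): ManagedBuffer = {
       assert(blockId.isInstanceOf[ShuffleBlockId],
         s"Host-local reads currently support only shuffle blocks, got $blockId")
       // Resolving the block from the given executor-local dirs is elided here.
       ???
     }
   }
   ```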





[GitHub] [spark] SparkQA commented on issue #23531: [SPARK-24497][SQL] Support recursive SQL query

2019-08-30 Thread GitBox
SparkQA commented on issue #23531: [SPARK-24497][SQL] Support recursive SQL 
query
URL: https://github.com/apache/spark/pull/23531#issuecomment-526575750
 
 
   **[Test build #109945 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109945/testReport)** for PR 23531 at commit [`1def9fa`](https://github.com/apache/spark/commit/1def9fa7078948d50fa9ff4a80fe0321ce948ada).





[GitHub] [spark] cloud-fan commented on a change in pull request #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-30 Thread GitBox
cloud-fan commented on a change in pull request #25299: [SPARK-27651][Core] 
Avoid the network when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#discussion_r319477754
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
 ##
 @@ -51,6 +53,13 @@ class BlockManagerMasterEndpoint(
   // Mapping from block manager id to the block manager's information.
   private val blockManagerInfo = new mutable.HashMap[BlockManagerId, 
BlockManagerInfo]
 
+  // Mapping from executor id to the block manager's local disk directories.
+  private val executorIdToLocalDirs =
 
 Review comment:
   shall we update it in `removeBlockManager`?
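   A rough sketch of what I mean (illustrative only, not the PR's code; the plain `HashMap`, the object name, and the `executorId`-keyed signature are assumptions):
   
   ```scala
   import scala.collection.mutable
   
   // Hypothetical sketch: keep the executor-id -> local-dirs mapping in sync with
   // block manager removal, so lookups never return directories of a dead executor.
   object ExecutorLocalDirsRegistrySketch {
     private val executorIdToLocalDirs = new mutable.HashMap[String, Array[String]]()
   
     def registerLocalDirs(executorId: String, localDirs: Array[String]): Unit =
       synchronized { executorIdToLocalDirs.update(executorId, localDirs) }
   
     // The point of the question above: removing a block manager should also drop
     // its cached local dirs.
     def removeBlockManager(executorId: String): Unit =
       synchronized { executorIdToLocalDirs.remove(executorId) }
   
     def getLocalDirs(executorId: String): Option[Array[String]] =
       synchronized { executorIdToLocalDirs.get(executorId) }
   }
   ```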





[GitHub] [spark] gaborgsomogyi opened a new pull request #25631: [SPARK-28928][SS] Take over Kafka delegation token protocol on sources/sinks

2019-08-30 Thread GitBox
gaborgsomogyi opened a new pull request #25631: [SPARK-28928][SS] Take over 
Kafka delegation token protocol on sources/sinks
URL: https://github.com/apache/spark/pull/25631
 
 
   ### What changes were proposed in this pull request?
   At the moment there are 3 places where the communication protocol for a Kafka cluster has to be set when a delegation token is used:
   * On the delegation token
   * On the source
   * On the sink
   
   Most of the time users use the same protocol in all of these places (within one Kafka cluster). It would be better to declare it in one place (the delegation token side) and let Kafka sources/sinks take this config over.
   
   In this PR I've modified the code so that Kafka sources/sinks take over the delegation token side `security.protocol` configuration when the token and the source/sink match on the `bootstrap.servers` configuration. This default can still be overridden on each source/sink independently via the `kafka.security.protocol` configuration.
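   For illustration, here is roughly how this looks from the user's side (spark-shell style sketch; the cluster name, broker address, topic, and `SASL_SSL` value are placeholders):
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   // The protocol is declared once, on the delegation token side.
   val spark = SparkSession.builder()
     .appName("kafka-dt-protocol-takeover")
     .config("spark.kafka.clusters.cluster1.auth.bootstrap.servers", "broker1:9093")
     .config("spark.kafka.clusters.cluster1.security.protocol", "SASL_SSL")
     .getOrCreate()
   
   // A source whose kafka.bootstrap.servers matches that cluster takes over
   // SASL_SSL automatically, so no kafka.security.protocol option is needed here.
   val df = spark.readStream
     .format("kafka")
     .option("kafka.bootstrap.servers", "broker1:9093")
     .option("subscribe", "some-topic")
     .load()
   
   // It can still be overridden per source/sink if required:
   //   .option("kafka.security.protocol", "SASL_PLAINTEXT")
   ```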
   
   ### Why are the changes needed?
   The current default behavior covers only a minority of the use-cases and is inconvenient.
   
   ### Does this PR introduce any user-facing change?
   Yes, with this change users need to provide fewer configuration parameters by default.
   
   ### How was this patch tested?
   Existing + additional unit tests.
   





[GitHub] [spark] HyukjinKwon commented on a change in pull request #25628: [SPARK-28897][Core]'coalesce' error when executing dataframe.na.fill

2019-08-30 Thread GitBox
HyukjinKwon commented on a change in pull request #25628: 
[SPARK-28897][Core]'coalesce' error when executing dataframe.na.fill
URL: https://github.com/apache/spark/pull/25628#discussion_r319482514
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala
 ##
 @@ -435,11 +435,10 @@ final class DataFrameNaFunctions private[sql](df: 
DataFrame) {
* Returns a [[Column]] expression that replaces null value in `col` with 
`replacement`.
*/
   private def fillCol[T](col: StructField, replacement: T): Column = {
-val quotedColName = "`" + col.name + "`"
 
 Review comment:
   Does `.` work too?
   
   ```scala
   scala> val df = spark.range(1).selectExpr("1 as `a.b`")
   df: org.apache.spark.sql.DataFrame = [a.b: int]
   
   scala> df.col("a.b")
   org.apache.spark.sql.AnalysisException: Cannot resolve column name "a.b" among (a.b);
 at org.apache.spark.sql.Dataset.$anonfun$resolve$1(Dataset.scala:259)
 at scala.Option.getOrElse(Option.scala:138)
 at org.apache.spark.sql.Dataset.resolve(Dataset.scala:259)
 at org.apache.spark.sql.Dataset.col(Dataset.scala:1340)
 ... 47 elided
   ```
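   For comparison, backtick-quoting the name does resolve it (same `df` as above; output as I'd expect from the usual backtick handling):
   
   ```scala
   scala> df.select(df.col("`a.b`")).show()
   +---+
   |a.b|
   +---+
   |  1|
   +---+
   ```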





[GitHub] [spark] AmplabJenkins commented on issue #25631: [SPARK-28928][SS] Take over Kafka delegation token protocol on sources/sinks

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25631: [SPARK-28928][SS] Take over Kafka 
delegation token protocol on sources/sinks
URL: https://github.com/apache/spark/pull/25631#issuecomment-526577073
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on issue #25631: [SPARK-28928][SS] Take over Kafka delegation token protocol on sources/sinks

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25631: [SPARK-28928][SS] Take over Kafka 
delegation token protocol on sources/sinks
URL: https://github.com/apache/spark/pull/25631#issuecomment-526577253
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins removed a comment on issue #25631: [SPARK-28928][SS] Take over Kafka delegation token protocol on sources/sinks

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #25631: [SPARK-28928][SS] Take over 
Kafka delegation token protocol on sources/sinks
URL: https://github.com/apache/spark/pull/25631#issuecomment-526577073
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] SparkQA commented on issue #25631: [SPARK-28928][SS] Take over Kafka delegation token protocol on sources/sinks

2019-08-30 Thread GitBox
SparkQA commented on issue #25631: [SPARK-28928][SS] Take over Kafka delegation 
token protocol on sources/sinks
URL: https://github.com/apache/spark/pull/25631#issuecomment-526578034
 
 
   **[Test build #109946 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109946/testReport)** for PR 25631 at commit [`a79d77f`](https://github.com/apache/spark/commit/a79d77fbf793a37752912a8d84d9caf5d906b187).





[GitHub] [spark] SparkQA commented on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0

2019-08-30 Thread GitBox
SparkQA commented on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the detla 
in scala-maven-plugin 3.4.6 <> 4.0.0
URL: https://github.com/apache/spark/pull/25497#issuecomment-526578038
 
 
   **[Test build #109947 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109947/testReport)** for PR 25497 at commit [`1068514`](https://github.com/apache/spark/commit/1068514162cc9f27e57d0342d3a953967aaf76e2).





[GitHub] [spark] AmplabJenkins removed a comment on issue #25631: [SPARK-28928][SS] Take over Kafka delegation token protocol on sources/sinks

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #25631: [SPARK-28928][SS] Take over 
Kafka delegation token protocol on sources/sinks
URL: https://github.com/apache/spark/pull/25631#issuecomment-526577253
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] shivusondur opened a new pull request #25632: [SPARK-28809][DOC][SQL]Document SHOW TABLE in SQL Reference

2019-08-30 Thread GitBox
shivusondur opened a new pull request #25632: [SPARK-28809][DOC][SQL]Document 
SHOW TABLE in SQL Reference
URL: https://github.com/apache/spark/pull/25632
 
 
   
   
   ### What changes were proposed in this pull request?
   Added documentation for the SHOW TABLE EXTENDED SQL command.
   
   
   
   ### Why are the changes needed?
   For user reference.
   
   
   
   ### Does this PR introduce any user-facing change?
   Yes, it adds documentation for the SHOW TABLE EXTENDED SQL command.
   
   
   
   ### How was this patch tested?
   Verified manually; screenshots of the generated pages are attached below.
   
   
![image](https://user-images.githubusercontent.com/7912929/64019686-79545100-cb4d-11e9-9954-f6b5b8f10780.png)
   
![image](https://user-images.githubusercontent.com/7912929/64019738-95f08900-cb4d-11e9-9769-ee2be926fdc1.png)
   
![image](https://user-images.githubusercontent.com/7912929/64019775-ab65b300-cb4d-11e9-9e7e-140616af7790.png)
   
![image](https://user-images.githubusercontent.com/7912929/64019809-c1737380-cb4d-11e9-91d6-ec2950ae65db.png)
   
   
   
   





[GitHub] [spark] shivusondur commented on issue #25632: [SPARK-28809][DOC][SQL]Document SHOW TABLE in SQL Reference

2019-08-30 Thread GitBox
shivusondur commented on issue #25632: [SPARK-28809][DOC][SQL]Document SHOW 
TABLE in SQL Reference
URL: https://github.com/apache/spark/pull/25632#issuecomment-526580336
 
 
   @dilipbiswal @gatorsmile 
   plz review





[GitHub] [spark] AmplabJenkins commented on issue #25632: [SPARK-28809][DOC][SQL]Document SHOW TABLE in SQL Reference

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25632: [SPARK-28809][DOC][SQL]Document SHOW 
TABLE in SQL Reference
URL: https://github.com/apache/spark/pull/25632#issuecomment-526580405
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] SparkQA commented on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog

2019-08-30 Thread GitBox
SparkQA commented on issue #25104: [SPARK-28341][SQL] create a public API for 
V2SessionCatalog
URL: https://github.com/apache/spark/pull/25104#issuecomment-526580515
 
 
   **[Test build #109948 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109948/testReport)** for PR 25104 at commit [`8cd5cde`](https://github.com/apache/spark/commit/8cd5cde53c12ba363e2ec556ce03ba4544d76cf2).





[GitHub] [spark] AmplabJenkins commented on issue #25632: [SPARK-28809][DOC][SQL]Document SHOW TABLE in SQL Reference

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25632: [SPARK-28809][DOC][SQL]Document SHOW 
TABLE in SQL Reference
URL: https://github.com/apache/spark/pull/25632#issuecomment-526582100
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins removed a comment on issue #25632: [SPARK-28809][DOC][SQL]Document SHOW TABLE in SQL Reference

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #25632: 
[SPARK-28809][DOC][SQL]Document SHOW TABLE in SQL Reference
URL: https://github.com/apache/spark/pull/25632#issuecomment-526580405
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on issue #25632: [SPARK-28809][DOC][SQL]Document SHOW TABLE in SQL Reference

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25632: [SPARK-28809][DOC][SQL]Document SHOW 
TABLE in SQL Reference
URL: https://github.com/apache/spark/pull/25632#issuecomment-526582292
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25104: [SPARK-28341][SQL] create a public API 
for V2SessionCatalog
URL: https://github.com/apache/spark/pull/25104#issuecomment-526582495
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14972/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #25497: [BUILD][DO-NOT-MERGE] 
Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0
URL: https://github.com/apache/spark/pull/25497#issuecomment-526582463
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the 
detla in scala-maven-plugin 3.4.6 <> 4.0.0
URL: https://github.com/apache/spark/pull/25497#issuecomment-526582463
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the 
detla in scala-maven-plugin 3.4.6 <> 4.0.0
URL: https://github.com/apache/spark/pull/25497#issuecomment-526582474
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14971/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #25497: [BUILD][DO-NOT-MERGE] 
Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0
URL: https://github.com/apache/spark/pull/25497#issuecomment-526582474
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14971/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #25104: [SPARK-28341][SQL] create a 
public API for V2SessionCatalog
URL: https://github.com/apache/spark/pull/25104#issuecomment-526582495
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14972/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #25104: [SPARK-28341][SQL] create a 
public API for V2SessionCatalog
URL: https://github.com/apache/spark/pull/25104#issuecomment-526582490
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25632: [SPARK-28809][DOC][SQL]Document SHOW TABLE in SQL Reference

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #25632: 
[SPARK-28809][DOC][SQL]Document SHOW TABLE in SQL Reference
URL: https://github.com/apache/spark/pull/25632#issuecomment-526582100
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25104: [SPARK-28341][SQL] create a public API 
for V2SessionCatalog
URL: https://github.com/apache/spark/pull/25104#issuecomment-526582490
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA commented on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0

2019-08-30 Thread GitBox
SparkQA commented on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the detla 
in scala-maven-plugin 3.4.6 <> 4.0.0
URL: https://github.com/apache/spark/pull/25497#issuecomment-526583180
 
 
   **[Test build #109949 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109949/testReport)** for PR 25497 at commit [`806d443`](https://github.com/apache/spark/commit/806d443ab71dc34ed555b9d3fa7f894fe660eacc).





[GitHub] [spark] gaborgsomogyi commented on a change in pull request #25477: [SPARK-28760][SS][TESTS] Add Kafka delegation token end-to-end test with mini KDC

2019-08-30 Thread GitBox
gaborgsomogyi commented on a change in pull request #25477: 
[SPARK-28760][SS][TESTS] Add Kafka delegation token end-to-end test with mini 
KDC
URL: https://github.com/apache/spark/pull/25477#discussion_r319491077
 
 

 ##
 File path: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDelegationTokenSuite.scala
 ##
 @@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.util.UUID
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.security.{Credentials, UserGroupInformation}
+import org.apache.kafka.common.security.auth.SecurityProtocol.SASL_PLAINTEXT
+
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.security.HadoopDelegationTokenManager
+import org.apache.spark.internal.config.{KEYTAB, PRINCIPAL}
+import org.apache.spark.sql.execution.streaming.MemoryStream
+import org.apache.spark.sql.streaming.{OutputMode, StreamTest}
+import org.apache.spark.sql.test.SharedSQLContext
+
+class KafkaDelegationTokenSuite extends StreamTest with SharedSQLContext with 
KafkaTest {
+
+  import testImplicits._
+
+  protected var testUtils: KafkaTestUtils = _
+
+  protected override def sparkConf = super.sparkConf
+.set("spark.security.credentials.hadoopfs.enabled", "false")
+.set("spark.security.credentials.hbase.enabled", "false")
+.set(KEYTAB, testUtils.clientKeytab)
+.set(PRINCIPAL, testUtils.clientPrincipal)
+.set("spark.kafka.clusters.cluster1.auth.bootstrap.servers", 
testUtils.brokerAddress)
+.set("spark.kafka.clusters.cluster1.security.protocol", 
SASL_PLAINTEXT.name)
+
+  override def beforeAll(): Unit = {
+testUtils = new KafkaTestUtils(Map.empty, true)
+testUtils.setup()
+super.beforeAll()
+  }
+
+  override def afterAll(): Unit = {
+try {
+  if (testUtils != null) {
+testUtils.teardown()
+testUtils = null
+  }
+  UserGroupInformation.reset()
+} finally {
+  super.afterAll()
+}
+  }
+
+  test("Roundtrip") {
+val hadoopConf = new Configuration()
+val manager = new HadoopDelegationTokenManager(spark.sparkContext.conf, 
hadoopConf, null)
+val credentials = new Credentials()
+manager.obtainDelegationTokens(credentials)
+val serializedCredentials = SparkHadoopUtil.get.serialize(credentials)
+SparkHadoopUtil.get.addDelegationTokens(serializedCredentials, 
spark.sparkContext.conf)
+
+val topic = "topic-" + UUID.randomUUID().toString
+testUtils.createTopic(topic, partitions = 5)
+
+withTempDir { checkpointDir =>
+  val input = MemoryStream[String]
+
+  val df = input.toDF()
+  val writer = df.writeStream
+.outputMode(OutputMode.Append)
+.format("kafka")
+.option("checkpointLocation", checkpointDir.getCanonicalPath)
+.option("kafka.bootstrap.servers", testUtils.brokerAddress)
+.option("kafka.security.protocol", SASL_PLAINTEXT.name)
 
 Review comment:
   For tracking purposes I've filed SPARK-28928.





[GitHub] [spark] gaborgsomogyi commented on a change in pull request #25477: [SPARK-28760][SS][TESTS] Add Kafka delegation token end-to-end test with mini KDC

2019-08-30 Thread GitBox
gaborgsomogyi commented on a change in pull request #25477: 
[SPARK-28760][SS][TESTS] Add Kafka delegation token end-to-end test with mini 
KDC
URL: https://github.com/apache/spark/pull/25477#discussion_r319491077
 
 

 ##
 File path: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDelegationTokenSuite.scala
 ##
 @@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.util.UUID
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.security.{Credentials, UserGroupInformation}
+import org.apache.kafka.common.security.auth.SecurityProtocol.SASL_PLAINTEXT
+
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.security.HadoopDelegationTokenManager
+import org.apache.spark.internal.config.{KEYTAB, PRINCIPAL}
+import org.apache.spark.sql.execution.streaming.MemoryStream
+import org.apache.spark.sql.streaming.{OutputMode, StreamTest}
+import org.apache.spark.sql.test.SharedSQLContext
+
+class KafkaDelegationTokenSuite extends StreamTest with SharedSQLContext with 
KafkaTest {
+
+  import testImplicits._
+
+  protected var testUtils: KafkaTestUtils = _
+
+  protected override def sparkConf = super.sparkConf
+.set("spark.security.credentials.hadoopfs.enabled", "false")
+.set("spark.security.credentials.hbase.enabled", "false")
+.set(KEYTAB, testUtils.clientKeytab)
+.set(PRINCIPAL, testUtils.clientPrincipal)
+.set("spark.kafka.clusters.cluster1.auth.bootstrap.servers", 
testUtils.brokerAddress)
+.set("spark.kafka.clusters.cluster1.security.protocol", 
SASL_PLAINTEXT.name)
+
+  override def beforeAll(): Unit = {
+testUtils = new KafkaTestUtils(Map.empty, true)
+testUtils.setup()
+super.beforeAll()
+  }
+
+  override def afterAll(): Unit = {
+try {
+  if (testUtils != null) {
+testUtils.teardown()
+testUtils = null
+  }
+  UserGroupInformation.reset()
+} finally {
+  super.afterAll()
+}
+  }
+
+  test("Roundtrip") {
+val hadoopConf = new Configuration()
+val manager = new HadoopDelegationTokenManager(spark.sparkContext.conf, 
hadoopConf, null)
+val credentials = new Credentials()
+manager.obtainDelegationTokens(credentials)
+val serializedCredentials = SparkHadoopUtil.get.serialize(credentials)
+SparkHadoopUtil.get.addDelegationTokens(serializedCredentials, 
spark.sparkContext.conf)
+
+val topic = "topic-" + UUID.randomUUID().toString
+testUtils.createTopic(topic, partitions = 5)
+
+withTempDir { checkpointDir =>
+  val input = MemoryStream[String]
+
+  val df = input.toDF()
+  val writer = df.writeStream
+.outputMode(OutputMode.Append)
+.format("kafka")
+.option("checkpointLocation", checkpointDir.getCanonicalPath)
+.option("kafka.bootstrap.servers", testUtils.brokerAddress)
+.option("kafka.security.protocol", SASL_PLAINTEXT.name)
 
 Review comment:
   For tracking purposes I've filed 
[SPARK-28928](https://issues.apache.org/jira/browse/SPARK-28928).





[GitHub] [spark] cxzl25 commented on a change in pull request #23516: [SPARK-26598] Fix HiveThriftServer2 set hiveconf and hivevar in every sql

2019-08-30 Thread GitBox
cxzl25 commented on a change in pull request #23516: [SPARK-26598] Fix 
HiveThriftServer2 set hiveconf and hivevar in every sql
URL: https://github.com/apache/spark/pull/23516#discussion_r319491134
 
 

 ##
 File path: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/server/SparkSQLOperationManager.scala
 ##
 @@ -51,9 +50,6 @@ private[thriftserver] class SparkSQLOperationManager()
 require(sqlContext != null, s"Session handle: 
${parentSession.getSessionHandle} has not been" +
   s" initialized or had already closed.")
 val conf = sqlContext.sessionState.conf
-val hiveSessionState = parentSession.getSessionState
-setConfMap(conf, hiveSessionState.getOverriddenConfigurations)
-setConfMap(conf, hiveSessionState.getHiveVariables)
 
 Review comment:
   ```
   cat <<EOF > test.sql
   select '\${a}', '\${b}';
   set b=MOD_VALUE;
   set b;
   EOF
   
   beeline -u jdbc:hive2://localhost:1 --hiveconf a=avalue --hivevar b=bvalue -f test.sql
   ```
   Result:
   ```
   +------+---------+--+
   | key  |  value  |
   +------+---------+--+
   | b    | bvalue  |
   +------+---------+--+
   1 row selected (0.022 seconds)
   ```
   
   Setting the hivevar/hiveconf variables in every operation is wrong, because it prevents later updates to those variables.
   
   They are only intended as initial values, so setting them once in SparkSQLSessionManager#openSession is enough.
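   A simplified sketch of the "set it once in openSession" idea (plain maps stand in for `SQLConf` and Hive's `SessionState`, and the object/method names below are illustrative):
   
   ```scala
   import java.{util => ju}
   import scala.collection.JavaConverters._
   import scala.collection.mutable
   
   // Hypothetical sketch: apply the Hive session's overridden configs and hivevars
   // exactly once when the session is opened, rather than before every operation,
   // so that later `set x=y` statements are not overwritten by the initial values.
   object SessionConfInitSketch {
     private def setConfMap(
         conf: mutable.Map[String, String],
         settings: ju.Map[String, String]): Unit = {
       settings.asScala.foreach { case (k, v) => conf.put(k, v) }
     }
   
     // Stand-in for SparkSQLSessionManager#openSession: initialize once per session.
     def openSession(
         sessionConf: mutable.Map[String, String],
         overriddenConfigs: ju.Map[String, String],
         hiveVariables: ju.Map[String, String]): Unit = {
       setConfMap(sessionConf, overriddenConfigs)
       setConfMap(sessionConf, hiveVariables)
     }
   }
   ```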





[GitHub] [spark] AmplabJenkins commented on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the 
detla in scala-maven-plugin 3.4.6 <> 4.0.0
URL: https://github.com/apache/spark/pull/25497#issuecomment-526584963
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14973/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the 
detla in scala-maven-plugin 3.4.6 <> 4.0.0
URL: https://github.com/apache/spark/pull/25497#issuecomment-526584956
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #25497: [BUILD][DO-NOT-MERGE] 
Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0
URL: https://github.com/apache/spark/pull/25497#issuecomment-526584963
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14973/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #25497: [BUILD][DO-NOT-MERGE] 
Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0
URL: https://github.com/apache/spark/pull/25497#issuecomment-526584956
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] cloud-fan commented on a change in pull request #25620: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files

2019-08-30 Thread GitBox
cloud-fan commented on a change in pull request #25620: [SPARK-25341][Core] 
Support rolling back a shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/25620#discussion_r319495883
 
 

 ##
 File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
 ##
 @@ -106,7 +106,7 @@ protected void handleMessage(
 numBlockIds += ids.length;
   }
   streamId = streamManager.registerStream(client.getClientId(),
-new ManagedBufferIterator(msg, numBlockIds), client.getChannel());
+new ShuffleManagedBufferIterator(msg), client.getChannel());
 
 Review comment:
   we can also remove
   ```
   numBlockIds = 0;
   for (int[] ids: msg.reduceIds) {
     numBlockIds += ids.length;
   }
   ```





[GitHub] [spark] gaborgsomogyi commented on a change in pull request #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer

2019-08-30 Thread GitBox
gaborgsomogyi commented on a change in pull request #22138: [SPARK-25151][SS] 
Apply Apache Commons Pool to KafkaDataConsumer
URL: https://github.com/apache/spark/pull/22138#discussion_r319496507
 
 

 ##
 File path: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/InternalKafkaConsumerPool.scala
 ##
 @@ -0,0 +1,221 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.{util => ju}
+import java.util.concurrent.ConcurrentHashMap
+
+import org.apache.commons.pool2.{BaseKeyedPooledObjectFactory, PooledObject, 
SwallowedExceptionListener}
+import org.apache.commons.pool2.impl.{DefaultEvictionPolicy, 
DefaultPooledObject, GenericKeyedObjectPool, GenericKeyedObjectPoolConfig}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.kafka010.InternalKafkaConsumerPool._
+import org.apache.spark.sql.kafka010.KafkaDataConsumer.CacheKey
+
+/**
+ * Provides object pool for [[InternalKafkaConsumer]] which is grouped by 
[[CacheKey]].
+ *
+ * This class leverages [[GenericKeyedObjectPool]] internally, hence providing 
methods based on
+ * the class, and same contract applies: after using the borrowed object, you 
must either call
+ * returnObject() if the object is healthy to return to pool, or 
invalidateObject() if the object
+ * should be destroyed.
+ *
+ * The soft capacity of pool is determined by 
"spark.kafka.consumer.cache.capacity" config value,
+ * and the pool will have reasonable default value if the value is not 
provided.
+ * (The instance will do its best effort to respect soft capacity but it can 
exceed when there's
+ * a borrowing request and there's neither free space nor idle object to 
clear.)
+ *
+ * This class guarantees that no caller will get pooled object once the object 
is borrowed and
+ * not yet returned, hence provide thread-safety usage of non-thread-safe 
[[InternalKafkaConsumer]]
+ * unless caller shares the object to multiple threads.
+ */
+private[kafka010] class InternalKafkaConsumerPool(
+objectFactory: ObjectFactory,
+poolConfig: PoolConfig) extends Logging {
+
+  def this(conf: SparkConf) = {
+this(new ObjectFactory, new PoolConfig(conf))
+  }
+
+  // the class is intended to have only soft capacity
+  assert(poolConfig.getMaxTotal < 0)
+
+  private val pool = {
+val internalPool = new GenericKeyedObjectPool[CacheKey, 
InternalKafkaConsumer](
+  objectFactory, poolConfig)
+
internalPool.setSwallowedExceptionListener(CustomSwallowedExceptionListener)
+internalPool
+  }
+
+  /**
+   * Borrows [[InternalKafkaConsumer]] object from the pool. If there's no 
idle object for the key,
+   * the pool will create the [[InternalKafkaConsumer]] object.
+   *
+   * If the pool doesn't have idle object for the key and also exceeds the 
soft capacity,
+   * pool will try to clear some of idle objects.
+   *
+   * Borrowed object must be returned by either calling returnObject or 
invalidateObject, otherwise
+   * the object will be kept in pool as active object.
+   */
+  def borrowObject(key: CacheKey, kafkaParams: ju.Map[String, Object]): 
InternalKafkaConsumer = {
+updateKafkaParamForKey(key, kafkaParams)
+
+if (size >= poolConfig.softMaxSize) {
+  logWarning("Pool exceeds its soft max size, cleaning up idle objects...")
+  pool.clearOldest()
+}
+
+pool.borrowObject(key)
+  }
+
+  /** Returns borrowed object to the pool. */
+  def returnObject(consumer: InternalKafkaConsumer): Unit = {
+pool.returnObject(extractCacheKey(consumer), consumer)
+  }
+
+  /** Invalidates (destroy) borrowed object to the pool. */
+  def invalidateObject(consumer: InternalKafkaConsumer): Unit = {
+pool.invalidateObject(extractCacheKey(consumer), consumer)
+  }
+
+  /** Invalidates all idle consumers for the key */
+  def invalidateKey(key: CacheKey): Unit = {
+pool.clear(key)
+  }
+
+  /**
+   * Closes the keyed object pool. Once the pool is closed,
+   * borrowObject will fail with [[IllegalStateException]], but returnObject 
and invalidateObject
+   * will continue to work, with returned objects destroyed on return.
+   *
+   * Also destroys idle instances in

[GitHub] [spark] cloud-fan commented on a change in pull request #25620: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files

2019-08-30 Thread GitBox
cloud-fan commented on a change in pull request #25620: [SPARK-25341][Core] 
Support rolling back a shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/25620#discussion_r319498094
 
 

 ##
 File path: 
core/src/main/java/org/apache/spark/shuffle/api/ShuffleExecutorComponents.java
 ##
 @@ -39,17 +39,15 @@
   /**
* Called once per map task to create a writer that will be responsible for 
persisting all the
* partitioned bytes written by that map task.
-   *  @param shuffleId Unique identifier for the shuffle the map task is a 
part of
-   * @param mapId Within the shuffle, the identifier of the map task
-   * @param mapTaskAttemptId Identifier of the task attempt. Multiple attempts 
of the same map task
- * with the same (shuffleId, mapId) pair can be 
distinguished by the
- * different values of mapTaskAttemptId.
+   * @param shuffleId Unique identifier for the shuffle the map task is a part 
of
+   * @param mapId Identifier of the task attempt. Multiple attempts of the 
same map task with the
 
 Review comment:
   let's rephrase it. How about `An id of the map task which is unique within 
this Spark application.`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #25620: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files

2019-08-30 Thread GitBox
cloud-fan commented on a change in pull request #25620: [SPARK-25341][Core] 
Support rolling back a shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/25620#discussion_r319498094
 
 

 ##
 File path: 
core/src/main/java/org/apache/spark/shuffle/api/ShuffleExecutorComponents.java
 ##
 @@ -39,17 +39,15 @@
   /**
* Called once per map task to create a writer that will be responsible for 
persisting all the
* partitioned bytes written by that map task.
-   *  @param shuffleId Unique identifier for the shuffle the map task is a 
part of
-   * @param mapId Within the shuffle, the identifier of the map task
-   * @param mapTaskAttemptId Identifier of the task attempt. Multiple attempts 
of the same map task
- * with the same (shuffleId, mapId) pair can be 
distinguished by the
- * different values of mapTaskAttemptId.
+   * @param shuffleId Unique identifier for the shuffle the map task is a part 
of
+   * @param mapId Identifier of the task attempt. Multiple attempts of the 
same map task with the
 
 Review comment:
   let's rephrase it. How about `An ID of the map task. The ID is unique within 
this Spark application.`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gaborgsomogyi commented on a change in pull request #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer

2019-08-30 Thread GitBox
gaborgsomogyi commented on a change in pull request #22138: [SPARK-25151][SS] 
Apply Apache Commons Pool to KafkaDataConsumer
URL: https://github.com/apache/spark/pull/22138#discussion_r319498159
 
 

 ##
 File path: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSparkConfSuite.scala
 ##
 @@ -1,30 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.kafka010
-
-import org.apache.spark.{LocalSparkContext, SparkConf, SparkFunSuite}
-import org.apache.spark.util.ResetSystemProperties
-
-class KafkaSparkConfSuite extends SparkFunSuite with LocalSparkContext with 
ResetSystemProperties {
 
 Review comment:
   Hmm, what has happened with this test?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gaborgsomogyi commented on a change in pull request #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer

2019-08-30 Thread GitBox
gaborgsomogyi commented on a change in pull request #22138: [SPARK-25151][SS] 
Apply Apache Commons Pool to KafkaDataConsumer
URL: https://github.com/apache/spark/pull/22138#discussion_r319500037
 
 

 ##
 File path: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala
 ##
 @@ -269,9 +300,12 @@ private[kafka010] case class InternalKafkaConsumer(
   // When there is some error thrown, it's better to use a new 
consumer to drop all cached
   // states in the old consumer. We don't need to worry about the 
performance because this
   // is not a common path.
-  resetConsumer()
-  reportDataLoss(failOnDataLoss, s"Cannot fetch offset 
$toFetchOffset", e)
-  toFetchOffset = getEarliestAvailableOffsetBetween(toFetchOffset, 
untilOffset)
+  releaseConsumer()
+  fetchedData.reset()
 
 Review comment:
   Don't we need `releaseFetchedData` here?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a change in pull request #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer

2019-08-30 Thread GitBox
HeartSaVioR commented on a change in pull request #22138: [SPARK-25151][SS] 
Apply Apache Commons Pool to KafkaDataConsumer
URL: https://github.com/apache/spark/pull/22138#discussion_r319500084
 
 

 ##
 File path: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSparkConfSuite.scala
 ##
 @@ -1,30 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.kafka010
-
-import org.apache.spark.{LocalSparkContext, SparkConf, SparkFunSuite}
-import org.apache.spark.util.ResetSystemProperties
-
-class KafkaSparkConfSuite extends SparkFunSuite with LocalSparkContext with 
ResetSystemProperties {
 
 Review comment:
   I have renamed the config back to the old one per feedback, so this test is 
no longer needed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a change in pull request #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer

2019-08-30 Thread GitBox
HeartSaVioR commented on a change in pull request #22138: [SPARK-25151][SS] 
Apply Apache Commons Pool to KafkaDataConsumer
URL: https://github.com/apache/spark/pull/22138#discussion_r319500695
 
 

 ##
 File path: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala
 ##
 @@ -269,9 +300,12 @@ private[kafka010] case class InternalKafkaConsumer(
   // When there is some error thrown, it's better to use a new 
consumer to drop all cached
   // states in the old consumer. We don't need to worry about the 
performance because this
   // is not a common path.
-  resetConsumer()
-  reportDataLoss(failOnDataLoss, s"Cannot fetch offset 
$toFetchOffset", e)
-  toFetchOffset = getEarliestAvailableOffsetBetween(toFetchOffset, 
untilOffset)
+  releaseConsumer()
+  fetchedData.reset()
 
 Review comment:
   Yes, as FetchedData is designed to be modified per task. Once you get the 
one for that task, you can just modify it, and also reset if necessary.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a change in pull request #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer

2019-08-30 Thread GitBox
HeartSaVioR commented on a change in pull request #22138: [SPARK-25151][SS] 
Apply Apache Commons Pool to KafkaDataConsumer
URL: https://github.com/apache/spark/pull/22138#discussion_r319500695
 
 

 ##
 File path: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala
 ##
 @@ -269,9 +300,12 @@ private[kafka010] case class InternalKafkaConsumer(
   // When there is some error thrown, it's better to use a new 
consumer to drop all cached
   // states in the old consumer. We don't need to worry about the 
performance because this
   // is not a common path.
-  resetConsumer()
-  reportDataLoss(failOnDataLoss, s"Cannot fetch offset 
$toFetchOffset", e)
-  toFetchOffset = getEarliestAvailableOffsetBetween(toFetchOffset, 
untilOffset)
+  releaseConsumer()
+  fetchedData.reset()
 
 Review comment:
   Yes, as FetchedData is designed to be modified per task. (So based on the 
desired offset, in most cases the pool will provide the same FetchedData.) Once 
you get the one for that task, you can just modify it, and also reset it if 
necessary.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a change in pull request #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer

2019-08-30 Thread GitBox
HeartSaVioR commented on a change in pull request #22138: [SPARK-25151][SS] 
Apply Apache Commons Pool to KafkaDataConsumer
URL: https://github.com/apache/spark/pull/22138#discussion_r319500695
 
 

 ##
 File path: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala
 ##
 @@ -269,9 +300,12 @@ private[kafka010] case class InternalKafkaConsumer(
   // When there is some error thrown, it's better to use a new 
consumer to drop all cached
   // states in the old consumer. We don't need to worry about the 
performance because this
   // is not a common path.
-  resetConsumer()
-  reportDataLoss(failOnDataLoss, s"Cannot fetch offset 
$toFetchOffset", e)
-  toFetchOffset = getEarliestAvailableOffsetBetween(toFetchOffset, 
untilOffset)
+  releaseConsumer()
+  fetchedData.reset()
 
 Review comment:
   Yes, as FetchedData is designed to be modified per task. So based on the 
desired offset, in most cases pool will provide same FetchedData. Once you get 
the one for that task, you can just modify it, and also reset if necessary.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a change in pull request #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer

2019-08-30 Thread GitBox
HeartSaVioR commented on a change in pull request #22138: [SPARK-25151][SS] 
Apply Apache Commons Pool to KafkaDataConsumer
URL: https://github.com/apache/spark/pull/22138#discussion_r319500695
 
 

 ##
 File path: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala
 ##
 @@ -269,9 +300,12 @@ private[kafka010] case class InternalKafkaConsumer(
   // When there is some error thrown, it's better to use a new 
consumer to drop all cached
   // states in the old consumer. We don't need to worry about the 
performance because this
   // is not a common path.
-  resetConsumer()
-  reportDataLoss(failOnDataLoss, s"Cannot fetch offset 
$toFetchOffset", e)
-  toFetchOffset = getEarliestAvailableOffsetBetween(toFetchOffset, 
untilOffset)
+  releaseConsumer()
+  fetchedData.reset()
 
 Review comment:
   No, as FetchedData is designed to be modified per task. So based on the 
desired offset, in most cases pool will provide same FetchedData. Once you get 
the one for that task, you can just modify it, and also reset if necessary.
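
   A hypothetical, stripped-down sketch of the error-handling pattern discussed 
in this thread: on a fetch failure the consumer is released, so a fresh one is 
borrowed on the next attempt, and the per-task fetched-data buffer is reset in 
place. All names below are stand-ins rather than the actual KafkaDataConsumer 
internals.

```scala
// Hypothetical illustration only; not the actual Spark kafka010 classes.
final class FetchedDataSketch {
  private var records: List[String] = Nil
  private var nextOffset: Long = -1L // stand-in for an "unknown offset" marker

  def fill(rs: List[String], next: Long): Unit = { records = rs; nextOffset = next }
  def reset(): Unit = { records = Nil; nextOffset = -1L }
  override def toString: String = s"FetchedData(records=$records, nextOffset=$nextOffset)"
}

object FetchErrorHandlingSketch {
  private var consumer: Option[AnyRef] = Some(new Object) // stand-in for the Kafka consumer
  private def releaseConsumer(): Unit = { consumer = None } // next fetch borrows a new one

  def main(args: Array[String]): Unit = {
    val fetchedData = new FetchedDataSketch
    try {
      fetchedData.fill(List("r1", "r2"), next = 2L)
      throw new RuntimeException("simulated fetch failure")
    } catch {
      case _: RuntimeException =>
        // Drop all cached state: release the consumer and reset the per-task buffer
        // in place, rather than recreating the consumer as the old resetConsumer() did.
        releaseConsumer()
        fetchedData.reset()
    }
    println(fetchedData)
  }
}
```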


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on issue #25618: [SPARK-28908][SS]Implement Kafka EOS sink for Structured Streaming

2019-08-30 Thread GitBox
HeartSaVioR commented on issue #25618: [SPARK-28908][SS]Implement Kafka EOS 
sink for Structured Streaming
URL: https://github.com/apache/spark/pull/25618#issuecomment-526593592
 
 
   Well, someone could call it 2PC since the behavior is similar, but 2PC 
generally assumes a coordinator and participants. In the second phase the 
coordinator asks the participants to commit or abort; the participants do not 
finalize on their own what they did in the first phase. Following that model, 
the driver should ask tasks to commit their outputs, but Spark doesn't provide 
such a flow. So this is a pretty simplified and also pretty limited version of 
2PC.
   
   I think the point is whether we are OK with having exactly-once semantics 
with some restrictions that end users need to be aware of. Could you please 
start a discussion on this on the Spark dev mailing list? It would be good to 
hear others' voices.
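
   To make the contrast concrete, here is a minimal sketch of the classical 2PC 
shape described above, where the coordinator drives the second phase. The trait 
and class names are illustrative only and do not correspond to any Spark or 
Kafka API.

```scala
// Illustrative only: the classical two-phase commit shape being contrasted above.
// The coordinator collects votes in phase one and then asks every participant to
// commit or abort in phase two; participants never finalize on their own.
trait Participant {
  def prepare(): Boolean // phase 1: stage the work and vote yes/no
  def commit(): Unit     // phase 2: finalize, only when the coordinator says so
  def abort(): Unit      // phase 2: roll back, only when the coordinator says so
}

final class Coordinator(participants: Seq[Participant]) {
  def run(): Boolean = {
    val allPrepared = participants.forall(_.prepare()) // phase 1
    if (allPrepared) participants.foreach(_.commit())  // phase 2, driven by the coordinator
    else participants.foreach(_.abort())
    allPrepared
  }
}
```

   In the scheme discussed in this PR, tasks effectively finalize their own work 
during the first phase, which is why it is only a simplified, limited form of 2PC.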


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #25620: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files

2019-08-30 Thread GitBox
cloud-fan commented on a change in pull request #25620: [SPARK-25341][Core] 
Support rolling back a shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/25620#discussion_r319502863
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala
 ##
 @@ -100,16 +108,19 @@ private[spark] object MapStatus {
  *
  * @param loc location where the task is being executed.
  * @param compressedSizes size of the blocks, indexed by reduce partition id.
+ * @param mapTaskId unique task id for the task
 
 Review comment:
   let's call it `mapId`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0

2019-08-30 Thread GitBox
SparkQA commented on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the detla 
in scala-maven-plugin 3.4.6 <> 4.0.0
URL: https://github.com/apache/spark/pull/25497#issuecomment-526594832
 
 
   **[Test build #109943 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109943/testReport)**
 for PR 25497 at commit 
[`99a2182`](https://github.com/apache/spark/commit/99a21824580349e9f0524573e1c7ce5a739360e9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0

2019-08-30 Thread GitBox
SparkQA removed a comment on issue #25497: [BUILD][DO-NOT-MERGE] Investigate 
the detla in scala-maven-plugin 3.4.6 <> 4.0.0
URL: https://github.com/apache/spark/pull/25497#issuecomment-526556962
 
 
   **[Test build #109943 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109943/testReport)**
 for PR 25497 at commit 
[`99a2182`](https://github.com/apache/spark/commit/99a21824580349e9f0524573e1c7ce5a739360e9).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon opened a new pull request #25633: [SPARK-28759][BUILD] Upgrade scala-maven-plugin to 4.2.0 and fix build profile on AppVeyor

2019-08-30 Thread GitBox
HyukjinKwon opened a new pull request #25633: [SPARK-28759][BUILD] Upgrade 
scala-maven-plugin to 4.2.0 and fix build profile on AppVeyor
URL: https://github.com/apache/spark/pull/25633
 
 
   ### What changes were proposed in this pull request?
   
   This PR proposes to upgrade scala-maven-plugin from 3.4.4 to 4.2.0.
   
   The upgrade to 4.1.1 was reverted due to an unexpected build failure on AppVeyor.
   
   The root cause seems to be an issue specific to AppVeyor - loading the 
system library 'kernel32.dll' seems to be failing.
   
   ```
   Suppressed: java.lang.NoClassDefFoundError: Could not initialize class 
com.sun.jna.platform.win32.Kernel32
   at sbt.internal.io.WinMilli$.getHandle(Milli.scala:264)
   at sbt.internal.io.WinMilli$.getModifiedTimeNative(Milli.scala:289)
   at sbt.internal.io.WinMilli$.getModifiedTimeNative(Milli.scala:260)
   at sbt.internal.io.MilliNative.getModifiedTime(Milli.scala:61)
   at sbt.internal.io.Milli$.getModifiedTime(Milli.scala:360)
   at sbt.io.IO$.$anonfun$getModifiedTimeOrZero$1(IO.scala:1373)
   at 
scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
   at sbt.internal.io.Retry$.liftedTree2$1(Retry.scala:38)
   at sbt.internal.io.Retry$.impl$1(Retry.scala:38)
   at sbt.internal.io.Retry$.apply(Retry.scala:52)
   at sbt.internal.io.Retry$.apply(Retry.scala:24)
   at sbt.io.IO$.getModifiedTimeOrZero(IO.scala:1373)
   at 
sbt.internal.inc.caching.ClasspathCache$.fromCacheOrHash$1(ClasspathCache.scala:44)
   at 
sbt.internal.inc.caching.ClasspathCache$.$anonfun$hashClasspath$1(ClasspathCache.scala:53)
   at 
scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:659)
   at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
   at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
   at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67)
   at scala.collection.parallel.Task.tryLeaf(Tasks.scala:56)
   at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50)
   at 
scala.collection.parallel.mutable.ParArray$Map.tryLeaf(ParArray.scala:650)
   at 
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal(Tasks.scala:170)
   ... 25 more
   ```
   
   By setting `-Djna.nosys=true`, JNA loads the library directly from the jar 
instead of the system's copy.
   
   In this way, the build seems to work fine.
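
   For context, `jna.nosys` is a standard JNA system property. The minimal 
sketch below, which assumes only that documented behavior, shows the property 
being set programmatically before any JNA-backed code runs; the PR itself 
passes the flag to the build JVM rather than setting it in application code.

```scala
// Sketch: "jna.nosys" must be set before JNA initializes. When true, JNA
// extracts and loads the native stub bundled in its jar instead of looking
// for a system-installed copy (the failure mode in the stack trace above).
object JnaNoSysSketch {
  def main(args: Array[String]): Unit = {
    System.setProperty("jna.nosys", "true")
    // Any JNA-backed code loaded after this point would use the bundled library.
    println(s"jna.nosys = ${System.getProperty("jna.nosys")}")
  }
}
```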
   
   ### Why are the changes needed?
   
   It upgrades the plugin to fix bugs and fixes the CI build.
   
   ### Does this PR introduce any user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   It was tested at https://github.com/apache/spark/pull/25497
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on issue #25633: [SPARK-28759][BUILD] Upgrade scala-maven-plugin to 4.2.0 and fix build profile on AppVeyor

2019-08-30 Thread GitBox
HyukjinKwon commented on issue #25633: [SPARK-28759][BUILD] Upgrade 
scala-maven-plugin to 4.2.0 and fix build profile on AppVeyor
URL: https://github.com/apache/spark/pull/25633#issuecomment-526595448
 
 
   cc @dongjoon-hyun, @srowen and @wangyum 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the 
detla in scala-maven-plugin 3.4.6 <> 4.0.0
URL: https://github.com/apache/spark/pull/25497#issuecomment-526595458
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the 
detla in scala-maven-plugin 3.4.6 <> 4.0.0
URL: https://github.com/apache/spark/pull/25497#issuecomment-526595460
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109943/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #25497: [BUILD][DO-NOT-MERGE] 
Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0
URL: https://github.com/apache/spark/pull/25497#issuecomment-526595458
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #25497: [BUILD][DO-NOT-MERGE] 
Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0
URL: https://github.com/apache/spark/pull/25497#issuecomment-526595460
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109943/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25633: [SPARK-28759][BUILD] Upgrade scala-maven-plugin to 4.2.0 and fix build profile on AppVeyor

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25633: [SPARK-28759][BUILD] Upgrade 
scala-maven-plugin to 4.2.0 and fix build profile on AppVeyor
URL: https://github.com/apache/spark/pull/25633#issuecomment-526595755
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon closed pull request #25497: [BUILD][DO-NOT-MERGE] Investigate the detla in scala-maven-plugin 3.4.6 <> 4.0.0

2019-08-30 Thread GitBox
HyukjinKwon closed pull request #25497: [BUILD][DO-NOT-MERGE] Investigate the 
detla in scala-maven-plugin 3.4.6 <> 4.0.0
URL: https://github.com/apache/spark/pull/25497
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25633: [SPARK-28759][BUILD] Upgrade scala-maven-plugin to 4.2.0 and fix build profile on AppVeyor

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25633: [SPARK-28759][BUILD] Upgrade 
scala-maven-plugin to 4.2.0 and fix build profile on AppVeyor
URL: https://github.com/apache/spark/pull/25633#issuecomment-526595766
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14974/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #25620: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files

2019-08-30 Thread GitBox
cloud-fan commented on a change in pull request #25620: [SPARK-25341][Core] 
Support rolling back a shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/25620#discussion_r319504661
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
 ##
 @@ -48,9 +48,10 @@ import org.apache.spark.util.{CompletionIterator, 
TaskCompletionListener, Utils}
  * @param shuffleClient [[BlockStoreClient]] for fetching remote blocks
  * @param blockManager [[BlockManager]] for reading local blocks
  * @param blocksByAddress list of blocks to fetch grouped by the 
[[BlockManagerId]].
- *For each block we also require the size (in bytes as 
a long field) in
- *order to throttle the memory usage. Note that 
zero-sized blocks are
- *already excluded, which happened in
+ *For each block we also require two info: 1. the size 
(in bytes as a long
+ *field) in order to throttle the memory usage; 2. the 
mapId for this
 
 Review comment:
   `mapId` -> `mapIndex`?
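
   As a reading aid for the doc hunk above, here is a hedged sketch of the 
per-block metadata shape it describes: one entry per block carrying a block id, 
a size, and a map index (using `mapIndex`, as suggested). The case classes are 
simplified stand-ins, not Spark's actual BlockManagerId/BlockId types.

```scala
// Simplified stand-ins; the exact types in Spark differ.
final case class BlockManagerIdStub(executorId: String, host: String, port: Int)
final case class BlockEntry(blockId: String, sizeInBytes: Long, mapIndex: Int)

object BlocksByAddressSketch {
  def main(args: Array[String]): Unit = {
    // Blocks to fetch, grouped by the block manager that hosts them; each entry
    // carries its size (for throttling memory usage) and the map index it came from.
    val blocksByAddress: Iterator[(BlockManagerIdStub, Seq[BlockEntry])] = Iterator(
      BlockManagerIdStub("exec-1", "host-a", 7337) ->
        Seq(BlockEntry("shuffle_0_0_1", sizeInBytes = 1024L, mapIndex = 0)))
    blocksByAddress.foreach { case (bm, blocks) => println(s"$bm -> $blocks") }
  }
}
```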


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #25630: [WIP][SPARK-28894][SQL][TESTS] Add a clue to make it easier to debug via Jenkins's test results

2019-08-30 Thread GitBox
SparkQA commented on issue #25630: [WIP][SPARK-28894][SQL][TESTS] Add a clue to 
make it easier to debug via Jenkins's test results
URL: https://github.com/apache/spark/pull/25630#issuecomment-526596051
 
 
   **[Test build #109942 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109942/testReport)**
 for PR 25630 at commit 
[`28e7f2f`](https://github.com/apache/spark/commit/28e7f2fc326270b63e9d15c1f366f96742f7e282).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25630: [WIP][SPARK-28894][SQL][TESTS] Add a clue to make it easier to debug via Jenkins's test results

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25630: [WIP][SPARK-28894][SQL][TESTS] Add a 
clue to make it easier to debug via Jenkins's test results
URL: https://github.com/apache/spark/pull/25630#issuecomment-526596275
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25630: [WIP][SPARK-28894][SQL][TESTS] Add a clue to make it easier to debug via Jenkins's test results

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25630: [WIP][SPARK-28894][SQL][TESTS] Add a 
clue to make it easier to debug via Jenkins's test results
URL: https://github.com/apache/spark/pull/25630#issuecomment-526596279
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109942/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gaborgsomogyi commented on a change in pull request #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer

2019-08-30 Thread GitBox
gaborgsomogyi commented on a change in pull request #22138: [SPARK-25151][SS] 
Apply Apache Commons Pool to KafkaDataConsumer
URL: https://github.com/apache/spark/pull/22138#discussion_r319505166
 
 

 ##
 File path: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSparkConfSuite.scala
 ##
 @@ -1,30 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.kafka010
-
-import org.apache.spark.{LocalSparkContext, SparkConf, SparkFunSuite}
-import org.apache.spark.util.ResetSystemProperties
-
-class KafkaSparkConfSuite extends SparkFunSuite with LocalSparkContext with 
ResetSystemProperties {
 
 Review comment:
   Which comment are you referring to? I'm looking for it, but maybe it was 
resolved and closed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #25620: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files

2019-08-30 Thread GitBox
cloud-fan commented on a change in pull request #25620: [SPARK-25341][Core] 
Support rolling back a shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/25620#discussion_r319505049
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
 ##
 @@ -591,6 +596,7 @@ private class BufferReleasingInputStream(
 private[storage] val delegate: InputStream,
 private val iterator: ShuffleBlockFetcherIterator,
 private val blockId: BlockId,
+private val mapId: Int,
 
 Review comment:
   `mapIndex`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on issue #25630: [WIP][SPARK-28894][SQL][TESTS] Add a clue to make it easier to debug via Jenkins's test results

2019-08-30 Thread GitBox
SparkQA removed a comment on issue #25630: [WIP][SPARK-28894][SQL][TESTS] Add a 
clue to make it easier to debug via Jenkins's test results
URL: https://github.com/apache/spark/pull/25630#issuecomment-526552539
 
 
   **[Test build #109942 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109942/testReport)**
 for PR 25630 at commit 
[`28e7f2f`](https://github.com/apache/spark/commit/28e7f2fc326270b63e9d15c1f366f96742f7e282).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #25620: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files

2019-08-30 Thread GitBox
cloud-fan commented on a change in pull request #25620: [SPARK-25341][Core] 
Support rolling back a shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/25620#discussion_r319505384
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
 ##
 @@ -706,6 +714,7 @@ object ShuffleBlockFetcherIterator {
*/
   private[storage] case class SuccessFetchResult(
   blockId: BlockId,
+  mapId: Int,
 
 Review comment:
   Why do we need the map index here?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25633: [SPARK-28759][BUILD] Upgrade scala-maven-plugin to 4.2.0 and fix build profile on AppVeyor

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #25633: [SPARK-28759][BUILD] Upgrade 
scala-maven-plugin to 4.2.0 and fix build profile on AppVeyor
URL: https://github.com/apache/spark/pull/25633#issuecomment-526595766
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14974/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25633: [SPARK-28759][BUILD] Upgrade scala-maven-plugin to 4.2.0 and fix build profile on AppVeyor

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #25633: [SPARK-28759][BUILD] Upgrade 
scala-maven-plugin to 4.2.0 and fix build profile on AppVeyor
URL: https://github.com/apache/spark/pull/25633#issuecomment-526595755
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #25633: [SPARK-28759][BUILD] Upgrade scala-maven-plugin to 4.2.0 and fix build profile on AppVeyor

2019-08-30 Thread GitBox
SparkQA commented on issue #25633: [SPARK-28759][BUILD] Upgrade 
scala-maven-plugin to 4.2.0 and fix build profile on AppVeyor
URL: https://github.com/apache/spark/pull/25633#issuecomment-526596416
 
 
   **[Test build #109950 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109950/testReport)**
 for PR 25633 at commit 
[`0fd34b3`](https://github.com/apache/spark/commit/0fd34b30ab9f07d6c6f9ccfad0541a59db31405e).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25630: [WIP][SPARK-28894][SQL][TESTS] Add a clue to make it easier to debug via Jenkins's test results

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #25630: [WIP][SPARK-28894][SQL][TESTS] 
Add a clue to make it easier to debug via Jenkins's test results
URL: https://github.com/apache/spark/pull/25630#issuecomment-526596275
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on issue #25633: [SPARK-28759][BUILD] Upgrade scala-maven-plugin to 4.2.0 and fix build profile on AppVeyor

2019-08-30 Thread GitBox
HyukjinKwon commented on issue #25633: [SPARK-28759][BUILD] Upgrade 
scala-maven-plugin to 4.2.0 and fix build profile on AppVeyor
URL: https://github.com/apache/spark/pull/25633#issuecomment-526596865
 
 
   4.2.0 has the https://github.com/davidB/scala-maven-plugin/pull/358 fix too, FWIW.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25630: [WIP][SPARK-28894][SQL][TESTS] Add a clue to make it easier to debug via Jenkins's test results

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #25630: [WIP][SPARK-28894][SQL][TESTS] 
Add a clue to make it easier to debug via Jenkins's test results
URL: https://github.com/apache/spark/pull/25630#issuecomment-526596279
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109942/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] iRakson opened a new pull request #25634: [SPARK-28929][CORE] Spark Logging level should be INFO instead of DEBUG in Executor Plugin API

2019-08-30 Thread GitBox
iRakson opened a new pull request #25634: [SPARK-28929][CORE] Spark Logging 
level should be INFO instead of DEBUG in Executor Plugin API
URL: https://github.com/apache/spark/pull/25634
 
 
   
   
   ### What changes were proposed in this pull request?
   Log levels in Executor.scala are changed from DEBUG to INFO. 
   
   
   
   
   
   
   ### Does this PR introduce any user-facing change?
   No
   
   
   
   ### How was this patch tested?
   Manually tested.
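
   As an illustration of the kind of change described (and only an illustration, 
using plain SLF4J rather than Spark's internal logging helpers; the message text 
and plugin name are hypothetical):

```scala
import org.slf4j.LoggerFactory

object PluginLogLevelSketch {
  private val log = LoggerFactory.getLogger(getClass)

  def initPlugin(pluginClassName: String): Unit = {
    // Before: log.debug("Initializing executor plugin {}", pluginClassName)
    // After: the same message logged at INFO so it shows up with default log settings.
    log.info("Initializing executor plugin {}", pluginClassName)
  }

  def main(args: Array[String]): Unit = initPlugin("com.example.MyPlugin")
}
```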
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25634: [SPARK-28929][CORE] Spark Logging level should be INFO instead of DEBUG in Executor Plugin API

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25634: [SPARK-28929][CORE] Spark Logging 
level should be INFO instead of DEBUG in Executor Plugin API
URL: https://github.com/apache/spark/pull/25634#issuecomment-526598072
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25634: [SPARK-28929][CORE] Spark Logging level should be INFO instead of DEBUG in Executor Plugin API

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #25634: [SPARK-28929][CORE] Spark 
Logging level should be INFO instead of DEBUG in Executor Plugin API
URL: https://github.com/apache/spark/pull/25634#issuecomment-526598072
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25634: [SPARK-28929][CORE] Spark Logging level should be INFO instead of DEBUG in Executor Plugin API

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25634: [SPARK-28929][CORE] Spark Logging 
level should be INFO instead of DEBUG in Executor Plugin API
URL: https://github.com/apache/spark/pull/25634#issuecomment-526598284
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25634: [SPARK-28929][CORE] Spark Logging level should be INFO instead of DEBUG in Executor Plugin API

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25634: [SPARK-28929][CORE] Spark Logging 
level should be INFO instead of DEBUG in Executor Plugin API
URL: https://github.com/apache/spark/pull/25634#issuecomment-526599082
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #25630: [SPARK-28894][SQL][TESTS] Add a clue to make it easier to debug via Jenkins's test results

2019-08-30 Thread GitBox
SparkQA commented on issue #25630: [SPARK-28894][SQL][TESTS] Add a clue to make 
it easier to debug via Jenkins's test results
URL: https://github.com/apache/spark/pull/25630#issuecomment-526599196
 
 
   **[Test build #109951 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109951/testReport)**
 for PR 25630 at commit 
[`a3c73c8`](https://github.com/apache/spark/commit/a3c73c86bba940054b3434e876411ded24b6c2b3).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gaborgsomogyi commented on issue #25618: [SPARK-28908][SS]Implement Kafka EOS sink for Structured Streaming

2019-08-30 Thread GitBox
gaborgsomogyi commented on issue #25618: [SPARK-28908][SS]Implement Kafka EOS 
sink for Structured Streaming
URL: https://github.com/apache/spark/pull/25618#issuecomment-526599116
 
 
   +1 on having a discussion on that. My perspective is clear: having such a 
limitation is a bit too much in this scenario, so I'm not feeling comfortable 
with it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25634: [SPARK-28929][CORE] Spark Logging level should be INFO instead of DEBUG in Executor Plugin API

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #25634: [SPARK-28929][CORE] Spark 
Logging level should be INFO instead of DEBUG in Executor Plugin API
URL: https://github.com/apache/spark/pull/25634#issuecomment-526598284
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gaborgsomogyi commented on issue #25618: [SPARK-28908][SS]Implement Kafka EOS sink for Structured Streaming

2019-08-30 Thread GitBox
gaborgsomogyi commented on issue #25618: [SPARK-28908][SS]Implement Kafka EOS 
sink for Structured Streaming
URL: https://github.com/apache/spark/pull/25618#issuecomment-526599976
 
 
   In the meantime I'm speaking with Gyula from the Flink side to understand 
things more deeply...


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25630: [SPARK-28894][SQL][TESTS] Add a clue to make it easier to debug via Jenkins's test results

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25630: [SPARK-28894][SQL][TESTS] Add a clue 
to make it easier to debug via Jenkins's test results
URL: https://github.com/apache/spark/pull/25630#issuecomment-526601210
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14975/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25630: [SPARK-28894][SQL][TESTS] Add a clue to make it easier to debug via Jenkins's test results

2019-08-30 Thread GitBox
AmplabJenkins commented on issue #25630: [SPARK-28894][SQL][TESTS] Add a clue 
to make it easier to debug via Jenkins's test results
URL: https://github.com/apache/spark/pull/25630#issuecomment-526601202
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25630: [SPARK-28894][SQL][TESTS] Add a clue to make it easier to debug via Jenkins's test results

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #25630: [SPARK-28894][SQL][TESTS] Add 
a clue to make it easier to debug via Jenkins's test results
URL: https://github.com/apache/spark/pull/25630#issuecomment-526601202
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25630: [SPARK-28894][SQL][TESTS] Add a clue to make it easier to debug via Jenkins's test results

2019-08-30 Thread GitBox
AmplabJenkins removed a comment on issue #25630: [SPARK-28894][SQL][TESTS] Add 
a clue to make it easier to debug via Jenkins's test results
URL: https://github.com/apache/spark/pull/25630#issuecomment-526601210
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14975/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srowen commented on issue #25633: [SPARK-28759][BUILD] Upgrade scala-maven-plugin to 4.2.0 and fix build profile on AppVeyor

2019-08-30 Thread GitBox
srowen commented on issue #25633: [SPARK-28759][BUILD] Upgrade 
scala-maven-plugin to 4.2.0 and fix build profile on AppVeyor
URL: https://github.com/apache/spark/pull/25633#issuecomment-526602411
 
 
   Oh nice, so this possibly enables cross-compiling on JDK 11 for a JDK 8 target now? Great.
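
   For anyone who wants to see what that cross-compilation looks like in practice, here is a minimal, hypothetical sketch (sbt-style settings chosen purely for illustration; Spark's real build is Maven + scala-maven-plugin, and the exact flags it passes are not shown here):

       // build.sbt — illustrative sketch only, not Spark's build definition.
       // Compile on JDK 11 while emitting JDK 8-compatible bytecode and linking
       // only against the JDK 8 platform API, so newer APIs fail at compile time.
       javacOptions ++= Seq("--release", "8")    // javac (JDK 9+) cross-compilation flag
       scalacOptions ++= Seq("-release", "8")    // scalac equivalent (recent 2.12.x / 2.13)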


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gaborgsomogyi commented on a change in pull request #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer

2019-08-30 Thread GitBox
gaborgsomogyi commented on a change in pull request #22138: [SPARK-25151][SS] 
Apply Apache Commons Pool to KafkaDataConsumer
URL: https://github.com/apache/spark/pull/22138#discussion_r319517484
 
 

 ##
 File path: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala
 ##
 @@ -269,9 +300,12 @@ private[kafka010] case class InternalKafkaConsumer(
   // When there is some error thrown, it's better to use a new consumer to drop all cached
   // states in the old consumer. We don't need to worry about the performance because this
   // is not a common path.
-  resetConsumer()
-  reportDataLoss(failOnDataLoss, s"Cannot fetch offset $toFetchOffset", e)
-  toFetchOffset = getEarliestAvailableOffsetBetween(toFetchOffset, untilOffset)
+  releaseConsumer()
+  fetchedData.reset()
 
 Review comment:
   Took a fresh look and I see the concept now; looks OK to me.
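
   For readers following the pooling change: the hunk above swaps an in-place resetConsumer() for releaseConsumer() plus fetchedData.reset(), which fits the borrow/return lifecycle of Apache Commons Pool that this PR's title refers to. Below is a minimal, self-contained Scala sketch of that lifecycle with commons-pool2 and a made-up DemoConsumer type; it illustrates the pattern only and is not the PR's actual pool or consumer classes.

       import org.apache.commons.pool2.{BasePooledObjectFactory, PooledObject}
       import org.apache.commons.pool2.impl.{DefaultPooledObject, GenericObjectPool}

       // Hypothetical stand-in for a cached consumer; not Spark's InternalKafkaConsumer.
       final class DemoConsumer {
         def poll(): String = "records"
         def close(): Unit = ()
       }

       // Tells the pool how to create, wrap and destroy pooled instances.
       class DemoConsumerFactory extends BasePooledObjectFactory[DemoConsumer] {
         override def create(): DemoConsumer = new DemoConsumer
         override def wrap(c: DemoConsumer): PooledObject[DemoConsumer] =
           new DefaultPooledObject(c)
         override def destroyObject(p: PooledObject[DemoConsumer]): Unit =
           p.getObject.close()
       }

       object PoolLifecycleSketch {
         def main(args: Array[String]): Unit = {
           val pool = new GenericObjectPool(new DemoConsumerFactory)
           pool.setMaxTotal(16)                  // cap on concurrently borrowed consumers

           val consumer = pool.borrowObject()    // acquire (creates one if none is idle)
           try {
             consumer.poll()                     // use the borrowed instance
             pool.returnObject(consumer)         // on success, hand it back for reuse
           } catch {
             case e: Throwable =>
               pool.invalidateObject(consumer)   // on error, destroy it instead of reusing it
               throw e
           }
           pool.close()
         }
       }

   Here invalidateObject is commons-pool2's way of discarding an instance rather than returning it, for cases where its cached state should not be reused.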


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gaborgsomogyi commented on issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer

2019-08-30 Thread GitBox
gaborgsomogyi commented on issue #22138: [SPARK-25151][SS] Apply Apache Commons 
Pool to KafkaDataConsumer
URL: https://github.com/apache/spark/pull/22138#issuecomment-526608400
 
 
   As a general comment: since the consumer caching behaviour is described in the docs, it would be good to adapt the docs to this change as well.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Ngone51 commented on a change in pull request #25612: [SPARK-3137][Core]Replace the global TorrentBroadcast lock with fine grained KeyLock

2019-08-30 Thread GitBox
Ngone51 commented on a change in pull request #25612: [SPARK-3137][Core]Replace 
the global TorrentBroadcast lock with fine grained KeyLock
URL: https://github.com/apache/spark/pull/25612#discussion_r319525047
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/util/KeyLock.scala
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util
+
+import java.util.concurrent.ConcurrentHashMap
+
+/**
+ * A special locking mechanism to provide locking with a given key. By providing the same key
+ * (identity is tested using the `equals` method), we ensure there is only one `func` running at
+ * the same time.
+ *
+ * @tparam K the type of key to identify a lock. This type must implement `equals` and `hashCode`
+ *           correctly as it will be the key type of an internal Map.
+ */
+private[spark] class KeyLock[K] {
+
+  private val lockMap = new ConcurrentHashMap[K, AnyRef]()
+
+  private def acquireLock(key: K): Unit = {
+    while (true) {
+      val lock = lockMap.putIfAbsent(key, new Object)
+      if (lock == null) return
+      lock.synchronized {
+        while (lockMap.get(key) eq lock) {
 
 Review comment:
   After the key lock is released, if a new thread for the same broadcastId puts a new object into the map before another queued thread re-checks `lockMap.get(key) eq lock`, could both threads end up acquiring the key lock?
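
   To make that interleaving easier to trace, here is a minimal, self-contained sketch of the full keyed-lock pattern, including a release path and a `withLock` entry point that the quoted hunk stops short of; the method bodies below are a plausible completion for discussion, not necessarily the PR's final code:

       import java.util.concurrent.ConcurrentHashMap

       // Sketch only: one lock object per key; withLock serialises callers per key.
       class KeyLockSketch[K] {
         private val lockMap = new ConcurrentHashMap[K, AnyRef]()

         private def acquireLock(key: K): Unit = {
           while (true) {
             val lock = lockMap.putIfAbsent(key, new Object)
             if (lock == null) return               // no entry existed: we own the key now
             lock.synchronized {
               while (lockMap.get(key) eq lock) {   // re-check: the entry may have been replaced
                 lock.wait()
               }
             }
             // fall through and retry putIfAbsent against whatever is in the map now
           }
         }

         private def releaseLock(key: K): Unit = {
           val lock = lockMap.remove(key)           // drop the entry first, then wake waiters
           lock.synchronized { lock.notifyAll() }
         }

         def withLock[T](key: K)(func: => T): T = {
           acquireLock(key)
           try func finally releaseLock(key)
         }
       }

   In this sketch a woken waiter that finds the old lock object gone loops back to putIfAbsent, so it either installs its own object or parks on whichever object the newer thread installed; tracing that path is the quickest way to check the scenario raised above. Usage would look like new KeyLockSketch[String]().withLock("broadcast_0") { /* fetch or rebuild the block */ }.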


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the delta in scala-maven-plugin 3.4.6 <> 4.0.0

2019-08-30 Thread GitBox
SparkQA commented on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the delta 
in scala-maven-plugin 3.4.6 <> 4.0.0
URL: https://github.com/apache/spark/pull/25497#issuecomment-526624198
 
 
   **[Test build #109947 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109947/testReport)**
 for PR 25497 at commit 
[`1068514`](https://github.com/apache/spark/commit/1068514162cc9f27e57d0342d3a953967aaf76e2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on issue #25497: [BUILD][DO-NOT-MERGE] Investigate the delta in scala-maven-plugin 3.4.6 <> 4.0.0

2019-08-30 Thread GitBox
SparkQA removed a comment on issue #25497: [BUILD][DO-NOT-MERGE] Investigate 
the delta in scala-maven-plugin 3.4.6 <> 4.0.0
URL: https://github.com/apache/spark/pull/25497#issuecomment-526578038
 
 
   **[Test build #109947 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109947/testReport)**
 for PR 25497 at commit 
[`1068514`](https://github.com/apache/spark/commit/1068514162cc9f27e57d0342d3a953967aaf76e2).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


