[GitHub] spark issue #22824: [SPARK-25834] [Structured Streaming]Update Mode should n...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22824 @koeninger, is this PR okay to merge, or are other review comments pending? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22824: [SPARK-25834] [Structured Streaming]Update Mode should n...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22824 retest this please
[GitHub] spark issue #22824: [SPARK-25834] [Structured Streaming]Update Mode should n...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22824 retest this please
[GitHub] spark issue #22824: [SPARK-25834] [Structured Streaming]Update Mode should n...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22824 > I mean its easy to miss if a new "case" is added and "update" mode is not supported. Even now how about LeftSemi, LeftAnti, FullOuter etc? Currently stream-stream join is supported only for inner, left outer, and right outer joins. Please refer to the document http://spark.apache.org/docs/2.3.2/structured-streaming-programming-guide.html#support-matrix-for-joins-in-streaming-queries
[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22466 cc @gatorsmile if everything is okay, can this PR be merged?
[GitHub] spark issue #22824: [SPARK-25834] [Structured Streaming]Update Mode should n...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22824 cc @zsxwing @jose-torres @brkyvz
[GitHub] spark issue #22824: [SPARK-25834] [Structured Streaming]Update Mode should n...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22824 cc @koeninger can you please review this?
[GitHub] spark pull request #22824: [SPARK-25834] Update Mode should not be supported...
GitHub user sandeep-katta opened a pull request: https://github.com/apache/spark/pull/22824 [SPARK-25834] Update Mode should not be supported for Outer Joins ## What changes were proposed in this pull request? As per the Spark documentation, only Append mode is supported for stream-stream joins, so a check needs to be added for this in the left outer and right outer join cases. ## How was this patch tested? With unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sandeep-katta/spark outerJoin Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22824.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22824 commit f36e30761de1a3960451987774187b9b4bd68d24 Author: sandeep-katta Date: 2018-10-25T14:15:35Z Update Mode should not be supported for Outer Joins
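The check the PR describes can be sketched as follows. This is a minimal, self-contained illustration of the idea (reject Update output mode for stream-stream outer joins); the type and function names here are ours, not Spark's actual `UnsupportedOperationChecker` API.

```scala
// Illustrative stand-ins for Spark's join types and output modes.
sealed trait JoinType
case object Inner extends JoinType
case object LeftOuter extends JoinType
case object RightOuter extends JoinType

sealed trait OutputMode
case object Append extends OutputMode
case object Update extends OutputMode

// Reject Update mode for stream-stream outer joins; allow everything else
// (real Spark applies many more constraints than this sketch).
def checkStreamStreamJoin(joinType: JoinType, mode: OutputMode): Either[String, Unit] =
  (joinType, mode) match {
    case (LeftOuter | RightOuter, Update) =>
      Left(s"$joinType join between two streaming DataFrames/Datasets is not supported in $mode output mode")
    case _ =>
      Right(())
  }
```

A pattern match like this makes the unsupported combination explicit, which addresses the reviewer's concern about a new "case" silently slipping through.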
[GitHub] spark pull request #22571: [SPARK-25392][Spark Job History]Inconsistent beha...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/spark/pull/22571#discussion_r227630523 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -2434,8 +2434,15 @@ class SparkContext(config: SparkConf) extends Logging { val schedulingMode = getSchedulingMode.toString val addedJarPaths = addedJars.keys.toSeq val addedFilePaths = addedFiles.keys.toSeq + // SPARK-25392 pool Information should be stored in the event + val poolInformation = getAllPools.map { it => +val xmlString = ("
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226673513 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -207,6 +207,14 @@ class SessionCatalog( "you cannot create a database with this name.") } validateName(dbName) +// SPARK-25464 fail if DB location exists and is not empty +val dbPath = new Path(dbDefinition.locationUri) +val fs = dbPath.getFileSystem(hadoopConf) +if (!externalCatalog.databaseExists(dbName) && fs.exists(dbPath) + && fs.listStatus(dbPath).nonEmpty) { --- End diff -- As per my knowledge this is the only way to check for the empty condition, and I have also moved this logic to HiveExternalCatalog.
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226548140 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -207,6 +207,14 @@ class SessionCatalog( "you cannot create a database with this name.") } validateName(dbName) +// SPARK-25464 fail if DB location exists and is not empty +val dbPath = new Path(dbDefinition.locationUri) +val fs = dbPath.getFileSystem(hadoopConf) +if (!externalCatalog.databaseExists(dbName) && fs.exists(dbPath) + && fs.listStatus(dbPath).nonEmpty) { --- End diff -- listing files is expensive, but create database command is not frequent. Same is mentioned here https://github.com/apache/spark/pull/22466#discussion_r220795973
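Since the concern in this thread is the cost of listing a directory just to test emptiness, a cheaper variant is to look at only the first entry rather than materializing the whole listing. The sketch below shows the idea against the local filesystem with `java.nio.file` (the PR itself uses Hadoop's `FileSystem.listStatus`, so this is an analogy, not the patch's code):

```scala
import java.nio.file.{Files, Path}

// True if `p` is a directory containing at least one entry. Streaming the
// listing and stopping at the first entry avoids reading the full directory,
// which matters on object stores such as S3.
def isNonEmptyDir(p: Path): Boolean = {
  if (!Files.isDirectory(p)) false
  else {
    val entries = Files.list(p)
    try entries.findFirst().isPresent
    finally entries.close() // Files.list holds an open handle until closed
  }
}
```

For HDFS the equivalent trick is to consume only the first element of an iterator-based listing instead of a full `listStatus` array.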
[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22466 retest this please
[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22466 > Btw, what if `create database if not exists ...`? Seems like an exception will be thrown if the table exists even if we specify `if not exists`? Good catch :+1:, I have updated the code accordingly.
[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22466 @cloud-fan @gatorsmile all the test cases passed and the review comments are addressed; can you please help merge this PR?
[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22466 retest this please
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r225396925 --- Diff: python/pyspark/sql/tests.py --- @@ -2993,6 +2990,7 @@ def test_current_database(self): AnalysisException, "does_not_exist", lambda: spark.catalog.setCurrentDatabase("does_not_exist")) +spark.sql("DROP DATABASE some_db") --- End diff -- create and drop should be part of the test case; if there is any exception then the test case will fail, so there is no need to put it in the finally block.
[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22466 > The major comments are in the test cases. Could you help clean up the existing test cases? All the comments are addressed and the test cases are corrected.
[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22466 Can one of the admins ask for a retest, please?
[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22466 @cloud-fan @gatorsmile if everything is okay, can you please merge this PR?
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r223394163 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -207,6 +207,16 @@ class SessionCatalog( "you cannot create a database with this name.") } validateName(dbName) +// SPARK-25464 fail if DB location exists and is not empty +val dbPath = new Path(dbDefinition.locationUri) +val fs = dbPath.getFileSystem(hadoopConf) +if (fs.exists(dbPath)) { + val fileStatus = fs.listStatus(dbPath) + if (fileStatus.nonEmpty) { +throw new AnalysisException( + s"Cannot create Database to the location which is not empty.Given path is ${dbPath}") --- End diff -- if the folder exists and it is non-empty then we allow creating the database, so doesn't this statement conflict with what the user sees?
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r222891761 --- Diff: python/pyspark/sql/tests.py --- @@ -351,7 +351,7 @@ def tearDown(self): super(SQLTests, self).tearDown() # tear down test_bucketed_write state -self.spark.sql("DROP TABLE IF EXISTS pyspark_bucket") +self.spark.sql("DROP DATABASE IF EXISTS some_db CASCADE") --- End diff -- test_current_database, test_list_databases, test_list_tables, test_list_functions, and test_list_columns all create the database and do not drop it, so the folder will still be present, which results in the test case failures.
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r222891525 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -407,6 +407,7 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { test("create a managed table with the existing non-empty directory") { withTable("tab1") { + Utils.createDirectory(spark.sessionState.conf.warehousePath) --- End diff -- As per line 414 it is trying to create an empty directory under the warehouse path; this statement will fail as the parent folder (warehouse) does not exist.
[GitHub] spark issue #22571: [SPARK-25392][Spark Job History]Inconsistent behaviour f...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22571 cc @cloud-fan Please review
[GitHub] spark issue #22571: [SPARK-25392][Spark Job History]Inconsistent behaviour f...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22571 Yes, I am sending the pool info around so the history server can also have the details. And regarding the dark red, I think GitHub is unable to parse it correctly.
[GitHub] spark issue #22571: [SPARK-25392][Spark Job History]Inconsistent behaviour f...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22571 cc @vanzin @srowen, please review this improvement in the History UI.
[GitHub] spark pull request #22571: [SPARK-25392][Spark Job History]Inconsistent beha...
GitHub user sandeep-katta opened a pull request: https://github.com/apache/spark/pull/22571 [SPARK-25392][Spark Job History]Inconsistent behaviour for pool details in spark web UI and history server page ## What changes were proposed in this pull request? 1. Added the pool information to the "Pool Information" field in environmentDetails, using the pool name as the key and the XML string of the pool's other fields as the value. This event is persisted in the event log dir. 2. AppStatusListener::onEnvironmentUpdate parses the XML string and constructs a Map[poolName, Pool], which is stored in the kvstore. 3. The page is divided into two types: one is AllStagesPage and the other is PoolPage; both fetch the pool details from the kvstore and display them accordingly. In summary, environmentDetails is modified to save the pool information through the SparkListenerEnvironmentUpdate event. ## How was this patch tested? Added test cases and also verified manually in the cluster. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sandeep-katta/spark historyserver Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22571.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22571 commit 65e3ca79c3ce8bcd16d9e92d7e9b5e69b532edef Author: sandeep-katta Date: 2018-09-27T10:18:45Z RootCause: History server does not maintain pool details as no event was sent carrying them. Modification content: Pool information is sent as a part of the EnvironmentInfo event.
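The round trip the description outlines (pool name as key, an XML string of the pool's fields as value, parsed back on the listener side) can be sketched like this. The field set, element names, and helper functions are hypothetical, chosen only to illustrate the serialize/parse pattern; the real patch works with Spark's `Schedulable` pools:

```scala
// A toy stand-in for a scheduler pool's attributes.
case class PoolInfo(name: String, minShare: Int, weight: Int)

// Serialize one pool as a (poolName -> xmlString) entry, the shape used to
// smuggle pool details through a Map[String, String] such as environmentDetails.
def toEntry(p: PoolInfo): (String, String) =
  p.name -> s"<pool><minShare>${p.minShare}</minShare><weight>${p.weight}</weight></pool>"

val MinShareRe = """<minShare>(\d+)</minShare>""".r
val WeightRe = """<weight>(\d+)</weight>""".r

// Parse the value back on the listener side. Defaults (0, 1) mirror typical
// fair-scheduler defaults and are assumptions of this sketch.
def fromEntry(name: String, xml: String): PoolInfo = {
  val minShare = MinShareRe.findFirstMatchIn(xml).map(_.group(1).toInt).getOrElse(0)
  val weight = WeightRe.findFirstMatchIn(xml).map(_.group(1).toInt).getOrElse(1)
  PoolInfo(name, minShare, weight)
}
```

Keying by pool name keeps the entry stable across events, so the listener can rebuild its Map[poolName, Pool] from the latest environment update alone.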
[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22466 I am running the same test case with hive version **1.2.1.spark2** and it is passing. Can I know which Hive version CI is running with, and how org.apache.hive.jdbc.HiveStatement and the external catalog are linked? I don't see any such code. cc @cloud-fan @gatorsmile
[GitHub] spark pull request #22466: [SPARK-25464][SQL]On dropping the Database it wil...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r220873051 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -66,6 +66,19 @@ case class CreateDatabaseCommand( extends RunnableCommand { override def run(sparkSession: SparkSession): Seq[Row] = { +// SPARK-25464 fail if DB location exists and is not empty +if (path.isDefined) { --- End diff -- > Please try the following cases: > > * the directory does not exist and the parent directory does not exist too. spark-sql> create database noparent location '/user/dropTest/noparent/dblocation'; Time taken: 2.246 seconds > * the directory exists but empty spark-sql> create database emptydir location '/user/dropTest/emptydir'; Time taken: 0.262 seconds > * the directory exists and non-empty spark-sql> create database dirExistsAndNotempty location '/user/dropTest/'; Error in query: Cannot create Database to the location which is not empty.; > * the directory is not accessible spark-sql> create database userNoPermission location '/user/hive/dropTest'; Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.security.AccessControlException: Permission denied: user=sandeep, access=WRITE, inode="/user/hive":hive:hive:drwxr-x--- at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:399) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:261)
[GitHub] spark pull request #22466: [SPARK-25464][SQL]On dropping the Database it wil...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r220872507 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -66,6 +66,19 @@ case class CreateDatabaseCommand( extends RunnableCommand { override def run(sparkSession: SparkSession): Seq[Row] = { +// SPARK-25464 fail if DB location exists and is not empty +if (path.isDefined) { --- End diff -- > Just want to confirm it what is the behavior of Hive (3.x) when we try to create a database in a non-empty file path? Users can create the database on a non-empty file path; the results are below: host1:/opt/tools/hadoop3.1/hadoop-3.1.1/bin # ./hadoop fs -ls /user/hive/ 18/09/27 18:05:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Found 2 items drwxr-xr-x - root supergroup 0 2018-09-27 18:05 /user/hive/dropTest drwxr-xr-x - root supergroup 0 2018-09-27 17:19 /user/hive/warehouse hive> create database testjira location '/user/hive/'; OK Time taken: 0.187 seconds hive> desc database testjira; OK testjira hdfs://localhost:9000/user/hive root USER Time taken: 0.045 seconds, Fetched: 1 row(s) hive> drop database testjira; OK Time taken: 0.194 seconds hive> host1:/opt/tools/hadoop3.1/hadoop-3.1.1/bin # ./hadoop fs -ls /user 18/09/27 18:07:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable host1:/opt/tools/hadoop3.1/hadoop-3.1.1/bin #
[GitHub] spark pull request #22466: [SPARK-25464][SQL]On dropping the Database it wil...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r220564446 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -66,6 +66,19 @@ case class CreateDatabaseCommand( extends RunnableCommand { override def run(sparkSession: SparkSession): Seq[Row] = { +// SPARK-25464 fail if DB location exists and is not empty +if (path.isDefined) { --- End diff -- The reason I did it here is that the external catalog always has a path (default or user specified), so we would end up checking the path every time. This may be a costly operation if the file system is S3.
[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22466 cc @cloud-fan @srowen I have updated the code, please review.
[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22466 It seems @cloud-fan's comments are valid, as it will not result in any behavior change; I will update the PR accordingly. WDYT @srowen?
[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22466 > See JIRA, I don't think this should be merged. I have referred to the Databricks doc https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-database.html and implemented accordingly. Let me know if you have any suggestions.
[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/22466 Yes, I agree two databases should not point to the same path; **currently this is a loophole in Spark which needs to be fixed**. If this solution is not okay, then we can append dbname.db to the location given by the user, e.g. for create database db1 location /user/hive/warehouse the location of the DB should be /user/hive/warehouse/db1.db
[GitHub] spark pull request #22466: [SPARK-25464][SQL]When database is dropped all th...
GitHub user sandeep-katta opened a pull request: https://github.com/apache/spark/pull/22466 [SPARK-25464][SQL]When database is dropped all the data related to it is deleted Modification content: If the database is external then its content should not be deleted. What changes were proposed in this pull request? If the user specifies the location while creating the database then it will be considered an external database. For an external database, on dropping the database the location will not be deleted. For e.g. create database db1 location '/user/hive/warehouse'; drop database db1; the warehouse folder will not be deleted. How was this patch tested? Added test cases and also manually verified in the cluster. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sandeep-katta/spark dropDatabse Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22466.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22466 commit c2eec35d3fb4c0402c4a243c389316aecb639f6f Author: sandeep-katta Date: 2018-09-18T19:15:52Z RootCause: When database is dropped all the data related to it is deleted. Modification content: If the database is external then its content should not be deleted.
[GitHub] spark issue #21565: [SPARK-24558][Core]wrong Idle Timeout value is used in c...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/21565 yes, all the review comments are addressed
[GitHub] spark pull request #21565: [SPARK-24558][Core]wrong Idle Timeout value is us...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/spark/pull/21565#discussion_r198447523 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -488,9 +488,16 @@ private[spark] class ExecutorAllocationManager( newExecutorTotal = numExistingExecutors if (testing || executorsRemoved.nonEmpty) { executorsRemoved.foreach { removedExecutorId => +// If it is a cached block, timeout is configured using +// spark.dynamicAllocation.cachedExecutorIdleTimeout +val idleTimeout = if (blockManagerMaster.hasCachedBlocks(removedExecutorId)) { --- End diff -- blockManagerMaster already provides an API to check whether it is a cached block or not, so I feel it would be overhead to maintain another HashMap.
[GitHub] spark issue #21565: [SPARK-24558][Core]wrong Idle Timeout value is used in c...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/21565 @cloud-fan the build passed. Can you merge this PR?
[GitHub] spark issue #21565: [SPARK-24558][Core]wrong Idle Timeout value is used in c...
Github user sandeep-katta commented on the issue: https://github.com/apache/spark/pull/21565 @cloud-fan can you please review this small piece of code and merge this PR?
[GitHub] spark pull request #21565: [SPARK-24558][Core]wrong Idle Timeout value is us...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/spark/pull/21565#discussion_r195989897 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -488,9 +488,16 @@ private[spark] class ExecutorAllocationManager( newExecutorTotal = numExistingExecutors if (testing || executorsRemoved.nonEmpty) { executorsRemoved.foreach { removedExecutorId => +// If it is a cached block, timeout is configured using +// spark.dynamicAllocation.cachedExecutorIdleTimeout --- End diff -- @maropu can you review and merge the PR if the changes are fine?
[GitHub] spark pull request #21565: [SPARK-24558][Core]wrong Idle Timeout value is us...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/spark/pull/21565#discussion_r195895035 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -488,9 +488,16 @@ private[spark] class ExecutorAllocationManager( newExecutorTotal = numExistingExecutors if (testing || executorsRemoved.nonEmpty) { executorsRemoved.foreach { removedExecutorId => +// If it is a cached block, timeout is configured using +// spark.dynamicAllocation.cachedExecutorIdleTimeout +val idleTimeout = if (SparkEnv.get.blockManager.master.hasCachedBlocks(removedExecutorId)) { --- End diff -- I have updated the code as per the comments.
[GitHub] spark pull request #21565: [SPARK-24558][Core]wrong Idle Timeout value is us...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/spark/pull/21565#discussion_r195894845 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -488,9 +488,16 @@ private[spark] class ExecutorAllocationManager( newExecutorTotal = numExistingExecutors if (testing || executorsRemoved.nonEmpty) { executorsRemoved.foreach { removedExecutorId => +// If it is a cached block, timeout is configured using +// spark.dynamicAllocation.cachedExecutorIdleTimeout --- End diff -- Yes, if it is a cached block then the idle timeout is taken from spark.dynamicAllocation.cachedExecutorIdleTimeout.
[GitHub] spark pull request #21565: [SPARK-24558][Core]wrong Idle Timeout value is us...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/spark/pull/21565#discussion_r195894823 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -488,9 +488,16 @@ private[spark] class ExecutorAllocationManager( newExecutorTotal = numExistingExecutors if (testing || executorsRemoved.nonEmpty) { executorsRemoved.foreach { removedExecutorId => +// If it is a cached block, timeout is configured using +// spark.dynamicAllocation.cachedExecutorIdleTimeout +val idleTimeout = if (SparkEnv.get.blockManager.master.hasCachedBlocks(removedExecutorId)) { --- End diff -- Yes, blockManagerMaster refers to the same SparkEnv.get.blockManager.master. Will update the code.
[GitHub] spark pull request #21565: wrong Idle Timeout value is used in case of the c...
GitHub user sandeep-katta opened a pull request: https://github.com/apache/spark/pull/21565 wrong Idle Timeout value is used in case of the cacheBlock. It is corrected as per the configuration. ## What changes were proposed in this pull request? The idle-timeout value printed in the logs is chosen based on whether the executor holds cached blocks. If it does, cachedExecutorIdleTimeoutS is considered; otherwise executorIdleTimeoutS. ## How was this patch tested? Manual Test spark-sql> cache table sample; 2018-05-15 14:44:02 INFO DAGScheduler:54 - Submitting 3 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[8] at processCmd at CliDriver.java:376) (first 15 tasks are for partitions Vector(0, 1, 2)) 2018-05-15 14:44:02 INFO YarnScheduler:54 - Adding task set 0.0 with 3 tasks 2018-05-15 14:44:03 INFO ExecutorAllocationManager:54 - Requesting 1 new executor because tasks are backlogged (new desired total will be 1) ... ... 2018-05-15 14:46:10 INFO YarnClientSchedulerBackend:54 - Actual list of executor(s) to be killed is 1 2018-05-15 14:46:10 INFO **ExecutorAllocationManager:54 - Removing executor 1 because it has been idle for 120 seconds (new desired total will be 0)** 2018-05-15 14:46:11 INFO YarnSchedulerBackend$YarnDriverEndpoint:54 - Disabling executor 1. 2018-05-15 14:46:11 INFO DAGScheduler:54 - Executor lost: 1 (epoch 1) You can merge this pull request into a Git repository by running: $ git pull https://github.com/sandeep-katta/spark loginfoBug Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21565.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21565 commit 30fcef650ee2bd2873bf402448652acba055f989 Author: sandeep-katta Date: 2018-06-14T09:56:59Z wrong Idle Timeout value is used in case of the cacheBlock. It is corrected as per the configuration.
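The core of this fix reduces to a single selection: which configured timeout to report for a given executor. A minimal sketch, with parameter names of our own choosing rather than Spark's internals:

```scala
// Pick the idle timeout (in seconds) to log for a removed executor:
// executors still holding cached blocks use the cached-executor timeout
// (spark.dynamicAllocation.cachedExecutorIdleTimeout), others use the
// plain idle timeout (spark.dynamicAllocation.executorIdleTimeout).
def idleTimeoutSecondsFor(
    hasCachedBlocks: Boolean,
    cachedExecutorIdleTimeoutS: Long,
    executorIdleTimeoutS: Long): Long =
  if (hasCachedBlocks) cachedExecutorIdleTimeoutS else executorIdleTimeoutS
```

With this in place, the "Removing executor ... because it has been idle for N seconds" log line reflects the timeout that actually triggered the removal.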