[GitHub] spark issue #22824: [SPARK-25834] [Structured Streaming]Update Mode should n...

2018-11-23 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22824
  
@koeninger, is this PR okay to merge, or are there other review comments pending?


---




[GitHub] spark issue #22824: [SPARK-25834] [Structured Streaming]Update Mode should n...

2018-10-28 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22824
  
retest this please


---




[GitHub] spark issue #22824: [SPARK-25834] [Structured Streaming]Update Mode should n...

2018-10-26 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22824
  
retest this please


---




[GitHub] spark issue #22824: [SPARK-25834] [Structured Streaming]Update Mode should n...

2018-10-26 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22824
  
> I mean its easy to miss if a new "case" is added and "update" mode is not 
supported. Even now how about LeftSemi, LeftAnti, FullOuter etc?

Currently stream-stream join is supported for inner, left outer, and right outer joins. Please refer to the document:

http://spark.apache.org/docs/2.3.2/structured-streaming-programming-guide.html#support-matrix-for-joins-in-streaming-queries
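
For reference, a self-contained sketch (not Spark's actual UnsupportedOperationChecker code; all names below are illustrative) of how a sealed join-type hierarchy plus an exhaustive match addresses the "easy to miss a new case" concern: the compiler flags any newly added join type that has no explicit decision about output modes.

```scala
object StreamJoinCheckSketch {
  sealed trait JoinType
  case object Inner extends JoinType
  case object LeftOuter extends JoinType
  case object RightOuter extends JoinType
  case object FullOuter extends JoinType

  // Exhaustive match over a sealed trait: adding a new JoinType without
  // handling it here produces a non-exhaustive-match compiler warning, so
  // Update mode cannot silently slip through for an unconsidered join type.
  def checkStreamStreamJoin(joinType: JoinType, outputMode: String): Unit =
    joinType match {
      case Inner | LeftOuter | RightOuter =>
        require(outputMode == "Append",
          s"$joinType stream-stream join supports only Append output mode, found $outputMode")
      case FullOuter =>
        throw new IllegalArgumentException(
          s"$joinType stream-stream join is not supported")
    }
}
```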


---




[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...

2018-10-26 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22466
  
cc @gatorsmile, if everything is okay, can this PR be merged?


---




[GitHub] spark issue #22824: [SPARK-25834] [Structured Streaming]Update Mode should n...

2018-10-26 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22824
  
cc @zsxwing @jose-torres @brkyvz 


---




[GitHub] spark issue #22824: [SPARK-25834] [Structured Streaming]Update Mode should n...

2018-10-26 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22824
  
cc @koeninger, can you please review this?


---




[GitHub] spark pull request #22824: [SPARK-25834] Update Mode should not be supported...

2018-10-25 Thread sandeep-katta
GitHub user sandeep-katta opened a pull request:

https://github.com/apache/spark/pull/22824

[SPARK-25834] Update Mode should not be supported for Outer Joins

## What changes were proposed in this pull request?

As per the Spark documentation, only Append mode is supported for stream-stream joins. A check needs to be added for this in the case of left outer and right outer joins; an illustrative example follows.
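
A hedged example of the user-facing behavior this check enforces, assuming an active SparkSession named `spark` (the rate source, column names, and time bounds are placeholders):

```scala
import org.apache.spark.sql.functions.expr
import spark.implicits._

val left = spark.readStream.format("rate").load()
  .withWatermark("timestamp", "10 seconds")
  .select($"value".as("leftValue"), $"timestamp".as("leftTime"))
val right = spark.readStream.format("rate").load()
  .withWatermark("timestamp", "10 seconds")
  .select($"value".as("rightValue"), $"timestamp".as("rightTime"))

// A valid stream-stream left outer join (watermarks plus a time-range condition).
val joined = left.join(
  right,
  expr("leftValue = rightValue AND rightTime BETWEEN leftTime AND leftTime + interval 5 seconds"),
  "leftOuter")

// Expected after this PR: AnalysisException at start(), because stream-stream
// outer joins support only Append output mode.
joined.writeStream
  .outputMode("update")
  .format("console")
  .start()
```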

## How was this patch tested?

With unit tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandeep-katta/spark outerJoin

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22824.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22824


commit f36e30761de1a3960451987774187b9b4bd68d24
Author: sandeep-katta 
Date:   2018-10-25T14:15:35Z

Update Mode should not be supported for Outer Joins




---




[GitHub] spark pull request #22571: [SPARK-25392][Spark Job History]Inconsistent beha...

2018-10-23 Thread sandeep-katta
Github user sandeep-katta commented on a diff in the pull request:

https://github.com/apache/spark/pull/22571#discussion_r227630523
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2434,8 +2434,15 @@ class SparkContext(config: SparkConf) extends Logging {
     val schedulingMode = getSchedulingMode.toString
     val addedJarPaths = addedJars.keys.toSeq
     val addedFilePaths = addedFiles.keys.toSeq
+    // SPARK-25392 pool Information should be stored in the event
+    val poolInformation = getAllPools.map { it =>
+      val xmlString = ("

[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...

2018-10-19 Thread sandeep-katta
Github user sandeep-katta commented on a diff in the pull request:

https://github.com/apache/spark/pull/22466#discussion_r226673513
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -207,6 +207,14 @@ class SessionCatalog(
         "you cannot create a database with this name.")
     }
     validateName(dbName)
+    // SPARK-25464 fail if DB location exists and is not empty
+    val dbPath = new Path(dbDefinition.locationUri)
+    val fs = dbPath.getFileSystem(hadoopConf)
+    if (!externalCatalog.databaseExists(dbName) && fs.exists(dbPath)
+      && fs.listStatus(dbPath).nonEmpty) {
--- End diff --

As far as I know, this is the only way to check for the empty condition; I have also moved this logic to HiveExternalCatalog.


---




[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...

2018-10-19 Thread sandeep-katta
Github user sandeep-katta commented on a diff in the pull request:

https://github.com/apache/spark/pull/22466#discussion_r226548140
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -207,6 +207,14 @@ class SessionCatalog(
         "you cannot create a database with this name.")
     }
     validateName(dbName)
+    // SPARK-25464 fail if DB location exists and is not empty
+    val dbPath = new Path(dbDefinition.locationUri)
+    val fs = dbPath.getFileSystem(hadoopConf)
+    if (!externalCatalog.databaseExists(dbName) && fs.exists(dbPath)
+      && fs.listStatus(dbPath).nonEmpty) {
--- End diff --

Listing files is expensive, but the create database command is not frequent. The same point is made here:
https://github.com/apache/spark/pull/22466#discussion_r220795973
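
For what it is worth, a sketch of a cheaper emptiness probe than `fs.listStatus(dbPath).nonEmpty` (an assumption about a possible optimization, not code from this PR): Hadoop's `listStatusIterator` is lazy, so a non-empty directory can be detected after fetching a single page of entries.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Returns true only if the path exists and contains at least one entry;
// listStatusIterator avoids materializing the full listing up front.
def isNonEmptyDir(path: Path, conf: Configuration): Boolean = {
  val fs = path.getFileSystem(conf)
  fs.exists(path) && fs.listStatusIterator(path).hasNext
}
```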


---




[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...

2018-10-18 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22466
  
retest this please


---




[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...

2018-10-18 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22466
  
> Btw, what if `create database if not exists ...`? Seems like an exception 
will be thrown if the table exists even if we specify `if not exists`?

Good catch :+1:, I have updated the code accordingly. A sketch of the fixed control flow is below.
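
A self-contained sketch of the `IF NOT EXISTS` interaction (the booleans stand in for the real catalog and file-system probes; this is illustrative, not the PR's exact code):

```scala
// dbExists: database already registered in the catalog
// locationNonEmpty: the target directory exists and has entries
def createDatabase(dbExists: Boolean, locationNonEmpty: Boolean,
                   ifNotExists: Boolean): Unit = {
  if (dbExists) {
    // With IF NOT EXISTS the command is a silent no-op; without it, fail.
    if (!ifNotExists) throw new IllegalStateException("database already exists")
  } else if (locationNonEmpty) {
    // Only a genuinely new database must fail on a non-empty location.
    throw new IllegalStateException("cannot create database at non-empty location")
  } else {
    // ... proceed to create the database ...
  }
}
```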


---




[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...

2018-10-18 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22466
  
@cloud-fan @gatorsmile all the test cases have passed and the review comments are addressed; can you please help merge this PR?


---




[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...

2018-10-17 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22466
  
retest this please


---




[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...

2018-10-15 Thread sandeep-katta
Github user sandeep-katta commented on a diff in the pull request:

https://github.com/apache/spark/pull/22466#discussion_r225396925
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -2993,6 +2990,7 @@ def test_current_database(self):
             AnalysisException,
             "does_not_exist",
             lambda: spark.catalog.setCurrentDatabase("does_not_exist"))
+        spark.sql("DROP DATABASE some_db")
--- End diff --

Create and drop should be part of the test case; if there is any exception, the test case will fail. So there is no need to put the drop in a finally block.


---




[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...

2018-10-10 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22466
  
> The major comments are in the test cases. Could you help clean up the 
existing test cases?

All the comments are addressed and the test cases are corrected.


---




[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...

2018-10-09 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22466
  
Can one of the admins ask for a retest, please?


---




[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...

2018-10-08 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22466
  
@cloud-fan @gatorsmile if everything is okay, can you please merge this PR?


---




[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...

2018-10-08 Thread sandeep-katta
Github user sandeep-katta commented on a diff in the pull request:

https://github.com/apache/spark/pull/22466#discussion_r223394163
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -207,6 +207,16 @@ class SessionCatalog(
         "you cannot create a database with this name.")
     }
     validateName(dbName)
+    // SPARK-25464 fail if DB location exists and is not empty
+    val dbPath = new Path(dbDefinition.locationUri)
+    val fs = dbPath.getFileSystem(hadoopConf)
+    if (fs.exists(dbPath)) {
+      val fileStatus = fs.listStatus(dbPath)
+      if (fileStatus.nonEmpty) {
+        throw new AnalysisException(
+          s"Cannot create Database to the location which is not empty.Given path is ${dbPath}")
--- End diff --

If the folder exists and it is non-empty then we allow creating the database, so doesn't this statement confuse the user?


---




[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...

2018-10-04 Thread sandeep-katta
Github user sandeep-katta commented on a diff in the pull request:

https://github.com/apache/spark/pull/22466#discussion_r222891761
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -351,7 +351,7 @@ def tearDown(self):
         super(SQLTests, self).tearDown()
 
         # tear down test_bucketed_write state
-        self.spark.sql("DROP TABLE IF EXISTS pyspark_bucket")
+        self.spark.sql("DROP DATABASE IF EXISTS some_db CASCADE")
--- End diff --


test_current_database, test_list_databases, test_list_tables, test_list_functions, and test_list_columns all create the database and do not drop it, so the folder will still be present, which results in the test case failures.


---




[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...

2018-10-04 Thread sandeep-katta
Github user sandeep-katta commented on a diff in the pull request:

https://github.com/apache/spark/pull/22466#discussion_r222891525
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -407,6 +407,7 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
 
   test("create a managed table with the existing non-empty directory") {
     withTable("tab1") {
+      Utils.createDirectory(spark.sessionState.conf.warehousePath)
--- End diff --

As per line 414, it is trying to create an empty directory under the warehouse path; this statement will fail as the parent folder (warehouse) does not exist.


---




[GitHub] spark issue #22571: [SPARK-25392][Spark Job History]Inconsistent behaviour f...

2018-09-28 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22571
  
cc @cloud-fan, please review.


---




[GitHub] spark issue #22571: [SPARK-25392][Spark Job History]Inconsistent behaviour f...

2018-09-28 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22571
  
Yes, I am sending the pool info around so that the history server can also have the details.

And regarding the dark red, I think Git is unable to parse it correctly.


---




[GitHub] spark issue #22571: [SPARK-25392][Spark Job History]Inconsistent behaviour f...

2018-09-27 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22571
  
cc @vanzin @srowen, please review this improvement in the History UI.


---




[GitHub] spark pull request #22571: [SPARK-25392][Spark Job History]Inconsistent beha...

2018-09-27 Thread sandeep-katta
GitHub user sandeep-katta opened a pull request:

https://github.com/apache/spark/pull/22571

[SPARK-25392][Spark Job History]Inconsistent behaviour for pool details in 
spark web UI and history server page 


## What changes were proposed in this pull request?

1. Added the pool information to the "Pool Information" field in environmentDetails, using the pool name as the key and an XML string of the pool's other fields as the value. This event is persisted in the event log directory.

2. AppStatusListener::onEnvironmentUpdate parses the XML string and constructs a Map[poolName, Pool], which is stored in the KVStore.

3. The affected pages are of two types: one is AllStagesPage and the other is PoolPage; both fetch the pool details from the KVStore and display them accordingly.

In summary, environmentDetails is modified so that the pool information is saved and replayed through the SparkListenerEnvironmentUpdate event; a minimal sketch of the round trip is shown below.
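
A minimal sketch of that round trip using scala-xml (the tag and field names are assumptions for illustration, not the PR's exact schema):

```scala
import scala.xml.XML

case class PoolInfo(name: String, schedulingMode: String, minShare: Int, weight: Int)

// Driver side: encode each pool as an XML string keyed by pool name, so it
// can ride along in environmentDetails / SparkListenerEnvironmentUpdate.
def toXmlEntry(p: PoolInfo): (String, String) =
  p.name ->
    (s"<pool><schedulingmode>${p.schedulingMode}</schedulingmode>" +
      s"<minshare>${p.minShare}</minshare><weight>${p.weight}</weight></pool>")

// History-server side: parse the XML string back into a pool description.
def fromXmlEntry(name: String, xmlString: String): PoolInfo = {
  val node = XML.loadString(xmlString)
  PoolInfo(
    name,
    (node \ "schedulingmode").text,
    (node \ "minshare").text.toInt,
    (node \ "weight").text.toInt)
}
```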

## How was this patch tested?

Added test cases and also verified manually in the cluster.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandeep-katta/spark historyserver

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22571.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22571


commit 65e3ca79c3ce8bcd16d9e92d7e9b5e69b532edef
Author: sandeep-katta 
Date:   2018-09-27T10:18:45Z

Root cause: the history server does not maintain pool details, as no event was sent to carry them.
Modification content: pool information is sent as a part of the EnvironmentInfo event.




---




[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...

2018-09-27 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22466
  
I am running the same test case with hive version **1.2.1.spark2** and it is passing. Can I know which hive version the CI is running with, and how org.apache.hive.jdbc.HiveStatement and the external catalog are linked? I don't see any such code. cc @cloud-fan @gatorsmile


---




[GitHub] spark pull request #22466: [SPARK-25464][SQL]On dropping the Database it wil...

2018-09-27 Thread sandeep-katta
Github user sandeep-katta commented on a diff in the pull request:

https://github.com/apache/spark/pull/22466#discussion_r220873051
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -66,6 +66,19 @@ case class CreateDatabaseCommand(
   extends RunnableCommand {
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
+    // SPARK-25464 fail if DB location exists and is not empty
+    if (path.isDefined) {
--- End diff --

> Please try the following cases:
>
> * the directory does not exist and the parent directory does not exist too.

spark-sql> create database noparent location '/user/dropTest/noparent/dblocation';
Time taken: 2.246 seconds

> * the directory exists but empty

spark-sql> create database emptydir location '/user/dropTest/emptydir';
Time taken: 0.262 seconds

> * the directory exists and non-empty

spark-sql> create database dirExistsAndNotempty location '/user/dropTest/';
Error in query: Cannot create Database to the location which is not empty.;

> * the directory is not accessible

spark-sql> create database userNoPermission location '/user/hive/dropTest';
Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.security.AccessControlException: Permission denied: user=sandeep, access=WRITE, inode="/user/hive":hive:hive:drwxr-x---
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:399)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:261)


---




[GitHub] spark pull request #22466: [SPARK-25464][SQL]On dropping the Database it wil...

2018-09-27 Thread sandeep-katta
Github user sandeep-katta commented on a diff in the pull request:

https://github.com/apache/spark/pull/22466#discussion_r220872507
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -66,6 +66,19 @@ case class CreateDatabaseCommand(
   extends RunnableCommand {
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
+    // SPARK-25464 fail if DB location exists and is not empty
+    if (path.isDefined) {
--- End diff --

> Just want to confirm it what is the behavior of Hive (3.x) when we try to 
create a database in a non-empty file path?

A user can create the database on a non-empty file path; the results are below:

host1:/opt/tools/hadoop3.1/hadoop-3.1.1/bin # ./hadoop fs -ls /user/hive/
18/09/27 18:05:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxr-xr-x   - root supergroup          0 2018-09-27 18:05 /user/hive/dropTest
drwxr-xr-x   - root supergroup          0 2018-09-27 17:19 /user/hive/warehouse

hive> create database testjira location '/user/hive/';
OK
Time taken: 0.187 seconds
hive> desc database testjira;
OK
testjira    hdfs://localhost:9000/user/hive    root    USER
Time taken: 0.045 seconds, Fetched: 1 row(s)
hive> drop database testjira;
OK
Time taken: 0.194 seconds
hive>

host1:/opt/tools/hadoop3.1/hadoop-3.1.1/bin # ./hadoop fs -ls /user
18/09/27 18:07:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
host1:/opt/tools/hadoop3.1/hadoop-3.1.1/bin #


---




[GitHub] spark pull request #22466: [SPARK-25464][SQL]On dropping the Database it wil...

2018-09-26 Thread sandeep-katta
Github user sandeep-katta commented on a diff in the pull request:

https://github.com/apache/spark/pull/22466#discussion_r220564446
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -66,6 +66,19 @@ case class CreateDatabaseCommand(
   extends RunnableCommand {
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
+    // SPARK-25464 fail if DB location exists and is not empty
+    if (path.isDefined) {
--- End diff --

The reason I did it here is that the external catalog always has a path (default or user-specified), so we would end up checking the path every time. This can be a costly operation if the file system is S3.


---




[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...

2018-09-25 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22466
  
cc @cloud-fan @srowen I have updated the code, please review.


---




[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...

2018-09-24 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22466
  
Seems @cloud-fan's comments are valid, as they will not result in any behavior change; I will update the PR accordingly. WDYT @srowen?


---




[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...

2018-09-23 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22466
  
> See JIRA, I don't think this should be merged.

I have referred to the Databricks doc https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-database.html and implemented it accordingly. Let me know if there are any suggestions.


---




[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...

2018-09-23 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22466
  
Yes, I agree two databases should not point to the same path; **currently this is a loophole in Spark which needs to be fixed**. If this solution is not okay, then we can append `dbname.db` to the location given by the user. For example:

create database db1 location /user/hive/warehouse

then the location of the DB should be /user/hive/warehouse/db1.db



---




[GitHub] spark pull request #22466: [SPARK-25464][SQL]When database is dropped all th...

2018-09-19 Thread sandeep-katta
GitHub user sandeep-katta opened a pull request:

https://github.com/apache/spark/pull/22466

[SPARK-25464][SQL]When database is dropped all the data related to it is 
deleted

Modification content: if the database is external, it is not required to delete its content.

## What changes were proposed in this pull request?

If the user specifies the location while creating the database, it will be considered an external database. For an external database, the location will not be deleted on dropping the database.

For example:
create database db1 location '/user/hive/warehouse';
drop database db1;
The warehouse folder will not be deleted.
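
A self-contained sketch of the proposed drop semantics (the field names and the `removeFromCatalog`/`deleteDir` hooks are illustrative, not the PR's exact code):

```scala
case class DatabaseDef(name: String, locationUri: String, isExternal: Boolean)

// Drop removes the catalog entry either way, but only a managed
// (non-external) database also deletes its directory contents.
def dropDatabase(db: DatabaseDef, removeFromCatalog: String => Unit,
                 deleteDir: String => Unit): Unit = {
  removeFromCatalog(db.name)
  if (!db.isExternal) {
    deleteDir(db.locationUri)
  }
}
```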

## How was this patch tested?

Added test cases and also verified manually in the cluster.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandeep-katta/spark dropDatabse

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22466.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22466


commit c2eec35d3fb4c0402c4a243c389316aecb639f6f
Author: sandeep-katta 
Date:   2018-09-18T19:15:52Z

Root cause: when a database is dropped, all the data related to it is deleted.
Modification content: if the database is external, it is not required to delete its content.




---




[GitHub] spark issue #21565: [SPARK-24558][Core]wrong Idle Timeout value is used in c...

2018-06-27 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/21565
  
Yes, all the review comments are addressed.


---




[GitHub] spark pull request #21565: [SPARK-24558][Core]wrong Idle Timeout value is us...

2018-06-27 Thread sandeep-katta
Github user sandeep-katta commented on a diff in the pull request:

https://github.com/apache/spark/pull/21565#discussion_r198447523
  
--- Diff: 
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -488,9 +488,16 @@ private[spark] class ExecutorAllocationManager(
     newExecutorTotal = numExistingExecutors
     if (testing || executorsRemoved.nonEmpty) {
       executorsRemoved.foreach { removedExecutorId =>
+        // If it has cached blocks, the timeout is configured using
+        // spark.dynamicAllocation.cachedExecutorIdleTimeout
+        val idleTimeout = if (blockManagerMaster.hasCachedBlocks(removedExecutorId)) {
--- End diff --

blockManagerMaster already provides an API to check whether an executor has cached blocks, so I feel it would be overhead to maintain another HashMap.


---




[GitHub] spark issue #21565: [SPARK-24558][Core]wrong Idle Timeout value is used in c...

2018-06-27 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/21565
  
@cloud-fan the build has passed. Can you merge this PR?


---




[GitHub] spark issue #21565: [SPARK-24558][Core]wrong Idle Timeout value is used in c...

2018-06-21 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/21565
  
@cloud-fan can you please review this small piece of code and merge this PR?


---




[GitHub] spark pull request #21565: [SPARK-24558][Core]wrong Idle Timeout value is us...

2018-06-18 Thread sandeep-katta
Github user sandeep-katta commented on a diff in the pull request:

https://github.com/apache/spark/pull/21565#discussion_r195989897
  
--- Diff: 
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -488,9 +488,16 @@ private[spark] class ExecutorAllocationManager(
     newExecutorTotal = numExistingExecutors
     if (testing || executorsRemoved.nonEmpty) {
       executorsRemoved.foreach { removedExecutorId =>
+        // If it has cached blocks, the timeout is configured using
+        // spark.dynamicAllocation.cachedExecutorIdleTimeout
--- End diff --

@maropu, can you review and merge the PR if the changes are fine?


---




[GitHub] spark pull request #21565: [SPARK-24558][Core]wrong Idle Timeout value is us...

2018-06-15 Thread sandeep-katta
Github user sandeep-katta commented on a diff in the pull request:

https://github.com/apache/spark/pull/21565#discussion_r195895035
  
--- Diff: 
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -488,9 +488,16 @@ private[spark] class ExecutorAllocationManager(
     newExecutorTotal = numExistingExecutors
     if (testing || executorsRemoved.nonEmpty) {
       executorsRemoved.foreach { removedExecutorId =>
+        // If it has cached blocks, the timeout is configured using
+        // spark.dynamicAllocation.cachedExecutorIdleTimeout
+        val idleTimeout = if (SparkEnv.get.blockManager.master.hasCachedBlocks(removedExecutorId)) {
--- End diff --

I have updated the code as per the comments.


---




[GitHub] spark pull request #21565: [SPARK-24558][Core]wrong Idle Timeout value is us...

2018-06-15 Thread sandeep-katta
Github user sandeep-katta commented on a diff in the pull request:

https://github.com/apache/spark/pull/21565#discussion_r195894845
  
--- Diff: 
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -488,9 +488,16 @@ private[spark] class ExecutorAllocationManager(
     newExecutorTotal = numExistingExecutors
     if (testing || executorsRemoved.nonEmpty) {
       executorsRemoved.foreach { removedExecutorId =>
+        // If it has cached blocks, the timeout is configured using
+        // spark.dynamicAllocation.cachedExecutorIdleTimeout
--- End diff --

Yes, if the executor has cached blocks then the idle timeout is taken from spark.dynamicAllocation.cachedExecutorIdleTimeout.
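
A minimal sketch of the selection logic under discussion (`hasCachedBlocks` stands in for BlockManagerMaster.hasCachedBlocks; illustrative, not the PR's exact code):

```scala
// Pick the idle timeout that actually applies to this executor, so the
// "Removing executor ... idle for N seconds" log message reports the right N.
def idleTimeoutSecs(executorId: String,
                    hasCachedBlocks: String => Boolean,
                    cachedExecutorIdleTimeoutS: Long,
                    executorIdleTimeoutS: Long): Long =
  if (hasCachedBlocks(executorId)) cachedExecutorIdleTimeoutS
  else executorIdleTimeoutS
```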


---




[GitHub] spark pull request #21565: [SPARK-24558][Core]wrong Idle Timeout value is us...

2018-06-15 Thread sandeep-katta
Github user sandeep-katta commented on a diff in the pull request:

https://github.com/apache/spark/pull/21565#discussion_r195894823
  
--- Diff: 
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -488,9 +488,16 @@ private[spark] class ExecutorAllocationManager(
     newExecutorTotal = numExistingExecutors
     if (testing || executorsRemoved.nonEmpty) {
       executorsRemoved.foreach { removedExecutorId =>
+        // If it has cached blocks, the timeout is configured using
+        // spark.dynamicAllocation.cachedExecutorIdleTimeout
+        val idleTimeout = if (SparkEnv.get.blockManager.master.hasCachedBlocks(removedExecutorId)) {
--- End diff --

Yes, blockManagerMaster refers to the same SparkEnv.get.blockManager.master. I will update the code.


---




[GitHub] spark pull request #21565: wrong Idle Timeout value is used in case of the c...

2018-06-14 Thread sandeep-katta
GitHub user sandeep-katta opened a pull request:

https://github.com/apache/spark/pull/21565

Wrong idle timeout value is used in the case of cached blocks.

It is corrected as per the configuration.

## What changes were proposed in this pull request?
The idle timeout value printed in the logs is chosen based on whether the executor holds cached blocks: if it does, cachedExecutorIdleTimeoutS is considered, else executorIdleTimeoutS.

## How was this patch tested?
Manual Test
spark-sql> cache table sample;
2018-05-15 14:44:02 INFO  DAGScheduler:54 - Submitting 3 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[8] at processCmd at CliDriver.java:376) (first 15 tasks are for partitions Vector(0, 1, 2))
2018-05-15 14:44:02 INFO  YarnScheduler:54 - Adding task set 0.0 with 3 tasks
2018-05-15 14:44:03 INFO  ExecutorAllocationManager:54 - Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
...
...
2018-05-15 14:46:10 INFO  YarnClientSchedulerBackend:54 - Actual list of executor(s) to be killed is 1
2018-05-15 14:46:10 INFO  **ExecutorAllocationManager:54 - Removing executor 1 because it has been idle for 120 seconds (new desired total will be 0)**
2018-05-15 14:46:11 INFO  YarnSchedulerBackend$YarnDriverEndpoint:54 - Disabling executor 1.
2018-05-15 14:46:11 INFO  DAGScheduler:54 - Executor lost: 1 (epoch 1)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandeep-katta/spark loginfoBug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21565.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21565


commit 30fcef650ee2bd2873bf402448652acba055f989
Author: sandeep-katta 
Date:   2018-06-14T09:56:59Z

Wrong idle timeout value is used in the case of cached blocks;
it is corrected as per the configuration.




---
