[jira] [Commented] (SPARK-35531) Can not insert into hive bucket table if create table with upper case schema
[ https://issues.apache.org/jira/browse/SPARK-35531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843620#comment-17843620 ] Sandeep Katta commented on SPARK-35531: --- Bug is tracked here https://issues.apache.org/jira/browse/SPARK-48140 > Can not insert into hive bucket table if create table with upper case schema > > > Key: SPARK-35531 > URL: https://issues.apache.org/jira/browse/SPARK-35531 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.1.1, 3.2.0 >Reporter: Hongyi Zhang >Assignee: angerszhu >Priority: Major > Fix For: 3.3.0, 3.1.4 > > > > > create table TEST1( > V1 BIGINT, > S1 INT) > partitioned by (PK BIGINT) > clustered by (V1) > sorted by (S1) > into 200 buckets > STORED AS PARQUET; > > insert into test1 > select > * from values(1,1,1); > > > org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not > part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), > FieldSchema(name:s1, type:int, comment:null)] > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not > part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), > FieldSchema(name:s1, type:int, comment:null)] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48140) Can not alter bucketed table if create table with upper case schema
[ https://issues.apache.org/jira/browse/SPARK-48140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Katta updated SPARK-48140: -- Description: Running below SQL command throws exception CREATE TABLE TEST1( V1 BIGINT, S1 INT) PARTITIONED BY (PK BIGINT) CLUSTERED BY (V1) SORTED BY (S1) INTO 200 BUCKETS STORED AS PARQUET; ALTER TABLE test1 SET TBLPROPERTIES ('comment' = 'This is a new comment.'); *Exception:* {code:java} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), FieldSchema(name:s1, type:int, comment:null)] at org.apache.hadoop.hive.ql.metadata.Table.setBucketCols(Table.java:552) at org.apache.spark.sql.hive.client.HiveClientImpl$.toHiveTable(HiveClientImpl.scala:1145) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$alterTable$1(HiveClientImpl.scala:594) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:303) at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:234) at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:233) at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:283) at org.apache.spark.sql.hive.client.HiveClientImpl.alterTable(HiveClientImpl.scala:587) at org.apache.spark.sql.hive.client.HiveClient.alterTable(HiveClient.scala:124) at org.apache.spark.sql.hive.client.HiveClient.alterTable$(HiveClient.scala:123) at org.apache.spark.sql.hive.client.HiveClientImpl.alterTable(HiveClientImpl.scala:93) at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$alterTable$1(HiveExternalCatalog.scala:687) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99) ... 
62 more {code} was: Running below SQL command throws exception CREATE TABLE TEST1( V1 BIGINT, S1 INT) PARTITIONED BY (PK BIGINT) CLUSTERED BY (V1) SORTED BY (S1) INTO 200 BUCKETS STORED AS PARQUET; ALTER TABLE test1 SET TBLPROPERTIES ('comment' = 'This is a new comment.');
[jira] [Created] (SPARK-48140) Can not alter bucketed table if create table with upper case schema
Sandeep Katta created SPARK-48140: - Summary: Can not alter bucketed table if create table with upper case schema Key: SPARK-48140 URL: https://issues.apache.org/jira/browse/SPARK-48140 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0 Reporter: Sandeep Katta Running below SQL command throws exception CREATE TABLE TEST1( V1 BIGINT, S1 INT) PARTITIONED BY (PK BIGINT) CLUSTERED BY (V1) SORTED BY (S1) INTO 200 BUCKETS STORED AS PARQUET; ALTER TABLE test1 SET TBLPROPERTIES ('comment' = 'This is a new comment.');
[jira] [Commented] (SPARK-35531) Can not insert into hive bucket table if create table with upper case schema
[ https://issues.apache.org/jira/browse/SPARK-35531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843619#comment-17843619 ] Sandeep Katta commented on SPARK-35531: --- [~angerszhuuu] , I do see same issue in alter table command, I tested in SPARK-3.5.0 and issue still exists {code:java} CREATE TABLE TEST1( V1 BIGINT, S1 INT) PARTITIONED BY (PK BIGINT) CLUSTERED BY (V1) SORTED BY (S1) INTO 200 BUCKETS STORED AS PARQUET; ALTER TABLE test1 SET TBLPROPERTIES ('comment' = 'This is a new comment.'); {code} {code:java} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), FieldSchema(name:s1, type:int, comment:null)] at org.apache.hadoop.hive.ql.metadata.Table.setBucketCols(Table.java:552) at org.apache.spark.sql.hive.client.HiveClientImpl$.toHiveTable(HiveClientImpl.scala:1145) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$alterTable$1(HiveClientImpl.scala:594) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:303) at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:234) at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:233) at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:283) at org.apache.spark.sql.hive.client.HiveClientImpl.alterTable(HiveClientImpl.scala:587) at org.apache.spark.sql.hive.client.HiveClient.alterTable(HiveClient.scala:124) at org.apache.spark.sql.hive.client.HiveClient.alterTable$(HiveClient.scala:123) at org.apache.spark.sql.hive.client.HiveClientImpl.alterTable(HiveClientImpl.scala:93) at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$alterTable$1(HiveExternalCatalog.scala:687) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99) ... 62 more {code}
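The failure in both SPARK-35531 and SPARK-48140 comes down to a case-sensitive column lookup: the Hive metastore stores field names lowercased (as the FieldSchema(name:v1, ...) entries in the error show), while the bucket spec keeps the case written at CREATE TABLE time (V1). A minimal sketch of that mismatch — Python for illustration only; the helper name is hypothetical, not Spark or Hive code:

```python
# Hypothetical sketch of the mismatch behind the HiveException (not Spark's
# actual code): the metastore lowercases field names, but the bucket spec
# keeps the case used at CREATE TABLE time.
def bucket_cols_valid(bucket_cols, table_cols, case_sensitive=True):
    """Return True if every bucket column is found among the table columns."""
    if case_sensitive:
        cols = set(table_cols)
        return all(c in cols for c in bucket_cols)
    cols = {c.lower() for c in table_cols}
    return all(c.lower() in cols for c in bucket_cols)

table_cols = ["v1", "s1"]   # as stored by the Hive metastore
bucket_cols = ["V1"]        # as written in CLUSTERED BY (V1)

print(bucket_cols_valid(bucket_cols, table_cols))                        # False: "V1" not in {v1, s1}
print(bucket_cols_valid(bucket_cols, table_cols, case_sensitive=False))  # True: matches after lowercasing
```

The case-sensitive check is what rejects V1, which is why the same table created with lowercase column names does not hit the exception.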
[jira] [Commented] (SPARK-44265) Built-in XML data source support
[ https://issues.apache.org/jira/browse/SPARK-44265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17744535#comment-17744535 ] Sandeep Katta commented on SPARK-44265: --- [~sandip.agarwala] could you also update the SPIP link here, please? > Built-in XML data source support > > > Key: SPARK-44265 > URL: https://issues.apache.org/jira/browse/SPARK-44265 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.5.0 >Reporter: Sandip Agarwala >Priority: Major > > XML is a widely used data format. An external spark-xml package > ([https://github.com/databricks/spark-xml]) is available to read and write > XML data in spark. Making spark-xml built-in will provide a better user > experience for Spark SQL and structured streaming. The proposal is to inline > code from the spark-xml package.
[jira] [Resolved] (SPARK-39257) use spark.read.jdbc() to read data from SQL databse into dataframe, it fails silently, when the session is killed from SQL server side
[ https://issues.apache.org/jira/browse/SPARK-39257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Katta resolved SPARK-39257. --- Resolution: Not A Problem Issue is caused by the mssql-jdbc driver and is fixed in version 12.1.0 via PR [1942|https://github.com/microsoft/mssql-jdbc/pull/1942] > use spark.read.jdbc() to read data from SQL databse into dataframe, it fails > silently, when the session is killed from SQL server side > -- > > Key: SPARK-39257 > URL: https://issues.apache.org/jira/browse/SPARK-39257 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.1.2, 3.2.1 > Environment: {*}Spark version{*}: spark 3.0.1/3.1.2/3.2.1 > *Microsoft JDBC Driver* *for SQL server:* > mssql-jdbc-8.2.1.jre8/mssql-jdbc-10.2.1.jre8.jar >Reporter: Xinran Tao >Priority: Major > > I'm using *spark.read.jdbc()* to read from SQL database into a dataframe, > which utilizes *Microsoft JDBC Driver* *for SQL server* to get data from the > SQL server. 
> *codes:* > > {code:java} > %scala > val token = "xxx" > val jdbcHostname = "xinrandatabseserver.database.windows.net" > val jdbcDatabase = "xinranSQLDatabase" > val jdbcPort = 1433 > val jdbcUrl = > "jdbc:sqlserver://%s:%s;databaseName=%s;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net".format(jdbcHostname, > jdbcPort, jdbcDatabase)+ ";accessToken=" > import java.util.Properties > val connectionProperties = new Properties() > val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver" > connectionProperties.setProperty("Driver", driverClass) > connectionProperties.setProperty("accesstoken", token) > val sql_pushdown = "(select UNITS from payment_balance_new) emp_alias" > val df_stripe_dispute = spark.read.option("connectRetryCount", > 200).option("numPartitions",1).jdbc(url=jdbcUrl, table=sql_pushdown, > properties=connectionProperties) > df_stripe_dispute.count() > {code} > > > The session was accidentally killed by some automatic scripts from SQL server > side, but no errors show up from the spark side, and no failure was observed. > But from the count() result, the records read are far fewer than they should be. > > If I'm directly using *Microsoft JDBC Driver* *for SQL server* to run the > query and print the data out, which doesn't involve spark, a > connection reset error is thrown. 
> *codes:* > > {code:java} > %scala > import java.sql.DriverManager > import java.sql.Connection > import java.util.Properties; > val jdbcHostname = "xinrandatabseserver.database.windows.net" > val jdbcDatabase = "xinranSQLDatabase" > val jdbcPort = "1433" > val driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver" > val token = "" > val jdbcUrl = > "jdbc:sqlserver://%s:%s;databaseName=%s;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net".format(jdbcHostname, > jdbcPort, jdbcDatabase)+ ";accessToken="+token > > var connection:Connection = null > val info:Properties = new Properties(); > info.setProperty("accesstoken", token); > > // make the connection > Class.forName(driver) > connection = DriverManager.getConnection(jdbcUrl,info ) > // create the statement, and run the select query > val statement = connection.createStatement() > val resultSet = statement.executeQuery("select UNITS from > payment_balance_new") > while ( resultSet.next() ) { > println("__"+resultSet.getString(1)) > } > {code} > > *errors:* > > {code:java} > com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset > at > com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:2998) > at com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:2034) at > com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:6446) at > com.microsoft.sqlserver.jdbc.TDSReader.nextPacket(IOBuffer.java:6396) at > com.microsoft.sqlserver.jdbc.TDSReader.ensurePayload(IOBuffer.java:6374) at > com.microsoft.sqlserver.jdbc.TDSReader.readBytes(IOBuffer.java:6675) at > com.microsoft.sqlserver.jdbc.TDSReader.readWrappedBytes(IOBuffer.java:6696) > at com.microsoft.sqlserver.jdbc.TDSReader.readInt(IOBuffer.java:6645) at > com.microsoft.sqlserver.jdbc.TDSReader.readUnsignedInt(IOBuffer.java:6659) at > com.microsoft.sqlserver.jdbc.PLPInputStream.readBytesInternal(PLPInputStream.java:309) > at > 
com.microsoft.sqlserver.jdbc.PLPInputStream.getBytes(PLPInputStream.java:105) > at com.microsoft.sqlserver.jdbc.DDC.convertStreamToObject(DDC.java:757) at > com.microsoft.sqlserver.jdbc.ServerDTVImpl.getValue(dtv.java:3748) at > com.microsoft.sqlserver.jdbc.DTV.getValue(dtv.java:247) at > com.microsoft.sqlserver.jdbc.Column.ge
[jira] [Commented] (SPARK-39257) use spark.read.jdbc() to read data from SQL databse into dataframe, it fails silently, when the session is killed from SQL server side
[ https://issues.apache.org/jira/browse/SPARK-39257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643501#comment-17643501 ] Sandeep Katta commented on SPARK-39257: --- Closing this jira as this is fixed by *mssql-jdbc* PR [1942|https://github.com/microsoft/mssql-jdbc/pull/1942]
[jira] [Comment Edited] (SPARK-39257) use spark.read.jdbc() to read data from SQL databse into dataframe, it fails silently, when the session is killed from SQL server side
[ https://issues.apache.org/jira/browse/SPARK-39257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641736#comment-17641736 ] Sandeep Katta edited comment on SPARK-39257 at 12/1/22 8:02 AM: [~xinrantao] I believe you are facing the same issue as [https://github.com/microsoft/mssql-jdbc/issues/1846]. This was fixed in version [12.1.0|https://github.com/microsoft/mssql-jdbc/releases/tag/v12.1.0] by the *mssql-jdbc* team via PR [1942|https://github.com/microsoft/mssql-jdbc/pull/1942], so you can use the mssql-jdbc jar at version 12.1.0 to fix this issue. was (Author: sandeep.katta2007): [~xinrantao] I believe you are facing the same issue as [https://github.com/microsoft/mssql-jdbc/issues/1846]. And this is fixed by the *mssql-jdbc* team using PR [1942|https://github.com/microsoft/mssql-jdbc/pull/1942], which is available in the release [12.1.0|https://github.com/microsoft/mssql-jdbc/releases/tag/v12.1.0]. You can use this upgraded jar to solve this issue.
[jira] [Commented] (SPARK-39257) use spark.read.jdbc() to read data from SQL databse into dataframe, it fails silently, when the session is killed from SQL server side
[ https://issues.apache.org/jira/browse/SPARK-39257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641736#comment-17641736 ] Sandeep Katta commented on SPARK-39257: --- [~xinrantao] I believe you are facing the same issue as [https://github.com/microsoft/mssql-jdbc/issues/1846]. And this is fixed by the *mssql-jdbc* team using PR [1942|https://github.com/microsoft/mssql-jdbc/pull/1942], which is available in the release [12.1.0|https://github.com/microsoft/mssql-jdbc/releases/tag/v12.1.0]. You can use this upgraded jar to solve this issue.
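Until the fixed driver is in place, one defensive pattern (a sketch under assumptions — not part of Spark or the mssql-jdbc fix) is to cross-check the number of rows Spark read against an independent server-side COUNT(*) and fail loudly instead of silently:

```python
def verify_complete_read(spark_count: int, server_count: int) -> None:
    """Fail loudly when a JDBC read silently returned fewer rows than the
    server holds (e.g. because the session was killed mid-read)."""
    if spark_count != server_count:
        raise RuntimeError(
            f"Incomplete JDBC read: got {spark_count} rows, "
            f"server reports {server_count}"
        )

# Hypothetical usage: spark_count would come from df.count(), server_count
# from a separate "select count(*) from payment_balance_new" pushdown query.
verify_complete_read(100, 100)  # equal counts: passes silently
```

This does not prevent the truncation, but it converts the silent data loss described above into an explicit error.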
[jira] [Commented] (SPARK-41235) High-order function: array_compact
[ https://issues.apache.org/jira/browse/SPARK-41235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641315#comment-17641315 ] Sandeep Katta commented on SPARK-41235: --- I will be working on this > High-order function: array_compact > -- > > Key: SPARK-41235 > URL: https://issues.apache.org/jira/browse/SPARK-41235 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > > refer to > https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/api/snowflake.snowpark.functions.array_compact.html
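For context, the Snowflake function referenced above removes NULL elements from an array. A minimal Python sketch of those semantics (illustration only, assuming the usual NULL-in/NULL-out convention — not the eventual Spark implementation):

```python
def array_compact(arr):
    """Drop SQL NULL (None) elements; a NULL input is assumed to yield NULL."""
    if arr is None:
        return None
    return [x for x in arr if x is not None]

print(array_compact([1, None, 3, None]))  # [1, 3]
```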
[jira] [Commented] (SPARK-38474) Use error classes in org.apache.spark.security
[ https://issues.apache.org/jira/browse/SPARK-38474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542473#comment-17542473 ] Sandeep Katta commented on SPARK-38474: --- I will start working on this > Use error classes in org.apache.spark.security > -- > > Key: SPARK-38474 > URL: https://issues.apache.org/jira/browse/SPARK-38474 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Bo Zhang >Priority: Major >
[jira] [Created] (SPARK-37585) DSV2 InputMetrics are not getting updated in a corner case
Sandeep Katta created SPARK-37585: - Summary: DSV2 InputMetrics are not getting updated in a corner case Key: SPARK-37585 URL: https://issues.apache.org/jira/browse/SPARK-37585 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.1.2, 3.0.3 Reporter: Sandeep Katta In some corner cases, DSV2 does not update the input metrics. This is a very special case in which the number of records read is less than 1000 and *hasNext* is not called for the last element (because input.hasNext returns false, so MetricsIterator.hasNext is not called). The hasNext implementation of MetricsIterator: {code:java} override def hasNext: Boolean = { if (iter.hasNext) { true } else { metricsHandler.updateMetrics(0, force = true) false } } {code} You can reproduce this issue easily in spark-shell by running the code below {code:java} import scala.collection.mutable import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd} spark.conf.set("spark.sql.sources.useV1SourceList", "") val dir = "Users/tmp1" spark.range(0, 100).write.format("parquet").mode("overwrite").save(dir) val df = spark.read.format("parquet").load(dir) val bytesReads = new mutable.ArrayBuffer[Long]() val recordsRead = new mutable.ArrayBuffer[Long]() val bytesReadListener = new SparkListener() { override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = { bytesReads += taskEnd.taskMetrics.inputMetrics.bytesRead recordsRead += taskEnd.taskMetrics.inputMetrics.recordsRead } } spark.sparkContext.addSparkListener(bytesReadListener) try { df.limit(10).collect() assert(recordsRead.sum > 0) assert(bytesReads.sum > 0) } finally { spark.sparkContext.removeSparkListener(bytesReadListener) } {code} This code generally fails at *assert(bytesReads.sum > 0)*, which confirms that the updateMetrics API is not called. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
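The corner case above can be distilled outside Spark. Below is a minimal Python sketch (hypothetical class and method names, not Spark's actual implementation) of the same pattern: the final metrics flush lives only inside hasNext, so a limit-style consumer that stops early and never probes the exhausted iterator skips the flush entirely.

```python
# Sketch of the MetricsIterator pattern quoted above: metrics are
# force-flushed only when has_next() observes exhaustion of the
# underlying iterator.
class MetricsIterator:
    def __init__(self, items):
        self._it = iter(items)
        self._peeked = None
        self.flushed = False  # stand-in for metricsHandler.updateMetrics(0, force = true)

    def has_next(self):
        if self._peeked is not None:
            return True
        try:
            self._peeked = next(self._it)
            return True
        except StopIteration:
            self.flushed = True  # the only place the final flush happens
            return False

    def next(self):
        if self._peeked is not None:
            value, self._peeked = self._peeked, None
            return value
        return next(self._it)

# A limit-style consumer takes 2 of 5 elements and stops: it never calls
# has_next() on the exhausted iterator, so the flush never runs.
it = MetricsIterator([1, 2, 3, 4, 5])
taken = [it.next() for _ in range(2) if it.has_next()]
print(taken, it.flushed)  # [1, 2] False
```

In the repro above, df.limit(10).collect() plays the role of the early-stopping consumer: it reads 10 of the 100 records and never calls hasNext past the last element it needs, so the forced metrics update is skipped.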
[jira] [Created] (SPARK-37578) DSV2 is not updating Output Metrics
Sandeep Katta created SPARK-37578: - Summary: DSV2 is not updating Output Metrics Key: SPARK-37578 URL: https://issues.apache.org/jira/browse/SPARK-37578 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.1.2, 3.0.3 Reporter: Sandeep Katta Repro code ./bin/spark-shell --master local --jars /Users/jars/iceberg-spark3-runtime-0.12.1.jar {code:java} import scala.collection.mutable import org.apache.spark.scheduler._ val bytesWritten = new mutable.ArrayBuffer[Long]() val recordsWritten = new mutable.ArrayBuffer[Long]() val bytesWrittenListener = new SparkListener() { override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = { bytesWritten += taskEnd.taskMetrics.outputMetrics.bytesWritten recordsWritten += taskEnd.taskMetrics.outputMetrics.recordsWritten } } spark.sparkContext.addSparkListener(bytesWrittenListener) try { val df = spark.range(1000).toDF("id") df.write.format("iceberg").save("Users/data/dsv2_test") assert(bytesWritten.sum > 0) assert(recordsWritten.sum > 0) } finally { spark.sparkContext.removeSparkListener(bytesWrittenListener) } {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-35272) org.apache.spark.SparkException: Task not serializable
[ https://issues.apache.org/jira/browse/SPARK-35272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338301#comment-17338301 ] Sandeep Katta edited comment on SPARK-35272 at 5/3/21, 10:41 AM: - That is correct if I am broadcasting *NonSerializable,* but in this case I am broadcasting *Student* which is serialized was (Author: sandeep.katta2007): That is correct if I am broadcasting *NonSerializable,* but in this case I am broadcasting Student which is serialized > org.apache.spark.SparkException: Task not serializable > -- > > Key: SPARK-35272 > URL: https://issues.apache.org/jira/browse/SPARK-35272 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Sandeep Katta >Priority: Major > Attachments: ExceptionStack.txt > > > I am getting NotSerializableException when broadcasting Serialized class. > > Minimal code with which you can reproduce this issue > > {code:java} > case class Student(name: String) > class NonSerializable() { > def getText() : String = { > """ > |[ > |{ > | "name": "test1" > |}, > |{ > | "name": "test2" > |} > |] > |""".stripMargin > } > } > val obj = new NonSerializable() > val descriptors_string = obj.getText() > import com.github.plokhotnyuk.jsoniter_scala.macros._ > import com.github.plokhotnyuk.jsoniter_scala.core._ > val parsed_descriptors: Array[Student] = > readFromString[Array[Student]](descriptors_string)(JsonCodecMaker.make) > val broadcast_descriptors = spark.sparkContext.broadcast(parsed_descriptors) > def foo(data: String): Seq[Any] = { > import scala.collection.mutable.ArrayBuffer > val res = new ArrayBuffer[String]() > for (desc <- broadcast_descriptors.value) { > res += desc.name > } > res > } > val data = spark.sparkContext.parallelize(Array("1", "2", "3")).map(x => > foo(x)).collect() > {code} > Command used to start spark-shell > {code:java} > ./bin/spark-shell --master local --jars > 
/Users/sandeep.katta/.m2/repository/com/github/plokhotnyuk/jsoniter-scala/jsoniter-scala-macros_2.12/2.7.1/jsoniter-scala-macros_2.12-2.7.1.jar,/Users/sandeep.katta/.m2/repository/com/github/plokhotnyuk/jsoniter-scala/jsoniter-scala-core_2.12/2.7.1/jsoniter-scala-core_2.12-2.7.1.jar > {code} > *Exception Details* > {code:java} > org.apache.spark.SparkException: Task not serializable > at > org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:416) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162) > at org.apache.spark.SparkContext.clean(SparkContext.scala:2362) > at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:396) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:388) > at org.apache.spark.rdd.RDD.map(RDD.scala:395) > ... 51 elided > Caused by: java.io.NotSerializableException: NonSerializable > Serialization stack: > - object not serializable (class: NonSerializable, value: > NonSerializable@7c95440d) > - field (class: $iw, name: obj, type: class NonSerializable) > - object (class $iw, $iw@3ed476a8) > - field (class: $iw, name: $iw, type: class $iw) > - object (class $iw, $iw@381f7b8e) > - field (class: $iw, name: $iw, type: class $iw) > - object (class $iw, $iw@23635e39) > - field (class: $iw, name: $iw, type: class $iw) > - object (class $iw, $iw@1a8b3791) > - field (class: $iw, name: $iw, type: class $iw) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35272) org.apache.spark.SparkException: Task not serializable
[ https://issues.apache.org/jira/browse/SPARK-35272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338301#comment-17338301 ] Sandeep Katta commented on SPARK-35272: --- That is correct if I am broadcasting *NonSerializable,* but in this case I am broadcasting Student which is serialized > org.apache.spark.SparkException: Task not serializable > -- > > Key: SPARK-35272 > URL: https://issues.apache.org/jira/browse/SPARK-35272 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Sandeep Katta >Priority: Major > Attachments: ExceptionStack.txt > > > I am getting NotSerializableException when broadcasting Serialized class. > > Minimal code with which you can reproduce this issue > > {code:java} > case class Student(name: String) > class NonSerializable() { > def getText() : String = { > """ > |[ > |{ > | "name": "test1" > |}, > |{ > | "name": "test2" > |} > |] > |""".stripMargin > } > } > val obj = new NonSerializable() > val descriptors_string = obj.getText() > import com.github.plokhotnyuk.jsoniter_scala.macros._ > import com.github.plokhotnyuk.jsoniter_scala.core._ > val parsed_descriptors: Array[Student] = > readFromString[Array[Student]](descriptors_string)(JsonCodecMaker.make) > val broadcast_descriptors = spark.sparkContext.broadcast(parsed_descriptors) > def foo(data: String): Seq[Any] = { > import scala.collection.mutable.ArrayBuffer > val res = new ArrayBuffer[String]() > for (desc <- broadcast_descriptors.value) { > res += desc.name > } > res > } > val data = spark.sparkContext.parallelize(Array("1", "2", "3")).map(x => > foo(x)).collect() > {code} > Command used to start spark-shell > {code:java} > ./bin/spark-shell --master local --jars > 
/Users/sandeep.katta/.m2/repository/com/github/plokhotnyuk/jsoniter-scala/jsoniter-scala-macros_2.12/2.7.1/jsoniter-scala-macros_2.12-2.7.1.jar,/Users/sandeep.katta/.m2/repository/com/github/plokhotnyuk/jsoniter-scala/jsoniter-scala-core_2.12/2.7.1/jsoniter-scala-core_2.12-2.7.1.jar > {code} > *Exception Details* > {code:java} > org.apache.spark.SparkException: Task not serializable > at > org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:416) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162) > at org.apache.spark.SparkContext.clean(SparkContext.scala:2362) > at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:396) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:388) > at org.apache.spark.rdd.RDD.map(RDD.scala:395) > ... 51 elided > Caused by: java.io.NotSerializableException: NonSerializable > Serialization stack: > - object not serializable (class: NonSerializable, value: > NonSerializable@7c95440d) > - field (class: $iw, name: obj, type: class NonSerializable) > - object (class $iw, $iw@3ed476a8) > - field (class: $iw, name: $iw, type: class $iw) > - object (class $iw, $iw@381f7b8e) > - field (class: $iw, name: $iw, type: class $iw) > - object (class $iw, $iw@23635e39) > - field (class: $iw, name: $iw, type: class $iw) > - object (class $iw, $iw@1a8b3791) > - field (class: $iw, name: $iw, type: class $iw) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-35272) org.apache.spark.SparkException: Task not serializable
[ https://issues.apache.org/jira/browse/SPARK-35272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338297#comment-17338297 ] Sandeep Katta edited comment on SPARK-35272 at 5/3/21, 10:30 AM: - This is the minimal code with which we can reproduce this issue; in reality this *NonSerializable* class contains objects from a 3rd-party library which cannot be *serialized*. This issue can also be solved by using the *transient* keyword like below, {code:java} @transient val obj = new NonSerializable() val descriptors_string = obj.getText() {code} But this does not make sense, as I am broadcasting the *Seq[Student]*, not *NonSerializable* was (Author: sandeep.katta2007): This is the minimal code with which we can reproduce this issue, in reality this *NonSerializable* class contains objects to 3rd party library which cannot be *serialized* > org.apache.spark.SparkException: Task not serializable > -- > > Key: SPARK-35272 > URL: https://issues.apache.org/jira/browse/SPARK-35272 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Sandeep Katta >Priority: Major > Attachments: ExceptionStack.txt > > > I am getting NotSerializableException when broadcasting Serialized class. 
> > Minimal code with which you can reproduce this issue > > {code:java} > case class Student(name: String) > class NonSerializable() { > def getText() : String = { > """ > |[ > |{ > | "name": "test1" > |}, > |{ > | "name": "test2" > |} > |] > |""".stripMargin > } > } > val obj = new NonSerializable() > val descriptors_string = obj.getText() > import com.github.plokhotnyuk.jsoniter_scala.macros._ > import com.github.plokhotnyuk.jsoniter_scala.core._ > val parsed_descriptors: Array[Student] = > readFromString[Array[Student]](descriptors_string)(JsonCodecMaker.make) > val broadcast_descriptors = spark.sparkContext.broadcast(parsed_descriptors) > def foo(data: String): Seq[Any] = { > import scala.collection.mutable.ArrayBuffer > val res = new ArrayBuffer[String]() > for (desc <- broadcast_descriptors.value) { > res += desc.name > } > res > } > val data = spark.sparkContext.parallelize(Array("1", "2", "3")).map(x => > foo(x)).collect() > {code} > Command used to start spark-shell > {code:java} > ./bin/spark-shell --master local --jars > /Users/sandeep.katta/.m2/repository/com/github/plokhotnyuk/jsoniter-scala/jsoniter-scala-macros_2.12/2.7.1/jsoniter-scala-macros_2.12-2.7.1.jar,/Users/sandeep.katta/.m2/repository/com/github/plokhotnyuk/jsoniter-scala/jsoniter-scala-core_2.12/2.7.1/jsoniter-scala-core_2.12-2.7.1.jar > {code} > *Exception Details* > {code:java} > org.apache.spark.SparkException: Task not serializable > at > org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:416) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162) > at org.apache.spark.SparkContext.clean(SparkContext.scala:2362) > at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:396) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at 
org.apache.spark.rdd.RDD.withScope(RDD.scala:388) > at org.apache.spark.rdd.RDD.map(RDD.scala:395) > ... 51 elided > Caused by: java.io.NotSerializableException: NonSerializable > Serialization stack: > - object not serializable (class: NonSerializable, value: > NonSerializable@7c95440d) > - field (class: $iw, name: obj, type: class NonSerializable) > - object (class $iw, $iw@3ed476a8) > - field (class: $iw, name: $iw, type: class $iw) > - object (class $iw, $iw@381f7b8e) > - field (class: $iw, name: $iw, type: class $iw) > - object (class $iw, $iw@23635e39) > - field (class: $iw, name: $iw, type: class $iw) > - object (class $iw, $iw@1a8b3791) > - field (class: $iw, name: $iw, type: class $iw) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
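The behaviour discussed above (the broadcast value is serializable, yet the job fails on an unrelated REPL val) is characteristic of closure capture: the shell wraps every top-level val in a wrapper object ($iw in the stack trace), and the task closure drags the whole wrapper, including *obj*, into serialization. A minimal Python sketch of the same mechanism, with pickle standing in for Spark's closure serializer (class names are hypothetical):

```python
import pickle

class NonSerializable:
    # Simulate a 3rd-party handle that refuses serialization.
    def __reduce__(self):
        raise TypeError("NonSerializable cannot be pickled")

class ReplWrapper:
    """Stand-in for the REPL's $iw wrapper holding every top-level val."""
    def __init__(self):
        self.obj = NonSerializable()        # the offending field
        self.students = ["test1", "test2"]  # the value we actually broadcast

wrapper = ReplWrapper()

# Serializing the wrapper fails because of a field the closure never uses...
try:
    pickle.dumps(wrapper)
    wrapper_ok = True
except TypeError:
    wrapper_ok = False

# ...while the extracted value alone (the effect of marking obj @transient,
# or of copying the value into a local) serializes fine.
students_ok = pickle.loads(pickle.dumps(wrapper.students)) == ["test1", "test2"]

print(wrapper_ok, students_ok)  # False True
```

This is why the @transient workaround succeeds: it keeps *obj* out of the serialized wrapper even though the broadcast only ever referenced the parsed Student array.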
[jira] [Commented] (SPARK-35272) org.apache.spark.SparkException: Task not serializable
[ https://issues.apache.org/jira/browse/SPARK-35272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338297#comment-17338297 ] Sandeep Katta commented on SPARK-35272: --- This is the minimal code with which we can reproduce this issue, in reality this *NonSerializable* class contains objects to 3rd party library which cannot be *serialized* > org.apache.spark.SparkException: Task not serializable > -- > > Key: SPARK-35272 > URL: https://issues.apache.org/jira/browse/SPARK-35272 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Sandeep Katta >Priority: Major > Attachments: ExceptionStack.txt > > > I am getting NotSerializableException when broadcasting Serialized class. > > Minimal code with which you can reproduce this issue > > {code:java} > case class Student(name: String) > class NonSerializable() { > def getText() : String = { > """ > |[ > |{ > | "name": "test1" > |}, > |{ > | "name": "test2" > |} > |] > |""".stripMargin > } > } > val obj = new NonSerializable() > val descriptors_string = obj.getText() > import com.github.plokhotnyuk.jsoniter_scala.macros._ > import com.github.plokhotnyuk.jsoniter_scala.core._ > val parsed_descriptors: Array[Student] = > readFromString[Array[Student]](descriptors_string)(JsonCodecMaker.make) > val broadcast_descriptors = spark.sparkContext.broadcast(parsed_descriptors) > def foo(data: String): Seq[Any] = { > import scala.collection.mutable.ArrayBuffer > val res = new ArrayBuffer[String]() > for (desc <- broadcast_descriptors.value) { > res += desc.name > } > res > } > val data = spark.sparkContext.parallelize(Array("1", "2", "3")).map(x => > foo(x)).collect() > {code} > Command used to start spark-shell > {code:java} > ./bin/spark-shell --master local --jars > 
/Users/sandeep.katta/.m2/repository/com/github/plokhotnyuk/jsoniter-scala/jsoniter-scala-macros_2.12/2.7.1/jsoniter-scala-macros_2.12-2.7.1.jar,/Users/sandeep.katta/.m2/repository/com/github/plokhotnyuk/jsoniter-scala/jsoniter-scala-core_2.12/2.7.1/jsoniter-scala-core_2.12-2.7.1.jar > {code} > *Exception Details* > {code:java} > org.apache.spark.SparkException: Task not serializable > at > org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:416) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162) > at org.apache.spark.SparkContext.clean(SparkContext.scala:2362) > at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:396) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:388) > at org.apache.spark.rdd.RDD.map(RDD.scala:395) > ... 51 elided > Caused by: java.io.NotSerializableException: NonSerializable > Serialization stack: > - object not serializable (class: NonSerializable, value: > NonSerializable@7c95440d) > - field (class: $iw, name: obj, type: class NonSerializable) > - object (class $iw, $iw@3ed476a8) > - field (class: $iw, name: $iw, type: class $iw) > - object (class $iw, $iw@381f7b8e) > - field (class: $iw, name: $iw, type: class $iw) > - object (class $iw, $iw@23635e39) > - field (class: $iw, name: $iw, type: class $iw) > - object (class $iw, $iw@1a8b3791) > - field (class: $iw, name: $iw, type: class $iw) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35272) org.apache.spark.SparkException: Task not serializable
Sandeep Katta created SPARK-35272: - Summary: org.apache.spark.SparkException: Task not serializable Key: SPARK-35272 URL: https://issues.apache.org/jira/browse/SPARK-35272 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.0.1 Reporter: Sandeep Katta Attachments: ExceptionStack.txt I am getting NotSerializableException when broadcasting Serialized class. Minimal code with which you can reproduce this issue {code:java} case class Student(name: String) class NonSerializable() { def getText() : String = { """ |[ |{ | "name": "test1" |}, |{ | "name": "test2" |} |] |""".stripMargin } } val obj = new NonSerializable() val descriptors_string = obj.getText() import com.github.plokhotnyuk.jsoniter_scala.macros._ import com.github.plokhotnyuk.jsoniter_scala.core._ val parsed_descriptors: Array[Student] = readFromString[Array[Student]](descriptors_string)(JsonCodecMaker.make) val broadcast_descriptors = spark.sparkContext.broadcast(parsed_descriptors) def foo(data: String): Seq[Any] = { import scala.collection.mutable.ArrayBuffer val res = new ArrayBuffer[String]() for (desc <- broadcast_descriptors.value) { res += desc.name } res } val data = spark.sparkContext.parallelize(Array("1", "2", "3")).map(x => foo(x)).collect() {code} Command used to start spark-shell {code:java} ./bin/spark-shell --master local --jars /Users/sandeep.katta/.m2/repository/com/github/plokhotnyuk/jsoniter-scala/jsoniter-scala-macros_2.12/2.7.1/jsoniter-scala-macros_2.12-2.7.1.jar,/Users/sandeep.katta/.m2/repository/com/github/plokhotnyuk/jsoniter-scala/jsoniter-scala-core_2.12/2.7.1/jsoniter-scala-core_2.12-2.7.1.jar {code} *Exception Details* {code:java} org.apache.spark.SparkException: Task not serializable at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:416) at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406) at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162) at 
org.apache.spark.SparkContext.clean(SparkContext.scala:2362) at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:396) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:388) at org.apache.spark.rdd.RDD.map(RDD.scala:395) ... 51 elided Caused by: java.io.NotSerializableException: NonSerializable Serialization stack: - object not serializable (class: NonSerializable, value: NonSerializable@7c95440d) - field (class: $iw, name: obj, type: class NonSerializable) - object (class $iw, $iw@3ed476a8) - field (class: $iw, name: $iw, type: class $iw) - object (class $iw, $iw@381f7b8e) - field (class: $iw, name: $iw, type: class $iw) - object (class $iw, $iw@23635e39) - field (class: $iw, name: $iw, type: class $iw) - object (class $iw, $iw@1a8b3791) - field (class: $iw, name: $iw, type: class $iw) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35272) org.apache.spark.SparkException: Task not serializable
[ https://issues.apache.org/jira/browse/SPARK-35272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Katta updated SPARK-35272: -- Attachment: ExceptionStack.txt > org.apache.spark.SparkException: Task not serializable > -- > > Key: SPARK-35272 > URL: https://issues.apache.org/jira/browse/SPARK-35272 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Sandeep Katta >Priority: Major > Attachments: ExceptionStack.txt > > > I am getting NotSerializableException when broadcasting Serialized class. > > Minimal code with which you can reproduce this issue > > {code:java} > case class Student(name: String) > class NonSerializable() { > def getText() : String = { > """ > |[ > |{ > | "name": "test1" > |}, > |{ > | "name": "test2" > |} > |] > |""".stripMargin > } > } > val obj = new NonSerializable() > val descriptors_string = obj.getText() > import com.github.plokhotnyuk.jsoniter_scala.macros._ > import com.github.plokhotnyuk.jsoniter_scala.core._ > val parsed_descriptors: Array[Student] = > readFromString[Array[Student]](descriptors_string)(JsonCodecMaker.make) > val broadcast_descriptors = spark.sparkContext.broadcast(parsed_descriptors) > def foo(data: String): Seq[Any] = { > import scala.collection.mutable.ArrayBuffer > val res = new ArrayBuffer[String]() > for (desc <- broadcast_descriptors.value) { > res += desc.name > } > res > } > val data = spark.sparkContext.parallelize(Array("1", "2", "3")).map(x => > foo(x)).collect() > {code} > Command used to start spark-shell > {code:java} > ./bin/spark-shell --master local --jars > /Users/sandeep.katta/.m2/repository/com/github/plokhotnyuk/jsoniter-scala/jsoniter-scala-macros_2.12/2.7.1/jsoniter-scala-macros_2.12-2.7.1.jar,/Users/sandeep.katta/.m2/repository/com/github/plokhotnyuk/jsoniter-scala/jsoniter-scala-core_2.12/2.7.1/jsoniter-scala-core_2.12-2.7.1.jar > {code} > *Exception Details* > {code:java} > org.apache.spark.SparkException: Task not 
serializable > at > org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:416) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162) > at org.apache.spark.SparkContext.clean(SparkContext.scala:2362) > at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:396) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:388) > at org.apache.spark.rdd.RDD.map(RDD.scala:395) > ... 51 elided > Caused by: java.io.NotSerializableException: NonSerializable > Serialization stack: > - object not serializable (class: NonSerializable, value: > NonSerializable@7c95440d) > - field (class: $iw, name: obj, type: class NonSerializable) > - object (class $iw, $iw@3ed476a8) > - field (class: $iw, name: $iw, type: class $iw) > - object (class $iw, $iw@381f7b8e) > - field (class: $iw, name: $iw, type: class $iw) > - object (class $iw, $iw@23635e39) > - field (class: $iw, name: $iw, type: class $iw) > - object (class $iw, $iw@1a8b3791) > - field (class: $iw, name: $iw, type: class $iw) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35096) foreachBatch throws ArrayIndexOutOfBoundsException if schema is case-insensitive
[ https://issues.apache.org/jira/browse/SPARK-35096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322360#comment-17322360 ] Sandeep Katta commented on SPARK-35096: --- Working on fix, soon raise PR for this > foreachBatch throws ArrayIndexOutOfBoundsException if schema is case > Insensitive > > > Key: SPARK-35096 > URL: https://issues.apache.org/jira/browse/SPARK-35096 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Sandeep Katta >Priority: Major > > Below code works fine before spark3, running on spark3 throws > java.lang.ArrayIndexOutOfBoundsException > {code:java} > val inputPath = "/Users/xyz/data/testcaseInsensitivity" > val output_path = "/Users/xyz/output" > spark.range(10).write.format("parquet").save(inputPath) > def process_row(microBatch: DataFrame, batchId: Long): Unit = { > val df = microBatch.select($"ID".alias("other")) // Doesn't work > df.write.format("parquet").mode("append").save(output_path) > } > val schema = new StructType().add("id", LongType) > val stream_df = > spark.readStream.schema(schema).format("parquet").load(inputPath) > stream_df.writeStream.trigger(Trigger.Once).foreachBatch(process_row _) > .start().awaitTermination() > {code} > Stack Trace: > {code:java} > Caused by: java.lang.ArrayIndexOutOfBoundsException: 0 > at org.apache.spark.sql.types.StructType.apply(StructType.scala:414) > at > org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning$$anonfun$apply$4.$anonfun$applyOrElse$3(objects.scala:216) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > 
org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning$$anonfun$apply$4.applyOrElse(objects.scala:215) > at > org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning$$anonfun$apply$4.applyOrElse(objects.scala:203) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:149) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:147) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:399) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:397) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:350) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:149) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:147) > at > 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:298) > at > org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning$.apply(objects.scala:203) > at > org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning$.apply(objects.scala:121) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:149) > at > scala.collection.IndexedSeqOptimized.foldLeft(IndexedSeqOptimized.scala:60) > at > scala.collection.IndexedSeqOptimized.foldLeft$(IndexedSeqOptimized.scala:68) > at scala.collection.mutable.Wrap
[jira] [Created] (SPARK-35096) foreachBatch throws ArrayIndexOutOfBoundsException if schema is case-insensitive
Sandeep Katta created SPARK-35096: - Summary: foreachBatch throws ArrayIndexOutOfBoundsException if schema is case Insensitive Key: SPARK-35096 URL: https://issues.apache.org/jira/browse/SPARK-35096 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.0.0 Reporter: Sandeep Katta Below code works fine before spark3, running on spark3 throws java.lang.ArrayIndexOutOfBoundsException {code:java} val inputPath = "/Users/xyz/data/testcaseInsensitivity" val output_path = "/Users/xyz/output" spark.range(10).write.format("parquet").save(inputPath) def process_row(microBatch: DataFrame, batchId: Long): Unit = { val df = microBatch.select($"ID".alias("other")) // Doesn't work df.write.format("parquet").mode("append").save(output_path) } val schema = new StructType().add("id", LongType) val stream_df = spark.readStream.schema(schema).format("parquet").load(inputPath) stream_df.writeStream.trigger(Trigger.Once).foreachBatch(process_row _) .start().awaitTermination() {code} Stack Trace: {code:java} Caused by: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.spark.sql.types.StructType.apply(StructType.scala:414) at org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning$$anonfun$apply$4.$anonfun$applyOrElse$3(objects.scala:216) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) at scala.collection.immutable.List.foreach(List.scala:392) at scala.collection.TraversableLike.map(TraversableLike.scala:238) at scala.collection.TraversableLike.map$(TraversableLike.scala:231) at scala.collection.immutable.List.map(List.scala:298) at org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning$$anonfun$apply$4.applyOrElse(objects.scala:215) at org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning$$anonfun$apply$4.applyOrElse(objects.scala:203) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:309) at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:309) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:149) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:147) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:314) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:399) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:397) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:350) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:314) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:149) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:147) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:298) at org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning$.apply(objects.scala:203) at 
org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning$.apply(objects.scala:121) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:149) at scala.collection.IndexedSeqOptimized.foldLeft(IndexedSeqOptimized.scala:60) at scala.collection.IndexedSeqOptimized.foldLeft$(IndexedSeqOptimized.scala:68) at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:38) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:146) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:138) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:138) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.sca
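For context, the failing select resolves $"ID" case-insensitively during analysis, but the serializer-pruning step appears to match field names exactly, so the lookup against the user-supplied schema (which declares "id") comes up empty and indexing into the result throws. A minimal, Spark-free Java sketch of that general failure mode (the helper below is illustrative, not Spark's actual code):

```java
public class CaseSensitiveLookup {
    // Exact-case field lookup, analogous to resolving a selected column
    // name against the user-supplied schema's field names.
    static int fieldIndex(String[] fields, String name) {
        for (int i = 0; i < fields.length; i++) {
            if (fields[i].equals(name)) return i;   // exact-case match only
        }
        return -1;                                  // no field matched
    }

    public static void main(String[] args) {
        String[] schema = {"id"};
        long[] row = {42L};
        // Matching case works as expected.
        System.out.println(row[fieldIndex(schema, "id")]);
        // Mismatched case yields -1, and indexing with it throws,
        // mirroring the ArrayIndexOutOfBoundsException above.
        try {
            System.out.println(row[fieldIndex(schema, "ID")]);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("threw " + e.getClass().getSimpleName());
        }
    }
}
```

A workaround along these lines is to reference the column with the same case used in the supplied schema (e.g. `$"id"` instead of `$"ID"`).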
[jira] [Comment Edited] (SPARK-21564) TaskDescription decoding failure should fail the task
[ https://issues.apache.org/jira/browse/SPARK-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17294347#comment-17294347 ] Sandeep Katta edited comment on SPARK-21564 at 3/3/21, 7:47 AM: I recently hit this decode error. The irony is that all the tasks in the same task set were able to deserialize and only one task failed, which suggests the data was corrupted. Most of the time it is very difficult to analyze why the data got corrupted, so for this kind of intermittent issue exception handling should be in place to achieve fault tolerance. {code:java} 21/02/11 07:53:39 ERROR Inbox: Ignoring errorjava.io.UTFDataFormatException: malformed input around byte 5 at java.io.DataInputStream.readUTF(DataInputStream.java:656) at java.io.DataInputStream.readUTF(DataInputStream.java:564) at org.apache.spark.scheduler.TaskDescription$$anonfun$deserializeStringLongMap$1.apply$mcVI$sp(TaskDescription.scala:110) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) at org.apache.spark.scheduler.TaskDescription$.deserializeStringLongMap(TaskDescription.scala:109) at org.apache.spark.scheduler.TaskDescription$.decode(TaskDescription.scala:125) at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:100) at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205) at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101) at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:226) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748){code} !image-2021-03-03-13-02-31-744.png!
So it's better to have fault tolerance in place: the Spark driver has no idea this exception occurred, so it keeps waiting for the task to complete and the job sits in a zombie state. CC [~dongjoon] [~hyukjin.kwon] [~cloud_fan] tagging you guys for more traction was (Author: sandeep.katta2007): Recently I have hit with decode error, irony is all the tasks in the same taskset were able to deserialize but only 1 task is failed . Which says that the data is corrupted, most of the times it will be very difficult to analyze why the data is corrupted , so for these kind of intermittent issue exception handling should be in place to achieve fault tolerant *21/02/11 07:53:39 ERROR Inbox: Ignoring errorjava.io.UTFDataFormatException: malformed input around byte 5 at* java.io.DataInputStream.readUTF(DataInputStream.java:656) at java.io.DataInputStream.readUTF(DataInputStream.java:564) at org.apache.spark.scheduler.TaskDescription$$anonfun$deserializeStringLongMap$1.apply$mcVI$sp(TaskDescription.scala:110) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) at org.apache.spark.scheduler.TaskDescription$.deserializeStringLongMap(TaskDescription.scala:109) at org.apache.spark.scheduler.TaskDescription$.decode(TaskDescription.scala:125) at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:100) at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205) at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101) at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:226) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) !image-2021-03-03-13-02-31-744.png!
So it's better to have fault tolerant in place, Spark Driver does not have any idea about this exception so it still waits for this task to complete, thus the job is in zombie stage CC [~dongjoon] [~hyukjin.kwon] [~cloud_fan] tagging you guys for more traction > TaskDescription decoding failure should fail the task > - > > Key: SPARK-21564 > URL: https://issues.apache.org/jira/browse/SPARK-21564 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0, 2.4.7, 3.1.1 >Reporter: Andrew Ash >Priority: Major > Labels: bulk-closed > Attachments: image-2021-03-03-13-02-06-669.png, > image-2021-03-03-13-02-31-744.png > > > cc [~robert3005] > I was seeing an issue where Spark was throwing this exception: > {noformat} > 16:16:28.294 [dispatcher-event-loop-14] ERROR > org.apache.spark.rpc.netty.Inbox - Ignoring error > java.io.EOFException: null > at
[jira] [Updated] (SPARK-21564) TaskDescription decoding failure should fail the task
[ https://issues.apache.org/jira/browse/SPARK-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Katta updated SPARK-21564: -- Affects Version/s: 2.4.7 3.1.1 > TaskDescription decoding failure should fail the task > - > > Key: SPARK-21564 > URL: https://issues.apache.org/jira/browse/SPARK-21564 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0, 2.4.7, 3.1.1 >Reporter: Andrew Ash >Priority: Major > Labels: bulk-closed > Attachments: image-2021-03-03-13-02-06-669.png, > image-2021-03-03-13-02-31-744.png > > > cc [~robert3005] > I was seeing an issue where Spark was throwing this exception: > {noformat} > 16:16:28.294 [dispatcher-event-loop-14] ERROR > org.apache.spark.rpc.netty.Inbox - Ignoring error > java.io.EOFException: null > at java.io.DataInputStream.readFully(DataInputStream.java:197) > at java.io.DataInputStream.readUTF(DataInputStream.java:609) > at java.io.DataInputStream.readUTF(DataInputStream.java:564) > at > org.apache.spark.scheduler.TaskDescription$$anonfun$decode$1.apply(TaskDescription.scala:127) > at > org.apache.spark.scheduler.TaskDescription$$anonfun$decode$1.apply(TaskDescription.scala:126) > at scala.collection.immutable.Range.foreach(Range.scala:160) > at > org.apache.spark.scheduler.TaskDescription$.decode(TaskDescription.scala:126) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:95) > at > org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117) > at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205) > at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101) > at > org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:213) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) > 
{noformat} > For details on the cause of that exception, see SPARK-21563 > We've since changed the application and have a proposed fix in Spark at the > ticket above, but it was troubling that decoding the TaskDescription wasn't > failing the tasks. So the Spark job ended up hanging and making no progress, > permanently stuck, because the driver thinks the task is running but the > thread has died in the executor. > We should make a change around > https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L96 > so that when that decode throws an exception, the task is marked as failed. > cc [~kayousterhout] [~irashid] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
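The proposal above can be illustrated with a small, self-contained sketch. This is hypothetical (the method names, status strings, and reporting mechanism are illustrative, not Spark's actual executor API): the point is that a payload that fails to decode should surface as an explicit task failure rather than dying silently in the message loop while the driver keeps waiting.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class DecodeGuard {
    // Stand-in for TaskDescription.decode: readUTF throws
    // UTFDataFormatException or EOFException on corrupted payloads.
    static String decode(byte[] payload) throws IOException {
        return new DataInputStream(new ByteArrayInputStream(payload)).readUTF();
    }

    // Instead of letting the exception escape into the message loop and be
    // logged away, translate it into an explicit task-failure status that
    // could be reported back to the driver.
    static String launchTask(byte[] payload) {
        try {
            return "RUNNING:" + decode(payload);
        } catch (IOException e) {
            return "FAILED:" + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        byte[] ok  = {0, 2, 'o', 'k'};     // well-formed: length 2, then "ok"
        byte[] bad = {0, 1, (byte) 0xC0};  // truncated multi-byte UTF sequence
        System.out.println(launchTask(ok));   // RUNNING:ok
        System.out.println(launchTask(bad));  // FAILED:UTFDataFormatException
    }
}
```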
[jira] [Commented] (SPARK-21564) TaskDescription decoding failure should fail the task
[ https://issues.apache.org/jira/browse/SPARK-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17294347#comment-17294347 ] Sandeep Katta commented on SPARK-21564: --- I recently hit this decode error. The irony is that all the tasks in the same task set were able to deserialize and only one task failed, which suggests the data was corrupted. Most of the time it is very difficult to analyze why the data got corrupted, so for this kind of intermittent issue exception handling should be in place to achieve fault tolerance. *21/02/11 07:53:39 ERROR Inbox: Ignoring errorjava.io.UTFDataFormatException: malformed input around byte 5 at* java.io.DataInputStream.readUTF(DataInputStream.java:656) at java.io.DataInputStream.readUTF(DataInputStream.java:564) at org.apache.spark.scheduler.TaskDescription$$anonfun$deserializeStringLongMap$1.apply$mcVI$sp(TaskDescription.scala:110) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) at org.apache.spark.scheduler.TaskDescription$.deserializeStringLongMap(TaskDescription.scala:109) at org.apache.spark.scheduler.TaskDescription$.decode(TaskDescription.scala:125) at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:100) at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205) at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101) at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:226) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) !image-2021-03-03-13-02-31-744.png!
So it's better to have fault tolerance in place: the Spark driver has no idea this exception occurred, so it keeps waiting for the task to complete and the job sits in a zombie state. CC [~dongjoon] [~hyukjin.kwon] [~cloud_fan] tagging you guys for more traction > TaskDescription decoding failure should fail the task > - > > Key: SPARK-21564 > URL: https://issues.apache.org/jira/browse/SPARK-21564 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Andrew Ash >Priority: Major > Labels: bulk-closed > Attachments: image-2021-03-03-13-02-06-669.png, > image-2021-03-03-13-02-31-744.png > > > cc [~robert3005] > I was seeing an issue where Spark was throwing this exception: > {noformat} > 16:16:28.294 [dispatcher-event-loop-14] ERROR > org.apache.spark.rpc.netty.Inbox - Ignoring error > java.io.EOFException: null > at java.io.DataInputStream.readFully(DataInputStream.java:197) > at java.io.DataInputStream.readUTF(DataInputStream.java:609) > at java.io.DataInputStream.readUTF(DataInputStream.java:564) > at > org.apache.spark.scheduler.TaskDescription$$anonfun$decode$1.apply(TaskDescription.scala:127) > at > org.apache.spark.scheduler.TaskDescription$$anonfun$decode$1.apply(TaskDescription.scala:126) > at scala.collection.immutable.Range.foreach(Range.scala:160) > at > org.apache.spark.scheduler.TaskDescription$.decode(TaskDescription.scala:126) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:95) > at > org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117) > at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205) > at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101) > at > org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:213) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at >
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) > {noformat} > For details on the cause of that exception, see SPARK-21563 > We've since changed the application and have a proposed fix in Spark at the > ticket above, but it was troubling that decoding the TaskDescription wasn't > failing the tasks. So the Spark job ended up hanging and making no progress, > permanently stuck, because the driver thinks the task is running but the > thread has died in the executor. > We should make a change around > https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L96 > so that when that decode throws an exception, the task is marked as failed. > cc [~kayousterhout] [~irashid] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (SPARK-21564) TaskDescription decoding failure should fail the task
[ https://issues.apache.org/jira/browse/SPARK-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Katta updated SPARK-21564: -- Attachment: image-2021-03-03-13-02-31-744.png > TaskDescription decoding failure should fail the task > - > > Key: SPARK-21564 > URL: https://issues.apache.org/jira/browse/SPARK-21564 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Andrew Ash >Priority: Major > Labels: bulk-closed > Attachments: image-2021-03-03-13-02-06-669.png, > image-2021-03-03-13-02-31-744.png > > > cc [~robert3005] > I was seeing an issue where Spark was throwing this exception: > {noformat} > 16:16:28.294 [dispatcher-event-loop-14] ERROR > org.apache.spark.rpc.netty.Inbox - Ignoring error > java.io.EOFException: null > at java.io.DataInputStream.readFully(DataInputStream.java:197) > at java.io.DataInputStream.readUTF(DataInputStream.java:609) > at java.io.DataInputStream.readUTF(DataInputStream.java:564) > at > org.apache.spark.scheduler.TaskDescription$$anonfun$decode$1.apply(TaskDescription.scala:127) > at > org.apache.spark.scheduler.TaskDescription$$anonfun$decode$1.apply(TaskDescription.scala:126) > at scala.collection.immutable.Range.foreach(Range.scala:160) > at > org.apache.spark.scheduler.TaskDescription$.decode(TaskDescription.scala:126) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:95) > at > org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117) > at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205) > at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101) > at > org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:213) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) > 
{noformat} > For details on the cause of that exception, see SPARK-21563 > We've since changed the application and have a proposed fix in Spark at the > ticket above, but it was troubling that decoding the TaskDescription wasn't > failing the tasks. So the Spark job ended up hanging and making no progress, > permanently stuck, because the driver thinks the task is running but the > thread has died in the executor. > We should make a change around > https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L96 > so that when that decode throws an exception, the task is marked as failed. > cc [~kayousterhout] [~irashid] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21564) TaskDescription decoding failure should fail the task
[ https://issues.apache.org/jira/browse/SPARK-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Katta updated SPARK-21564: -- Attachment: image-2021-03-03-13-02-06-669.png > TaskDescription decoding failure should fail the task > - > > Key: SPARK-21564 > URL: https://issues.apache.org/jira/browse/SPARK-21564 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Andrew Ash >Priority: Major > Labels: bulk-closed > Attachments: image-2021-03-03-13-02-06-669.png, > image-2021-03-03-13-02-31-744.png > > > cc [~robert3005] > I was seeing an issue where Spark was throwing this exception: > {noformat} > 16:16:28.294 [dispatcher-event-loop-14] ERROR > org.apache.spark.rpc.netty.Inbox - Ignoring error > java.io.EOFException: null > at java.io.DataInputStream.readFully(DataInputStream.java:197) > at java.io.DataInputStream.readUTF(DataInputStream.java:609) > at java.io.DataInputStream.readUTF(DataInputStream.java:564) > at > org.apache.spark.scheduler.TaskDescription$$anonfun$decode$1.apply(TaskDescription.scala:127) > at > org.apache.spark.scheduler.TaskDescription$$anonfun$decode$1.apply(TaskDescription.scala:126) > at scala.collection.immutable.Range.foreach(Range.scala:160) > at > org.apache.spark.scheduler.TaskDescription$.decode(TaskDescription.scala:126) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:95) > at > org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117) > at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205) > at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101) > at > org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:213) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) > 
{noformat} > For details on the cause of that exception, see SPARK-21563 > We've since changed the application and have a proposed fix in Spark at the > ticket above, but it was troubling that decoding the TaskDescription wasn't > failing the tasks. So the Spark job ended up hanging and making no progress, > permanently stuck, because the driver thinks the task is running but the > thread has died in the executor. > We should make a change around > https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L96 > so that when that decode throws an exception, the task is marked as failed. > cc [~kayousterhout] [~irashid] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32779) Spark/Hive3 interaction potentially causes deadlock
[ https://issues.apache.org/jira/browse/SPARK-32779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190626#comment-17190626 ] Sandeep Katta commented on SPARK-32779: --- [~bersprockets] thanks for raising this issue, I am working on this. I will raise the patch soon > Spark/Hive3 interaction potentially causes deadlock > --- > > Key: SPARK-32779 > URL: https://issues.apache.org/jira/browse/SPARK-32779 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Bruce Robbins >Priority: Major > > This is an issue for applications that share a Spark Session across multiple > threads. > sessionCatalog.loadPartition (after checking that the table exists) grabs > locks in this order: > - HiveExternalCatalog > - HiveSessionCatalog (in Shim_v3_0) > Other operations (e.g., sessionCatalog.tableExists), grab locks in this order: > - HiveSessionCatalog > - HiveExternalCatalog > [This|https://github.com/apache/spark/blob/ad6b887541bf90cc3ea830a1a3322b71ccdd80ee/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala#L1332] > appears to be the culprit. Maybe db name should be defaulted _before_ the > call to HiveClient so that Shim_v3_0 doesn't have to call back into > SessionCatalog. Or possibly this is not needed at all, since loadPartition in > Shim_v2_1 doesn't worry about the default db name, but that might be because > of differences between Hive client libraries. 
> Reproduction case: > - You need to have a running Hive 3.x HMS instance and the appropriate > hive-site.xml for your Spark instance > - Adjust your spark.sql.hive.metastore.version accordingly > - It might take more than one try to hit the deadlock > Launch Spark: > {noformat} > bin/spark-shell --conf "spark.sql.hive.metastore.jars=${HIVE_HOME}/lib/*" > --conf spark.sql.hive.metastore.version=3.1 > {noformat} > Then use the following code: > {noformat} > import scala.collection.mutable.ArrayBuffer > import scala.util.Random > val tableCount = 4 > for (i <- 0 until tableCount) { > val tableName = s"partitioned${i+1}" > sql(s"drop table if exists $tableName") > sql(s"create table $tableName (a bigint) partitioned by (b bigint) stored > as orc") > } > val threads = new ArrayBuffer[Thread] > for (i <- 0 until tableCount) { > threads.append(new Thread( new Runnable { > override def run: Unit = { > val tableName = s"partitioned${i + 1}" > val rand = Random > val df = spark.range(0, 2).toDF("a") > val location = s"/tmp/${rand.nextLong.abs}" > df.write.mode("overwrite").orc(location) > sql( > s""" > LOAD DATA LOCAL INPATH '$location' INTO TABLE $tableName partition > (b=$i)""") > } > }, s"worker$i")) > threads(i).start() > } > for (i <- 0 until tableCount) { > println(s"Joining with thread $i") > threads(i).join() > } > println("All done") > {noformat} > The job often gets stuck after one or two "Joining..." lines. 
> {{kill -3}} shows something like this: > {noformat} > Found one Java-level deadlock: > = > "worker3": > waiting to lock monitor 0x7fdc3cde6798 (object 0x000784d98ac8, a > org.apache.spark.sql.hive.HiveSessionCatalog), > which is held by "worker0" > "worker0": > waiting to lock monitor 0x7fdc441d1b88 (object 0x0007861d1208, a > org.apache.spark.sql.hive.HiveExternalCatalog), > which is held by "worker3" > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
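Whatever form the eventual patch takes, the classic remedy for this shape of deadlock is a single global lock-acquisition order. A minimal sketch (the object names mirror the jstack report above but are otherwise illustrative, not Spark's actual classes): when every path that needs both monitors takes them in the same order, no wait cycle, and hence no deadlock, is possible.

```java
public class LockOrdering {
    // Stand-ins for the two monitors named in the jstack report.
    static final Object externalCatalog = new Object();
    static final Object sessionCatalog = new Object();

    // Both operations acquire externalCatalog first, then sessionCatalog.
    static String loadPartition() {
        synchronized (externalCatalog) {
            synchronized (sessionCatalog) {
                return "partition loaded";
            }
        }
    }

    // Same global order here: no path ever holds sessionCatalog while
    // waiting for externalCatalog, so the reported cycle cannot form.
    static String tableExists() {
        synchronized (externalCatalog) {
            synchronized (sessionCatalog) {
                return "table exists";
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> System.out.println(loadPartition()));
        Thread t2 = new Thread(() -> System.out.println(tableExists()));
        t1.start(); t2.start();
        t1.join(); t2.join();   // completes; with inverted orders it could hang
    }
}
```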
[jira] [Commented] (SPARK-32764) compare of -0.0 < 0.0 return true
[ https://issues.apache.org/jira/browse/SPARK-32764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190333#comment-17190333 ] Sandeep Katta commented on SPARK-32764: --- So it's an edge-case scenario; should we fix it, or can we leave it as a limitation? > compare of -0.0 < 0.0 return true > - > > Key: SPARK-32764 > URL: https://issues.apache.org/jira/browse/SPARK-32764 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1 >Reporter: Izek Greenfield >Priority: Major > Labels: correctness > Attachments: 2.4_codegen.txt, 3.0_Codegen.txt > > > {code:scala} > val spark: SparkSession = SparkSession > .builder() > .master("local") > .appName("SparkByExamples.com") > .getOrCreate() > spark.sparkContext.setLogLevel("ERROR") > import spark.sqlContext.implicits._ > val df = Seq((-0.0, 0.0)).toDF("neg", "pos") > .withColumn("comp", col("neg") < col("pos")) > df.show(false) > == > ++---++ > |neg |pos|comp| > ++---++ > |-0.0|0.0|true| > ++---++{code} > I think that result should be false. > **Apache Spark 2.4.6 RESULT** > {code} > scala> spark.version > res0: String = 2.4.6 > scala> Seq((-0.0, 0.0)).toDF("neg", "pos").withColumn("comp", col("neg") < > col("pos")).show > ++---+-+ > | neg|pos| comp| > ++---+-+ > |-0.0|0.0|false| > ++---+-+ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-32764) compare of -0.0 < 0.0 return true
[ https://issues.apache.org/jira/browse/SPARK-32764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190331#comment-17190331 ] Sandeep Katta edited comment on SPARK-32764 at 9/3/20, 5:47 PM: [~cloud_fan] and [~smilegator] I did some analysis on this: it is because, as part of SPARK-30009, org.apache.spark.util.Utils.nanSafeCompareDoubles was replaced with java.lang.Double.compare. The same can be confirmed from the codegen code; I have attached it for reference. If you look at the implementation of java.lang.Double.compare: *java.lang.Double.compare(-0.0, 0.0) < 0 evaluates to true* *java.lang.Double.compare(0.0, -0.0) < 0 evaluates to false* was (Author: sandeep.katta2007): [~cloud_fan] and [~smilegator] I did some analysis on this, it is because as a part of SPARK-30009 org.apache.spark.util.Utils.nanSafeCompareDoubles is replaced with java.lang.Double.compare. Same can be confirmed from the codegen code, I have attached those for referrence. If you see the implementation of java.lang.Double.compare *java.lang.Double.compare(-0.0, 0.0) < 0 evaluates to true* *java.lang.Double.compare(0.0, -0.0) < 0 evaluates to false* > compare of -0.0 < 0.0 return true > - > > Key: SPARK-32764 > URL: https://issues.apache.org/jira/browse/SPARK-32764 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1 >Reporter: Izek Greenfield >Priority: Major > Labels: correctness > Attachments: 2.4_codegen.txt, 3.0_Codegen.txt > > > {code:scala} > val spark: SparkSession = SparkSession > .builder() > .master("local") > .appName("SparkByExamples.com") > .getOrCreate() > spark.sparkContext.setLogLevel("ERROR") > import spark.sqlContext.implicits._ > val df = Seq((-0.0, 0.0)).toDF("neg", "pos") > .withColumn("comp", col("neg") < col("pos")) > df.show(false) > == > ++---++ > |neg |pos|comp| > ++---++ > |-0.0|0.0|true| > ++---++{code} > I think that result should be false. 
> **Apache Spark 2.4.6 RESULT** > {code} > scala> spark.version > res0: String = 2.4.6 > scala> Seq((-0.0, 0.0)).toDF("neg", "pos").withColumn("comp", col("neg") < > col("pos")).show > ++---+-+ > | neg|pos| comp| > ++---+-+ > |-0.0|0.0|false| > ++---+-+ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
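The comparison semantics described in the analysis above can be checked directly on any JVM: java.lang.Double.compare imposes a total order that places -0.0 strictly below 0.0, while the primitive < and == operators follow IEEE 754 arithmetic, under which the two zeros are equal.

```java
public class NegativeZero {
    public static void main(String[] args) {
        // Total ordering: Double.compare distinguishes the two zeros.
        System.out.println(Double.compare(-0.0, 0.0) < 0); // true
        System.out.println(Double.compare(0.0, -0.0) > 0); // true
        // IEEE 754 primitive comparison: -0.0 and 0.0 are equal.
        System.out.println(-0.0 < 0.0);                    // false
        System.out.println(-0.0 == 0.0);                   // true
    }
}
```

This is why swapping the comparison implementation changed the result of `-0.0 < 0.0` from false (2.4.x) to true (3.0.x) in the reported query.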
[jira] [Comment Edited] (SPARK-32764) compare of -0.0 < 0.0 return true
[ https://issues.apache.org/jira/browse/SPARK-32764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190331#comment-17190331 ] Sandeep Katta edited comment on SPARK-32764 at 9/3/20, 5:45 PM: [~cloud_fan] and [~smilegator] I did some analysis on this: it is because, as part of SPARK-30009, org.apache.spark.util.Utils.nanSafeCompareDoubles was replaced with java.lang.Double.compare. The same can be confirmed from the codegen code; I have attached it for reference. If you look at the implementation of java.lang.Double.compare: *java.lang.Double.compare(-0.0, 0.0) < 0 evaluates to true* *java.lang.Double.compare(0.0, -0.0) < 0 evaluates to false* was (Author: sandeep.katta2007): [~cloud_fan] and [~smilegator] I did some analysis on this, it is because as a part of SPARK-30009 org.apache.spark.util.Utils.nanSafeCompareDoubles is replaced with java.lang.Double.compare. Same can be confirmed from the codegen code > compare of -0.0 < 0.0 return true > - > > Key: SPARK-32764 > URL: https://issues.apache.org/jira/browse/SPARK-32764 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1 >Reporter: Izek Greenfield >Priority: Major > Labels: correctness > > {code:scala} > val spark: SparkSession = SparkSession > .builder() > .master("local") > .appName("SparkByExamples.com") > .getOrCreate() > spark.sparkContext.setLogLevel("ERROR") > import spark.sqlContext.implicits._ > val df = Seq((-0.0, 0.0)).toDF("neg", "pos") > .withColumn("comp", col("neg") < col("pos")) > df.show(false) > == > ++---++ > |neg |pos|comp| > ++---++ > |-0.0|0.0|true| > ++---++{code} > I think that result should be false. 
> **Apache Spark 2.4.6 RESULT** > {code} > scala> spark.version > res0: String = 2.4.6 > scala> Seq((-0.0, 0.0)).toDF("neg", "pos").withColumn("comp", col("neg") < > col("pos")).show > ++---+-+ > | neg|pos| comp| > ++---+-+ > |-0.0|0.0|false| > ++---+-+ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32764) compare of -0.0 < 0.0 return true
[ https://issues.apache.org/jira/browse/SPARK-32764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190331#comment-17190331 ] Sandeep Katta commented on SPARK-32764: --- [~cloud_fan] and [~smilegator] I did some analysis on this: it is because, as part of SPARK-30009, org.apache.spark.util.Utils.nanSafeCompareDoubles was replaced with java.lang.Double.compare. The same can be confirmed from the codegen code. > compare of -0.0 < 0.0 return true > - > > Key: SPARK-32764 > URL: https://issues.apache.org/jira/browse/SPARK-32764 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1 >Reporter: Izek Greenfield >Priority: Major > Labels: correctness > > {code:scala} > val spark: SparkSession = SparkSession > .builder() > .master("local") > .appName("SparkByExamples.com") > .getOrCreate() > spark.sparkContext.setLogLevel("ERROR") > import spark.sqlContext.implicits._ > val df = Seq((-0.0, 0.0)).toDF("neg", "pos") > .withColumn("comp", col("neg") < col("pos")) > df.show(false) > == > ++---++ > |neg |pos|comp| > ++---++ > |-0.0|0.0|true| > ++---++{code} > I think that result should be false. > **Apache Spark 2.4.6 RESULT** > {code} > scala> spark.version > res0: String = 2.4.6 > scala> Seq((-0.0, 0.0)).toDF("neg", "pos").withColumn("comp", col("neg") < > col("pos")).show > ++---+-+ > | neg|pos| comp| > ++---+-+ > |-0.0|0.0|false| > ++---+-+ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29314) ProgressReporter.extractStateOperatorMetrics should not overwrite updated as 0 when it actually runs a batch even with no data
[ https://issues.apache.org/jira/browse/SPARK-29314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167644#comment-17167644 ] Sandeep Katta commented on SPARK-29314: --- ya my bad, not required to backport > ProgressReporter.extractStateOperatorMetrics should not overwrite updated as > 0 when it actually runs a batch even with no data > -- > > Key: SPARK-29314 > URL: https://issues.apache.org/jira/browse/SPARK-29314 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.4, 3.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.0.0 > > > SPARK-24156 brought the ability to run a batch without actual data to enable > fast state cleanup as well as emit evicted outputs without waiting actual > data to come. > This breaks some assumption on > `ProgressReporter.extractStateOperatorMetrics`. See comment in source code: > {code:java} > // lastExecution could belong to one of the previous triggers if > `!hasNewData`. > // Walking the plan again should be inexpensive. > {code} > and newNumRowsUpdated is replaced to 0 if hasNewData is false. It makes sense > if we copy progress from previous execution (which means no batch is run for > this time), but after SPARK-24156 the precondition is broken. > Spark should still replace the value of newNumRowsUpdated with 0 if there's > no batch being run and it needs to copy the old value from previous > execution, but it shouldn't touch the value if it runs a batch for no data. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
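The rule the issue description asks for can be summarized as a tiny decision function — an illustrative sketch, not ProgressReporter's actual code: keep the value extracted from the executed plan whenever a batch actually ran (even a no-data batch), and reset to 0 only when progress is copied from a previous execution.

```java
public class StateOperatorMetricRule {
    // ranBatch: a batch (possibly with no data) was executed this trigger.
    // fromExecutedPlan: numRowsUpdated extracted from lastExecution's plan.
    static long numRowsUpdated(boolean ranBatch, long fromExecutedPlan) {
        return ranBatch ? fromExecutedPlan : 0L;
    }

    public static void main(String[] args) {
        System.out.println(numRowsUpdated(true, 7));  // a no-data batch ran: keep 7
        System.out.println(numRowsUpdated(false, 7)); // progress copied from previous: reset to 0
    }
}
```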
[jira] [Commented] (SPARK-29314) ProgressReporter.extractStateOperatorMetrics should not overwrite updated as 0 when it actually runs a batch even with no data
[ https://issues.apache.org/jira/browse/SPARK-29314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167243#comment-17167243 ] Sandeep Katta commented on SPARK-29314: --- [~kabhwan] [~brkyvz] this is required to backport to 2.4 branch > ProgressReporter.extractStateOperatorMetrics should not overwrite updated as > 0 when it actually runs a batch even with no data > -- > > Key: SPARK-29314 > URL: https://issues.apache.org/jira/browse/SPARK-29314 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.4, 3.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.0.0 > > > SPARK-24156 brought the ability to run a batch without actual data to enable > fast state cleanup as well as emit evicted outputs without waiting actual > data to come. > This breaks some assumption on > `ProgressReporter.extractStateOperatorMetrics`. See comment in source code: > {code:java} > // lastExecution could belong to one of the previous triggers if > `!hasNewData`. > // Walking the plan again should be inexpensive. > {code} > and newNumRowsUpdated is replaced to 0 if hasNewData is false. It makes sense > if we copy progress from previous execution (which means no batch is run for > this time), but after SPARK-24156 the precondition is broken. > Spark should still replace the value of newNumRowsUpdated with 0 if there's > no batch being run and it needs to copy the old value from previous > execution, but it shouldn't touch the value if it runs a batch for no data. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28054) Unable to insert partitioned table dynamically when partition name is upper case
[ https://issues.apache.org/jira/browse/SPARK-28054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117759#comment-17117759 ] Sandeep Katta commented on SPARK-28054: --- [~hyukjin.kwon] is there any reason why this PR is not backported to branch2.4 ? > Unable to insert partitioned table dynamically when partition name is upper > case > > > Key: SPARK-28054 > URL: https://issues.apache.org/jira/browse/SPARK-28054 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: ChenKai >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.0.0 > > > {code:java} > -- create sql and column name is upper case > CREATE TABLE src (KEY STRING, VALUE STRING) PARTITIONED BY (DS STRING) > -- insert sql > INSERT INTO TABLE src PARTITION(ds) SELECT 'k' key, 'v' value, '1' ds > {code} > The error is: > {code:java} > Error in query: > org.apache.hadoop.hive.ql.metadata.Table.ValidationFailureSemanticException: > Partition spec {ds=, DS=1} contains non-partition columns; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
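The error in this issue stems from a case-sensitive lookup: the user-supplied partition spec key DS never matches the metastore's lower-cased column ds, so the spec appears to contain a non-partition column. A hedged sketch of the normalization such a fix needs — the names and structure here are illustrative, not the actual SPARK-28054 patch:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionSpecNormalize {
    // Resolve each spec key against the table's partition columns case-insensitively,
    // storing the value under the metastore's own casing.
    static Map<String, String> normalize(Map<String, String> spec, List<String> partCols) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : spec.entrySet()) {
            for (String col : partCols) {
                if (col.equalsIgnoreCase(e.getKey())) {
                    out.put(col, e.getValue());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // User wrote PARTITION(DS), but the metastore stores the column as "ds"
        System.out.println(normalize(Map.of("DS", "1"), List.of("ds")));
    }
}
```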
[jira] [Commented] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113525#comment-17113525 ] Sandeep Katta commented on SPARK-31761: --- I have raised the Pull request [https://github.com/apache/spark/pull/28600] > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Priority: Major > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113453#comment-17113453 ] Sandeep Katta commented on SPARK-31761: --- okay I will try to fix it > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Priority: Major > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113344#comment-17113344 ] Sandeep Katta commented on SPARK-31761: --- but to fix this do we need to revert https://issues.apache.org/jira/browse/SPARK-16323 or we just cast input to long and divide ? > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Priority: Major > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
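The arithmetic behind the question above is easy to check in isolation: 32-bit integer division of Integer.MIN_VALUE by -1 overflows, because the true quotient 2147483648 does not fit in an int, while widening the dividend to long before dividing avoids the overflow — which is what "cast input to long and divide" would buy without reverting SPARK-16323.

```java
public class IntMinDiv {
    public static void main(String[] args) {
        int a = Integer.MIN_VALUE, b = -1;
        // 32-bit division wraps: the true quotient 2147483648 is not an int
        System.out.println(a / b);        // -2147483648
        // Widening to 64 bits before dividing gives the exact result
        System.out.println((long) a / b); // 2147483648
    }
}
```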
[jira] [Commented] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113337#comment-17113337 ] Sandeep Katta commented on SPARK-31761: --- I can fix this, I will raise PR > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Priority: Major > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112368#comment-17112368 ] Sandeep Katta edited comment on SPARK-31761 at 5/20/20, 3:46 PM: - [~sowen] [~hyukjin.kwon] [~dongjoon] I have executed the same query in spark-2.4.4 and it works as expected. As you can see from the 2.4.4 plan, the columns are cast to double, so there is no *Integer overflow*.

*Spark 2.4.4 Plan*

== Parsed Logical Plan ==
'Project [cast(('col0 / 'col1) as bigint) AS CAST((col0 / col1) AS BIGINT)#4]
+- Relation[col0#0,col1#1] csv

== Analyzed Logical Plan ==
CAST((col0 / col1) AS BIGINT): bigint
Project [cast((cast(col0#0 as double) / cast(col1#1 as double)) as bigint) AS CAST((col0 / col1) AS BIGINT)#4L]
+- Relation[col0#0,col1#1] csv

== Optimized Logical Plan ==
Project [cast((cast(col0#0 as double) / cast(col1#1 as double)) as bigint) AS CAST((col0 / col1) AS BIGINT)#4L]
+- Relation[col0#0,col1#1] csv

== Physical Plan ==
*(1) Project [cast((cast(col0#0 as double) / cast(col1#1 as double)) as bigint) AS CAST((col0 / col1) AS BIGINT)#4L]
+- *(1) FileScan csv [col0#0,col1#1] Batched: false, Format: CSV, Location: InMemoryFileIndex[file:/opt/fordebug/divTest.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<col0:int,col1:int>

*Spark-3.0 Plan*

== Parsed Logical Plan ==
'Project [('col0 div 'col1) AS (col0 div col1)#4]
+- RelationV2[col0#0, col1#1] csv file:/opt/fordebug/divTest.csv

== Analyzed Logical Plan ==
(col0 div col1): int
Project [(col0#0 div col1#1) AS (col0 div col1)#4]
+- RelationV2[col0#0, col1#1] csv file:/opt/fordebug/divTest.csv

== Optimized Logical Plan ==
Project [(col0#0 div col1#1) AS (col0 div col1)#4]
+- RelationV2[col0#0, col1#1] csv file:/opt/fordebug/divTest.csv

== Physical Plan ==
*(1) Project [(col0#0 div col1#1) AS (col0 div col1)#4]
+- BatchScan[col0#0, col1#1] CSVScan Location: InMemoryFileIndex[file:/opt/fordebug/divTest.csv], ReadSchema: struct<col0:int,col1:int>

In Spark 3, do I need to cast the columns as in spark-2.4, or should the user manually add a cast to the query, as per the example below?

val schema = "col0 int,col1 int"
val df = spark.read.schema(schema).csv("file:/opt/fordebug/divTest.csv")
val res = df.selectExpr("col0 div col1")
val res = df.selectExpr("Cast(col0 as Decimal) div col1")
res.collect

Please let us know your opinion.

was (Author: sandeep.katta2007): [~sowen] [~hyukjin.kwon] I have executed the same query in spark-2.4.4 it works as per expectation.
[jira] [Commented] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112368#comment-17112368 ] Sandeep Katta commented on SPARK-31761: --- [~sowen] [~hyukjin.kwon] I have executed the same query in spark-2.4.4 and it works as expected. As you can see from the 2.4.4 plan, the columns are cast to double, so there is no *Integer overflow*.

*Spark 2.4.4 Plan*

== Parsed Logical Plan ==
'Project [cast(('col0 / 'col1) as bigint) AS CAST((col0 / col1) AS BIGINT)#4]
+- Relation[col0#0,col1#1] csv

== Analyzed Logical Plan ==
CAST((col0 / col1) AS BIGINT): bigint
Project [cast((cast(col0#0 as double) / cast(col1#1 as double)) as bigint) AS CAST((col0 / col1) AS BIGINT)#4L]
+- Relation[col0#0,col1#1] csv

== Optimized Logical Plan ==
Project [cast((cast(col0#0 as double) / cast(col1#1 as double)) as bigint) AS CAST((col0 / col1) AS BIGINT)#4L]
+- Relation[col0#0,col1#1] csv

== Physical Plan ==
*(1) Project [cast((cast(col0#0 as double) / cast(col1#1 as double)) as bigint) AS CAST((col0 / col1) AS BIGINT)#4L]
+- *(1) FileScan csv [col0#0,col1#1] Batched: false, Format: CSV, Location: InMemoryFileIndex[file:/opt/fordebug/divTest.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<col0:int,col1:int>

*Spark-3.0 Plan*

== Parsed Logical Plan ==
'Project [('col0 div 'col1) AS (col0 div col1)#4]
+- RelationV2[col0#0, col1#1] csv file:/opt/fordebug/divTest.csv

== Analyzed Logical Plan ==
(col0 div col1): int
Project [(col0#0 div col1#1) AS (col0 div col1)#4]
+- RelationV2[col0#0, col1#1] csv file:/opt/fordebug/divTest.csv

== Optimized Logical Plan ==
Project [(col0#0 div col1#1) AS (col0 div col1)#4]
+- RelationV2[col0#0, col1#1] csv file:/opt/fordebug/divTest.csv

== Physical Plan ==
*(1) Project [(col0#0 div col1#1) AS (col0 div col1)#4]
+- BatchScan[col0#0, col1#1] CSVScan Location: InMemoryFileIndex[file:/opt/fordebug/divTest.csv], ReadSchema: struct<col0:int,col1:int>

In Spark 3, do I need to cast the columns as in spark-2.4, or should the user manually add a cast to the query, as per the example below?

val schema = "col0 int,col1 int"
val df = spark.read.schema(schema).csv("file:/opt/fordebug/divTest.csv")
val res = df.selectExpr("col0 div col1")
val res = df.selectExpr("Cast(col0 as Decimal) div col1")
res.collect

Please let us know your opinion.
[jira] [Commented] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111294#comment-17111294 ] Sandeep Katta commented on SPARK-31761: --- [~kshukla] thanks for raising this, I cross checked this in 2.4.4 it seems to work fine. I will analyse w.r.t 3.0.0 and raise PR soon . > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Priority: Major > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31617) support drop multiple functions
[ https://issues.apache.org/jira/browse/SPARK-31617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17100451#comment-17100451 ] Sandeep Katta commented on SPARK-31617: --- [~hyukjin.kwon] if this is required by the Spark community I can work on this; what's your suggestion? > support drop multiple functions > --- > > Key: SPARK-31617 > URL: https://issues.apache.org/jira/browse/SPARK-31617 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: jobit mathew >Priority: Minor > > PostgreSQL supports dropping multiple functions in one command. It would be better if Spark > SQL could also support this. > > [https://www.postgresql.org/docs/12/sql-dropfunction.html] > Drop multiple functions in one command: > DROP FUNCTION sqrt(integer), sqrt(bigint);
[jira] [Commented] (SPARK-23128) A new approach to do adaptive execution in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-23128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077939#comment-17077939 ] Sandeep Katta commented on SPARK-23128: --- [~cloud_fan] [~carsonwang] any updates on dynamic parallelism and skew Handling. Is it fixed in 3.0.0 > A new approach to do adaptive execution in Spark SQL > > > Key: SPARK-23128 > URL: https://issues.apache.org/jira/browse/SPARK-23128 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Carson Wang >Assignee: Carson Wang >Priority: Major > Fix For: 3.0.0 > > Attachments: AdaptiveExecutioninBaidu.pdf > > > SPARK-9850 proposed the basic idea of adaptive execution in Spark. In > DAGScheduler, a new API is added to support submitting a single map stage. > The current implementation of adaptive execution in Spark SQL supports > changing the reducer number at runtime. An Exchange coordinator is used to > determine the number of post-shuffle partitions for a stage that needs to > fetch shuffle data from one or multiple stages. The current implementation > adds ExchangeCoordinator while we are adding Exchanges. However there are > some limitations. First, it may cause additional shuffles that may decrease > the performance. We can see this from EnsureRequirements rule when it adds > ExchangeCoordinator. Secondly, it is not a good idea to add > ExchangeCoordinators while we are adding Exchanges because we don’t have a > global picture of all shuffle dependencies of a post-shuffle stage. I.e. for > 3 tables’ join in a single stage, the same ExchangeCoordinator should be used > in three Exchanges but currently two separated ExchangeCoordinator will be > added. Thirdly, with the current framework it is not easy to implement other > features in adaptive execution flexibly like changing the execution plan and > handling skewed join at runtime. 
> We'd like to introduce a new way to do adaptive execution in Spark SQL and > address the limitations. The idea is described at > [https://docs.google.com/document/d/1mpVjvQZRAkD-Ggy6-hcjXtBPiQoVbZGe3dLnAKgtJ4k/edit?usp=sharing] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30096) drop function does not delete the jars file from tmp folder- session is not getting clear
[ https://issues.apache.org/jira/browse/SPARK-30096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044353#comment-17044353 ] Sandeep Katta commented on SPARK-30096: --- [~abhishek.akg] I have cross-checked the behaviour: those temp files will be deleted once the session is closed. I double-checked the code and found the same. Can you please recheck after exiting the session? The code which cleans the temp directory: !screenshot-1.png! > drop function does not delete the jars file from tmp folder- session is not > getting clear > - > > Key: SPARK-30096 > URL: https://issues.apache.org/jira/browse/SPARK-30096 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > Attachments: screenshot-1.png > > > Steps: > 1. {{spark-master --master yarn}} > 2. {{spark-sql> create function addDoubles AS > 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar > 'hdfs://hacluster/user/user1/AddDoublesUDF.jar';}} > 3. {{spark-sql> select addDoubles (1,2);}} > In the console log user can see > Added > [/tmp/23090937-7314-43d8-a859-b35f42d37bdf_resources/AddDoublesUDF.jar] to > class path > {code} > 19/12/02 13:49:33 INFO SessionState: Added > [/tmp/23090937-7314-43d8-a859-b35f42d37bdf_resources/AddDoublesUDF.jar] to > class path > {code} > 4. {{spark-sql>drop function AddDoubles;}} > Check the tmp folder still session is not clear > {code:java} > vm1:/tmp/23090937-7314-43d8-a859-b35f42d37bdf_resources # ll > total 11696 > -rw-r--r-- 1 root root92660 Dec 2 13:49 .AddDoublesUDF.jar.crc > -rwxr-xr-x 1 root root 11859263 Dec 2 13:49 AddDoublesUDF.jar > vm1:/tmp/23090937-7314-43d8-a859-b35f42d37bdf_resources # > {code}
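The lifecycle described in the comment — DROP FUNCTION leaves the jar in the session's temp resource directory, and cleanup happens only when the session exits — can be sketched as follows. This is an illustrative model, not Hive SessionState's actual code; the directory and jar names are assumptions taken from the report.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class SessionResources implements AutoCloseable {
    private final File dir;
    private final File jar;

    SessionResources() throws IOException {
        // Session-scoped dir, like /tmp/<sessionId>_resources in the report
        dir = Files.createTempDirectory("session_resources").toFile();
        jar = new File(dir, "AddDoublesUDF.jar");
        jar.createNewFile();
    }

    boolean jarExists() { return jar.exists(); }

    // Cleanup runs on session close, not on DROP FUNCTION
    @Override public void close() { jar.delete(); dir.delete(); }

    public static void main(String[] args) throws IOException {
        try (SessionResources session = new SessionResources()) {
            // DROP FUNCTION alone leaves the jar on disk
            System.out.println(session.jarExists()); // true
        }
        // after close() (i.e. session exit), the resource dir is removed
    }
}
```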
[jira] [Updated] (SPARK-30096) drop function does not delete the jars file from tmp folder- session is not getting clear
[ https://issues.apache.org/jira/browse/SPARK-30096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Katta updated SPARK-30096: -- Attachment: screenshot-1.png > drop function does not delete the jars file from tmp folder- session is not > getting clear > - > > Key: SPARK-30096 > URL: https://issues.apache.org/jira/browse/SPARK-30096 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > Attachments: screenshot-1.png > > > Steps: > 1. {{spark-master --master yarn}} > 2. {{spark-sql> create function addDoubles AS > 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar > 'hdfs://hacluster/user/user1/AddDoublesUDF.jar';}} > 3. {{spark-sql> select addDoubles (1,2);}} > In the console log user can see > Added > [/tmp/23090937-7314-43d8-a859-b35f42d37bdf_resources/AddDoublesUDF.jar] to > class path > {code} > 19/12/02 13:49:33 INFO SessionState: Added > [/tmp/23090937-7314-43d8-a859-b35f42d37bdf_resources/AddDoublesUDF.jar] to > class path > {code} > 4. {{spark-sql>drop function AddDoubles;}} > Check the tmp folder still session is not clear > {code:java} > vm1:/tmp/23090937-7314-43d8-a859-b35f42d37bdf_resources # ll > total 11696 > -rw-r--r-- 1 root root92660 Dec 2 13:49 .AddDoublesUDF.jar.crc > -rwxr-xr-x 1 root root 11859263 Dec 2 13:49 AddDoublesUDF.jar > vm1:/tmp/23090937-7314-43d8-a859-b35f42d37bdf_resources # > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30133) Support DELETE Jar and DELETE File functionality in spark
[ https://issues.apache.org/jira/browse/SPARK-30133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014328#comment-17014328 ] Sandeep Katta commented on SPARK-30133: --- [~hyukjin.kwon] I have raised the PR for these, Please help me to review it. > Support DELETE Jar and DELETE File functionality in spark > - > > Key: SPARK-30133 > URL: https://issues.apache.org/jira/browse/SPARK-30133 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Sandeep Katta >Priority: Major > Labels: Umbrella > > SPARK should support delete jar feature > This feature aims at solving below use case. > Currently in spark add jar API supports to add the jar to executor and Driver > ClassPath at runtime, if there is any change in this jar definition there is > no way user can update the jar to executor and Driver classPath. User needs > to restart the application to solve this problem which is costly operation. > After this JIRA fix user can use delete jar API to remove the jar from Driver > and Executor ClassPath without the need of restarting the any spark > application. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
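One technical wrinkle behind this feature: a live URLClassLoader can gain URLs but never lose them, so removing a jar from the classpath implies building a new loader from the remaining URLs. A hedged sketch of that idea — purely illustrative, not the code in the actual PR:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;

public class JarRemoval {
    // Drop the named jar from a classpath list; the caller then rebuilds the loader
    static String[] withoutJar(String[] classpath, String jarName) {
        return Arrays.stream(classpath)
                .filter(p -> !p.endsWith(jarName))
                .toArray(String[]::new);
    }

    public static void main(String[] args) throws Exception {
        String[] remaining = withoutJar(
                new String[] { "file:/tmp/a.jar", "file:/tmp/AddDoublesUDF.jar" },
                "AddDoublesUDF.jar");
        // A URLClassLoader cannot shed URLs, so "delete jar" means a fresh loader
        URL[] urls = new URL[remaining.length];
        for (int i = 0; i < remaining.length; i++) urls[i] = new URL(remaining[i]);
        try (URLClassLoader rebuilt = new URLClassLoader(urls)) {
            System.out.println(rebuilt.getURLs().length); // 1
        }
    }
}
```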
[jira] [Updated] (SPARK-30133) Support DELETE Jar and DELETE File functionality in spark
[ https://issues.apache.org/jira/browse/SPARK-30133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Katta updated SPARK-30133: -- Description: Spark should support a DELETE JAR feature. This feature aims at solving the following use case: the existing ADD JAR API adds a jar to the executor and driver classpath at runtime, but if that jar changes there is no way for the user to update it on the executor and driver classpath; the only workaround is to restart the application, which is a costly operation. Once this JIRA is fixed, the user can use the DELETE JAR API to remove a jar from the driver and executor classpath without restarting the Spark application. was:SPARK should support delete jar feature > Support DELETE Jar and DELETE File functionality in spark > - > > Key: SPARK-30133 > URL: https://issues.apache.org/jira/browse/SPARK-30133 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Sandeep Katta >Priority: Major > Labels: Umbrella > > Spark should support a DELETE JAR feature. This feature aims at solving the > following use case: the existing ADD JAR API adds a jar to the executor and > driver classpath at runtime, but if that jar changes there is no way for the > user to update it on the executor and driver classpath; the only workaround is > to restart the application, which is a costly operation. Once this JIRA is > fixed, the user can use the DELETE JAR API to remove a jar from the driver and > executor classpath without restarting the Spark application. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
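A minimal sketch of the bookkeeping the umbrella issue proposes. The JarRegistry type and its methods are invented here for illustration and are not the proposed Spark API: ADD JAR appends to the runtime classpath list, and DELETE JAR removes the entry again without any restart.

```scala
// Toy model of add-jar / delete-jar classpath bookkeeping.
import scala.collection.mutable

final class JarRegistry {
  // LinkedHashSet keeps insertion order, like a classpath would
  private val added = mutable.LinkedHashSet.empty[String]

  def addJar(path: String): Unit = added += path

  // Returns true when the jar was present and has been removed.
  def deleteJar(path: String): Boolean = added.remove(path)

  def classpath: Seq[String] = added.toSeq
}

val reg = new JarRegistry
reg.addJar("hdfs://ns/jars/udf-v1.jar")
reg.addJar("hdfs://ns/jars/other.jar")
// Delete removes only the named jar; the rest of the classpath survives
val removed = reg.deleteJar("hdfs://ns/jars/udf-v1.jar")
```

The real work the sub-tasks track is propagating such a removal to the driver and executor classloaders at runtime, which this sketch deliberately leaves out.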
[jira] [Commented] (SPARK-30425) FileScan of Data Source V2 doesn't implement Partition Pruning
[ https://issues.apache.org/jira/browse/SPARK-30425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010408#comment-17010408 ] Sandeep Katta commented on SPARK-30425: --- [~gengliang] > FileScan of Data Source V2 doesn't implement Partition Pruning > -- > > Key: SPARK-30425 > URL: https://issues.apache.org/jira/browse/SPARK-30425 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Haifeng Chen >Priority: Major > Original Estimate: 168h > Remaining Estimate: 168h > > I was trying to understand how Data Source V2 handles partition pruning, but I > didn't find code anywhere in the current Data Source V2 implementation that > filters out the unnecessary files. For a file data source, the base class > FileScan of Data Source V2 should presumably handle this in the "partitions" > method, but the current implementation looks like the following: > protected def partitions: Seq[FilePartition] = { > val selectedPartitions = fileIndex.listFiles(Seq.empty, Seq.empty) > > listFiles is passed empty sequences, so no files are filtered out by the > partition filters. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30425) FileScan of Data Source V2 doesn't implement Partition Pruning
[ https://issues.apache.org/jira/browse/SPARK-30425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010407#comment-17010407 ] Sandeep Katta commented on SPARK-30425: --- Is this a duplicate of [SPARK-30428|https://issues.apache.org/jira/browse/SPARK-30428]? > FileScan of Data Source V2 doesn't implement Partition Pruning > -- > > Key: SPARK-30425 > URL: https://issues.apache.org/jira/browse/SPARK-30425 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Haifeng Chen >Priority: Major > Original Estimate: 168h > Remaining Estimate: 168h > > I was trying to understand how Data Source V2 handles partition pruning, but I > didn't find code anywhere in the current Data Source V2 implementation that > filters out the unnecessary files. For a file data source, the base class > FileScan of Data Source V2 should presumably handle this in the "partitions" > method, but the current implementation looks like the following: > protected def partitions: Seq[FilePartition] = { > val selectedPartitions = fileIndex.listFiles(Seq.empty, Seq.empty) > > listFiles is passed empty sequences, so no files are filtered out by the > partition filters. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
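The quoted partitions implementation calls fileIndex.listFiles(Seq.empty, Seq.empty), so no partition filters ever reach the file listing. The effect can be illustrated with a toy model; PartitionDir and listFiles below are simplified stand-ins for Spark's FileIndex/FilePartition machinery, not the real API:

```scala
// One entry per partition directory, with its partition values and files.
final case class PartitionDir(values: Map[String, Int], files: Seq[String])

// Analogue of fileIndex.listFiles: keep only partitions that satisfy
// every supplied partition filter, then return their files.
def listFiles(index: Seq[PartitionDir],
              partitionFilters: Seq[Map[String, Int] => Boolean]): Seq[String] =
  index.filter(p => partitionFilters.forall(f => f(p.values))).flatMap(_.files)

val index = Seq(
  PartitionDir(Map("pk" -> 1), Seq("pk=1/part-0.parquet")),
  PartitionDir(Map("pk" -> 2), Seq("pk=2/part-0.parquet"))
)

// Seq.empty (the quoted code): every partition's files come back.
val unpruned = listFiles(index, Seq.empty)
// A real predicate: only the matching partition survives.
val pruned = listFiles(index, Seq((v: Map[String, Int]) => v("pk") == 1))
```

With empty filters both partitions are read; with the predicate only pk=1 is, which is exactly the pruning the reporter is asking about.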
[jira] [Updated] (SPARK-30362) InputMetrics are not updated for DataSourceRDD V2
[ https://issues.apache.org/jira/browse/SPARK-30362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Katta updated SPARK-30362: -- Attachment: inputMetrics.png > InputMetrics are not updated for DataSourceRDD V2 > -- > > Key: SPARK-30362 > URL: https://issues.apache.org/jira/browse/SPARK-30362 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Sandeep Katta >Priority: Major > Attachments: inputMetrics.png > > > InputMetrics is not updated for DataSourceRDD -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30362) InputMetrics are not updated for DataSourceRDD V2
Sandeep Katta created SPARK-30362: - Summary: InputMetrics are not updated for DataSourceRDD V2 Key: SPARK-30362 URL: https://issues.apache.org/jira/browse/SPARK-30362 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.0.0 Reporter: Sandeep Katta InputMetrics is not updated for DataSourceRDD -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30333) Bump jackson-databind to 2.6.7.3
Sandeep Katta created SPARK-30333: - Summary: Bump jackson-databind to 2.6.7.3 Key: SPARK-30333 URL: https://issues.apache.org/jira/browse/SPARK-30333 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.4 Reporter: Sandeep Katta To fix below CVE CVE-2018-14718 CVE-2018-14719 CVE-2018-14720 CVE-2018-14721 CVE-2018-19360, CVE-2018-19361 CVE-2018-19362 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30318) Bump jetty to 9.3.27.v20190418
Sandeep Katta created SPARK-30318: - Summary: Bump jetty to 9.3.27.v20190418 Key: SPARK-30318 URL: https://issues.apache.org/jira/browse/SPARK-30318 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.4 Reporter: Sandeep Katta Upgrade jetty to 9.3.27.v20190418 to fix CVE-2019-10241 and CVE-2019-10247 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27021) Leaking Netty event loop group for shuffle chunk fetch requests
[ https://issues.apache.org/jira/browse/SPARK-27021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000782#comment-17000782 ] Sandeep Katta commented on SPARK-27021: --- [~hyukjin.kwon] [~dongjoon] this issue has not been backported to branch-2.4. Please check whether the backport is required. > Leaking Netty event loop group for shuffle chunk fetch requests > --- > > Key: SPARK-27021 > URL: https://issues.apache.org/jira/browse/SPARK-27021 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0, 2.4.1, 3.0.0 >Reporter: Attila Zsolt Piros >Assignee: Attila Zsolt Piros >Priority: Major > Fix For: 3.0.0 > > Attachments: image-2019-12-14-23-23-50-384.png > > > The extra event loop group created for handling shuffle chunk fetch requests > is never closed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30137) Support DELETE file
[ https://issues.apache.org/jira/browse/SPARK-30137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994263#comment-16994263 ] Sandeep Katta commented on SPARK-30137: --- Yes, I am working on this; I will raise the PR by tomorrow. > Support DELETE file > > > Key: SPARK-30137 > URL: https://issues.apache.org/jira/browse/SPARK-30137 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Sandeep Katta >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-30175) Eliminate warnings: part 5
[ https://issues.apache.org/jira/browse/SPARK-30175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Katta updated SPARK-30175: -- Comment: was deleted (was: thanks for raising, will raise PR soon) > Eliminate warnings: part 5 > -- > > Key: SPARK-30175 > URL: https://issues.apache.org/jira/browse/SPARK-30175 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: jobit mathew >Priority: Minor > > sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/WriteToMicroBatchDataSource.scala > {code:java} > Warning:Warning:line (36)class WriteToDataSourceV2 in package v2 is > deprecated (since 2.4.0): Use specific logical plans like AppendData instead > def createPlan(batchId: Long): WriteToDataSourceV2 = { > Warning:Warning:line (37)class WriteToDataSourceV2 in package v2 is > deprecated (since 2.4.0): Use specific logical plans like AppendData instead > WriteToDataSourceV2(new MicroBatchWrite(batchId, write), query) > {code} > sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala > {code:java} > Warning:Warning:line (703)a pure expression does nothing in statement > position; multiline expressions might require enclosing parentheses > q1 > {code} > sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala > {code:java} > Warning:Warning:line (285)object typed in package scalalang is deprecated > (since 3.0.0): please use untyped builtin aggregate functions. > val aggregated = > inputData.toDS().groupByKey(_._1).agg(typed.sumLong(_._2)) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30175) Eliminate warnings: part 5
[ https://issues.apache.org/jira/browse/SPARK-30175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16991146#comment-16991146 ] Sandeep Katta commented on SPARK-30175: --- thanks for raising, will raise PR soon > Eliminate warnings: part 5 > -- > > Key: SPARK-30175 > URL: https://issues.apache.org/jira/browse/SPARK-30175 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: jobit mathew >Priority: Minor > > sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/WriteToMicroBatchDataSource.scala > sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala > sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30135) Add documentation for DELETE JAR and DELETE File command
[ https://issues.apache.org/jira/browse/SPARK-30135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Katta updated SPARK-30135: -- Summary: Add documentation for DELETE JAR and DELETE File command (was: Add documentation for DELETE JAR command) > Add documentation for DELETE JAR and DELETE File command > > > Key: SPARK-30135 > URL: https://issues.apache.org/jira/browse/SPARK-30135 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.0.0 >Reporter: Sandeep Katta >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30137) Support DELETE file
[ https://issues.apache.org/jira/browse/SPARK-30137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988508#comment-16988508 ] Sandeep Katta commented on SPARK-30137: --- I have started working on this feature and will raise the PR soon. > Support DELETE file > > > Key: SPARK-30137 > URL: https://issues.apache.org/jira/browse/SPARK-30137 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Sandeep Katta >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30137) Support DELETE file
Sandeep Katta created SPARK-30137: - Summary: Support DELETE file Key: SPARK-30137 URL: https://issues.apache.org/jira/browse/SPARK-30137 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Sandeep Katta -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30133) Support DELETE Jar and DELETE File functionality in spark
[ https://issues.apache.org/jira/browse/SPARK-30133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Katta updated SPARK-30133: -- Summary: Support DELETE Jar and DELETE File functionality in spark (was: Support DELETE Jar functionality in spark) > Support DELETE Jar and DELETE File functionality in spark > - > > Key: SPARK-30133 > URL: https://issues.apache.org/jira/browse/SPARK-30133 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Sandeep Katta >Priority: Major > Labels: Umbrella > > SPARK should support delete jar feature -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30136) DELETE JAR should also remove the jar from executor classPath
Sandeep Katta created SPARK-30136: - Summary: DELETE JAR should also remove the jar from executor classPath Key: SPARK-30136 URL: https://issues.apache.org/jira/browse/SPARK-30136 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Sandeep Katta -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30136) DELETE JAR should also remove the jar from executor classPath
[ https://issues.apache.org/jira/browse/SPARK-30136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988504#comment-16988504 ] Sandeep Katta commented on SPARK-30136: --- I have started working on this feature and will raise the PR soon. > DELETE JAR should also remove the jar from executor classPath > - > > Key: SPARK-30136 > URL: https://issues.apache.org/jira/browse/SPARK-30136 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Sandeep Katta >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30134) DELETE JAR should remove from addedJars list and from classpath
[ https://issues.apache.org/jira/browse/SPARK-30134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988502#comment-16988502 ] Sandeep Katta commented on SPARK-30134: --- I have started working on this feature and will raise the PR soon. > DELETE JAR should remove from addedJars list and from classpath > > > Key: SPARK-30134 > URL: https://issues.apache.org/jira/browse/SPARK-30134 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Sandeep Katta >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30135) Add documentation for DELETE JAR command
[ https://issues.apache.org/jira/browse/SPARK-30135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988503#comment-16988503 ] Sandeep Katta commented on SPARK-30135: --- I have started working on this feature and will raise the PR soon. > Add documentation for DELETE JAR command > > > Key: SPARK-30135 > URL: https://issues.apache.org/jira/browse/SPARK-30135 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.0.0 >Reporter: Sandeep Katta >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30135) Add documentation for DELETE JAR command
Sandeep Katta created SPARK-30135: - Summary: Add documentation for DELETE JAR command Key: SPARK-30135 URL: https://issues.apache.org/jira/browse/SPARK-30135 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 3.0.0 Reporter: Sandeep Katta -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30134) DELETE JAR should remove from addedJars list and from classpath
Sandeep Katta created SPARK-30134: - Summary: DELETE JAR should remove from addedJars list and from classpath Key: SPARK-30134 URL: https://issues.apache.org/jira/browse/SPARK-30134 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Sandeep Katta -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30133) Support DELETE Jar functionality in spark
Sandeep Katta created SPARK-30133: - Summary: Support DELETE Jar functionality in spark Key: SPARK-30133 URL: https://issues.apache.org/jira/browse/SPARK-30133 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.0.0 Reporter: Sandeep Katta SPARK should support delete jar feature -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30096) drop function does not delete the jars file from tmp folder- session is not getting clear
[ https://issues.apache.org/jira/browse/SPARK-30096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985862#comment-16985862 ] Sandeep Katta commented on SPARK-30096: --- [~abhishek.akg] Thanks for raising this issue; I will analyze it and raise a PR if required. > drop function does not delete the jars file from tmp folder- session is not > getting clear > - > > Key: SPARK-30096 > URL: https://issues.apache.org/jira/browse/SPARK-30096 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > > Steps: > 1.spark-master --master yarn > 2. spark-sql> create function addDoubles AS > 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar > 'hdfs://hacluster/user/user1/AddDoublesUDF.jar'; > 3. spark-sql> select addDoubles (1,2); > In the console log user can see > Added [/tmp/23090937-7314-43d8-a859-b35f42d37bdf_resources/AddDoublesUDF.jar] > to class path > 19/12/02 13:49:33 INFO SessionState: Added > [/tmp/23090937-7314-43d8-a859-b35f42d37bdf_resources/AddDoublesUDF.jar] to > class path > 4. spark-sql>drop function AddDoubles; > Check the tmp folder still session is not clear > vm1:/tmp/23090937-7314-43d8-a859-b35f42d37bdf_resources # ll > total 11696 > -rw-r--r-- 1 root root92660 Dec 2 13:49 .AddDoublesUDF.jar.crc > -rwxr-xr-x 1 root root 11859263 Dec 2 13:49 AddDoublesUDF.jar > vm1:/tmp/23090937-7314-43d8-a859-b35f42d37bdf_resources # -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29504) Tooltip not display for Job Description even it shows ellipsed
[ https://issues.apache.org/jira/browse/SPARK-29504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955677#comment-16955677 ] Sandeep Katta commented on SPARK-29504: --- This is an invalid issue; Spark already supports this feature. If you double-click on the column, it shows the full description. Refer to [SPARK-27135|https://issues.apache.org/jira/browse/SPARK-27135]. > Tooltip not display for Job Description even it shows ellipsed > --- > > Key: SPARK-29504 > URL: https://issues.apache.org/jira/browse/SPARK-29504 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > Attachments: ToolTip JIRA.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29465) Unable to configure SPARK UI (spark.ui.port) in spark yarn cluster mode.
[ https://issues.apache.org/jira/browse/SPARK-29465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952537#comment-16952537 ] Sandeep Katta commented on SPARK-29465: --- [~dongjoon] thank you for the review. I will raise PR soon > Unable to configure SPARK UI (spark.ui.port) in spark yarn cluster mode. > - > > Key: SPARK-29465 > URL: https://issues.apache.org/jira/browse/SPARK-29465 > Project: Spark > Issue Type: Improvement > Components: Spark Submit, YARN >Affects Versions: 3.0.0 >Reporter: Vishwas Nalka >Priority: Major > > I'm trying to restrict the ports used by spark app which is launched in yarn > cluster mode. All ports (viz. driver, executor, blockmanager) could be > specified using the respective properties except the ui port. The spark app > is launched using JAVA code and setting the property spark.ui.port in > sparkConf doesn't seem to help. Even setting a JVM option > -Dspark.ui.port="some_port" does not spawn the UI is required port. > From the logs of the spark app, *_the property spark.ui.port is overridden > and the JVM property '-Dspark.ui.port=0' is set_* even though it is never set > to 0. 
> _(Run in Spark 1.6.2) From the logs ->_ > _command:LD_LIBRARY_PATH="/usr/hdp/2.6.4.0-91/hadoop/lib/native:$LD_LIBRARY_PATH" > {{JAVA_HOME}}/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms4096m > -Xmx4096m -Djava.io.tmpdir={{PWD}}/tmp '-Dspark.blockManager.port=9900' > '-Dspark.driver.port=9902' '-Dspark.fileserver.port=9903' > '-Dspark.broadcast.port=9904' '-Dspark.port.maxRetries=20' > '-Dspark.ui.port=0' '-Dspark.executor.port=9905'_ > _19/10/14 16:39:59 INFO Utils: Successfully started service 'SparkUI' on port > 35167.19/10/14 16:39:59 INFO SparkUI: Started SparkUI at_ > [_http://10.65.170.98:35167_|http://10.65.170.98:35167/] > Even tried using a *spark-submit command with --conf spark.ui.port* does > spawn UI in required port > {color:#172b4d}_(Run in Spark 2.4.4)_{color} > {color:#172b4d}_./bin/spark-submit --class org.apache.spark.examples.SparkPi > --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g > --executor-cores 1 --conf spark.ui.port=12345 --conf spark.driver.port=12340 > --queue default examples/jars/spark-examples_2.11-2.4.4.jar 10_{color} > _From the logs::_ > _19/10/15 00:04:05 INFO ui.SparkUI: Stopped Spark web UI at > [http://invrh74ace005.informatica.com:46622|http://invrh74ace005.informatica.com:46622/]_ > _command:{{JAVA_HOME}}/bin/java -server -Xmx2048m > -Djava.io.tmpdir={{PWD}}/tmp '-Dspark.ui.port=0' 'Dspark.driver.port=12340' > -Dspark.yarn.app.container.log.dir= -XX:OnOutOfMemoryError='kill %p' > org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url > spark://coarsegrainedschedu...@invrh74ace005.informatica.com:12340 > --executor-id --hostname --cores 1 --app-id > application_1570992022035_0089 --user-class-path > [file:$PWD/__app__.jar1|file://%24pwd/__app__.jar1]>/stdout2>/stderr_ > > Looks like the application master override this and set a JVM property before > launch resulting in random UI port even though spark.ui.port is set by the > user. 
> In these links > # > [https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala] > (line 214) > # > [https://github.com/cloudera/spark/blob/master/yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala] > (line 75) > I can see that the method _*run() in above files sets a system property > UI_PORT*_ and _*spark.ui.port respectively.*_ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29465) Unable to configure SPARK UI (spark.ui.port) in spark yarn cluster mode.
[ https://issues.apache.org/jira/browse/SPARK-29465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951652#comment-16951652 ] Sandeep Katta commented on SPARK-29465: --- agreed, I will raise the PR soon > Unable to configure SPARK UI (spark.ui.port) in spark yarn cluster mode. > - > > Key: SPARK-29465 > URL: https://issues.apache.org/jira/browse/SPARK-29465 > Project: Spark > Issue Type: Bug > Components: Spark Submit, YARN >Affects Versions: 1.6.2, 2.4.4 >Reporter: Vishwas Nalka >Priority: Major > > I'm trying to restrict the ports used by spark app which is launched in yarn > cluster mode. All ports (viz. driver, executor, blockmanager) could be > specified using the respective properties except the ui port. The spark app > is launched using JAVA code and setting the property spark.ui.port in > sparkConf doesn't seem to help. Even setting a JVM option > -Dspark.ui.port="some_port" does not spawn the UI is required port. > From the logs of the spark app, *_the property spark.ui.port is overridden > and the JVM property '-Dspark.ui.port=0' is set_* even though it is never set > to 0. 
> _(Run in Spark 1.6.2) From the logs ->_ > _command:LD_LIBRARY_PATH="/usr/hdp/2.6.4.0-91/hadoop/lib/native:$LD_LIBRARY_PATH" > {{JAVA_HOME}}/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms4096m > -Xmx4096m -Djava.io.tmpdir={{PWD}}/tmp '-Dspark.blockManager.port=9900' > '-Dspark.driver.port=9902' '-Dspark.fileserver.port=9903' > '-Dspark.broadcast.port=9904' '-Dspark.port.maxRetries=20' > '-Dspark.ui.port=0' '-Dspark.executor.port=9905'_ > _19/10/14 16:39:59 INFO Utils: Successfully started service 'SparkUI' on port > 35167.19/10/14 16:39:59 INFO SparkUI: Started SparkUI at_ > [_http://10.65.170.98:35167_|http://10.65.170.98:35167/] > Even tried using a *spark-submit command with --conf spark.ui.port* does > spawn UI in required port > {color:#172b4d}_(Run in Spark 2.4.4)_{color} > {color:#172b4d}_./bin/spark-submit --class org.apache.spark.examples.SparkPi > --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g > --executor-cores 1 --conf spark.ui.port=12345 --conf spark.driver.port=12340 > --queue default examples/jars/spark-examples_2.11-2.4.4.jar 10_{color} > _From the logs::_ > _19/10/15 00:04:05 INFO ui.SparkUI: Stopped Spark web UI at > [http://invrh74ace005.informatica.com:46622|http://invrh74ace005.informatica.com:46622/]_ > _command:{{JAVA_HOME}}/bin/java -server -Xmx2048m > -Djava.io.tmpdir={{PWD}}/tmp '-Dspark.ui.port=0' 'Dspark.driver.port=12340' > -Dspark.yarn.app.container.log.dir= -XX:OnOutOfMemoryError='kill %p' > org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url > spark://coarsegrainedschedu...@invrh74ace005.informatica.com:12340 > --executor-id --hostname --cores 1 --app-id > application_1570992022035_0089 --user-class-path > [file:$PWD/__app__.jar1|file://%24pwd/__app__.jar1]>/stdout2>/stderr_ > > Looks like the application master override this and set a JVM property before > launch resulting in random UI port even though spark.ui.port is set by the > user. 
> In these links > # > [https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala] > (line 214) > # > [https://github.com/cloudera/spark/blob/master/yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala] > (line 75) > I can see that the method _*run() in above files sets a system property > UI_PORT*_ and _*spark.ui.port respectively.*_ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29465) Unable to configure SPARK UI (spark.ui.port) in spark yarn cluster mode.
[ https://issues.apache.org/jira/browse/SPARK-29465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951581#comment-16951581 ] Sandeep Katta commented on SPARK-29465: --- Per the comments in the [code|https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L212], the behavior is correct: a user can submit multiple Spark applications on the same box. Ping [~hyukjin.kwon] [~dongjoon], what is your suggestion? > Unable to configure SPARK UI (spark.ui.port) in spark yarn cluster mode. > - > > Key: SPARK-29465 > URL: https://issues.apache.org/jira/browse/SPARK-29465 > Project: Spark > Issue Type: Bug > Components: Spark Submit, YARN >Affects Versions: 1.6.2, 2.4.4 >Reporter: Vishwas Nalka >Priority: Major > > I'm trying to restrict the ports used by spark app which is launched in yarn > cluster mode. All ports (viz. driver, executor, blockmanager) could be > specified using the respective properties except the ui port. The spark app > is launched using JAVA code and setting the property spark.ui.port in > sparkConf doesn't seem to help. Even setting a JVM option > -Dspark.ui.port="some_port" does not spawn the UI is required port. > From the logs of the spark app, *_the property spark.ui.port is overridden > and the JVM property '-Dspark.ui.port=0' is set_* even though it is never set > to 0. 
> _(Run in Spark 1.6.2) From the logs ->_ > _command:LD_LIBRARY_PATH="/usr/hdp/2.6.4.0-91/hadoop/lib/native:$LD_LIBRARY_PATH" > {{JAVA_HOME}}/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms4096m > -Xmx4096m -Djava.io.tmpdir={{PWD}}/tmp '-Dspark.blockManager.port=9900' > '-Dspark.driver.port=9902' '-Dspark.fileserver.port=9903' > '-Dspark.broadcast.port=9904' '-Dspark.port.maxRetries=20' > '-Dspark.ui.port=0' '-Dspark.executor.port=9905'_ > _19/10/14 16:39:59 INFO Utils: Successfully started service 'SparkUI' on port > 35167.19/10/14 16:39:59 INFO SparkUI: Started SparkUI at_ > [_http://10.65.170.98:35167_|http://10.65.170.98:35167/] > Even tried using a *spark-submit command with --conf spark.ui.port* does > spawn UI in required port > {color:#172b4d}_(Run in Spark 2.4.4)_{color} > {color:#172b4d}_./bin/spark-submit --class org.apache.spark.examples.SparkPi > --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g > --executor-cores 1 --conf spark.ui.port=12345 --conf spark.driver.port=12340 > --queue default examples/jars/spark-examples_2.11-2.4.4.jar 10_{color} > _From the logs::_ > _19/10/15 00:04:05 INFO ui.SparkUI: Stopped Spark web UI at > [http://invrh74ace005.informatica.com:46622|http://invrh74ace005.informatica.com:46622/]_ > _command:{{JAVA_HOME}}/bin/java -server -Xmx2048m > -Djava.io.tmpdir={{PWD}}/tmp '-Dspark.ui.port=0' 'Dspark.driver.port=12340' > -Dspark.yarn.app.container.log.dir= -XX:OnOutOfMemoryError='kill %p' > org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url > spark://coarsegrainedschedu...@invrh74ace005.informatica.com:12340 > --executor-id --hostname --cores 1 --app-id > application_1570992022035_0089 --user-class-path > [file:$PWD/__app__.jar1|file://%24pwd/__app__.jar1]>/stdout2>/stderr_ > > Looks like the application master override this and set a JVM property before > launch resulting in random UI port even though spark.ui.port is set by the > user. 
> In these links > # > [https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala] > (line 214) > # > [https://github.com/cloudera/spark/blob/master/yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala] > (line 75) > I can see that the _*run()*_ method in the above files sets the system > properties _*UI_PORT*_ and _*spark.ui.port*_, respectively. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
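The override is easier to see in light of how ephemeral ports work: binding a listen socket to port 0 delegates the port choice to the OS, which is the effect of the `-Dspark.ui.port=0` the ApplicationMaster injects and why the UI comes up on a random port (e.g. 35167) regardless of the user's setting. A minimal sketch using plain sockets (illustration only, not Spark code; `pick_ephemeral_port` is a hypothetical helper name):

```python
import socket

def pick_ephemeral_port(host: str = "127.0.0.1") -> int:
    """Bind to port 0 so the OS assigns any free ephemeral port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))           # port 0 -> OS picks a free port
        return s.getsockname()[1]   # the port actually bound

if __name__ == "__main__":
    print(pick_ephemeral_port())    # some OS-assigned port, differs per run
```

This is why forcing port 0 lets several drivers share one node without bind conflicts, at the cost of making the port unpredictable.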
[jira] [Issue Comment Deleted] (SPARK-29453) Improve tooltip information for SQL tab
[ https://issues.apache.org/jira/browse/SPARK-29453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Katta updated SPARK-29453: -- Comment: was deleted (was: working on this will raise PR soon) > Improve tooltip information for SQL tab > --- > > Key: SPARK-29453 > URL: https://issues.apache.org/jira/browse/SPARK-29453 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.0 >Reporter: Sandeep Katta >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29453) Improve tooltip information for SQL tab
[ https://issues.apache.org/jira/browse/SPARK-29453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950364#comment-16950364 ] Sandeep Katta commented on SPARK-29453: --- Working on this; will raise a PR soon > Improve tooltip information for SQL tab > --- > > Key: SPARK-29453 > URL: https://issues.apache.org/jira/browse/SPARK-29453 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.0 >Reporter: Sandeep Katta >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29453) Improve tooltip information for SQL tab
Sandeep Katta created SPARK-29453: - Summary: Improve tooltip information for SQL tab Key: SPARK-29453 URL: https://issues.apache.org/jira/browse/SPARK-29453 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 3.0.0 Reporter: Sandeep Katta -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29452) Improve tooltip information for storage tab
Sandeep Katta created SPARK-29452: - Summary: Improve tooltip information for storage tab Key: SPARK-29452 URL: https://issues.apache.org/jira/browse/SPARK-29452 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 3.0.0 Reporter: Sandeep Katta -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29452) Improve tooltip information for storage tab
[ https://issues.apache.org/jira/browse/SPARK-29452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950344#comment-16950344 ] Sandeep Katta commented on SPARK-29452: --- I will raise a PR for this soon > Improve tooltip information for storage tab > -- > > Key: SPARK-29452 > URL: https://issues.apache.org/jira/browse/SPARK-29452 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.0 >Reporter: Sandeep Katta >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29435) Spark 3 doesn't work with older shuffle service
[ https://issues.apache.org/jira/browse/SPARK-29435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950261#comment-16950261 ] Sandeep Katta commented on SPARK-29435: --- [~XuanYuan] thank you for your comments; I will update the PR as per these changes > Spark 3 doesn't work with older shuffle service > -- > > Key: SPARK-29435 > URL: https://issues.apache.org/jira/browse/SPARK-29435 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.0.0 > Environment: Spark 3 from Sept 26, commit > 8beb736a00b004f97de7fcdf9ff09388d80fc548 > Spark 2.4.1 shuffle service in yarn >Reporter: koert kuipers >Priority: Major > > SPARK-27665 introduced a change to the shuffle protocol. It also introduced a > setting spark.shuffle.useOldFetchProtocol, which should allow Spark 3 to run > with an old shuffle service. > However, I have not gotten that to work. I have been testing with Spark 3 > master (from Sept 26) and the shuffle service from Spark 2.4.1 in yarn. 
> The errors i see are for example on EMR: > {code} > Error occurred while fetching local blocks > java.nio.file.NoSuchFileException: > /mnt1/yarn/usercache/hadoop/appcache/application_1570697024032_0058/blockmgr-d1d009b1-1c95-4e2a-9a71-0ff20078b9a8/38/shuffle_0_0_0.index > {code} > And on CDH5: > {code} > org.apache.spark.shuffle.FetchFailedException: > /data/9/hadoop/nm/usercache/koert/appcache/application_1568061697664_8250/blockmgr-57f28014-cdf2-431e-8e11-447ba5c2b2f2/0b/shuffle_0_0_0.index > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:596) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:511) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:67) > at > org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29) > at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490) > at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458) > at > org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) > at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458) > at scala.collection.Iterator$SliceIterator.hasNext(Iterator.scala:266) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:337) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:850) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:850) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:291) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at 
org.apache.spark.scheduler.Task.run(Task.scala:127) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:455) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:458) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.nio.file.NoSuchFileException: > /data/9/hadoop/nm/usercache/koert/appcache/application_1568061697664_8250/blockmgr-57f28014-cdf2-431e-8e11-447ba5c2b2f2/0b/shuffle_0_0_0.index > at > sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) > at > sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) > at java.nio.file.Files.newByteChannel(Files.java:361) > at java.nio.file.Files.newByteChannel(Files.java:407) > at > org.apache.spark.shuffle.IndexShuffleBlockResolver.getBlockData(IndexShuffleBlockResolver.scala:204) > at > org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:551) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchLocalBlocks(ShuffleBlockFetcherIterator.scala:349) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:391) > at > org.apache.spark.storage.ShuffleBloc
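SPARK-27665 replaced the old OpenBlocks-based shuffle fetch with a new FetchShuffleBlocks message, and spark.shuffle.useOldFetchProtocol is meant to gate the fallback for pre-3.0 shuffle services. A hedged, illustrative sketch of what that compatibility switch has to decide (hypothetical function and flag names; this is not Spark's actual implementation):

```python
def choose_fetch_message(use_old_fetch_protocol: bool,
                         server_supports_new_protocol: bool) -> str:
    """Pick the shuffle-fetch message a Spark 3 client should send.

    An old (pre-3.0) external shuffle service only understands OpenBlocks,
    so the client must fall back to it either when the user sets
    spark.shuffle.useOldFetchProtocol=true or when the server is old.
    """
    if use_old_fetch_protocol or not server_supports_new_protocol:
        return "OpenBlocks"            # legacy message the 2.4 service understands
    return "FetchShuffleBlocks"        # batched message introduced by SPARK-27665
```

The bug reported here is, in effect, that flipping the flag alone did not make the fallback path work against a 2.4.1 shuffle service.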
[jira] [Commented] (SPARK-29435) Spark 3 doesn't work with older shuffle service
[ https://issues.apache.org/jira/browse/SPARK-29435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950212#comment-16950212 ] Sandeep Katta commented on SPARK-29435: --- cc [~cloud_fan] [~XuanYuan] this patch has been tested by [~koert]; please help review it > Spark 3 doesn't work with older shuffle service > -- > > Key: SPARK-29435 > URL: https://issues.apache.org/jira/browse/SPARK-29435 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.0.0 > Environment: Spark 3 from Sept 26, commit > 8beb736a00b004f97de7fcdf9ff09388d80fc548 > Spark 2.4.1 shuffle service in yarn >Reporter: koert kuipers >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29435) Spark 3 doesn't work with older shuffle service
[ https://issues.apache.org/jira/browse/SPARK-29435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949881#comment-16949881 ] Sandeep Katta commented on SPARK-29435: --- [~koert] can you please apply this [patch|https://github.com/apache/spark/pull/26095] and check? > Spark 3 doesn't work with older shuffle service > -- > > Key: SPARK-29435 > URL: https://issues.apache.org/jira/browse/SPARK-29435 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.0.0 > Environment: Spark 3 from Sept 26, commit > 8beb736a00b004f97de7fcdf9ff09388d80fc548 > Spark 2.4.1 shuffle service in yarn >Reporter: koert kuipers >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29435) Spark 3 doesn't work with older shuffle service
[ https://issues.apache.org/jira/browse/SPARK-29435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949522#comment-16949522 ] Sandeep Katta commented on SPARK-29435: --- Yes, I am even able to reproduce it; I am looking into this issue. > Spark 3 doesn't work with older shuffle service > -- > > Key: SPARK-29435 > URL: https://issues.apache.org/jira/browse/SPARK-29435 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.0.0 > Environment: Spark 3 from Sept 26, commit > 8beb736a00b004f97de7fcdf9ff09388d80fc548 > Spark 2.4.1 shuffle service in yarn >Reporter: koert kuipers >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-29435) Spark 3 doesn't work with older shuffle service
[ https://issues.apache.org/jira/browse/SPARK-29435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Katta updated SPARK-29435: -- Comment: was deleted (was: [~koert] I will investigate this further, If I found root cause will raise PR soon) > Spark 3 doesn't work with older shuffle service > -- > > Key: SPARK-29435 > URL: https://issues.apache.org/jira/browse/SPARK-29435 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.0.0 > Environment: Spark 3 from Sept 26, commit > 8beb736a00b004f97de7fcdf9ff09388d80fc548 > Spark 2.4.1 shuffle service in yarn >Reporter: koert kuipers >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29435) Spark 3 doesn't work with older shuffle service
[ https://issues.apache.org/jira/browse/SPARK-29435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949508#comment-16949508 ] Sandeep Katta commented on SPARK-29435: --- [~koert] I will investigate this further; if I find the root cause, I will raise a PR soon > Spark 3 doesn't work with older shuffle service > -- > > Key: SPARK-29435 > URL: https://issues.apache.org/jira/browse/SPARK-29435 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.0.0 > Environment: Spark 3 from Sept 26, commit > 8beb736a00b004f97de7fcdf9ff09388d80fc548 > Spark 2.4.1 shuffle service in yarn >Reporter: koert kuipers >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27318) Join operation on bucketing table fails with base adaptive enabled
[ https://issues.apache.org/jira/browse/SPARK-27318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947580#comment-16947580 ] Sandeep Katta commented on SPARK-27318: --- Can you share your bucket2.txt and bucket3.txt? I tried with sample data and it works for me, so please attach your data as well. > Join operation on bucketing table fails with base adaptive enabled > -- > > Key: SPARK-27318 > URL: https://issues.apache.org/jira/browse/SPARK-27318 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Supritha >Priority: Major > > Join operation on a bucketed table is failing. > Steps to reproduce the issue. > {code} > spark.sql("set spark.sql.adaptive.enabled=true") > {code} > 1. Create tables bucket3 and bucket4 as below and load the data. > {code} > sql("create table bucket3(id3 int,country3 String, sports3 String) row format > delimited fields terminated by ','").show() > sql("create table bucket4(id4 int,country4 String) row format delimited > fields terminated by ','").show() > sql("load data local inpath '/opt/abhidata/bucket2.txt' into table > bucket3").show() > sql("load data local inpath '/opt/abhidata/bucket3.txt' into table > bucket4").show() > {code} > 2. Create the bucketed tables as below > {code} > spark.sqlContext.table("bucket3").write.bucketBy(3, > "id3").saveAsTable("bucketed_table_3"); > spark.sqlContext.table("bucket4").write.bucketBy(4, > "id4").saveAsTable("bucketed_table_4"); > {code} > 3. Execute the join query on the bucketed tables > {code} > sql("select * from bucketed_table_3 join bucketed_table_4 on > bucketed_table_3.id3 = bucketed_table_4.id4").show() > {code} > > {code:java} > java.lang.IllegalArgumentException: requirement failed: > PartitioningCollection requires all of its partitionings have the same > numPartitions. 
at scala.Predef$.require(Predef.scala:224) at > org.apache.spark.sql.catalyst.plans.physical.PartitioningCollection.(partitioning.scala:291) > at > org.apache.spark.sql.execution.joins.SortMergeJoinExec.outputPartitioning(SortMergeJoinExec.scala:69) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$org$apache$spark$sql$execution$exchange$EnsureRequirements$$ensureDistributionAndOrdering$1.apply(EnsureRequirements.scala:150) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$org$apache$spark$sql$execution$exchange$EnsureRequirements$$ensureDistributionAndOrdering$1.apply(EnsureRequirements.scala:149) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:392) at > scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at > scala.collection.immutable.List.map(List.scala:296) at > org.apache.spark.sql.execution.exchange.EnsureRequirements.org$apache$spark$sql$execution$exchange$EnsureRequirements$$ensureDistributionAndOrdering(EnsureRequirements.scala:149) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$apply$1.applyOrElse(EnsureRequirements.scala:304) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$apply$1.applyOrElse(EnsureRequirements.scala:296) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$2.apply(TreeNode.scala:282) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$2.apply(TreeNode.scala:282) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:281) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:326) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:275) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements.apply(EnsureRequirements.scala:296) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements.apply(EnsureRequirements.scala:38) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:87) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:87) > at > scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124
[jira] [Comment Edited] (SPARK-27318) Join operation on bucketing table fails with base adaptive enabled
[ https://issues.apache.org/jira/browse/SPARK-27318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947580#comment-16947580 ] Sandeep Katta edited comment on SPARK-27318 at 10/9/19 11:27 AM: - [~Supritha] Can you share your bucket2.txt and bucket3.txt? I tried with sample data and it works for me, so please attach your data as well. was (Author: sandeep.katta2007): Can you share your bucket2.txt and bucket3.txt? I tried with sample data and it works for me, so please attach your data as well. > Join operation on bucketing table fails with base adaptive enabled > -- > > Key: SPARK-27318 > URL: https://issues.apache.org/jira/browse/SPARK-27318 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Supritha >Priority: Major > > Join operation on a bucketed table is failing. > Steps to reproduce the issue: > {code} > spark.sql("set spark.sql.adaptive.enabled=true") > {code} > 1. Create tables bucket3 and bucket4 as below and load the data. > {code} > sql("create table bucket3(id3 int,country3 String, sports3 String) row format > delimited fields terminated by ','").show() > sql("create table bucket4(id4 int,country4 String) row format delimited > fields terminated by ','").show() > sql("load data local inpath '/opt/abhidata/bucket2.txt' into table > bucket3").show() > sql("load data local inpath '/opt/abhidata/bucket3.txt' into table > bucket4").show() > {code} > 2. Create the bucketed tables as below. > {code} > spark.sqlContext.table("bucket3").write.bucketBy(3, > "id3").saveAsTable("bucketed_table_3"); > spark.sqlContext.table("bucket4").write.bucketBy(4, > "id4").saveAsTable("bucketed_table_4"); > {code} > 3. 
Execute the join query on the bucketed table > {code} > sql("select * from bucketed_table_3 join bucketed_table_4 on > bucketed_table_3.id3 = bucketed_table_4.id4").show() > {code} > > {code:java} > java.lang.IllegalArgumentException: requirement failed: > PartitioningCollection requires all of its partitionings have the same > numPartitions. at scala.Predef$.require(Predef.scala:224) at > org.apache.spark.sql.catalyst.plans.physical.PartitioningCollection.(partitioning.scala:291) > at > org.apache.spark.sql.execution.joins.SortMergeJoinExec.outputPartitioning(SortMergeJoinExec.scala:69) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$org$apache$spark$sql$execution$exchange$EnsureRequirements$$ensureDistributionAndOrdering$1.apply(EnsureRequirements.scala:150) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$org$apache$spark$sql$execution$exchange$EnsureRequirements$$ensureDistributionAndOrdering$1.apply(EnsureRequirements.scala:149) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:392) at > scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at > scala.collection.immutable.List.map(List.scala:296) at > org.apache.spark.sql.execution.exchange.EnsureRequirements.org$apache$spark$sql$execution$exchange$EnsureRequirements$$ensureDistributionAndOrdering(EnsureRequirements.scala:149) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$apply$1.applyOrElse(EnsureRequirements.scala:304) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$apply$1.applyOrElse(EnsureRequirements.scala:296) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$2.apply(TreeNode.scala:282) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$2.apply(TreeNode.scala:282) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:281) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:326) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:275) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements.apply(EnsureRequirements.scala:296) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements.apply(EnsureRequirements.scala:38) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(Q
[jira] [Commented] (SPARK-27695) SELECT * returns null column when reading from Hive / ORC and spark.sql.hive.convertMetastoreOrc=true
[ https://issues.apache.org/jira/browse/SPARK-27695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947385#comment-16947385 ] Sandeep Katta commented on SPARK-27695: --- Does this JIRA apply to version 2.4 as well? > SELECT * returns null column when reading from Hive / ORC and > spark.sql.hive.convertMetastoreOrc=true > - > > Key: SPARK-27695 > URL: https://issues.apache.org/jira/browse/SPARK-27695 > Project: Spark > Issue Type: Bug > Components: Input/Output, Spark Core >Affects Versions: 2.3.1, 2.3.2, 2.3.3 >Reporter: Oscar Cassetti >Priority: Major > > If you do > {code:java} > select * from hive.some_table{code} > and the underlying data does not exactly match the schema, the last column is > returned as null. > Example > {code:java} > from pyspark import SparkConf > from pyspark.sql import SparkSession > from pyspark.sql.types import * > conf = SparkConf().set('spark.sql.hive.convertMetastoreOrc', 'true') > spark = > SparkSession.builder.config(conf=conf).enableHiveSupport().getOrCreate() > data = [{'a':i, 'b':i+10, 'd':{'a':i, 'b':i+10}} for i in range(1, 100)] > data_schema = StructType([StructField('a', LongType(), True), > StructField('b', LongType(), True), > StructField('d', MapType(StringType(), LongType(), True), True) > ]) > rdd = spark.sparkContext.parallelize(data) > df = rdd.toDF(data_schema) > df.write.format("orc").save("./sample_data/") > spark.sql("""create external table tmp( > a bigint, > b bigint, > d map<string,bigint>) > stored as orc > location 'sample_data/' > """) > spark.sql("select * from tmp").show() > {code} > This returns correctly: > {noformat} > +---+---+---+ > | a| b| d| > +---+---+---+ > | 85| 95| [a -> 85, b -> 95]| > | 86| 96| [a -> 86, b -> 96]| > | 87| 97| [a -> 87, b -> 97]| > | 88| 98| [a -> 88, b -> 98]| > | 89| 99| [a -> 89, b -> 99]| > | 90|100|[a -> 90, b -> 100]| > {noformat} > However, if we add a new column to the underlying data without altering the Hive > schema, > the last column of the Hive 
schema is set to null > {code} > from pyspark import SparkConf > from pyspark.sql import SparkSession > from pyspark.sql.types import * > conf = SparkConf().set('spark.sql.hive.convertMetastoreOrc', 'true') > spark = > SparkSession.builder.config(conf=conf).enableHiveSupport().getOrCreate() > data = [{'a':i, 'b':i+10, 'c':i+5, 'd':{'a':i, 'b':i+10, 'c':i+5}} for i in > range(1, 100)] > data_schema = StructType([StructField('a', LongType(), True), > StructField('b', LongType(), True), > StructField('c', LongType(), True), > StructField('d', MapType(StringType(), > LongType(), True), True) > ]) > rdd = spark.sparkContext.parallelize(data) > df = rdd.toDF(data_schema) > df.write.format("orc").mode("overwrite").save("./sample_data/") > spark.sql("select * from tmp").show() > spark.read.orc("./sample_data/").show() > {code} > The first show() returns > {noformat} > +---+---++ > | a| b| d| > +---+---++ > | 85| 95|null| > | 86| 96|null| > | 87| 97|null| > | 88| 98|null| > | 89| 99|null| > | 90|100|null| > | 91|101|null| > | 92|102|null| > | 93|103|null| > | 94|104|null| > {noformat} > But the data on disk is correct > {noformat} > +---+---+---++ > | a| b| c| d| > +---+---+---++ > | 85| 95| 90|[a -> 85, b -> 95...| > | 86| 96| 91|[a -> 86, b -> 96...| > | 87| 97| 92|[a -> 87, b -> 97...| > | 88| 98| 93|[a -> 88, b -> 98...| > | 89| 99| 94|[a -> 89, b -> 99...| > | 90|100| 95|[a -> 90, b -> 10...| > | 91|101| 96|[a -> 91, b -> 10...| > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
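One way to picture the behaviour reported above: when the reader resolves the Hive table schema (a, b, d) against the file schema (a, b, c, d) by position rather than by name, table column `d` lines up with file column `c`, the types disagree, and the value degrades to null. The self-contained Python sketch below is illustrative only, not Spark's actual schema-resolution code; all names in it are made up for the example.

```python
# Illustrative sketch (assumption: not Spark source). The table schema knows
# columns (a, b, d); the file on disk now has (a, b, c, d). Positional
# resolution mismatches 'd', by-name resolution does not.

FILE_SCHEMA = ["a", "b", "c", "d"]            # file columns after adding 'c'
FILE_TYPES = {"a": "long", "b": "long", "c": "long", "d": "map"}
TABLE_SCHEMA = [("a", "long"), ("b", "long"), ("d", "map")]

def read_positional(row):
    out = {}
    for idx, (name, expected_type) in enumerate(TABLE_SCHEMA):
        file_col = FILE_SCHEMA[idx]           # 'd' lands on file column 'c'
        value = row[idx]
        # A type clash (map expected, long found) surfaces as null:
        out[name] = value if FILE_TYPES[file_col] == expected_type else None
    return out

def read_by_name(row):
    return {name: row[FILE_SCHEMA.index(name)] for name, _ in TABLE_SCHEMA}

row = (85, 95, 90, {"a": 85, "b": 95, "c": 90})
print(read_positional(row))  # {'a': 85, 'b': 95, 'd': None}
print(read_by_name(row))     # {'a': 85, 'b': 95, 'd': {'a': 85, 'b': 95, 'c': 90}}
```

This matches the symptom in the report: the data on disk is intact, but `select * from tmp` shows the last declared Hive column as null.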
[jira] [Commented] (SPARK-29268) Failed to start spark-sql when using Derby metastore and isolatedLoader is enabled
[ https://issues.apache.org/jira/browse/SPARK-29268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939911#comment-16939911 ] Sandeep Katta commented on SPARK-29268: --- I found the root cause; I will raise a PR soon. > Failed to start spark-sql when using Derby metastore and isolatedLoader is > enabled > -- > > Key: SPARK-29268 > URL: https://issues.apache.org/jira/browse/SPARK-29268 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.4, 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Failed to start spark-sql when using Derby metastore and isolatedLoader is > enabled ({{spark.sql.hive.metastore.jars != builtin}}). > How to reproduce: > {code:sh} > bin/spark-sql --conf spark.sql.hive.metastore.version=2.1 --conf > spark.sql.hive.metastore.jars=maven > {code} > Logs: > {noformat} > ... > Caused by: java.sql.SQLException: Failed to start database 'metastore_db' > with class loader > org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@15a591d9, see > the next exception for details. > at > org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) > at > org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown > Source) > ... 108 more > Caused by: java.sql.SQLException: Another instance of Derby may have already > booted the database /root/spark-2.3.4-bin-hadoop2.7/metastore_db. > at > org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) > at > org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown > Source) > at > org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown > Source) > at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown > Source) > ... 105 more > Caused by: ERROR XSDB6: Another instance of Derby may have already booted the > database /root/spark-2.3.4-bin-hadoop2.7/metastore_db. > ... 
> {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29268) Failed to start spark-sql when using Derby metastore and isolatedLoader is enabled
[ https://issues.apache.org/jira/browse/SPARK-29268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939073#comment-16939073 ] Sandeep Katta commented on SPARK-29268: --- I will work on this > Failed to start spark-sql when using Derby metastore and isolatedLoader is > enabled > -- > > Key: SPARK-29268 > URL: https://issues.apache.org/jira/browse/SPARK-29268 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.4, 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Failed to start spark-sql when using Derby metastore and isolatedLoader is > enabled({{spark.sql.hive.metastore.jars != builtin}}). > How to reproduce: > {code:sh} > bin/spark-sql --conf spark.sql.hive.metastore.version=2.1 --conf > spark.sql.hive.metastore.jars=maven > {code} > Logs: > {noformat} > ... > Caused by: java.sql.SQLException: Failed to start database 'metastore_db' > with class loader > org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@15a591d9, see > the next exception for details. > at > org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) > at > org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown > Source) > ... 108 more > Caused by: java.sql.SQLException: Another instance of Derby may have already > booted the database /root/spark-2.3.4-bin-hadoop2.7/metastore_db. > at > org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) > at > org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown > Source) > at > org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown > Source) > at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown > Source) > ... 105 more > Caused by: ERROR XSDB6: Another instance of Derby may have already booted the > database /root/spark-2.3.4-bin-hadoop2.7/metastore_db. > ... 
> {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
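The XSDB6 error above means a second JVM tried to boot the same embedded Derby database directory, which Derby guards with a lock file (`db.lck`). The Python sketch below is a rough analogy of that single-owner check, not Derby's implementation; the class and file names are illustrative.

```python
# Illustrative analogy (assumption: not Derby source): two "boots" of the
# same database directory fail the way two acquisitions of an exclusive
# lock file do.

import os
import tempfile

class EmbeddedDbLock:
    def __init__(self, db_dir):
        self.db_dir = db_dir
        self.lock_path = os.path.join(db_dir, "db.lck")

    def boot(self):
        # O_EXCL makes creation atomic: a second boot sees the existing
        # lock file and fails instead of corrupting shared state.
        try:
            fd = os.open(self.lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
        except FileExistsError:
            raise RuntimeError(
                "Another instance of Derby may have already booted "
                "the database " + self.db_dir)

db_dir = tempfile.mkdtemp()
EmbeddedDbLock(db_dir).boot()        # first boot succeeds
try:
    EmbeddedDbLock(db_dir).boot()    # second boot fails, like ERROR XSDB6
except RuntimeError as e:
    print(e)
```

In the Spark case there is only one process, so the error suggests the isolated classloader ends up booting the embedded Derby database twice within that process.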
[jira] [Issue Comment Deleted] (SPARK-29254) Failed to include jars passed in through --jars when isolatedLoader is enabled
[ https://issues.apache.org/jira/browse/SPARK-29254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Katta updated SPARK-29254: -- Comment: was deleted (was: [~yumwang] I would like to work on it if you have not started ) > Failed to include jars passed in through --jars when isolatedLoader is enabled > -- > > Key: SPARK-29254 > URL: https://issues.apache.org/jira/browse/SPARK-29254 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Failed to include jars passed in through --jars when {{isolatedLoader}} is > enabled({{spark.sql.hive.metastore.jars != builtin}}). How to reproduce: > {code:scala} > test("SPARK-29254: include jars passed in through --jars when > isolatedLoader is enabled") { > val unusedJar = TestUtils.createJarWithClasses(Seq.empty) > val jar1 = TestUtils.createJarWithClasses(Seq("SparkSubmitClassA")) > val jar2 = TestUtils.createJarWithClasses(Seq("SparkSubmitClassB")) > val jar3 = HiveTestJars.getHiveContribJar.getCanonicalPath > val jar4 = HiveTestJars.getHiveHcatalogCoreJar.getCanonicalPath > val jarsString = Seq(jar1, jar2, jar3, jar4).map(j => > j.toString).mkString(",") > val args = Seq( > "--class", SparkSubmitClassLoaderTest.getClass.getName.stripSuffix("$"), > "--name", "SparkSubmitClassLoaderTest", > "--master", "local-cluster[2,1,1024]", > "--conf", "spark.ui.enabled=false", > "--conf", "spark.master.rest.enabled=false", > "--conf", "spark.sql.hive.metastore.version=3.1.2", > "--conf", "spark.sql.hive.metastore.jars=maven", > "--driver-java-options", "-Dderby.system.durability=test", > "--jars", jarsString, > unusedJar.toString, "SparkSubmitClassA", "SparkSubmitClassB") > runSparkSubmit(args) > } > {code} > Logs: > {noformat} > 2019-09-25 22:11:42.854 - stderr> 19/09/25 22:11:42 ERROR log: error in > initSerDe: java.lang.ClassNotFoundException Class > org.apache.hive.hcatalog.data.JsonSerDe not found > 2019-09-25 22:11:42.854 - stderr> 
java.lang.ClassNotFoundException: Class > org.apache.hive.hcatalog.data.JsonSerDe not found > 2019-09-25 22:11:42.854 - stderr> at > org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:84) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:77) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:289) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:271) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.hadoop.hive.ql.metadata.Table.getColsInternal(Table.java:663) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:646) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:898) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:937) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$createTable$1(HiveClientImpl.scala:539) > 2019-09-25 22:11:42.854 - stderr> at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:311) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:245) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:244) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:294) > 2019-09-25 22:11:42.854 - stderr> at > 
org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:537) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$createTable$1(HiveExternalCatalog.scala:284) > 2019-09-25 22:11:42.854 - stderr> at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:242) > 2019-09-25 22
[jira] [Commented] (SPARK-29254) Failed to include jars passed in through --jars when isolatedLoader is enabled
[ https://issues.apache.org/jira/browse/SPARK-29254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938434#comment-16938434 ] Sandeep Katta commented on SPARK-29254: --- [~yumwang] I would like to work on it if you have not started > Failed to include jars passed in through --jars when isolatedLoader is enabled > -- > > Key: SPARK-29254 > URL: https://issues.apache.org/jira/browse/SPARK-29254 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Failed to include jars passed in through --jars when {{isolatedLoader}} is > enabled({{spark.sql.hive.metastore.jars != builtin}}). How to reproduce: > {code:scala} > test("SPARK-29254: include jars passed in through --jars when > isolatedLoader is enabled") { > val unusedJar = TestUtils.createJarWithClasses(Seq.empty) > val jar1 = TestUtils.createJarWithClasses(Seq("SparkSubmitClassA")) > val jar2 = TestUtils.createJarWithClasses(Seq("SparkSubmitClassB")) > val jar3 = HiveTestJars.getHiveContribJar.getCanonicalPath > val jar4 = HiveTestJars.getHiveHcatalogCoreJar.getCanonicalPath > val jarsString = Seq(jar1, jar2, jar3, jar4).map(j => > j.toString).mkString(",") > val args = Seq( > "--class", SparkSubmitClassLoaderTest.getClass.getName.stripSuffix("$"), > "--name", "SparkSubmitClassLoaderTest", > "--master", "local-cluster[2,1,1024]", > "--conf", "spark.ui.enabled=false", > "--conf", "spark.master.rest.enabled=false", > "--conf", "spark.sql.hive.metastore.version=3.1.2", > "--conf", "spark.sql.hive.metastore.jars=maven", > "--driver-java-options", "-Dderby.system.durability=test", > "--jars", jarsString, > unusedJar.toString, "SparkSubmitClassA", "SparkSubmitClassB") > runSparkSubmit(args) > } > {code} > Logs: > {noformat} > 2019-09-25 22:11:42.854 - stderr> 19/09/25 22:11:42 ERROR log: error in > initSerDe: java.lang.ClassNotFoundException Class > org.apache.hive.hcatalog.data.JsonSerDe not found > 2019-09-25 
22:11:42.854 - stderr> java.lang.ClassNotFoundException: Class > org.apache.hive.hcatalog.data.JsonSerDe not found > 2019-09-25 22:11:42.854 - stderr> at > org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:84) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:77) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:289) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:271) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.hadoop.hive.ql.metadata.Table.getColsInternal(Table.java:663) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:646) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:898) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:937) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$createTable$1(HiveClientImpl.scala:539) > 2019-09-25 22:11:42.854 - stderr> at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:311) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:245) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:244) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:294) > 2019-09-25 22:11:42.854 - stderr> at > 
org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:537) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$createTable$1(HiveExternalCatalog.scala:284) > 2019-09-25 22:11:42.854 - stderr> at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99) > 2019-09-25 22:11:42.854 - stderr> at > org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.s
[jira] [Commented] (SPARK-29207) Document LIST JAR in SQL Reference
[ https://issues.apache.org/jira/browse/SPARK-29207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935574#comment-16935574 ] Sandeep Katta commented on SPARK-29207: --- LIST JAR and LIST FILES are both part of the LIST command, so either JIRA can be used to fix this. > Document LIST JAR in SQL Reference > -- > > Key: SPARK-29207 > URL: https://issues.apache.org/jira/browse/SPARK-29207 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29202) --driver-java-options are not passed to driver process in yarn client mode
[ https://issues.apache.org/jira/browse/SPARK-29202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Katta updated SPARK-29202: -- Description: Run the command below: ./bin/spark-sql --master yarn --driver-java-options="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=" In Spark 2.3.3 /opt/softwares/Java/jdk1.8.0_211/bin/java -cp /opt/BigdataTools/spark-2.3.3-bin-hadoop2.7/conf/:/opt/BigdataTools/spark-2.3.3-bin-hadoop2.7/jars/*:/opt/BigdataTools/hadoop-3.2.0/etc/hadoop/ -Xmx1g -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address= org.apache.spark.deploy.SparkSubmit --master yarn --conf spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address= --class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver spark-internal In Spark 3.0 /opt/softwares/Java/jdk1.8.0_211/bin/java -cp /opt/apache/git/sparkSourceCode/spark/conf/:/opt/apache/git/sparkSourceCode/spark/assembly/target/scala-2.12/jars/*:/opt/BigdataTools/hadoop-3.2.0/etc/hadoop/ org.apache.spark.deploy.SparkSubmit --master yarn --conf spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5556 --class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver spark-internal We can see that the Java options are not passed to the driver process in Spark 3. > --driver-java-options are not passed to driver process in yarn client mode > -- > > Key: SPARK-29202 > URL: https://issues.apache.org/jira/browse/SPARK-29202 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 3.0.0 >Reporter: Sandeep Katta >Priority: Major > > Run the command below: > ./bin/spark-sql --master yarn > --driver-java-options="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=" > > In Spark 2.3.3 > /opt/softwares/Java/jdk1.8.0_211/bin/java -cp > /opt/BigdataTools/spark-2.3.3-bin-hadoop2.7/conf/:/opt/BigdataTools/spark-2.3.3-bin-hadoop2.7/jars/*:/opt/BigdataTools/hadoop-3.2.0/etc/hadoop/ > -Xmx1g 
-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address= > org.apache.spark.deploy.SparkSubmit --master yarn --conf > spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address= > --class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver > spark-internal > > In Spark 3.0 > /opt/softwares/Java/jdk1.8.0_211/bin/java -cp > /opt/apache/git/sparkSourceCode/spark/conf/:/opt/apache/git/sparkSourceCode/spark/assembly/target/scala-2.12/jars/*:/opt/BigdataTools/hadoop-3.2.0/etc/hadoop/ > org.apache.spark.deploy.SparkSubmit --master yarn --conf > spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5556 > --class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver > spark-internal > We can see that the Java options are not passed to the driver process in Spark 3. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
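The difference between the two launch commands is where the flag lands: in Spark 2.3.3 the `-agentlib:jdwp=...` option appears among the JVM arguments (before the main class), while in Spark 3.0 it survives only inside `spark.driver.extraJavaOptions`, which cannot take effect in yarn client mode because the SparkSubmit JVM, which becomes the driver, is already running. The small hypothetical check below makes that comparison concrete; the command strings are abbreviated stand-ins for the ones quoted above.

```python
# Illustrative check (not Spark code): a JVM option only takes effect if it
# appears before the main class on the java command line.

def jvm_args(cmd):
    # Everything between 'java' and the main class is a JVM argument.
    toks = cmd.split()
    main = toks.index("org.apache.spark.deploy.SparkSubmit")
    return toks[1:main]

# Abbreviated versions of the launch commands from the description:
spark233 = ("java -cp conf -Xmx1g -agentlib:jdwp=transport=dt_socket "
            "org.apache.spark.deploy.SparkSubmit --master yarn")
spark30 = ("java -cp conf "
           "org.apache.spark.deploy.SparkSubmit --master yarn "
           "--conf spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket")

print(any(a.startswith("-agentlib") for a in jvm_args(spark233)))  # True
print(any(a.startswith("-agentlib") for a in jvm_args(spark30)))   # False
```

Only the 2.3.3 command actually starts the debug agent; in the 3.0 command the flag never reaches the JVM arguments of the process that becomes the driver.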
[jira] [Created] (SPARK-29202) --driver-java-options are not passed to driver process in yarn client mode
Sandeep Katta created SPARK-29202: - Summary: --driver-java-options are not passed to driver process in yarn client mode Key: SPARK-29202 URL: https://issues.apache.org/jira/browse/SPARK-29202 Project: Spark Issue Type: Bug Components: Deploy Affects Versions: 3.0.0 Reporter: Sandeep Katta -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29180) drop database throws Exception
[ https://issues.apache.org/jira/browse/SPARK-29180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Katta resolved SPARK-29180. --- Resolution: Invalid The user is required to run the hive-txn-schema-2.3.0.mysql.sql script after upgrading to Hive 2.3.6 > drop database throws Exception > --- > > Key: SPARK-29180 > URL: https://issues.apache.org/jira/browse/SPARK-29180 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Minor > Attachments: DROP_DATABASE_Exception.png, > image-2019-09-22-09-32-57-532.png > > > drop database throws an Exception, but the operation actually succeeds. > > : jdbc:hive2://10.18.19.208:23040/default> show databases; > +-+ > | databaseName | > +-+ > | db1 | > | db2 | > | default | > | func | > | gloablelimit | > | jointesthll | > | sparkdb__ | > | temp_func_test | > | test1 | > +-+ > 9 rows selected (0.131 seconds) > 0: jdbc:hive2://10.18.19.208:23040/default> drop database test1 cascade; > *Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: > MetaException(message:Unable to clean up > com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table > 'sparksql.TXN_COMPONENTS' doesn' exist* > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at com.mysql.jdbc.Util.handleNewInstance(Util.java:408) > at com.mysql.jdbc.Util.getInstance(Util.java:383) > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1062) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4208) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4140) > at 
com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2597) > at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2758) > at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2820) > at com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1759) > at com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1679) > at > com.jolbox.bonecp.StatementHandle.executeUpdate(StatementHandle.java:497) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.cleanupRecords(TxnHandler.java:1888) > at > org.apache.hadoop.hive.metastore.AcidEventListener.onDropDatabase(AcidEventListener.java:51) > at > org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier$13.notify(MetaStoreListenerNotifier.java:69) > at > org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:167) > at > org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:197) > at > org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:235) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_database_core(HiveMetaStore.java:1139) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_database(HiveMetaStore.java:1175) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) > at com.sun.proxy.$Proxy31.drop_database(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropDatabase(HiveMetaStoreClient.java:868) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173) > at com.sun.proxy.$Proxy32.dropDatabase(Unknown Source) > at org.apache.hadoop.hive.ql.metadata
[jira] [Commented] (SPARK-29180) drop database throws Exception
[ https://issues.apache.org/jira/browse/SPARK-29180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935201#comment-16935201 ] Sandeep Katta commented on SPARK-29180: --- [~abhishek.akg] I have verified this after running the above mentioned scripts !image-2019-09-22-09-32-57-532.png! Can we close this as invalid?
[jira] [Updated] (SPARK-29180) drop database throws Exception
[ https://issues.apache.org/jira/browse/SPARK-29180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Katta updated SPARK-29180: --- Attachment: image-2019-09-22-09-32-57-532.png
[jira] [Commented] (SPARK-29180) drop database throws Exception
[ https://issues.apache.org/jira/browse/SPARK-29180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935199#comment-16935199 ] Sandeep Katta commented on SPARK-29180: --- [~abhishek.akg] after you upgrade to Hive 2.3.5, you need to run the following scripts: [https://github.com/apache/hive/blob/rel/release-2.3.6/metastore/scripts/upgrade/mysql/hive-schema-2.3.0.mysql.sql] and [https://github.com/apache/hive/blob/rel/release-2.3.6/metastore/scripts/upgrade/mysql/hive-txn-schema-2.3.0.mysql.sql] Please run these scripts and check again.
[jira] [Comment Edited] (SPARK-29180) drop database throws Exception
[ https://issues.apache.org/jira/browse/SPARK-29180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935194#comment-16935194 ] Sandeep Katta edited comment on SPARK-29180 at 9/22/19 3:12 AM: [~dongjoon] I found the reason, but I am not sure whether it needs to be fixed in Spark or in Hive. As per the jira description, the database used is MySQL. Hive code: https://github.com/apache/hive/blob/rel/release-2.3.6/metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L1897 Hive checks for *"does not exist"* but MySQL throws an exception with *"doesn't exist"*
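The root cause described in the comment above can be sketched as a plain string-match mismatch. The class and method names below are hypothetical illustrations, not the actual Hive TxnHandler code; only the two message phrasings ("does not exist" vs. "doesn't exist") come from the report.

```java
// Minimal sketch of the mismatch reported above. treatedAsBenign() is a
// hypothetical stand-in for the check in Hive's TxnHandler.cleanupRecords,
// which (per the linked source line) tolerates a missing table only when
// the driver's message contains the exact phrase "does not exist".
public class ErrorMatchSketch {
    static boolean treatedAsBenign(String driverMessage) {
        return driverMessage.contains("does not exist");
    }

    public static void main(String[] args) {
        // MySQL Connector/J phrases the same condition as "doesn't exist",
        // so the check misses it and the cleanup surfaces the exception.
        String mysqlMessage = "Table 'sparksql.TXN_COMPONENTS' doesn't exist";
        System.out.println(treatedAsBenign(mysqlMessage)); // prints "false"
    }
}
```

Running the two metastore upgrade scripts creates TXN_COMPONENTS, so the cleanup never reaches this error path in the first place, which is why the issue was resolved as Invalid.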