[jira] [Commented] (SPARK-41232) High-order function: array_append

2022-11-30 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641272#comment-17641272
 ] 

Senthil Kumar commented on SPARK-41232:
---

[~podongfeng] Shall I work on this?

> High-order function: array_append
> -
>
> Key: SPARK-41232
> URL: https://issues.apache.org/jira/browse/SPARK-41232
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> refer to 
> https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/api/snowflake.snowpark.functions.array_append.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40367) Total size of serialized results of 3730 tasks (64.0 GB) is bigger than spark.driver.maxResultSize (64.0 GB)

2022-09-18 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606414#comment-17606414
 ] 

Senthil Kumar commented on SPARK-40367:
---

Hi [~jackyjfhu] 

 

Check whether the rows/bytes being collected to the driver exceed 
"spark.driver.maxResultSize". If so, increase "spark.driver.maxResultSize" until 
the error goes away, but make sure it does not exceed the driver memory 
(driver-memory).

 

_Note: driver-memory > spark.driver.maxResultSize > rows/bytes sent to driver_
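
A minimal sketch of that ordering, assuming the application is tuned at submit/session-creation time (the sizes below are placeholders, not recommendations):

{code:scala}
import org.apache.spark.sql.SparkSession

// Driver memory must be fixed before the driver JVM starts, e.g. via
// spark-submit --driver-memory 8g. spark.driver.maxResultSize is then kept
// strictly below that, so collected results still fit in the driver heap.
val spark = SparkSession.builder()
  .appName("max-result-size-example")
  .config("spark.driver.maxResultSize", "4g") // must stay below driver memory
  .getOrCreate()
{code}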

>  Total size of serialized results of 3730 tasks (64.0 GB) is bigger than 
> spark.driver.maxResultSize (64.0 GB)
> -
>
> Key: SPARK-40367
> URL: https://issues.apache.org/jira/browse/SPARK-40367
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: jackyjfhu
>Priority: Blocker
>
>  I use this 
> code:spark.sql("xx").selectExpr(spark.table(target).columns:_*).write.mode("overwrite").insertInto(target),I
>  get an error
>  
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Total size of serialized results of 3730 tasks (64.0 GB) is bigger than 
> spark.driver.maxResultSize (64.0 GB)
>     at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1609)
>     at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1597)
>     at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1596)
>     at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>     at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1596)
>     at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
>     at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
>     at scala.Option.foreach(Option.scala:257)
>     at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
>     at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1830)
>     at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1779)
>     at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1768)
>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>     at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2074)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
>     at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:939)
>     at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>     at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
>     at org.apache.spark.rdd.RDD.collect(RDD.scala:938)
>     at 
> org.apache.spark.sql.execution.SparkPlan.executeCollectIterator(SparkPlan.scala:304)
>     at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:76)
>     at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:73)
>     at 
> org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:97)
>     at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:72)
>     at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:72)
>     at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>     at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  
> --conf spark.driver.maxResultSize=64g
> --conf spark.sql.broadcastTimeout=36000
> -conf spark.sql.autoBroadcastJoinThreshold=204857600 
> --conf spark.memory.offHeap.enabled=true
> --conf spark.memory.offHeap.size=4g
> --num-exe

[jira] [Commented] (SPARK-38213) support Metrics information report to kafkaSink.

2022-02-14 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492391#comment-17492391
 ] 

Senthil Kumar commented on SPARK-38213:
---

Working on this

> support Metrics information report to kafkaSink.
> 
>
> Key: SPARK-38213
> URL: https://issues.apache.org/jira/browse/SPARK-38213
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: YuanGuanhu
>Priority: Major
>
> Spark currently supports ConsoleSink/CsvSink/GraphiteSink/JmxSink, etc. We now want 
> to report metrics information to Kafka, and we can work to support this.
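
As context for how existing sinks are wired up (and where a Kafka sink would plug in), here is a sketch using the spark.metrics.conf.* properties with the existing GraphiteSink; the host value is a placeholder, and a KafkaSink class and its options would be hypothetical until implemented:

{code:scala}
import org.apache.spark.sql.SparkSession

// Metrics sinks are configured via spark.metrics.conf.* (or metrics.properties).
// A future KafkaSink would be registered the same way, just with its own
// class name and options.
val spark = SparkSession.builder()
  .appName("metrics-sink-example")
  .config("spark.metrics.conf.*.sink.graphite.class",
          "org.apache.spark.metrics.sink.GraphiteSink")
  .config("spark.metrics.conf.*.sink.graphite.host", "graphite-host") // placeholder
  .config("spark.metrics.conf.*.sink.graphite.port", "2003")
  .getOrCreate()
{code}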



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37936) Use error classes in the parsing errors of intervals

2022-01-21 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480099#comment-17480099
 ] 

Senthil Kumar commented on SPARK-37936:
---

[~maxgekk], I have queries that match the following errors:

 * invalidIntervalFormError - "SELECT INTERVAL '1 DAY 2' HOUR"
 * fromToIntervalUnsupportedError - "SELECT extract(MONTH FROM INTERVAL '2021-11' YEAR TO DAY)"

It would be helpful if you could share queries for the scenarios below:
 * moreThanOneFromToUnitInIntervalLiteralError
 * invalidIntervalLiteralError
 * invalidFromToUnitValueError
 * mixedIntervalUnitsError

> Use error classes in the parsing errors of intervals
> 
>
> Key: SPARK-37936
> URL: https://issues.apache.org/jira/browse/SPARK-37936
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Modify the following methods in QueryParsingErrors:
>  * moreThanOneFromToUnitInIntervalLiteralError
>  * invalidIntervalLiteralError
>  * invalidIntervalFormError
>  * invalidFromToUnitValueError
>  * fromToIntervalUnsupportedError
>  * mixedIntervalUnitsError
> so that they use error classes. Throw an implementation of SparkThrowable. Also 
> write a test for every error in QueryParsingErrorsSuite.
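
As a rough illustration of the pattern this ticket asks for (not the actual Spark internals; the class name and error class below are made up for the sketch), a migrated method would attach a stable error class and message parameters to the thrown exception:

{code:scala}
// Hypothetical sketch only: in real Spark the exception would implement
// org.apache.spark.SparkThrowable and the message text would come from the
// error-classes JSON file; the names here are illustrative.
class ExampleParseException(
    val errorClass: String,
    val messageParameters: Array[String])
  extends Exception(s"[$errorClass] " + messageParameters.mkString(", "))

def invalidIntervalFormError(input: String): ExampleParseException =
  new ExampleParseException("INVALID_INTERVAL_FORMAT", Array(input))
{code}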



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37944) Use error classes in the execution errors of casting

2022-01-17 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477551#comment-17477551
 ] 

Senthil Kumar commented on SPARK-37944:
---

I will work on this

> Use error classes in the execution errors of casting
> 
>
> Key: SPARK-37944
> URL: https://issues.apache.org/jira/browse/SPARK-37944
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Migrate the following errors in QueryExecutionErrors:
> * failedToCastValueToDataTypeForPartitionColumnError
> * invalidInputSyntaxForNumericError
> * cannotCastToDateTimeError
> * invalidInputSyntaxForBooleanError
> * nullLiteralsCannotBeCastedError
> so that they use error classes. Throw an implementation of SparkThrowable. Also 
> write a test for every error in QueryExecutionErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37945) Use error classes in the execution errors of arithmetic ops

2022-01-17 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477550#comment-17477550
 ] 

Senthil Kumar commented on SPARK-37945:
---

I will work on this

> Use error classes in the execution errors of arithmetic ops
> ---
>
> Key: SPARK-37945
> URL: https://issues.apache.org/jira/browse/SPARK-37945
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Migrate the following errors in QueryExecutionErrors:
> * overflowInSumOfDecimalError
> * overflowInIntegralDivideError
> * arithmeticOverflowError
> * unaryMinusCauseOverflowError
> * binaryArithmeticCauseOverflowError
> * unscaledValueTooLargeForPrecisionError
> * decimalPrecisionExceedsMaxPrecisionError
> * outOfDecimalTypeRangeError
> * integerOverflowError
> so that they use error classes. Throw an implementation of SparkThrowable. Also 
> write a test for every error in QueryExecutionErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37940) Use error classes in the compilation errors of partitions

2022-01-17 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477549#comment-17477549
 ] 

Senthil Kumar commented on SPARK-37940:
---

I will work on this

> Use error classes in the compilation errors of partitions
> -
>
> Key: SPARK-37940
> URL: https://issues.apache.org/jira/browse/SPARK-37940
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Migrate the following errors in QueryCompilationErrors:
> * unsupportedIfNotExistsError
> * nonPartitionColError
> * missingStaticPartitionColumn
> * alterV2TableSetLocationWithPartitionNotSupportedError
> * invalidPartitionSpecError
> * partitionNotSpecifyLocationUriError
> * describeDoesNotSupportPartitionForV2TablesError
> * tableDoesNotSupportPartitionManagementError
> * tableDoesNotSupportAtomicPartitionManagementError
> * alterTableRecoverPartitionsNotSupportedForV2TablesError
> * partitionColumnNotSpecifiedError
> * invalidPartitionColumnError
> * multiplePartitionColumnValuesSpecifiedError
> * cannotUseDataTypeForPartitionColumnError
> * cannotUseAllColumnsForPartitionColumnsError
> * partitionColumnNotFoundInSchemaError
> * mismatchedTablePartitionColumnError
> so that they use error classes. Throw an implementation of SparkThrowable. Also 
> write a test for every error in QueryCompilationErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37939) Use error classes in the parsing errors of properties

2022-01-17 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477548#comment-17477548
 ] 

Senthil Kumar commented on SPARK-37939:
---

I will work on this

> Use error classes in the parsing errors of properties
> -
>
> Key: SPARK-37939
> URL: https://issues.apache.org/jira/browse/SPARK-37939
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Migrate the following errors in QueryParsingErrors:
> * cannotCleanReservedNamespacePropertyError
> * cannotCleanReservedTablePropertyError
> * invalidPropertyKeyForSetQuotedConfigurationError
> * invalidPropertyValueForSetQuotedConfigurationError
> * propertiesAndDbPropertiesBothSpecifiedError
> so that they use error classes. Throw an implementation of SparkThrowable. Also 
> write a test for every error in QueryParsingErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37936) Use error classes in the parsing errors of intervals

2022-01-17 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477544#comment-17477544
 ] 

Senthil Kumar commented on SPARK-37936:
---

Working on this

> Use error classes in the parsing errors of intervals
> 
>
> Key: SPARK-37936
> URL: https://issues.apache.org/jira/browse/SPARK-37936
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Modify the following methods in QueryParsingErrors:
>  * moreThanOneFromToUnitInIntervalLiteralError
>  * invalidIntervalLiteralError
>  * invalidIntervalFormError
>  * invalidFromToUnitValueError
>  * fromToIntervalUnsupportedError
>  * mixedIntervalUnitsError
> so that they use error classes. Throw an implementation of SparkThrowable. Also 
> write a test for every error in QueryParsingErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36996) fixing "SQL column nullable setting not retained as part of spark read" issue

2021-10-13 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428141#comment-17428141
 ] 

Senthil Kumar commented on SPARK-36996:
---

Sample Output after this changes:

SQL :

mysql> CREATE TABLE Persons(Id int NOT NULL, FirstName varchar(255), LastName 
varchar(255), Age int);

 

mysql> desc Persons;
+-----------+--------------+------+-----+---------+-------+
| Field     | Type         | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| Id        | int          | NO   |     | NULL    |       |
| FirstName | varchar(255) | YES  |     | NULL    |       |
| LastName  | varchar(255) | YES  |     | NULL    |       |
| Age       | int          | YES  |     | NULL    |       |
+-----------+--------------+------+-----+---------+-------+

Spark:

scala> val df = 
spark.read.format("jdbc").option("database","Test_DB").option("user", 
"root").option("password", "").option("driver", 
"com.mysql.cj.jdbc.Driver").option("url", 
"jdbc:mysql://localhost:3306/Test_DB").option("dbtable", "Persons").load()
 df: org.apache.spark.sql.DataFrame = [Id: int, FirstName: string ... 2 more 
fields]

scala> df.printSchema()
 root
 |-- Id: integer (nullable = false)
 |-- FirstName: string (nullable = true)
 |-- LastName: string (nullable = true)
 |-- Age: integer (nullable = true)

 

 

And for TIMESTAMP columns

 

SQL:
create table timestamp_test(id int(11), time_stamp timestamp not null default 
current_timestamp);

SPARK:

scala> val df = 
spark.read.format("jdbc").option("database","Test_DB").option("user", 
"root").option("password", "").option("driver", 
"com.mysql.cj.jdbc.Driver").option("url", 
"jdbc:mysql://localhost:3306/Test_DB").option("dbtable", 
"timestamp_test").load()
df: org.apache.spark.sql.DataFrame = [id: int, time_stamp: timestamp]

scala> df.printSchema()
root
|-- id: integer (nullable = true)
|-- time_stamp: timestamp (nullable = true)

> fixing "SQL column nullable setting not retained as part of spark read" issue
> -
>
> Key: SPARK-36996
> URL: https://issues.apache.org/jira/browse/SPARK-36996
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.1.1, 3.1.2
>Reporter: Senthil Kumar
>Priority: Major
>
> Sql 'nullable' columns are not retaining 'nullable' type as it is while 
> reading from Spark read using jdbc format.
>  
> SQL :
> 
>  
> mysql> CREATE TABLE Persons(Id int NOT NULL, FirstName varchar(255), LastName 
> varchar(255), Age int);
>  
> mysql> desc Persons;
> +---+--+--+-+-+---+
> | Field | Type | Null | Key | Default | Extra |
> +---+--+--+-+-+---+
> | Id | int | NO | | NULL | |
> | FirstName | varchar(255) | YES | | NULL | |
> | LastName | varchar(255) | YES | | NULL | |
> | Age | int | YES | | NULL | |
> +---+--+--+-+-+---+
>  
> But in Spark  we get all the columns as "Nullable":
> =
> scala> val df = 
> spark.read.format("jdbc").option("database","Test_DB").option("user", 
> "root").option("password", "").option("driver", 
> "com.mysql.cj.jdbc.Driver").option("url", 
> "jdbc:mysql://localhost:3306/Test_DB").option("dbtable", "Persons").load()
> df: org.apache.spark.sql.DataFrame = [Id: int, FirstName: string ... 2 more 
> fields]
> scala> df.printSchema()
> root
>  |-- Id: integer (nullable = true)
>  |-- FirstName: string (nullable = true)
>  |-- LastName: string (nullable = true)
>  |-- Age: integer (nullable = true)
> =
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36996) fixing "SQL column nullable setting not retained as part of spark read" issue

2021-10-13 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428140#comment-17428140
 ] 

Senthil Kumar commented on SPARK-36996:
---

We need to consider two scenarios:

 # maintain the NULLABLE value as per the SQL metadata for non-timestamp columns (see the sketch below)
 # always set NULLABLE to true for timestamp columns
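
A minimal sketch of the first scenario, reading nullability straight from the JDBC metadata (standalone JDBC code, not the actual Spark JdbcUtils implementation; connection details are placeholders):

{code:scala}
import java.sql.{DriverManager, ResultSetMetaData}

// Placeholder connection details; any JDBC source exposes the same metadata.
val conn = DriverManager.getConnection(
  "jdbc:mysql://localhost:3306/Test_DB", "root", "")
try {
  val rs = conn.createStatement().executeQuery("SELECT * FROM Persons WHERE 1 = 0")
  val md = rs.getMetaData
  for (i <- 1 to md.getColumnCount) {
    // columnNoNulls means the column is declared NOT NULL in the database,
    // so the corresponding Spark field could be marked nullable = false.
    val nullable = md.isNullable(i) != ResultSetMetaData.columnNoNulls
    println(s"${md.getColumnName(i)} nullable=$nullable")
  }
} finally {
  conn.close()
}
{code}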

 

 

> fixing "SQL column nullable setting not retained as part of spark read" issue
> -
>
> Key: SPARK-36996
> URL: https://issues.apache.org/jira/browse/SPARK-36996
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.1.1, 3.1.2
>Reporter: Senthil Kumar
>Priority: Major
>
> Sql 'nullable' columns are not retaining 'nullable' type as it is while 
> reading from Spark read using jdbc format.
>  
> SQL :
> 
>  
> mysql> CREATE TABLE Persons(Id int NOT NULL, FirstName varchar(255), LastName 
> varchar(255), Age int);
>  
> mysql> desc Persons;
> +---+--+--+-+-+---+
> | Field | Type | Null | Key | Default | Extra |
> +---+--+--+-+-+---+
> | Id | int | NO | | NULL | |
> | FirstName | varchar(255) | YES | | NULL | |
> | LastName | varchar(255) | YES | | NULL | |
> | Age | int | YES | | NULL | |
> +---+--+--+-+-+---+
>  
> But in Spark  we get all the columns as "Nullable":
> =
> scala> val df = 
> spark.read.format("jdbc").option("database","Test_DB").option("user", 
> "root").option("password", "").option("driver", 
> "com.mysql.cj.jdbc.Driver").option("url", 
> "jdbc:mysql://localhost:3306/Test_DB").option("dbtable", "Persons").load()
> df: org.apache.spark.sql.DataFrame = [Id: int, FirstName: string ... 2 more 
> fields]
> scala> df.printSchema()
> root
>  |-- Id: integer (nullable = true)
>  |-- FirstName: string (nullable = true)
>  |-- LastName: string (nullable = true)
>  |-- Age: integer (nullable = true)
> =
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36996) fixing "SQL column nullable setting not retained as part of spark read" issue

2021-10-13 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428104#comment-17428104
 ] 

Senthil Kumar commented on SPARK-36996:
---

Based on further analysis, Spark always hard-codes "nullable" as "true". This 
behaviour was introduced by https://issues.apache.org/jira/browse/SPARK-19726.

 

 

> fixing "SQL column nullable setting not retained as part of spark read" issue
> -
>
> Key: SPARK-36996
> URL: https://issues.apache.org/jira/browse/SPARK-36996
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.1.1, 3.1.2
>Reporter: Senthil Kumar
>Priority: Major
>
> Sql 'nullable' columns are not retaining 'nullable' type as it is while 
> reading from Spark read using jdbc format.
>  
> SQL :
> 
>  
> mysql> CREATE TABLE Persons(Id int NOT NULL, FirstName varchar(255), LastName 
> varchar(255), Age int);
>  
> mysql> desc Persons;
> +---+--+--+-+-+---+
> | Field | Type | Null | Key | Default | Extra |
> +---+--+--+-+-+---+
> | Id | int | NO | | NULL | |
> | FirstName | varchar(255) | YES | | NULL | |
> | LastName | varchar(255) | YES | | NULL | |
> | Age | int | YES | | NULL | |
> +---+--+--+-+-+---+
>  
> But in Spark  we get all the columns as "Nullable":
> =
> scala> val df = 
> spark.read.format("jdbc").option("database","Test_DB").option("user", 
> "root").option("password", "").option("driver", 
> "com.mysql.cj.jdbc.Driver").option("url", 
> "jdbc:mysql://localhost:3306/Test_DB").option("dbtable", "Persons").load()
> df: org.apache.spark.sql.DataFrame = [Id: int, FirstName: string ... 2 more 
> fields]
> scala> df.printSchema()
> root
>  |-- Id: integer (nullable = true)
>  |-- FirstName: string (nullable = true)
>  |-- LastName: string (nullable = true)
>  |-- Age: integer (nullable = true)
> =
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36996) fixing "SQL column nullable setting not retained as part of spark read" issue

2021-10-13 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428105#comment-17428105
 ] 

Senthil Kumar commented on SPARK-36996:
---

I'm working on this

> fixing "SQL column nullable setting not retained as part of spark read" issue
> -
>
> Key: SPARK-36996
> URL: https://issues.apache.org/jira/browse/SPARK-36996
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0, 3.1.1, 3.1.2
>Reporter: Senthil Kumar
>Priority: Major
>
> Sql 'nullable' columns are not retaining 'nullable' type as it is while 
> reading from Spark read using jdbc format.
>  
> SQL :
> 
>  
> mysql> CREATE TABLE Persons(Id int NOT NULL, FirstName varchar(255), LastName 
> varchar(255), Age int);
>  
> mysql> desc Persons;
> +---+--+--+-+-+---+
> | Field | Type | Null | Key | Default | Extra |
> +---+--+--+-+-+---+
> | Id | int | NO | | NULL | |
> | FirstName | varchar(255) | YES | | NULL | |
> | LastName | varchar(255) | YES | | NULL | |
> | Age | int | YES | | NULL | |
> +---+--+--+-+-+---+
>  
> But in Spark  we get all the columns as "Nullable":
> =
> scala> val df = 
> spark.read.format("jdbc").option("database","Test_DB").option("user", 
> "root").option("password", "").option("driver", 
> "com.mysql.cj.jdbc.Driver").option("url", 
> "jdbc:mysql://localhost:3306/Test_DB").option("dbtable", "Persons").load()
> df: org.apache.spark.sql.DataFrame = [Id: int, FirstName: string ... 2 more 
> fields]
> scala> df.printSchema()
> root
>  |-- Id: integer (nullable = true)
>  |-- FirstName: string (nullable = true)
>  |-- LastName: string (nullable = true)
>  |-- Age: integer (nullable = true)
> =
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36996) fixing "SQL column nullable setting not retained as part of spark read" issue

2021-10-13 Thread Senthil Kumar (Jira)
Senthil Kumar created SPARK-36996:
-

 Summary: fixing "SQL column nullable setting not retained as part 
of spark read" issue
 Key: SPARK-36996
 URL: https://issues.apache.org/jira/browse/SPARK-36996
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.2, 3.1.1, 3.1.0, 3.0.0
Reporter: Senthil Kumar


SQL column nullability is not retained when reading through Spark's JDBC data 
source; 'NOT NULL' columns come back as nullable.

 

SQL :



 

mysql> CREATE TABLE Persons(Id int NOT NULL, FirstName varchar(255), LastName 
varchar(255), Age int);

 

mysql> desc Persons;
+-----------+--------------+------+-----+---------+-------+
| Field     | Type         | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| Id        | int          | NO   |     | NULL    |       |
| FirstName | varchar(255) | YES  |     | NULL    |       |
| LastName  | varchar(255) | YES  |     | NULL    |       |
| Age       | int          | YES  |     | NULL    |       |
+-----------+--------------+------+-----+---------+-------+

 

But in Spark  we get all the columns as "Nullable":

=

scala> val df = 
spark.read.format("jdbc").option("database","Test_DB").option("user", 
"root").option("password", "").option("driver", 
"com.mysql.cj.jdbc.Driver").option("url", 
"jdbc:mysql://localhost:3306/Test_DB").option("dbtable", "Persons").load()
df: org.apache.spark.sql.DataFrame = [Id: int, FirstName: string ... 2 more 
fields]

scala> df.printSchema()
root
 |-- Id: integer (nullable = true)
 |-- FirstName: string (nullable = true)
 |-- LastName: string (nullable = true)
 |-- Age: integer (nullable = true)

=

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36238) Spark UI load event timeline too slow for huge stage

2021-10-01 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423419#comment-17423419
 ] 

Senthil Kumar commented on SPARK-36238:
---

[~angerszhuuu] Did you try increasing heap memory for Spark History Server?

> Spark UI  load event timeline too slow for huge stage
> -
>
> Key: SPARK-36238
> URL: https://issues.apache.org/jira/browse/SPARK-36238
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36901) ERROR exchange.BroadcastExchangeExec: Could not execute broadcast in 300 secs

2021-10-01 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423409#comment-17423409
 ] 

Senthil Kumar commented on SPARK-36901:
---

[~rangareddy.av...@gmail.com]

This looks like normal Spark behaviour. Because of Spark's lazy evaluation, it 
executes "BroadcastExchangeExec", finds that the cluster lacks resources, logs 
WARN messages, waits for 300s, and then logs an ERROR stating that 
"BroadcastExchangeExec" timed out.
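
If the cluster genuinely cannot provide resources in time, the usual knobs are the documented spark.sql.broadcastTimeout and spark.sql.autoBroadcastJoinThreshold settings; a sketch (values are placeholders, and -1 disables automatic broadcast joins entirely):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("broadcast-timeout-example")
  // Allow more than the default 300 seconds for the broadcast (placeholder value).
  .config("spark.sql.broadcastTimeout", "1200")
  // Or turn off automatic broadcast joins so no broadcast is attempted at all.
  .config("spark.sql.autoBroadcastJoinThreshold", "-1")
  .getOrCreate()
{code}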

> ERROR exchange.BroadcastExchangeExec: Could not execute broadcast in 300 secs
> -
>
> Key: SPARK-36901
> URL: https://issues.apache.org/jira/browse/SPARK-36901
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.4.0
>Reporter: Ranga Reddy
>Priority: Major
>
> While running Spark application, if there are no further resources to launch 
> executors, Spark application is failed after 5 mins with below exception.
> {code:java}
> 21/09/24 06:12:45 WARN cluster.YarnScheduler: Initial job has not accepted 
> any resources; check your cluster UI to ensure that workers are registered 
> and have sufficient resources
> ...
> 21/09/24 06:17:29 ERROR exchange.BroadcastExchangeExec: Could not execute 
> broadcast in 300 secs.
> java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]
> ...
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after 
> [300 seconds]
>   at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
>   at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:146)
>   ... 71 more
> 21/09/24 06:17:30 INFO spark.SparkContext: Invoking stop() from shutdown hook
> {code}
> *Expectation* should be either needs to be throw proper exception saying 
> *"there are no further to resources to run the application"* or it needs to 
> be *"wait till it get resources"*.
> To reproduce the issue we have used following sample code.
> *PySpark Code (test_broadcast_timeout.py):*
> {code:java}
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.appName("Test Broadcast Timeout").getOrCreate()
> t1 = spark.range(5)
> t2 = spark.range(5)
> q = t1.join(t2,t1.id == t2.id)
> q.explain
> q.show(){code}
> *Spark Submit Command:*
> {code:java}
> spark-submit --executor-memory 512M test_broadcast_timeout.py{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-10-01 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423402#comment-17423402
 ] 

Senthil Kumar edited comment on SPARK-36861 at 10/1/21, 7:24 PM:
-

Yes, in Spark 3.3 the 'hour' column is read back as "DateType", but I can still 
see the hour part in the subdirectories that were created

===

Spark session available as 'spark'.
 Welcome to
  __
 / __/__ ___ _/ /__
 _\ \/ _ \/ _ `/ __/ '_/
 /___/ .__/_,_/_/ /_/_\ version 3.3.0-SNAPSHOT
 /_/

Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_292)
 Type in expressions to have them evaluated.
 Type :help for more information.

scala> val df = Seq(("2021-01-01T00", 0), ("2021-01-01T01", 1), 
("2021-01-01T02", 2)).toDF("hour", "i")
 df: org.apache.spark.sql.DataFrame = [hour: string, i: int]

scala> df.write.partitionBy("hour").parquet("/tmp/t1")

scala> spark.read.parquet("/tmp/t1").schema
 res1: org.apache.spark.sql.types.StructType = 
StructType(StructField(i,IntegerType,true), StructField(hour,DateType,true))

scala>

===

 

and subdirs created are

===

ls -l
 total 0
 -rw-r--r-- 1 senthilkumar wheel 0 Oct 2 00:44 _SUCCESS
 drwxr-xr-x 4 senthilkumar wheel 128 Oct 2 00:44 hour=2021-01-01T00
 drwxr-xr-x 4 senthilkumar wheel 128 Oct 2 00:44 hour=2021-01-01T01
 drwxr-xr-x 4 senthilkumar wheel 128 Oct 2 00:44 hour=2021-01-01T02

===

 

It would be helpful if you could share the list of sub-directories created in your case.
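
As a workaround sketch for keeping 'hour' as a string (this relies on the spark.sql.sources.partitionColumnTypeInference.enabled flag; the path matches the example above):

{code:scala}
// Disable partition column type inference so partition values such as
// "2021-01-01T00" stay strings instead of being parsed as dates.
spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false")
val df2 = spark.read.parquet("/tmp/t1")
df2.printSchema() // hour comes back as string
{code}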


was (Author: senthh):
Yes in Spark 3.3 hour column is created as "DateType" but I could see hour part 
in subdirs created

===

Spark session available as 'spark'.
Welcome to
  __
 / __/__ ___ _/ /__
 _\ \/ _ \/ _ `/ __/ '_/
 /___/ .__/\_,_/_/ /_/\_\ version 3.3.0-SNAPSHOT
 /_/
 
Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_292)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val df = Seq(("2021-01-01T00", 0), ("2021-01-01T01", 1), 
("2021-01-01T02", 2)).toDF("hour", "i")
df: org.apache.spark.sql.DataFrame = [hour: string, i: int]

scala> df.write.partitionBy("hour").parquet("/tmp/t1")
 
scala> spark.read.parquet("/tmp/t1").schema
res1: org.apache.spark.sql.types.StructType = 
StructType(StructField(i,IntegerType,true), StructField(hour,DateType,true))

scala>

===

 

and subdirs created are

===

ls -l
total 0
-rw-r--r-- 1 senthilkumar wheel 0 Oct 2 00:44 _SUCCESS
drwxr-xr-x 4 senthilkumar wheel 128 Oct 2 00:44 hour=2021-01-01T00
drwxr-xr-x 4 senthilkumar wheel 128 Oct 2 00:44 hour=2021-01-01T01
drwxr-xr-x 4 senthilkumar wheel 128 Oct 2 00:44 hour=2021-01-01T02

===

 

It will be helpful if you share the list of sub-dirs created in your case.

> Partition columns are overly eagerly parsed as dates
> 
>
> Key: SPARK-36861
> URL: https://issues.apache.org/jira/browse/SPARK-36861
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Tanel Kiis
>Priority: Blocker
>
> I have an input directory with subdirs:
> * hour=2021-01-01T00
> * hour=2021-01-01T01
> * hour=2021-01-01T02
> * ...
> in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it 
> is parsed as date type and the hour part is lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-10-01 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423402#comment-17423402
 ] 

Senthil Kumar commented on SPARK-36861:
---

Yes in Spark 3.3 hour column is created as "DateType" but I could see hour part 
in subdirs created

===

Spark session available as 'spark'.
Welcome to
  __
 / __/__ ___ _/ /__
 _\ \/ _ \/ _ `/ __/ '_/
 /___/ .__/\_,_/_/ /_/\_\ version 3.3.0-SNAPSHOT
 /_/
 
Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_292)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val df = Seq(("2021-01-01T00", 0), ("2021-01-01T01", 1), 
("2021-01-01T02", 2)).toDF("hour", "i")
df: org.apache.spark.sql.DataFrame = [hour: string, i: int]

scala> df.write.partitionBy("hour").parquet("/tmp/t1")
 
scala> spark.read.parquet("/tmp/t1").schema
res1: org.apache.spark.sql.types.StructType = 
StructType(StructField(i,IntegerType,true), StructField(hour,DateType,true))

scala>

===

 

and subdirs created are

===

ls -l
total 0
-rw-r--r-- 1 senthilkumar wheel 0 Oct 2 00:44 _SUCCESS
drwxr-xr-x 4 senthilkumar wheel 128 Oct 2 00:44 hour=2021-01-01T00
drwxr-xr-x 4 senthilkumar wheel 128 Oct 2 00:44 hour=2021-01-01T01
drwxr-xr-x 4 senthilkumar wheel 128 Oct 2 00:44 hour=2021-01-01T02

===

 

It will be helpful if you share the list of sub-dirs created in your case.

> Partition columns are overly eagerly parsed as dates
> 
>
> Key: SPARK-36861
> URL: https://issues.apache.org/jira/browse/SPARK-36861
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Tanel Kiis
>Priority: Blocker
>
> I have an input directory with subdirs:
> * hour=2021-01-01T00
> * hour=2021-01-01T01
> * hour=2021-01-01T02
> * ...
> in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it 
> is parsed as date type and the hour part is lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-09-28 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421328#comment-17421328
 ] 

Senthil Kumar commented on SPARK-36861:
---

[~tanelk] This issue is not reproducible even in 3.1.2.

 

root
 |-- i: integer (nullable = true)
 |-- hour: string (nullable = true)

> Partition columns are overly eagerly parsed as dates
> 
>
> Key: SPARK-36861
> URL: https://issues.apache.org/jira/browse/SPARK-36861
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Tanel Kiis
>Priority: Blocker
>
> I have an input directory with subdirs:
> * hour=2021-01-01T00
> * hour=2021-01-01T01
> * hour=2021-01-01T02
> * ...
> in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it 
> is parsed as date type and the hour part is lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36781) The log could not get the correct line number

2021-09-28 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421281#comment-17421281
 ] 

Senthil Kumar commented on SPARK-36781:
---

[~chenxusheng] Could you please share the sample code to simulate this issue?

> The log could not get the correct line number
> -
>
> Key: SPARK-36781
> URL: https://issues.apache.org/jira/browse/SPARK-36781
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.6, 3.0.3, 3.1.2
>Reporter: chenxusheng
>Priority: Major
>
> INFO 18:16:46 [Thread-1] 
> org.apache.spark.internal.Logging$class.logInfo({color:#FF}Logging.scala:54{color})
>  MemoryStore cleared
>  INFO 18:16:46 [Thread-1] 
> org.apache.spark.internal.Logging$class.logInfo({color:#FF}Logging.scala:54{color})
>  BlockManager stopped
>  INFO 18:16:46 [Thread-1] 
> org.apache.spark.internal.Logging$class.logInfo({color:#FF}Logging.scala:54{color})
>  BlockManagerMaster stopped
>  INFO 18:16:46 [dispatcher-event-loop-0] 
> org.apache.spark.internal.Logging$class.logInfo({color:#FF}Logging.scala:54{color})
>  OutputCommitCoordinator stopped!
>  INFO 18:16:46 [Thread-1] 
> org.apache.spark.internal.Logging$class.logInfo({color:#FF}Logging.scala:54{color})
>  Successfully stopped SparkContext
>  INFO 18:16:46 [Thread-1] 
> org.apache.spark.internal.Logging$class.logInfo({color:#FF}Logging.scala:54{color})
>  Shutdown hook called
> all are : {color:#FF}Logging.scala:54{color}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36801) Document change for Spark sql jdbc

2021-09-19 Thread Senthil Kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Senthil Kumar updated SPARK-36801:
--
Description: 
Reading through the Spark SQL JDBC DataSource does not maintain nullability and 
changes "non nullable" columns to "nullable".

 

For example:

mysql> CREATE TABLE Persons(Id int NOT NULL, FirstName varchar(255), LastName 
varchar(255), Age int);
Query OK, 0 rows affected (0.04 sec)

mysql> show tables;
+---+
| Tables_in_test_db |
+---+
| Persons |
+---+
1 row in set (0.00 sec)

mysql> desc Persons;
+-----------+--------------+------+-----+---------+-------+
| Field     | Type         | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| Id        | int          | NO   |     | NULL    |       |
| FirstName | varchar(255) | YES  |     | NULL    |       |
| LastName  | varchar(255) | YES  |     | NULL    |       |
| Age       | int          | YES  |     | NULL    |       |
+-----------+--------------+------+-----+---------+-------+

 

 

val df = spark.read.format("jdbc")
  .option("database", "Test_DB")
  .option("user", "root")
  .option("password", "")
  .option("driver", "com.mysql.cj.jdbc.Driver")
  .option("url", "jdbc:mysql://localhost:3306/Test_DB")
  .option("query", "(select * from Persons)")
  .load()
df.printSchema()

 

*output:*

 

root
 |-- Id: integer (nullable = true)
 |-- FirstName: string (nullable = true)
 |-- LastName: string (nullable = true)
 |-- Age: integer (nullable = true)

 

 

So we need to add a note in the documentation[1]: "All columns are automatically 
converted to be nullable for compatibility reasons."

 Ref:

[1 
][https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html#jdbc-to-other-databases]

 

 

  was:
Reading using Spark SQL jdbc DataSource  does not maintain nullable type and 
changes "non nullable" columns to "nullable".

So we need to add a note, in Documentation[1], "All columns are automatically 
converted to be nullable for compatibility reasons."

 

[1 
]https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html#jdbc-to-other-databases


> Document change for Spark sql jdbc
> --
>
> Key: SPARK-36801
> URL: https://issues.apache.org/jira/browse/SPARK-36801
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.0.0, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2
>Reporter: Senthil Kumar
>Priority: Trivial
>
> Reading using Spark SQL jdbc DataSource  does not maintain nullable type and 
> changes "non nullable" columns to "nullable".
>  
> For example:
> mysql> CREATE TABLE Persons(Id int NOT NULL, FirstName varchar(255), LastName 
> varchar(255), Age int);
> Query OK, 0 rows affected (0.04 sec)
> mysql> show tables;
> +---+
> | Tables_in_test_db |
> +---+
> | Persons |
> +---+
> 1 row in set (0.00 sec)
> mysql> desc Persons;
> +---+--+--+-+-+---+
> | Field | Type | Null | Key | Default | Extra |
> +---+--+--+-+-+---+
> | Id | int | NO | | NULL | |
> | FirstName | varchar(255) | YES | | NULL | |
> | LastName | varchar(255) | YES | | NULL | |
> | Age | int | YES | | NULL | |
> +---+--+--+-+-+---+
>  
>  
> {color:#cc7832}val {color}df = spark.read.format({color:#6a8759}"jdbc"{color})
>  
> .option({color:#6a8759}"database"{color}{color:#cc7832},{color}{color:#6a8759}"Test_DB"{color})
>  .option({color:#6a8759}"user"{color}{color:#cc7832}, 
> {color}{color:#6a8759}"root"{color})
>  .option({color:#6a8759}"password"{color}{color:#cc7832}, 
> {color}{color:#6a8759}""{color})
>  .option({color:#6a8759}"driver"{color}{color:#cc7832}, 
> {color}{color:#6a8759}"com.mysql.cj.jdbc.Driver"{color})
>  .option({color:#6a8759}"url"{color}{color:#cc7832}, 
> {color}{color:#6a8759}"jdbc:mysql://localhost:3306/Test_DB"{color})
>  .option({color:#6a8759}"query"{color}{color:#cc7832}, 
> {color}{color:#6a8759}"(select * from Persons)"{color})
>  .load()
> df.printSchema()
>  
> *output:*
>  
> root
>  |-- Id: integer (nullable = true)
>  |-- FirstName: string (nullable = true)
>  |-- LastName: string (nullable = true)
>  |-- Age: integer (nullable = true)
>  
>  
> So we need to add a note, in Documentation[1], "All columns are automatically 
> converted to be nullable for compatibility reasons."
>  Ref:
> [1 
> ][https://spar

[jira] [Updated] (SPARK-36801) Document change for Spark sql jdbc

2021-09-19 Thread Senthil Kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Senthil Kumar updated SPARK-36801:
--
Description: 
Reading using Spark SQL jdbc DataSource  does not maintain nullable type and 
changes "non nullable" columns to "nullable".

So we need to add a note, in Documentation[1], "All columns are automatically 
converted to be nullable for compatibility reasons."

 

[1 
]https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html#jdbc-to-other-databases

  was:
Reading using Spark SQL jdbc DataSource  does not maintain nullable type and 
changes "non nullable" columns to "nullable".

So we need to add a note, in Documentation[1], "{color:#a9b7c6}All columns are 
automatically converted to be nullable for compatibility reasons.{color}"

 

[1 
]https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html#jdbc-to-other-databases


> Document change for Spark sql jdbc
> --
>
> Key: SPARK-36801
> URL: https://issues.apache.org/jira/browse/SPARK-36801
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.0.0, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2
>Reporter: Senthil Kumar
>Priority: Trivial
>
> Reading using Spark SQL jdbc DataSource  does not maintain nullable type and 
> changes "non nullable" columns to "nullable".
> So we need to add a note, in Documentation[1], "All columns are automatically 
> converted to be nullable for compatibility reasons."
>  
> [1 
> ]https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html#jdbc-to-other-databases



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36801) Document change for Spark sql jdbc

2021-09-19 Thread Senthil Kumar (Jira)
Senthil Kumar created SPARK-36801:
-

 Summary: Document change for Spark sql jdbc
 Key: SPARK-36801
 URL: https://issues.apache.org/jira/browse/SPARK-36801
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 3.1.2, 3.1.1, 3.1.0, 3.0.3, 3.0.2, 3.0.0
Reporter: Senthil Kumar


Reading using Spark SQL jdbc DataSource  does not maintain nullable type and 
changes "non nullable" columns to "nullable".

So we need to add a note in the documentation[1]: "All columns are automatically 
converted to be nullable for compatibility reasons."

 

[1 
]https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html#jdbc-to-other-databases



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36743) Backporting SPARK-36327 changes into Spark 2.4 version

2021-09-13 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17414746#comment-17414746
 ] 

Senthil Kumar commented on SPARK-36743:
---

[~hyukjin.kwon], [~dongjoon]. Thanks for the kind and immediate response on 
this.

> Backporting SPARK-36327 changes into Spark 2.4 version
> --
>
> Key: SPARK-36743
> URL: https://issues.apache.org/jira/browse/SPARK-36743
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Senthil Kumar
>Priority: Minor
>
> Could we back port changes merged by PR 
> [https://github.com/apache/spark/pull/33577]  into Spark 2.4 too?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36743) Backporting SPARK-36327 changes into Spark 2.4 version

2021-09-13 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17414242#comment-17414242
 ] 

Senthil Kumar commented on SPARK-36743:
---

[~hyukjin.kwon], [~dongjoon]

> Backporting SPARK-36327 changes into Spark 2.4 version
> --
>
> Key: SPARK-36743
> URL: https://issues.apache.org/jira/browse/SPARK-36743
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Senthil Kumar
>Priority: Minor
> Fix For: 3.3.0
>
>
> Could we back port changes merged by PR 
> [https://github.com/apache/spark/pull/33577]  into Spark 2.4 too?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36743) Backporting SPARK-36327 changes into Spark 2.4 version

2021-09-13 Thread Senthil Kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Senthil Kumar updated SPARK-36743:
--
Summary: Backporting SPARK-36327 changes into Spark 2.4 version  (was: 
Backporting changes into Spark 2.4 version)

> Backporting SPARK-36327 changes into Spark 2.4 version
> --
>
> Key: SPARK-36743
> URL: https://issues.apache.org/jira/browse/SPARK-36743
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Senthil Kumar
>Priority: Minor
> Fix For: 3.3.0
>
>
> Could we back port changes merged by PR 
> [https://github.com/apache/spark/pull/33577]  into Spark 2.4 too?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36743) Backporting changes into Spark 2.4 version

2021-09-13 Thread Senthil Kumar (Jira)
Senthil Kumar created SPARK-36743:
-

 Summary: Backporting changes into Spark 2.4 version
 Key: SPARK-36743
 URL: https://issues.apache.org/jira/browse/SPARK-36743
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Senthil Kumar
 Fix For: 3.3.0


Could we back port changes merged by PR 
[https://github.com/apache/spark/pull/33577]  into Spark 2.4 too?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-35623) Volcano resource manager for Spark on Kubernetes

2021-09-02 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408750#comment-17408750
 ] 

Senthil Kumar edited comment on SPARK-35623 at 9/2/21, 11:56 AM:
-

[~dipanjanK] Include me too pls.

mail id: senthissen...@gmail.com


was (Author: senthh):
[~dipanjanK] Include me too pls

> Volcano resource manager for Spark on Kubernetes
> 
>
> Key: SPARK-35623
> URL: https://issues.apache.org/jira/browse/SPARK-35623
> Project: Spark
>  Issue Type: Brainstorming
>  Components: Kubernetes
>Affects Versions: 3.1.1, 3.1.2
>Reporter: Dipanjan Kailthya
>Priority: Minor
>  Labels: kubernetes, resourcemanager
>
> Dear Spark Developers, 
>   
>  Hello from the Netherlands! Posting this here as I still haven't gotten 
> accepted to post in the spark dev mailing list.
>   
>  My team is planning to use spark with Kubernetes support on our shared 
> (multi-tenant) on premise Kubernetes cluster. However we would like to have 
> certain scheduling features like fair-share and preemption which as we 
> understand are not built into the current spark-kubernetes resource manager 
> yet. We have been working on and are close to a first successful prototype 
> integration with Volcano ([https://volcano.sh/en/docs/]). Briefly this means 
> a new resource manager component with lots in common with existing 
> spark-kubernetes resource manager, but instead of pods it launches Volcano 
> jobs which delegate the driver and executor pod creation and lifecycle 
> management to Volcano. We are interested in contributing this to open source, 
> either directly in spark or as a separate project.
>   
>  So, two questions: 
>   
>  1. Do the spark maintainers see this as a valuable contribution to the 
> mainline spark codebase? If so, can we have some guidance on how to publish 
> the changes? 
>   
>  2. Are any other developers / organizations interested to contribute to this 
> effort? If so, please get in touch.
>   
>  Best,
>  Dipanjan



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35623) Volcano resource manager for Spark on Kubernetes

2021-09-02 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408750#comment-17408750
 ] 

Senthil Kumar commented on SPARK-35623:
---

[~dipanjanK] Include me too pls

> Volcano resource manager for Spark on Kubernetes
> 
>
> Key: SPARK-35623
> URL: https://issues.apache.org/jira/browse/SPARK-35623
> Project: Spark
>  Issue Type: Brainstorming
>  Components: Kubernetes
>Affects Versions: 3.1.1, 3.1.2
>Reporter: Dipanjan Kailthya
>Priority: Minor
>  Labels: kubernetes, resourcemanager
>
> Dear Spark Developers, 
>   
>  Hello from the Netherlands! Posting this here as I still haven't gotten 
> accepted to post in the spark dev mailing list.
>   
>  My team is planning to use spark with Kubernetes support on our shared 
> (multi-tenant) on premise Kubernetes cluster. However we would like to have 
> certain scheduling features like fair-share and preemption which as we 
> understand are not built into the current spark-kubernetes resource manager 
> yet. We have been working on and are close to a first successful prototype 
> integration with Volcano ([https://volcano.sh/en/docs/]). Briefly this means 
> a new resource manager component with lots in common with existing 
> spark-kubernetes resource manager, but instead of pods it launches Volcano 
> jobs which delegate the driver and executor pod creation and lifecycle 
> management to Volcano. We are interested in contributing this to open source, 
> either directly in spark or as a separate project.
>   
>  So, two questions: 
>   
>  1. Do the spark maintainers see this as a valuable contribution to the 
> mainline spark codebase? If so, can we have some guidance on how to publish 
> the changes? 
>   
>  2. Are any other developers / organizations interested to contribute to this 
> effort? If so, please get in touch.
>   
>  Best,
>  Dipanjan



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36643) Add more information in ERROR log while SparkConf is modified when spark.sql.legacy.setCommandRejectsSparkCoreConfs is set

2021-09-01 Thread Senthil Kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Senthil Kumar updated SPARK-36643:
--
Component/s: SQL

> Add more information in ERROR log while SparkConf is modified when 
> spark.sql.legacy.setCommandRejectsSparkCoreConfs is set
> --
>
> Key: SPARK-36643
> URL: https://issues.apache.org/jira/browse/SPARK-36643
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.1.2
>Reporter: Senthil Kumar
>Priority: Minor
>
> Right now, spark.sql.legacy.setCommandRejectsSparkCoreConfs is set to true by 
> default in Spark 3.* versions in order to prevent changing Spark core confs. But 
> the error message makes it unclear whether Spark confs can be modified in Spark 
> 3.* at all.
> Current Error Message :
> {code:java}
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot 
> modify the value of a Spark config: spark.driver.host
>  at 
> org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:156)
>  at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:40){code}
>  
> So adding a little more information (how to modify the Spark conf) to the ERROR 
> message thrown when a SparkConf is modified while 
> spark.sql.legacy.setCommandRejectsSparkCoreConfs is 'true' will help avoid 
> confusion.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-36643) Add more information in ERROR log while SparkConf is modified when spark.sql.legacy.setCommandRejectsSparkCoreConfs is set

2021-09-01 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408282#comment-17408282
 ] 

Senthil Kumar edited comment on SPARK-36643 at 9/1/21, 5:20 PM:


The new ERROR message will be as below:

 
{code:java}
scala> spark.conf.set("spark.driver.host", "localhost") 
org.apache.spark.sql.AnalysisException: Cannot modify the value of a Spark 
config: spark.driver.host, please set 
spark.sql.legacy.setCommandRejectsSparkCoreConfs as 'false' in order to make 
change value of Spark config: spark.driver.host .
 at 
org.apache.spark.sql.errors.QueryCompilationErrors$.cannotModifyValueOfSparkConfigError(QueryCompilationErrors.scala:2336)
 at 
org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:157)
 at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:41)
... 47 elided{code}
 


was (Author: senthh):
The new ERROR message will be as below:

 
{code:java}
scala> spark.conf.set("spark.driver.host", "localhost") 
org.apache.spark.sql.AnalysisException: Cannot modify the value of a Spark 
config: spark.driver.host, please set 
spark.sql.legacy.setCommandRejectsSparkCoreConfs as 'false' in order to make 
change value of Spark config: spark.driver.host .
 at 
org.apache.spark.sql.errors.QueryCompilationErrors$.cannotModifyValueOfSparkConfigError(QueryCompilationErrors.scala:2336)
 at 
org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:157)
 at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:41){code}

 ... 47 elided

> Add more information in ERROR log while SparkConf is modified when 
> spark.sql.legacy.setCommandRejectsSparkCoreConfs is set
> --
>
> Key: SPARK-36643
> URL: https://issues.apache.org/jira/browse/SPARK-36643
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Senthil Kumar
>Priority: Minor
>
> Right now, by default spark.sql.legacy.setCommandRejectsSparkCoreConfs is set 
> to true in Spark 3.* versions in order to avoid changing Spark confs. But the 
> error message leaves it unclear whether Spark confs can be modified in Spark 
> 3.* at all.
> Current Error Message :
> {code:java}
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot 
> modify the value of a Spark config: spark.driver.host
>  at 
> org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:156)
>  at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:40){code}
>  
> So adding a little more information (how to modify the Spark conf) to the 
> ERROR log raised when a SparkConf is modified while 
> spark.sql.legacy.setCommandRejectsSparkCoreConfs is 'true' will help 
> avoid confusion.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36643) Add more information in ERROR log while SparkConf is modified when spark.sql.legacy.setCommandRejectsSparkCoreConfs is set

2021-09-01 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408282#comment-17408282
 ] 

Senthil Kumar commented on SPARK-36643:
---

The new ERROR message will be as below:

 
{code:java}
scala> spark.conf.set("spark.driver.host", "localhost") 
org.apache.spark.sql.AnalysisException: Cannot modify the value of a Spark 
config: spark.driver.host, please set 
spark.sql.legacy.setCommandRejectsSparkCoreConfs as 'false' in order to make 
change value of Spark config: spark.driver.host .
 at 
org.apache.spark.sql.errors.QueryCompilationErrors$.cannotModifyValueOfSparkConfigError(QueryCompilationErrors.scala:2336)
 at 
org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:157)
 at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:41){code}

 ... 47 elided
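
For reference, a minimal sketch of how the legacy flag itself can be relaxed so 
that the SET command no longer rejects Spark core confs. This is an illustration 
only, assuming a Spark 3.x application (the app name is hypothetical), and is 
not part of the proposed change:

{code:java}
// Sketch only: create the session with the legacy flag disabled so that
// spark.conf.set on a Spark core conf is no longer rejected.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("set-command-legacy-flag-demo") // hypothetical app name
  .config("spark.sql.legacy.setCommandRejectsSparkCoreConfs", "false")
  .getOrCreate()

// No AnalysisException is thrown now, although changing a core conf such as
// spark.driver.host at runtime may still have no practical effect.
spark.conf.set("spark.driver.host", "localhost")
{code}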

> Add more information in ERROR log while SparkConf is modified when 
> spark.sql.legacy.setCommandRejectsSparkCoreConfs is set
> --
>
> Key: SPARK-36643
> URL: https://issues.apache.org/jira/browse/SPARK-36643
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Senthil Kumar
>Priority: Minor
>
> Right now, by default spark.sql.legacy.setCommandRejectsSparkCoreConfs is set 
> to true in Spark 3.* versions in order to avoid changing Spark confs. But the 
> error message leaves it unclear whether Spark confs can be modified in Spark 
> 3.* at all.
> Current Error Message :
> {code:java}
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot 
> modify the value of a Spark config: spark.driver.host
>  at 
> org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:156)
>  at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:40){code}
>  
> So adding a little more information (how to modify the Spark conf) to the 
> ERROR log raised when a SparkConf is modified while 
> spark.sql.legacy.setCommandRejectsSparkCoreConfs is 'true' will help 
> avoid confusion.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36643) Add more information in ERROR log while SparkConf is modified when spark.sql.legacy.setCommandRejectsSparkCoreConfs is set

2021-09-01 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408274#comment-17408274
 ] 

Senthil Kumar commented on SPARK-36643:
---

Creating PR for this

> Add more information in ERROR log while SparkConf is modified when 
> spark.sql.legacy.setCommandRejectsSparkCoreConfs is set
> --
>
> Key: SPARK-36643
> URL: https://issues.apache.org/jira/browse/SPARK-36643
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Senthil Kumar
>Priority: Minor
>
> Right now, by default spark.sql.legacy.setCommandRejectsSparkCoreConfs is set 
> to true in Spark 3.* versions in order to avoid changing Spark confs. But the 
> error message leaves it unclear whether Spark confs can be modified in Spark 
> 3.* at all.
> Current Error Message :
> {code:java}
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot 
> modify the value of a Spark config: spark.driver.host
>  at 
> org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:156)
>  at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:40){code}
>  
> So adding a little more information (how to modify the Spark conf) to the 
> ERROR log raised when a SparkConf is modified while 
> spark.sql.legacy.setCommandRejectsSparkCoreConfs is 'true' will help 
> avoid confusion.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36643) Add more information in ERROR log while SparkConf is modified when spark.sql.legacy.setCommandRejectsSparkCoreConfs is set

2021-09-01 Thread Senthil Kumar (Jira)
Senthil Kumar created SPARK-36643:
-

 Summary: Add more information in ERROR log while SparkConf is 
modified when spark.sql.legacy.setCommandRejectsSparkCoreConfs is set
 Key: SPARK-36643
 URL: https://issues.apache.org/jira/browse/SPARK-36643
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.1.2
Reporter: Senthil Kumar


Right now, by default spark.sql.legacy.setCommandRejectsSparkCoreConfs is set to 
true in Spark 3.* versions in order to avoid changing Spark confs. But the error 
message leaves it unclear whether Spark confs can be modified in Spark 3.* at 
all.

Current Error Message :
{code:java}
Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot 
modify the value of a Spark config: spark.driver.host
 at 
org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:156)
 at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:40){code}
 

So adding a little more information (how to modify the Spark conf) to the ERROR 
log raised when a SparkConf is modified while 
spark.sql.legacy.setCommandRejectsSparkCoreConfs is 'true' will help avoid 
confusion.

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36604) timestamp type column analyze result is wrong

2021-08-30 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407068#comment-17407068
 ] 

Senthil Kumar commented on SPARK-36604:
---

[~yghu] I tested this scenario in Spark 2.4, but I don't see this issue 
occurring. Are you seeing this issue only in Spark 3.1.1?

 

 
{panel}
scala> spark.sql("create table c(a timestamp)")
res16: org.apache.spark.sql.DataFrame = []

scala> spark.sql("insert into c select '2021-08-15 15:30:01'")
res17: org.apache.spark.sql.DataFrame = []

scala> spark.sql("analyze table c compute statistics for columns a")
res18: org.apache.spark.sql.DataFrame = []

scala> spark.sql("desc formatted c a").show(true)
+--------------+--------------------+
|     info_name|          info_value|
+--------------+--------------------+
|      col_name|                   a|
|     data_type|           timestamp|
|       comment|                NULL|
|           min|2021-08-15 15:30:...|
|           max|2021-08-15 15:30:...|
|     num_nulls|                   0|
|distinct_count|                   1|
|   avg_col_len|                   8|
|   max_col_len|                   8|
|     histogram|                NULL|
+--------------+--------------------+
{panel}
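
One difference worth ruling out between the two environments is the session time 
zone under which the statistics are rendered; this is only a hedged guess based 
on the 8-hour gap between the inserted value and the reported min/max, not a 
confirmed root cause. A minimal check, assuming a plain spark-shell on both 
clusters:

{code:java}
// Sketch only: compare the session time zone before comparing analyze results,
// since the displayed min/max of a timestamp column can shift by the zone offset.
println(spark.conf.get("spark.sql.session.timeZone"))
spark.sql("desc formatted c a").show(false)
{code}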
 

> timestamp type column analyze result is wrong
> -
>
> Key: SPARK-36604
> URL: https://issues.apache.org/jira/browse/SPARK-36604
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1, 3.1.2
> Environment: Spark 3.1.1
>Reporter: YuanGuanhu
>Priority: Major
>
> when we create a table with a timestamp column, the min and max values in the 
> analyze result for that column are wrong
> eg:
> {code}
> > select * from a;
> {code}
> {code}
> 2021-08-15 15:30:01
> Time taken: 2.789 seconds, Fetched 1 row(s)
> spark-sql> desc formatted a a;
> col_name a
> data_type timestamp
> comment NULL
> min 2021-08-15 07:30:01.00
> max 2021-08-15 07:30:01.00
> num_nulls 0
> distinct_count 1
> avg_col_len 8
> max_col_len 8
> histogram NULL
> Time taken: 0.278 seconds, Fetched 10 row(s)
> spark-sql> desc a;
> a timestamp NULL
> Time taken: 1.432 seconds, Fetched 1 row(s)
> {code}
>  
> reproduce step:
> {code}
> create table a(a timestamp);
> insert into a select '2021-08-15 15:30:01';
> analyze table a compute statistics for columns a;
> desc formatted a a;
> select * from a;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36412) Add Test Coverage to meet viewFs(Hadoop federation) scenario

2021-08-04 Thread Senthil Kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Senthil Kumar updated SPARK-36412:
--
Summary: Add Test  Coverage to meet viewFs(Hadoop federation) scenario  
(was: Create coverage Test to meet viewFs(Hadoop federation) scenario)

> Add Test  Coverage to meet viewFs(Hadoop federation) scenario
> -
>
> Key: SPARK-36412
> URL: https://issues.apache.org/jira/browse/SPARK-36412
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Senthil Kumar
>Priority: Major
>
> Create coverage Test to meet viewFs(Hadoop federation) scenario.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36412) Create coverage Test to meet viewFs(Hadoop federation) scenario

2021-08-04 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392976#comment-17392976
 ] 

Senthil Kumar commented on SPARK-36412:
---

I am working on this

> Create coverage Test to meet viewFs(Hadoop federation) scenario
> ---
>
> Key: SPARK-36412
> URL: https://issues.apache.org/jira/browse/SPARK-36412
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Senthil Kumar
>Priority: Major
>
> Create coverage Test to meet viewFs(Hadoop federation) scenario.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36412) Create coverage Test to meet viewFs(Hadoop federation) scenario

2021-08-04 Thread Senthil Kumar (Jira)
Senthil Kumar created SPARK-36412:
-

 Summary: Create coverage Test to meet viewFs(Hadoop federation) 
scenario
 Key: SPARK-36412
 URL: https://issues.apache.org/jira/browse/SPARK-36412
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.2
Reporter: Senthil Kumar


Create coverage Test to meet viewFs(Hadoop federation) scenario.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36327) Spark sql creates staging dir inside database directory rather than creating inside table directory

2021-07-30 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390851#comment-17390851
 ] 

Senthil Kumar commented on SPARK-36327:
---

Hi [~sunchao]

Hive creates .staging directories inside the "/db/table/" location, but Spark 
SQL creates .staging directories inside the "/db/" location when we use Hadoop 
federation (viewFs). It works as expected (creating .staging inside the 
/db/table/ location) for other filesystems like HDFS.

HIVE:
{{
# beeline
> use dicedb;
> insert into table part_test partition (j=1) values (1);
...
INFO : Loading data to table dicedb.part_test partition (j=1) from 
**viewfs://cloudera/user/daisuke/dicedb/part_test/j=1/.hive-staging_hive_2021-07-19_13-04-44_989_6775328876605030677-1/-ext-1**

}}

but Spark's behaviour is:

{{
spark-sql> use dicedb;
spark-sql> insert into table part_test partition (j=2) values (2);
21/07/19 13:07:37 INFO FileUtils: Creating directory if it doesn't exist: 
**viewfs://cloudera/user/daisuke/dicedb/.hive-staging_hive_2021-07-19_13-07-37_317_5083528872437596950-1**
... 
}}


The reason we require this change is that if we allow Spark SQL to create the 
.staging directory inside the /db/ location, we end up with security issues: we 
would need to grant permission on the "viewfs:///db/" location to all users who 
submit Spark jobs.

After this change is applied, Spark SQL creates .staging inside /db/table/, 
similar to Hive, as below:

{{
spark-sql> use dicedb;
21/07/28 00:22:47 INFO SparkSQLCLIDriver: Time taken: 0.929 seconds
spark-sql> insert into table part_test partition (j=8) values (8);
21/07/28 00:23:25 INFO HiveMetaStoreClient: Closed a connection to metastore, 
current connections: 1
21/07/28 00:23:26 INFO FileUtils: Creating directory if it doesn't exist: 
**viewfs://cloudera/user/daisuke/dicedb/part_test/.hive-staging_hive_2021-07-28_00-23-26_109_4548714524589026450-1**
 
}}

The reason this issue is not seen in Hive but only occurs in Spark SQL:

In Hive, a "/db/table/tmp" directory structure is passed as the path, so 
path.getParent returns "/db/table/". In Spark we only pass "/db/table", so it is 
not necessary to use "path.getParent" for Hadoop federation (viewFs).
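
A minimal sketch of the direction described above, based on the snippet quoted 
in the issue description (getExtTmpPathRelTo and getExternalScratchDir are 
assumed to be the existing helpers in SaveAsHiveFile.scala; this is not 
necessarily the exact PR diff):

{code:java}
import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Sketch only: for viewfs, resolve the staging dir relative to the table
// location itself instead of its parent (the database directory).
private def newVersionExternalTempPath(
    path: Path,
    hadoopConf: Configuration,
    stagingDir: String): Path = {
  val extURI: URI = path.toUri
  if (extURI.getScheme == "viewfs") {
    // was: getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir)
    getExtTmpPathRelTo(path, hadoopConf, stagingDir)
  } else {
    new Path(getExternalScratchDir(extURI, hadoopConf, stagingDir), "-ext-1")
  }
}
{code}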

 

> Spark sql creates staging dir inside database directory rather than creating 
> inside table directory
> ---
>
> Key: SPARK-36327
> URL: https://issues.apache.org/jira/browse/SPARK-36327
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.1.2
>Reporter: Senthil Kumar
>Priority: Minor
>
> Spark sql creates staging dir inside database directory rather than creating 
> inside table directory.
>  
> This arises only when viewfs:// is configured. When the location is hdfs://, 
> it doesn't occur.
>  
> Based on further investigation in file *SaveAsHiveFile.scala*, I could see 
> that the directory hierarchy has not been properly handled for the viewFS 
> case.
> The parent path (db path) is passed rather than the actual directory (table 
> location).
> {{
> // Mostly copied from Context.java#getExternalTmpPath of Hive 1.2
> private def newVersionExternalTempPath(
> path: Path,
> hadoopConf: Configuration,
> stagingDir: String): Path = {
> val extURI: URI = path.toUri
> if (extURI.getScheme == "viewfs")
> { getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir) }
> else
> { new Path(getExternalScratchDir(extURI, hadoopConf, stagingDir), 
> "-ext-1") }
> }
> }}
> Please refer below lines
> ===
> if (extURI.getScheme == "viewfs") {
> getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir)
> ===



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36327) Spark sql creates staging dir inside database directory rather than creating inside table directory

2021-07-29 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389827#comment-17389827
 ] 

Senthil Kumar commented on SPARK-36327:
---

Hi [~dongjoon],  [~hyukjin.kwon]

 

Could you please review these minor changes? 

> Spark sql creates staging dir inside database directory rather than creating 
> inside table directory
> ---
>
> Key: SPARK-36327
> URL: https://issues.apache.org/jira/browse/SPARK-36327
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.1.2
>Reporter: Senthil Kumar
>Priority: Minor
>
> Spark sql creates staging dir inside database directory rather than creating 
> inside table directory.
>  
> This arises only when viewfs:// is configured. When the location is hdfs://, 
> it doesn't occur.
>  
> Based on further investigation in file *SaveAsHiveFile.scala*, I could see 
> that the directory hierarchy has not been properly handled for the viewFS 
> case.
> The parent path (db path) is passed rather than the actual directory (table 
> location).
> {{
> // Mostly copied from Context.java#getExternalTmpPath of Hive 1.2
> private def newVersionExternalTempPath(
> path: Path,
> hadoopConf: Configuration,
> stagingDir: String): Path = {
> val extURI: URI = path.toUri
> if (extURI.getScheme == "viewfs")
> { getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir) }
> else
> { new Path(getExternalScratchDir(extURI, hadoopConf, stagingDir), 
> "-ext-1") }
> }
> }}
> Please refer below lines
> ===
> if (extURI.getScheme == "viewfs") {
> getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir)
> ===



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36327) Spark sql creates staging dir inside database directory rather than creating inside table directory

2021-07-29 Thread Senthil Kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Senthil Kumar updated SPARK-36327:
--
Component/s: SQL

> Spark sql creates staging dir inside database directory rather than creating 
> inside table directory
> ---
>
> Key: SPARK-36327
> URL: https://issues.apache.org/jira/browse/SPARK-36327
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.1.2
>Reporter: Senthil Kumar
>Priority: Minor
>
> Spark sql creates staging dir inside database directory rather than creating 
> inside table directory.
>  
> This arises only when viewfs:// is configured. When the location is hdfs://, 
> it doesn't occur.
>  
> Based on further investigation in file *SaveAsHiveFile.scala*, I could see 
> that the directory hierarchy has not been properly handled for the viewFS 
> case.
> The parent path (db path) is passed rather than the actual directory (table 
> location).
> {{
> // Mostly copied from Context.java#getExternalTmpPath of Hive 1.2
> private def newVersionExternalTempPath(
> path: Path,
> hadoopConf: Configuration,
> stagingDir: String): Path = {
> val extURI: URI = path.toUri
> if (extURI.getScheme == "viewfs")
> { getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir) }
> else
> { new Path(getExternalScratchDir(extURI, hadoopConf, stagingDir), 
> "-ext-1") }
> }
> }}
> Please refer below lines
> ===
> if (extURI.getScheme == "viewfs") {
> getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir)
> ===



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36327) Spark sql creates staging dir inside database directory rather than creating inside table directory

2021-07-29 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389814#comment-17389814
 ] 

Senthil Kumar commented on SPARK-36327:
---

Created PR https://github.com/apache/spark/pull/33577

> Spark sql creates staging dir inside database directory rather than creating 
> inside table directory
> ---
>
> Key: SPARK-36327
> URL: https://issues.apache.org/jira/browse/SPARK-36327
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Senthil Kumar
>Priority: Minor
>
> Spark sql creates staging dir inside database directory rather than creating 
> inside table directory.
>  
> This arises only when viewfs:// is configured. When the location is hdfs://, 
> it doesn't occur.
>  
> Based on further investigation in file *SaveAsHiveFile.scala*, I could see 
> that the directory hierarchy has not been properly handled for the viewFS 
> case.
> The parent path (db path) is passed rather than the actual directory (table 
> location).
> {{
> // Mostly copied from Context.java#getExternalTmpPath of Hive 1.2
> private def newVersionExternalTempPath(
> path: Path,
> hadoopConf: Configuration,
> stagingDir: String): Path = {
> val extURI: URI = path.toUri
> if (extURI.getScheme == "viewfs")
> { getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir) }
> else
> { new Path(getExternalScratchDir(extURI, hadoopConf, stagingDir), 
> "-ext-1") }
> }
> }}
> Please refer below lines
> ===
> if (extURI.getScheme == "viewfs") {
> getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir)
> ===



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36327) Spark sql creates staging dir inside database directory rather than creating inside table directory

2021-07-28 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388597#comment-17388597
 ] 

Senthil Kumar commented on SPARK-36327:
---

Shall I work on this Jira to fix this issue?

> Spark sql creates staging dir inside database directory rather than creating 
> inside table directory
> ---
>
> Key: SPARK-36327
> URL: https://issues.apache.org/jira/browse/SPARK-36327
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Senthil Kumar
>Priority: Minor
>
> Spark sql creates staging dir inside database directory rather than creating 
> inside table directory.
>  
> This arises only when viewfs:// is configured. When the location is hdfs://, 
> it doesn't occur.
>  
> Based on further investigation in file *SaveAsHiveFile.scala*, I could see 
> that the directory hierarchy has not been properly handled for the viewFS 
> case.
> The parent path (db path) is passed rather than the actual directory (table 
> location).
> {{
> // Mostly copied from Context.java#getExternalTmpPath of Hive 1.2
> private def newVersionExternalTempPath(
> path: Path,
> hadoopConf: Configuration,
> stagingDir: String): Path = {
> val extURI: URI = path.toUri
> if (extURI.getScheme == "viewfs")
> { getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir) }
> else
> { new Path(getExternalScratchDir(extURI, hadoopConf, stagingDir), 
> "-ext-1") }
> }
> }}
> Please refer below lines
> ===
> if (extURI.getScheme == "viewfs") {
> getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir)
> ===



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36327) Spark sql creates staging dir inside database directory rather than creating inside table directory

2021-07-28 Thread Senthil Kumar (Jira)
Senthil Kumar created SPARK-36327:
-

 Summary: Spark sql creates staging dir inside database directory 
rather than creating inside table directory
 Key: SPARK-36327
 URL: https://issues.apache.org/jira/browse/SPARK-36327
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.1.2
Reporter: Senthil Kumar


Spark sql creates staging dir inside database directory rather than creating 
inside table directory.

 

This arises only when viewfs:// is configured. When the location is hdfs://, it 
doesn't occur.

 

Based on further investigation in file *SaveAsHiveFile.scala*, I could see that 
the directory hierarchy has not been properly handled for the viewFS case.
The parent path (db path) is passed rather than the actual directory (table 
location).

{{
// Mostly copied from Context.java#getExternalTmpPath of Hive 1.2
private def newVersionExternalTempPath(
    path: Path,
    hadoopConf: Configuration,
    stagingDir: String): Path = {
  val extURI: URI = path.toUri
  if (extURI.getScheme == "viewfs") {
    getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir)
  } else {
    new Path(getExternalScratchDir(extURI, hadoopConf, stagingDir), "-ext-1")
  }
}
}}

Please refer below lines

===
if (extURI.getScheme == "viewfs") {
getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir)
===



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org