[jira] [Updated] (SPARK-47766) Extend spark 3.5.1 to support hadoop-client-api 3.4.0, hadoop-client-runtime-3.4.0

2024-04-08 Thread Ramakrishna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramakrishna updated SPARK-47766:

Description: 
We have several HIGH CVEs coming from hadoop-client-runtime 3.3.4 that we need to address.

 

com.fasterxml.jackson.core:jackson-databind (org.apache.hadoop_hadoop-client-runtime-3.3.4.jar) causing *CVE-2022-42003* and *CVE-2022-42004*

com.google.protobuf:protobuf-java (org.apache.hadoop_hadoop-client-runtime-3.3.4.jar) causing *CVE-2021-22569*, *CVE-2021-22570*, *CVE-2022-3509* and *CVE-2022-3510*

net.minidev:json-smart (org.apache.hadoop_hadoop-client-runtime-3.3.4.jar) causing *CVE-2021-31684* and *CVE-2023-1370*

org.apache.avro:avro (org.apache.hadoop_hadoop-client-runtime-3.3.4.jar) causing *CVE-2023-39410*

org.apache.commons:commons-compress (org.apache.hadoop_hadoop-client-runtime-3.3.4.jar) causing *CVE-2024-25710* and *CVE-2024-26308*

 

 

Most of these have been fixed in hadoop-client-runtime 3.4.0.

 

Is there a plan to support Hadoop 3.4.0?
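For reference, which Hadoop client jars a given Spark distribution actually ships can be checked from the installation itself; a minimal sketch, assuming SPARK_HOME points at a standard binary distribution with the usual jars/ layout:

{code:python}
# Minimal sketch: list the hadoop-client-* jars bundled with a Spark install.
# On a stock 3.5.1 distribution this shows hadoop-client-api-3.3.4.jar and
# hadoop-client-runtime-3.3.4.jar, the artifacts referenced above.
import glob
import os

spark_home = os.environ["SPARK_HOME"]
for jar in sorted(glob.glob(os.path.join(spark_home, "jars", "hadoop-client-*.jar"))):
    print(os.path.basename(jar))
{code}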

  was:
I have a data pipeline set up in such a way that it reads data from a Kafka 
source, does some transformation on the data using pyspark, then writes the 
output into a sink (Kafka, Redis, etc).

 

My entire pipeline is written in SQL, so I wish to use the .sql() method to 
execute SQL on my streaming source directly.

 

However, I'm running into the issue where my watermark is not being recognized 
by the downstream query via the .sql() method.

 

```
Python 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:49:36) [Clang 
16.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
>>> print(pyspark.__version__)
3.5.1
>>> from pyspark.sql import SparkSession
>>>
>>> session = SparkSession.builder \
...     .config("spark.jars.packages", 
"org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1")\
...     .getOrCreate()
>>> from pyspark.sql.functions import col, from_json
>>> from pyspark.sql.types import StructField, StructType, TimestampType, LongType, DoubleType, IntegerType
>>> schema = StructType(
...     [
...         StructField('createTime', TimestampType(), True),
...         StructField('orderId', LongType(), True),
...         StructField('payAmount', DoubleType(), True),
...         StructField('payPlatform', IntegerType(), True),
...         StructField('provinceId', IntegerType(), True),
...     ])
>>>
>>> streaming_df = session.readStream\
...     .format("kafka")\
...     .option("kafka.bootstrap.servers", "localhost:9092")\
...     .option("subscribe", "payment_msg")\
...     .option("startingOffsets","earliest")\
...     .load()\
...     .select(from_json(col("value").cast("string"), 
schema).alias("parsed_value"))\
...     .select("parsed_value.*")\
...     .withWatermark("createTime", "10 seconds")
>>>
>>> streaming_df.createOrReplaceTempView("streaming_df")
>>> session.sql("""
... SELECT
...     window.start, window.end, provinceId, sum(payAmount) as totalPayAmount
...     FROM streaming_df
...     GROUP BY provinceId, window('createTime', '1 hour', '30 minutes')
...     ORDER BY window.start
... """)\
...   .writeStream\
...   .format("kafka") \
...   .option("checkpointLocation", "checkpoint") \
...   .option("kafka.bootstrap.servers", "localhost:9092") \
...   .option("topic", "sink") \
...   .start()
```
 
This throws the following exception:
```
pyspark.errors.exceptions.captured.AnalysisException: Append output mode not 
supported when there are streaming aggregations on streaming 
DataFrames/DataSets without watermark; line 6 pos 4;
```
 

 


> Extend spark 3.5.1 to support hadoop-client-api 3.4.0, 
> hadoop-client-runtime-3.4.0
> --
>
> Key: SPARK-47766
> URL: https://issues.apache.org/jira/browse/SPARK-47766
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.5.1
>Reporter: Ramakrishna
>Priority: Blocker
>  Labels: pull-request-available
>
> We have some HIGH CVEs which are coming from hadoop-client-runtime 3.3.4 and 
> hence we need to address those
>  
> com.fasterxml.jackson.core:jackson-databind              causing    
> *CVE-2022-42003* and *CVE-2022-42004*
> (org.apache.hadoop_hadoop-client-runtime-3.3.4.jar)
>  
>  
> com.google.protobuf:protobuf-java      
> (org.apache.hadoop_hadoop-client-runtime-3.3.4.jar)  causing 
> *CVE-2021-22569,* *CVE-2021-22570,* *CVE-2022-3509* and *CVE-2022-3510*
>  
> net.minidev:json-smart                                                        
>  causing *CVE-2021-31684,* *CVE-2023-1370*
> (org.apache.hadoop_hadoop-client-runtime-3.3.4.jar)  
>  
>  
> org.apache.avro:avro 
> 

[jira] [Created] (SPARK-47766) Extend spark 3.5.1 to support hadoop-client-api 3.4.0, hadoop-client-runtime-3.4.0

2024-04-08 Thread Ramakrishna (Jira)
Ramakrishna created SPARK-47766:
---

 Summary: Extend spark 3.5.1 to support hadoop-client-api 3.4.0, 
hadoop-client-runtime-3.4.0
 Key: SPARK-47766
 URL: https://issues.apache.org/jira/browse/SPARK-47766
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.5.1
Reporter: Ramakrishna


I have a data pipeline set up in such a way that it reads data from a Kafka 
source, does some transformation on the data using pyspark, then writes the 
output into a sink (Kafka, Redis, etc).

 

My entire pipeline is written in SQL, so I wish to use the .sql() method to 
execute SQL on my streaming source directly.

 

However, I'm running into the issue where my watermark is not being recognized 
by the downstream query via the .sql() method.

 

```
Python 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:49:36) [Clang 
16.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
>>> print(pyspark.__version__)
3.5.1
>>> from pyspark.sql import SparkSession
>>>
>>> session = SparkSession.builder \
...     .config("spark.jars.packages", 
"org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1")\
...     .getOrCreate()
>>> from pyspark.sql.functions import col, from_json
>>> from pyspark.sql.types import StructField, StructType, TimestampType, LongType, DoubleType, IntegerType
>>> schema = StructType(
...     [
...         StructField('createTime', TimestampType(), True),
...         StructField('orderId', LongType(), True),
...         StructField('payAmount', DoubleType(), True),
...         StructField('payPlatform', IntegerType(), True),
...         StructField('provinceId', IntegerType(), True),
...     ])
>>>
>>> streaming_df = session.readStream\
...     .format("kafka")\
...     .option("kafka.bootstrap.servers", "localhost:9092")\
...     .option("subscribe", "payment_msg")\
...     .option("startingOffsets","earliest")\
...     .load()\
...     .select(from_json(col("value").cast("string"), 
schema).alias("parsed_value"))\
...     .select("parsed_value.*")\
...     .withWatermark("createTime", "10 seconds")
>>>
>>> streaming_df.createOrReplaceTempView("streaming_df")
>>> session.sql("""
... SELECT
...     window.start, window.end, provinceId, sum(payAmount) as totalPayAmount
...     FROM streaming_df
...     GROUP BY provinceId, window('createTime', '1 hour', '30 minutes')
...     ORDER BY window.start
... """)\
...   .writeStream\
...   .format("kafka") \
...   .option("checkpointLocation", "checkpoint") \
...   .option("kafka.bootstrap.servers", "localhost:9092") \
...   .option("topic", "sink") \
...   .start()
```
 
This throws the following exception:
```
pyspark.errors.exceptions.captured.AnalysisException: Append output mode not 
supported when there are streaming aggregations on streaming 
DataFrames/DataSets without watermark; line 6 pos 4;
```
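For comparison, the same windowed aggregation can be expressed with the DataFrame API directly on streaming_df, which keeps the watermark set by withWatermark() and is accepted in append output mode. This is only an illustrative sketch of that alternative route, not a fix for the .sql() behaviour reported here: the ORDER BY from the SQL version is dropped (sorting a streaming aggregation is only supported in complete output mode), and the console sink stands in for the Kafka sink, which would additionally need the columns serialized into a value field.
```
# Illustrative sketch; assumes streaming_df as defined in the report above.
from pyspark.sql.functions import window, sum as sum_

agg_df = (
    streaming_df
    .groupBy("provinceId", window("createTime", "1 hour", "30 minutes"))
    .agg(sum_("payAmount").alias("totalPayAmount"))
    .select("window.start", "window.end", "provinceId", "totalPayAmount")
)

# Append mode (the default) is accepted here because the watermark is still
# attached to the aggregated streaming DataFrame.
agg_df.writeStream \
    .format("console") \
    .option("checkpointLocation", "checkpoint_df") \
    .start()
```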
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40782) Upgrade Jackson-databind to 2.13.4.1

2024-04-08 Thread Ramakrishna (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834801#comment-17834801
 ] 

Ramakrishna commented on SPARK-40782:
-

Hi, this still seems to be an issue as a transitive dependency in Hadoop.

 

com.fasterxml.jackson.core:jackson-databind (in org.apache.hadoop_hadoop-client-runtime-3.3.4.jar): CVE-2022-42003, severity HIGH, status fixed, installed version 2.12.7, fixed in 2.12.7.1 / 2.13.4.2 ("jackson-databind: deep wrapper array nesting wrt UNWRAP_SINGLE_VALUE_ARRAYS")

 

Is there a fix for this?

> Upgrade Jackson-databind to 2.13.4.1
> 
>
> Key: SPARK-40782
> URL: https://issues.apache.org/jira/browse/SPARK-40782
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.3.1, 3.4.0
>
>
> #3590: Add check in primitive value deserializers to avoid deep wrapper array
>   nesting wrt `UNWRAP_SINGLE_VALUE_ARRAYS` [CVE-2022-42003]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21595) introduction of spark.sql.windowExec.buffer.spill.threshold in spark 2.2 breaks existing workflow

2024-02-21 Thread Ramakrishna (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-21595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819249#comment-17819249
 ] 

Ramakrishna edited comment on SPARK-21595 at 2/21/24 1:34 PM:
--

[~Rakesh_Shah] How did you manage to solve this? I am getting this in my streaming query; it does aggregations similar to the other streaming queries in the same job. However, it fails and I get:

 

 

{"timestamp":"21/02/2024 
07:11:35","logLevel":"ERROR","class":"MapOutputTracker","thread":"Executor task 
launch worker for task 25.0 in stage 2.1 (TID 75)","message":"Missing an output 
location for shuffle 5 partition 35"}

 

Can you please help?

 

[~tejasp] Can you please help?

 

My spark version is 3.4.0


was (Author: hande):
[~Rakesh_Shah]  How did you manage to solve this ? I am getting this in my 
streaming query, it does aggregations similar to other streaming queries in 
same job. However it fails and I get

 

 

{"timestamp":"21/02/2024 
07:11:35","logLevel":"ERROR","class":"MapOutputTracker","thread":"Executor task 
launch worker for task 25.0 in stage 2.1 (TID 75)","message":"Missing an output 
location for shuffle 5 partition 35"}

 

Can you please help ?

 

[~tejasp]  Can you please help ?

> introduction of spark.sql.windowExec.buffer.spill.threshold in spark 2.2 
> breaks existing workflow
> -
>
> Key: SPARK-21595
> URL: https://issues.apache.org/jira/browse/SPARK-21595
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 2.2.0
> Environment: pyspark on linux
>Reporter: Stephan Reiling
>Assignee: Tejas Patil
>Priority: Minor
>  Labels: documentation, regression
> Fix For: 2.2.1, 2.3.0
>
>
> My pyspark code has the following statement:
> {code:java}
> # assign row key for tracking
> df = df.withColumn(
> 'association_idx',
> sqlf.row_number().over(
> Window.orderBy('uid1', 'uid2')
> )
> )
> {code}
> where df is a long, skinny (450M rows, 10 columns) dataframe. So this creates 
> one large window for the whole dataframe to sort over.
> In spark 2.1 this works without problem, in spark 2.2 this fails either with 
> out of memory exception or too many open files exception, depending on memory 
> settings (which is what I tried first to fix this).
> Monitoring the blockmgr, I see that spark 2.1 creates 152 files, spark 2.2 
> creates >110,000 files.
> In the log I see the following messages (110,000 of these):
> {noformat}
> 17/08/01 08:55:37 INFO UnsafeExternalSorter: Spilling data because number of 
> spilledRecords crossed the threshold 4096
> 17/08/01 08:55:37 INFO UnsafeExternalSorter: Thread 156 spilling sort data of 
> 64.1 MB to disk (0  time so far)
> 17/08/01 08:55:37 INFO UnsafeExternalSorter: Spilling data because number of 
> spilledRecords crossed the threshold 4096
> 17/08/01 08:55:37 INFO UnsafeExternalSorter: Thread 156 spilling sort data of 
> 64.1 MB to disk (1  time so far)
> {noformat}
> So I started hunting for clues in UnsafeExternalSorter, without luck. What I 
> had missed was this one message:
> {noformat}
> 17/08/01 08:55:37 INFO ExternalAppendOnlyUnsafeRowArray: Reached spill 
> threshold of 4096 rows, switching to 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter
> {noformat}
> Which allowed me to track down the issue. 
> By changing the configuration to include:
> {code:java}
> spark.sql.windowExec.buffer.spill.threshold   2097152
> {code}
> I got it to work again and with the same performance as spark 2.1.
> I have workflows where I use windowing functions that do not fail, but took a 
> performance hit due to the excessive spilling when using the default of 4096.
> I think to make it easier to track down these issues this config variable 
> should be included in the configuration documentation. 
> Maybe 4096 is too small of a default value?
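For anyone hitting the same spills, the threshold described in the quoted issue can be raised when the session is created; a minimal PySpark sketch (2097152 is simply the value the original reporter used, not a recommended default):

{code:python}
# Minimal sketch: raise spark.sql.windowExec.buffer.spill.threshold at session
# creation time. It controls how many rows a window operator buffers in memory
# before spilling to disk, as described in the quoted report.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.windowExec.buffer.spill.threshold", 2097152)
    .getOrCreate()
)
{code}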



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21595) introduction of spark.sql.windowExec.buffer.spill.threshold in spark 2.2 breaks existing workflow

2024-02-21 Thread Ramakrishna (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-21595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819249#comment-17819249
 ] 

Ramakrishna commented on SPARK-21595:
-

[~Rakesh_Shah] How did you manage to solve this? I am getting this in my streaming query; it does aggregations similar to the other streaming queries in the same job. However, it fails and I get:

 

 

{"timestamp":"21/02/2024 
07:11:35","logLevel":"ERROR","class":"MapOutputTracker","thread":"Executor task 
launch worker for task 25.0 in stage 2.1 (TID 75)","message":"Missing an output 
location for shuffle 5 partition 35"}

 

Can you please help?

 

[~tejasp] Can you please help?

> introduction of spark.sql.windowExec.buffer.spill.threshold in spark 2.2 
> breaks existing workflow
> -
>
> Key: SPARK-21595
> URL: https://issues.apache.org/jira/browse/SPARK-21595
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 2.2.0
> Environment: pyspark on linux
>Reporter: Stephan Reiling
>Assignee: Tejas Patil
>Priority: Minor
>  Labels: documentation, regression
> Fix For: 2.2.1, 2.3.0
>
>
> My pyspark code has the following statement:
> {code:java}
> # assign row key for tracking
> df = df.withColumn(
> 'association_idx',
> sqlf.row_number().over(
> Window.orderBy('uid1', 'uid2')
> )
> )
> {code}
> where df is a long, skinny (450M rows, 10 columns) dataframe. So this creates 
> one large window for the whole dataframe to sort over.
> In spark 2.1 this works without problem, in spark 2.2 this fails either with 
> out of memory exception or too many open files exception, depending on memory 
> settings (which is what I tried first to fix this).
> Monitoring the blockmgr, I see that spark 2.1 creates 152 files, spark 2.2 
> creates >110,000 files.
> In the log I see the following messages (110,000 of these):
> {noformat}
> 17/08/01 08:55:37 INFO UnsafeExternalSorter: Spilling data because number of 
> spilledRecords crossed the threshold 4096
> 17/08/01 08:55:37 INFO UnsafeExternalSorter: Thread 156 spilling sort data of 
> 64.1 MB to disk (0  time so far)
> 17/08/01 08:55:37 INFO UnsafeExternalSorter: Spilling data because number of 
> spilledRecords crossed the threshold 4096
> 17/08/01 08:55:37 INFO UnsafeExternalSorter: Thread 156 spilling sort data of 
> 64.1 MB to disk (1  time so far)
> {noformat}
> So I started hunting for clues in UnsafeExternalSorter, without luck. What I 
> had missed was this one message:
> {noformat}
> 17/08/01 08:55:37 INFO ExternalAppendOnlyUnsafeRowArray: Reached spill 
> threshold of 4096 rows, switching to 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter
> {noformat}
> Which allowed me to track down the issue. 
> By changing the configuration to include:
> {code:java}
> spark.sql.windowExec.buffer.spill.threshold   2097152
> {code}
> I got it to work again and with the same performance as spark 2.1.
> I have workflows where I use windowing functions that do not fail, but took a 
> performance hit due to the excessive spilling when using the default of 4096.
> I think to make it easier to track down these issues this config variable 
> should be included in the configuration documentation. 
> Maybe 4096 is too small of a default value?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-44152) Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" java.nio.file.NoSuchFileException: , although jar is present in the location

2023-07-24 Thread Ramakrishna (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746516#comment-17746516
 ] 

Ramakrishna edited comment on SPARK-44152 at 7/24/23 4:06 PM:
--

Hello [~sdehaes]

It should work if you copy the jar to the /usr/local/bin folder of your Docker container. It worked for us.


was (Author: hande):
Hello [~sdehaes] 

It should work if you copy jar to 

 

/usr/local/bin folder

 

. It worked for us

> Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" 
> java.nio.file.NoSuchFileException: , although jar is present in the location
> ---
>
> Key: SPARK-44152
> URL: https://issues.apache.org/jira/browse/SPARK-44152
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
>  
> I have a spark application that is deployed using k8s and it is of version 
> 3.3.2 Recently there were some vulneabilities in spark 3.3.2
> I changed my dockerfile to download 3.4.0 instead of 3.3.2 and also my 
> application jar is built on spark 3.4.0
> However while deploying, I get this error
>         
> *{{Exception in thread "main" java.nio.file.NoSuchFileException: 
> /spark-assembly-1.0.jar}}*
>  
> I have this in deployment.yaml of the app
>  
> *mainApplicationFile: "local:spark-assembly-1.0.jar"*
>  
>  
>  
>  
> and I have not changed anything related to that. I see that some code has 
> changed in spark 3.4.0 core's source code regarding jar location.
> Has it really changed the functionality ? Is there anyone who is facing same 
> issue as me ? Should the path be specified in a different way?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44152) Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" java.nio.file.NoSuchFileException: , although jar is present in the location

2023-07-24 Thread Ramakrishna (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746516#comment-17746516
 ] 

Ramakrishna commented on SPARK-44152:
-

Hello [~sdehaes]

It should work if you copy the jar to the /usr/local/bin folder. It worked for us.

> Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" 
> java.nio.file.NoSuchFileException: , although jar is present in the location
> ---
>
> Key: SPARK-44152
> URL: https://issues.apache.org/jira/browse/SPARK-44152
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
>  
> I have a spark application that is deployed using k8s and it is of version 
> 3.3.2 Recently there were some vulneabilities in spark 3.3.2
> I changed my dockerfile to download 3.4.0 instead of 3.3.2 and also my 
> application jar is built on spark 3.4.0
> However while deploying, I get this error
>         
> *{{Exception in thread "main" java.nio.file.NoSuchFileException: 
> /spark-assembly-1.0.jar}}*
>  
> I have this in deployment.yaml of the app
>  
> *mainApplicationFile: "local:spark-assembly-1.0.jar"*
>  
>  
>  
>  
> and I have not changed anything related to that. I see that some code has 
> changed in spark 3.4.0 core's source code regarding jar location.
> Has it really changed the functionality ? Is there anyone who is facing same 
> issue as me ? Should the path be specified in a different way?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44152) Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" java.nio.file.NoSuchFileException: , although jar is present in the location

2023-07-03 Thread Ramakrishna (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739627#comment-17739627
 ] 

Ramakrishna commented on SPARK-44152:
-

Hi [~iainm], yes, I probably spent a bit longer understanding what was happening, because from the error it did not look like a permission issue; it just says the jar was not found. More detailed documentation would probably help, especially for migrations from 3.3.2 to 3.4.0.

> Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" 
> java.nio.file.NoSuchFileException: , although jar is present in the location
> ---
>
> Key: SPARK-44152
> URL: https://issues.apache.org/jira/browse/SPARK-44152
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
>  
> I have a spark application that is deployed using k8s and it is of version 
> 3.3.2 Recently there were some vulneabilities in spark 3.3.2
> I changed my dockerfile to download 3.4.0 instead of 3.3.2 and also my 
> application jar is built on spark 3.4.0
> However while deploying, I get this error
>         
> *{{Exception in thread "main" java.nio.file.NoSuchFileException: 
> /spark-assembly-1.0.jar}}*
>  
> I have this in deployment.yaml of the app
>  
> *mainApplicationFile: "local:spark-assembly-1.0.jar"*
>  
>  
>  
>  
> and I have not changed anything related to that. I see that some code has 
> changed in spark 3.4.0 core's source code regarding jar location.
> Has it really changed the functionality ? Is there anyone who is facing same 
> issue as me ? Should the path be specified in a different way?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44152) Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" java.nio.file.NoSuchFileException: , although jar is present in the location

2023-06-23 Thread Ramakrishna (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17736404#comment-17736404
 ] 

Ramakrishna commented on SPARK-44152:
-

[~gurwls223] 

Is this an issue in Spark 3.4.0? At least I am facing this issue, with all other constraints remaining the same.

> Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" 
> java.nio.file.NoSuchFileException: , although jar is present in the location
> ---
>
> Key: SPARK-44152
> URL: https://issues.apache.org/jira/browse/SPARK-44152
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
>  
> I have a spark application that is deployed using k8s and it is of version 
> 3.3.2 Recently there were some vulneabilities in spark 3.3.2
> I changed my dockerfile to download 3.4.0 instead of 3.3.2 and also my 
> application jar is built on spark 3.4.0
> However while deploying, I get this error
>         
> *{{Exception in thread "main" java.nio.file.NoSuchFileException: 
> /spark-assembly-1.0.jar}}*
>  
> I have this in deployment.yaml of the app
>  
> *mainApplicationFile: "local:spark-assembly-1.0.jar"*
>  
>  
>  
>  
> and I have not changed anything related to that. I see that some code has 
> changed in spark 3.4.0 core's source code regarding jar location.
> Has it really changed the functionality ? Is there anyone who is facing same 
> issue as me ? Should the path be specified in a different way?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44135) Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" java.nio.file.NoSuchFileException: , although jar is present in the location

2023-06-21 Thread Ramakrishna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramakrishna updated SPARK-44135:

Description: 
 
I have a Spark application that is deployed using k8s, and it is on version 3.3.2. Recently there were some vulnerabilities in Spark 3.3.2.

I changed my Dockerfile to download 3.4.0 instead of 3.3.2, and my application jar is also built on Spark 3.4.0.

However, while deploying, I get this error:

*{{Exception in thread "main" java.nio.file.NoSuchFileException: /spark-assembly-1.0.jar}}*

I have this in the deployment.yaml of the app:

*mainApplicationFile: "local:spark-assembly-1.0.jar"*

and I have not changed anything related to that. I see that some code regarding the jar location has changed in Spark 3.4.0 core's source code.

Has the functionality really changed? Is anyone else facing the same issue? Should the path be specified in a different way?

  was:
 
I have a spark application that is deployed using k8s and it is of version 
3.3.2 Recently there were some vulneabilities in spark 3.3.2

I changed my dockerfile to download 3.4.0 instead of 3.3.2 and also my 
application jar is built on spark 3.4.0

However while deploying, I get this error

        

*{{Exception in thread "main" java.nio.file.NoSuchFileException: 
/spark-assembly-1.0.jar}}*

 

I have this in deployment.yaml of the app

 

*mainApplicationFile: "local:spark-assembly-1.0.jar"*

 

 

 

 

and I have not changed anything related to that. I see that some code has 
changed in spark 3.4.0 core's source code regarding jar location.

Has it really changed the functionality ? Is there anyone who is facing same 
issue as me ? Should the path be specified in a different way?

{{}}

{{}}


> Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" 
> java.nio.file.NoSuchFileException: , although jar is present in the location
> ---
>
> Key: SPARK-44135
> URL: https://issues.apache.org/jira/browse/SPARK-44135
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Ramakrishna
>Priority: Blocker
>
>  
> I have a spark application that is deployed using k8s and it is of version 
> 3.3.2 Recently there were some vulneabilities in spark 3.3.2
> I changed my dockerfile to download 3.4.0 instead of 3.3.2 and also my 
> application jar is built on spark 3.4.0
> However while deploying, I get this error
>         
> *{{Exception in thread "main" java.nio.file.NoSuchFileException: 
> /spark-assembly-1.0.jar}}*
>  
> I have this in deployment.yaml of the app
>  
> *mainApplicationFile: "local:spark-assembly-1.0.jar"*
>  
>  
>  
>  
> and I have not changed anything related to that. I see that some code has 
> changed in spark 3.4.0 core's source code regarding jar location.
> Has it really changed the functionality ? Is there anyone who is facing same 
> issue as me ? Should the path be specified in a different way?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44135) Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" java.nio.file.NoSuchFileException: , although jar is present in the location

2023-06-21 Thread Ramakrishna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramakrishna updated SPARK-44135:

Priority: Blocker  (was: Critical)

> Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" 
> java.nio.file.NoSuchFileException: , although jar is present in the location
> ---
>
> Key: SPARK-44135
> URL: https://issues.apache.org/jira/browse/SPARK-44135
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Ramakrishna
>Priority: Blocker
>
>  
> I have a spark application that is deployed using k8s and it is of version 
> 3.3.2 Recently there were some vulneabilities in spark 3.3.2
> I changed my dockerfile to download 3.4.0 instead of 3.3.2 and also my 
> application jar is built on spark 3.4.0
> However while deploying, I get this error
>         
> *{{Exception in thread "main" java.nio.file.NoSuchFileException: 
> /spark-assembly-1.0.jar}}*
>  
> I have this in deployment.yaml of the app
>  
> *mainApplicationFile: "local:spark-assembly-1.0.jar"*
>  
>  
>  
>  
> and I have not changed anything related to that. I see that some code has 
> changed in spark 3.4.0 core's source code regarding jar location.
> Has it really changed the functionality ? Is there anyone who is facing same 
> issue as me ? Should the path be specified in a different way?
> {{}}
> {{}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44135) Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" java.nio.file.NoSuchFileException: , although jar is present in the location

2023-06-21 Thread Ramakrishna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramakrishna updated SPARK-44135:

Issue Type: Bug  (was: Improvement)

> Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" 
> java.nio.file.NoSuchFileException: , although jar is present in the location
> ---
>
> Key: SPARK-44135
> URL: https://issues.apache.org/jira/browse/SPARK-44135
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Ramakrishna
>Priority: Critical
>
>  
> I have a spark application that is deployed using k8s and it is of version 
> 3.3.2 Recently there were some vulneabilities in spark 3.3.2
> I changed my dockerfile to download 3.4.0 instead of 3.3.2 and also my 
> application jar is built on spark 3.4.0
> However while deploying, I get this error
>         
> *{{Exception in thread "main" java.nio.file.NoSuchFileException: 
> /spark-assembly-1.0.jar}}*
>  
> I have this in deployment.yaml of the app
>  
> *mainApplicationFile: "local:spark-assembly-1.0.jar"*
>  
>  
>  
>  
> and I have not changed anything related to that. I see that some code has 
> changed in spark 3.4.0 core's source code regarding jar location.
> Has it really changed the functionality ? Is there anyone who is facing same 
> issue as me ? Should the path be specified in a different way?
> {{}}
> {{}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44135) Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" java.nio.file.NoSuchFileException: , although jar is present in the location

2023-06-21 Thread Ramakrishna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramakrishna updated SPARK-44135:

Description: 
 
I have a Spark application that is deployed using k8s, and it is on version 3.3.2. Recently there were some vulnerabilities in Spark 3.3.2.

I changed my Dockerfile to download 3.4.0 instead of 3.3.2, and my application jar is also built on Spark 3.4.0.

However, while deploying, I get this error:

{{Exception in thread "main" java.nio.file.NoSuchFileException: /spark-assembly-1.0.jar}}

I have this in the deployment.yaml of the app:

{{mainApplicationFile: "local:spark-assembly-1.0.jar"}}

and I have not changed anything related to that. I see that some code regarding the jar location has changed in Spark 3.4.0 core's source code.

Has the functionality really changed? Is anyone else facing the same issue? Should the path be specified in a different way?

  was:
 
I have a spark application that is deployed using k8s and it is of version 
3.3.2 Recently there were some vulneabilities in spark 3.3.2

I changed my dockerfile to download 3.4.0 instead of 3.3.2 and also my 
application jar is built on spark 3.4.0

However while deploying, I get this error

        

{{Exception in thread "main" java.nio.file.NoSuchFileException: 
/spark-assembly-1.0.jar}}

{{}}

{{}}

I have this in deployment.yaml of the app

 

{{ mainApplicationFile: "local:spark-assembly-1.0.jar"}}

 

 

 

and I have not changed anything related to that. I see that some code has 
changed in spark 3.4.0 core's source code regarding jar location.

Has it really changed the functionality ? Is there anyone who is facing same 
issue as me ? Should the path be specified in a different way?

{{}}

{{}}


> Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" 
> java.nio.file.NoSuchFileException: , although jar is present in the location
> ---
>
> Key: SPARK-44135
> URL: https://issues.apache.org/jira/browse/SPARK-44135
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Ramakrishna
>Priority: Critical
>
>  
> I have a spark application that is deployed using k8s and it is of version 
> 3.3.2 Recently there were some vulneabilities in spark 3.3.2
> I changed my dockerfile to download 3.4.0 instead of 3.3.2 and also my 
> application jar is built on spark 3.4.0
> However while deploying, I get this error
>         
> {{Exception in thread "main" java.nio.file.NoSuchFileException: 
> /spark-assembly-1.0.jar}}
>  
> I have this in deployment.yaml of the app
>  
> {\{ mainApplicationFile: "local:spark-assembly-1.0.jar"}}
>  
>  
>  
> and I have not changed anything related to that. I see that some code has 
> changed in spark 3.4.0 core's source code regarding jar location.
> Has it really changed the functionality ? Is there anyone who is facing same 
> issue as me ? Should the path be specified in a different way?
> {{}}
> {{}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44135) Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" java.nio.file.NoSuchFileException: , although jar is present in the location

2023-06-21 Thread Ramakrishna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramakrishna updated SPARK-44135:

Description: 
 
I have a Spark application that is deployed using k8s, and it is on version 3.3.2. Recently there were some vulnerabilities in Spark 3.3.2.

I changed my Dockerfile to download 3.4.0 instead of 3.3.2, and my application jar is also built on Spark 3.4.0.

However, while deploying, I get this error:

*{{Exception in thread "main" java.nio.file.NoSuchFileException: /spark-assembly-1.0.jar}}*

I have this in the deployment.yaml of the app:

*mainApplicationFile: "local:spark-assembly-1.0.jar"*

and I have not changed anything related to that. I see that some code regarding the jar location has changed in Spark 3.4.0 core's source code.

Has the functionality really changed? Is anyone else facing the same issue? Should the path be specified in a different way?

  was:
 
I have a spark application that is deployed using k8s and it is of version 
3.3.2 Recently there were some vulneabilities in spark 3.3.2

I changed my dockerfile to download 3.4.0 instead of 3.3.2 and also my 
application jar is built on spark 3.4.0

However while deploying, I get this error

        

{{Exception in thread "main" java.nio.file.NoSuchFileException: 
/spark-assembly-1.0.jar}}

 

I have this in deployment.yaml of the app

 

{\{ mainApplicationFile: "local:spark-assembly-1.0.jar"}}

 

 

 

and I have not changed anything related to that. I see that some code has 
changed in spark 3.4.0 core's source code regarding jar location.

Has it really changed the functionality ? Is there anyone who is facing same 
issue as me ? Should the path be specified in a different way?

{{}}

{{}}


> Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" 
> java.nio.file.NoSuchFileException: , although jar is present in the location
> ---
>
> Key: SPARK-44135
> URL: https://issues.apache.org/jira/browse/SPARK-44135
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Ramakrishna
>Priority: Critical
>
>  
> I have a spark application that is deployed using k8s and it is of version 
> 3.3.2 Recently there were some vulneabilities in spark 3.3.2
> I changed my dockerfile to download 3.4.0 instead of 3.3.2 and also my 
> application jar is built on spark 3.4.0
> However while deploying, I get this error
>         
> *{{Exception in thread "main" java.nio.file.NoSuchFileException: 
> /spark-assembly-1.0.jar}}*
>  
> I have this in deployment.yaml of the app
>  
> *mainApplicationFile: "local:spark-assembly-1.0.jar"*
>  
>  
>  
>  
> and I have not changed anything related to that. I see that some code has 
> changed in spark 3.4.0 core's source code regarding jar location.
> Has it really changed the functionality ? Is there anyone who is facing same 
> issue as me ? Should the path be specified in a different way?
> {{}}
> {{}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44135) Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" java.nio.file.NoSuchFileException: , although jar is present in the location

2023-06-21 Thread Ramakrishna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramakrishna updated SPARK-44135:

Component/s: Spark Core
 (was: Shuffle)

> Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" 
> java.nio.file.NoSuchFileException: , although jar is present in the location
> ---
>
> Key: SPARK-44135
> URL: https://issues.apache.org/jira/browse/SPARK-44135
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Ramakrishna
>Priority: Critical
>
>  
> I have a spark application that is deployed using k8s and it is of version 
> 3.3.2 Recently there were some vulneabilities in spark 3.3.2
> I changed my dockerfile to download 3.4.0 instead of 3.3.2 and also my 
> application jar is built on spark 3.4.0
> However while deploying, I get this error
>         
> {{Exception in thread "main" java.nio.file.NoSuchFileException: 
> /spark-assembly-1.0.jar}}
> {{}}
> {{}}
> I have this in deployment.yaml of the app
>  
> {{ mainApplicationFile: "local:spark-assembly-1.0.jar"}}
>  
>  
>  
> and I have not changed anything related to that. I see that some code has 
> changed in spark 3.4.0 core's source code regarding jar location.
> Has it really changed the functionality ? Is there anyone who is facing same 
> issue as me ? Should the path be specified in a different way?
> {{}}
> {{}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44135) Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" java.nio.file.NoSuchFileException: , although jar is present in the location

2023-06-21 Thread Ramakrishna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramakrishna updated SPARK-44135:

Affects Version/s: (was: 3.2.0)

> Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" 
> java.nio.file.NoSuchFileException: , although jar is present in the location
> ---
>
> Key: SPARK-44135
> URL: https://issues.apache.org/jira/browse/SPARK-44135
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 3.4.0
>Reporter: Ramakrishna
>Priority: Critical
>
>  
> I have a spark application that is deployed using k8s and it is of version 
> 3.3.2 Recently there were some vulneabilities in spark 3.3.2
> I changed my dockerfile to download 3.4.0 instead of 3.3.2 and also my 
> application jar is built on spark 3.4.0
> However while deploying, I get this error
>         
> {{Exception in thread "main" java.nio.file.NoSuchFileException: 
> /spark-assembly-1.0.jar}}
> {{}}
> {{}}
> I have this in deployment.yaml of the app
>  
> {{ mainApplicationFile: "local:spark-assembly-1.0.jar"}}
>  
>  
>  
> and I have not changed anything related to that. I see that some code has 
> changed in spark 3.4.0 core's source code regarding jar location.
> Has it really changed the functionality ? Is there anyone who is facing same 
> issue as me ? Should the path be specified in a different way?
> {{}}
> {{}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44135) Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" java.nio.file.NoSuchFileException: , although jar is present in the location

2023-06-21 Thread Ramakrishna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramakrishna updated SPARK-44135:

Description: 
 
I have a Spark application that is deployed using k8s, and it is on version 3.3.2. Recently there were some vulnerabilities in Spark 3.3.2.

I changed my Dockerfile to download 3.4.0 instead of 3.3.2, and my application jar is also built on Spark 3.4.0.

However, while deploying, I get this error:

{{Exception in thread "main" java.nio.file.NoSuchFileException: /spark-assembly-1.0.jar}}

I have this in the deployment.yaml of the app:

{{mainApplicationFile: "local:spark-assembly-1.0.jar"}}

and I have not changed anything related to that. I see that some code regarding the jar location has changed in Spark 3.4.0 core's source code.

Has the functionality really changed? Is anyone else facing the same issue? Should the path be specified in a different way?

> Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" 
> java.nio.file.NoSuchFileException: , although jar is present in the location
> ---
>
> Key: SPARK-44135
> URL: https://issues.apache.org/jira/browse/SPARK-44135
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 3.2.0, 3.4.0
>Reporter: Ramakrishna
>Priority: Critical
>
>  
> I have a spark application that is deployed using k8s and it is of version 
> 3.3.2 Recently there were some vulneabilities in spark 3.3.2
> I changed my dockerfile to download 3.4.0 instead of 3.3.2 and also my 
> application jar is built on spark 3.4.0
> However while deploying, I get this error
>         
> {{Exception in thread "main" java.nio.file.NoSuchFileException: 
> /spark-assembly-1.0.jar}}
> {{}}
> {{}}
> I have this in deployment.yaml of the app
>  
> {{ mainApplicationFile: "local:spark-assembly-1.0.jar"}}
>  
>  
>  
> and I have not changed anything related to that. I see that some code has 
> changed in spark 3.4.0 core's source code regarding jar location.
> Has it really changed the functionality ? Is there anyone who is facing same 
> issue as me ? Should the path be specified in a different way?
> {{}}
> {{}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44135) Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" java.nio.file.NoSuchFileException: , although jar is present in the location

2023-06-21 Thread Ramakrishna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramakrishna updated SPARK-44135:

Description: (was: In our production environment, 
_finalizeShuffleMerge_ processing took longer time (p90 is around 20s) than 
other PRC requests. This is due to _finalizeShuffleMerge_ invoking IO 
operations like truncate and file open/close.  

More importantly, processing this _finalizeShuffleMerge_ can block other 
critical lightweight messages like authentications, which can cause 
authentication timeout as well as fetch failures. Those timeout and fetch 
failures affect the stability of the Spark job executions. )

> Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" 
> java.nio.file.NoSuchFileException: , although jar is present in the location
> ---
>
> Key: SPARK-44135
> URL: https://issues.apache.org/jira/browse/SPARK-44135
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 3.2.0, 3.4.0
>Reporter: Ramakrishna
>Priority: Critical
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44135) Upgrade to spark 3.4.0 from 3.3.2 gives Exception in thread "main" java.nio.file.NoSuchFileException: , although jar is present in the location

2023-06-21 Thread Ramakrishna (Jira)
Ramakrishna created SPARK-44135:
---

 Summary: Upgrade to spark 3.4.0 from 3.3.2 gives Exception in 
thread "main" java.nio.file.NoSuchFileException: , although jar is present in 
the location
 Key: SPARK-44135
 URL: https://issues.apache.org/jira/browse/SPARK-44135
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle
Affects Versions: 3.2.0, 3.4.0
Reporter: Ramakrishna


In our production environment, _finalizeShuffleMerge_ processing takes longer (p90 is around 20s) than other RPC requests. This is because _finalizeShuffleMerge_ invokes IO operations like truncate and file open/close.

More importantly, processing _finalizeShuffleMerge_ can block other critical lightweight messages like authentications, which can cause authentication timeouts as well as fetch failures. Those timeouts and fetch failures affect the stability of Spark job executions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41298) Getting Count on data frame is giving the performance issue

2022-12-06 Thread Ramakrishna (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644164#comment-17644164
 ] 

Ramakrishna commented on SPARK-41298:
-

Can someone please check this behavior and update me as soon as possible?

> Getting Count on data frame is giving the performance issue
> ---
>
> Key: SPARK-41298
> URL: https://issues.apache.org/jira/browse/SPARK-41298
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: Ramakrishna
>Priority: Major
>
> We are invoking  below query on Teradata 
> 1) Dataframe df = spark.format("jdbc"). . . load();
> 2) int count = df.count();
> When we executed the df.count spark internally issuing the below query on 
> teradata which is wasting the lot of CPU on teradata and DBAs are making 
> noise by seeing this query.
>  
> Query : SELECT 1 FROM ()SPARK_SUB_TAB
> Response:
> 1
> 1
> 1
> 1
> 1
> ..
> 1
>  
> Is this expected behavior from spark or is it bug.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41298) Getting Count on data frame is giving the performance issue

2022-11-28 Thread Ramakrishna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramakrishna updated SPARK-41298:

Description: 
We are running the code below against Teradata:

1) Dataframe df = spark.format("jdbc"). . . load();

2) int count = df.count();

When we execute df.count(), Spark internally issues the below query on Teradata, which wastes a lot of CPU on Teradata, and the DBAs are raising concerns when they see this query.

Query: SELECT 1 FROM ()SPARK_SUB_TAB

Response:

1
1
1
1
1
..
1

Is this expected behavior from Spark, or is it a bug?

  was:
We are invoking  below query on Teradata 

1) Dataframe df = spark.format("jdbc"). . . load();

2) int count = df.count();

When we executed the df.count spark internally issuing the below query on 
teradata which is wasting the lot of CPU on teradata and DBAs are making noise 
by seeing this query.

 

Query : SELECT 1 FROM ()SPARK_SUB_TAB

Response:

1

1

1

1

1

..

1

 

Is this expected behavior form spark.


> Getting Count on data frame is giving the performance issue
> ---
>
> Key: SPARK-41298
> URL: https://issues.apache.org/jira/browse/SPARK-41298
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: Ramakrishna
>Priority: Major
>
> We are invoking  below query on Teradata 
> 1) Dataframe df = spark.format("jdbc"). . . load();
> 2) int count = df.count();
> When we executed the df.count spark internally issuing the below query on 
> teradata which is wasting the lot of CPU on teradata and DBAs are making 
> noise by seeing this query.
>  
> Query : SELECT 1 FROM ()SPARK_SUB_TAB
> Response:
> 1
> 1
> 1
> 1
> 1
> ..
> 1
>  
> Is this expected behavior from spark or is it bug.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41298) Getting Count on data frame is giving the performance issue

2022-11-28 Thread Ramakrishna (Jira)
Ramakrishna created SPARK-41298:
---

 Summary: Getting Count on data frame is giving the performance 
issue
 Key: SPARK-41298
 URL: https://issues.apache.org/jira/browse/SPARK-41298
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.4
Reporter: Ramakrishna


We are running the code below against Teradata:

1) Dataframe df = spark.format("jdbc"). . . load();

2) int count = df.count();

When we execute df.count(), Spark internally issues the below query on Teradata, which wastes a lot of CPU on Teradata, and the DBAs are raising concerns when they see this query.

Query: SELECT 1 FROM ()SPARK_SUB_TAB

Response:

1
1
1
1
1
..
1

Is this expected behavior from Spark?
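One way to avoid that per-row scan is to push the count down to the database through the JDBC {{query}} option instead of calling df.count() on the full table; a minimal PySpark sketch (the URL, table name and credentials are placeholders, not taken from this report):

{code:python}
# Minimal sketch: let the database compute the count rather than having Spark
# scan SELECT 1 rows. The "query" option has been available on the JDBC source
# since Spark 2.4. All connection details below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

count_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:teradata://<host>/DATABASE=<database>")
    .option("query", "SELECT COUNT(*) AS cnt FROM <table>")
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)
count = count_df.first()["cnt"]
{code}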



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41070) Performance issue when Spark SQL connects with TeraData

2022-11-17 Thread Ramakrishna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramakrishna resolved SPARK-41070.
-
Resolution: Done

> Performance issue when Spark SQL connects with TeraData 
> 
>
> Key: SPARK-41070
> URL: https://issues.apache.org/jira/browse/SPARK-41070
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core, SQL
>Affects Versions: 2.4.4
>Reporter: Ramakrishna
>Priority: Major
>
> We are connecting Tera data from spark SQL with below API
> {color:#ff8b00}Dataset jdbcDF = spark.read().jdbc(connectionUrl, 
> tableQuery, connectionProperties);{color}
> We are facing one issue when we execute above logic on large table with 
> million rows every time we are seeing below extra query is executing every 
> time as this resulting performance hit on DB.
> This below information we got from DBA. We dont have any logs on SPARK SQL.
> SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
>  
> Can you please clarify why this query is executing or is there any chance 
> that this type of query is executing from our code it self while check for 
> rows count from dataframe.
>  
> Please provide me your inputs on this.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41170) Performance issue when Spark SQL connects with TeraData

2022-11-17 Thread Ramakrishna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramakrishna updated SPARK-41170:

Summary: Performance issue when Spark SQL connects with TeraData   (was: 
CLONE - Performance issue when Spark SQL connects with TeraData )

> Performance issue when Spark SQL connects with TeraData 
> 
>
> Key: SPARK-41170
> URL: https://issues.apache.org/jira/browse/SPARK-41170
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core, SQL
>Affects Versions: 2.4.4
>Reporter: Ramakrishna
>Priority: Major
>
> We are connecting Tera data from spark SQL with below API
> {color:#ff8b00}Dataset jdbcDF = spark.read().jdbc(connectionUrl, 
> tableQuery, connectionProperties);{color}
> We are facing one issue when we execute above logic on large table with 
> million rows every time we are seeing below extra query is executing every 
> time as this resulting performance hit on DB.
> This below information we got from DBA. We dont have any logs on SPARK SQL.
> SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
>  
> Can you please clarify why this query is executing or is there any chance 
> that this type of query is executing from our code it self while check for 
> rows count from dataframe.
>  
> Please provide me your inputs on this.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41170) CLONE - Performance issue when Spark SQL connects with TeraData

2022-11-17 Thread Ramakrishna (Jira)
Ramakrishna created SPARK-41170:
---

 Summary: CLONE - Performance issue when Spark SQL connects with 
TeraData 
 Key: SPARK-41170
 URL: https://issues.apache.org/jira/browse/SPARK-41170
 Project: Spark
  Issue Type: Question
  Components: Spark Core, SQL
Affects Versions: 2.4.4
Reporter: Ramakrishna


We are connecting Tera data from spark SQL with below API

{color:#ff8b00}Dataset jdbcDF = spark.read().jdbc(connectionUrl, 
tableQuery, connectionProperties);{color}

We are facing one issue when we execute above logic on large table with million 
rows every time we are seeing below extra query is executing every time as this 
resulting performance hit on DB.

This below information we got from DBA. We dont have any logs on SPARK SQL.

SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|

 

Can you please clarify why this query is being executed, or whether there is any
chance that this type of query is issued from our own code while checking the
row count of the DataFrame?

Please provide your inputs on this.
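For context: a probe of the form SELECT 1 FROM <table> is what Spark's JDBC
source typically generates when an action needs no columns at all, most
commonly a count() or a row-count/emptiness check on the DataFrame, because
column pruning leaves nothing to select and the reader falls back to the
constant 1 for every row. The sketch below is only illustrative (the Teradata
URL, credentials, and driver settings are placeholders, not taken from this
report); it shows the pattern that can trigger the query and one way to push
the count down to the database instead.

```python
# Illustrative sketch, not the reporter's code; URL, credentials and driver
# settings are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-count-probe").getOrCreate()

url = "jdbc:teradata://dbhost/DBS_PORT=1025"            # placeholder
props = {"user": "dbc", "password": "dbc",              # placeholders
         "driver": "com.teradata.jdbc.TeraDriver"}

# A count() (or similar row-count check) on a plain JDBC DataFrame prunes every
# column; the generated query then typically looks like
# "SELECT 1 FROM ONE_MILLION_ROWS_TABLE", returning one constant per row.
df = spark.read.jdbc(url, "ONE_MILLION_ROWS_TABLE", properties=props)
row_count = df.count()

# Pushing the aggregation into the database via a subquery transfers a single
# row instead of streaming the whole table through the JDBC connection.
pushed = spark.read.jdbc(
    url, "(SELECT COUNT(*) AS cnt FROM ONE_MILLION_ROWS_TABLE) t",
    properties=props)
row_count = pushed.first()[0]
```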

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41070) Performance issue when Spark SQL connects with TeraData

2022-11-17 Thread Ramakrishna (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635223#comment-17635223
 ] 

Ramakrishna commented on SPARK-41070:
-

I have converted the issue to a question.

> Performance issue when Spark SQL connects with TeraData 
> 
>
> Key: SPARK-41070
> URL: https://issues.apache.org/jira/browse/SPARK-41070
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core, SQL
>Affects Versions: 2.4.4
>Reporter: Ramakrishna
>Priority: Major
>
> We are connecting to Teradata from Spark SQL with the below API:
> {color:#ff8b00}Dataset jdbcDF = spark.read().jdbc(connectionUrl, 
> tableQuery, connectionProperties);{color}
> We are facing an issue when we execute the above logic on a large table with
> a million rows: every time, we see the extra query below being executed,
> which results in a performance hit on the DB.
> We got the information below from our DBA; we do not have any logs on the Spark SQL side.
> SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
>  
> Can you please clarify why this query is being executed, or whether there is
> any chance that this type of query is issued from our own code while checking
> the row count of the DataFrame?
>  
> Please provide your inputs on this.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-41070) Performance issue when Spark SQL connects with TeraData

2022-11-17 Thread Ramakrishna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramakrishna reopened SPARK-41070:
-

> Performance issue when Spark SQL connects with TeraData 
> 
>
> Key: SPARK-41070
> URL: https://issues.apache.org/jira/browse/SPARK-41070
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.4.4
>Reporter: Ramakrishna
>Priority: Major
>
> We are connecting to Teradata from Spark SQL with the below API:
> {color:#ff8b00}Dataset jdbcDF = spark.read().jdbc(connectionUrl, 
> tableQuery, connectionProperties);{color}
> We are facing an issue when we execute the above logic on a large table with
> a million rows: every time, we see the extra query below being executed,
> which results in a performance hit on the DB.
> We got the information below from our DBA; we do not have any logs on the Spark SQL side.
> SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
>  
> Can you please clarify why this query is being executed, or whether there is
> any chance that this type of query is issued from our own code while checking
> the row count of the DataFrame?
>  
> Please provide your inputs on this.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41070) Performance issue when Spark SQL connects with TeraData

2022-11-17 Thread Ramakrishna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramakrishna updated SPARK-41070:

Issue Type: Question  (was: Bug)

> Performance issue when Spark SQL connects with TeraData 
> 
>
> Key: SPARK-41070
> URL: https://issues.apache.org/jira/browse/SPARK-41070
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core, SQL
>Affects Versions: 2.4.4
>Reporter: Ramakrishna
>Priority: Major
>
> We are connecting to Teradata from Spark SQL with the below API:
> {color:#ff8b00}Dataset jdbcDF = spark.read().jdbc(connectionUrl, 
> tableQuery, connectionProperties);{color}
> We are facing an issue when we execute the above logic on a large table with
> a million rows: every time, we see the extra query below being executed,
> which results in a performance hit on the DB.
> We got the information below from our DBA; we do not have any logs on the Spark SQL side.
> SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
>  
> Can you please clarify why this query is being executed, or whether there is
> any chance that this type of query is issued from our own code while checking
> the row count of the DataFrame?
>  
> Please provide your inputs on this.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41070) Performance issue when Spark SQL connects with TeraData

2022-11-16 Thread Ramakrishna (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635083#comment-17635083
 ] 

Ramakrishna commented on SPARK-41070:
-

Do I need to raise any ticket for this?

> Performance issue when Spark SQL connects with TeraData 
> 
>
> Key: SPARK-41070
> URL: https://issues.apache.org/jira/browse/SPARK-41070
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.4.4
>Reporter: Ramakrishna
>Priority: Major
>
> We are connecting to Teradata from Spark SQL with the below API:
> {color:#ff8b00}Dataset jdbcDF = spark.read().jdbc(connectionUrl, 
> tableQuery, connectionProperties);{color}
> We are facing an issue when we execute the above logic on a large table with
> a million rows: every time, we see the extra query below being executed,
> which results in a performance hit on the DB.
> We got the information below from our DBA; we do not have any logs on the Spark SQL side.
> SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
>  
> Can you please clarify why this query is being executed, or whether there is
> any chance that this type of query is issued from our own code while checking
> the row count of the DataFrame?
>  
> Please provide your inputs on this.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41070) Performance issue when Spark SQL connects with TeraData

2022-11-11 Thread Ramakrishna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramakrishna updated SPARK-41070:

Component/s: SQL

> Performance issue when Spark SQL connects with TeraData 
> 
>
> Key: SPARK-41070
> URL: https://issues.apache.org/jira/browse/SPARK-41070
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.4.4
>Reporter: Ramakrishna
>Priority: Major
>
> We are connecting to Teradata from Spark SQL with the below API:
> {color:#ff8b00}Dataset jdbcDF = spark.read().jdbc(connectionUrl, 
> tableQuery, connectionProperties);{color}
> We are facing an issue when we execute the above logic on a large table with
> a million rows: every time, we see the extra query below being executed,
> which results in a performance hit on the DB.
> We got the information below from our DBA; we do not have any logs on the Spark SQL side.
> SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
>  
> Can you please clarify why this query is being executed, or whether there is
> any chance that this type of query is issued from our own code while checking
> the row count of the DataFrame?
>  
> Please provide your inputs on this.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41070) Performance issue when Spark SQL connects with TeraData

2022-11-08 Thread Ramakrishna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramakrishna updated SPARK-41070:

Description: 
We are connecting to Teradata from Spark SQL with the below API:

{color:#ff8b00}Dataset jdbcDF = spark.read().jdbc(connectionUrl, 
tableQuery, connectionProperties);{color}

We are facing an issue when we execute the above logic on a large table with a
million rows: every time, we see the extra query below being executed, which
results in a performance hit on the DB.

We got the information below from our DBA; we do not have any logs on the Spark SQL side.

SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|

 

Can you please clarify why this query is being executed, or whether there is any
chance that this type of query is issued from our own code while checking the
row count of the DataFrame?

Please provide your inputs on this.

 

  was:
We are connecting to Teradata from Spark SQL with the below API:

{color:#ff8b00}Dataset jdbcDF = spark.read().jdbc(connectionUrl, 
tableQuery, connectionProperties);{color}

We are facing an issue when we execute the above logic on a large table with a
million rows: every time, we see the extra query below being executed, which
results in a performance hit on the DB.

We got the information below from our DBA; we do not have any logs on the Spark SQL side.

SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|

 

Can you please clarify why this query is being executed, or whether there is any
chance that this query is issued from our own code while checking the row count
of the DataFrame?

Please provide your inputs on this.

 


> Performance issue when Spark SQL connects with TeraData 
> 
>
> Key: SPARK-41070
> URL: https://issues.apache.org/jira/browse/SPARK-41070
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: Ramakrishna
>Priority: Major
>
> We are connecting to Teradata from Spark SQL with the below API:
> {color:#ff8b00}Dataset jdbcDF = spark.read().jdbc(connectionUrl, 
> tableQuery, connectionProperties);{color}
> We are facing an issue when we execute the above logic on a large table with
> a million rows: every time, we see the extra query below being executed,
> which results in a performance hit on the DB.
> We got the information below from our DBA; we do not have any logs on the Spark SQL side.
> SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
>  
> Can you please clarify why this query is being executed, or whether there is
> any chance that this type of query is issued from our own code while checking
> the row count of the DataFrame?
>  
> Please provide your inputs on this.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41070) Performance issue when Spark SQL connects with TeraData

2022-11-08 Thread Ramakrishna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramakrishna updated SPARK-41070:

Description: 
We are connecting to Teradata from Spark SQL with the below API:

{color:#ff8b00}Dataset jdbcDF = spark.read().jdbc(connectionUrl, 
tableQuery, connectionProperties);{color}

We are facing an issue when we execute the above logic on a large table with a
million rows: every time, we see the extra query below being executed, which
results in a performance hit on the DB.

We got the information below from our DBA; we do not have any logs on the Spark SQL side.

SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|

 

Can you please clarify why this query is being executed, or whether there is any
chance that this query is issued from our own code while checking the row count
of the DataFrame?

Please provide your inputs on this.

 

  was:
We are connecting to Teradata from Spark SQL with the below API:

Dataset jdbcDF = spark.read().jdbc(connectionUrl, tableQuery, 
connectionProperties);

We are facing an issue when we execute this logic on a large table with a
million rows: every time, we see the extra query below being executed, which
results in a performance hit on the DB.

We got the information below from our DBA; we do not have any logs on the Spark SQL side.

SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|

 

Can you please clarify why this query is being executed, or whether there is any
chance that this query is issued from our own code while checking the row count
of the DataFrame?

Please provide your inputs on this.

 


> Performance issue when Spark SQL connects with TeraData 
> 
>
> Key: SPARK-41070
> URL: https://issues.apache.org/jira/browse/SPARK-41070
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: Ramakrishna
>Priority: Major
>
> We are connecting to Teradata from Spark SQL with the below API:
> {color:#ff8b00}Dataset jdbcDF = spark.read().jdbc(connectionUrl, 
> tableQuery, connectionProperties);{color}
>  
> We are facing an issue when we execute the above logic on a large table with
> a million rows: every time, we see the extra query below being executed,
> which results in a performance hit on the DB.
> We got the information below from our DBA; we do not have any logs on the Spark SQL side.
> SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
>  
> Can you please clarify why this query is being executed, or whether there is
> any chance that this query is issued from our own code while checking the row
> count of the DataFrame?
>  
> Please provide your inputs on this.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41070) Performance issue when Spark SQL connects with TeraData

2022-11-08 Thread Ramakrishna (Jira)
Ramakrishna created SPARK-41070:
---

 Summary: Performance issue when Spark SQL connects with TeraData 
 Key: SPARK-41070
 URL: https://issues.apache.org/jira/browse/SPARK-41070
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.4
Reporter: Ramakrishna


We are connecting to Teradata from Spark SQL with the below API:

Dataset jdbcDF = spark.read().jdbc(connectionUrl, tableQuery, 
connectionProperties);

We are facing an issue when we execute this logic on a large table with a
million rows: every time, we see the extra query below being executed, which
results in a performance hit on the DB.

We got the information below from our DBA; we do not have any logs on the Spark SQL side.

SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|

 

Can you please clarify why this query is being executed, or whether there is any
chance that this query is issued from our own code while checking the row count
of the DataFrame?

Please provide your inputs on this.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13585) addPyFile behavior change between 1.6 and before

2016-02-29 Thread Santhosh Gorantla Ramakrishna (JIRA)
Santhosh Gorantla Ramakrishna created SPARK-13585:
-

 Summary: addPyFile behavior change between 1.6 and before
 Key: SPARK-13585
 URL: https://issues.apache.org/jira/browse/SPARK-13585
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 1.6.0
Reporter: Santhosh Gorantla Ramakrishna
Priority: Minor


addPyFile in earlier versions would remove the .py file if it already existed.
In 1.6, it throws the exception "__.py exists and does not match contents of
__.py".

This might be because the underlying Scala code needs an overwrite parameter,
which is effectively defaulted to false when called from Python:
  private def copyFile(
      url: String,
      sourceFile: File,
      destFile: File,
      fileOverwrite: Boolean,
      removeSourceFile: Boolean = false): Unit = {

It would be good if addPyFile took a parameter to control the overwrite
behaviour, defaulting to false.
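Until such a parameter exists, a user-level guard is one possible workaround.
The sketch below is only illustrative and not part of any Spark API: the helper
name and the idea of tracking already-shipped paths are assumptions, not taken
from this report.

```python
# Illustrative workaround, not a Spark API: skip re-adding a path that was
# already shipped through this SparkContext, so a repeated addPyFile call does
# not hit the "exists and does not match contents" error.
import os

_added_py_files = set()

def add_py_file_once(sc, path):
    """Call sc.addPyFile(path) only the first time an absolute path is seen."""
    key = os.path.abspath(path)
    if key not in _added_py_files:
        sc.addPyFile(path)
        _added_py_files.add(key)
```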



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org