Re: BUG :: UI Spark

2024-05-23 Thread Prem Sahoo
Does anyone have a clue? On Thu, May 23, 2024 at 11:40 AM Prem Sahoo wrote: > Hello Team, > in the Spark DAG UI we have a Stages tab. Once you click on each stage you can > view the tasks. > > Each task has a column "Shuffle Write Size/Records"; that column > pr

BUG :: UI Spark

2024-05-23 Thread Prem Sahoo
Hello Team, in the Spark DAG UI we have a Stages tab. Once you click on a stage you can view its tasks. Each task has a column "Shuffle Write Size/Records"; that column prints wrong data when the stage gets its input from cache/persist. It typically shows the wrong record number, though the data

Re: EXT: Dual Write to HDFS and MinIO in faster way

2024-05-21 Thread Prem Sahoo
level. > > Regards, > Vibhor > From: Prem Sahoo > Date: Tuesday, 21 May 2024 at 8:16 AM > To: Spark dev list > Subject: EXT: Dual Write to HDFS and MinIO in faster way > > EXTERNAL: Report suspicious emails to Email Abuse. > > Hello Team, > I am plan

Dual Write to HDFS and MinIO in faster way

2024-05-20 Thread Prem Sahoo
Hello Team, I am planning to write to two datasources at the same time. Scenario: write the same dataframe to HDFS and MinIO without re-executing the transformations and without cache(). How can we make it faster? Read the parquet file, do a few transformations, and write to HDFS and

Re: caching a dataframe in Spark takes lot of time

2024-05-08 Thread Prem Sahoo
(Werner Von Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>)". > > On Wed, 8 May 2024 at 13:41, Prem Sahoo wrote: > >> Could anyone help me here? >> Sent from my iPhone >> >> > On May 7, 2024

Re: caching a dataframe in Spark takes lot of time

2024-05-08 Thread Prem Sahoo
Could anyone help me here? Sent from my iPhone > On May 7, 2024, at 4:30 PM, Prem Sahoo wrote: > > Hello Folks, > in Spark I have read a file, done some transformation, and finally I am writing > to hdfs. > > Now I am interested in writing the same dataframe to MapR

caching a dataframe in Spark takes lot of time

2024-05-07 Thread Prem Sahoo
Hello Folks, in Spark I have read a file, done some transformations, and am finally writing to HDFS. Now I am interested in writing the same dataframe to MapRFS, but for this Spark will execute the full DAG again (recompute all the previous steps: the read + transformations). I don't want

Re: Which version of spark version supports parquet version 2 ?

2024-04-26 Thread Prem Sahoo
Confirmed, closing this. Thanks everyone for the valuable information. Sent from my iPhone > On Apr 25, 2024, at 9:55 AM, Prem Sahoo wrote: > > Hello Spark, > after discussing with the Parquet and Pyarrow community, we can use the > below config so that Spark can writ

Re: Which version of spark version supports parquet version 2 ?

2024-04-25 Thread Prem Sahoo
Hello Spark, after discussing with the Parquet and Pyarrow community, we can use the below config so that Spark can write Parquet V2 files: hadoopConfiguration.set("parquet.writer.version", "v2"). Parquet files created with this set are then V2 parquet. Could you please confirm?

Re: Which version of spark version supports parquet version 2 ?

2024-04-18 Thread Prem Sahoo
pache/parquet-mr?tab=readme-ov-file#java-vector-api-support. You are using Spark 3.2.0; Spark 3.2.4 was released April 13, 2023 (https://spark.apache.org/releases/spark-release-3-2-4.html). You are using a Spark version that is EOL. On Thu, 18 Apr 2024 at 00:25, Prem Sahoo <prem.re...@gmail.c

Re: Which version of spark version supports parquet version 2 ?

2024-04-17 Thread Prem Sahoo
mmunity. > > On Wed, Apr 17, 2024 at 11:05 AM Prem Sahoo wrote: > >> Hello Community, >> Could anyone shed more light on this (Spark Supporting Parquet V2)? >> >> On Tue, Apr 16, 2024 at 3:42 PM Mich Talebzadeh < >> mich.talebza...@gmail.com> wrote: >&

Re: Which version of spark version supports parquet version 2 ?

2024-04-17 Thread Prem Sahoo

Re: Which version of spark version supports parquet version 2 ?

2024-04-16 Thread Prem Sahoo

Re: Which version of spark version supports parquet version 2 ?

2024-04-16 Thread Prem Sahoo
Hello Community, could any of you shed some light on the questions below, please? Sent from my iPhone. On Apr 15, 2024, at 9:02 PM, Prem Sahoo wrote: Any specific reason Spark does not support, or the community doesn't want to go to, Parquet V2, which is more optimized and whose read and write are much faster

Re: Which version of spark version supports parquet version 2 ?

2024-04-15 Thread Prem Sahoo
ust fine. You just don't > need to worry about making Spark produce v2. And you should probably also > not produce v2 encodings from other systems. > > On Mon, Apr 15, 2024 at 4:37 PM Prem Sahoo wrote: > >> Oops, but so Spark does not support Parquet V2 atm? As we have a u

Re: Which version of spark version supports parquet version 2 ?

2024-04-15 Thread Prem Sahoo
d adopted by > the community. I highly recommend not using v2 encodings at this time. > > Ryan > > On Mon, Apr 15, 2024 at 3:05 PM Prem Sahoo wrote: > >> I am using Spark 3.2.0, but my Spark package comes with parquet-mr 1.2.1, >> which writes in parquet version 1, not versio

Re: Which version of spark version supports parquet version 2 ?

2024-04-15 Thread Prem Sahoo
On Mon, 15 Apr 2024 at 21:33, Prem Sahoo wrote: >> Thank you so much for the info! But do we have any release notes w

Re: Which version of spark version supports parquet version 2 ?

2024-04-15 Thread Prem Sahoo
On Mon, 15 Apr 2024 at 20:53, Prem Sahoo wrote: >> Thank you for the information! >> I can use any version of parquet-mr to produce parquet file. >>

Re: Which version of spark version supports parquet version 2 ?

2024-04-15 Thread Prem Sahoo

Which version of spark version supports parquet version 2 ?

2024-04-15 Thread Prem Sahoo
Hello Team, may I know how to check which version of parquet is supported by parquet-mr 1.2.1? Which version of parquet-mr supports parquet version 2 (V2)? Which version of Spark supports parquet version 2? May I get the release notes where the supported parquet versions are mentioned?

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-04 Thread Prem Sahoo
Hopefully, someone with more knowledge can > provide further insight. > > Best, > Jason > > On Mon, Mar 4, 2024 at 9:41 AM Prem Sahoo wrote: > >> super :( >> >> On Mon, Mar 4, 2024 at 6:19 AM Mich Talebzadeh >> wrote: >> >>> "... in a n

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-04 Thread Prem Sahoo

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-03 Thread Prem Sahoo

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-01 Thread Prem Sahoo

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-01 Thread Prem Sahoo
and three > labels, 'Correctness', 'correctness' and 'data-loss'. > > Dongjoon > > On Thu, Feb 29, 2024 at 11:54 Prem Sahoo wrote: > >> Hello Dongjoon, >> Thanks for emailing me. >> Could you please share a list of fixes as the link provided by you is >>

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-02-29 Thread Prem Sahoo
Congratulations Sent from my iPhoneOn Feb 29, 2024, at 4:54 PM, Xinrong Meng wrote:Congratulations!Thanks,XinrongOn Thu, Feb 29, 2024 at 11:16 AM Dongjoon Hyun wrote:Congratulations!Bests,Dongjoon.On Wed, Feb 28, 2024 at 11:43 AM beliefer

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-02-29 Thread Prem Sahoo
s with Apache > Spark 3.5.1. > > Thanks, > Dongjoon. > > On 2024/02/29 15:04:41 Prem Sahoo wrote: > > When Spark job shows FetchFailedException it creates few duplicate data > and > > we see few data also missing , please explain why. We have scenario when > >

When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-02-29 Thread Prem Sahoo
When a Spark job shows FetchFailedException it creates some duplicate data and we also see some data missing; please explain why. We have a scenario where the Spark job complains of FetchFailedException because one of the data nodes got rebooted in the middle of the job run. Due to this we have some duplicate data and
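Not an explanation of the correctness issue itself (the thread points at fixes landed in Spark 3.5.1), but a hypothetical spark-defaults.conf fragment with the shuffle-fetch retry settings commonly tuned so a briefly rebooted data node does not immediately escalate into stage retries; values are illustrative only.

```
spark.shuffle.io.maxRetries   10
spark.shuffle.io.retryWait    30s
spark.network.timeout         600s
```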

Re: Spark Union performance issue

2023-02-22 Thread Prem Sahoo
> How many columns do all these tables have? > > Are you sure creating the plan depends on the number of rows? > > Enrico > > On 22.02.23 at 19:08, Prem Sahoo wrote: > here is the information missed: > 1. Spark 3.2.0 > 2. it is Scala based > 3. size of tab

Re: Spark Union performance issue

2023-02-22 Thread Prem Sahoo

Spark Union performance issue

2023-02-22 Thread Prem Sahoo
Hello Team, we are observing Spark union performance issues when unioning big tables with lots of rows. Do we have any option apart from union?

Executor tab missing information

2023-02-13 Thread Prem Sahoo
Hello All, I am executing Spark jobs, but the Executors tab is missing information; I can't see any data/info coming up. Please let me know what I am missing.
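A hypothetical spark-defaults.conf checklist for when the Executors tab comes up empty: most commonly the event log is not enabled (or points at the wrong directory), so the History Server has nothing to render for completed apps. Paths and values are illustrative.

```
spark.eventLog.enabled          true
spark.eventLog.dir              hdfs:///spark-logs
spark.history.fs.logDirectory   hdfs:///spark-logs
```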

Re: [VOTE][SPIP] Lazy Materialization for Parquet Read Performance Improvement

2023-02-13 Thread Prem Sahoo
+1 On Mon, Feb 13, 2023 at 8:13 PM L. C. Hsieh wrote: > +1 > > On Mon, Feb 13, 2023 at 3:49 PM Mich Talebzadeh wrote: >> +1 for me