Re: caching a dataframe in Spark takes lot of time

2024-05-08 Thread Prem Sahoo
Very helpful!

On Wed, May 8, 2024 at 9:07 AM Mich Talebzadeh wrote:
> *Potential reasons*
>
> - Data Serialization: Spark needs to serialize the DataFrame into an in-memory format suitable for storage. This process can be time-consuming, especially for large datasets like 3.2 GB

Re: caching a dataframe in Spark takes lot of time

2024-05-08 Thread Mich Talebzadeh
*Potential reasons*

- Data Serialization: Spark needs to serialize the DataFrame into an in-memory format suitable for storage. This process can be time-consuming, especially for large datasets like 3.2 GB with complex schemas.
- Shuffle Operations: If your transformations involve
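For illustration, a minimal PySpark sketch of the persist path (the input path, app name and filter are placeholders, not from this thread). persist() is lazy, so the serialization cost described above is paid at the first action:

from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("persist-example").getOrCreate()

# Placeholder source and transformation
df = spark.read.parquet("hdfs:///data/source")
transformed = df.filter("value IS NOT NULL")

# MEMORY_AND_DISK spills partitions that do not fit in memory to local
# disk instead of dropping and recomputing them later.
transformed.persist(StorageLevel.MEMORY_AND_DISK)

# persist() is lazy: the DataFrame is only materialized (serialized and
# stored) when the first action runs, which is where the time goes.
transformed.count()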

Re: caching a dataframe in Spark takes lot of time

2024-05-08 Thread Prem Sahoo
Could anyone help me here?

Sent from my iPhone

On May 7, 2024, at 4:30 PM, Prem Sahoo wrote:
> Hello Folks,
> in Spark I have read a file, done some transformations, and am finally writing to HDFS.
>
> Now I am interested in writing the same dataframe to MapRFS, but for this Spark

caching a dataframe in Spark takes lot of time

2024-05-07 Thread Prem Sahoo
Hello Folks, in Spark I have read a file, done some transformations, and am finally writing to HDFS. Now I am interested in writing the same dataframe to MapRFS, but for this Spark will execute the full DAG again (recompute all the previous steps: the read + transformations). I don't want
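A minimal PySpark sketch of the pattern the question is after (paths and the transformation are placeholders): cache the transformed DataFrame so the second write reuses the cached partitions instead of re-running the read and transformations:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("write-twice").getOrCreate()

# Placeholder read and transformation
df = spark.read.csv("hdfs:///data/input.csv", header=True)
result = df.groupBy("key").count()

# Cache before the first action so both writes share the materialized result
result.cache()

# First write triggers the full DAG once and populates the cache
result.write.mode("overwrite").parquet("hdfs:///data/output")

# Second write reads the cached partitions instead of recomputing the DAG
result.write.mode("overwrite").parquet("maprfs:///data/output")

# Release the cached data when both writes have finished
result.unpersist()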