Hi,
I am copying Dr. Zaharia on this email as I am quoting from his book (once
again, I may be wrong):
Chapter 5: Basic Structured Operations >> Creating Rows
"You can create rows by manually instantiating a Row object with the values
that belong in each column. It's important to note that only DataFrames have
schemas; Rows themselves do not."
I want to further clarify my use case: an ML engineer collects data in order
to train an ML model. The driver is created within a Jupyter notebook and has
64 GB of RAM for fetching the training set and feeding it to the model.
Naturally, in this case the executors don't need to be as big.
Thanks for the clarification on the Koalas case.
The thread owner states, and I quote: "... IIUC, in the `toPandas` case all
the data gets shuffled to a single executor that fails with OOM."
I still believe that this may be related to the way k8s handles shuffling.
In a balanced k8s cluster this
I think you're talking about Koalas, which is in Spark 3.2, but that is
unrelated to toPandas() and to the question of how it differs from
collect(). Shuffle is also unrelated.
On Wed, Nov 3, 2021 at 3:45 PM Mich Talebzadeh wrote:
> Hi,
>
> As I understood in the previous versions of Spark the
Hi,
Rohit, can you share how it looks using DSv2?
Thanks!
On Wed, 3 Nov 2021 at 19:35, huaxin gao wrote:
> Great to hear. Thanks for testing this!
>
> On Wed, Nov 3, 2021 at 4:03 AM Kapoor, Rohit
> wrote:
>
>> Thanks for your guidance Huaxin. I have been able to test the push down
>>
I’m pretty sure
WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 20) (10.20.167.28 executor
2): java.lang.OutOfMemoryError
at java.base/java.io.ByteArrayOutputStream.hugeCapacity(Unknown Source)
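If the OOM is in fact on the driver side (which is where toPandas()
materializes the result), the usual knobs are driver memory, the driver-side
result cap, and Arrow serialization. A minimal sketch, with placeholder
memory values that would need tuning for the actual workload:

```shell
# sketch: submit with a large driver for toPandas-heavy workloads
spark-submit \
  --driver-memory 64g \
  --conf spark.driver.maxResultSize=32g \
  --conf spark.sql.execution.arrow.pyspark.enabled=true \
  train.py
```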
If you look at the `toPandas` case you can see the exchange stage that doesn't
occur in the
Great to hear. Thanks for testing this!
On Wed, Nov 3, 2021 at 4:03 AM Kapoor, Rohit
wrote:
> Thanks for your guidance, Huaxin. I have been able to test the push down
> operators successfully against PostgreSQL using DS v2.
>
> From: huaxin gao
> Date: Tuesday, 2 November 2021 at
Thanks for your guidance, Huaxin. I have been able to test the push down
operators successfully against PostgreSQL using DS v2.
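For anyone following along, a minimal sketch of the JDBC options involved.
The connection URL, table name, and credentials below are placeholders; the
Spark 3.2-specific part is the "pushDownAggregate" option:

```python
# Sketch of a DS v2 JDBC read with aggregate pushdown (Spark >= 3.2).
# URL, table, and credentials are placeholders for a real PostgreSQL setup.
jdbc_options = {
    "url": "jdbc:postgresql://localhost:5432/testdb",  # placeholder
    "dbtable": "sales",                                # placeholder
    "user": "spark",                                   # placeholder
    "password": "secret",                              # placeholder
    "pushDownAggregate": "true",  # push MIN/MAX/COUNT/SUM/AVG into the DB
}

# Against a live cluster one would then do something like:
#   df = spark.read.format("jdbc").options(**jdbc_options).load()
#   df.groupBy("region").agg({"amount": "sum"}).explain()
# and check the physical plan for a "PushedAggregates" entry.
print(jdbc_options["pushDownAggregate"])
```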
From: huaxin gao
Date: Tuesday, 2 November 2021 at 12:35 AM
To: Kapoor, Rohit
Subject: Re: [Spark SQL]: Aggregate Push Down / Spark 3.2