> I do not think it's a bug necessarily; do you end up with one partition in
> your execution somewhere?
>
> On Fri, Nov 12, 2021 at 3:38 AM Sergey Ivanychev <sergeyivanyc...@gmail.com> wrote:
> Of course if I give 64G of ram to each executor they w
>
>
>
> On Thu, 11 Nov 2021 at 21:39, Sergey Ivanychev <sergeyivanyc...@gmail.com> wrote:
> Yes, in fact those are the settings that cause this behaviour. If set to
> false, everything goes fine since the implementation in spark sources in this
> case is
> pdf = pd.DataFrame.from_records(self.collect(), columns=self.columns)
deserialization of Row objects.
Best regards,
Sergey Ivanychev
> On 12 Nov 2021, at 05:05, Gourav Sengupta wrote:
>
> Hi Sergey,
>
> Please read the excerpts from the book of Dr. Zaharia that I had sent, they
> explain these fundamentals clearly.
>
>
Yes, in fact those are the settings that cause this behaviour. If set to false,
everything goes fine since the implementation in spark sources in this case is
pdf = pd.DataFrame.from_records(self.collect(), columns=self.columns)
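That non-Arrow fallback can be illustrated with a minimal, self-contained sketch; `rows` and `columns` below are hypothetical stand-ins for `df.collect()` and `df.columns`, since every Row is first deserialized into a Python object on the driver before pandas assembles the frame:

```python
import pandas as pd

# Stand-ins for the output of df.collect() and df.columns on a tiny dataset:
rows = [(1, "a"), (2, "b"), (3, "c")]
columns = ["id", "value"]

# The non-Arrow toPandas() path: pandas builds the frame from the
# already-materialized Python records, all in driver memory.
pdf = pd.DataFrame.from_records(rows, columns=columns)
print(pdf.shape)  # (3, 2)
```

This is why, with Arrow disabled, the memory profile looks like a plain collect() followed by a pandas construction step on the driver.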
Best regards,
Sergey Ivanychev
> On 11 Nov 2021, at 13
> did you get to read the excerpts from the book of Dr. Zaharia?
I read what you have shared but didn’t manage to get your point.
Best regards,
Sergey Ivanychev
> On 4 Nov 2021, at 20:38, Gourav Sengupta wrote:
>
> did you get to read the excerpts from the book of Dr. Zaharia?
> Just to confirm: with collect() alone, this is all on the driver?
I shared the screenshot with the plan in the first email. In the collect() case
the data gets fetched to the driver without problems.
Best regards,
Sergey Ivanychev
> On 4 Nov 2021, at 20:37, Mich Talebzadeh wrote:
>
on executors.
Best regards,
Sergey Ivanychev
> On 4 Nov 2021, at 15:17, Mich Talebzadeh wrote:
>
>
>
> From your notes ".. IIUC, in the `toPandas` case all the data gets shuffled
> to a single executor that fails with OOM, which doesn’t happen in `collect`
>
in execution plans.
Best regards,
Sergey Ivanychev
> On 4 Nov 2021, at 13:12, Mich Talebzadeh wrote:
>
>
> Do you have the output for executors from spark GUI, the one that eventually
> ends up with OOM?
>
> Also what does
>
> kubectl get pods -n
as the driver.
Currently, the best solution I found is to write the dataframe to S3, and then
read it via pd.read_parquet.
Best regards,
Sergey Ivanychev
> On 4 Nov 2021, at 00:18, Mich Talebzadeh wrote:
>
>
> Thanks for clarification on the koalas case.
>
> at RDD. Also, toPandas() converts to Python objects in memory; I do not think
> that collect() does.
>
> Regards,
> Gourav
>
> On Wed, Nov 3, 2021 at 2:24 PM Sergey Ivanychev <sergeyivanyc...@gmail.com> wrote:
> Hi,
>
> Spark 3.1.2 K8s.
>