You certainly shouldn't just sprinkle them in; that has never been the idea.
Persisting helps in some cases, but is pure overhead in others.
Be thoughtful about why you are adding these statements.
On Wed, Apr 27, 2022 at 11:16 AM Koert Kuipers wrote:
we have quite a few persist statements in our codebase wherever we are
reusing a dataframe.
we noticed that it slows things down quite a bit (sometimes doubling the
runtime) while providing little benefit, since spark already re-uses the
shuffle files underlying the dataframe efficiently even if it is not
persisted.
Yes,
it created a list of records separated by commas, and it was generated faster
as well.
On Wed, 27 Apr 2022, 13:42 Gourav Sengupta wrote:
> Hi,
> did that result in valid JSON in the output file?
>
> Regards,
> Gourav Sengupta
>
> On Tue, Apr 26, 2022 at 8:18 PM Sid wrote:
>
>> I have .txt files with JSON inside them. They are generated by some API
>> calls by the client.
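For context on Gourav's question: records joined only by commas do not form a valid JSON document on their own; wrapping them in `[...]` yields a JSON array, and putting one object per line yields JSON Lines, which Spark's JSON reader handles natively. A small plain-Python illustration:

```python
import json

records = ['{"id": 1}', '{"id": 2}']

# Records joined only by commas are NOT a valid JSON document:
comma_joined = ",".join(records)  # '{"id": 1},{"id": 2}'
try:
    json.loads(comma_joined)
    valid = True
except json.JSONDecodeError:
    valid = False  # parsing fails with "Extra data"

# Two common fixes: a JSON array, or JSON Lines (one object per line).
as_array = "[" + comma_joined + "]"
parsed = json.loads(as_array)

as_jsonl = "\n".join(records)  # each line parses independently
lines = [json.loads(line) for line in as_jsonl.splitlines()]
```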
Hi team,
With your help last week I was able to adapt a project I'm developing and apply
sentiment analysis and NER retrieval to streaming tweets. One of the next
steps, to keep memory usage from growing unbounded, is applying windows and
watermarks to discard tweets after some time.
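In Spark Structured Streaming this is done with `withWatermark` plus a windowed aggregation. The eviction idea itself can be sketched in plain Python; this is a hypothetical helper illustrating the semantics (events older than the maximum seen event time minus the allowed delay are dropped), not Spark's implementation:

```python
from datetime import datetime, timedelta

def filter_late_events(events, delay):
    """events: list of (event_time, payload) pairs.
    Drops events older than the watermark = max event time seen - delay."""
    if not events:
        return []
    watermark = max(t for t, _ in events) - delay
    return [(t, p) for t, p in events if t >= watermark]

now = datetime(2022, 4, 27, 12, 0)
events = [
    (now - timedelta(minutes=30), "old tweet"),     # older than the watermark
    (now - timedelta(minutes=5), "recent tweet"),
    (now, "new tweet"),
]
kept = filter_late_events(events, timedelta(minutes=10))
# kept contains only the two tweets within the 10-minute delay
```

The real streaming query would look roughly like `df.withWatermark("timestamp", "10 minutes").groupBy(window("timestamp", "5 minutes")).count()`, which lets Spark drop per-window state once the watermark passes the window's end.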
Hi,
did that result in valid JSON in the output file?
Regards,
Gourav Sengupta
On Tue, Apr 26, 2022 at 8:18 PM Sid wrote:
> I have .txt files with JSON inside them. They are generated by some API
> calls by the client.
>
> On Wed, Apr 27, 2022 at 12:39 AM Bjørn Jørgensen wrote:
>
>> What is that