Hi devs,
Spark 2.3, released in Feb 2018, introduced continuous mode in Structured
Streaming as "experimental".
Now, 2.5 years after its release, I feel it would be a good time to
evaluate the mode: whether it has been widely used or not,
and whether the mode has been making
+1
This will positively improve the performance and reliability of Spark.
Looking forward to this.
Regards
Kalyan.
On Tue, Sep 15, 2020, 9:26 AM Joseph Torres
wrote:
> +1
>
> On Mon, Sep 14, 2020 at 6:39 PM angers.zhu wrote:
>
>> +1
>>
>> angers.zhu
>> angers@gmail.com
>>
>>
+1
On Mon, Sep 14, 2020 at 6:39 PM angers.zhu wrote:
> +1
>
> angers.zhu
> angers@gmail.com
>
>
I see.
In our case, we use SingleBufferInputStream, so the time is spent duplicating
the backing byte buffer.
Thanks
Chang
Ryan Blue wrote on Tue, Sep 15, 2020 at 2:04 AM:
> Before, the input was a byte array so we could read from it directly. Now,
> the input is a `ByteBufferInputStream` so that Parquet can
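For context on where the time goes: `ByteBuffer.duplicate()` itself is cheap (a new view over the same backing storage); what costs time is copying the contents into a fresh buffer. A minimal JDK-only sketch of the distinction (illustrative only, not Spark or Parquet code):

```java
import java.nio.ByteBuffer;

public class BufferCopyDemo {
    public static void main(String[] args) {
        ByteBuffer backing = ByteBuffer.allocate(4);
        backing.put(new byte[] {1, 2, 3, 4});
        backing.flip();

        // duplicate(): a cheap new view sharing the same backing array
        ByteBuffer view = backing.duplicate();

        // A deep copy: allocates new storage and copies every byte
        ByteBuffer copy = ByteBuffer.allocate(backing.remaining());
        copy.put(backing.duplicate()); // read through a view, leaving `backing` intact
        copy.flip();

        // The view shares storage with `backing`; the deep copy does not
        System.out.println(view.array() == backing.array()); // true
        System.out.println(copy.array() == backing.array()); // false
    }
}
```

So if the stream implementation has to materialize a fresh copy of the backing buffer per read, that copy, not the `duplicate()` call, is the overhead to look at.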
+1
+1
Xiao
DB Tsai wrote on Mon, Sep 14, 2020 at 4:09 PM:
> +1
>
> On Mon, Sep 14, 2020 at 12:30 PM Chandni Singh wrote:
>
>> +1
>>
>> Chandni
>>
>> On Mon, Sep 14, 2020 at 11:41 AM Tom Graves
>> wrote:
>>
>>> +1
>>>
>>> Tom
>>>
>>> On Sunday, September 13, 2020, 10:00:05 PM CDT, Mridul Muralidharan <
>>>
Our use case is as follows:
We repartition six months' worth of data for each client on clientId &
recordcreationdate, so that it can write one file per partition. Our
partitioning is on client and recordcreationdate.
The job fills up the disk after it processes, say, 30 tenants out of 50. I am
+1
On Mon, Sep 14, 2020 at 12:30 PM Chandni Singh wrote:
> +1
>
> Chandni
>
> On Mon, Sep 14, 2020 at 11:41 AM Tom Graves
> wrote:
>
>> +1
>>
>> Tom
>>
>> On Sunday, September 13, 2020, 10:00:05 PM CDT, Mridul Muralidharan <
>> mri...@gmail.com> wrote:
>>
>>
>> Hi,
>>
>> I'd like to call for a
There's a second new mechanism which uses TTL for cleanup of shuffle files.
Can you share more about your use case?
On Mon, Sep 14, 2020 at 1:33 PM Edward Mitchell wrote:
> We've also had some similar disk fill issues.
>
> For Java/Scala RDDs, shuffle file cleanup is done as part of the JVM
>
We've also had some similar disk fill issues.
For Java/Scala RDDs, shuffle file cleanup is done as part of the JVM
garbage collection. I've noticed that if RDDs maintain references in the
code and cannot be garbage collected, then intermediate shuffle files hang
around.
The best way to handle this is
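Since this cleanup is driven by the driver's garbage collector, one mitigation (a spark-defaults.conf sketch; whether it helps depends on why the references are still held) is to force a periodic GC on the driver so Spark's ContextCleaner gets a chance to remove shuffle files for RDDs that have gone out of scope:

```
# Force a System.gc() on the driver at this interval so the ContextCleaner
# can reclaim shuffle files of unreferenced RDDs (default is 30min).
spark.cleaner.periodicGC.interval  10min
```

Note this only helps when the RDD is actually unreferenced; shuffle files behind RDDs still reachable from user code will not be cleaned.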
+1
Chandni
On Mon, Sep 14, 2020 at 11:41 AM Tom Graves
wrote:
> +1
>
> Tom
>
> On Sunday, September 13, 2020, 10:00:05 PM CDT, Mridul Muralidharan <
> mri...@gmail.com> wrote:
>
>
> Hi,
>
> I'd like to call for a vote on SPARK-30602 - SPIP: Support push-based
> shuffle to improve shuffle
+1
Tom
On Sunday, September 13, 2020, 10:00:05 PM CDT, Mridul Muralidharan
wrote:
Hi,
I'd like to call for a vote on SPARK-30602 - SPIP: Support push-based shuffle
to improve shuffle efficiency. Please take a look at:
- SPIP jira: https://issues.apache.org/jira/browse/SPARK-30602
+1. Interesting indeed :)
Regards
Venkata krishnan
On Mon, Sep 14, 2020 at 11:14 AM Xingbo Jiang wrote:
> +1 This is an exciting new feature!
>
> On Sun, Sep 13, 2020 at 8:00 PM Mridul Muralidharan
> wrote:
>
>> Hi,
>>
>> I'd like to call for a vote on SPARK-30602 - SPIP: Support push-based
+1 This is an exciting new feature!
On Sun, Sep 13, 2020 at 8:00 PM Mridul Muralidharan
wrote:
> Hi,
>
> I'd like to call for a vote on SPARK-30602 - SPIP: Support push-based
> shuffle to improve shuffle efficiency.
> Please take a look at:
>
> - SPIP jira:
Before, the input was a byte array so we could read from it directly. Now,
the input is a `ByteBufferInputStream` so that Parquet can choose how to
allocate buffers. For example, we use vectored reads from S3 that pull back
multiple buffers in parallel.
Now that the input is a stream based on
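To illustrate the difference (a JDK-only sketch, not Parquet's actual `ByteBufferInputStream`): with a plain byte array the reader indexes into it directly, whereas a stream abstraction lets the allocator hand back several independently allocated chunks, e.g. fetched by parallel reads, and still present them as one logical sequence of bytes:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.List;

public class MultiBufferRead {
    // Present several independently allocated chunks as one logical stream,
    // roughly the shape of an input abstraction over multiple buffers.
    static InputStream over(List<byte[]> chunks) {
        InputStream s = new ByteArrayInputStream(new byte[0]);
        for (byte[] chunk : chunks) {
            s = new SequenceInputStream(s, new ByteArrayInputStream(chunk));
        }
        return s;
    }

    public static void main(String[] args) throws IOException {
        // Two chunks, as if filled by separate parallel fetches
        InputStream in = over(List.of(new byte[] {1, 2}, new byte[] {3, 4}));
        int b;
        while ((b = in.read()) != -1) {
            System.out.print(b + " "); // 1 2 3 4
        }
        System.out.println();
    }
}
```

The trade-off discussed above is that a stream interface can hide whether a read crosses a chunk boundary, which may force a copy where direct array indexing needed none.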
Hi,
I have a long-running application, and Spark seems to fill up the disk with
shuffle files. Eventually the job fails by running out of disk space. Is there
a way for me to clean up the shuffle files?
Thanks
--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
Ryan, do you happen to have any opinion there? That particular section
was introduced in the Parquet 1.10 update:
https://github.com/apache/spark/commit/cac9b1dea1bb44fa42abf77829c05bf93f70cf20
It looks like it didn't use to make a ByteBuffer each time, but read from `in`.
On Sun, Sep 13, 2020 at