Re: Exact meaning of spark.memory.storageFraction in spark 2.3.x [Marketing Mail]

2020-03-20 Thread Michel Sumbul
Hi Iacovos, thanks for the reply, it's super clear. Do you know if there is a way to know the max memory usage? In the Spark UI 2.3.x the "peak memory usage" metric is always at zero. Thanks, Michel

Re: [Spark SQL]: Stability of large many-to-many joins

2020-03-20 Thread nathan grand
Being many-to-many on two similarly sized large datasets, does salting really help? On Fri, Mar 20, 2020 at 3:26 PM Peyman Mohajerian wrote: > Two options: either add salting to your join, or filter the records that are > frequent, join them separately, and then union back; it's the skew join issue.

Re: [Spark SQL]: Stability of large many-to-many joins

2020-03-20 Thread Peyman Mohajerian
Two options: either add salting to your join, or filter the records that are frequent, join them separately, and then union back; it's the skew join issue. On Fri, Mar 20, 2020 at 4:12 AM nathan grand wrote: > Hi, > > I have two very large datasets, which both have many repeated keys, which I > wish
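The salting trick suggested here can be sketched in plain Python (the dataset contents and `NUM_SALTS` below are invented for illustration; in Spark you would build the salted key with `withColumn` and a `rand()`-based suffix, but the mechanics are the same):

```python
import random
from collections import defaultdict

NUM_SALTS = 4  # how many sub-keys to spread each hot key over (tunable)

# Skewed side: many rows share the hot key "A"
skewed = [("A", i) for i in range(10)] + [("B", 99)]
# Other side: one row per key
other = [("A", "left"), ("B", "right")]

random.seed(0)

# 1. Salt the skewed side: pair each key with a random salt value
salted_skewed = [((k, random.randrange(NUM_SALTS)), v) for k, v in skewed]

# 2. Replicate the other side once per possible salt value
replicated_other = [((k, s), v) for k, v in other for s in range(NUM_SALTS)]

# 3. Join on the salted key (a simple hash-join sketch)
lookup = defaultdict(list)
for key, v in replicated_other:
    lookup[key].append(v)

joined = []
for (k, s), v1 in salted_skewed:
    for v2 in lookup[(k, s)]:
        joined.append((k, v1, v2))
```

The salted join produces exactly the rows of the original join, but the hot key "A" is now spread over `NUM_SALTS` distinct shuffle keys, so no single partition receives all of its rows.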

Re: Exact meaning of spark.memory.storageFraction in spark 2.3.x [Marketing Mail] [Marketing Mail]

2020-03-20 Thread Jack Kolokasis
This is just a counter to show you the size of cached RDDs. If it is zero, it means that no caching has occurred. Also, even if storage memory is used for computing, the counter will show zero. Iacovos On 20/3/20 4:51 p.m., Michel Sumbul wrote: Hi, Thanks for the very quick reply! If I see the

Re: Exact meaning of spark.memory.storageFraction in spark 2.3.x [Marketing Mail]

2020-03-20 Thread Michel Sumbul
Hi, Thanks for the very quick reply! If I see the metric "storage memory" always at 0, does that mean that the memory is used for neither caching nor computing? Thanks, Michel

Re: Exact meaning of spark.memory.storageFraction in spark 2.3.x [Marketing Mail]

2020-03-20 Thread Jack Kolokasis
Hello Michel, Spark separates executor memory using an adaptive boundary between storage and execution memory. If there is no caching and execution memory needs more space, then it will use a portion of the storage memory. If your program does not use caching then you can reduce storage
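The adaptive boundary described here can be modelled with a few lines of Python (the figures are invented for illustration; Spark's actual `UnifiedMemoryManager` is more involved than this sketch):

```python
# Toy model of the adaptive storage/execution boundary: execution may
# evict cached blocks until storage shrinks to its floor, so only
# cached data below the floor is truly immune to eviction.

UNIFIED_MB = 2048.0        # total unified (storage + execution) region
STORAGE_FLOOR_MB = 1024.0  # spark.memory.storageFraction * UNIFIED_MB

def execution_can_use(cached_mb):
    """Memory execution can claim, given how much is actually cached."""
    protected = min(cached_mb, STORAGE_FLOOR_MB)
    return UNIFIED_MB - protected

print(execution_can_use(0.0))     # no caching: execution can borrow it all
print(execution_can_use(1536.0))  # cache above the floor can be evicted
```

This is why, with no caching, the storage region costs execution nothing: the floor only protects memory that is actually occupied by cached blocks.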

Exact meaning of spark.memory.storageFraction in spark 2.3.x

2020-03-20 Thread msumbul
Hello, I'm asking myself about the exact meaning of the setting spark.memory.storageFraction. The documentation mentions: "Amount of storage memory immune to eviction, expressed as a fraction of the size of the region set aside by spark.memory.fraction. The higher this is, the less working memory may

[Spark SQL]: Stability of large many-to-many joins

2020-03-20 Thread nathan grand
Hi, I have two very large datasets, which both have many repeated keys, which I wish to join. A simplified example:

dsA
A_1 | A_2
1   | A
2   | A
3   | A
4   | A
5   | A
1   | B
2   | B
3   | B
1   | C

dsB
B_1 | B_2
A   | B
A   | C
A   | D
A   | E
A   | F
A   | G
B   | A
B   | E
B   | G
B   | H
C   | A
C   | B

The join I want to do is: dsA.join(dsB,
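The output size of a many-to-many join like this can be worked out per key: each key contributes (rows in dsA) x (rows in dsB) output rows, which is why one hot key can dominate the shuffle. A plain-Python sketch of the arithmetic for the example data (assuming the join condition is dsA.A_2 == dsB.B_1, which the truncated message doesn't confirm):

```python
from collections import Counter

dsA = [(1, "A"), (2, "A"), (3, "A"), (4, "A"), (5, "A"),
       (1, "B"), (2, "B"), (3, "B"), (1, "C")]
dsB = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("A", "F"),
       ("A", "G"), ("B", "A"), ("B", "E"), ("B", "G"), ("B", "H"),
       ("C", "A"), ("C", "B")]

a_counts = Counter(a2 for _, a2 in dsA)  # rows per join key in dsA
b_counts = Counter(b1 for b1, _ in dsB)  # rows per join key in dsB

# Output rows per key = product of the two per-key counts
per_key = {k: a_counts[k] * b_counts[k] for k in a_counts}
total = sum(per_key.values())

print(per_key, total)
```

Even in this tiny example the key "A" alone produces 30 of the 44 output rows; at scale, that multiplicative blow-up on a handful of hot keys is what destabilises the join.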