It is quite a bit of work. Again, I think going through the file system API is the better approach in the long run. In fact, I don't think the current offheap API makes much sense even then, and we should consider just removing it to simplify things.
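[Editor's note: a hedged sketch of the two approaches discussed in this thread, using Spark 1.5-era APIs. The Tachyon master host/port and paths are placeholders, not taken from the thread.]

```scala
import org.apache.spark.storage.StorageLevel

// Option A: the offheap cache criticized below. Blocks live in Tachyon,
// but the block metadata is private to this SparkContext, so nothing is
// shared across contexts or recoverable after a driver crash, and data
// must be serialized on the way out of the JVM heap.
val cached = rdd.persist(StorageLevel.OFF_HEAP)

// Option B: treat Tachyon as an ordinary file system. Any context that
// can reach the Tachyon master can read the data back, and it survives
// driver/executor loss.
df.write.parquet("tachyon://tachyon-master:19998/shared/my_table")
val reloaded = sqlContext.read.parquet("tachyon://tachyon-master:19998/shared/my_table")
```

Option B is what "going through the file system API" refers to: it sidesteps the namespace-sharing and recovery problems at the cost of an explicit write/read rather than a transparent cache.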
On Tue, Nov 3, 2015 at 1:20 PM, Justin Uang <justin.u...@gmail.com> wrote:

> Alright, we'll just stick with normal caching then.
>
> Just for future reference, how much work would it be to get it to retain
> the partitions in Tachyon? This is especially helpful in a multitenant
> situation, where many users each have their own persistent Spark contexts,
> but where the notebooks can be idle for long periods of time while holding
> onto cached RDDs.
>
> On Tue, Nov 3, 2015 at 10:15 PM Reynold Xin <r...@databricks.com> wrote:
>
>> It is lost unfortunately (although it can be recomputed automatically).
>>
>> On Tue, Nov 3, 2015 at 1:13 PM, Justin Uang <justin.u...@gmail.com>
>> wrote:
>>
>>> Thanks for your response. I was worried about #3 vs. being able to use
>>> the objects directly. #2 seems to be the dealbreaker for my use case,
>>> right? Even if I am using Tachyon for caching, if an executor is lost,
>>> then that partition is lost for the purposes of Spark?
>>>
>>> On Tue, Nov 3, 2015 at 5:53 PM Reynold Xin <r...@databricks.com> wrote:
>>>
>>>> I don't think there is any special handling w.r.t. Tachyon vs. in-heap
>>>> caching. As a matter of fact, I think the current offheap caching
>>>> implementation is pretty bad, because:
>>>>
>>>> 1. There is no namespace sharing in offheap mode
>>>> 2. Similar to 1, you cannot recover the offheap memory once the Spark
>>>> driver or executor crashes
>>>> 3. It requires expensive serialization to go offheap
>>>>
>>>> It would've been simpler to just treat Tachyon as a normal file system
>>>> and use it that way, which would at least address 1 and 2, and also
>>>> substantially simplify the internals.
>>>>
>>>> On Tue, Nov 3, 2015 at 7:59 AM, Justin Uang <justin.u...@gmail.com>
>>>> wrote:
>>>>
>>>>> Yup, but I'm wondering what happens when an executor does get removed
>>>>> while we're using Tachyon. Will the cached data still be available,
>>>>> since we're using off-heap storage, so the data isn't stored in the
>>>>> executor?
>>>>>
>>>>> On Tue, Nov 3, 2015 at 4:57 PM Ryan Williams <
>>>>> ryan.blake.willi...@gmail.com> wrote:
>>>>>
>>>>>> fwiw, I think that having cached RDD partitions prevents executors
>>>>>> from being removed under dynamic allocation by default; see
>>>>>> SPARK-8958 <https://issues.apache.org/jira/browse/SPARK-8958>. The
>>>>>> "spark.dynamicAllocation.cachedExecutorIdleTimeout" config
>>>>>> <http://spark.apache.org/docs/latest/configuration.html#dynamic-allocation>
>>>>>> controls this.
>>>>>>
>>>>>> On Fri, Oct 30, 2015 at 12:14 PM Justin Uang <justin.u...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hey guys,
>>>>>>>
>>>>>>> According to the docs for 1.5.1, when an executor is removed under
>>>>>>> dynamic allocation, the cached data is gone. If I use off-heap
>>>>>>> storage like Tachyon, conceptually this issue goes away, but is the
>>>>>>> cached data still available in practice? This would be great because
>>>>>>> then we would be able to set
>>>>>>> spark.dynamicAllocation.cachedExecutorIdleTimeout to be quite small.
>>>>>>>
>>>>>>> ==================
>>>>>>> In addition to writing shuffle files, executors also cache data
>>>>>>> either on disk or in memory. When an executor is removed, however,
>>>>>>> all cached data will no longer be accessible. There is currently not
>>>>>>> yet a solution for this in Spark 1.2. In future releases, the cached
>>>>>>> data may be preserved through an off-heap storage similar in spirit
>>>>>>> to how shuffle files are preserved through the external shuffle
>>>>>>> service.
>>>>>>> ==================
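
[Editor's note: the dynamic-allocation settings discussed above can be set in spark-defaults.conf. The timeout values below are illustrative placeholders, not recommendations from this thread.]

```
# spark-defaults.conf -- hedged example values, tune for your workload
spark.dynamicAllocation.enabled                    true
# Idle executors with no cached blocks are reclaimed after this long
spark.dynamicAllocation.executorIdleTimeout        60s
# Idle executors HOLDING cached blocks are kept this long (SPARK-8958);
# the default is effectively infinite, which is why cached RDDs pin
# executors under dynamic allocation
spark.dynamicAllocation.cachedExecutorIdleTimeout  600s
```

As the thread notes, lowering cachedExecutorIdleTimeout only makes sense if losing the cached partitions (and recomputing them) is acceptable, since they are not preserved when the executor goes away.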