Yes, as far as I can tell, your description is accurate.
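
For anyone who finds this thread later, the change discussed below would look roughly like this. This is a sketch only: it assumes a running SparkContext `sc`, and the Alluxio master address and path are placeholders.

```scala
// Sketch; assumes a SparkContext `sc` and an Alluxio master at
// alluxio://master:19998 (placeholder host/port and path).
import org.apache.spark.storage.StorageLevel

val rdd = sc.parallelize(1 to 100)

// Spark 1.x: OFF_HEAP stored blocks in an external block store (Tachyon).
// rdd.persist(StorageLevel.OFF_HEAP)

// Spark 2.x: OFF_HEAP uses the executor's own off-heap memory, so to keep
// data in Alluxio (e.g. to survive a Spark JVM crash), write it explicitly:
rdd.saveAsObjectFile("alluxio://master:19998/path/to/rdd")

// ...and read it back later, possibly from a different Spark application:
val restored = sc.objectFile[Int]("alluxio://master:19998/path/to/rdd")
```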

Thanks,
Gene

On Wed, Jan 4, 2017 at 9:37 PM, Vin J <winjos...@gmail.com> wrote:

> Thanks for the reply Gene. Looks like this means that, with Spark 2.x, one
> has to change from rdd.persist(StorageLevel.OFF_HEAP) to
> rdd.saveAsTextFile(alluxioPath) / rdd.saveAsObjectFile(alluxioPath) for
> guarantees such as a persisted RDD surviving a Spark JVM crash, as well as
> the other benefits you mention.
>
> Vin.
>
> On Thu, Jan 5, 2017 at 2:50 AM, Gene Pang <gene.p...@gmail.com> wrote:
>
>> Hi Vin,
>>
>> As of Spark 2.x, OFF_HEAP was changed so that it no longer directly
>> interfaces with an external block store. The previous tight dependency was
>> restrictive and reduced flexibility. It looks like the new version uses the
>> executor's off-heap memory to allocate direct byte buffers, and does not
>> interface with any external system for data storage. I am not aware of a
>> way to connect the new version of OFF_HEAP to Alluxio.
>>
>> You can get benefits similar to the old OFF_HEAP <-> Tachyon mode, along
>> with additional ones such as unified namespace
>> <http://www.alluxio.org/docs/master/en/Unified-and-Transparent-Namespace.html>
>> or sharing in-memory data across applications, by using the Alluxio
>> filesystem API
>> <http://www.alluxio.org/docs/master/en/File-System-API.html>.
>>
>> I hope this helps!
>>
>> Thanks,
>> Gene
>>
>> On Wed, Jan 4, 2017 at 10:50 AM, Vin J <winjos...@gmail.com> wrote:
>>
>>> Up to Spark 1.6, I see there were specific properties to configure, such
>>> as the external block store master URL (spark.externalBlockStore.url), in
>>> order to use the OFF_HEAP storage level. This made it clear that an
>>> external Tachyon-type block store was required/used for OFF_HEAP storage.
>>>
>>> Can someone clarify how this has changed in Spark 2.x? I no longer see
>>> config settings that point Spark to an external block store like Tachyon
>>> (now Alluxio), or am I missing them?
>>>
>>> I understand there are ways to use Alluxio with Spark, but what about
>>> OFF_HEAP storage: can Spark 2.x OFF_HEAP RDD persistence still exploit
>>> Alluxio or an external block store? Any pointers to design decisions or
>>> Spark JIRAs related to this would also help.
>>>
>>> Thanks,
>>> Vin.
>>>
>>
>>
>
