Yes, as far as I can tell, your description is accurate.

Thanks,
Gene
On Wed, Jan 4, 2017 at 9:37 PM, Vin J <winjos...@gmail.com> wrote:

> Thanks for the reply, Gene. It looks like this means that, with Spark 2.x,
> one has to change from rdd.persist(StorageLevel.OFF_HEAP) to
> rdd.saveAsTextFile(alluxioPath) / rdd.saveAsObjectFile(alluxioPath) for
> guarantees like a persisted RDD surviving a Spark JVM crash, as well as
> for the other benefits you mention.
>
> Vin.
>
> On Thu, Jan 5, 2017 at 2:50 AM, Gene Pang <gene.p...@gmail.com> wrote:
>
>> Hi Vin,
>>
>> As of Spark 2.x, OFF_HEAP was changed to no longer directly interface
>> with an external block store. The previous tight dependency was
>> restrictive and reduced flexibility. It looks like the new version uses
>> the executor's off-heap memory to allocate direct byte buffers, and does
>> not interface with any external system for data storage. I am not aware
>> of a way to connect the new version of OFF_HEAP to Alluxio.
>>
>> You can get benefits similar to the old OFF_HEAP <-> Tachyon mode, as
>> well as additional benefits like a unified namespace
>> <http://www.alluxio.org/docs/master/en/Unified-and-Transparent-Namespace.html>
>> or sharing in-memory data across applications, by using the Alluxio
>> filesystem API
>> <http://www.alluxio.org/docs/master/en/File-System-API.html>.
>>
>> I hope this helps!
>>
>> Thanks,
>> Gene
>>
>> On Wed, Jan 4, 2017 at 10:50 AM, Vin J <winjos...@gmail.com> wrote:
>>
>>> Up to Spark 1.6, I see there were specific properties to configure,
>>> such as the external block store master URL
>>> (spark.externalBlockStore.url), in order to use the OFF_HEAP storage
>>> level, which made it clear that an external Tachyon-type block store
>>> was required/used for OFF_HEAP storage.
>>>
>>> Can someone clarify how this has changed in Spark 2.x? I no longer see
>>> config settings that point Spark to an external block store like
>>> Tachyon (now Alluxio) (or am I missing them?)
>>>
>>> I understand there are ways to use Alluxio with Spark, but what about
>>> OFF_HEAP storage: can Spark 2.x OFF_HEAP RDD persistence still exploit
>>> Alluxio or another external block store? Any pointers to design
>>> decisions/Spark JIRAs related to this would also help.
>>>
>>> Thanks,
>>> Vin.
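For readers landing on this thread later, a short sketch of the API change discussed above. This is illustrative only: the Alluxio master address (`alluxio-master:19998`), the path `/spark/my-rdd`, and the pre-existing SparkContext `sc` are assumptions, not taken from the thread.

```scala
import org.apache.spark.storage.StorageLevel

// Spark 1.6 and earlier: OFF_HEAP delegated to an external block store
// (Tachyon/Alluxio), located via configuration such as:
//   spark.externalBlockStore.url      alluxio://alluxio-master:19998
//   spark.externalBlockStore.baseDir  /spark
// rdd.persist(StorageLevel.OFF_HEAP)  // blocks lived in Tachyon/Alluxio

// Spark 2.x: OFF_HEAP allocates direct byte buffers from the executor's
// own off-heap memory and no longer talks to an external store, so the
// data does not outlive the executor JVM.

// To keep data in Alluxio so it survives a Spark JVM crash (the approach
// Gene describes), write it out explicitly through a filesystem path:
val rdd = sc.parallelize(1 to 100)
rdd.saveAsObjectFile("alluxio://alluxio-master:19998/spark/my-rdd")

// ...and read it back later, possibly from a different Spark application:
val restored = sc.objectFile[Int]("alluxio://alluxio-master:19998/spark/my-rdd")
```

Note that `saveAsObjectFile`/`objectFile` use Java serialization; `saveAsTextFile`, Parquet, or the Alluxio filesystem API linked above are alternatives depending on the data.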