Re: Spark Streaming with Tachyon : Some findings

2015-05-08 Thread Haoyuan Li
Thanks for the updates!

Best,

Haoyuan

On Fri, May 8, 2015 at 8:40 AM, Dibyendu Bhattacharya <
dibyendu.bhattach...@gmail.com> wrote:

> Just a follow-up on this thread.
>
> I tried Hierarchical Storage on Tachyon (
> http://tachyon-project.org/Hierarchy-Storage-on-Tachyon.html ), and that
> seems to have worked: I did not see any Spark job fail due to
> BlockNotFoundException.
> Below are my hierarchical storage settings:
>
>   -Dtachyon.worker.hierarchystore.level.max=2
>   -Dtachyon.worker.hierarchystore.level0.alias=MEM
>   -Dtachyon.worker.hierarchystore.level0.dirs.path=$TACHYON_RAM_FOLDER
>
>   -Dtachyon.worker.hierarchystore.level0.dirs.quota=$TACHYON_WORKER_MEMORY_SIZE
>   -Dtachyon.worker.hierarchystore.level1.alias=HDD
>   -Dtachyon.worker.hierarchystore.level1.dirs.path=/mnt/tachyon
>   -Dtachyon.worker.hierarchystore.level1.dirs.quota=50GB
>   -Dtachyon.worker.allocate.strategy=MAX_FREE
>   -Dtachyon.worker.evict.strategy=LRU
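For completeness, a sketch of where such properties usually go: in the Tachyon 0.6.x distribution, worker JVM options can be appended via `TACHYON_JAVA_OPTS` in `conf/tachyon-env.sh` (the file name and variable follow the standard distribution; paths and quotas below are placeholders, not a prescription):

```shell
# conf/tachyon-env.sh -- sketch: append the tier-1 hierarchy-store properties
# to the JVM options used by the Tachyon worker. Paths and quotas are
# placeholders; tier 0 (MEM) defaults are assumed to be set elsewhere.
export TACHYON_JAVA_OPTS="$TACHYON_JAVA_OPTS
  -Dtachyon.worker.hierarchystore.level.max=2
  -Dtachyon.worker.hierarchystore.level1.alias=HDD
  -Dtachyon.worker.hierarchystore.level1.dirs.path=/mnt/tachyon
  -Dtachyon.worker.hierarchystore.level1.dirs.quota=50GB
  -Dtachyon.worker.evict.strategy=LRU
"
```

With a second tier configured, blocks evicted from the memory tier migrate down to disk instead of being dropped, which is why the BlockNotFoundException disappears.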
>
> Regards,
> Dibyendu
>
> On Thu, May 7, 2015 at 1:46 PM, Dibyendu Bhattacharya <
> dibyendu.bhattach...@gmail.com> wrote:
>
> > Dear All,
> >
> > I have been experimenting with Spark Streaming on Tachyon as the OFF_HEAP
> > block store. The primary reason for evaluating Tachyon is to find out
> > whether it can solve the Spark BlockNotFoundException.
> >
> > With the traditional MEMORY_ONLY StorageLevel, jobs fail with a
> > block-not-found exception when blocks are evicted, and storing blocks
> > with MEMORY_AND_DISK is not a good option either, as it impacts
> > throughput significantly.
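For context, the storage level is chosen when the input stream is created. A minimal sketch of what an OFF_HEAP setup looks like (the socket source, host names, and the `spark.tachyonStore.url` value are illustrative placeholders, not taken from the original mail):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object OffHeapStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("tachyon-offheap-test")
      // Point Spark's off-heap (Tachyon) store at the running Tachyon
      // master; the URL here is a placeholder for your cluster.
      .set("spark.tachyonStore.url", "tachyon://tachyon-master:19998")
    val ssc = new StreamingContext(new SparkContext(conf), Seconds(10))

    // Receiver-based streams accept a StorageLevel; OFF_HEAP stores the
    // received blocks in Tachyon instead of executor heap memory.
    val lines = ssc.socketTextStream("some-host", 9999, StorageLevel.OFF_HEAP)
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```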
> >
> >
> > To test how Tachyon behaves, I took the latest Spark 1.4 from master,
> > used Tachyon 0.6.4, and configured Tachyon in fault-tolerant mode.
> > Tachyon is running on a 3-node AWS x-large cluster, and Spark is running
> > on a separate 3-node AWS x-large cluster.
> >
> > I used the low-level receiver-based Kafka consumer (
> > https://github.com/dibbhatt/kafka-spark-consumer ), which I wrote, to
> > pull from Kafka and write blocks to Tachyon.
> >
> >
> > I found a similar improvement in throughput (as in the MEMORY_ONLY case)
> > but much better overall memory utilization, since it is an off-heap store.
> >
> >
> > But I found one issue on which I need clarification.
> >
> >
> > In the Tachyon case I also see BlockNotFoundException, but for a
> > different reason. From what I can see, TachyonBlockManager.scala puts the
> > blocks with the WriteType.TRY_CACHE configuration. Because of this,
> > blocks are evicted from the Tachyon cache, and when Spark tries to find a
> > block it throws BlockNotFoundException.
> >
> > I see a pull request which discusses the same issue:
> >
> > https://github.com/apache/spark/pull/158#discussion_r11195271
> >
> >
> > When I modified the WriteType to CACHE_THROUGH, the BlockDropException
> > was gone, but it again impacted throughput.
> >
> >
> > Just curious to know: does Tachyon have any setting that can handle
> > block eviction from cache to disk, other than explicitly setting
> > CACHE_THROUGH?
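To illustrate the trade-off being asked about, here is a sketch of writing a file through the Tachyon 0.6-era client with an explicit WriteType. The class and method names (`TachyonFS`, `TachyonURI`, `getOutStream`) are my best recollection of that client API and should be checked against the 0.6.4 javadoc; the path and master URL are placeholders:

```scala
import tachyon.TachyonURI
import tachyon.client.{TachyonFS, WriteType}

object WriteTypeSketch {
  def main(args: Array[String]): Unit = {
    // Connect to the Tachyon master (placeholder address).
    val tfs = TachyonFS.get(new TachyonURI("tachyon://tachyon-master:19998"))

    val path = new TachyonURI("/demo/block-0")
    val fileId = tfs.createFile(path)
    val file = tfs.getFile(fileId)

    // TRY_CACHE writes only to the in-memory tier (fast, but evictable,
    // hence the BlockNotFoundException); CACHE_THROUGH also persists to the
    // under-filesystem, trading throughput for durability.
    val out = file.getOutStream(WriteType.CACHE_THROUGH)
    try out.write("payload".getBytes("UTF-8"))
    finally out.close()
  }
}
```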
> >
> > Regards,
> > Dibyendu
> >
> >
> >
>



-- 
Haoyuan Li
CEO, Tachyon Nexus 
AMPLab, EECS, UC Berkeley http://www.cs.berkeley.edu/~haoyuan/

