You have to carefully evaluate whether your strategy makes sense given your users' 
workloads, so I am not sure your reasoning holds in general.

That said, you can, for example, install OpenStack Swift as an object store and 
use it as your primary storage. HDFS can then serve as a temporary store and/or 
for checkpointing. Alternatively, you can keep the data fully in memory with 
Ignite or Alluxio.
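
As a rough sketch, pointing Hadoop (and therefore Spark) at Swift is a matter of configuring the hadoop-openstack connector in core-site.xml. The service name "myswift" and all credential values below are placeholders, not from your environment:

```xml
<!-- core-site.xml fragment (hadoop-openstack connector).
     "myswift" is an arbitrary service name; credentials are placeholders. -->
<property>
  <name>fs.swift.impl</name>
  <value>org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem</value>
</property>
<property>
  <name>fs.swift.service.myswift.auth.url</name>
  <value>http://keystone.example.com:5000/v2.0/tokens</value>
</property>
<property>
  <name>fs.swift.service.myswift.tenant</name>
  <value>mytenant</value>
</property>
<property>
  <name>fs.swift.service.myswift.username</name>
  <value>myuser</value>
</property>
<property>
  <name>fs.swift.service.myswift.password</name>
  <value>mypassword</value>
</property>
```

With that in place, Spark jobs can read and write paths of the form swift://container.myswift/path, while checkpoints can still go to an hdfs:// path.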

S3 is Amazon's cloud object storage, so it is not on-premise. You can do the 
same as described above, but using S3 instead of Swift.
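
For S3 the equivalent sketch uses the s3a connector; the keys below go in spark-defaults.conf, and the credential values are placeholders:

```properties
# spark-defaults.conf fragment for the s3a connector (hadoop-aws).
# Credential values are placeholders.
spark.hadoop.fs.s3a.impl        org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.access.key  YOUR_ACCESS_KEY
spark.hadoop.fs.s3a.secret.key  YOUR_SECRET_KEY
```

Jobs then address data as s3a://bucket/path, while checkpointing can stay on a small HDFS, e.g. sc.setCheckpointDir("hdfs://namenode:8020/checkpoints").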

> On 12 Feb 2017, at 05:28, Benjamin Kim <bbuil...@gmail.com> wrote:
> 
> Has anyone got some advice on how to remove the reliance on HDFS for storing 
> persistent data. We have an on-premise Spark cluster. It seems like a waste 
> of resources to keep adding nodes because of a lack of storage space only. I 
> would rather add more powerful nodes due to the lack of processing power at a 
> less frequent rate, than add less powerful nodes at a more frequent rate just 
> to handle the ever growing data. Can anyone point me in the right direction? 
> Is Alluxio a good solution? S3? I would like to hear your thoughts.
> 
> Cheers,
> Ben 
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 
