You have to consider carefully whether your strategy makes sense given your users' workloads; as it stands, I am not sure your reasoning does.
However, you can, for example, install OpenStack Swift as an object store and use it for persistent storage. HDFS in this case can serve as a temporary store and/or for checkpointing. Alternatively, you can do this fully in memory with Ignite or Alluxio. S3 is the cloud storage provided by Amazon, so it is not on-premise; you can do the same as described above, but using S3 instead of Swift.

> On 12 Feb 2017, at 05:28, Benjamin Kim <bbuil...@gmail.com> wrote:
>
> Has anyone got some advice on how to remove the reliance on HDFS for storing
> persistent data. We have an on-premise Spark cluster. It seems like a waste
> of resources to keep adding nodes because of a lack of storage space only. I
> would rather add more powerful nodes due to the lack of processing power at a
> less frequent rate, than add less powerful nodes at a more frequent rate just
> to handle the ever growing data. Can anyone point me in the right direction?
> Is Alluxio a good solution? S3? I would like to hear your thoughts.
>
> Cheers,
> Ben
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
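For reference, the Swift/S3 split described above (object store for persistent data, HDFS for checkpointing) could be wired up roughly like this in spark-defaults.conf. This is only a sketch: the Keystone auth URL, the service name `mycluster`, credentials, and bucket/container names are all placeholders, and it assumes the hadoop-openstack (`swift://`) and hadoop-aws (`s3a://`) connector jars are on the classpath:

```properties
# --- Swift via the hadoop-openstack connector (values are placeholders) ---
spark.hadoop.fs.swift.service.mycluster.auth.url   https://keystone.example.com:5000/v2.0/tokens
spark.hadoop.fs.swift.service.mycluster.username   sparkuser
spark.hadoop.fs.swift.service.mycluster.password   CHANGE_ME
spark.hadoop.fs.swift.service.mycluster.tenant     analytics

# --- Or S3-compatible storage via the s3a connector ---
spark.hadoop.fs.s3a.access.key   CHANGE_ME
spark.hadoop.fs.s3a.secret.key   CHANGE_ME
```

Jobs would then read and write persistent data through `swift://container.mycluster/path` (or `s3a://bucket/path`), while checkpoint data stays on the local HDFS, e.g. via `sparkContext.setCheckpointDir("hdfs:///checkpoints")`.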