Has anyone got advice on how to remove the reliance on HDFS for storing persistent data? We have an on-premise Spark cluster, and it seems like a waste of resources to keep adding nodes solely for storage capacity. I would rather add more powerful nodes less frequently, as processing demand grows, than keep adding less powerful nodes just to keep up with the ever-growing data. Can anyone point me in the right direction? Is Alluxio a good solution? S3? I would like to hear your thoughts.
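In case it helps frame the discussion: one common way to decouple storage from compute is an S3-compatible object store (AWS S3 itself, or an on-prem system exposing the S3 API), which Spark can reach through the Hadoop s3a connector so worker nodes no longer have to carry the data locally. A minimal sketch of the relevant settings, assuming hadoop-aws is on the classpath; the endpoint and credentials below are placeholders, not real values:

```
# spark-defaults.conf (sketch): route Spark I/O to an S3-compatible store
spark.hadoop.fs.s3a.endpoint            https://s3.example.internal
spark.hadoop.fs.s3a.access.key          YOUR_ACCESS_KEY
spark.hadoop.fs.s3a.secret.key          YOUR_SECRET_KEY
spark.hadoop.fs.s3a.path.style.access   true
```

Jobs then read and write s3a:// paths directly, e.g. spark.read.parquet("s3a://my-bucket/events/"), and HDFS capacity stops being the reason to add nodes.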
Cheers, Ben