Re: Low cache hit ratio when running Spark on Alluxio

2019-09-19 Thread Bin Fan
Depending on the Alluxio version you are running (e.g., 2.0), the metrics for local short-circuit reads are not turned on by default. So I would suggest you first turn on metrics collection for local short-circuit reads by setting alluxio.user.metrics.collection.enabled=true. Regarding the

Low cache hit ratio when running Spark on Alluxio

2019-08-28 Thread Jerry Yan
Hi, We are running Spark jobs on an Alluxio cluster that serves 13 gigabytes of data, with 99% of the data in memory. I was hoping to speed up the Spark jobs by reading the in-memory data in Alluxio, but found that the Alluxio local hit rate is only 1.68%, while the Alluxio remote hit rate is 98.32%.
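
For reference, the property named in the reply above (alluxio.user.metrics.collection.enabled) is an Alluxio client-side setting, so it has to reach the JVMs that run the Spark driver and executors. One way this might be done is through Spark's Java options, sketched below. This is only a sketch: it assumes a spark-defaults.conf based setup with the Alluxio client jar already on the Spark classpath, and any Java options already set there would need to be merged rather than overwritten.

```properties
# spark-defaults.conf (sketch, not a verified configuration for this cluster)
# Pass the Alluxio client property to both the driver and executor JVMs.
spark.driver.extraJavaOptions   -Dalluxio.user.metrics.collection.enabled=true
spark.executor.extraJavaOptions -Dalluxio.user.metrics.collection.enabled=true
```

The same property could presumably also go in an alluxio-site.properties file visible on the Spark classpath; which approach fits depends on how the deployment distributes client configuration.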