If the data is local to the machine, reading it will be faster than
pulling it over the network and then storing it locally (in memory or
on disk). Have a look at the notes on data locality:
<http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/performance_optimization/data_locality.html>
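
To get a feel for how much the remote read might cost, here is a rough
back-of-envelope sketch. It is only an illustration: the function names
and all throughput figures are hypothetical, and it assumes a single
sustained stream where a remote read is limited by the slower of the
disk and the network link. It ignores parallel reads across executors,
serialization cost, and anything that happens after the initial read.

```python
def transfer_seconds(size_gb: float, throughput_mb_per_s: float) -> float:
    """Seconds needed to move size_gb gigabytes at a sustained MB/s rate."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gb * 1024 / throughput_mb_per_s


def remote_read_overhead(size_gb: float,
                         local_mb_per_s: float,
                         network_mb_per_s: float) -> float:
    """Extra seconds spent when the input must cross the network first.

    Assumption: the remote read runs at the slower of the two rates,
    so a network faster than the local disk adds no overhead here.
    """
    effective = min(local_mb_per_s, network_mb_per_s)
    remote = transfer_seconds(size_gb, effective)
    local = transfer_seconds(size_gb, local_mb_per_s)
    return remote - local


if __name__ == "__main__":
    # Hypothetical figures: 10 GB of input, 200 MB/s disk, 100 MB/s link.
    extra = remote_read_overhead(10, local_mb_per_s=200, network_mb_per_s=100)
    print(f"estimated extra read time: {extra:.1f} s")
```

Plug in numbers measured on your own clusters; the point is only that
the penalty scales with the input size and with how far the link lags
behind local reads.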

Thanks
Best Regards

On Tue, Aug 18, 2015 at 8:09 PM, gen tang <gen.tan...@gmail.com> wrote:

> Hi,
>
> Currently, my data is in an Elasticsearch cluster and I am trying to use
> Spark to analyse it.
> The Elasticsearch cluster and the Spark cluster are two different
> clusters, and I use the Hadoop input format (es-hadoop) to read data from ES.
>
> I am wondering how this environment affects the speed of analysis.
> If I understand correctly, Spark will read the data from the ES cluster and
> do the computation on its own cluster (including writing shuffle results on
> its own machines). Is this right? If so, I think the performance
> will be just a little bit slower than with the data stored on the same cluster.
>
> I would appreciate it if someone could share their experience of using
> Spark with Elasticsearch.
>
> Thanks a lot in advance for your help.
>
> Cheers
> Gen
>
