Hi,

Currently, I have my data in an Elasticsearch cluster and I am trying to use
Spark to analyse it.
The Elasticsearch cluster and the Spark cluster are two separate clusters,
and I use the Hadoop InputFormat provided by es-hadoop to read the data from ES.

I am wondering how this setup affects the speed of the analysis.
If I understand correctly, Spark reads the data from the ES cluster and then
does the computation on its own cluster (including writing shuffle results to
its own machines). Is this right? If so, I would expect the performance to be
only a little slower than if the data were stored on the same cluster.
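A minimal sketch of the setup described above, assuming es-hadoop and Spark are on the classpath and the job is launched with spark-submit. The host name `es-node-1`, the index/type `myindex/mytype` and the per-document aggregation (counting documents by number of fields) are hypothetical placeholders; `es.nodes`, `es.resource` and `es.query` are es-hadoop configuration keys, and `EsInputFormat` is its Hadoop InputFormat. Running it requires a reachable ES cluster.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{MapWritable, Text}
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.hadoop.mr.EsInputFormat

object EsReadSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("es-read-sketch"))

    // Hadoop configuration consumed by es-hadoop's EsInputFormat.
    val conf = new Configuration()
    conf.set("es.nodes", "es-node-1:9200")    // hypothetical host in the separate ES cluster
    conf.set("es.resource", "myindex/mytype") // hypothetical index/type
    conf.set("es.query", "?q=*")              // match all documents

    // Documents are pulled over the network from the ES cluster into the
    // Spark executors; es-hadoop typically derives the input splits from
    // the index's shards.
    val esRDD = sc.newAPIHadoopRDD(
      conf,
      classOf[EsInputFormat[Text, MapWritable]],
      classOf[Text],
      classOf[MapWritable])

    // From here on everything runs on the Spark cluster. The reduceByKey
    // shuffle writes its files to the executors' local disks, not to ES.
    // (Text/MapWritable are mapped to plain Int/Long before the shuffle
    // because Hadoop Writables are not Java-serializable.)
    val docsByFieldCount = esRDD
      .map { case (_, doc) => (doc.size(), 1L) } // hypothetical aggregation key
      .reduceByKey(_ + _)

    docsByFieldCount.collect().foreach(println)
    sc.stop()
  }
}
```

In this shape, the network cost of the separate clusters is paid once, during the initial read; later stages behave like any other Spark job on the Spark cluster.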

I would appreciate it if someone could share their experience of using
Spark with Elasticsearch.

Thanks a lot in advance for your help.

Cheers
Gen
