Hi, Currently, my data is stored in an Elasticsearch cluster, and I am trying to use Spark to analyse it. The Elasticsearch cluster and the Spark cluster are two separate clusters, and I use the Hadoop input format (es-hadoop) to read the data from ES.
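To make the setup concrete, reading from ES through the es-hadoop input format in PySpark might look roughly like the sketch below. The host name and index are placeholders, and the helpers `es_read_conf` and `read_es_rdd` are hypothetical names used only for illustration; the class names and `es.*` settings are the ones documented by es-hadoop.

```python
# Sketch only: assumes PySpark with the es-hadoop jar on the Spark classpath.
# Class names below are the ones es-hadoop documents for its MapReduce input format.
ES_INPUT_FORMAT = "org.elasticsearch.hadoop.mr.EsInputFormat"
KEY_CLASS = "org.apache.hadoop.io.NullWritable"
VALUE_CLASS = "org.elasticsearch.hadoop.mr.LinkedMapWritable"


def es_read_conf(nodes, resource, port=9200):
    """Build the es-hadoop settings for reading one index (hypothetical helper)."""
    return {
        "es.nodes": nodes,        # address of the remote ES cluster
        "es.port": str(port),     # Hadoop configuration values are strings
        "es.resource": resource,  # index (or index/type) to read
    }


def read_es_rdd(sc, conf):
    """Create an RDD of (doc id, document) pairs via the Hadoop input format."""
    return sc.newAPIHadoopRDD(
        inputFormatClass=ES_INPUT_FORMAT,
        keyClass=KEY_CLASS,
        valueClass=VALUE_CLASS,
        conf=conf,
    )


# Usage, with placeholder host and index names:
#   from pyspark import SparkContext
#   sc = SparkContext(appName="es-read")
#   rdd = read_es_rdd(sc, es_read_conf("es-host.example", "myindex"))
#   print(rdd.count())
```

With this arrangement the documents are pulled over the network from the ES nodes into the Spark executors, and everything after the read (including shuffles) runs on the Spark cluster.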
I am wondering how this setup affects the speed of the analysis. If I understand correctly, Spark reads the data from the ES cluster and does the computation on its own cluster (including writing shuffle results to its own machines). Is this right? If so, I would expect performance to be only slightly slower than if the data were stored on the same cluster. I would appreciate it if anyone could share their experience of using Spark with Elasticsearch. Thanks a lot in advance for your help. Cheers, Gen