Re: Spark works with the data in another cluster(Elasticsearch)

Akhil Das Tue, 25 Aug 2015 03:02:26 -0700

If the data is local to the machine, processing it will be faster than
pulling it over the network and then storing it locally (in memory or on
disk). Have a look at the notes on data locality:
<http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/performance_optimization/data_locality.html>
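
For the setup in the question (Spark reading from a separate Elasticsearch
cluster through es-hadoop), a minimal PySpark sketch could look like the
following. The host names, the index/type and the query are hypothetical
placeholders, and the helper names es_read_conf, read_from_es and
run_example are made up for illustration. The idea is to let ES filter
with es.query so fewer documents cross the network, and to cache the
result so later passes do not go back to the remote cluster.

```python
# Sketch only: hosts, index/type and query below are hypothetical placeholders.

def es_read_conf(nodes, resource, query=None, port=9200):
    """Build the es-hadoop settings dict that is passed to newAPIHadoopRDD."""
    conf = {
        "es.nodes": nodes,        # comma-separated hosts of the remote ES cluster
        "es.port": str(port),     # Hadoop configuration values must be strings
        "es.resource": resource,  # "index/type" to read from
    }
    if query is not None:
        # Filter on the ES side so that less data is pulled over the network.
        conf["es.query"] = query
    return conf


def read_from_es(sc, conf):
    """Return an RDD of (key, document) pairs read through es-hadoop."""
    return sc.newAPIHadoopRDD(
        inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=conf,
    )


def run_example():
    """Hypothetical driver; needs Spark and the es-hadoop jar to actually run."""
    from pyspark import SparkContext

    sc = SparkContext(appName="es-read-sketch")
    conf = es_read_conf("es-host-1,es-host-2", "logs/event",
                        query="?q=status:error")
    rdd = read_from_es(sc, conf).cache()  # keep the pulled data on the Spark side
    print(rdd.count())
    sc.stop()
```

To try it, call run_example() from a script started with spark-submit,
with the es-hadoop jar supplied through --jars. Shuffle data is still
written on the Spark workers; only the initial read goes to the ES cluster.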


Thanks
Best Regards

On Tue, Aug 18, 2015 at 8:09 PM, gen tang <gen.tan...@gmail.com> wrote:

> Hi,
>
> Currently, I have my data in an Elasticsearch cluster and I am trying to
> use Spark to analyse it.
> The Elasticsearch cluster and the Spark cluster are two different
> clusters, and I use the Hadoop input format (es-hadoop) to read data from ES.
>
> I am wondering how this setup affects the speed of analysis.
> If I understand correctly, Spark will read the data from the ES cluster and
> do the computation on its own cluster (including writing shuffle results to
> its own machines). Is this right? If so, I think the performance
> will be just a little slower than with the data stored on the same cluster.
>
> I would appreciate it if someone could share their experience of using
> Spark with Elasticsearch.
>
> Thanks a lot in advance for your help.
>
> Cheers
> Gen
>
