If you're using Kubernetes, you can group Spark and HDFS to run in the
same stack, meaning they'll run in the same network space and share
IPs. Just make sure there are no port conflicts.
On Wed, Dec 28, 2016 at 5:07 AM, Karamba wrote:
Good idea, thanks!
But unfortunately that's not possible. All containers are connected to
an overlay network.
Is there any other possibility to tell Spark that it is on the same *NODE*
as an HDFS data node?
On 28.12.2016 12:00, Miguel Morales wrote:
It might have to do with your container IPs; it depends on your
networking setup. You might want to try host networking so that the
containers share the IP with the host.
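
To illustrate why the container IPs matter, here is a minimal sketch in Python. It is not Spark's actual code. It assumes a simplified model in which a task counts as node-local only when the host string the executor registered with equals one of the host strings reported for the block's replicas. The function name and all addresses are hypothetical.

```python
def locality_level(preferred_hosts, executor_host):
    """Simplified model: node-local only if the executor's host string
    equals one of the hosts reported as holding a replica of the block."""
    return "NODE_LOCAL" if executor_host in set(preferred_hosts) else "ANY"


# Hypothetical overlay-network case: the data node container and the
# worker container sit on the same machine but have different IPs.
print(locality_level(["10.0.0.5"], "10.0.0.6"))          # ANY

# Hypothetical host-networking case: both containers report the
# address of the machine they run on.
print(locality_level(["192.168.1.10"], "192.168.1.10"))  # NODE_LOCAL
```

Under this assumption, two containers on the same machine only look co-located once they report the same address, which is what host networking would give you.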
On Wed, Dec 28, 2016 at 1:46 AM, Karamba wrote:
Hi Sun Rui,
thanks for answering!
> Although the Spark task scheduler is aware of rack-level data locality, it
> seems that only YARN implements the support for it.
This explains why the script that I configured in core-site.xml
(topology.script.file.name) is not called by the spark
Although the Spark task scheduler is aware of rack-level data locality, it
seems that only YARN implements the support for it. However, node-level
locality can still work for Standalone.
It is not necessary to copy the hadoop config files into the Spark CONF
directory. Set HADOOP_CONF_DIR to
Hi,
I am running a couple of Docker hosts, each with an HDFS node and a Spark
worker in a Spark standalone cluster.
In order to get data locality awareness, I would like to configure racks
for each host, so that a Spark worker container knows from which HDFS
node container it should load its data.
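
For reference, a topology script of the kind meant here could look roughly like the sketch below. Hadoop calls the script named in topology.script.file.name with one or more host names or IPs as arguments and expects one rack path per argument on stdout. The addresses, the rack names, and the idea of using one rack per Docker host are assumptions for illustration only, not an actual setup.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop topology script: one rack per Docker host."""
import sys

# Hypothetical mapping from container address to a rack named after the
# Docker host the container runs on.
RACK_BY_ADDRESS = {
    "10.0.0.5": "/docker-host-1",
    "10.0.0.6": "/docker-host-1",
    "10.0.1.5": "/docker-host-2",
}
DEFAULT_RACK = "/default-rack"


def resolve(addresses):
    """Return one rack path per address, in the same order."""
    return [RACK_BY_ADDRESS.get(addr, DEFAULT_RACK) for addr in addresses]


if __name__ == "__main__":
    # Hadoop passes the addresses as command-line arguments.
    print(" ".join(resolve(sys.argv[1:])))
```

This only covers the HDFS side. As the replies in this thread note, rack-level locality in Spark seems to be supported only on YARN, so a standalone cluster may never consult such a script.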