Hi Mich,

Posting my comments inline below.

Right, you seem to have an on-premise Hadoop cluster of 9 physical boxes
and you want to deploy spark on it.
*My comment: Yes.*

What spec do you have for each physical host memory and CPU and disk space?
*My comment: I am not sure of the exact numbers, but I can say there is
enough capacity to deploy a few more tools across the 9 nodes.*

You can take advantage of what is known as data affinity by putting your
compute layer (Spark) on the same Hadoop nodes.
*My comment: I was not aware of this; I need to read up on it.*

Having Hadoop implies that you have a YARN resource manager already plus
HDFS. YARN is the most widely used resource manager on-premise for Spark.
*My comment: we have not installed Spark on our cluster yet.*
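
For reference, once Spark is installed, a cluster-mode submission against
YARN could look roughly like the sketch below. This is a minimal, untested
example: the application path and the resource numbers are placeholders to
adapt, and it assumes HADOOP_CONF_DIR already points at the cluster's
configuration.

    # Assumes HADOOP_CONF_DIR (or YARN_CONF_DIR) points at the cluster config.
    # Running the NodeManagers on the same hosts as the HDFS DataNodes gives
    # the data affinity mentioned above: executors start next to the data.
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 4 \
      --executor-memory 4g \
      --executor-cores 2 \
      /path/to/my_job.py    # placeholder application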

Additional information:
=======================
Agenda:
1. Implementation of Apache Mesos or Apache Hadoop YARN, including the
Spark service in cluster mode, so that submitting PySpark or Spark Scala
jobs with "--deploy-mode cluster" works (along the lines of the
spark-submit sketch above).
2. The implementation should run in Docker containers.
3. I have to write Dockerfiles for Apache Hadoop with YARN and Spark
(open source only). How can I do this? (A rough Dockerfile sketch follows
this list.)
4. To implement Apache Mesos with Spark and deploy mode = cluster: if you
have any documentation, web links, or knowledge of your own, could you
share it with me? It would really help me a lot. (A Mesos sketch also
follows this list.)
5. We already run services such as MinIO, Trino, Superset, Jupyter, and
so on.
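
On point 3, there is no official all-in-one Hadoop+YARN+Spark image, so
people generally build their own. A rough sketch of what such a Dockerfile
could look like is below; the base image, versions, download URLs and the
conf/ directory are illustrative assumptions, not a tested build:

    # Sketch only: versions, URLs and paths are assumptions to adapt.
    FROM openjdk:8-jdk
    ENV HADOOP_VERSION=3.3.1
    ENV SPARK_VERSION=3.1.2
    RUN apt-get update && apt-get install -y curl && \
        curl -fsSL https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz \
            | tar -xz -C /opt && \
        curl -fsSL https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop3.2.tgz \
            | tar -xz -C /opt
    ENV HADOOP_HOME=/opt/hadoop-${HADOOP_VERSION}
    ENV SPARK_HOME=/opt/spark-${SPARK_VERSION}-bin-hadoop3.2
    ENV PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin
    # Per-role site files (core-site.xml, hdfs-site.xml, yarn-site.xml) are
    # copied in here; the same image can then run as namenode, datanode,
    # resourcemanager or nodemanager depending on the command it is given.
    COPY conf/ ${HADOOP_HOME}/etc/hadoop/
    CMD ["bash"]

One image started with a different command per role (e.g. "hdfs namenode"
or "yarn nodemanager") is a common layout, with docker-compose or similar
wiring the containers together.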
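
On point 4, cluster mode on Mesos goes through the MesosClusterDispatcher
that ships with Spark: you start the dispatcher once, then point
spark-submit at the dispatcher rather than at the Mesos master. In the
sketch below the host names are placeholders, and the application file has
to sit at a URI the Mesos agents can reach (hence the hdfs:// path). Note
also that Mesos support is deprecated in newer Spark releases (3.2
onwards), so YARN may be the safer long-term choice.

    # Start the dispatcher once, pointing it at the Mesos master
    # (placeholder host names; zk:// URLs also work for HA masters).
    $SPARK_HOME/sbin/start-mesos-dispatcher.sh \
        --master mesos://mesos-master-host:5050

    # Submit against the dispatcher (default port 7077), not the master;
    # the job file must be reachable by the Mesos agents.
    spark-submit \
        --master mesos://dispatcher-host:7077 \
        --deploy-mode cluster \
        --executor-memory 4g \
        hdfs:///jobs/my_job.py

The official "Running Spark on Mesos" page on spark.apache.org covers the
dispatcher and the required Mesos-side settings in more detail.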

Kindly help me to accomplish this, and let me know what else you need.

Thanks,
Dinakar



On Sun, Jul 25, 2021 at 10:35 PM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Hi,
>
> Right you seem to have an on-premise hadoop cluster of 9 physical boxes
> and you want to deploy spark on it.
>
> What spec do you have for each physical host memory and CPU and disk space?
>
> You can take what is known as data affinity by putting your compute layers
> (spark) on the same hadoop nodes.
>
> Having hadoop implies that you have a YARN resource manager already  plus
> HDFS. YARN is the most widely used resource manager on-premise for Spark.
>
> Provide some additional info and we go from there.
>
> HTH
>
>
>
>
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Sat, 24 Jul 2021 at 13:46, Dinakar Chennubotla <
> chennu.bigd...@gmail.com> wrote:
>
>> Hi All,
>>
>> I am Dinakar, Hadoop admin,
>> could someone help me here,
>>
>> 1. I have a DEV-POC task to do,
>> 2. Need to Installing Distributed apache-spark cluster with Cluster mode
>> on Docker containers.
>> 3. with Scalable spark-worker containers.
>> 4. we have a 9 node cluster with some other services or tools.
>>
>> Thanks,
>> Dinakar
>>
>
