Hi Mich,

Please find my comments inline below.

Right, you seem to have an on-premise Hadoop cluster of 9 physical boxes
and you want to deploy Spark on it.
*My comment: Yes.*

What spec do you have for each physical host memory and CPU and disk space?
*My comment: I am not sure of the exact numbers, but there is enough space
to deploy a few more tools across the 9 nodes.*

You can take advantage of what is known as data affinity by putting your
compute layer (Spark) on the same Hadoop nodes.
*My comment: I am not aware of this; I will need to look into it.*

Having Hadoop implies that you have a YARN resource manager already plus
HDFS. YARN is the most widely used resource manager on-premise for Spark.
*My comment: We haven't installed Spark on our cluster.*

Additional information:
1. I need to implement Apache Mesos or Apache Hadoop YARN, with the Spark
service running in cluster mode, so that PySpark or Spark Scala jobs
submitted with "--deploy-mode cluster" work.
2. This implementation should run in Docker containers.
3. I have to write Dockerfiles for Apache Hadoop with YARN and Spark
(open source only). How can I do this?
4. For implementing Apache Mesos with Spark in cluster deploy mode, if you
have any documentation, web links, or knowledge of your own, could you
share it with me? It would really help me a lot.
5. We have services like MinIO, Trino, Superset, Jupyter, and so on.
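
To make point 1 concrete, here is a minimal Python sketch of the kind of
submission I would expect to work once the cluster is up. It only builds the
spark-submit command line and runs nothing. The helper name
build_spark_submit, the script my_job.py, the executor count, and the host
dispatcher-host are placeholders I made up for illustration. As far as I
understand, cluster mode on Mesos goes through a MesosClusterDispatcher, so
the Mesos master URL below assumes one is listening on port 7077.

```python
import shlex
from typing import Dict, List, Optional


def build_spark_submit(
    app: str,
    master: str = "yarn",
    deploy_mode: str = "cluster",
    conf: Optional[Dict[str, str]] = None,
    app_args: Optional[List[str]] = None,
) -> List[str]:
    """Build a spark-submit argument list. Nothing is executed here."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode}")
    cmd = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    # Sort the settings so the generated command is reproducible.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    cmd += app_args or []
    return cmd


# Hypothetical PySpark job submitted to YARN in cluster mode.
yarn_cmd = build_spark_submit(
    "my_job.py", conf={"spark.executor.instances": "4"}
)
print(shlex.join(yarn_cmd))

# Same job against an assumed Mesos cluster dispatcher (placeholder host).
mesos_cmd = build_spark_submit(
    "my_job.py", master="mesos://dispatcher-host:7077"
)
print(shlex.join(mesos_cmd))
```

The only difference between the two targets in this sketch is the --master
value, which is why I would like the same Docker images to serve either
resource manager if that is feasible.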

Kindly help me accomplish this, and let me know what else you need.


On Sun, Jul 25, 2021 at 10:35 PM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Hi,
> Right you seem to have an on-premise hadoop cluster of 9 physical boxes
> and you want to deploy spark on it.
> What spec do you have for each physical host memory and CPU and disk space?
> You can take what is known as data affinity by putting your compute layers
> (spark) on the same hadoop nodes.
> Having hadoop implies that you have a YARN resource manager already  plus
> HDFS. YARN is the most widely used resource manager on-premise for Spark.
> Provide some additional info and we go from there.
> On Sat, 24 Jul 2021 at 13:46, Dinakar Chennubotla <
> chennu.bigd...@gmail.com> wrote:
>> Hi All,
>> I am Dinakar, Hadoop admin,
>> could someone help me here,
>> 1. I have a DEV-POC task to do,
>> 2. Need to install a distributed Apache Spark cluster, in cluster mode,
>> on Docker containers.
>> 3. with Scalable spark-worker containers.
>> 4. we have a 9 node cluster with some other services or tools.
>> Thanks,
>> Dinakar
