Re: Installing Distributed apache spark cluster with Cluster mode on Docker

2021-07-27 Thread Dinakar Chennubotla
alability but you need to ask whoever is
> requesting this to justify having spark inside docker as opposed to Spark
> running alongside Hadoop on premise.
> 4. To implement Apache Mesos with spark and deployment mode = cluster,
> if have any kind of documentation or weblinks or your knowledge, could you
> give that to me.
> really I will help me a lot.
> 5. We have services like minio, trino, superset, jupyter, and so on.
>
> OK, we cross the bridge for 4 and 5 when there is justification for 2. The
> alternative is for Spark to be installed on each, or a few designated, nodes
> of Hadoop on your physical hosts on premise.
>
>
> Kindly help me, to accomplish this.
> let me know, what else you need.
>
> Thanks,
> Dinakar
>
>
>
> On Sun, Jul 25, 2021 at 10:35 PM Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Hi,
>>
>> Right you seem to have an on-premise hadoop cluster of 9 physical boxes
>> and you want to deploy spark on it.
>>
>> What spec do you have for each physical host memory and CPU and disk
>> space?
>>
>> You can take what is known as data affinity by putting your compute
>> layers (spark) on the same hadoop nodes.
>>
>> Having hadoop implies that you have a YARN resource manager already  plus
>> HDFS. YARN is the most widely used resource manager on-premise for Spark.
>>
>> Provide some additional info and we go from there. .
>>
>> HTH
>>
>>
>>
>>
>>
>>view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Sat, 24 Jul 2021 at 13:46, Dinakar Chennubotla <
>> chennu.bigd...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I am Dinakar, Hadoop admin,
>>> could someone help me here,
>>>
>>> 1. I have a DEV-POC task to do,
>>> 2. Need to Installing Distributed apache-spark cluster with Cluster mode
>>> on Docker containers.
>>> 3. with Scalable spark-worker containers.
>>> 4. we have a 9 node cluster with some other services or tools.
>>>
>>> Thanks,
>>> Dinakar
>>>
>>


Fwd: Installing Distributed apache spark cluster with Cluster mode on Docker

2021-07-25 Thread Mich Talebzadeh
On Sun, Jul 25, 2021 at 10:35 PM Mich Talebzadeh wrote:

> Hi,
>
> Right you seem to have an on-premise hadoop cluster of 9 physical boxes
> and you want to deploy spark on it.
>
> What spec do you have for each physical host memory and CPU and disk space?
>
> You can take what is known as data affinity by putting your compute layers
> (spark) on the same hadoop nodes.
>
> Having hadoop implies that you have a YARN resource manager already  plus
> HDFS. YARN is the most widely used resource manager on-premise for Spark.
>
> Provide some additional info and we go from there. .
>
> HTH
>
>
>
>
>
>view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Sat, 24 Jul 2021 at 13:46, Dinakar Chennubotla <
> chennu.bigd...@gmail.com> wrote:
>
>> Hi All,
>>
>> I am Dinakar, Hadoop admin,
>> could someone help me here,
>>
>> 1. I have a DEV-POC task to do,
>> 2. Need to Installing Distributed apache-spark cluster with Cluster mode
>> on Docker containers.
>> 3. with Scalable spark-worker containers.
>> 4. we have a 9 node cluster with some other services or tools.
>>
>> Thanks,
>> Dinakar
>>
>


Re: Installing Distributed apache spark cluster with Cluster mode on Docker

2021-07-25 Thread Dinakar Chennubotla
Hi Mich,

Posting my comments below:

Right, you seem to have an on-premise Hadoop cluster of 9 physical boxes
and you want to deploy Spark on it.
*My comment: Yes.*

What spec do you have for each physical host: memory, CPU and disk space?
*My comment: I am not sure of the exact numbers, but all I can say is that
there is enough space to deploy a few more tools across the 9 nodes.*

You can take advantage of what is known as data affinity by putting your
compute layer (Spark) on the same Hadoop nodes.
*My comment: I am not aware of this; I need to check along these lines.*

Having Hadoop implies that you already have a YARN resource manager plus
HDFS. YARN is the most widely used resource manager on-premise for Spark.
*My comment: we haven't installed Spark in our cluster yet.*

Additional information:
==
Agenda:
1. Implementation of Apache Mesos or Apache Hadoop YARN, including the Spark
service with cluster mode, so that if I submit PySpark or Spark Scala jobs
with "deployment-mode = cluster", they work.
2. This implementation should be in Docker containers.
3. I have to write Dockerfiles with Apache Hadoop with YARN and Spark (open
source only). How can I do this?
4. To implement Apache Mesos with Spark and deploy mode = cluster: if you have
any kind of documentation, weblinks or knowledge, could you share it with me?
Really, it will help me a lot. (See the sketch after this list.)
5. We have services like MinIO, Trino, Superset, Jupyter, and so on.

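For point 4, a minimal sketch based on the Spark-on-Mesos documentation
(hostnames, ports and the jar URL are placeholders; a running Mesos master is
assumed):

# Start the MesosClusterDispatcher; cluster deploy mode on Mesos goes
# through it, and it listens on port 7077 by default.
./sbin/start-mesos-dispatcher.sh --master mesos://mesos-master-host:5050

# Submit a packaged jar in cluster mode against the dispatcher; the jar
# URL must be reachable from the Mesos agents (e.g. http:// or hdfs://).
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://dispatcher-host:7077 \
  --deploy-mode cluster \
  --supervise \
  http://path.to.the.application.jar \
  1000
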
Kindly help me to accomplish this.
Let me know what else you need.

Thanks,
Dinakar



On Sun, Jul 25, 2021 at 10:35 PM Mich Talebzadeh 
wrote:

> Hi,
>
> Right you seem to have an on-premise hadoop cluster of 9 physical boxes
> and you want to deploy spark on it.
>
> What spec do you have for each physical host memory and CPU and disk space?
>
> You can take what is known as data affinity by putting your compute layers
> (spark) on the same hadoop nodes.
>
> Having hadoop implies that you have a YARN resource manager already  plus
> HDFS. YARN is the most widely used resource manager on-premise for Spark.
>
> Provide some additional info and we go from there. .
>
> HTH
>
>
>
>
>
>view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Sat, 24 Jul 2021 at 13:46, Dinakar Chennubotla <
> chennu.bigd...@gmail.com> wrote:
>
>> Hi All,
>>
>> I am Dinakar, Hadoop admin,
>> could someone help me here,
>>
>> 1. I have a DEV-POC task to do,
>> 2. Need to Installing Distributed apache-spark cluster with Cluster mode
>> on Docker containers.
>> 3. with Scalable spark-worker containers.
>> 4. we have a 9 node cluster with some other services or tools.
>>
>> Thanks,
>> Dinakar
>>
>


Re: Installing Distributed apache spark cluster with Cluster mode on Docker

2021-07-25 Thread Mich Talebzadeh
Hi,

Right, you seem to have an on-premise Hadoop cluster of 9 physical boxes and
you want to deploy Spark on it.

What spec do you have for each physical host: memory, CPU and disk space?

You can take advantage of what is known as data affinity by putting your
compute layer (Spark) on the same Hadoop nodes.

Having Hadoop implies that you already have a YARN resource manager plus
HDFS. YARN is the most widely used resource manager on-premise for Spark.

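For reference, a minimal sketch of what a cluster-mode submission on YARN
looks like (the Hadoop config path and executor sizes are placeholders):

# Spark finds the YARN resource manager and HDFS through the Hadoop
# client configs pointed to here (placeholder path).
export HADOOP_CONF_DIR=/etc/hadoop/conf

# In cluster mode the driver itself runs inside a YARN container.
./bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 4g \
  examples/src/main/python/pi.py \
  1000
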
Provide some additional info and we will go from there.

HTH





   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sat, 24 Jul 2021 at 13:46, Dinakar Chennubotla 
wrote:

> Hi All,
>
> I am Dinakar, Hadoop admin,
> could someone help me here,
>
> 1. I have a DEV-POC task to do,
> 2. Need to Installing Distributed apache-spark cluster with Cluster mode
> on Docker containers.
> 3. with Scalable spark-worker containers.
> 4. we have a 9 node cluster with some other services or tools.
>
> Thanks,
> Dinakar
>


Re: Installing Distributed apache spark cluster with Cluster mode on Docker

2021-07-25 Thread Khalid Mammadov
Hi Dinakar

If your aim is to run Spark in “distributed mode”, then all of these cluster
managers (excluding local) run the cluster in distributed mode anyway.
As I said before, “deploy-mode = cluster” only affects the driver application;
the executors run on the worker nodes in parallel (distributed) either way.
This is how Spark works. So you only choose where to run the “driver”
application, which defines what to do and waits for the application to finish,
while the actual/most work is done on the worker nodes.
With that in mind, you can start (submit) your Python code locally, target a
cluster started in standalone mode (using Docker, for example), and get your
distributed execution.

# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000


Take a look at the link below and the snippet from there:


https://spark.apache.org/docs/latest/submitting-applications.html



A common deployment strategy is to submit your application from a gateway 
machine that is physically co-located with your worker machines (e.g. Master 
node in a standalone EC2 cluster). In this setup, client mode is appropriate. 
In client mode, the driver is launched directly within the spark-submit process 
which acts as a client to the cluster. The input and output of the application 
is attached to the console. Thus, this mode is especially suitable for 
applications that involve the REPL (e.g. Spark shell).

Alternatively, if your application is submitted from a machine far from the 
worker machines (e.g. locally on your laptop), it is common to use cluster mode 
to minimize network latency between the drivers and the executors. Currently, 
the standalone mode does not support cluster mode for Python applications.

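The limitation above is specific to Python; for a packaged jar, cluster mode
against a standalone master does work. A minimal sketch (the master address
reuses the docs example, and the jar location is a placeholder that must be
reachable from every worker, e.g. HDFS or a shared path):

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  hdfs://namenode:8020/jars/spark-examples.jar \
  1000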

Regarding Mesos and YARN, I can’t comment on those as I don’t have experience 
with those modes. But I found this, which may be relevant for you: 
https://stackoverflow.com/questions/36461054/i-cant-seem-to-get-py-files-on-spark-to-work

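That question is about shipping Python dependencies with --py-files; as a
minimal sketch against the standalone master from the snippet above (file
names are placeholders):

# Extra .py/.zip/.egg files listed here are distributed to the executors.
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  --py-files deps.zip,helper.py \
  main.py
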
Another suggestion is to keep CCing the Spark user group email, so if I can’t 
answer then someone else may. I am CCing it now and you can reply-all.

Hope this all helps.

Regards
Khalid

Sent from my iPad

> On 25 Jul 2021, at 10:50, Dinakar Chennubotla  
> wrote:
> 
> Hi Khalid Mammadov,
> 
> I am now reworking from scratch, i.e. on how to build a distributed 
> Apache Spark cluster using YARN or Apache Mesos.
> 
> Sending you my initial sketch, a pictorial representation of the same.
> 
> Could you help me with the below:
> ==
> As per the Diagram,
> 1. I have to write Dockerfiles with Apache Hadoop with YARN and Spark 
> (open source only).
> How can I do this?
> Your comments:
> 
> 2. To implement Apache Mesos with Spark and deploy mode = cluster:
> if you have any kind of documentation or weblinks or knowledge, could you 
> share it with me?
> Really, it will help me a lot.
> Your comments:
> 
> Thanks,
> dinakar
> 
> Thanks,
> Dinakar
> 
> On Sun, Jul 25, 2021 at 12:56 PM Khalid Mammadov  
> wrote:
>> Sorry Dinakar, unfortunately I don't have much availability, but you can drop 
>> me your questions and I would be happy to help as much as I can.
>> 
>> On Sun, 25 Jul 2021, 04:17 Dinakar Chennubotla,  
>> wrote:
>>> Agenda:
>>> 1. How to implement Apache Mesos or Apache Hadoop YARN, including the 
>>> Spark service with cluster mode.
>>> 2. Exploration of dockerizing the above tools.
>>> 
>>> 
>>> Thanks,
>>> Dinakar
>>> 
>>> On Sun, 25 Jul, 2021, 08:43 Dinakar Chennubotla,  
>>> wrote:
>>>> Hi Khalid Mammadov,
>>>> 
>>>> With all the mail discussion we have had till now, you have a brief 
>>>> understanding of my issue.
>>>> 
>>>> I would like to request that we plan a Zoom meeting; we can complete this 
>>>> in no more than one or two sessions.
>>>> 
>>>> Kindly, let me know your availability and comments.
>>>> 
>>>> If not, we will continue our mail discussion.
>>>> 
>>>> Thanks,
>>>> Dinakar
>>>> 
>>>> On Sun, 25 Jul, 2021, 01:12 Khalid Mammadov,  
>>>> wrote:
>>>>> Had another look at your screenshot. It's also about Python: as this is a 
>>>>> wrapper for Java and the cluster runs on Java (JVM), it can't run the Python 
>>>>> driver inside. That means you can only run .jar files in cluster mode.
>>>>> 
>>>>> Hope all this makes sense.
>>>>> 
>>>>> On Sat, 24 Jul 2021, 19:58 Khalid Mammadov,  
>>>>> wrote:
>>>>>> From that link:

Re: Installing Distributed apache spark cluster with Cluster mode on Docker

2021-07-24 Thread Khalid Mammadov
Can you share your Dockerfile (not all of it, just the gist) and the instructions for
how you do it and what you actually run to get that message?

I have just pushed my local repo to Github where I have created an example
of Spark on Docker some time ago.
Please take a look and compare what you are doing.

https://github.com/khalidmammadov/spark_docker


On Sat, Jul 24, 2021 at 4:07 PM Dinakar Chennubotla <
chennu.bigd...@gmail.com> wrote:

> Hi Khalid Mammadov,
>
> I tried the one which describes a distributed-mode Spark installation. But when I
> run the below command it says "deployment mode = cluster is not allowed in
> standalone cluster".
>
> Source Url I used is:
>
> https://towardsdatascience.com/diy-apache-spark-docker-bb4f11c10d24?gi=fa52ac767c0b
>
> Kindly refer to this section in the URL I mentioned:
> "Docker & Spark — Multiple Machines"
>
> I removed the third-party things and dockerized it my way.
>
> Thanks,
> Dinakar
>
> On Sat, 24 Jul, 2021, 20:28 Khalid Mammadov, 
> wrote:
>
>> Standalone mode already implies you are running on cluster (distributed)
>> mode. i.e. it's one of 4 available cluster manager options. The difference
>> is Standalone uses it's one resource manager rather than using YARN for
>> example.
>> If you are running docker on a single machine then you are limited to
>> that but if you run your docker on a cluster and deploy your Spark
>> containers on it then you will get your distribution and cluster mode.
>> And also If you are referring to scalability then you need to register
>> worker nodes when you need to scale.
>> You do it by registering a VM/container as a worker node as per doc using:
>>
>> ./sbin/start-worker.sh <master-spark-URL>
>>
>> You can create a new docker container with your base image and run the above 
>> command on the bootstrap and that would register a worker node and scale 
>> your cluster when you want.
>>
>> And if you kill them then you would scale down ( I think this is how 
>> Databricks autoscaling works..). I am not sure k8s TBH, perhaps it's handled 
>> this more gracefully
>>
>>
>> On Sat, Jul 24, 2021 at 3:38 PM Dinakar Chennubotla <
>> chennu.bigd...@gmail.com> wrote:
>>
>>> Hi Khalid Mammadov,
>>>
>>> Thank you for your response,
>>> Yes, I did, I built standalone apache spark cluster on docker containers.
>>>
>>> But I am looking for distributed spark cluster,
>>> Where spark workers are scalable and spark "deployment mode  = cluster".
>>>
>>> Source url I used to built standalone apache spark cluster
>>> https://www.kdnuggets.com/2020/07/apache-spark-cluster-docker.html
>>>
>>> If you have documentation on distributed spark, which I am looking for,
>>> could you please send me.
>>>
>>>
>>> Thanks,
>>> Dinakar
>>>
>>> On Sat, 24 Jul, 2021, 19:32 Khalid Mammadov, 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Have you checked out docs?
>>>> https://spark.apache.org/docs/latest/spark-standalone.html
>>>>
>>>> Thanks,
>>>> Khalid
>>>>
>>>> On Sat, Jul 24, 2021 at 1:45 PM Dinakar Chennubotla <
>>>> chennu.bigd...@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I am Dinakar, Hadoop admin,
>>>>> could someone help me here,
>>>>>
>>>>> 1. I have a DEV-POC task to do,
>>>>> 2. Need to Installing Distributed apache-spark cluster with Cluster
>>>>> mode on Docker containers.
>>>>> 3. with Scalable spark-worker containers.
>>>>> 4. we have a 9 node cluster with some other services or tools.
>>>>>
>>>>> Thanks,
>>>>> Dinakar
>>>>>
>>>>


Re: Installing Distributed apache spark cluster with Cluster mode on Docker

2021-07-24 Thread Khalid Mammadov
Standalone mode already implies you are running in cluster (distributed)
mode, i.e. it is one of the 4 available cluster manager options. The difference
is that Standalone uses its own resource manager rather than YARN, for
example.
If you are running Docker on a single machine then you are limited to that,
but if you run Docker on a cluster and deploy your Spark containers on it,
then you will get your distribution and cluster mode.
Also, if you are referring to scalability, you register worker nodes when you
need to scale. You do that by registering a VM/container as a worker node, as
per the docs, using:

./sbin/start-worker.sh <master-spark-URL>

You can create a new Docker container from your base image and run the
above command at bootstrap; that would register a worker node and scale
your cluster when you want.

And if you kill them then you scale down (I think this is how Databricks
autoscaling works). I am not sure about k8s TBH; perhaps it handles this
more gracefully.

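As a hypothetical sketch of that idea, assuming an image named my-spark:3.1
with Spark under /opt/spark, a Docker network spark-net, and a master
container reachable as spark-master (all of these names are placeholders):

# Start one more worker container; running the Worker class in the
# foreground keeps the container alive and registers it with the master.
docker run -d --name spark-worker-2 --network spark-net my-spark:3.1 \
  /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker \
  spark://spark-master:7077

# Killing the container scales the cluster back down.
docker stop spark-worker-2 && docker rm spark-worker-2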

On Sat, Jul 24, 2021 at 3:38 PM Dinakar Chennubotla <
chennu.bigd...@gmail.com> wrote:

> Hi Khalid Mammadov,
>
> Thank you for your response,
> Yes, I did, I built standalone apache spark cluster on docker containers.
>
> But I am looking for distributed spark cluster,
> Where spark workers are scalable and spark "deployment mode  = cluster".
>
> Source url I used to built standalone apache spark cluster
> https://www.kdnuggets.com/2020/07/apache-spark-cluster-docker.html
>
> If you have documentation on distributed spark, which I am looking for,
> could you please send me.
>
>
> Thanks,
> Dinakar
>
> On Sat, 24 Jul, 2021, 19:32 Khalid Mammadov, 
> wrote:
>
>> Hi,
>>
>> Have you checked out docs?
>> https://spark.apache.org/docs/latest/spark-standalone.html
>>
>> Thanks,
>> Khalid
>>
>> On Sat, Jul 24, 2021 at 1:45 PM Dinakar Chennubotla <
>> chennu.bigd...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I am Dinakar, Hadoop admin,
>>> could someone help me here,
>>>
>>> 1. I have a DEV-POC task to do,
>>> 2. Need to Installing Distributed apache-spark cluster with Cluster mode
>>> on Docker containers.
>>> 3. with Scalable spark-worker containers.
>>> 4. we have a 9 node cluster with some other services or tools.
>>>
>>> Thanks,
>>> Dinakar
>>>
>>


Re: Installing Distributed apache spark cluster with Cluster mode on Docker

2021-07-24 Thread Dinakar Chennubotla
Hi Khalid Mammadov,

Thank you for your response.
Yes, I did; I built a standalone Apache Spark cluster on Docker containers.

But I am looking for a distributed Spark cluster,
where Spark workers are scalable and Spark "deployment mode = cluster" works.

The source URL I used to build the standalone Apache Spark cluster:
https://www.kdnuggets.com/2020/07/apache-spark-cluster-docker.html

If you have documentation on the distributed Spark setup I am looking for,
could you please send it to me?


Thanks,
Dinakar

On Sat, 24 Jul, 2021, 19:32 Khalid Mammadov, 
wrote:

> Hi,
>
> Have you checked out docs?
> https://spark.apache.org/docs/latest/spark-standalone.html
>
> Thanks,
> Khalid
>
> On Sat, Jul 24, 2021 at 1:45 PM Dinakar Chennubotla <
> chennu.bigd...@gmail.com> wrote:
>
>> Hi All,
>>
>> I am Dinakar, Hadoop admin,
>> could someone help me here,
>>
>> 1. I have a DEV-POC task to do,
>> 2. Need to Installing Distributed apache-spark cluster with Cluster mode
>> on Docker containers.
>> 3. with Scalable spark-worker containers.
>> 4. we have a 9 node cluster with some other services or tools.
>>
>> Thanks,
>> Dinakar
>>
>


Re: Installing Distributed apache spark cluster with Cluster mode on Docker

2021-07-24 Thread Khalid Mammadov
Hi,

Have you checked out the docs?
https://spark.apache.org/docs/latest/spark-standalone.html

Thanks,
Khalid

On Sat, Jul 24, 2021 at 1:45 PM Dinakar Chennubotla <
chennu.bigd...@gmail.com> wrote:

> Hi All,
>
> I am Dinakar, Hadoop admin,
> could someone help me here,
>
> 1. I have a DEV-POC task to do,
> 2. Need to Installing Distributed apache-spark cluster with Cluster mode
> on Docker containers.
> 3. with Scalable spark-worker containers.
> 4. we have a 9 node cluster with some other services or tools.
>
> Thanks,
> Dinakar
>


Installing Distributed apache spark cluster with Cluster mode on Docker

2021-07-24 Thread Dinakar Chennubotla
Hi All,

I am Dinakar, a Hadoop admin.
Could someone help me here?

1. I have a DEV-POC task to do.
2. I need to install a distributed Apache Spark cluster with cluster mode on
Docker containers,
3. with scalable spark-worker containers.
4. We have a 9-node cluster with some other services or tools.

Thanks,
Dinakar


Apache Spark Cluster

2018-07-23 Thread Uğur Sopaoğlu
We are trying to create a cluster which consists of 4 machines. The cluster will
be used by multiple users. How can we configure it so that users can submit jobs
from their personal computers, and is there any free tool you can suggest to
streamline the procedure?
-- 
Uğur Sopaoğlu