Upcoming talks on BigDL and Analytics Zoo this week

2019-03-24 Thread Jason Dai
Hi all,



Please see below for a list of upcoming technical sessions
on BigDL and Analytics Zoo (
https://github.com/intel-analytics/analytics-zoo/) this week:


   - Engineers from Intel will deliver a 3-hour tutorial Analytics Zoo:
   Distributed TensorFlow and Keras on Apache Spark
   

at *Strata Data Conference in San Francisco* (March 26, 1:30–5:00pm)
   - Engineers from Office Depot and Intel will present a technical
talk User-based
   real-time product recommendations leveraging deep learning using Analytics
   Zoo on Apache Spark and BigDL
   

at *Strata Data Conference in San Francisco* (March 27, 4:20–5:00pm)
   - Engineers from Intel will host a poster session Analytics Zoo: Unified
   Analytics + AI Platform for Big Data  at
*ScaledML
   2019 in Mountain View* (March 27, 5:00–7:00PM)
   - Engineers from Intel will present a technical talk Analytics Zoo:
   Distributed TensorFlow in production on Apache Spark
   

at *Strata Data Conference in San Francisco* (March 28, 3:50–4:30pm)

If you plan to attend these events, please drop by and talk to the speakers
:-)



Thanks,

-Jason


Re: Where does the Driver run?

2019-03-24 Thread Akhil Das
There's also a driver ui (usually available on port 4040), after running
your code, I assume you are running it on your machine, visit
localhost:4040 and you will get the driver UI.

If you think the driver is running on your master/executor nodes, login to
those machines and do a

   netstat -napt | grep -I listen

You will see the driver listening on 404x there, this won't be the case
mostly as you are not doing Spark-submit or using the deployMode=cluster.

On Mon, 25 Mar 2019, 01:03 Pat Ferrel,  wrote:

> Thanks, I have seen this many times in my research. Paraphrasing docs: “in
> deployMode ‘cluster' the Driver runs on a Worker in the cluster”
>
> When I look at logs I see 2 executors on the 2 slaves (executor 0 and 1
> with addresses that match slaves). When I look at memory usage while the
> job runs I see virtually identical usage on the 2 Workers. This would
> support your claim and contradict Spark docs for deployMode = cluster.
>
> The evidence seems to contradict the docs. I am now beginning to wonder if
> the Driver only runs in the cluster if we use spark-submit
>
>
>
> From: Akhil Das  
> Reply: Akhil Das  
> Date: March 23, 2019 at 9:26:50 PM
> To: Pat Ferrel  
> Cc: user  
> Subject:  Re: Where does the Driver run?
>
> If you are starting your "my-app" on your local machine, that's where the
> driver is running.
>
> [image: image.png]
>
> Hope this helps.
> 
>
> On Sun, Mar 24, 2019 at 4:13 AM Pat Ferrel  wrote:
>
>> I have researched this for a significant amount of time and find answers
>> that seem to be for a slightly different question than mine.
>>
>> The Spark 2.3.3 cluster is running fine. I see the GUI on “
>> http://master-address:8080;, there are 2 idle workers, as configured.
>>
>> I have a Scala application that creates a context and starts execution of
>> a Job. I *do not use spark-submit*, I start the Job programmatically and
>> this is where many explanations forks from my question.
>>
>> In "my-app" I create a new SparkConf, with the following code (slightly
>> abbreviated):
>>
>>   conf.setAppName(“my-job")
>>   conf.setMaster(“spark://master-address:7077”)
>>   conf.set(“deployMode”, “cluster”)
>>   // other settings like driver and executor memory requests
>>   // the driver and executor memory requests are for all mem on the
>> slaves, more than
>>   // mem available on the launching machine with “my-app"
>>   val jars = listJars(“/path/to/lib")
>>   conf.setJars(jars)
>>   …
>>
>> When I launch the job I see 2 executors running on the 2 workers/slaves.
>> Everything seems to run fine and sometimes completes successfully. Frequent
>> failures are the reason for this question.
>>
>> Where is the Driver running? I don’t see it in the GUI, I see 2 Executors
>> taking all cluster resources. With a Yarn cluster I would expect the
>> “Driver" to run on/in the Yarn Master but I am using the Spark Standalone
>> Master, where is the Drive part of the Job running?
>>
>> If is is running in the Master, we are in trouble because I start the
>> Master on one of my 2 Workers sharing resources with one of the Executors.
>> Executor mem + driver mem is > available mem on a Worker. I can change this
>> but need so understand where the Driver part of the Spark Job runs. Is it
>> in the Spark Master, or inside and Executor, or ???
>>
>> The “Driver” creates and broadcasts some large data structures so the
>> need for an answer is more critical than with more typical tiny Drivers.
>>
>> Thanks for you help!
>>
>
>
> --
> Cheers!
>
>


Re: Where does the Driver run?

2019-03-24 Thread Arko Provo Mukherjee
Hello,

Is spark.driver.memory per Job or shared across jobs? You should do load
testing before setting this?

Thanks & regards
Arko


On Sun, Mar 24, 2019 at 3:09 PM Pat Ferrel  wrote:

>
> 2 Slaves, one of which is also Master.
>
> Node 1 & 2 are slaves. Node 1 is where I run start-all.sh.
>
> The machines both have 60g of free memory (leaving about 4g for the master
> process on Node 1). The only constraint to the Driver and Executors is
> spark.driver.memory = spark.executor.memory = 60g
>
> BTW I would expect this to create one Executor, one Driver, and the Master
> on 2 Workers.
>
>
>
>
> From: Andrew Melo  
> Reply: Andrew Melo  
> Date: March 24, 2019 at 12:46:35 PM
> To: Pat Ferrel  
> Cc: Akhil Das  , user
>  
> Subject:  Re: Where does the Driver run?
>
> Hi Pat,
>
> On Sun, Mar 24, 2019 at 1:03 PM Pat Ferrel  wrote:
>
>> Thanks, I have seen this many times in my research. Paraphrasing docs:
>> “in deployMode ‘cluster' the Driver runs on a Worker in the cluster”
>>
>> When I look at logs I see 2 executors on the 2 slaves (executor 0 and 1
>> with addresses that match slaves). When I look at memory usage while the
>> job runs I see virtually identical usage on the 2 Workers. This would
>> support your claim and contradict Spark docs for deployMode = cluster.
>>
>> The evidence seems to contradict the docs. I am now beginning to wonder
>> if the Driver only runs in the cluster if we use spark-submit
>>
>
> Where/how are you starting "./sbin/start-master.sh"?
>
> Cheers
> Andrew
>
>
>>
>>
>>
>> From: Akhil Das  
>> Reply: Akhil Das  
>> Date: March 23, 2019 at 9:26:50 PM
>> To: Pat Ferrel  
>> Cc: user  
>> Subject:  Re: Where does the Driver run?
>>
>> If you are starting your "my-app" on your local machine, that's where the
>> driver is running.
>>
>> [image: image.png]
>>
>> Hope this helps.
>> 
>>
>> On Sun, Mar 24, 2019 at 4:13 AM Pat Ferrel  wrote:
>>
>>> I have researched this for a significant amount of time and find answers
>>> that seem to be for a slightly different question than mine.
>>>
>>> The Spark 2.3.3 cluster is running fine. I see the GUI on “
>>> http://master-address:8080;, there are 2 idle workers, as configured.
>>>
>>> I have a Scala application that creates a context and starts execution
>>> of a Job. I *do not use spark-submit*, I start the Job programmatically and
>>> this is where many explanations forks from my question.
>>>
>>> In "my-app" I create a new SparkConf, with the following code (slightly
>>> abbreviated):
>>>
>>>   conf.setAppName(“my-job")
>>>   conf.setMaster(“spark://master-address:7077”)
>>>   conf.set(“deployMode”, “cluster”)
>>>   // other settings like driver and executor memory requests
>>>   // the driver and executor memory requests are for all mem on the
>>> slaves, more than
>>>   // mem available on the launching machine with “my-app"
>>>   val jars = listJars(“/path/to/lib")
>>>   conf.setJars(jars)
>>>   …
>>>
>>> When I launch the job I see 2 executors running on the 2 workers/slaves.
>>> Everything seems to run fine and sometimes completes successfully. Frequent
>>> failures are the reason for this question.
>>>
>>> Where is the Driver running? I don’t see it in the GUI, I see 2
>>> Executors taking all cluster resources. With a Yarn cluster I would expect
>>> the “Driver" to run on/in the Yarn Master but I am using the Spark
>>> Standalone Master, where is the Drive part of the Job running?
>>>
>>> If is is running in the Master, we are in trouble because I start the
>>> Master on one of my 2 Workers sharing resources with one of the Executors.
>>> Executor mem + driver mem is > available mem on a Worker. I can change this
>>> but need so understand where the Driver part of the Spark Job runs. Is it
>>> in the Spark Master, or inside and Executor, or ???
>>>
>>> The “Driver” creates and broadcasts some large data structures so the
>>> need for an answer is more critical than with more typical tiny Drivers.
>>>
>>> Thanks for you help!
>>>
>>
>>
>> --
>> Cheers!
>>
>>


CCEACC67-4431-4246-AEB8-60CEC0940BA9
Description: Binary data


Re: Where does the Driver run?

2019-03-24 Thread Pat Ferrel
2 Slaves, one of which is also Master.

Node 1 & 2 are slaves. Node 1 is where I run start-all.sh.

The machines both have 60g of free memory (leaving about 4g for the master
process on Node 1). The only constraint to the Driver and Executors is
spark.driver.memory = spark.executor.memory = 60g

BTW I would expect this to create one Executor, one Driver, and the Master
on 2 Workers.




From: Andrew Melo  
Reply: Andrew Melo  
Date: March 24, 2019 at 12:46:35 PM
To: Pat Ferrel  
Cc: Akhil Das  , user
 
Subject:  Re: Where does the Driver run?

Hi Pat,

On Sun, Mar 24, 2019 at 1:03 PM Pat Ferrel  wrote:

> Thanks, I have seen this many times in my research. Paraphrasing docs: “in
> deployMode ‘cluster' the Driver runs on a Worker in the cluster”
>
> When I look at logs I see 2 executors on the 2 slaves (executor 0 and 1
> with addresses that match slaves). When I look at memory usage while the
> job runs I see virtually identical usage on the 2 Workers. This would
> support your claim and contradict Spark docs for deployMode = cluster.
>
> The evidence seems to contradict the docs. I am now beginning to wonder if
> the Driver only runs in the cluster if we use spark-submit
>

Where/how are you starting "./sbin/start-master.sh"?

Cheers
Andrew


>
>
>
> From: Akhil Das  
> Reply: Akhil Das  
> Date: March 23, 2019 at 9:26:50 PM
> To: Pat Ferrel  
> Cc: user  
> Subject:  Re: Where does the Driver run?
>
> If you are starting your "my-app" on your local machine, that's where the
> driver is running.
>
> [image: image.png]
>
> Hope this helps.
> 
>
> On Sun, Mar 24, 2019 at 4:13 AM Pat Ferrel  wrote:
>
>> I have researched this for a significant amount of time and find answers
>> that seem to be for a slightly different question than mine.
>>
>> The Spark 2.3.3 cluster is running fine. I see the GUI on “
>> http://master-address:8080;, there are 2 idle workers, as configured.
>>
>> I have a Scala application that creates a context and starts execution of
>> a Job. I *do not use spark-submit*, I start the Job programmatically and
>> this is where many explanations forks from my question.
>>
>> In "my-app" I create a new SparkConf, with the following code (slightly
>> abbreviated):
>>
>>   conf.setAppName(“my-job")
>>   conf.setMaster(“spark://master-address:7077”)
>>   conf.set(“deployMode”, “cluster”)
>>   // other settings like driver and executor memory requests
>>   // the driver and executor memory requests are for all mem on the
>> slaves, more than
>>   // mem available on the launching machine with “my-app"
>>   val jars = listJars(“/path/to/lib")
>>   conf.setJars(jars)
>>   …
>>
>> When I launch the job I see 2 executors running on the 2 workers/slaves.
>> Everything seems to run fine and sometimes completes successfully. Frequent
>> failures are the reason for this question.
>>
>> Where is the Driver running? I don’t see it in the GUI, I see 2 Executors
>> taking all cluster resources. With a Yarn cluster I would expect the
>> “Driver" to run on/in the Yarn Master but I am using the Spark Standalone
>> Master, where is the Drive part of the Job running?
>>
>> If is is running in the Master, we are in trouble because I start the
>> Master on one of my 2 Workers sharing resources with one of the Executors.
>> Executor mem + driver mem is > available mem on a Worker. I can change this
>> but need so understand where the Driver part of the Spark Job runs. Is it
>> in the Spark Master, or inside and Executor, or ???
>>
>> The “Driver” creates and broadcasts some large data structures so the
>> need for an answer is more critical than with more typical tiny Drivers.
>>
>> Thanks for you help!
>>
>
>
> --
> Cheers!
>
>


CCEACC67-4431-4246-AEB8-60CEC0940BA9
Description: Binary data


Re: Where does the Driver run?

2019-03-24 Thread Pat Ferrel
2 Slaves, one of which is also Master.

Node 1 & 2 are slaves. Node 1 is where I run start-all.sh.

The machines both have 60g of free memory (leaving about 4g for the master
process on Node 1). The only constraint to the Driver and Executors is
spark.driver.memory = spark.executor.memory = 60g


From: Andrew Melo  
Reply: Andrew Melo  
Date: March 24, 2019 at 12:46:35 PM
To: Pat Ferrel  
Cc: Akhil Das  , user
 
Subject:  Re: Where does the Driver run?

Hi Pat,

On Sun, Mar 24, 2019 at 1:03 PM Pat Ferrel  wrote:

> Thanks, I have seen this many times in my research. Paraphrasing docs: “in
> deployMode ‘cluster' the Driver runs on a Worker in the cluster”
>
> When I look at logs I see 2 executors on the 2 slaves (executor 0 and 1
> with addresses that match slaves). When I look at memory usage while the
> job runs I see virtually identical usage on the 2 Workers. This would
> support your claim and contradict Spark docs for deployMode = cluster.
>
> The evidence seems to contradict the docs. I am now beginning to wonder if
> the Driver only runs in the cluster if we use spark-submit
>

Where/how are you starting "./sbin/start-master.sh"?

Cheers
Andrew


>
>
>
> From: Akhil Das  
> Reply: Akhil Das  
> Date: March 23, 2019 at 9:26:50 PM
> To: Pat Ferrel  
> Cc: user  
> Subject:  Re: Where does the Driver run?
>
> If you are starting your "my-app" on your local machine, that's where the
> driver is running.
>
> [image: image.png]
>
> Hope this helps.
> 
>
> On Sun, Mar 24, 2019 at 4:13 AM Pat Ferrel  wrote:
>
>> I have researched this for a significant amount of time and find answers
>> that seem to be for a slightly different question than mine.
>>
>> The Spark 2.3.3 cluster is running fine. I see the GUI on “
>> http://master-address:8080;, there are 2 idle workers, as configured.
>>
>> I have a Scala application that creates a context and starts execution of
>> a Job. I *do not use spark-submit*, I start the Job programmatically and
>> this is where many explanations forks from my question.
>>
>> In "my-app" I create a new SparkConf, with the following code (slightly
>> abbreviated):
>>
>>   conf.setAppName(“my-job")
>>   conf.setMaster(“spark://master-address:7077”)
>>   conf.set(“deployMode”, “cluster”)
>>   // other settings like driver and executor memory requests
>>   // the driver and executor memory requests are for all mem on the
>> slaves, more than
>>   // mem available on the launching machine with “my-app"
>>   val jars = listJars(“/path/to/lib")
>>   conf.setJars(jars)
>>   …
>>
>> When I launch the job I see 2 executors running on the 2 workers/slaves.
>> Everything seems to run fine and sometimes completes successfully. Frequent
>> failures are the reason for this question.
>>
>> Where is the Driver running? I don’t see it in the GUI, I see 2 Executors
>> taking all cluster resources. With a Yarn cluster I would expect the
>> “Driver" to run on/in the Yarn Master but I am using the Spark Standalone
>> Master, where is the Drive part of the Job running?
>>
>> If is is running in the Master, we are in trouble because I start the
>> Master on one of my 2 Workers sharing resources with one of the Executors.
>> Executor mem + driver mem is > available mem on a Worker. I can change this
>> but need so understand where the Driver part of the Spark Job runs. Is it
>> in the Spark Master, or inside and Executor, or ???
>>
>> The “Driver” creates and broadcasts some large data structures so the
>> need for an answer is more critical than with more typical tiny Drivers.
>>
>> Thanks for you help!
>>
>
>
> --
> Cheers!
>
>


3847fb65eedb5792_0.1.1
Description: Binary data


How to support writeStream in data source v2 (spark 2.3.1)?

2019-03-24 Thread kineret M
I write spark data source v2 in spark 2.3 and I want to support
writeStream. What should I do in order to do so?

my defaultSource class:

class MyDefaultSource extends DataSourceV2 with ReadSupport with
WriteSupport with MicroBatchReadSupport { ..

Which interface is missing?


Re: Where does the Driver run?

2019-03-24 Thread Andrew Melo
Hi Pat,

On Sun, Mar 24, 2019 at 1:03 PM Pat Ferrel  wrote:

> Thanks, I have seen this many times in my research. Paraphrasing docs: “in
> deployMode ‘cluster' the Driver runs on a Worker in the cluster”
>
> When I look at logs I see 2 executors on the 2 slaves (executor 0 and 1
> with addresses that match slaves). When I look at memory usage while the
> job runs I see virtually identical usage on the 2 Workers. This would
> support your claim and contradict Spark docs for deployMode = cluster.
>
> The evidence seems to contradict the docs. I am now beginning to wonder if
> the Driver only runs in the cluster if we use spark-submit
>

Where/how are you starting "./sbin/start-master.sh"?

Cheers
Andrew


>
>
>
> From: Akhil Das  
> Reply: Akhil Das  
> Date: March 23, 2019 at 9:26:50 PM
> To: Pat Ferrel  
> Cc: user  
> Subject:  Re: Where does the Driver run?
>
> If you are starting your "my-app" on your local machine, that's where the
> driver is running.
>
> [image: image.png]
>
> Hope this helps.
> 
>
> On Sun, Mar 24, 2019 at 4:13 AM Pat Ferrel  wrote:
>
>> I have researched this for a significant amount of time and find answers
>> that seem to be for a slightly different question than mine.
>>
>> The Spark 2.3.3 cluster is running fine. I see the GUI on “
>> http://master-address:8080;, there are 2 idle workers, as configured.
>>
>> I have a Scala application that creates a context and starts execution of
>> a Job. I *do not use spark-submit*, I start the Job programmatically and
>> this is where many explanations forks from my question.
>>
>> In "my-app" I create a new SparkConf, with the following code (slightly
>> abbreviated):
>>
>>   conf.setAppName(“my-job")
>>   conf.setMaster(“spark://master-address:7077”)
>>   conf.set(“deployMode”, “cluster”)
>>   // other settings like driver and executor memory requests
>>   // the driver and executor memory requests are for all mem on the
>> slaves, more than
>>   // mem available on the launching machine with “my-app"
>>   val jars = listJars(“/path/to/lib")
>>   conf.setJars(jars)
>>   …
>>
>> When I launch the job I see 2 executors running on the 2 workers/slaves.
>> Everything seems to run fine and sometimes completes successfully. Frequent
>> failures are the reason for this question.
>>
>> Where is the Driver running? I don’t see it in the GUI, I see 2 Executors
>> taking all cluster resources. With a Yarn cluster I would expect the
>> “Driver" to run on/in the Yarn Master but I am using the Spark Standalone
>> Master, where is the Drive part of the Job running?
>>
>> If is is running in the Master, we are in trouble because I start the
>> Master on one of my 2 Workers sharing resources with one of the Executors.
>> Executor mem + driver mem is > available mem on a Worker. I can change this
>> but need so understand where the Driver part of the Spark Job runs. Is it
>> in the Spark Master, or inside and Executor, or ???
>>
>> The “Driver” creates and broadcasts some large data structures so the
>> need for an answer is more critical than with more typical tiny Drivers.
>>
>> Thanks for you help!
>>
>
>
> --
> Cheers!
>
>


Re: Where does the Driver run?

2019-03-24 Thread Pat Ferrel
Thanks, I have seen this many times in my research. Paraphrasing docs: “in
deployMode ‘cluster' the Driver runs on a Worker in the cluster”

When I look at logs I see 2 executors on the 2 slaves (executor 0 and 1
with addresses that match slaves). When I look at memory usage while the
job runs I see virtually identical usage on the 2 Workers. This would
support your claim and contradict Spark docs for deployMode = cluster.

The evidence seems to contradict the docs. I am now beginning to wonder if
the Driver only runs in the cluster if we use spark-submit



From: Akhil Das  
Reply: Akhil Das  
Date: March 23, 2019 at 9:26:50 PM
To: Pat Ferrel  
Cc: user  
Subject:  Re: Where does the Driver run?

If you are starting your "my-app" on your local machine, that's where the
driver is running.

[image: image.png]

Hope this helps.


On Sun, Mar 24, 2019 at 4:13 AM Pat Ferrel  wrote:

> I have researched this for a significant amount of time and find answers
> that seem to be for a slightly different question than mine.
>
> The Spark 2.3.3 cluster is running fine. I see the GUI on “
> http://master-address:8080;, there are 2 idle workers, as configured.
>
> I have a Scala application that creates a context and starts execution of
> a Job. I *do not use spark-submit*, I start the Job programmatically and
> this is where many explanations forks from my question.
>
> In "my-app" I create a new SparkConf, with the following code (slightly
> abbreviated):
>
>   conf.setAppName(“my-job")
>   conf.setMaster(“spark://master-address:7077”)
>   conf.set(“deployMode”, “cluster”)
>   // other settings like driver and executor memory requests
>   // the driver and executor memory requests are for all mem on the
> slaves, more than
>   // mem available on the launching machine with “my-app"
>   val jars = listJars(“/path/to/lib")
>   conf.setJars(jars)
>   …
>
> When I launch the job I see 2 executors running on the 2 workers/slaves.
> Everything seems to run fine and sometimes completes successfully. Frequent
> failures are the reason for this question.
>
> Where is the Driver running? I don’t see it in the GUI, I see 2 Executors
> taking all cluster resources. With a Yarn cluster I would expect the
> “Driver" to run on/in the Yarn Master but I am using the Spark Standalone
> Master, where is the Drive part of the Job running?
>
> If is is running in the Master, we are in trouble because I start the
> Master on one of my 2 Workers sharing resources with one of the Executors.
> Executor mem + driver mem is > available mem on a Worker. I can change this
> but need so understand where the Driver part of the Spark Job runs. Is it
> in the Spark Master, or inside and Executor, or ???
>
> The “Driver” creates and broadcasts some large data structures so the need
> for an answer is more critical than with more typical tiny Drivers.
>
> Thanks for you help!
>


--
Cheers!


ii_jtmf6k1q0.png
Description: Binary data