How can I get the application belonging to the driver?

2016-12-26 Thread John Fang
I hope I can get the application by its driverId, but I can't find a REST API
for this in Spark. How can I get the application that belongs to a given driver?
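
For what it's worth, a minimal sketch of a workaround (this queries the JSON view of the standalone master's web UI rather than an official REST endpoint, and the host and port are assumptions):

import scala.io.Source

// The standalone master serves a JSON snapshot of its state, listing both
// "activedrivers" and "activeapps"; correlating the two by hand is the
// closest thing I know of to a driverId -> application lookup.
val masterJson = Source.fromURL("http://spark-master:8080/json/").mkString
println(masterJson)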

Spark Graphx with Database

2016-12-26 Thread balaji9058
Hi All,

I would like to know about Spark GraphX execution/processing with a
database. Yes, I understand Spark GraphX is in-memory processing, and to some
extent we can manage querying, but I would like to do much more complex queries
or processing. Please suggest a use case or the steps for doing this.
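
As a starting point, here is a minimal sketch of pulling an edge list out of a relational database over JDBC and building a GraphX graph from it (the JDBC URL, table name and column names are assumptions):

import org.apache.spark.graphx.{Edge, Graph}

// Read the edge table (src, dst, weight) through the JDBC data source.
val edgesDF = sqlContext.read.format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/graphdb")
  .option("dbtable", "edges")
  .load()

// Convert rows to GraphX edges and build the graph; vertices that only appear
// in the edge list get the supplied default attribute.
val edgeRDD = edgesDF.rdd.map(r =>
  Edge(r.getAs[Long]("src"), r.getAs[Long]("dst"), r.getAs[Double]("weight")))
val graph = Graph.fromEdges(edgeRDD, defaultValue = 0.0)

println(graph.pageRank(0.001).vertices.take(5).mkString("\n"))

For queries that are too heavy to run in memory, the usual pattern is to push the selective part of the query down into the database (for example as a subquery in the "dbtable" option) and only materialize the subgraph you actually need.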







Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-26 Thread Chawla,Sumit
What is the expected effect of reducing mesosExecutor.cores to zero?
What executor functionality is impacted? Is the impact just that it
behaves like a regular process?

Regards
Sumit Chawla


On Mon, Dec 26, 2016 at 9:25 AM, Michael Gummelt wrote:

> > Using 0 for spark.mesos.mesosExecutor.cores is better than dynamic
> allocation
>
> Maybe for CPU, but definitely not for memory.  Executors never shut down
> in fine-grained mode, which means you only elastically grow and shrink CPU
> usage, not memory.
>
> On Sat, Dec 24, 2016 at 10:14 PM, Davies Liu  wrote:
>
>> Using 0 for spark.mesos.mesosExecutor.cores is better than dynamic
>> allocation, but have to pay a little more overhead for launching a
>> task, which should be OK if the task is not trivial.
>>
>> Since the direct result (up to 1M by default) will also go through
>> mesos, it's better to tune it lower, otherwise mesos could become the
>> bottleneck.
>>
>> spark.task.maxDirectResultSize
>>
>> On Mon, Dec 19, 2016 at 3:23 PM, Chawla,Sumit 
>> wrote:
>> > Tim,
>> >
>> > We will try to run the application in coarse grain mode, and share the
>> > findings with you.
>> >
>> > Regards
>> > Sumit Chawla
>> >
>> >
>> > On Mon, Dec 19, 2016 at 3:11 PM, Timothy Chen 
>> wrote:
>> >
>> >> Dynamic allocation works with coarse-grained mode only; we weren't aware of
>> >> a need for fine-grained mode after we enabled dynamic allocation support
>> >> in coarse-grained mode.
>> >>
>> >> What's the reason you're running fine grain mode instead of coarse
>> >> grain + dynamic allocation?
>> >>
>> >> Tim
>> >>
>> >> On Mon, Dec 19, 2016 at 2:45 PM, Mehdi Meziane
>> >>  wrote:
>> >> > We will be interested by the results if you give a try to Dynamic
>> >> allocation
>> >> > with mesos !
>> >> >
>> >> >
>> >> > - Original Mail -
>> >> > From: "Michael Gummelt" 
>> >> > To: "Sumit Chawla" 
>> >> > Cc: u...@mesos.apache.org, d...@mesos.apache.org, "User"
>> >> > , d...@spark.apache.org
>> >> > Sent: Monday, 19 December 2016 22:42:55 GMT +01:00 Amsterdam / Berlin /
>> >> > Berne / Rome / Stockholm / Vienna
>> >> > Subject: Re: Mesos Spark Fine Grained Execution - CPU count
>> >> >
>> >> >
>> >> >> Is this problem of idle executors sticking around solved in Dynamic
>> >> >> Resource Allocation?  Is there some timeout after which Idle
>> executors
>> >> can
>> >> >> just shutdown and cleanup its resources.
>> >> >
>> >> > Yes, that's exactly what dynamic allocation does.  But again I have
>> no
>> >> idea
>> >> > what the state of dynamic allocation + mesos is.
>> >> >
>> >> > On Mon, Dec 19, 2016 at 1:32 PM, Chawla,Sumit <
>> sumitkcha...@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Great.  Makes much better sense now.  What will be reason to have
>> >> >> spark.mesos.mesosExecutor.cores more than 1, as this number doesn't
>> >> include
>> >> >> the number of cores for tasks.
>> >> >>
>> >> >> So in my case it seems like 30 CPUs are allocated to executors.  And
>> >> there
>> >> >> are 48 tasks so 48 + 30 =  78 CPUs.  And i am noticing this gap of
>> 30 is
>> >> >> maintained till the last task exits.  This explains the gap.
>>  Thanks
>> >> >> everyone.  I am still not sure how this number 30 is calculated.  (
>> Is
>> >> it
>> >> >> dynamic based on current resources, or is it some configuration.  I
>> >> have 32
>> >> >> nodes in my cluster).
>> >> >>
>> >> >> Is this problem of idle executors sticking around solved in Dynamic
>> >> >> Resource Allocation?  Is there some timeout after which Idle
>> executors
>> >> can
>> >> >> just shutdown and cleanup its resources.
>> >> >>
>> >> >>
>> >> >> Regards
>> >> >> Sumit Chawla
>> >> >>
>> >> >>
>> >> >> On Mon, Dec 19, 2016 at 12:45 PM, Michael Gummelt <
>> >> mgumm...@mesosphere.io>
>> >> >> wrote:
>> >> >>>
>> >> >>> >  I should preassume that No of executors should be less than
>> number
>> >> of
>> >> >>> > tasks.
>> >> >>>
>> >> >>> No.  Each executor runs 0 or more tasks.
>> >> >>>
>> >> >>> Each executor consumes 1 CPU, and each task running on that
>> executor
>> >> >>> consumes another CPU.  You can customize this via
>> >> >>> spark.mesos.mesosExecutor.cores
>> >> >>> (https://github.com/apache/spark/blob/v1.6.3/docs/running-
>> on-mesos.md)
>> >> and
>> >> >>> spark.task.cpus
>> >> >>> (https://github.com/apache/spark/blob/v1.6.3/docs/configuration.md
>> )
>> >> >>>
>> >> >>> On Mon, Dec 19, 2016 at 12:09 PM, Chawla,Sumit <
>> sumitkcha...@gmail.com
>> >> >
>> >> >>> wrote:
>> >> 
>> >>  Ah thanks. looks like i skipped reading this "Neither will
>> executors
>> >>  terminate when they’re idle."
>> >> 
>> >>  So in my job scenario,  I should preassume that No of executors
>> should
>> >>  be less than number of tasks. Ideally one executor should execute
>> 1
>> >> or more
>> >>  tasks.  But i am observing something strange instead.  I start my
>> job
>> >> with
>> >>  48 partitions for a spark job. In mesos ui i see that number of
>> tasks
>> >> is 48,
>> >>  b

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-26 Thread Jacek Laskowski
Thanks a LOT, Michael!

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski



Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-26 Thread Michael Gummelt
In fine-grained mode (which is deprecated), Spark tasks (which are threads)
were implemented as Mesos tasks.  When a Mesos task starts and stops, its
underlying cgroup, and therefore the resources it's consuming on the
cluster, grows or shrinks based on the resources allocated to the tasks,
which in Spark is just CPU.  This is what I mean by CPU usage "elastically
growing".

However, all Mesos tasks are run by an "executor", which has its own
resource allocation.  In Spark, the executor is the JVM, and all memory is
allocated to the executor, because JVMs can't relinquish memory.  If memory
were allocated to the tasks, then the cgroup's memory allocation would
shrink when the task terminated, but the JVM's memory consumption would
stay constant, and the JVM would OOM.

And, without dynamic allocation, executors never terminate during the
duration of a Spark job, because even if they're idle (no tasks), they
still may be hosting shuffle files.  That's why dynamic allocation depends
on an external shuffle service.  Since executors never terminate, and all
memory is allocated to the executors, Spark jobs even in fine-grained mode
only grow in memory allocation, they don't shrink.
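
For reference, a minimal sketch of the coarse-grained + dynamic allocation setup described above (the property values are illustrative, and on Mesos the external shuffle service also has to be started on each agent):

// Dynamic allocation requires the external shuffle service so executors can be
// released while the shuffle files they wrote stay available.
val conf = new org.apache.spark.SparkConf()
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")  // idle executors get released
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "10")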

On Mon, Dec 26, 2016 at 12:39 PM, Jacek Laskowski  wrote:

> Hi Michael,
>
> That caught my attention...
>
> Could you please elaborate on "elastically grow and shrink CPU usage"
> and how it really works under the covers? It seems that CPU usage is
> just a "label" for an executor on Mesos. Where's this in the code?
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-26 Thread Jacek Laskowski
Hi Michael,

That caught my attention...

Could you please elaborate on "elastically grow and shrink CPU usage"
and how it really works under the covers? It seems that CPU usage is
just a "label" for an executor on Mesos. Where's this in the code?

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Mon, Dec 26, 2016 at 6:25 PM, Michael Gummelt  wrote:
>> Using 0 for spark.mesos.mesosExecutor.cores is better than dynamic
>> allocation
>
> Maybe for CPU, but definitely not for memory.  Executors never shut down in
> fine-grained mode, which means you only elastically grow and shrink CPU
> usage, not memory.

Re: Spark Storage Tab is empty

2016-12-26 Thread Jacek Laskowski
Hi David,

Can you use persist instead, perhaps with some other StorageLevel? It
works with the Spark 2.2.0-SNAPSHOT I use; I don't remember how it
behaved back in 1.6.2.

You could also check the Executors tab and see how many blocks you
have in their BlockManagers.
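
Something like this minimal sketch of the suggestion above (the storage level is just an example):

import org.apache.spark.storage.StorageLevel

val myrdd = sc.parallelize(1 to 100)
myrdd.setName("my_rdd")
myrdd.persist(StorageLevel.MEMORY_ONLY)   // explicit equivalent of cache()
myrdd.count()                             // materialize it so blocks show up in the Storage tab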

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Mon, Dec 26, 2016 at 7:08 PM, David Hodeffi wrote:
> I have tried the following code but didn't see anything on the storage tab.
>
>
>
>
>
> val myrdd = sc.parallelize(1 to 100)
>
> myrdd.setName("my_rdd")
>
> myrdd.cache()
>
> myrdd.collect()
>
>
>
> Storage tab is empty, though I can see the stage of collect() .
>
> I am using 1.6.2 ,HDP 2.5 , spark on yarn
>
>
>
>
>
> Thanks David
>



Spark Storage Tab is empty

2016-12-26 Thread David Hodeffi
I have tried the following code but didn't see anything on the Storage tab.


val myrdd = sc.parallelize(1 to 100)
myrdd.setName("my_rdd")
myrdd.cache()
myrdd.collect()

The Storage tab is empty, though I can see the stage for collect().
I am using 1.6.2, HDP 2.5, Spark on YARN.


Thanks David




[Spark 2.0.2 HDFS]: no data locality

2016-12-26 Thread Karamba
Hi,

I am running a couple of Docker hosts, each with an HDFS node and a Spark
worker in a Spark standalone cluster.
In order to get data locality awareness, I would like to configure racks
for each host, so that a Spark worker container knows from which HDFS
node container it should load its data. Does this make sense?

I configured HDFS container nodes via the core-site.xml in
$HADOOP_HOME/etc and this works. hdfs dfsadmin -printTopology shows my
setup.

I configured SPARK the same way. I placed core-site.xml and
hdfs-site.xml in the SPARK_CONF_DIR ... BUT this has no effect.

A Spark job submitted via spark-submit to the Spark master that loads
from HDFS only ever shows data locality ANY.
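
For reference, a minimal sketch of the locality-related settings involved (not a verified fix; as far as I can tell, locality in standalone mode is matched by hostname, so the worker's hostname has to match the datanode hostname that the namenode reports for each block):

val conf = new org.apache.spark.SparkConf()
  .set("spark.locality.wait", "3s")        // how long to wait before degrading locality
  .set("spark.locality.wait.node", "3s")   // wait specifically for NODE_LOCAL slots
  .set("spark.locality.wait.rack", "3s")   // wait specifically for RACK_LOCAL slots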

It would be great if anybody could help me get the configuration right!

Thanks and best regards,
on




Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-26 Thread Michael Gummelt
> Using 0 for spark.mesos.mesosExecutor.cores is better than dynamic
allocation

Maybe for CPU, but definitely not for memory.  Executors never shut down in
fine-grained mode, which means you only elastically grow and shrink CPU
usage, not memory.
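
For reference, a minimal sketch of the settings discussed in this thread (the values are illustrative, not recommendations):

// Fine-grained mode (deprecated) with a zero-CPU executor footprint, plus a
// lower cap on task results returned directly through Mesos, as suggested below.
val conf = new org.apache.spark.SparkConf()
  .set("spark.mesos.coarse", "false")               // fine-grained mode
  .set("spark.mesos.mesosExecutor.cores", "0")
  .set("spark.task.maxDirectResultSize", "262144")  // 256 KB, below the ~1 MB default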

On Sat, Dec 24, 2016 at 10:14 PM, Davies Liu  wrote:

> Using 0 for spark.mesos.mesosExecutor.cores is better than dynamic
> allocation, but have to pay a little more overhead for launching a
> task, which should be OK if the task is not trivial.
>
> Since the direct result (up to 1M by default) will also go through
> mesos, it's better to tune it lower, otherwise mesos could become the
> bottleneck.
>
> spark.task.maxDirectResultSize
>
> On Mon, Dec 19, 2016 at 3:23 PM, Chawla,Sumit 
> wrote:
> > Tim,
> >
> > We will try to run the application in coarse grain mode, and share the
> > findings with you.
> >
> > Regards
> > Sumit Chawla
> >
> >
> > On Mon, Dec 19, 2016 at 3:11 PM, Timothy Chen  wrote:
> >
> >> Dynamic allocation works with coarse-grained mode only; we weren't aware of
> >> a need for fine-grained mode after we enabled dynamic allocation support
> >> in coarse-grained mode.
> >>
> >> What's the reason you're running fine grain mode instead of coarse
> >> grain + dynamic allocation?
> >>
> >> Tim
> >>
> >> On Mon, Dec 19, 2016 at 2:45 PM, Mehdi Meziane
> >>  wrote:
> >> > We will be interested by the results if you give a try to Dynamic
> >> allocation
> >> > with mesos !
> >> >
> >> >
> >> > - Original Mail -
> >> > From: "Michael Gummelt" 
> >> > To: "Sumit Chawla" 
> >> > Cc: u...@mesos.apache.org, d...@mesos.apache.org, "User"
> >> > , d...@spark.apache.org
> >> > Sent: Monday, 19 December 2016 22:42:55 GMT +01:00 Amsterdam / Berlin /
> >> > Berne / Rome / Stockholm / Vienna
> >> > Subject: Re: Mesos Spark Fine Grained Execution - CPU count
> >> >
> >> >
> >> >> Is this problem of idle executors sticking around solved in Dynamic
> >> >> Resource Allocation?  Is there some timeout after which Idle
> executors
> >> can
> >> >> just shutdown and cleanup its resources.
> >> >
> >> > Yes, that's exactly what dynamic allocation does.  But again I have no
> >> idea
> >> > what the state of dynamic allocation + mesos is.
> >> >
> >> > On Mon, Dec 19, 2016 at 1:32 PM, Chawla,Sumit  >
> >> > wrote:
> >> >>
> >> >> Great.  Makes much better sense now.  What will be reason to have
> >> >> spark.mesos.mesosExecutor.cores more than 1, as this number doesn't
> >> include
> >> >> the number of cores for tasks.
> >> >>
> >> >> So in my case it seems like 30 CPUs are allocated to executors.  And
> >> there
> >> >> are 48 tasks so 48 + 30 =  78 CPUs.  And i am noticing this gap of
> 30 is
> >> >> maintained till the last task exits.  This explains the gap.   Thanks
> >> >> everyone.  I am still not sure how this number 30 is calculated.  (
> Is
> >> it
> >> >> dynamic based on current resources, or is it some configuration.  I
> >> have 32
> >> >> nodes in my cluster).
> >> >>
> >> >> Is this problem of idle executors sticking around solved in Dynamic
> >> >> Resource Allocation?  Is there some timeout after which Idle
> executors
> >> can
> >> >> just shutdown and cleanup its resources.
> >> >>
> >> >>
> >> >> Regards
> >> >> Sumit Chawla
> >> >>
> >> >>
> >> >> On Mon, Dec 19, 2016 at 12:45 PM, Michael Gummelt <
> >> mgumm...@mesosphere.io>
> >> >> wrote:
> >> >>>
> >> >>> >  I should preassume that No of executors should be less than
> number
> >> of
> >> >>> > tasks.
> >> >>>
> >> >>> No.  Each executor runs 0 or more tasks.
> >> >>>
> >> >>> Each executor consumes 1 CPU, and each task running on that executor
> >> >>> consumes another CPU.  You can customize this via
> >> >>> spark.mesos.mesosExecutor.cores
> >> >>> (https://github.com/apache/spark/blob/v1.6.3/docs/
> running-on-mesos.md)
> >> and
> >> >>> spark.task.cpus
> >> >>> (https://github.com/apache/spark/blob/v1.6.3/docs/configuration.md)
> >> >>>
> >> >>> On Mon, Dec 19, 2016 at 12:09 PM, Chawla,Sumit <
> sumitkcha...@gmail.com
> >> >
> >> >>> wrote:
> >> 
> >>  Ah thanks. looks like i skipped reading this "Neither will
> executors
> >>  terminate when they’re idle."
> >> 
> >>  So in my job scenario,  I should preassume that No of executors
> should
> >>  be less than number of tasks. Ideally one executor should execute 1
> >> or more
> >>  tasks.  But i am observing something strange instead.  I start my
> job
> >> with
> >>  48 partitions for a spark job. In mesos ui i see that number of
> tasks
> >> is 48,
> >>  but no. of CPUs is 78 which is way more than 48.  Here i am
> assuming
> >> that 1
> >>  CPU is 1 executor.   I am not specifying any configuration to set
> >> number of
> >>  cores per executor.
> >> 
> >>  Regards
> >>  Sumit Chawla
> >> 
> >> 
> >>  On Mon, Dec 19, 2016 at 11:35 AM, Joris Van Remoortere
> >>   wrote:
> >> >
> >> > That makes sense. From the documentation it looks like the
> executors
> >> >