Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-26 Thread Gurvinder Singh
On 05/26/2016 02:38 AM, Matei Zaharia wrote:
> Just wondering, what is the main use case for the Docker images -- to
> develop apps locally or to deploy a cluster? 
I use Docker images both for development and for deploying on a production
cluster, as they make sure I have the correct versions of Java and Spark.
> If the image is really just
> a script to download a certain package name from a mirror, it may be
> okay to create an official one, though it does seem tricky to make it
> properly use the right mirror.
I don't think that's an issue, as the published Docker image will already
have Spark baked in from whichever mirror you choose. The mirror question
only arises when people want to build their own image from the published
Dockerfile; they can then change the mirror if they prefer.

Here is the link to the current Spark Dockerfile
(https://gist.github.com/gurvindersingh/8308d46995a58303b90e4bc2fc46e343)
that I use as a base; from it I can start the master and workers as I like.

- Gurvinder
> 
> Matei
> 
>> On May 25, 2016, at 6:05 PM, Luciano Resende wrote:
>>
>>
>>
>> On Wed, May 25, 2016 at 2:34 PM, Sean Owen wrote:
>>
>> I don't think the project would bless anything but the standard
>> release artifacts since only those are voted on. People are free to
>> maintain whatever they like and even share it, as long as it's clear
>> it's not from the Apache project.
>>
>>
>> +1
>>
>>
>> -- 
>> Luciano Resende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
> 


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: spark on kubernetes

2016-05-23 Thread Gurvinder Singh
OK, I have created this issue: https://issues.apache.org/jira/browse/SPARK-15487
Please comment on it, and also let me know if anyone wants to collaborate on
implementing it. It is my first contribution to Spark, so it will be exciting.

- Gurvinder
On 05/23/2016 07:55 PM, Gurvinder Singh wrote:
> On 05/23/2016 07:18 PM, Radoslaw Gruchalski wrote:
>> Sounds surprisingly close to this:
>> https://github.com/apache/spark/pull/9608
>>
> I might have overlooked something, but the bridge-mode work appears to be
> about making Spark work with Docker containers and letting them communicate
> when running on more than one machine.
> 
> Here I am trying to make the information in the Spark UI accessible
> regardless of whether Spark runs in containers or not. The Spark UI's links
> to workers and application drivers point to an internal/protected network,
> so to get this information from their own machines, users have to connect
> to a VPN. The proposal is therefore to make the Spark master UI
> reverse-proxy this information back to the user. Only the Spark master UI
> then needs to be exposed to the internet, and nothing else needs to change
> in how Spark runs internally, whether in standalone mode, on Mesos, or in
> containers on Kubernetes.
> 
> - Gurvinder
>> I can resurrect the work on the bridge mode for Spark 2. The reason the
>> work on the old one was suspended was that Spark was going through so many
>> changes at the time that a lot of the work done was wiped out by the
>> changes towards 2.0.
>>
>> I know that Lightbend was also interested in having bridge mode.
>>
>> –
>> Best regards,
>> Radek Gruchalski
>> ra...@gruchalski.com
>> de.linkedin.com/in/radgruchalski
>>
>> *Confidentiality:
>> *This communication is intended for the above-named person and may be
>> confidential and/or legally privileged.
>> If it has come to you in error you must take no action based on it, nor
>> must you copy or show it to anyone; please delete/destroy and inform the
>> sender immediately.
>>
>>
>> On May 23, 2016 at 7:14:51 PM, Timothy Chen (tnac...@gmail.com
>> <mailto:tnac...@gmail.com>) wrote:
>>
>>> This will also simplify things for Mesos users; DCOS has to work around
>>> this with our own proxying.
>>>
>>> Tim
>>>
>>> On Sun, May 22, 2016 at 11:53 PM, Gurvinder Singh
>>> <gurvinder.si...@uninett.no> wrote:
>>>> Hi Reynold,
>>>>
>>>> So if that's OK with you, can I go ahead and create a JIRA for this? It
>>>> seems this feature is currently missing and could benefit not just
>>>> Kubernetes users but Spark standalone mode users in general.
>>>>
>>>> - Gurvinder
>>>> On 05/22/2016 12:49 PM, Gurvinder Singh wrote:
>>>>> On 05/22/2016 10:23 AM, Sun Rui wrote:
>>>>>> If it is possible to rewrite URLs in outbound responses in Knox or another
>>>>>> reverse proxy, would that solve your issue?
>>>>> Any process that can keep track of the workers' and application drivers'
>>>>> IP addresses and route traffic to them would work. The Spark master
>>>>> already does exactly this, since all workers and applications have to
>>>>> register with it, so I propose the master as the place to add such
>>>>> functionality.
>>>>>
>>>>> I am not familiar with Knox's capabilities, but Nginx or any other
>>>>> ordinary reverse proxy will not be able to do this on its own, due to the
>>>>> dynamic nature of application drivers and, to some extent, workers too.
>>>>>
>>>>> - Gurvinder
>>>>>>> On May 22, 2016, at 14:55, Gurvinder Singh <gurvinder.si...@uninett.no> 
>>>>>>> wrote:
>>>>>>>
>>>>>>> On 05/22/2016 08:32 AM, Reynold Xin wrote:
>>>>>>>> Kubernetes itself already has facilities for http proxy, doesn't it?
>>>>>>>>
>>>>>>> Yes, Kubernetes has an ingress controller which can act as the L7 load
>>>>>>> balancer and route traffic to the Spark UI in this case. But I am
>>>>>>> referring to the links in the UI to the worker and application UIs. I
>>>>>>> replied in detail to Sun Rui's mail, where I gave an example of a
>>>>>>> possible scenario.
>>>>>>>
>>>>>>> - Gurvinder
>>>>>>>>
>>>>>>>> On Sat, May 21, 2016 at 9:30 AM, Gurvinder Singh
>>>>>>>> <gurvinder.si...@unin

Re: spark on kubernetes

2016-05-23 Thread Gurvinder Singh
Hi Reynold,

So if that's OK with you, can I go ahead and create a JIRA for this? It seems
this feature is currently missing and could benefit not just Kubernetes users
but Spark standalone mode users in general.

- Gurvinder
On 05/22/2016 12:49 PM, Gurvinder Singh wrote:
> On 05/22/2016 10:23 AM, Sun Rui wrote:
>> If it is possible to rewrite URLs in outbound responses in Knox or another
>> reverse proxy, would that solve your issue?
> Any process that can keep track of the workers' and application drivers'
> IP addresses and route traffic to them would work. The Spark master already
> does exactly this, since all workers and applications have to register with
> it, so I propose the master as the place to add such functionality.
> 
> I am not familiar with Knox's capabilities, but Nginx or any other ordinary
> reverse proxy will not be able to do this on its own, due to the dynamic
> nature of application drivers and, to some extent, workers too.
> 
> - Gurvinder
>>> On May 22, 2016, at 14:55, Gurvinder Singh <gurvinder.si...@uninett.no> 
>>> wrote:
>>>
>>> On 05/22/2016 08:32 AM, Reynold Xin wrote:
>>>> Kubernetes itself already has facilities for http proxy, doesn't it?
>>>>
>>> Yes, Kubernetes has an ingress controller which can act as the L7 load
>>> balancer and route traffic to the Spark UI in this case. But I am referring
>>> to the links in the UI to the worker and application UIs. I replied in
>>> detail to Sun Rui's mail, where I gave an example of a possible scenario.
>>>
>>> - Gurvinder
>>>>
>>>> On Sat, May 21, 2016 at 9:30 AM, Gurvinder Singh
>>>> <gurvinder.si...@uninett.no <mailto:gurvinder.si...@uninett.no>> wrote:
>>>>
>>>>Hi,
>>>>
>>>>I am currently working on deploying Spark on Kubernetes (K8s) and it is
>>>>working fine. I am running Spark in standalone mode and checkpointing the
>>>>master state to shared storage, so if the master fails, K8s restarts it,
>>>>it recovers the earlier state from the checkpoint, and things just work.
>>>>The issue I have is with accessing the worker and application UI links
>>>>from the Spark master web UI. In brief, the Kubernetes service model lets
>>>>me expose the master service to the internet, but accessing the
>>>>application/worker UIs is not possible, as I would then have to expose
>>>>each of them individually, and given that I can have multiple
>>>>applications, this becomes hard to manage.
>>>>
>>>>One solution would be for the master to act as a reverse proxy for
>>>>accessing information/state/logs from the applications/workers. Since the
>>>>master learns their endpoints when applications/workers register with it,
>>>>it can proxy a user's request for that information to the corresponding
>>>>endpoint.
>>>>
>>>>So I am wondering whether someone has already done work in this
>>>>direction; if so, it would be great to know. If not, would the community
>>>>be interested in such a feature? If yes, some guidance on how and where
>>>>to get started would be helpful.
>>>>
>>>>Kind Regards,
>>>>Gurvinder


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: spark on kubernetes

2016-05-22 Thread Gurvinder Singh
On 05/22/2016 10:23 AM, Sun Rui wrote:
> If it is possible to rewrite URLs in outbound responses in Knox or another
> reverse proxy, would that solve your issue?
Any process that can keep track of the workers' and application drivers' IP
addresses and route traffic to them would work. The Spark master already does
exactly this, since all workers and applications have to register with it, so
I propose the master as the place to add such functionality.

I am not familiar with Knox's capabilities, but Nginx or any other ordinary
reverse proxy will not be able to do this on its own, due to the dynamic
nature of application drivers and, to some extent, workers too.

- Gurvinder
>> On May 22, 2016, at 14:55, Gurvinder Singh <gurvinder.si...@uninett.no> 
>> wrote:
>>
>> On 05/22/2016 08:32 AM, Reynold Xin wrote:
>>> Kubernetes itself already has facilities for http proxy, doesn't it?
>>>
>> Yes, Kubernetes has an ingress controller which can act as the L7 load
>> balancer and route traffic to the Spark UI in this case. But I am referring
>> to the links in the UI to the worker and application UIs. I replied in
>> detail to Sun Rui's mail, where I gave an example of a possible scenario.
>>
>> - Gurvinder
>>>
>>> On Sat, May 21, 2016 at 9:30 AM, Gurvinder Singh
>>> <gurvinder.si...@uninett.no <mailto:gurvinder.si...@uninett.no>> wrote:
>>>
>>>Hi,
>>>
>>>I am currently working on deploying Spark on Kubernetes (K8s) and it is
>>>working fine. I am running Spark in standalone mode and checkpointing the
>>>master state to shared storage, so if the master fails, K8s restarts it,
>>>it recovers the earlier state from the checkpoint, and things just work.
>>>The issue I have is with accessing the worker and application UI links
>>>from the Spark master web UI. In brief, the Kubernetes service model lets
>>>me expose the master service to the internet, but accessing the
>>>application/worker UIs is not possible, as I would then have to expose
>>>each of them individually, and given that I can have multiple
>>>applications, this becomes hard to manage.
>>>
>>>One solution would be for the master to act as a reverse proxy for
>>>accessing information/state/logs from the applications/workers. Since the
>>>master learns their endpoints when applications/workers register with it,
>>>it can proxy a user's request for that information to the corresponding
>>>endpoint.
>>>
>>>So I am wondering whether someone has already done work in this
>>>direction; if so, it would be great to know. If not, would the community
>>>be interested in such a feature? If yes, some guidance on how and where
>>>to get started would be helpful.
>>>
>>>Kind Regards,
>>>Gurvinder


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: spark on kubernetes

2016-05-22 Thread Gurvinder Singh
On 05/22/2016 08:32 AM, Reynold Xin wrote:
> Kubernetes itself already has facilities for http proxy, doesn't it?
> 
Yes, Kubernetes has an ingress controller which can act as the L7 load
balancer and route traffic to the Spark UI in this case. But I am referring
to the links in the UI to the worker and application UIs. I replied in detail
to Sun Rui's mail, where I gave an example of a possible scenario.

- Gurvinder
> 
> On Sat, May 21, 2016 at 9:30 AM, Gurvinder Singh
> <gurvinder.si...@uninett.no <mailto:gurvinder.si...@uninett.no>> wrote:
> 
> Hi,
> 
> I am currently working on deploying Spark on Kubernetes (K8s) and it is
> working fine. I am running Spark in standalone mode and checkpointing the
> master state to shared storage, so if the master fails, K8s restarts it, it
> recovers the earlier state from the checkpoint, and things just work. The
> issue I have is with accessing the worker and application UI links from the
> Spark master web UI. In brief, the Kubernetes service model lets me expose
> the master service to the internet, but accessing the application/worker
> UIs is not possible, as I would then have to expose each of them
> individually, and given that I can have multiple applications, this becomes
> hard to manage.
>
> One solution would be for the master to act as a reverse proxy for accessing
> information/state/logs from the applications/workers. Since the master
> learns their endpoints when applications/workers register with it, it can
> proxy a user's request for that information to the corresponding endpoint.
>
> So I am wondering whether someone has already done work in this direction;
> if so, it would be great to know. If not, would the community be interested
> in such a feature? If yes, some guidance on how and where to get started
> would be helpful.
> 
> Kind Regards,
> Gurvinder


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: spark on kubernetes

2016-05-22 Thread Gurvinder Singh
On 05/22/2016 08:30 AM, Sun Rui wrote:
> I think a “reverse proxy” is beneficial for monitoring a cluster in a
> secure way. This feature is desired not only for Spark standalone, but also
> for Spark on YARN, and for projects other than Spark.
I think that to secure Spark you can use any reverse proxy out there, e.g.
Knox, something lightweight such as nginx/node-http-proxy, or one written in
your favorite language. There is even oauth2_proxy
(https://github.com/bitly/oauth2_proxy), which can secure, for example, the
Spark UI using GitHub/Google accounts.

But the issue here is that the Spark master UI page has links to information
about workers which point to their internal IP addresses, so you need either
a VPN or to be on the same network to get the worker information, e.g. logs.
The same goes for the application UI, as the driver is inside the Spark
cluster network.

So the idea is that the Spark master UI can act as a reverse proxy to fetch
this information. For example:

Take a worker with ID worker1 running at IP address 10.2.3.4:8081. Currently,
a user who accesses the master UI and wants to see information from worker1
needs to either connect to a VPN or have 10.2.3.4 reachable from his/her
machine. The proposal is to add functionality to the Spark master UI so that
the link to worker1 becomes something like spark-master.com/worker1; when the
user accesses this link, the master proxies the request to 10.2.3.4:8081 and
relays the response back. The user therefore does not need to be on the same
network.

This would also really simplify Spark UI access in the general case, since
only one IP needs to be exposed to the public.

I have done a preliminary study of the code, and it seems Spark uses Jetty
for the UI, and Jetty provides a ProxyServlet which can serve this purpose.
So it would be good to know whether the community is interested in having
such a feature, and we can then get together to add it :)
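
To make this concrete, here is a rough, untested sketch of how such a route
could be wired up with Jetty's ProxyServlet (this assumes the jetty-proxy
module is on the classpath; addWorkerProxy, ctx and workerUiAddress are just
placeholder names, and in the real master the worker's UI address would come
from the registration info it already keeps):

import org.eclipse.jetty.proxy.ProxyServlet
import org.eclipse.jetty.servlet.{ServletContextHandler, ServletHolder}

object MasterProxySketch {
  // Mount /proxy/<workerId>/* on the master's Jetty context and forward it
  // to the worker's own UI address, e.g. "http://10.2.3.4:8081".
  def addWorkerProxy(ctx: ServletContextHandler,
                     workerId: String,
                     workerUiAddress: String): Unit = {
    val holder = new ServletHolder(classOf[ProxyServlet.Transparent])
    holder.setInitParameter("proxyTo", workerUiAddress)    // target of the proxied requests
    holder.setInitParameter("prefix", s"/proxy/$workerId") // replaced by proxyTo when rewriting the URL
    ctx.addServlet(holder, s"/proxy/$workerId/*")
  }
}

With something along these lines registered for each worker (and each
application driver) as it registers, a link such as
spark-master.com/proxy/worker1/ would be served by the master itself, which
fetches the page from 10.2.3.4:8081 behind the scenes.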

- Gurvinder
> 
> Maybe Apache Knox can help you. Not sure how Knox can integrate with Spark.
>> On May 22, 2016, at 00:30, Gurvinder Singh <gurvinder.si...@uninett.no
>> <mailto:gurvinder.si...@uninett.no>> wrote:
>>
>> standalone mod
> 


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



spark on kubernetes

2016-05-21 Thread Gurvinder Singh
Hi,

I am currently working on deploying Spark on Kubernetes (K8s) and it is
working fine. I am running Spark in standalone mode and checkpointing the
master state to shared storage, so if the master fails, K8s restarts it, it
recovers the earlier state from the checkpoint, and things just work. The
issue I have is with accessing the worker and application UI links from the
Spark master web UI. In brief, the Kubernetes service model lets me expose
the master service to the internet, but accessing the application/worker UIs
is not possible, as I would then have to expose each of them individually,
and given that I can have multiple applications, this becomes hard to manage.
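
For reference, a minimal sketch of the standalone recovery properties
involved, assuming FILESYSTEM recovery against a directory shared with (or
remounted into) the restarted master pod (ZooKeeper-based recovery is the
other option); in practice these are usually passed to the master daemon
through SPARK_DAEMON_JAVA_OPTS rather than set in code, and
"/shared/spark-recovery" is only a hypothetical mount path:

import org.apache.spark.SparkConf

// Illustration only: the standalone master reads these as ordinary Spark properties.
val recoveryConf = new SparkConf()
  .set("spark.deploy.recoveryMode", "FILESYSTEM")
  .set("spark.deploy.recoveryDirectory", "/shared/spark-recovery")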

One solution would be for the master to act as a reverse proxy for accessing
information/state/logs from the applications/workers. Since the master learns
their endpoints when applications/workers register with it, it can proxy a
user's request for that information to the corresponding endpoint.

So I am wondering whether someone has already done work in this direction; if
so, it would be great to know. If not, would the community be interested in
such a feature? If yes, some guidance on how and where to get started would
be helpful.

Kind Regards,
Gurvinder

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Fwd: spark master ui to proxy app and worker ui

2016-03-06 Thread Gurvinder Singh
I wonder if anyone has any feedback on this. I can look into implementing it,
but I would like to know whether such functionality could be merged back into
master. If yes, please let me know and point me in the right direction to
get started.

Regards,
Gurvinder
On 03/04/2016 09:25 AM, Gurvinder Singh wrote:
> Forwarding to the development mailing list, as it might be more relevant to
> ask for this here. I am wondering whether I have missed something in the
> documentation and this is already possible. If so, please point me to the
> documentation on how to achieve it. If not, would it make sense to implement
> it?
> 
> Thanks,
> Gurvinder
> 
> 
>  Forwarded Message 
> Subject: spark master ui to proxy app and worker ui
> Date: Thu, 3 Mar 2016 20:12:07 +0100
> From: Gurvinder Singh <gurvinder.si...@uninett.no>
> To: user <u...@spark.apache.org>
> 
> Hi,
> 
> I am wondering whether it is possible for the Spark standalone master UI to
> proxy the app/driver UI and worker UI. The reason for this is that currently,
> if you want to access the UI of a driver or worker to see logs, you need
> access to its IP:port, which is harder to open up from a networking point of
> view. So operationally it makes life easier if the master can simply proxy
> those connections and allow access to both app and worker UI details from
> the master UI itself.
> 
> The master does not need to have content streamed to it all the time; only
> when a user wants to access content from the other UIs does it proxy the
> request/response for that duration. Thus the master will not have to incur
> extra load all the time.
> 
> Thanks,
> Gurvinder


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Fwd: spark master ui to proxy app and worker ui

2016-03-04 Thread Gurvinder Singh
Forwarding to the development mailing list, as it might be more relevant to
ask for this here. I am wondering whether I have missed something in the
documentation and this is already possible. If so, please point me to the
documentation on how to achieve it. If not, would it make sense to implement
it?

Thanks,
Gurvinder


 Forwarded Message 
Subject: spark master ui to proxy app and worker ui
Date: Thu, 3 Mar 2016 20:12:07 +0100
From: Gurvinder Singh <gurvinder.si...@uninett.no>
To: user <u...@spark.apache.org>

Hi,

I am wondering whether it is possible for the Spark standalone master UI to
proxy the app/driver UI and worker UI. The reason for this is that currently,
if you want to access the UI of a driver or worker to see logs, you need
access to its IP:port, which is harder to open up from a networking point of
view. So operationally it makes life easier if the master can simply proxy
those connections and allow access to both app and worker UI details from the
master UI itself.

The master does not need to have content streamed to it all the time; only
when a user wants to access content from the other UIs does it proxy the
request/response for that duration. Thus the master will not have to incur
extra load all the time.

Thanks,
Gurvinder


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Spark on Mesos 0.20

2014-10-10 Thread Gurvinder Singh
On 10/10/2014 06:11 AM, Fairiz Azizi wrote:
 Hello,
 
 Sorry for the late reply.
 
 When I tried the LogQuery example this time, things now seem to be fine!
 
 ...
 
 14/10/10 04:01:21 INFO scheduler.DAGScheduler: Stage 0 (collect at
 LogQuery.scala:80) finished in 0.429 s
 
 14/10/10 04:01:21 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0,
 whose tasks have all completed, from pool defa
 
 14/10/10 04:01:21 INFO spark.SparkContext: Job finished: collect at
 LogQuery.scala:80, took 12.802743914 s
 
 (10.10.10.10,FRED,GET http://images.com/2013/Generic.jpg HTTP/1.1)   
 bytes=621   n=2
 
 
 Not sure if this is the correct response for that example.
 
 Our mesos/spark builds have since been updated since I last wrote.
 
 Possibly, the JDK version was updated to 1.7.0_67
 
 If you are using an older JDK, maybe try updating that?
I have tested on a current JDK 7 and am now running JDK 8; the problem still
exists. Can you run LogQuery on data of, say, 100+ GB in size, so that you
have more map tasks? We only start to see the issue on jobs with more tasks.

- Gurvinder
 
 
 - Fi
 
 
 
 Fairiz Fi Azizi
 
 On Wed, Oct 8, 2014 at 7:54 AM, RJ Nowling rnowl...@gmail.com
 mailto:rnowl...@gmail.com wrote:
 
 Yep!  That's the example I was talking about.
 
 Is an error message printed when it hangs? I get :
 
 14/09/30 13:23:14 ERROR BlockManagerMasterActor: Got two different block 
 manager registrations on 20140930-131734-1723727882-5050-1895-1
 
 
 
 On Tue, Oct 7, 2014 at 8:36 PM, Fairiz Azizi code...@gmail.com
 mailto:code...@gmail.com wrote:
 
 Sure, could you point me to the example?
 
 The only thing I could find was
 
 https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/LogQuery.scala
 
 So do you mean running it like:
 MASTER=mesos://xxx:5050 ./run-example LogQuery
 
 I tried that and I can see the job run and the tasks complete on
 the slave nodes, but the client process seems to hang forever,
 it's probably a different problem. BTW, only a dozen or so tasks
 kick off.
 
 I actually haven't done much with Scala and Spark (it's been all
 python).
 
 Fi
 
 
 
 Fairiz Fi Azizi
 
 On Tue, Oct 7, 2014 at 6:29 AM, RJ Nowling rnowl...@gmail.com
 mailto:rnowl...@gmail.com wrote:
 
 I was able to reproduce it on a small 4 node cluster (1
 mesos master and 3 mesos slaves) with relatively low-end
 specs.  As I said, I just ran the log query examples with
 the fine-grained mesos mode.
 
 Spark 1.1.0 and mesos 0.20.1.
 
 Fairiz, could you try running the logquery example included
 with Spark and see what you get?
 
 Thanks!
 
 On Mon, Oct 6, 2014 at 8:07 PM, Fairiz Azizi
 code...@gmail.com mailto:code...@gmail.com wrote:
 
 That's what's great about Spark, the community is so
 active! :)
 
 I compiled Mesos 0.20.1 from the source tarball.
 
 Using the Mapr3 Spark 1.1.0 distribution from the Spark
 downloads page  (spark-1.1.0-bin-mapr3.tgz).
 
 I see no problems for the workloads we are trying. 
 
 However, the cluster is small (less than 100 cores
 across 3 nodes).
 
 The workloads reads in just a few gigabytes from HDFS,
 via an ipython notebook spark shell.
 
 thanks,
 Fi
 
 
 
 Fairiz Fi Azizi
 
 On Mon, Oct 6, 2014 at 9:20 AM, Timothy Chen
 tnac...@gmail.com mailto:tnac...@gmail.com wrote:
 
 Ok I created SPARK-3817 to track this, will try to
 repro it as well.
 
 Tim
 
 On Mon, Oct 6, 2014 at 6:08 AM, RJ Nowling
 rnowl...@gmail.com mailto:rnowl...@gmail.com wrote:
  I've recently run into this issue as well. I get
 it from running Spark
  examples such as log query.  Maybe that'll help
 reproduce the issue.
 
 
  On Monday, October 6, 2014, Gurvinder Singh
 gurvinder.si...@uninett.no
 mailto:gurvinder.si...@uninett.no
  wrote:
 
  The issue does not occur if the task at hand has
 small number of map
  tasks. I have a task which has 978 map tasks and
 I see this error as
 
  14/10/06 09:34:40 ERROR BlockManagerMasterActor:
 Got two different block
  manager registrations on
 20140711-081617

Re: Spark on Mesos 0.20

2014-10-06 Thread Gurvinder Singh
On 10/06/2014 08:19 AM, Fairiz Azizi wrote:
 The Spark online docs indicate that Spark is compatible with Mesos 0.18.1
 
 I've gotten it to work just fine on 0.18.1 and 0.18.2
 
 Has anyone tried Spark on a newer version of Mesos, i.e. Mesos v0.20.0?
 
 -Fi
 
Yes, we are using Spark 1.1.0 with Mesos 0.20.1. It runs fine in coarse-grained
mode; in fine-grained mode there is an issue with conflicting block manager
registrations. I have been waiting for it to be fixed, but it is still there.

-Gurvinder

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Spark on Mesos 0.20

2014-10-06 Thread Gurvinder Singh
The issue does not occur if the job at hand has a small number of map tasks.
I have a job with 978 map tasks, and I see this error:

14/10/06 09:34:40 ERROR BlockManagerMasterActor: Got two different block
manager registrations on 20140711-081617-711206558-5050-2543-5

Here is the log from the mesos-slave where this container was running.

http://pastebin.com/Q1Cuzm6Q

If you look at the code in Spark where this error is produced, you will see
that it simply exits, with a comment saying roughly "this should never
happen, let's just quit" :-)

- Gurvinder
On 10/06/2014 09:30 AM, Timothy Chen wrote:
 (Hit enter too soon...)
 
 What is your setup and steps to repro this?
 
 Tim
 
 On Mon, Oct 6, 2014 at 12:30 AM, Timothy Chen tnac...@gmail.com wrote:
 Hi Gurvinder,

 I tried fine grain mode before and didn't get into that problem.


 On Sun, Oct 5, 2014 at 11:44 PM, Gurvinder Singh
 gurvinder.si...@uninett.no wrote:
 On 10/06/2014 08:19 AM, Fairiz Azizi wrote:
 The Spark online docs indicate that Spark is compatible with Mesos 0.18.1

 I've gotten it to work just fine on 0.18.1 and 0.18.2

 Has anyone tried Spark on a newer version of Mesos, i.e. Mesos v0.20.0?

 -Fi

 Yes, we are using Spark 1.1.0 with Mesos 0.20.1. It runs fine in coarse-grained
 mode; in fine-grained mode there is an issue with conflicting block manager
 registrations. I have been waiting for it to be fixed, but it is still there.

 -Gurvinder




-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-04 Thread Gurvinder Singh
On 09/03/2014 04:23 PM, Nicholas Chammas wrote:
 On Wed, Sep 3, 2014 at 3:24 AM, Patrick Wendell pwend...@gmail.com wrote:
 
 == What default changes should I be aware of? ==
 1. The default value of spark.io.compression.codec is now snappy
 -- Old behavior can be restored by switching to lzf

 2. PySpark now performs external spilling during aggregations.
 -- Old behavior can be restored by setting spark.shuffle.spill to
 false.

 3. PySpark uses a new heuristic for determining the parallelism of
 shuffle operations.
 -- Old behavior can be restored by setting
 spark.default.parallelism to the number of cores in the cluster.

 
 Will these changes be called out in the release notes or somewhere in the
 docs?
 
 That last one (which I believe is what we discovered as the result of
 SPARK- https://issues.apache.org/jira/browse/SPARK-) could have a
 large impact on PySpark users.

Just wanted to add that this might be related to this issue, or it might be
something different. There is a regression when using PySpark to read data
from HDFS: its performance during map tasks has dropped from roughly 1x to
0.5x. I have tested 1.0.2 and the performance was fine, but the 1.1 release
candidate has this issue. I tested by setting the following properties to
make sure it was not due to these:

set("spark.io.compression.codec", "lzf").set("spark.shuffle.spill", "false")

on the conf object.

Regards,
Gurvinder
 
 Nick
 


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org