Unsubscribe

2019-01-30 Thread Nikhil.R.Patil
Unsubscribe
"Confidentiality Warning: This message and any attachments are intended only 
for the use of the intended recipient(s). 
are confidential and may be privileged. If you are not the intended recipient. 
you are hereby notified that any 
review. re-transmission. conversion to hard copy. copying. circulation or other 
use of this message and any attachments is 
strictly prohibited. If you are not the intended recipient. please notify the 
sender immediately by return email. 
and delete this message and any attachments from your system.

Virus Warning: Although the company has taken reasonable precautions to ensure 
no viruses are present in this email. 
The company cannot accept responsibility for any loss or damage arising from 
the use of this email or attachment."


Re: Unsubscribe

2019-01-30 Thread Raghunadh Madamanchi
Unsubscribe

On Wed, Jan 30, 2019 at 9:19 PM Soumitra Johri 
wrote:

> unsubscribe
> On Wed, Jan 30, 2019 at 10:05 PM wang wei 
> wrote:
>
>> Unsubscribe
>>
>


Re: Unsubscribe

2019-01-30 Thread Soumitra Johri
unsubscribe
On Wed, Jan 30, 2019 at 10:05 PM wang wei 
wrote:

> Unsubscribe
>


Unsubscribe

2019-01-30 Thread 15313776907


Unsubscribe

Unsubscribe

2019-01-30 Thread wang wei
Unsubscribe

unsubscribe

2019-01-30 Thread Aditya Gautam
unsubscribe


[no subject]

2019-01-30 Thread Daniel O' Shaughnessy
Unsubscribe


Re: Apply Kmeans in partitions

2019-01-30 Thread Apostolos N. Papadopoulos

Hi Dimitri,

What is the error you are getting? Please specify.

Apostolos


On 30/1/19 16:30, dimitris plakas wrote:

Hello everyone,

I have a dataframe with 5040 rows, split into 5 groups. A column 
called "Group_Id" marks each row with a value from 0-4 indicating 
which group it belongs to. I am trying to split my dataframe into 
5 partitions and apply KMeans to every partition. I have tried


rdd = mydataframe.rdd.mapPartitions(function, True)
test = KMeans.train(rdd, num_of_centers, "random")

but I get an error.

How can I apply KMeans to every partition?

Thank you in advance,


--
Apostolos N. Papadopoulos, Associate Professor
Department of Informatics
Aristotle University of Thessaloniki
Thessaloniki, GREECE
tel: ++0030312310991918
email: papad...@csd.auth.gr
twitter: @papadopoulos_ap
web: http://datalab.csd.auth.gr/~apostol
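
For reference: MLlib's KMeans.train runs on the driver and takes a whole RDD,
so it cannot be called from inside mapPartitions on the executors, which is one
likely source of the error above. Below is a minimal sketch of one way to fit a
separate model per group, assuming scikit-learn and NumPy are installed on the
executors; num_of_centers and mydataframe follow the original post, everything
else is illustrative.

import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans

num_of_centers = 3  # illustrative; use the value from the original job

def kmeans_per_partition(rows):
    # Bucket the partition's rows by Group_Id locally, since hash
    # partitioning may place more than one group in the same partition.
    by_group = defaultdict(list)
    for row in rows:
        d = row.asDict()
        gid = d.pop("Group_Id")
        by_group[gid].append(list(d.values()))
    # Fit a local scikit-learn KMeans per group and emit its centers.
    for gid, feats in by_group.items():
        model = KMeans(n_clusters=num_of_centers).fit(np.array(feats))
        yield gid, model.cluster_centers_.tolist()

centers = (mydataframe
           .repartition(5, "Group_Id")  # co-locate each group's rows
           .rdd
           .mapPartitions(kmeans_per_partition)
           .collect())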





unsubscribe

2019-01-30 Thread Andrew Milkowski
unsubscribe


Re: Spark Kubernetes Architecture: Deployments vs Pods that create Pods

2019-01-30 Thread Li Gao
Hi Wilson,

As Yinan said, batch jobs that need dynamic scaling and driver-executor
communication do not fit the service-oriented Deployment paradigm of k8s.
Hence the need to abstract these Spark-specific differences into a k8s CRD
and a CRD controller that manage the lifecycle of Spark batch jobs on k8s:
https://github.com/GoogleCloudPlatform/spark-on-k8s-operator. The CRD makes
the Spark job more k8s-compliant and repeatable.

Like you discovered, Deployment is typically used for job server type of
services.

-Li


On Tue, Jan 29, 2019 at 1:49 PM Yinan Li  wrote:

> Hi Wilson,
>
> The behavior of a Deployment doesn't fit the way Spark executor pods are
> run and managed. For example, executor pods are created and deleted
> dynamically per requests from the driver, and they normally run to
> completion. A Deployment assumes uniformity and statelessness of the set of
> Pods it manages, which is not necessarily the case for Spark executors; for
> example, executor Pods have unique executor IDs. Dynamic resource
> allocation also doesn't play well with a Deployment, since growing or
> shrinking the number of executor Pods would require a rolling update, which
> means restarting all of them. In Kubernetes mode, the driver is effectively
> a custom controller of executor Pods: it adds or deletes Pods on demand and
> watches their status.
>
> The way Flink on Kubernetes works, as you said, is basically running the
> Flink job/task managers using Deployments. An equivalent would be running a
> standalone Spark cluster on top of Kubernetes. If you want auto-restart for
> Spark streaming jobs, I would suggest you take a look at the K8S Spark
> Operator.
>
> On Tue, Jan 29, 2019 at 5:53 AM WILSON Frank <
> frank.wil...@uk.thalesgroup.com> wrote:
>
>> Hi,
>>
>>
>>
>> I’ve been playing around with Spark Kubernetes deployments over the past
>> week and I’m curious to know why Spark deploys as a driver pod that creates
>> more worker pods.
>>
>>
>>
>> I’ve read that it’s normal to use Kubernetes Deployments to create a
>> distributed service, so I am wondering why Spark just creates Pods. I
>> suppose the driver program is ‘the odd one out’, so it doesn’t belong in a
>> Deployment or ReplicaSet, but maybe the workers could be a Deployment? Is
>> this something to do with data locality?
>>
>>
>>
>> I haven't tried Streaming pipelines on Kubernetes yet, but are these also
>> Pods that create Pods rather than Deployments? It seems more important for
>> a streaming pipeline to be ‘durable’[1], as the Kubernetes documentation
>> might say.
>>
>>
>>
>> I ask this question partly because the Kubernetes deployment of Spark is
>> still experimental and I am wondering whether this aspect of the deployment
>> might change.
>>
>>
>>
>> I had a look at the Flink[2] documentation and it does seem to use
>> Deployments; however, these seem to be lightweight job/task managers that
>> accept Flink jobs. It actually sounds like running a lightweight version
>> of YARN inside containers on Kubernetes.
>>
>>
>>
>>
>>
>> Thanks,
>>
>>
>>
>>
>>
>> Frank
>>
>>
>>
>> [1]
>> https://kubernetes.io/docs/concepts/workloads/pods/pod/#durability-of-pods-or-lack-thereof
>>
>> [2]
>> https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html
>>
>
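
To make the controller analogy concrete: conceptually the driver does something
like the sketch below, creating and watching executor Pods directly through the
Kubernetes API instead of delegating to a Deployment. This is only an
illustration using the official kubernetes Python client; Spark's real driver
is JVM code, and the Pod names and image below are made up.

from kubernetes import client, config

config.load_incluster_config()  # the driver itself runs as a Pod in the cluster
v1 = client.CoreV1Api()

# The driver builds each executor Pod itself, with its own identity,
# rather than letting a Deployment stamp out identical replicas.
executor_pod = client.V1Pod(
    metadata=client.V1ObjectMeta(
        name="my-app-exec-1",  # hypothetical; real names embed the app ID
        labels={"spark-role": "executor"},
    ),
    spec=client.V1PodSpec(
        restart_policy="Never",  # executors run to completion, no restarts
        containers=[client.V1Container(name="executor", image="my-spark-image")],
    ),
)
v1.create_namespaced_pod(namespace="default", body=executor_pod)

# With dynamic allocation the driver deletes Pods it no longer needs; a
# Deployment would instead recreate them to match its replica count.
v1.delete_namespaced_pod(name="my-app-exec-1", namespace="default")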


Apply Kmeans in partitions

2019-01-30 Thread dimitris plakas
Hello everyone,

I have a dataframe with 5040 rows, split into 5 groups. A column called
"Group_Id" marks each row with a value from 0-4 indicating which group it
belongs to. I am trying to split my dataframe into 5 partitions and apply
KMeans to every partition. I have tried

rdd = mydataframe.rdd.mapPartitions(function, True)
test = KMeans.train(rdd, num_of_centers, "random")

but I get an error.

How can I apply KMeans to every partition?

Thank you in advance,
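
A sketch of an alternative that stays in the DataFrame API, using a grouped-map
pandas UDF (available since Spark 2.3): each group's rows arrive as a single
pandas DataFrame, so a local model can be fitted per group. It assumes
scikit-learn and pandas on the executors; the feature columns x and y are
hypothetical, since the post does not name them.

import pandas as pd
from sklearn.cluster import KMeans
from pyspark.sql.functions import pandas_udf, PandasUDFType

num_of_centers = 3  # illustrative

@pandas_udf("Group_Id long, x double, y double", PandasUDFType.GROUPED_MAP)
def centers_per_group(pdf):
    # Fit one local KMeans on this group's rows and return its centers.
    model = KMeans(n_clusters=num_of_centers).fit(pdf[["x", "y"]])
    out = pd.DataFrame(model.cluster_centers_, columns=["x", "y"])
    out.insert(0, "Group_Id", pdf["Group_Id"].iloc[0])
    return out

centers = mydataframe.groupBy("Group_Id").apply(centers_per_group)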


Re: unsubscribe

2019-01-30 Thread Aditya Gautam
On Tue, Jan 29, 2019, 10:14 AM Charles Nnamdi Akalugwu wrote:

> unsubscribe
>