Convert a line of String into columns

2019-10-01 Thread hamishberridge
I want to convert a line of String into a table. For instance, I want to convert the
following line

   ... # this is a line in a text file, with fields separated by whitespace

into the table

+------+------+------+-----+------+
| col1 | col2 | col3 | ... | col6 |
+------+------+------+-----+------+
| val1 | val2 | val3 | ... | val6 |
+------+------+------+-----+------+

The code looks like this:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .master("local")
  .appName("MyApp")
  .getOrCreate()
import spark.implicits._

val lines = spark.readStream.textFile("/tmp/data/")
val words = lines.as[String].flatMap(_.split(" "))
words.printSchema()

val query = words.
  writeStream.
  outputMode("append").
  format("console").
  start
query.awaitTermination()
But in fact this code only turns the line into a single column:

+-------+
| value |
+-------+
|  col1 |
|  col2 |
|  col3 |
|   ... |
|  col6 |
+-------+

How can I achieve the effect that I want?

Thanks!
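
One possible way to get separate columns instead of one row per token (a sketch only, assuming each line always has exactly six whitespace-separated fields) is to split the value column and project each array element into its own column:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession
  .builder
  .master("local")
  .appName("MyApp")
  .getOrCreate()

// Each streamed line arrives as a single string column named "value".
val lines = spark.readStream.textFile("/tmp/data/")

// Split on whitespace, then turn array elements into columns col1 .. col6.
val table = lines
  .withColumn("parts", split(col("value"), "\\s+"))
  .select((0 until 6).map(i => col("parts")(i).as(s"col${i + 1}")): _*)

val query = table.writeStream
  .outputMode("append")
  .format("console")
  .start()
query.awaitTermination()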



Re: [External Sender] Spark Executor pod not getting created on kubernetes cluster

2019-10-01 Thread manish gupta
kube-apiserver logs are not enabled. I will enable them, check, and get back
on this.

Regards
Manish Gupta

Re: [External Sender] Spark Executor pod not getting created on kubernetes cluster

2019-10-01 Thread Prudhvi Chennuru (CONT)
If you are passing the service account for executors as a Spark property, then
the executors will use the one you pass rather than the default service account.
Did you check the API server logs?
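
For reference, on Spark 2.4 the service account is typically supplied to spark-submit as a property; the values below are placeholders, not the actual submit command from this thread:

spark-submit \
  --master k8s://https://<api-server-host>:<port> \
  --deploy-mode cluster \
  --conf spark.kubernetes.namespace=<namespace> \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=<service-account> \
  --conf spark.kubernetes.container.image=<spark-image> \
  ...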

-- 
*Thanks,*
*Prudhvi Chennuru.*


Re: [External Sender] Spark Executor pod not getting created on kubernetes cluster

2019-10-01 Thread manish gupta
While launching the driver pod I am passing a service account which has a
cluster role with all the required permissions to create a new pod. So will the
driver pass the same credentials to the API server while creating the executor
pods, or will the executors be created with the default service account?

Regards
Manish Gupta



Re: [External Sender] Spark Executor pod not getting created on kubernetes cluster

2019-10-01 Thread Prudhvi Chennuru (CONT)
By default, executors use the default service account in the namespace where you
are creating the driver and executors, so I am guessing the executors don't
have permission to run on the cluster. If you check the kube-apiserver logs you
will see the issue. Try giving privileged access to the default service account in
the namespace where you are creating the executors; that should work.
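
A quick way to check and, if needed, grant access for the default service account (namespace and binding name below are placeholders):

# Does the default service account have permission to create pods?
kubectl auth can-i create pods \
  --as=system:serviceaccount:<namespace>:default -n <namespace>

# One way to grant it: bind the built-in "edit" ClusterRole in that namespace
kubectl create rolebinding spark-default-edit \
  --clusterrole=edit \
  --serviceaccount=<namespace>:default \
  -n <namespace>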


-- 
*Thanks,*
*Prudhvi Chennuru.*






Re: [External Sender] Spark Executor pod not getting created on kubernetes cluster

2019-10-01 Thread manish gupta
Hi Prudhvi

I can see this issue consistently. I am doing a POC wherein I am trying to
create a dynamic Spark cluster to run my job using spark-submit on
Kubernetes. On Minikube it works fine, but on RBAC-enabled Kubernetes it
fails to launch the executor pods. It is able to launch the driver pod, but I am
not sure why it cannot launch the executor pods even though it has ample
resources. I don't see any error message in the logs apart from the warning
message that I have provided above.
Not even a single executor pod is getting launched.

Regards
Manish Gupta
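
A few commands that can help surface why the executor pods never appear (namespace and pod name below are placeholders):

# Events usually show RBAC denials or quota problems for pod creation
kubectl get events -n <namespace> --sort-by=.metadata.creationTimestamp

# The driver pod's description and logs show whether it ever requested executors
kubectl describe pod <driver-pod-name> -n <namespace>
kubectl logs <driver-pod-name> -n <namespace>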



Re: [External Sender] Spark Executor pod not getting created on kubernetes cluster

2019-10-01 Thread Prudhvi Chennuru (CONT)
Hi Manish,

Are you seeing this issue consistently or sporadically? And
when you say the executors are not launched, is not even a single executor
created for that driver pod?

On Tue, Oct 1, 2019 at 1:43 AM manish gupta 
wrote:

> Hi Team
>
> I am trying to create a Spark cluster on Kubernetes with RBAC enabled
> using a spark-submit job. I am using Spark 2.4.1.
> spark-submit is able to launch the driver pod by contacting the Kubernetes API
> server, but the executor pods are not getting launched. I can see the below
> warning message in the driver pod logs.
>
>
> *19/09/27 10:16:01 INFO TaskSchedulerImpl: Adding task set 0.0 with 3 tasks*
> *19/09/27 10:16:16 WARN TaskSchedulerImpl: Initial job has not accepted
> any resources; check your cluster UI to ensure that workers are registered
> and have sufficient resources*
>
> I have faced this issue in standalone Spark clusters and resolved it, but I am
> not sure how to resolve it in Kubernetes. I have not set any ResourceQuota
> in the Kubernetes RBAC YAML file, and there is ample
> memory and CPU available for any new pod/container to be launched.
>
> Any leads/pointers to resolve this issue would be of great help.
>
> Thanks and Regards
> Manish Gupta
>


-- 
*Thanks,*
*Prudhvi Chennuru.*
