Convert a line of String into columns
I want to convert a line of String to a table. For instance, given the following line in a text file, separated by white space:

col1 col2 col3 ... col6

I want to turn it into a table:

+----+----+----+-----+----+
|col1|col2|col3| ... |col6|
+----+----+----+-----+----+
|val1|val2|val3| ... |val6|
+----+----+----+-----+----+

The code looks as below:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .master("local")
  .appName("MyApp")
  .getOrCreate()

import spark.implicits._

val lines = spark.readStream.textFile("/tmp/data/")
val words = lines.as[String].flatMap(_.split(" "))
words.printSchema()

val query = words
  .writeStream
  .outputMode("append")
  .format("console")
  .start

query.awaitTermination()

But in fact this code only turns the line into a single column, with one row per token:

+-------+
| value |
+-------+
|col1...|
|col2...|
|col3...|
| ...   |
| col6  |
+-------+

How can I achieve the effect that I want? Thanks.
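One way to get separate columns (a sketch, assuming a fixed count of six space-separated fields; the column names `col1`..`col6` follow the question): replace the `flatMap`, which explodes the tokens into separate rows, with a `map` that keeps each line as one array, then select every array element as its own named column.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession
  .builder
  .master("local")
  .appName("MyApp")
  .getOrCreate()
import spark.implicits._

val lines = spark.readStream.textFile("/tmp/data/")

// map (not flatMap) keeps one row per line, with the tokens in an array column.
val tokens = lines.as[String].map(_.split(" "))

// Pull each array element out as its own named column.
// The Dataset of arrays exposes a single column named "value".
val table = tokens.select(
  (0 until 6).map(i => col("value")(i).as(s"col${i + 1}")): _*
)

val query = table.writeStream
  .outputMode("append")
  .format("console")
  .start()

query.awaitTermination()
```

If the number of fields varies per line, a fixed `0 until 6` will produce nulls for missing positions; adjust the range to the widest expected row.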
Re: [External Sender] Spark Executor pod not getting created on kubernetes cluster
Kube-apiserver logs are not enabled. I will enable them, check, and get back on this.

Regards
Manish Gupta
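Short of the kube-apiserver logs, namespace events and an RBAC dry-run are often enough to see why executor pods never appear. A hedged sketch (the `default` namespace and service-account name are assumptions; substitute your own):

```shell
# List recent events in the namespace, oldest first; failed pod creations
# (e.g. "forbidden" RBAC errors or quota rejections) show up here.
kubectl get events --namespace default --sort-by=.metadata.creationTimestamp

# Ask the API server directly whether the executors' service account
# is allowed to create pods in that namespace.
kubectl auth can-i create pods \
  --as=system:serviceaccount:default:default \
  --namespace default
```

If `kubectl auth can-i` answers "no", the driver's pod-creation requests for executors are being rejected by RBAC rather than by a resource shortage.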
Re: [External Sender] Spark Executor pod not getting created on kubernetes cluster
If you are passing the service account for the executors as a Spark property, the executors will use the one you pass, not the default service account. Did you check the API server logs?

Thanks,
Prudhvi Chennuru.
Re: [External Sender] Spark Executor pod not getting created on kubernetes cluster
While launching the driver pod, I am passing a service account which has a cluster role and all the required permissions to create a new pod. So will the driver pass the same details to the API server while creating the executor pods, or will the executors be created with the default service account?

Regards
Manish Gupta
Re: [External Sender] Spark Executor pod not getting created on kubernetes cluster
By default, the executors use the default service account in the namespace where you create the driver and executors, so I am guessing the executors don't have access to run on the cluster. If you check the kube-apiserver logs you will know the issue. Try giving privileged access to the default service account in the namespace where you create the executors; it should work.

Thanks,
Prudhvi Chennuru.
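Granting the Spark pods a service account with pod-creation rights can be sketched as follows (the account name `spark`, the `default` namespace, and the API server address are assumptions, not values from this thread):

```shell
# Create a dedicated service account for Spark.
kubectl create serviceaccount spark --namespace default

# Bind it to the built-in "edit" cluster role in that namespace,
# which includes permission to create and delete pods.
kubectl create clusterrolebinding spark-role \
  --clusterrole=edit \
  --serviceaccount=default:spark

# Tell spark-submit to run the driver under that account; executors
# created by the driver then inherit the configured account.
spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  ...
```

`spark.kubernetes.authenticate.driver.serviceAccountName` is the documented property for this in Spark's Kubernetes mode; without it, the pods fall back to the namespace's default service account as described above.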
Re: [External Sender] Spark Executor pod not getting created on kubernetes cluster
Hi Prudhvi

I can see this issue consistently. I am doing a POC wherein I am trying to create a dynamic Spark cluster to run my job using spark-submit on Kubernetes. On Minikube it works fine, but on RBAC-enabled Kubernetes it fails to launch the executor pods. It is able to launch the driver pod, but I am not sure why it cannot launch the executor pods even though it has ample resources. I don't see any error message in the logs apart from the warning message that I have provided above. Not even a single executor pod is getting launched.

Regards
Manish Gupta
Re: [External Sender] Spark Executor pod not getting created on kubernetes cluster
Hi Manish,

Are you seeing this issue consistently or sporadically? And when you say executors are not launched, is not even a single executor created for that driver pod?

On Tue, Oct 1, 2019 at 1:43 AM manish gupta wrote:
> Hi Team
>
> I am trying to create a spark cluster on kubernetes with rbac enabled
> using a spark-submit job. I am using the spark-2.4.1 version.
> Spark submit is able to launch the driver pod by contacting the Kubernetes
> API server, but the executor pods are not getting launched. I can see the
> below warning message in the driver pod logs:
>
> 19/09/27 10:16:01 INFO TaskSchedulerImpl: Adding task set 0.0 with 3 tasks
> 19/09/27 10:16:16 WARN TaskSchedulerImpl: Initial job has not accepted
> any resources; check your cluster UI to ensure that workers are registered
> and have sufficient resources
>
> I have faced this issue in standalone spark clusters and resolved it, but
> I am not sure how to resolve it in Kubernetes. I have not given any
> ResourceQuota configuration in the kubernetes rbac yaml file, and there is
> ample memory and cpu available for any new pod/container to be launched.
>
> Any leads/pointers to resolve this issue would be of great help.
>
> Thanks and Regards
> Manish Gupta

--
Thanks,
Prudhvi Chennuru.

The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.