Re: [EXTERNAL] Urgent Help - Py Spark submit error
Hi,

Once again, let's start with the requirement: why are you trying to pass XML and JSON files to Spark instead of reading them in Spark? Generally, the files people pass with spark-submit are Python or JAR files.

Regards,
Gourav
Re: [EXTERNAL] Urgent Help - Py Spark submit error
Hi KhajaAsmath,

Client vs cluster: in client mode the driver runs on the machine from which you submit your job, whereas in cluster mode the driver runs on one of the worker nodes.

Since you use the conf file in the driver code, and in cluster mode the driver runs on one of the worker nodes, I think you need to pass the conf file to your driver. Use these options to pass it:

--files /appl/common/ftp/conf.json --conf spark.driver.extraJavaOptions="-Dconfig.file=conf.json"

And make sure you are able to access the file location from the worker nodes.

Regards,
Amit Joshi
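The advice above can be sketched in the application code as well. In YARN cluster mode, a file shipped with --files is localized into the container's working directory, so the driver can fall back to opening it by basename when the absolute client-side path does not exist. This is a minimal sketch, not code from the original script; `load_conf` is a hypothetical helper name.

```python
import json
import os

def load_conf(path):
    """Load a JSON config whether the driver runs in client or cluster mode.

    In client mode the absolute path on the submitting machine works as-is.
    In YARN cluster mode, a file shipped with --files is localized into the
    container's working directory, so we fall back to its basename there.
    (load_conf is an illustrative helper, not part of the original script.)
    """
    for candidate in (path, os.path.basename(path)):
        if os.path.exists(candidate):
            with open(candidate) as f:
                return json.load(f)
    raise FileNotFoundError(
        f"config found neither at {path!r} nor at {os.path.basename(path)!r}"
    )
```

Called before the SparkSession is built, e.g. `conf = load_conf("/appl/common/ftp/conf.json")`, the same script then works under both deploy modes, provided the file was listed in --files.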
Re: [EXTERNAL] Urgent Help - Py Spark submit error
Here is my updated spark-submit, still without any luck:

spark-submit --master yarn --deploy-mode cluster --files /appl/common/ftp/conf.json,/etc/hive/conf/hive-site.xml,/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml --num-executors 6 --executor-cores 3 --driver-cores 3 --driver-memory 7g --executor-memory 7g /appl/common/ftp/ftp_event_data.py /appl/common/ftp/conf.json 2021-05-10 7
Re: [EXTERNAL] Urgent Help - Py Spark submit error
Sorry, my bad, it did not resolve the issue. I still have the same issue; can anyone please guide me? I was still running as a client instead of a cluster.
Re: [EXTERNAL] Urgent Help - Py Spark submit error
You are right. It worked, but I still don't understand why I need to pass that to all executors.
Re: [EXTERNAL] Urgent Help - Py Spark submit error
I am using the JSON only to read properties before creating the Spark session. I don't know why we need to pass that to all executors.

On Fri, May 14, 2021 at 5:01 PM Longjiang.Yang wrote:
> Could you check whether this file is accessible in executors? (Is it
> in HDFS or in the client local FS?)
> /appl/common/ftp/conf.json
>
> *From: *KhajaAsmath Mohammed
> *Date: *Friday, May 14, 2021 at 4:50 PM
> *To: *"user @spark"
> *Subject: *[EXTERNAL] Urgent Help - Py Spark submit error
>
> /appl/common/ftp/conf.json
Urgent Help - Py Spark submit error
Hi,

I am having a weird situation where the below command works when the deploy mode is client and fails if it is cluster.

spark-submit --master yarn --deploy-mode client --files /etc/hive/conf/hive-site.xml,/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml --driver-memory 70g --num-executors 6 --executor-cores 3 --driver-cores 3 --driver-memory 7g --py-files /appl/common/ftp/ftp_event_data.py /appl/common/ftp/ftp_event_data.py /appl/common/ftp/conf.json 2021-05-10 7

21/05/14 17:34:39 INFO ApplicationMaster: Waiting for spark context initialization...
21/05/14 17:34:39 WARN SparkConf: The configuration key 'spark.yarn.executor.memoryOverhead' has been deprecated as of Spark 2.3 and may be removed in the future. Please use the new key 'spark.executor.memoryOverhead' instead.
21/05/14 17:34:39 ERROR ApplicationMaster: User application exited with status 1
21/05/14 17:34:39 INFO ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: User application exited with status 1)
21/05/14 17:34:39 ERROR ApplicationMaster: Uncaught exception: org.apache.spark.SparkException: Exception thrown in awaitResult:
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
        at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:447)
        at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:275)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:799)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:798)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:798)
        at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.SparkUserAppException: User application exited with 1
        at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:106)
        at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:667)
21/05/14 17:34:39 INFO ApplicationMaster: Deleting staging directory hdfs://dev-cbb-datalake/user/nifiuser/.sparkStaging/application_1620318563358_0046
21/05/14 17:34:41 INFO ShutdownHookManager: Shutdown hook called
For more detailed output, check the application tracking page: https://srvbigddvlsh115.us.dev.corp:8090/cluster/app/application_1620318563358_0046 Then click on links to logs of each attempt. Failing the application.
Exception in thread "main" org.apache.spark.SparkException: Application application_1620318563358_0046 finished with failed status
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1155)
        at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1603)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:926)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:935)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/05/14 17:34:42 INFO util.ShutdownHookManager: Shutdown hook called
21/05/14 17:34:42 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-28fa7d64-5a1d-42fb-865f-e9bb24854e7c
21/05/14 17:34:42 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-db93f731-d48a-4a7b-986f-e0a016bbd7f3

Thanks,
Asmath
Multiple destination single source
I have a single source of data, and the processing of records has to be directed to multiple destinations, i.e.:

1. Read the source data.
2. Based on a condition, route records to the following destinations:
   1. Kafka for error records.
   2. Success records matching a certain condition go to S3 bucket "A", folder "a".
   3. Success records matching a certain condition go to S3 bucket "A", folder "b".
   4. Success records matching a certain condition go to a different S3 bucket.

How can I achieve this in PySpark? Are there any resources on the design patterns, or commonly followed industry architectural patterns, for Apache Spark?

| Source   | Destination |
| -------- | ----------- |
| Single   | Single      |
| Multiple | Single      |
| Single   | Multiple    |
| Multiple | Multiple    |

--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
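The fan-out described above can be sketched as a routing table of predicates, one per destination. In PySpark each branch would typically become `df.filter(<predicate>).write` to the matching sink, with the source DataFrame cached so it is evaluated only once; the sketch below uses plain Python so the branching logic is visible, and the sink names, record fields, and predicates are all illustrative assumptions, not from the original question.

```python
# Illustrative routing table: (destination, predicate) pairs, checked in order.
# Destination names and record fields are hypothetical examples.
ROUTES = [
    ("kafka:errors",       lambda r: r["status"] == "error"),
    ("s3://A/a",           lambda r: r["status"] == "ok" and r["kind"] == "a"),
    ("s3://A/b",           lambda r: r["status"] == "ok" and r["kind"] == "b"),
    ("s3://other-bucket",  lambda r: r["status"] == "ok" and r["kind"] not in ("a", "b")),
]

def route(records):
    """Group records by the first matching destination."""
    out = {dest: [] for dest, _ in ROUTES}
    for record in records:
        for dest, pred in ROUTES:
            if pred(record):
                out[dest].append(record)
                break  # first matching route wins
    return out
```

In a real PySpark job the same table would drive one write per entry, e.g. `df.filter(pred_expr).write.format(...).save(dest)` inside a loop over the routes, after calling `df.cache()` on the source.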
Re: Thrift2 Server on Kubernetes?
Hi Meikel,

If you want to run Spark Thrift Server on Kubernetes, take a look at my blog post: https://itnext.io/hive-on-spark-in-kubernetes-115c8e9fa5c1

Cheers,
- Kidong Lee.
Thrift2 Server on Kubernetes?
Hi all,

We are migrating to k8s, and I wonder whether there are already "good practices" for running thrift2 on k8s?

Best,
Meikel