Re: How to handle auto-restart in Kubernetes Spark application

2021-05-02 Thread Ali Gouta
Hello, it is better to ask your question on the Spark operator GitHub rather than on this mailing list. As for the answer, try setting the restart policy to "type: Always". Best regards, Ali Gouta. On Sun, May 2, 2021 at 6:15 PM Sachit Murarka wrote: > Hi All, > > I am using Spark with Kubernetes. Can anyone please tell me

Re: Spark structured streaming + offset management in kafka + kafka headers

2021-04-04 Thread Ali Gouta
Thanks, Mich! Ali Gouta. On Sun, Apr 4, 2021 at 6:44 PM Mich Talebzadeh wrote: > Hi Ali, > > The old saying that one experiment is worth a hundred hypotheses still stands. > > As per the test-driven approach, have a go at it and see what comes out. Forum members including

Re: Spark structured streaming + offset management in kafka + kafka headers

2021-04-04 Thread Ali Gouta
Great, so SSS (Spark Structured Streaming) also provides an API, foreachBatch, that lets you handle each micro-batch as a DataFrame and, through it, the underlying RDD. Still, I am not sure this is good practice in general, right? Well, it depends on the use case anyway. Thank you so much for the hints! Best regards, Ali Gouta. On Sun, Apr 4, 2021

Re: Spark structured streaming + offset management in kafka + kafka headers

2021-04-04 Thread Ali Gouta
Thank you all for your answers. I will dig deeper into this new way of doing things and may consider leaving the old DStreams and using Structured Streaming instead. I hope that Structured Streaming + Spark on Kubernetes works well and that the combination is production ready. Best regards, Ali Gouta. On

Spark structured streaming + offset management in kafka + kafka headers

2021-04-03 Thread Ali Gouta
Spark structured streaming to the concerned consumer group? Best regards, Ali Gouta.

Re: Spark Streaming - Routing rdd to Executor based on Key

2021-03-09 Thread Ali Gouta
think this is the simplest way to achieve what you want to do. Best regards, Ali Gouta. On Tue, Mar 9, 2021 at 11:30 AM forece85 wrote: > We are doing batch processing using Spark Streaming with Kinesis with a batch size of 5 mins. We want to send all events with same eventI
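The reply above is truncated, so the approach it recommends is not visible here. For reference, a common way to keep all events with the same key together in Spark is to key the RDD by eventId and call partitionBy with a HashPartitioner. Every record with the same key then lands in the same partition and is processed by a single task, although Spark does not promise which executor runs that task from one batch to the next. The Spark-free Scala sketch below only illustrates the assignment rule HashPartitioner applies, a non-negative modulo of the key's hashCode. The Event type and the HashRouting, partitionFor and route names are hypothetical, chosen for illustration.

```scala
// Spark-free sketch of hash partitioning by key. It mirrors the rule used by
// Spark's HashPartitioner: null keys go to partition 0, other keys go to
// a non-negative modulo of key.hashCode.
object HashRouting {

  // Hypothetical event type for illustration; not taken from the thread.
  final case class Event(eventId: String, payload: String)

  // Non-negative modulo, so negative hash codes still map into [0, mod).
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Partition index for a key, given the number of partitions.
  def partitionFor(key: Any, numPartitions: Int): Int = key match {
    case null => 0
    case k    => nonNegativeMod(k.hashCode, numPartitions)
  }

  // Groups events by the partition their eventId hashes to, so every event
  // sharing an eventId ends up in the same group.
  def route(events: Seq[Event], numPartitions: Int): Map[Int, Seq[Event]] =
    events.groupBy(e => partitionFor(e.eventId, numPartitions))

  def main(args: Array[String]): Unit = {
    val events = Seq(
      Event("a", "1"), Event("b", "2"), Event("a", "3"), Event("c", "4")
    )
    // Print each partition with the events assigned to it.
    route(events, 3).toSeq.sortBy(_._1).foreach { case (p, evs) =>
      val shown = evs.map(e => s"${e.eventId}:${e.payload}").mkString(", ")
      println(s"partition $p -> $shown")
    }
  }
}
```

Because the partition depends only on the key's hash and the partition count, changing the number of partitions between batches also changes where a given key lands.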

Re: Spark Streaming Memory

2020-05-17 Thread Ali Gouta
The Spark UI is misleading in Spark 2.4.4; moving to Spark 2.4.5 fixed it for me. Beyond that, your problem is likely somewhere else, probably related to memory consumption, but not the figures you see in the UI. Best regards, Ali Gouta. On Sun, May 17, 2020 at 7:36 PM András Kolbert wrote: > Hi,

Re: spark on k8s - can driver and executor have separate checkpoint location?

2020-05-16 Thread Ali Gouta
Then use pod anti-affinity to make sure they do not run on the same node. You can achieve this by running an NFS filesystem and then creating a PV/PVC that mounts that shared file system. The persistentVolumeClaim defined in your YAML should reference, via its claimName, the PVC you created. Best regards, Ali Gouta

Re: Spark submit on yarn does not return with exit code 1 on exception

2017-02-03 Thread Ali Gouta
about it. Ali Gouta. On 3 Feb 2017 22:24, "Jacek Laskowski" <ja...@japila.pl> wrote: Hi, an interesting case. You don't use Spark resources whatsoever. Creating a SparkConf does not use YARN... yet. I think any run mode would have the same effect. So, although spark-submit cou

Re: Copying all Hive tables from Prod to UAT

2016-04-08 Thread Ali Gouta
For Hive, you may use Sqoop to achieve this. In my opinion, you could also run a Spark job to do it. On 9 Apr 2016 00:25, "Ashok Kumar" wrote: Hi, does anyone have suggestions on how to create and copy Hive and Spark tables from Production to UAT? One way would be to

Re: Spark Streaming - print accumulators value every period as logs

2015-12-25 Thread Ali Gouta
Something like stream.foreachRDD(rdd => { rdd.collect(); println(accum.value) }) should answer your question: the function passed to foreachRDD runs on the driver once per batch, so the accumulator value gets printed at each batch interval. Ali Gouta. On 25 Dec 2015 04:22, "Roberto Coluccio" <roberto.coluc...@gmail.com> wrote: > Hello, > > I have a

Re: How do I link JavaEsSpark.saveToEs() to a sparkConf?

2015-12-14 Thread Ali Gouta
eRDD, "foo"); That's it. Lastly, be careful when defining the settings in your "conf". For instance, you may end up replacing localhost with the real IP address of your Elasticsearch node. Ali Gouta. On Mon, Dec 14, 2015 at 1:52 PM, Spark Enthusiast <sparkenthusi...@yaho

Re: Replaying an RDD in spark streaming to update an accumulator

2015-12-10 Thread Ali Gouta
Indeed, you are right! I felt like I was missing or misunderstanding something. Thank you so much! Ali Gouta. On Thu, Dec 10, 2015 at 10:04 PM, Cody Koeninger <c...@koeninger.org> wrote: > I'm a little confused as to why you have fake events rather than just doin