Re: Watch "Airbus makes more of the sky with Spark - Jesse Anderson & Hassene Ben Salem" on YouTube

2020-05-03 Thread Fuo Bol
Why did you remove the email zahidr1...@gmail.com following this query? The query was commercially hostile but based on sound research. The two responses were then sent to a removed email account. The two accounts responded and didn't agree with you. > Note that there are ways to detect

Re: [Structured Streaming] multiple queries in one application

2020-05-03 Thread lec ssmi
For example, put the generated queries into a list and start each one, then call awaitTermination() on the last one. On Fri, May 1, 2020 at 10:32 AM, Abhisheks wrote: > I hope you are using the Query object that is returned by the Structured > streaming, right? > Returned object contains a lot of
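The suggestion above could be sketched roughly as follows. This is an illustration only, not code from the thread: `start_all_and_wait` is a hypothetical helper, and `StubWriter` / `StubQuery` are stand-ins for PySpark's `DataStreamWriter` (whose `start()` returns a `StreamingQuery`), so the sketch runs without a Spark installation.

```python
from typing import Iterable, List


class StubQuery:
    """Stand-in for a StreamingQuery; records whether it was awaited."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.awaited = False

    def awaitTermination(self) -> None:
        # A real StreamingQuery would block here until the query stops.
        self.awaited = True


class StubWriter:
    """Stand-in for a DataStreamWriter; start() returns a query handle."""

    def __init__(self, name: str) -> None:
        self.name = name

    def start(self) -> StubQuery:
        return StubQuery(self.name)


def start_all_and_wait(writers: Iterable) -> List:
    """Start every query, then block on the last one (hypothetical helper)."""
    queries = [w.start() for w in writers]  # start() itself does not block
    if queries:
        queries[-1].awaitTermination()
    return queries


queries = start_all_and_wait([StubWriter("q1"), StubWriter("q2"), StubWriter("q3")])
print([(q.name, q.awaited) for q in queries])
```

One trade-off of waiting only on the last query is that the driver keeps blocking there even if an earlier query has already failed. In PySpark, `spark.streams.awaitAnyTermination()` is an alternative that returns as soon as any active query terminates.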

Re: Watch "Airbus makes more of the sky with Spark - Jesse Anderson & Hassene Ben Salem" on YouTube

2020-05-03 Thread Sean Owen
It was not removed because of this e-mail, but because of many other spam and inappropriate messages from this and sock-puppet accounts. This one is IMHO off-topic, however. Note that there are ways to detect when someone is signing up a sock puppet account, and mods will ban both if so. On Sun, May 3,

Re: Good idea to do multi-threading in spark job?

2020-05-03 Thread Sean Owen
Spark will by default assume each task needs 1 CPU. On an executor with 16 cores and 16 slots, you'd schedule 16 tasks. If each is using 4 cores, then 64 threads are trying to run. If you're CPU-bound, that could slow things down. But to the extent some of the tasks take some time blocking on I/O, it
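The arithmetic in this reply can be written out as a small sketch. The helper name `runnable_threads` and its parameters are hypothetical and chosen for illustration; the only Spark setting referenced is `spark.task.cpus`, which defaults to 1 and tells the scheduler how many cores to reserve per task.

```python
def runnable_threads(executor_cores: int, task_cpus: int, threads_per_task: int) -> int:
    """Threads competing for CPU on one executor (hypothetical helper).

    executor_cores:   cores available to the executor
    task_cpus:        cores the scheduler reserves per task (spark.task.cpus)
    threads_per_task: threads each task actually spawns in user code
    """
    slots = executor_cores // task_cpus  # tasks scheduled concurrently
    return slots * threads_per_task


executor_cores = 16

# Default scheduling: 1 CPU assumed per task, but each task runs 4 threads.
default_threads = runnable_threads(executor_cores, task_cpus=1, threads_per_task=4)
print(default_threads, default_threads / executor_cores)

# Telling the scheduler each task needs 4 CPUs keeps threads equal to cores.
matched_threads = runnable_threads(executor_cores, task_cpus=4, threads_per_task=4)
print(matched_threads, matched_threads / executor_cores)
```

With the defaults, the executor is oversubscribed by a factor of four, which matters mainly for CPU-bound work. Whether raising `spark.task.cpus` is the better choice depends on how much of each task's time is spent blocked on I/O.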

Watch "Airbus makes more of the sky with Spark - Jesse Anderson & Hassene Ben Salem" on YouTube

2020-05-03 Thread Fuo Bol
@Sean Owen Why did you remove the email zahidr1...@gmail.com following this query? The two responses were then sent to a removed email account. > -- Forwarded message - > From: > Date: Sat, 25 Apr 2020, 19:40 > Subject: RE: Watch "Airbus makes more of the sky with Spark -

Good idea to do multi-threading in spark job?

2020-05-03 Thread Ruijing Li
Hi all, We have a spark job (spark 2.4.4, hadoop 2.7, scala 2.11.12) where we use semaphores / parallel collections within our spark job. We definitely notice a huge speedup in our job from doing this, but were wondering if this could cause any unintended side effects? Particularly I’m worried

Re: [spark streaming] checkpoint location feature for batch processing

2020-05-03 Thread Jungtaek Lim
Replied inline: On Sun, May 3, 2020 at 6:25 PM Magnus Nilsson wrote: > Thank you, so that would mean Spark gets the current latest offset(s) when > the trigger fires and then processes all available messages in the topic up to > and including that offset as long as maxOffsetsPerTrigger is the

Re: [spark streaming] checkpoint location feature for batch processing

2020-05-03 Thread Magnus Nilsson
Thank you, so that would mean Spark gets the current latest offset(s) when the trigger fires and then processes all available messages in the topic up to and including that offset as long as maxOffsetsPerTrigger is the default of None (or large enough to handle all available messages). I think the
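The understanding stated in this message can be expressed as a toy model. It reflects only the behavior as the poster describes it, not Spark's actual implementation: `plan_batch` is a hypothetical function, a single partition is assumed, and the batch is represented as a half-open range `[start, end)` for simplicity.

```python
from typing import Optional, Tuple


def plan_batch(
    start_offset: int,
    latest_offset: int,
    max_offsets_per_trigger: Optional[int] = None,
) -> Tuple[int, int]:
    """Return the (start, end) offsets one trigger would cover (toy model).

    With no limit, the batch runs up to the latest offset seen when the
    trigger fires. With a limit, the batch is capped at that many offsets.
    """
    if max_offsets_per_trigger is None:
        end_offset = latest_offset
    else:
        end_offset = min(latest_offset, start_offset + max_offsets_per_trigger)
    return start_offset, end_offset


# No limit: everything available at trigger time goes into one batch.
print(plan_batch(100, 250))

# Limit of 50: the batch is capped and the remainder waits for later triggers.
print(plan_batch(100, 250, max_offsets_per_trigger=50))
```

A limit larger than the backlog behaves like no limit at all, which matches the "large enough to handle all available messages" case in the message.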

Unsubscribe

2020-05-03 Thread Bibudh Lahiri
-- Bibudh Lahiri, PhD Principal Researcher, Artificial Intelligence Accenture Services Pvt Ltd Unitech Infospace, DLF SEZ Sector 21, Gurugram, Haryana 122016