Wanted to update everyone on this, thanks for all the responses. I was able to solve this issue after doing a jstack dump - I found out this was the cause
https://github.com/scala/bug/issues/10436

Lesson learned - I'll use a safer JSON parser like json4s, which should
be thread-safe.

On Fri, Apr 24, 2020 at 4:34 AM Waleed Fateem <waleed.fat...@gmail.com> wrote:

> Are you running this in local mode? If not, are you even sure that the
> hanging is occurring on the driver's side?
>
> Did you check the Spark UI to see if there is a straggler task or not?
> If you do have a straggler/hanging task, and in case this is not an
> application running in local mode, then you need to get the Java thread
> dump of the executor's JVM process. Once you do, you'll want to review
> the "Executor task launch worker for task XYZ" thread, where XYZ is some
> integer value representing the task ID that was launched on that
> executor. In case you're running this in local mode, that thread would
> be located in the same Java thread dump that you have already collected.
>
> On Tue, Apr 21, 2020 at 9:51 PM Ruijing Li <liruijin...@gmail.com> wrote:
>
>> I apologize, but I cannot share it, even if it is just typical Spark
>> libraries. I definitely understand that limits debugging help, but I
>> wanted to understand if anyone has encountered a similar issue.
>>
>> On Tue, Apr 21, 2020 at 7:12 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> If there are no third party libraries in the dump, then why not share
>>> the thread dump? (I mean, the output of jstack)
>>>
>>> A stack trace would be more helpful to find which thing acquired the
>>> lock and which other things are waiting to acquire it, if we suspect
>>> deadlock.
>>>
>>> On Wed, Apr 22, 2020 at 2:38 AM Ruijing Li <liruijin...@gmail.com>
>>> wrote:
>>>
>>>> After refreshing a couple of times, I notice the lock is being
>>>> swapped between these 3.
>>>> The other 2 will be blocked by whoever gets this lock, in a cycle
>>>> of 160 has lock -> 161 -> 159 -> 160.
>>>>
>>>> On Tue, Apr 21, 2020 at 10:33 AM Ruijing Li <liruijin...@gmail.com>
>>>> wrote:
>>>>
>>>>> In the thread dump, I do see this:
>>>>> - SparkUI-160-acceptor-id-ServerConnector@id(HTTP/1.1) | RUNNABLE |
>>>>> Monitor
>>>>> - SparkUI-161-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED |
>>>>> Blocked by Thread(Some(160)) Lock
>>>>> - SparkUI-159-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED |
>>>>> Blocked by Thread(Some(160)) Lock
>>>>>
>>>>> Could the fact that 160 has the monitor but is not running be
>>>>> causing a deadlock preventing the job from finishing?
>>>>>
>>>>> I do see my Finalizer and main method are waiting. I don't see any
>>>>> other threads from 3rd party libraries or my code in the dump. I do
>>>>> see the Spark context cleaner is in timed waiting.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Tue, Apr 21, 2020 at 9:58 AM Ruijing Li <liruijin...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Strangely enough, I found an old issue that is the exact same
>>>>>> issue as mine:
>>>>>>
>>>>>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-18343
>>>>>>
>>>>>> However, I'm using Spark 2.4.4, so the issue should have been
>>>>>> solved by now.
>>>>>>
>>>>>> Like the user in the JIRA issue I am using Mesos, but I am reading
>>>>>> from Oracle instead of writing to Cassandra and S3.
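The reasoning here - that a true deadlock needs a cycle in the waits-for relation, whereas "161 and 159 both blocked by a runnable 160" is contention rather than deadlock - can be sketched with a small hypothetical helper. The function and the thread-id data below are illustrative only, not taken from the actual dump:

```python
def find_waits_for_cycle(waits_for):
    """Detect a cycle in a waits-for graph.

    waits_for maps a thread id to the id of the thread holding the lock
    it is blocked on. Returns the threads forming a cycle, or None.
    """
    for start in waits_for:
        seen = []
        cur = start
        while cur in waits_for:
            if cur in seen:
                # We revisited a node on the current walk: a cycle.
                return seen[seen.index(cur):]
            seen.append(cur)
            cur = waits_for[cur]
    return None

# The snapshot discussed above: 161 and 159 both wait on 160, which is
# RUNNABLE and waits on nobody. No cycle, so this alone is contention.
print(find_waits_for_cycle({161: 160, 159: 160}))  # None

# A genuine deadlock would close the loop:
print(find_waits_for_cycle({160: 161, 161: 159, 159: 160}))
# [160, 161, 159] (order follows dict insertion)
```

If the lock merely rotates among the three acceptors over time, as described, each individual snapshot shows a star (one holder, two waiters), not a cycle - which is consistent with slow progress rather than a classic deadlock.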
>>>>>>
>>>>>> On Thu, Apr 16, 2020 at 1:54 AM ZHANG Wei <wezh...@outlook.com>
>>>>>> wrote:
>>>>>>
>>>>>>> The Thread dump result table of the Spark UI can provide some
>>>>>>> clues to find out thread lock issues, such as:
>>>>>>>
>>>>>>> Thread ID | Thread Name                  | Thread State | Thread Locks
>>>>>>> 13        | NonBlockingInputStreamThread | WAITING      | Blocked by Thread Some(48) Lock(jline.internal.NonBlockingInputStream@103008951)
>>>>>>> 48        | Thread-16                    | RUNNABLE     | Monitor(jline.internal.NonBlockingInputStream@103008951)
>>>>>>>
>>>>>>> And each thread row shows the call stack when clicked; then you
>>>>>>> can check the root cause of holding locks, like this (Thread 48 of
>>>>>>> the above):
>>>>>>>
>>>>>>> org.fusesource.jansi.internal.Kernel32.ReadConsoleInputW(Native Method)
>>>>>>> org.fusesource.jansi.internal.Kernel32.readConsoleInputHelper(Kernel32.java:811)
>>>>>>> org.fusesource.jansi.internal.Kernel32.readConsoleKeyInput(Kernel32.java:842)
>>>>>>> org.fusesource.jansi.internal.WindowsSupport.readConsoleInput(WindowsSupport.java:97)
>>>>>>> jline.WindowsTerminal.readConsoleInput(WindowsTerminal.java:222)
>>>>>>> <snip...>
>>>>>>>
>>>>>>> Hope it can help you.
>>>>>>>
>>>>>>> --
>>>>>>> Cheers,
>>>>>>> -z
>>>>>>>
>>>>>>> On Thu, 16 Apr 2020 16:36:42 +0900
>>>>>>> Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>>>>
>>>>>>> > Do a thread dump continuously, per specific period (like 1s),
>>>>>>> > and see the change of stack / lock for each thread. (This is not
>>>>>>> > easy to do in the UI, so doing it manually may be the only
>>>>>>> > option. Not sure whether the Spark UI provides the same; I
>>>>>>> > haven't used it at all.)
>>>>>>> >
>>>>>>> > It will tell which thread is being blocked (even if it's shown
>>>>>>> > as running) and which point to look at.
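The kind of dump analysis described above can be sketched as a short script. This is a rough illustration against the usual HotSpot jstack text format; the thread names and lock address in the sample dump are made up, and a real dump would of course have many more threads:

```python
import re

def blocked_threads(jstack_output: str):
    """Scan jstack text for BLOCKED threads and the lock each waits on.

    Returns a list of (thread_name, lock_id) tuples. Assumes the common
    HotSpot layout: a quoted thread name line, then a
    'java.lang.Thread.State:' line, then '- waiting to lock <addr>'.
    """
    results = []
    name = None
    state = None
    for line in jstack_output.splitlines():
        m = re.match(r'^"([^"]+)"', line)
        if m:
            # Start of a new thread section; reset the state.
            name, state = m.group(1), None
            continue
        if "java.lang.Thread.State:" in line:
            state = line.split("java.lang.Thread.State:")[1].strip().split()[0]
            continue
        lock = re.search(r"waiting to lock <([0-9a-fx]+)>", line)
        if lock and state == "BLOCKED":
            results.append((name, lock.group(1)))
    return results

dump = '''"SparkUI-160" #160 prio=5
   java.lang.Thread.State: RUNNABLE
        - locked <0x00000000f0a1b2c3> (a java.lang.Object)

"SparkUI-161" #161 prio=5
   java.lang.Thread.State: BLOCKED (on object monitor)
        - waiting to lock <0x00000000f0a1b2c3> (a java.lang.Object)
'''
print(blocked_threads(dump))  # [('SparkUI-161', '0x00000000f0a1b2c3')]
```

Running such a filter over dumps taken every second or so, as suggested above, makes it easy to spot threads that stay BLOCKED on the same monitor across snapshots.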
>>>>>>> >
>>>>>>> > On Thu, Apr 16, 2020 at 4:29 PM Ruijing Li <liruijin...@gmail.com>
>>>>>>> > wrote:
>>>>>>> >
>>>>>>> > > Once I do a thread dump, what should I be looking for to tell
>>>>>>> > > where it is hanging? I'm seeing a lot of timed_waiting and
>>>>>>> > > waiting on the driver. The driver is also being blocked by the
>>>>>>> > > Spark UI. If there are no tasks, is there a point to doing a
>>>>>>> > > thread dump of the executors?
>>>>>>> > >
>>>>>>> > > On Tue, Apr 14, 2020 at 4:49 AM Gabor Somogyi <
>>>>>>> > > gabor.g.somo...@gmail.com> wrote:
>>>>>>> > >
>>>>>>> > >> The simplest way is to do a thread dump, which doesn't
>>>>>>> > >> require any fancy tool (it's available on the Spark UI).
>>>>>>> > >> Without a thread dump it's hard to say anything...
>>>>>>> > >>
>>>>>>> > >> On Tue, Apr 14, 2020 at 11:32 AM jane thorpe
>>>>>>> > >> <janethor...@aol.com.invalid> wrote:
>>>>>>> > >>
>>>>>>> > >>> Here is another tool I use, Logic Analyser 7:55
>>>>>>> > >>> https://youtu.be/LnzuMJLZRdU
>>>>>>> > >>>
>>>>>>> > >>> You could take some suggestions for improving performance
>>>>>>> > >>> queries:
>>>>>>> > >>> https://dzone.com/articles/why-you-should-not-use-select-in-sql-query-1
>>>>>>> > >>>
>>>>>>> > >>> Jane thorpe
>>>>>>> > >>> janethor...@aol.com
>>>>>>> > >>>
>>>>>>> > >>> -----Original Message-----
>>>>>>> > >>> From: jane thorpe <janethor...@aol.com.INVALID>
>>>>>>> > >>> To: janethorpe1 <janethor...@aol.com>; mich.talebzadeh <
>>>>>>> > >>> mich.talebza...@gmail.com>; liruijing09 <liruijin...@gmail.com>;
>>>>>>> > >>> user <user@spark.apache.org>
>>>>>>> > >>> Sent: Mon, 13 Apr 2020 8:32
>>>>>>> > >>> Subject: Re: Spark hangs while reading from jdbc - does
>>>>>>> > >>> nothing Removing Guess work from trouble shooting
>>>>>>> > >>>
>>>>>>> > >>> This tool may be useful for you to troubleshoot your
>>>>>>> > >>> problems away.
>>>>>>> > >>>
>>>>>>> > >>> https://www.javacodegeeks.com/2020/04/simplifying-apm-remove-the-guesswork-from-troubleshooting.html
>>>>>>> > >>>
>>>>>>> > >>> "APM tools typically use a waterfall-type view to show the
>>>>>>> > >>> blocking time of different components cascading through the
>>>>>>> > >>> control flow within an application.
>>>>>>> > >>> These types of visualizations are useful, and AppOptics has
>>>>>>> > >>> them, but they can be difficult to understand for those of
>>>>>>> > >>> us without a PhD."
>>>>>>> > >>>
>>>>>>> > >>> Especially helpful if you want to understand through
>>>>>>> > >>> visualisation and you do not have a PhD.
>>>>>>> > >>>
>>>>>>> > >>> Jane thorpe
>>>>>>> > >>> janethor...@aol.com
>>>>>>> > >>>
>>>>>>> > >>> -----Original Message-----
>>>>>>> > >>> From: jane thorpe <janethor...@aol.com.INVALID>
>>>>>>> > >>> To: mich.talebzadeh <mich.talebza...@gmail.com>; liruijing09 <
>>>>>>> > >>> liruijin...@gmail.com>; user <user@spark.apache.org>
>>>>>>> > >>> CC: user <user@spark.apache.org>
>>>>>>> > >>> Sent: Sun, 12 Apr 2020 4:35
>>>>>>> > >>> Subject: Re: Spark hangs while reading from jdbc - does nothing
>>>>>>> > >>>
>>>>>>> > >>> You seem to be implying the error is intermittent.
>>>>>>> > >>> You seem to be implying data is being ingested via JDBC, so
>>>>>>> > >>> the connection has proven itself to be working unless no
>>>>>>> > >>> data is arriving from the JDBC channel at all. If no data is
>>>>>>> > >>> arriving, then one could say it could be the JDBC.
>>>>>>> > >>> If the error is intermittent, then it is likely a resource
>>>>>>> > >>> involved in processing is filling to capacity.
>>>>>>> > >>> Try reducing the data ingestion volume and see if that
>>>>>>> > >>> completes, then increase the data ingested incrementally.
>>>>>>> > >>> I assume you have run the job on a small amount of data, so
>>>>>>> > >>> you have completed your prototype stage successfully.
>>>>>>> > >>>
>>>>>>> > >>> ------------------------------
>>>>>>> > >>> On Saturday, 11 April 2020 Mich Talebzadeh <
>>>>>>> > >>> mich.talebza...@gmail.com> wrote:
>>>>>>> > >>> Hi,
>>>>>>> > >>>
>>>>>>> > >>> Have you checked your JDBC connections from Spark to Oracle?
>>>>>>> > >>> What is Oracle saying? Is it doing anything or hanging?
>>>>>>> > >>>
>>>>>>> > >>> set pagesize 9999
>>>>>>> > >>> set linesize 140
>>>>>>> > >>> set heading off
>>>>>>> > >>> select SUBSTR(name,1,8) || ' sessions as on '||TO_CHAR(CURRENT_DATE, 'MON DD YYYY HH:MI AM') from v$database;
>>>>>>> > >>> set heading on
>>>>>>> > >>> column spid heading "OS PID" format a6
>>>>>>> > >>> column process format a13 heading "Client ProcID"
>>>>>>> > >>> column username format a15
>>>>>>> > >>> column sid format 999
>>>>>>> > >>> column serial# format 99999
>>>>>>> > >>> column STATUS format a3 HEADING 'ACT'
>>>>>>> > >>> column last format 9,999.99
>>>>>>> > >>> column TotGets format 999,999,999,999 HEADING 'Logical I/O'
>>>>>>> > >>> column phyRds format 999,999,999 HEADING 'Physical I/O'
>>>>>>> > >>> column total_memory format 999,999,999 HEADING 'MEM/KB'
>>>>>>> > >>> --
>>>>>>> > >>> SELECT
>>>>>>> > >>>       substr(a.username,1,15) "LOGIN"
>>>>>>> > >>>     , substr(a.sid,1,5) || ','||substr(a.serial#,1,5) AS "SID/serial#"
>>>>>>> > >>>     , TO_CHAR(a.logon_time, 'DD/MM HH:MI') "LOGGED IN SINCE"
>>>>>>> > >>>     , substr(a.machine,1,10) HOST
>>>>>>> > >>>     , substr(p.username,1,8)||'/'||substr(p.spid,1,5) "OS PID"
>>>>>>> > >>>     , substr(a.osuser,1,8)||'/'||substr(a.process,1,5) "Client PID"
>>>>>>> > >>>     , substr(a.program,1,15) PROGRAM
>>>>>>> > >>>     --,ROUND((CURRENT_DATE-a.logon_time)*24) AS "Logged/Hours"
>>>>>>> > >>>     , (
>>>>>>> > >>>         select round(sum(ss.value)/1024) from
>>>>>>> > >>>         v$sesstat ss, v$statname sn
>>>>>>> > >>>         where ss.sid = a.sid and
>>>>>>> > >>>         sn.statistic# = ss.statistic# and
>>>>>>> > >>>         -- sn.name in ('session pga memory')
>>>>>>> > >>>         sn.name in ('session pga memory','session uga memory')
>>>>>>> > >>>       ) AS total_memory
>>>>>>> > >>>     , (b.block_gets + b.consistent_gets) TotGets
>>>>>>> > >>>     , b.physical_reads phyRds
>>>>>>> > >>>     , decode(a.status, 'ACTIVE', 'Y','INACTIVE', 'N') STATUS
>>>>>>> > >>>     , CASE WHEN a.sid in (select sid from v$mystat where rownum = 1)
>>>>>>> > >>>       THEN '<-- YOU' ELSE ' ' END "INFO"
>>>>>>> > >>> FROM
>>>>>>> > >>>     v$process p
>>>>>>> > >>>    ,v$session a
>>>>>>> > >>>    ,v$sess_io b
>>>>>>> > >>> WHERE
>>>>>>> > >>>     a.paddr = p.addr
>>>>>>> > >>> AND p.background IS NULL
>>>>>>> > >>> --AND a.sid NOT IN (select sid from v$mystat where rownum = 1)
>>>>>>> > >>> AND a.sid = b.sid
>>>>>>> > >>> AND a.username is not null
>>>>>>> > >>> --AND (a.last_call_et < 3600 or a.status = 'ACTIVE')
>>>>>>> > >>> --AND CURRENT_DATE - logon_time > 0
>>>>>>> > >>> --AND a.sid NOT IN ( select sid from v$mystat where rownum=1) -- exclude me
>>>>>>> > >>> --AND (b.block_gets + b.consistent_gets) > 0
>>>>>>> > >>> ORDER BY a.username;
>>>>>>> > >>> exit
>>>>>>> > >>>
>>>>>>> > >>> HTH
>>>>>>> > >>>
>>>>>>> > >>> Dr Mich Talebzadeh
>>>>>>> > >>>
>>>>>>> > >>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>> > >>>
>>>>>>> > >>> http://talebzadehmich.wordpress.com
>>>>>>> > >>>
>>>>>>> > >>> *Disclaimer:* Use it at your own risk.
>>>>>>> > >>> Any and all responsibility for any loss, damage or
>>>>>>> > >>> destruction of data or any other property which may arise
>>>>>>> > >>> from relying on this email's technical content is explicitly
>>>>>>> > >>> disclaimed. The author will in no case be liable for any
>>>>>>> > >>> monetary damages arising from such loss, damage or
>>>>>>> > >>> destruction.
>>>>>>> > >>>
>>>>>>> > >>> On Fri, 10 Apr 2020 at 17:37, Ruijing Li <
>>>>>>> > >>> liruijin...@gmail.com> wrote:
>>>>>>> > >>>
>>>>>>> > >>> Hi all,
>>>>>>> > >>>
>>>>>>> > >>> I am on Spark 2.4.4, using Scala 2.11.12, and running in
>>>>>>> > >>> cluster mode on Mesos. I am ingesting from an Oracle
>>>>>>> > >>> database using spark.read.jdbc. I am seeing a strange issue
>>>>>>> > >>> where Spark just hangs and does nothing, not starting any
>>>>>>> > >>> new tasks. Normally this job finishes in 30 stages, but
>>>>>>> > >>> sometimes it stops at 29 completed stages and doesn't start
>>>>>>> > >>> the last stage. The Spark job is idling and there is no
>>>>>>> > >>> pending or active task. What could be the problem? Thanks.
>>>>>>> > >>> --
>>>>>>> > >>> Cheers,
>>>>>>> > >>> Ruijing Li
>>>>>>> > >>>
>>>>>>> > > --
>>>>>>> > > Cheers,
>>>>>>> > > Ruijing Li
>>>>>>> > >
>>>>>>>
>>>>>> --
>>>>>> Cheers,
>>>>>> Ruijing Li
>>>>>>
>>>>> --
>>>>> Cheers,
>>>>> Ruijing Li
>>>>>
>>>> --
>>>> Cheers,
>>>> Ruijing Li
>>>>
>>> --
>> Cheers,
>> Ruijing Li
>>
> --
Cheers,
Ruijing Li