After refreshing a couple of times, I notice the lock is being swapped between these three threads. The other two are blocked by whichever thread holds the lock, in a cycle: 160 holds the lock -> 161 -> 159 -> 160.
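For what it's worth, a hand-off like 160 -> 161 -> 159 -> 160 is also what healthy Jetty acceptor threads look like as they take turns on the accept lock, so by itself it isn't proof of a deadlock. The JDK can report true monitor deadlocks directly; here is a minimal, self-contained sketch (plain JDK, nothing Spark-specific assumed) you could run in the driver or trigger via a debug endpoint:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockCheck {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() returns null when no threads are
        // deadlocked on object monitors or ownable synchronizers.
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) {
            System.out.println("no monitor deadlock detected");
        } else {
            for (ThreadInfo ti : mx.getThreadInfo(ids, Integer.MAX_VALUE)) {
                System.out.printf("%s is blocked on %s held by %s%n",
                        ti.getThreadName(), ti.getLockName(), ti.getLockOwnerName());
            }
        }
    }
}
```

If this prints nothing but "no monitor deadlock detected" while the job is stuck, the hang is more likely a wait/notify or network stall than a classic lock cycle.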
On Tue, Apr 21, 2020 at 10:33 AM Ruijing Li <liruijin...@gmail.com> wrote:

> In the thread dump, I do see this:
>
> - SparkUI-160-acceptor-id-ServerConnector@id(HTTP/1.1) | RUNNABLE | Monitor
> - SparkUI-161-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock
> - SparkUI-159-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock
>
> Could the fact that 160 has the monitor but is not running be causing a
> deadlock preventing the job from finishing?
>
> I do see my Finalizer and main method are waiting. I don't see any other
> threads from third-party libraries or my code in the dump. I do see the
> Spark context cleaner is in timed waiting.
>
> Thanks
>
> On Tue, Apr 21, 2020 at 9:58 AM Ruijing Li <liruijin...@gmail.com> wrote:
>
>> Strangely enough, I found an old issue that is exactly the same as mine:
>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-18343
>>
>> However, I'm using Spark 2.4.4, so the issue should have been solved by now.
>>
>> Like the user in the JIRA issue I am using Mesos, but I am reading from
>> Oracle instead of writing to Cassandra and S3.
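Entries like the ones above come from a single snapshot; as Jungtaek suggests further down the thread, what actually localizes a hang is comparing several snapshots taken a second or so apart and seeing which threads never move off the same frame. Outside the UI that is usually `jstack <pid>` in a shell loop; a rough in-JVM sketch of the same idea, using only the plain JDK (output format is illustrative):

```java
import java.util.Map;

public class StackSampler {
    public static void main(String[] args) throws InterruptedException {
        // Take a few samples, one second apart; threads that stay parked
        // on the same top frame across samples are the ones to inspect.
        for (int i = 0; i < 3; i++) {
            Map<Thread, StackTraceElement[]> dump = Thread.getAllStackTraces();
            for (Map.Entry<Thread, StackTraceElement[]> e : dump.entrySet()) {
                Thread t = e.getKey();
                StackTraceElement[] frames = e.getValue();
                StackTraceElement top = frames.length > 0 ? frames[0] : null;
                System.out.printf("[sample %d] %-40s %-13s top=%s%n",
                        i, t.getName(), t.getState(), top);
            }
            Thread.sleep(1000);
        }
    }
}
```

Note this only sees threads in the JVM it runs in, so it diagnoses the driver; executors would each need their own dump (or none at all, if no tasks are running there).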
>> On Thu, Apr 16, 2020 at 1:54 AM ZHANG Wei <wezh...@outlook.com> wrote:
>>
>>> The thread dump result table of the Spark UI can provide some clues for
>>> finding thread lock issues, such as:
>>>
>>> Thread ID | Thread Name                  | Thread State | Thread Locks
>>> 13        | NonBlockingInputStreamThread | WAITING      | Blocked by Thread Some(48) Lock(jline.internal.NonBlockingInputStream@103008951)
>>> 48        | Thread-16                    | RUNNABLE     | Monitor(jline.internal.NonBlockingInputStream@103008951)
>>>
>>> And each thread row shows the call stack after being clicked, so you
>>> can check the root cause of holding locks like this (Thread 48 above):
>>>
>>> org.fusesource.jansi.internal.Kernel32.ReadConsoleInputW(Native Method)
>>> org.fusesource.jansi.internal.Kernel32.readConsoleInputHelper(Kernel32.java:811)
>>> org.fusesource.jansi.internal.Kernel32.readConsoleKeyInput(Kernel32.java:842)
>>> org.fusesource.jansi.internal.WindowsSupport.readConsoleInput(WindowsSupport.java:97)
>>> jline.WindowsTerminal.readConsoleInput(WindowsTerminal.java:222)
>>> <snip...>
>>>
>>> Hope it can help you.
>>>
>>> --
>>> Cheers,
>>> -z
>>>
>>> On Thu, 16 Apr 2020 16:36:42 +0900
>>> Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>
>>> > Do thread dumps continuously, at a specific period (like 1s), and watch
>>> > the change of stack/lock for each thread. (This is not easy to do in
>>> > the UI, so doing it manually may be the only option. I'm not sure
>>> > whether the Spark UI provides the same; I haven't used it at all.)
>>> >
>>> > It will tell you which thread is being blocked (even if it's shown as
>>> > running) and which point to look at.
>>> >
>>> > On Thu, Apr 16, 2020 at 4:29 PM Ruijing Li <liruijin...@gmail.com> wrote:
>>> >
>>> > > Once I do a thread dump, what should I be looking for to tell where
>>> > > it is hanging? I'm seeing a lot of TIMED_WAITING and WAITING on the
>>> > > driver. The driver is also being blocked by the Spark UI.
>>> > > If there are no tasks, is there any point in doing a thread dump of
>>> > > the executors?
>>> > >
>>> > > On Tue, Apr 14, 2020 at 4:49 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>> > >
>>> > >> The simplest way is to do a thread dump, which doesn't require any
>>> > >> fancy tool (it's available in the Spark UI).
>>> > >> Without a thread dump it's hard to say anything...
>>> > >>
>>> > >> On Tue, Apr 14, 2020 at 11:32 AM jane thorpe <janethor...@aol.com.invalid> wrote:
>>> > >>
>>> > >>> Here is another tool I use, Logic Analyser (7:55):
>>> > >>> https://youtu.be/LnzuMJLZRdU
>>> > >>>
>>> > >>> You could take some suggestions for improving performance queries:
>>> > >>> https://dzone.com/articles/why-you-should-not-use-select-in-sql-query-1
>>> > >>>
>>> > >>> Jane thorpe
>>> > >>> janethor...@aol.com
>>> > >>>
>>> > >>> -----Original Message-----
>>> > >>> From: jane thorpe <janethor...@aol.com.INVALID>
>>> > >>> To: janethorpe1 <janethor...@aol.com>; mich.talebzadeh <mich.talebza...@gmail.com>; liruijing09 <liruijin...@gmail.com>; user <user@spark.apache.org>
>>> > >>> Sent: Mon, 13 Apr 2020 8:32
>>> > >>> Subject: Re: Spark hangs while reading from jdbc - does nothing; removing guesswork from troubleshooting
>>> > >>>
>>> > >>> This tool may be useful for you to troubleshoot your problems away:
>>> > >>>
>>> > >>> https://www.javacodegeeks.com/2020/04/simplifying-apm-remove-the-guesswork-from-troubleshooting.html
>>> > >>>
>>> > >>> "APM tools typically use a waterfall-type view to show the blocking
>>> > >>> time of different components cascading through the control flow
>>> > >>> within an application. These types of visualizations are useful,
>>> > >>> and AppOptics has them, but they can be difficult to understand for
>>> > >>> those of us without a PhD."
>>> > >>> Especially helpful if you want to understand through visualisation
>>> > >>> and you do not have a PhD.
>>> > >>>
>>> > >>> Jane thorpe
>>> > >>> janethor...@aol.com
>>> > >>>
>>> > >>> -----Original Message-----
>>> > >>> From: jane thorpe <janethor...@aol.com.INVALID>
>>> > >>> To: mich.talebzadeh <mich.talebza...@gmail.com>; liruijing09 <liruijin...@gmail.com>; user <user@spark.apache.org>
>>> > >>> CC: user <user@spark.apache.org>
>>> > >>> Sent: Sun, 12 Apr 2020 4:35
>>> > >>> Subject: Re: Spark hangs while reading from jdbc - does nothing
>>> > >>>
>>> > >>> You seem to be implying the error is intermittent.
>>> > >>> You seem to be implying data is being ingested via JDBC. So the
>>> > >>> connection has proven itself to be working, unless no data is
>>> > >>> arriving from the JDBC channel at all. If no data is arriving, then
>>> > >>> one could say it could be the JDBC.
>>> > >>> If the error is intermittent, then it is likely that a resource
>>> > >>> involved in processing is filling to capacity.
>>> > >>> Try reducing the data ingestion volume and see if that completes,
>>> > >>> then increase the data ingested incrementally.
>>> > >>> I assume you have run the job on a small amount of data, so you
>>> > >>> have completed your prototype stage successfully.
>>> > >>>
>>> > >>> ------------------------------
>>> > >>> On Saturday, 11 April 2020 Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>> > >>>
>>> > >>> Hi,
>>> > >>>
>>> > >>> Have you checked your JDBC connections from Spark to Oracle? What
>>> > >>> is Oracle saying? Is it doing anything or hanging?
>>> > >>> set pagesize 9999
>>> > >>> set linesize 140
>>> > >>> set heading off
>>> > >>> select SUBSTR(name,1,8) || ' sessions as on '||TO_CHAR(CURRENT_DATE, 'MON DD YYYY HH:MI AM') from v$database;
>>> > >>> set heading on
>>> > >>> column spid heading "OS PID" format a6
>>> > >>> column process format a13 heading "Client ProcID"
>>> > >>> column username format a15
>>> > >>> column sid format 999
>>> > >>> column serial# format 99999
>>> > >>> column STATUS format a3 HEADING 'ACT'
>>> > >>> column last format 9,999.99
>>> > >>> column TotGets format 999,999,999,999 HEADING 'Logical I/O'
>>> > >>> column phyRds format 999,999,999 HEADING 'Physical I/O'
>>> > >>> column total_memory format 999,999,999 HEADING 'MEM/KB'
>>> > >>> --
>>> > >>> SELECT
>>> > >>>        substr(a.username,1,15) "LOGIN"
>>> > >>>      , substr(a.sid,1,5) || ','||substr(a.serial#,1,5) AS "SID/serial#"
>>> > >>>      , TO_CHAR(a.logon_time, 'DD/MM HH:MI') "LOGGED IN SINCE"
>>> > >>>      , substr(a.machine,1,10) HOST
>>> > >>>      , substr(p.username,1,8)||'/'||substr(p.spid,1,5) "OS PID"
>>> > >>>      , substr(a.osuser,1,8)||'/'||substr(a.process,1,5) "Client PID"
>>> > >>>      , substr(a.program,1,15) PROGRAM
>>> > >>>      --, ROUND((CURRENT_DATE-a.logon_time)*24) AS "Logged/Hours"
>>> > >>>      , (
>>> > >>>          select round(sum(ss.value)/1024) from v$sesstat ss, v$statname sn
>>> > >>>          where ss.sid = a.sid and
>>> > >>>                sn.statistic# = ss.statistic# and
>>> > >>>                -- sn.name in ('session pga memory')
>>> > >>>                sn.name in ('session pga memory','session uga memory')
>>> > >>>        ) AS total_memory
>>> > >>>      , (b.block_gets + b.consistent_gets) TotGets
>>> > >>>      , b.physical_reads phyRds
>>> > >>>      , decode(a.status, 'ACTIVE', 'Y','INACTIVE', 'N') STATUS
>>> > >>>      , CASE WHEN a.sid in (select sid from v$mystat where rownum = 1)
>>> > >>>             THEN '<-- YOU' ELSE ' ' END "INFO"
>>> > >>> FROM
>>> > >>>        v$process p
>>> > >>>       ,v$session a
>>> > >>>       ,v$sess_io b
>>> > >>> WHERE
>>> > >>>        a.paddr = p.addr
>>> > >>> AND    p.background IS NULL
>>> > >>> --AND  a.sid NOT IN (select sid from v$mystat where rownum = 1)
>>> > >>> AND    a.sid = b.sid
>>> > >>> AND    a.username is not null
>>> > >>> --AND  (a.last_call_et < 3600 or a.status = 'ACTIVE')
>>> > >>> --AND  CURRENT_DATE - logon_time > 0
>>> > >>> --AND  a.sid NOT IN ( select sid from v$mystat where rownum=1) -- exclude me
>>> > >>> --AND  (b.block_gets + b.consistent_gets) > 0
>>> > >>> ORDER BY a.username;
>>> > >>> exit
>>> > >>>
>>> > >>> HTH
>>> > >>>
>>> > >>> Dr Mich Talebzadeh
>>> > >>>
>>> > >>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> > >>>
>>> > >>> http://talebzadehmich.wordpress.com
>>> > >>>
>>> > >>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>>> > >>> for any loss, damage or destruction of data or any other property
>>> > >>> which may arise from relying on this email's technical content is
>>> > >>> explicitly disclaimed. The author will in no case be liable for any
>>> > >>> monetary damages arising from such loss, damage or destruction.
>>> > >>>
>>> > >>> On Fri, 10 Apr 2020 at 17:37, Ruijing Li <liruijin...@gmail.com> wrote:
>>> > >>>
>>> > >>> Hi all,
>>> > >>>
>>> > >>> I am on Spark 2.4.4, using Scala 2.11.12, and running in cluster
>>> > >>> mode on Mesos. I am ingesting from an Oracle database using
>>> > >>> spark.read.jdbc. I am seeing a strange issue where Spark just hangs
>>> > >>> and does nothing, not starting any new tasks. Normally this job
>>> > >>> finishes in 30 stages, but sometimes it stops at 29 completed
>>> > >>> stages and doesn't start the last stage. The Spark job is idling
>>> > >>> and there is no pending or active task. What could be the problem?
>>> > >>> Thanks.
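One thing worth ruling out for a hang like the one described in the original question is a JDBC connection that dies without the driver ever noticing, leaving the last stage waiting forever. A sketch of bounding the read with timeouts; `queryTimeout` and `fetchsize` are Spark JDBC source options (`queryTimeout` from Spark 2.4 onward), while `oracle.jdbc.ReadTimeout` is an Oracle driver property — treat the exact names and values as assumptions to verify against your driver and Spark versions:

```java
import java.util.Properties;

public class JdbcReadOptions {
    public static void main(String[] args) {
        // Connection properties to pass to spark.read().jdbc(url, table, props).
        Properties props = new Properties();
        props.setProperty("user", "scott");       // placeholder credentials
        props.setProperty("password", "tiger");
        props.setProperty("fetchsize", "10000");  // rows per driver round trip
        props.setProperty("queryTimeout", "3600");// seconds; fail the statement
                                                  // instead of hanging forever
        props.setProperty("oracle.jdbc.ReadTimeout", "600000"); // ms, socket
                                                  // read timeout (driver-level)
        System.out.println(props);
        // In the actual job (Spark API call, not runnable here):
        // Dataset<Row> df = spark.read().jdbc(url, "SCHEMA.TABLE", props);
    }
}
```

With timeouts in place, a dead connection surfaces as an exception and a retried task rather than a silent 29-of-30-stages stall, which at least tells you whether the database side is the culprit.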
>>> > >>> --
>>> > >>> Cheers,
>>> > >>> Ruijing Li

--
Cheers,
Ruijing Li