Strangely enough, I found an old issue that is exactly the same as mine:
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-18343

However, I'm using Spark 2.4.4, so the issue should have been solved by now.
Like the user in the JIRA issue I am using Mesos, but I am reading from
Oracle instead of writing to Cassandra and S3.

On Thu, Apr 16, 2020 at 1:54 AM ZHANG Wei <wezh...@outlook.com> wrote:

> The thread dump table in the Spark UI can give some clues for finding
> thread lock issues, such as:
>
> Thread ID | Thread Name                  | Thread State | Thread Locks
> 13        | NonBlockingInputStreamThread | WAITING      | Blocked by Thread Some(48) Lock(jline.internal.NonBlockingInputStream@103008951})
> 48        | Thread-16                    | RUNNABLE     | Monitor(jline.internal.NonBlockingInputStream@103008951})
>
> Each thread row shows its call stack when clicked, so you can check the
> root cause of a held lock like this (thread 48 above):
>
> org.fusesource.jansi.internal.Kernel32.ReadConsoleInputW(Native Method)
> org.fusesource.jansi.internal.Kernel32.readConsoleInputHelper(Kernel32.java:811)
> org.fusesource.jansi.internal.Kernel32.readConsoleKeyInput(Kernel32.java:842)
> org.fusesource.jansi.internal.WindowsSupport.readConsoleInput(WindowsSupport.java:97)
> jline.WindowsTerminal.readConsoleInput(WindowsTerminal.java:222)
> <snip...>
>
> Hope it can help you.
>
> --
> Cheers,
> -z
>
> On Thu, 16 Apr 2020 16:36:42 +0900
> Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>
> > Take thread dumps continuously at a fixed interval (like 1s) and watch
> > how the stack / lock of each thread changes. (This is not easy to do in
> > the UI, so doing it manually may be the only option. Not sure the Spark
> > UI provides this; I haven't used it at all.)
> >
> > It will tell which thread is being blocked (even if it's shown as
> > running) and which point to look at.
> >
> > On Thu, Apr 16, 2020 at 4:29 PM Ruijing Li <liruijin...@gmail.com> wrote:
> >
> > > Once I do a thread dump, what should I be looking for to tell where it
> > > is hanging? Seeing a lot of TIMED_WAITING and WAITING on the driver.
> > > The driver is also being blocked by the Spark UI. If there are no
> > > tasks, is there a point in doing a thread dump of the executors?
> > >
> > > On Tue, Apr 14, 2020 at 4:49 AM Gabor Somogyi <gabor.g.somo...@gmail.com>
> > > wrote:
> > >
> > >> The simplest way is to take a thread dump, which doesn't require any
> > >> fancy tool (it's available in the Spark UI).
> > >> Without a thread dump it's hard to say anything...
> > >>
> > >>
> > >> On Tue, Apr 14, 2020 at 11:32 AM jane thorpe <janethor...@aol.com.invalid>
> > >> wrote:
> > >>
> > >>> Here is another tool I use: Logic Analyser 7:55
> > >>> https://youtu.be/LnzuMJLZRdU
> > >>>
> > >>> You could also take some suggestions for improving query performance.
> > >>> https://dzone.com/articles/why-you-should-not-use-select-in-sql-query-1
> > >>>
> > >>>
> > >>> Jane thorpe
> > >>> janethor...@aol.com
> > >>>
> > >>>
> > >>> -----Original Message-----
> > >>> From: jane thorpe <janethor...@aol.com.INVALID>
> > >>> To: janethorpe1 <janethor...@aol.com>; mich.talebzadeh
> > >>> <mich.talebza...@gmail.com>; liruijing09 <liruijin...@gmail.com>;
> > >>> user <user@spark.apache.org>
> > >>> Sent: Mon, 13 Apr 2020 8:32
> > >>> Subject: Re: Spark hangs while reading from jdbc - does nothing
> > >>> Removing Guess work from trouble shooting
> > >>>
> > >>>
> > >>>
> > >>> This tool may be useful for troubleshooting your problems.
> > >>>
> > >>>
> > >>> https://www.javacodegeeks.com/2020/04/simplifying-apm-remove-the-guesswork-from-troubleshooting.html
> > >>>
> > >>>
> > >>> "APM tools typically use a waterfall-type view to show the blocking
> > >>> time of different components cascading through the control flow
> > >>> within an application.
> > >>> These types of visualizations are useful, and AppOptics has them, but
> > >>> they can be difficult to understand for those of us without a PhD."
> > >>>
> > >>> Especially helpful if you want to understand through visualisation
> > >>> and you do not have a PhD.
> > >>>
> > >>>
> > >>> Jane thorpe
> > >>> janethor...@aol.com
> > >>>
> > >>>
> > >>> -----Original Message-----
> > >>> From: jane thorpe <janethor...@aol.com.INVALID>
> > >>> To: mich.talebzadeh <mich.talebza...@gmail.com>; liruijing09
> > >>> <liruijin...@gmail.com>; user <user@spark.apache.org>
> > >>> CC: user <user@spark.apache.org>
> > >>> Sent: Sun, 12 Apr 2020 4:35
> > >>> Subject: Re: Spark hangs while reading from jdbc - does nothing
> > >>>
> > >>> You seem to be implying the error is intermittent.
> > >>> You seem to be implying data is being ingested via JDBC. So the
> > >>> connection has proven itself to be working, unless no data is
> > >>> arriving from the JDBC channel at all. If no data is arriving, then
> > >>> one could say it could be the JDBC.
> > >>> If the error is intermittent, then it is likely that a resource
> > >>> involved in processing is filling to capacity.
> > >>> Try reducing the data ingestion volume and see if that completes,
> > >>> then increase the data ingested incrementally.
> > >>> I assume you have run the job on a small amount of data, so you have
> > >>> completed your prototype stage successfully.
> > >>>
> > >>> ------------------------------
> > >>> On Saturday, 11 April 2020 Mich Talebzadeh <mich.talebza...@gmail.com>
> > >>> wrote:
> > >>> Hi,
> > >>>
> > >>> Have you checked your JDBC connections from Spark to Oracle? What is
> > >>> Oracle saying? Is it doing anything or hanging?
> > >>>
> > >>> set pagesize 9999
> > >>> set linesize 140
> > >>> set heading off
> > >>> select SUBSTR(name,1,8) || ' sessions as on '||TO_CHAR(CURRENT_DATE, 'MON DD YYYY HH:MI AM') from v$database;
> > >>> set heading on
> > >>> column spid heading "OS PID" format a6
> > >>> column process format a13 heading "Client ProcID"
> > >>> column username format a15
> > >>> column sid format 999
> > >>> column serial# format 99999
> > >>> column STATUS format a3 HEADING 'ACT'
> > >>> column last format 9,999.99
> > >>> column TotGets format 999,999,999,999 HEADING 'Logical I/O'
> > >>> column phyRds format 999,999,999 HEADING 'Physical I/O'
> > >>> column total_memory format 999,999,999 HEADING 'MEM/KB'
> > >>> --
> > >>> SELECT
> > >>>           substr(a.username,1,15) "LOGIN"
> > >>>         , substr(a.sid,1,5) || ','||substr(a.serial#,1,5) AS "SID/serial#"
> > >>>         , TO_CHAR(a.logon_time, 'DD/MM HH:MI') "LOGGED IN SINCE"
> > >>>         , substr(a.machine,1,10) HOST
> > >>>         , substr(p.username,1,8)||'/'||substr(p.spid,1,5) "OS PID"
> > >>>         , substr(a.osuser,1,8)||'/'||substr(a.process,1,5) "Client PID"
> > >>>         , substr(a.program,1,15) PROGRAM
> > >>>         --,ROUND((CURRENT_DATE-a.logon_time)*24) AS "Logged/Hours"
> > >>>         , (
> > >>>                 select round(sum(ss.value)/1024) from v$sesstat ss, v$statname sn
> > >>>                 where ss.sid = a.sid and
> > >>>                         sn.statistic# = ss.statistic# and
> > >>>                         -- sn.name in ('session pga memory')
> > >>>                         sn.name in ('session pga memory','session uga memory')
> > >>>           ) AS total_memory
> > >>>         , (b.block_gets + b.consistent_gets) TotGets
> > >>>         , b.physical_reads phyRds
> > >>>         , decode(a.status, 'ACTIVE', 'Y','INACTIVE', 'N') STATUS
> > >>>         , CASE WHEN a.sid in (select sid from v$mystat where rownum = 1) THEN '<-- YOU' ELSE ' ' END "INFO"
> > >>> FROM
> > >>>          v$process p
> > >>>         ,v$session a
> > >>>         ,v$sess_io b
> > >>> WHERE
> > >>> a.paddr = p.addr
> > >>> AND p.background IS NULL
> > >>> --AND a.sid NOT IN (select sid from v$mystat where rownum = 1)
> > >>> AND a.sid = b.sid
> > >>> AND a.username is not null
> > >>> --AND (a.last_call_et < 3600 or a.status = 'ACTIVE')
> > >>> --AND CURRENT_DATE - logon_time > 0
> > >>> --AND a.sid NOT IN ( select sid from v$mystat where rownum=1) -- exclude me
> > >>> --AND (b.block_gets + b.consistent_gets) > 0
> > >>> ORDER BY a.username;
> > >>> exit
> > >>>
> > >>> HTH
> > >>>
> > >>> Dr Mich Talebzadeh
> > >>>
> > >>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > >>>
> > >>> http://talebzadehmich.wordpress.com
> > >>>
> > >>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
> > >>> any loss, damage or destruction of data or any other property which
> > >>> may arise from relying on this email's technical content is
> > >>> explicitly disclaimed. The author will in no case be liable for any
> > >>> monetary damages arising from such loss, damage or destruction.
> > >>>
> > >>>
> > >>>
> > >>> On Fri, 10 Apr 2020 at 17:37, Ruijing Li <liruijin...@gmail.com> wrote:
> > >>>
> > >>> Hi all,
> > >>>
> > >>> I am on Spark 2.4.4 with Scala 2.11.12, running in cluster mode on
> > >>> Mesos. I am ingesting from an Oracle database using spark.read.jdbc.
> > >>> I am seeing a strange issue where Spark just hangs and does nothing,
> > >>> not starting any new tasks. Normally this job finishes in 30 stages,
> > >>> but sometimes it stops at 29 completed stages and doesn't start the
> > >>> last stage. The Spark job is idling and there is no pending or active
> > >>> task. What could be the problem? Thanks.
> > >>> --
> > >>> Cheers,
> > >>> Ruijing Li
> > >>>
> > >>> --
> > > Cheers,
> > > Ruijing Li
> > >
>
--
Cheers,
Ruijing Li
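
A note on Jungtaek's suggestion in the quoted thread (take thread dumps at a fixed interval and compare them): the idea can be sketched in a few lines of Java. This is only an illustration under assumptions, not something posted in the thread. The class ThreadDumpSampler and its methods are hypothetical names, it only sees the JVM it runs in (so it would have to be started inside the driver, for example on a daemon thread from the Scala job), and it uses nothing beyond the standard java.lang.management API. Each row has roughly the shape of the Spark UI table ZHANG Wei described: thread id, name, state, awaited lock, top stack frame.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: samples the JVM it is running in.
final class ThreadDumpSampler {

    // One row per live thread: id | name | state | awaited lock | top stack frame.
    static List<String> snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only ask for monitor / synchronizer details when the JVM supports them.
        ThreadInfo[] infos = mx.dumpAllThreads(
            mx.isObjectMonitorUsageSupported(),
            mx.isSynchronizerUsageSupported());
        List<String> rows = new ArrayList<>();
        for (ThreadInfo t : infos) {
            if (t == null) {
                continue; // defensive: skip threads that vanished during the dump
            }
            String lock;
            if (t.getLockName() == null) {
                lock = "-"; // not blocked or waiting on any lock
            } else if (t.getLockOwnerId() >= 0) {
                lock = t.getLockName() + " held by thread " + t.getLockOwnerId();
            } else {
                lock = t.getLockName() + " (no owning thread)";
            }
            StackTraceElement[] stack = t.getStackTrace();
            String top = stack.length == 0 ? "(no frames)" : stack[0].toString();
            rows.add(t.getThreadId() + " | " + t.getThreadName() + " | "
                + t.getThreadState() + " | " + lock + " | " + top);
        }
        return rows;
    }

    // Prints `count` snapshots, `intervalMillis` apart, so consecutive dumps can be compared.
    static void sample(int count, long intervalMillis) throws InterruptedException {
        for (int i = 0; i < count; i++) {
            System.out.println("=== dump " + (i + 1) + " at " + System.currentTimeMillis() + " ===");
            snapshot().forEach(System.out::println);
            if (i < count - 1) {
                Thread.sleep(intervalMillis);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        sample(3, 1000L); // three dumps, one second apart
    }
}
```

Comparing consecutive dumps shows which threads never move: a thread whose state, lock and top frame are identical in every sample while no tasks are being scheduled is the first place to look. For a driver that cannot be modified, running jstack against the driver process at the same interval gives comparable output.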