Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-25 Thread Nuria Ruiz
Hello: Following up on this issue, we think many of Neil's issues come from the fact that a Kerberos ticket expires after 24 hours, and once it does your Spark session will no longer work. We will be extending ticket expiration somewhat, to 2 or 3 days, but the main point to take home is that
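
To make the 24-hour expiry concrete, here is a minimal sketch of checking whether a ticket issued at a known time would already have expired. The function name, the constant, and the idea of tracking the issue time yourself are all assumptions for illustration; in practice `klist` reports the real expiry time and `kinit` obtains a fresh ticket.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a 24-hour ticket lifetime, as described in the message above.
TICKET_LIFETIME = timedelta(hours=24)


def ticket_expired(issued_at, now=None, lifetime=TICKET_LIFETIME):
    """Return True if a ticket issued at `issued_at` is past its lifetime.

    `issued_at` and `now` should both be timezone-aware datetimes.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - issued_at >= lifetime
```

A long-running notebook could call a check like this before submitting a query and prompt for a new `kinit` instead of failing partway through.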

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-20 Thread Luca Toscano
Hi Neil, I added the Analytics tag to https://phabricator.wikimedia.org/T245097, and also thanks for filing https://phabricator.wikimedia.org/T245713. We periodically review tasks in our incoming queue, so we should be able to help soon, but it may depend on priorities. Luca On Thu 20

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-19 Thread Neil Shah-Quinn
Another update: I'm continuing to encounter these Spark errors and have trouble recovering from them, even when I use proper settings. I've filed T245713 to discuss this further. The specific errors and behavior I'm seeing (for example, whether

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-19 Thread Neil Shah-Quinn
Bump! Analytics team, I'm eager to have input from y'all about the best Spark settings to use. On Fri, 14 Feb 2020 at 18:30, Neil Shah-Quinn wrote: > I ran into this problem again, and I found that neither session.stop nor > newSession got rid of the error. So it's still not clear how to

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-14 Thread Neil Shah-Quinn
I ran into this problem again, and I found that neither session.stop nor newSession got rid of the error. So it's still not clear how to recover from a crashed(?) Spark session. On the other hand, I did figure out why my sessions were crashing in the first place, so hopefully recovering from that

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-07 Thread Leila Zia
On Fri, Feb 7, 2020 at 12:45 PM Nuria Ruiz wrote: > > and the verdict (supported by you) was that we should use this list or > the public IRC channel. > Indeed, eh? I suggest we revisit that to send questions to > analytics-internal but if others disagree, I am fine with either. > my 2 cents: I

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-07 Thread Nuria Ruiz
> and the verdict (supported by you) was that we should use this list or the public IRC channel. Indeed, eh? I suggest we revisit that to send questions to analytics-internal but if others disagree, I am fine with either. On Fri, Feb 7, 2020 at 12:17 PM Neil Shah-Quinn wrote: > Good

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-07 Thread Neil Shah-Quinn
Good suggestions, Andrew! I'll try those if I encounter this again. Nuria, we had a discussion about the appropriate places to ask questions about internal systems in October 2018, and the verdict (supported by you) was that we should use this list or the public IRC channel. If you want to

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-07 Thread Nuria Ruiz
Hello, This discussion is probably not of wide interest to this public list; I suggest moving it to analytics-internal? Thanks, Nuria On Fri, Feb 7, 2020 at 6:53 AM Andrew Otto wrote: > Hm, interesting! I don't think many of us have used > SparkSession.builder.getOrCreate > repeatedly in

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-07 Thread Andrew Otto
Hm, interesting! I don't think many of us have used SparkSession.builder.getOrCreate repeatedly in the same process. What happens if you manually stop the Spark session first (session.stop()
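
The stop-then-recreate idea suggested here can be sketched roughly as follows. The helper name `restart_session` is hypothetical; it only assumes that the session object has a `stop()` method and that the builder has `getOrCreate()`, as PySpark's `SparkSession` and `SparkSession.builder` do.

```python
def restart_session(builder, session=None):
    """Stop the existing session (if any), then ask the builder for a session.

    `builder` is expected to behave like pyspark.sql.SparkSession.builder and
    `session` like a SparkSession; nothing here is specific to wmfdata or SWAP.
    """
    if session is not None:
        # Explicitly release the old session before asking for a new one.
        session.stop()
    return builder.getOrCreate()


# With PySpark installed, usage might look like:
#   from pyspark.sql import SparkSession
#   spark = restart_session(SparkSession.builder, spark)
```

A later message in this thread reports that neither session.stop nor newSession cleared the error, so treat this as something to try rather than a confirmed fix.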

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-06 Thread Neil Shah-Quinn
Hi Luca! Those were separate Yarn jobs I started later. When I got this error, I found that the Yarn job corresponding to the SparkContext was marked as "successful", but I still couldn't get SparkSession.builder.getOrCreate to open a new one. Any idea what might have caused that or how I could

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-05 Thread Luca Toscano
Hey Neil, there were two Yarn jobs running related to your notebooks, I just killed them, let's see if it solves the problem (you might need to restart your notebook again). If not, let's open a task and investigate :) Luca On Thu, 6 Feb 2020 at 02:08, Neil Shah-Quinn <

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-05 Thread Neil Shah-Quinn
Whoa—I just got the same stopped SparkContext error on the query even after restarting the notebook, without an intermediate Java heap space error. That seems very strange to me. On Wed, 5 Feb 2020 at 16:09, Neil Shah-Quinn wrote: > Hey there! > > I was running SQL queries via PySpark (using

[Analytics] SparkContext stopped and cannot be restarted

2020-02-05 Thread Neil Shah-Quinn
Hey there! I was running SQL queries via PySpark (using the wmfdata package) on SWAP when one of my queries failed with "java.lang.OutOfMemoryError: Java heap space". After that, when I tried to call the spark.sql function again