unsubscribe

2021-01-22 Thread Michal Sankot




-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Bridging gap between Spark UI and Code

2020-07-21 Thread Michal Sankot
And to be clear: yes, the execution plan shows exactly what it's doing. The 
problem is that it's unclear how that plan relates to the actual Scala/Python 
code.


On 7/21/20 15:45, Michal Sankot wrote:
Yes, the problem is that the DAGs only refer to the code line (action) that 
invoked them. They don't provide information about how the individual 
transformations link to the code.


So you can have a dozen stages, each labelled with the same code line that 
invoked it, doing different things. And then we guess what each is 
actually doing.



On 7/21/20 15:36, Russell Spitzer wrote:
Have you looked at the DAG visualization? Each block refers to the 
code line that invoked it.


For Dataframes the execution plan will let you know explicitly which 
operations are in which stages.


On Tue, Jul 21, 2020, 8:18 AM Michal Sankot wrote:


Hi,
when I analyze and debug our Spark batch job executions, it's a pain to
find out how the blocks in the Spark UI Jobs/SQL tabs correspond to the
actual Scala code that we write, and how much time they take. Would
there be a way to somehow instruct the compiler (or something similar)
and get this information into the Spark UI?

At the moment, linking Spark UI elements to our code is guesswork
driven by adding and removing lines of code and rerunning the job,
which is tedious. A way to make our lives easier, e.g. by running
Spark jobs in a dedicated debug mode where this information would be
available, would be greatly appreciated. (Though I don't know whether
it's possible at all.)

Thanks,
Michal




--
Michal Sankot
Big Data Engineer, Voxnest
E: michal.san...@voxnest.com



Bridging gap between Spark UI and Code

2020-07-21 Thread Michal Sankot

Hi,
when I analyze and debug our Spark batch job executions, it's a pain to 
find out how the blocks in the Spark UI Jobs/SQL tabs correspond to the 
actual Scala code that we write, and how much time they take. Would there 
be a way to somehow instruct the compiler (or something similar) and get 
this information into the Spark UI?


At the moment, linking Spark UI elements to our code is guesswork 
driven by adding and removing lines of code and rerunning the job, which 
is tedious. A way to make our lives easier, e.g. by running Spark 
jobs in a dedicated debug mode where this information would be available, 
would be greatly appreciated. (Though I don't know whether it's possible 
at all.)


Thanks,
Michal
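A partial workaround for the naming problem (a sketch under assumptions: `SparkContext.setJobDescription` is a real Spark API, but the `job_description` helper below is hypothetical, not something Spark ships) is to label each action yourself so the Jobs/SQL tabs show your own names rather than bare call sites:

```python
from contextlib import contextmanager

@contextmanager
def job_description(sc, desc):
    """Label every job triggered inside the block in the Spark UI Jobs tab."""
    sc.setJobDescription(desc)  # shows up as the job's description in the UI
    try:
        yield
    finally:
        sc.setJobDescription(None)  # reset so later jobs aren't mislabelled
```

Used as `with job_description(sc, "aggregate daily events"): df.count()`, the corresponding UI entries then carry a recognizable label, which at least narrows the guesswork down to one named block of code.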

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Strange WholeStageCodegen UI values

2020-07-10 Thread Michal Sankot
Hey guys,
Thanks for the insights.

Bobby, I see that it guesses those values from the run time of the whole task.
But if the whole task took 6.6 minutes, how can it come up with 7.27 hours?

Sean, yes, there is data skew: one task takes tens of minutes while others
take tens of seconds. What gave it away in the chart? The wild differences
between the min/med/max values of WholeStageCodegen?

Cheers,
Michal


On Thu, Jul 9, 2020 at 23:47, Sean Owen wrote:

> It sounds like you have huge data skew?
>
> On Thu, Jul 9, 2020 at 4:15 PM Bobby Evans  wrote:
> >
> > Sadly there isn't a lot you can do to fix this.  All of the operations
> take iterators of rows as input and produce iterators of rows as output.
> For efficiency reasons, the timing is not done for each individual row: if
> we did that, in many cases it would take longer to measure how long
> something took than it would to just do the operation. So most operators
> actually end up measuring the lifetime of the operator, which often is the
> time of the entire task minus how long it took for the first task to get to
> that operator. This is also true of WholeStageCodegen.
> >
> > On Thu, Jul 9, 2020 at 11:55 AM Michal Sankot <
> michal.san...@spreaker.com.invalid> wrote:
> >>
> >> Hi,
> >> I'm checking execution of SQL queries in Spark UI, trying to find a
> >> bottleneck and values that are displayed in WholeStageCodegen blocks are
> >> confusing.
> >>
> >> In the attached example the whole query took 6.6 minutes, and the upper-left
> >> WholeStageCodegen block says that the median value is 7.8 minutes and the
> >> maximum 7.27 hours :O
> >>
> >> What does it mean? Do those numbers have any real meaning? Is there a way
> >> to find out how long individual blocks really took?
> >>
> >> Thanks,
> >> Michal
> >> Spark 2.4.4 on AWS EMR
> >>
> >> -
> >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
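Bobby's explanation above can be illustrated with a toy iterator wrapper (pure illustration in Python, not Spark's actual code): because timing covers the iterator's lifetime rather than per-row cost, a do-nothing operator sitting downstream of a slow source still reports the source's time.

```python
import time

class TimedPassThrough:
    """A do-nothing 'operator' that times its own lifetime, the way
    Spark operators do -- so upstream time leaks into its measurement."""

    def __init__(self, source):
        self.source = iter(source)
        self.start = time.perf_counter()
        self.elapsed = None  # filled in when the iterator is exhausted

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self.source)  # all real work happens upstream
        except StopIteration:
            self.elapsed = time.perf_counter() - self.start
            raise
```

Wrap a generator that sleeps per row and the wrapper's `elapsed` is roughly the whole pipeline's time, even though the wrapper itself does nothing: the same lifetime-not-per-row effect Bobby describes.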


[jira] [Closed] (LIVY-712) EMR 5.23/5.27 - Livy does not recognise that Spark job failed

2019-11-28 Thread Michal Sankot (Jira)


 [ 
https://issues.apache.org/jira/browse/LIVY-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michal Sankot closed LIVY-712.
--
Resolution: Workaround

The issue seems to be external: specifically, a problem in the AWS EMR customizations of 
the hadoop libraries.

> EMR 5.23/5.27 - Livy does not recognise that Spark job failed
> -
>
> Key: LIVY-712
> URL: https://issues.apache.org/jira/browse/LIVY-712
> Project: Livy
>  Issue Type: Bug
>  Components: API
>Affects Versions: 0.5.0, 0.6.0
> Environment: AWS EMR 5.23/5.27, Scala
>    Reporter: Michal Sankot
>Priority: Major
>  Labels: EMR, api, spark
>
> We've upgraded from AWS EMR 5.13 -> 5.23 (Livy 0.4.0 -> 0.5.0, Spark 2.3.0 -> 
> 2.4.0) and an issue appears: when an exception is thrown during 
> Spark job execution, Spark shuts down as if there was no problem and the job 
> appears as Completed in EMR. So we're not notified when the system crashes. The 
> same problem appears in EMR 5.27 (Livy 0.6.0, Spark 2.4.4).
> Is it something with Spark? Or a known issue with Livy?
> In the Livy logs I see that spark-submit exits with error code 1:
> {quote}{{05:34:59 WARN BatchSession$: spark-submit exited with code 1}}
> {quote}
>  And then the Livy API states that the batch state is
> {quote}{{"state": "success"}}
> {quote}
> How can it be made to work again?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (LIVY-712) EMR 5.23/5.27 - Livy does not recognise that Spark job failed

2019-11-28 Thread Michal Sankot (Jira)


[ 
https://issues.apache.org/jira/browse/LIVY-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984437#comment-16984437
 ] 

Michal Sankot commented on LIVY-712:


It seems that the issue was present in EMR 5.23/5.27 (hadoop libraries 
2.8.5-amzn-3/2.8.5-amzn-4) and is no longer present in EMR 5.28 (hadoop 
libraries 2.8.5-amzn-5).

I'm thus closing the issue.



[jira] [Commented] (LIVY-712) EMR 5.23/5.27 - Livy does not recognise that Spark job failed

2019-11-26 Thread Michal Sankot (Jira)


[ 
https://issues.apache.org/jira/browse/LIVY-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982583#comment-16982583
 ] 

Michal Sankot commented on LIVY-712:


After further investigation, it seems that the problem is unrelated to Livy,
and is instead an issue with the AWS-custom hadoop-* libraries used in EMR. 
I'll keep you posted on the results of further investigation.



[jira] [Comment Edited] (LIVY-712) EMR 5.23/5.27 - Livy does not recognise that Spark job failed

2019-11-26 Thread Michal Sankot (Jira)


[ 
https://issues.apache.org/jira/browse/LIVY-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982583#comment-16982583
 ] 

Michal Sankot edited comment on LIVY-712 at 11/26/19 3:15 PM:
--

After further investigation, it seems that the problem is unrelated to Livy.

It seems to be related to an issue with the AWS-custom hadoop-* libraries used in 
EMR. I'll keep you posted on the results of further investigation.


was (Author: sankotm):
After further investigation, it seems that problem is unrelated to Livy.

And that it's instead an issue of AWS-custom hadoop-* libraries used in EMR. 
I'll keep you posted with results of further investigations.



[jira] [Commented] (LIVY-712) EMR 5.23/5.27 - Livy does not recognise that Spark job failed

2019-11-25 Thread Michal Sankot (Jira)


[ 
https://issues.apache.org/jira/browse/LIVY-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16981471#comment-16981471
 ] 

Michal Sankot commented on LIVY-712:


Sure:
 * create a Scala Spark job that throws a NullPointerException (to make sure the job 
fails)
 * submit the job through Livy to AWS EMR 5.23/5.27
 * check the state of the job execution through the Livy API

Instead of a failed state, it reports "state": "success".

Are those steps sufficient to reproduce it, or would you need more detail?
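The check in the last step can be scripted. A minimal sketch, assuming Livy's REST API shape (`GET /batches/{id}` returning a JSON document with a `state` field); the helper names and the exact set of terminal states treated as failures are assumptions:

```python
import json
from urllib.request import urlopen

def batch_state(livy_url, batch_id):
    """Ask Livy for the current state of a batch via GET /batches/{id}."""
    with urlopen(f"{livy_url}/batches/{batch_id}") as resp:
        return json.load(resp)["state"]

def batch_failed(state):
    # Terminal states assumed to indicate failure; anything else is not
    # treated as a failure here.
    return state in ("dead", "killed", "error")
```

On the affected EMR versions the bug means `batch_state()` returns "success" even for the NullPointerException job, so a `batch_failed()` check never fires.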



[google-appengine] Can Google Apps for Work account be closed as Custom domain SSL support is natively in GAE now?

2015-09-24 Thread Michal Sankot


Hey,

Some time ago we set up custom-domain SSL access to our GAE application 
through a Google Apps for Work account, as it was the only way to do it.


Now Google has implemented custom-domain SSL directly in Google Cloud 
Platform (see the announcement). 


Is it safe to close down the original Google Apps for Work account now, and will 
custom-domain SSL access to our GAE application still work?


thanks,

Michal

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to google-appengine+unsubscr...@googlegroups.com.
To post to this group, send email to google-appengine@googlegroups.com.
Visit this group at http://groups.google.com/group/google-appengine.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/google-appengine/e28d22d1-87d1-4882-b0fd-658d5f2b4f52%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[google-appengine] Java automatic scaling min-idle-instances not working?

2014-09-17 Thread Michal Sankot


I have a Java GAE app with modules. The default front-end module is marked as

<automatic-scaling>
  <min-idle-instances>1</min-idle-instances>
</automatic-scaling>

however, when I check the chart of instances for the last 24 hours, I see that there 
is a period where no instance was running. I would expect that 
min-idle-instances sets a minimal number of running instances.

Is min-idle-instances not working? Or is the instance chart not working? (By 
instance chart I mean the chart accessible from the Dashboard.) Or am I getting the 
concept of min-idle-instances wrong?

The current GAE version is 1.9.11.



SessionScoped not working on GAE+Jersey+Guice

2014-06-12 Thread Michal Sankot
Hey,
I have a GAE Java project where I use Jersey (1.17) and Guice (3.0). 
SessionScoped beans work in local dev, but don't work when deployed on GAE: 
they don't keep session state.

Sessions are enabled in web.xml: <sessions-enabled>true</sessions-enabled>

My Session bean (SessionService) is:

@SessionScoped
public class SessionService implements Serializable {
    @Inject transient Logger log;
    private Locale locale = Locale.US;
    public synchronized Locale getLocale() { return locale; }
    public synchronized void setLocale(Locale locale) { this.locale = locale; }
}

and it's bound to session scope in the ServletModule:
bind(SessionService.class).in(ServletScopes.SESSION);

Controller where I use it is:

@Path("/settings")
public class SettingsController {
    @Inject SessionService sessionService;

    @GET
    @Path("/setLocale")
    public Object setLocale(@QueryParam("languageTag") String languageTag) {
        sessionService.setLocale(Locale.forLanguageTag(languageTag));
        return "OK";
    }

    @GET
    @Path("/getLocale")
    public Object getLocale() { return sessionService.getLocale().getLanguage(); }
}

With the local dev server it works fine. When deployed on GAE (1.9.5), it sets 
the locale the first time and then it stays the same forever, even though I 
call setLocale again and again. Why does it not work?

Strangely enough, I found an obscure way to make it work, but I don't know 
why. It's necessary to touch the HttpSession before setting the locale, e.g. 
request.getSession(true).setAttribute("whatever", "bar") — as if the server 
needed to be reminded that SessionService wants to do something with the 
session. Why is that?

Cheers,
Michal

-- 
You received this message because you are subscribed to the Google Groups 
google-guice group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to google-guice+unsubscr...@googlegroups.com.
To post to this group, send email to google-guice@googlegroups.com.
Visit this group at http://groups.google.com/group/google-guice.
For more options, visit https://groups.google.com/d/optout.


Re: problem with SAX - repetitive lines returned by characters()

2004-01-19 Thread Michal Sankot
ok,
I tried to isolate the problematic part of the code and found the culprit. I was
using two different buffers for saving incoming data: the first one
was saving the data returned by characters(), and the second appended some other
text to the contents of the first one after parsing finished. The problem was that I
appended the content of the first buffer to the second every(!) time characters()
was called. So when there was more content, it was called more times, which
caused the repetition described in the previous mails.

Thanks,
Michal
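The underlying SAX contract is that characters() may be called any number of times for a single text node, so the safe pattern is to accumulate fragments per element and combine them exactly once, in endElement(). A minimal sketch using Python's stdlib xml.sax (the handler and element names are illustrative, not from the original code):

```python
import xml.sax

class ItemHandler(xml.sax.ContentHandler):
    """Collects the text of each <item> element, however many
    characters() callbacks the parser splits it into."""

    def __init__(self):
        super().__init__()
        self.fragments = []   # text pieces of the current <item>
        self.items = []       # finished item texts
        self.in_item = False

    def startElement(self, name, attrs):
        if name == "item":
            self.in_item = True
            self.fragments = []   # reset once per element, not per callback

    def characters(self, content):
        if self.in_item:
            self.fragments.append(content)  # just accumulate; don't emit yet

    def endElement(self, name):
        if name == "item":
            # combine the fragments exactly once, when the element closes
            self.items.append("".join(self.fragments))
            self.in_item = False

def parse_items(xml_text):
    handler = ItemHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.items
```

With this pattern each element's text comes out once, with no duplicated prefixes, regardless of how the parser chunks the callbacks.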

 That's not impossible, but it seems unlikely unless you're perhaps
 using a very old version of Xerces. It seems a lot more likely
 there's a bug in your code. If you can provide a complete test case
 that demonstrates the alleged bug (the simpler the better) then I'd
 be more inclined to believe the bug is in Xerces, and perhaps someone
 could offer a fix.



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



problem with SAX - repetitive lines returned by characters()

2004-01-16 Thread Michal Sankot
Hi,
I have a problem with the SAX bit of Xerces. I use SAX to get the lines of an element
with a specified tag and print them out.
I was using an older version of Xerces with which it ran fine. When I replaced
the old xerces.jar with the new xercesImpl.jar, SAX started to behave weirdly.

A CDATA element whose content is 1\n2\n3\n4\n is returned by the characters()
method as
1\n
1\n
2\n

1\n

2\n
3\n
1\n
2\n
3\n
4\n
strange (and frustrating), isn't it?

What can be done about it?

Michal


