Re: Profiler and Cost Functions

Kaustubh Beedkar Sun, 04 Jun 2023 22:00:53 -0700

Dear Josep,

Let me try to answer these. Please see my responses inline below.



On Wed, May 31, 2023 at 4:37 PM Josep Sampe Domenech
<[email protected]> wrote:

> Thanks Jorge, this helps a lot to clarify the points related to the
> Genetic Optimizer.
>
>
> I have a few additional questions on the subject matter:
>
>
>   1.  Regarding the platforms: Do you consider adding support for using
> multiple Spark or Postgres instances simultaneously? I noticed there is a
> branch on GitHub dedicated to this purpose, specifically implemented for
> Spark. I'm curious to know if this is just a proof of concept or if it's
> something you plan to incorporate in the future.
>
In theory, Wayang can support multiple instances of the same platform.
However, this would require a unique identifier for each platform instance
and the changes that follow from it. This is very much in our plans for the
near future.


>
>   2.  Regarding the operators: In the Postgres platform, I can see the
> Executor, Filter, Projection, and TableSource operators. Currently, when I
> read two tables from Postgres and perform a JOIN operation, it appears that
> the JOIN is executed locally within the Wayang environment using the Java
> streams platform, rather than running the JOIN operation directly within
> Postgres itself. Is it because the Join operator in Postgres has not been
> implemented yet? Or is it because, based on the cost functions, it is
> considered more cost-effective to execute the JOIN locally? Or am I missing
> something?
>
In this case, the join operator is not yet implemented. We are in the
process of supporting join pushdowns as part of the Wayang SQL API.

>
>
>
>   3.  Regarding the cost functions: To clarify some things related to
> Section 4 of the paper: are you considering by default the cost of moving
> data between platforms? Is the cost of moving data between platforms taken
> into account in the conversion operators, like the SqlToStreamOperator? If
> so, should I add a custom cost-function template in the “network” key of
> the wayang.postgres.sqltostream.load.output.template to take this data
> movement into account? Or is the data transfer cost between platforms
> considered in a different place, so that I should do it in a different way?
>
I am not 100% sure about this, but
https://github.com/apache/incubator-wayang/blob/80170b543469172438bb603dd6b5fbb2bd5dae64/wayang-commons/wayang-core/src/main/java/org/apache/wayang/core/optimizer/channels/DefaultChannelConversion.java#L181
could be a pointer.

Best,
Kaustubh




>
> Thanks & best regards,
> Josep
>
>
> From: Jorge Arnulfo Quiané Ruiz <[email protected]>
> Date: Friday, 26 May 2023 at 11:55
> To: [email protected] <[email protected]>
> Subject: [EXTERNAL] Re: Profiler and Cost Functions
> Hello Josep,
>
> Replying with a bit of delay because I have been travelling this week :)
>
> Regarding your second point, we basically have two ways of learning the
> cost parameters of the execution operators: by analysing execution logs
> (using the genetic optimizer) or by profiling individual operators. The
> package you refer to is for the latter (profiling individual execution
> operators). This was our original idea for getting the cost parameters, but
> we quickly found out that the results would be far from the real costs,
> because most big data platforms exploit operator pipelining, which makes it
> hard to profile operators individually. So, you cannot use the output of
> this individual profiler for the genetic algorithm.
>
> So, let us now discuss your first point, which concerns the Genetic
> Optimizer. This was our solution to the problem with the individual
> operator profiling approach. The genetic optimizer instead tries to derive
> the operator costs by analysing execution logs. For this, it requires both
> a cost function template per execution operator (which should be specified
> in JSON format:
> https://github.com/apache/incubator-wayang/blob/80170b543469172438bb603dd6b5fbb2bd5dae64/wayang-platforms/wayang-spark/code/main/resources/wayang-spark-defaults.properties
> ) and Wayang execution logs (i.e. logs from running jobs via Wayang). The
> genetic optimizer will learn the coefficients (denoted by ? in the template
> function). To understand how it does so, our VLDBJ paper (also on
> arXiv) gives a few more details and a pointer to the genetic
> optimization we use:
> https://arxiv.org/pdf/1805.03533.pdf (Section 3.2 and Figure 4).
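
As a rough illustration of the idea described above, here is a minimal
sketch of fitting the "?" coefficients of a cost template with a simple
genetic algorithm. This is not Wayang's GeneticOptimizerApp. The template
shape (cpu = ?*in0 + ?), the log format (input cardinality, measured
milliseconds), the relative-error fitness, and all function names are
assumptions made purely for this example.

```python
import random


def estimate(coeffs, in0):
    """Evaluate the hypothetical template cpu = ?*in0 + ? for given coefficients."""
    a, b = coeffs
    return a * in0 + b


def loss(coeffs, log):
    """Mean relative error between estimated and measured cost over the log."""
    return sum(abs(estimate(coeffs, n) - t) / t for n, t in log) / len(log)


def fit_coefficients(log, generations=200, pop_size=40, seed=0):
    """Search for (a, b) that minimise the loss, using selection, crossover, mutation."""
    rng = random.Random(seed)
    # Random initial population of candidate coefficient pairs.
    population = [(rng.uniform(0, 10), rng.uniform(0, 1000)) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the best quarter unchanged (elitism).
        population.sort(key=lambda c: loss(c, log))
        survivors = population[: pop_size // 4]
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            # Crossover: take each coefficient from one of the two parents.
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            # Mutation: perturb each coefficient by multiplicative Gaussian noise.
            children.append(tuple(g * (1 + rng.gauss(0, 0.1)) for g in child))
        population = survivors + children
    return min(population, key=lambda c: loss(c, log))


# Synthetic stand-in for an execution log: (input cardinality, measured ms).
log = [(n, 2.0 * n + 500.0) for n in (1_000, 10_000, 100_000, 1_000_000)]
best = fit_coefficients(log)
print("fitted coefficients:", best, "loss:", loss(best, log))
```

Because the best candidates are carried over unchanged in every
generation, the best loss never gets worse from one generation to the
next. A real setup would have to fit many operators' templates jointly
against whole-job runtimes, which this sketch does not attempt.
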
>
> Let us know if that helps.
>
> Best,
> Jorge
>
> > On 24 May 2023, at 11.12, Josep Sampe Domenech <
> [email protected]> wrote:
> >
> > Hello dev,
> >
> >
> >
> > We recently started our exploration of the Wayang project and we would
> > like to gain a deeper understanding of the profiler tool and its
> > functionalities, specifically about the collection and use of metrics.
> >
> >
> >
> > To enhance our comprehension, we would appreciate your assistance in
> > addressing the following queries:
> >
> >
> >
> >  1.  Could you please provide us with an explanation of how the
> > GeneticOptimizerApp works? Specifically, we would like to understand which
> > information from the executions.json file is taken into consideration when
> > calculating the "?" parameters in the cost functions. Additionally, we are
> > interested in learning more about the methodology employed to calculate the
> > "?" values.
> >
> >
> >
> >  2.  We are also curious about the profiler.spark package. What is its
> > purpose? Does it serve a specific objective, and can the results obtained
> > from this profiler be utilized by or integrated into the
> > GeneticOptimizerApp?
> >
> >
> >
> >
> >
> > Thank you in advance for your time and attention. We look forward to
> > your response.
> >
> >
> >
> > Best regards,
> >
> > Josep
> >
>
