Hello Josep,

Replying with a bit of delay because I have been travelling this week :)

Regarding your second point, we basically have two ways of learning the cost 
parameters of the execution operators: by analysing execution logs (using the 
genetic optimizer) or by profiling individual operators. The package you refer 
to is for the latter (profiling individual execution operators). This was our 
original idea for obtaining the cost parameters, but we quickly found that the 
results would be far off from the real costs: most big data platforms exploit 
operator pipelining, which makes it hard to profile operators individually. 
So, you cannot use the output of this individual profiler for the genetic 
algorithm.

Now to your first point, about the Genetic Optimizer. It was our answer to the 
shortcomings of the individual operator profiling approach: instead of 
profiling operators in isolation, the genetic optimizer derives the operator 
costs by analysing execution logs. For this, it requires both a cost function 
template per execution operator (which should be specified in JSON format; 
see 
https://github.com/apache/incubator-wayang/blob/80170b543469172438bb603dd6b5fbb2bd5dae64/wayang-platforms/wayang-spark/code/main/resources/wayang-spark-defaults.properties)
and Wayang execution logs (i.e. logs produced by running jobs via Wayang). The 
genetic optimizer will learn the coefficients (denoted by ? in the template 
function). To understand how it does so, our VLDBJ paper (also on arXiv) gives 
a bit more detail and a pointer to the genetic optimization we use; see 
Section 3.2 and Figure 4 of:
https://arxiv.org/pdf/1805.03533.pdf
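
To make the idea concrete, here is a minimal sketch of fitting template 
coefficients from job-level runtimes with a simple genetic algorithm. It is 
not the actual GeneticOptimizerApp code: the linear template 
(alpha * cardinality + beta), the log record layout, the operator names and 
all numbers are assumptions made up for illustration, and the real 
executions.json schema is different.

```python
import random

# Hypothetical operators, each with an assumed linear cost template
# alpha * cardinality + beta (the two "?" coefficients to learn).
OPERATORS = ("map", "filter")

# Synthetic log records: only the total job runtime is observed, not the
# time spent in each (pipelined) operator.
LOGS = [
    {"operators": [("map", 1000), ("filter", 1000)], "millis": 450.0},
    {"operators": [("map", 5000)], "millis": 1100.0},
    {"operators": [("filter", 2000), ("map", 500)], "millis": 450.0},
    {"operators": [("map", 2000), ("filter", 8000)], "millis": 1350.0},
]


def estimate(genome, job):
    """Sum the templated operator costs of one logged job."""
    total = 0.0
    for op, cardinality in job["operators"]:
        alpha, beta = genome[op]
        total += alpha * cardinality + beta
    return total


def loss(genome, logs):
    """Mean relative error between estimated and measured job runtimes."""
    errors = [abs(estimate(genome, j) - j["millis"]) / j["millis"] for j in logs]
    return sum(errors) / len(errors)


def random_genome(rng):
    return {op: (rng.uniform(0, 1), rng.uniform(0, 200)) for op in OPERATORS}


def crossover(mother, father, rng):
    """Take each operator's coefficient pair from one of the two parents."""
    return {op: (mother if rng.random() < 0.5 else father)[op] for op in OPERATORS}


def mutate(genome, rng):
    """Add small Gaussian noise, keeping coefficients non-negative."""
    return {
        op: (max(0.0, a + rng.gauss(0, 0.05)), max(0.0, b + rng.gauss(0, 5.0)))
        for op, (a, b) in genome.items()
    }


def fit(logs, generations=200, population_size=40, seed=0):
    """Evolve coefficient sets; the best quarter survives each generation."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=lambda g: loss(g, logs))
        elite = population[: population_size // 4]
        children = []
        while len(elite) + len(children) < population_size:
            mother, father = rng.sample(elite, 2)
            children.append(mutate(crossover(mother, father, rng), rng))
        population = elite + children
    return min(population, key=lambda g: loss(g, logs))


if __name__ == "__main__":
    best = fit(LOGS)
    print("learned coefficients:", best)
    print("mean relative error:", loss(best, LOGS))
```

Note that the fitness is computed per job, not per operator, which is why 
this kind of approach does not need individual operator measurements.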

Let us know if that helps.

Best,
Jorge

> On 24 May 2023, at 11.12, Josep Sampe Domenech 
> <[email protected]> wrote:
> 
> Hello dev,
> 
> 
> 
> We recently started our exploration of the Wayang project and we would like 
> to gain a deeper understanding of the profiler tool and its functionalities, 
> specifically about the collection and use of metrics.
> 
> 
> 
> To enhance our comprehension, we would appreciate your assistance in 
> addressing the following queries:
> 
> 
> 
>  1.  Could you please provide us with an explanation of how the 
> GeneticOptimizerApp works? Specifically, we would like to understand which 
> information from the executions.json file is taken into consideration when 
> calculating the "?" parameters in the cost functions. Additionally, we are 
> interested in learning more about the methodology employed to calculate the 
> "?" values.
> 
> 
> 
>  2.  We are also curious about the profiler.spark package. 
> What is the purpose of this package? Does it serve a specific objective, and 
> can the results obtained from profiler.spark be utilized or integrated 
> into the GeneticOptimizerApp?
> 
> 
> 
> 
> 
> Thank you in advance for your time and attention. We look forward to your 
> response.
> 
> 
> 
> Best regards,
> 
> Josep
> 
