[
https://issues.apache.org/jira/browse/COMDEV-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bertty Contreras updated COMDEV-474:
------------------------------------
Remaining Estimate: 350h (was: 5h 50m)
Original Estimate: 350h (was: 5h 50m)
> Apache Wayang(Incubating): ML-based Query Optimization
> ------------------------------------------------------
>
> Key: COMDEV-474
> URL: https://issues.apache.org/jira/browse/COMDEV-474
> Project: Community Development
> Issue Type: New Feature
> Components: GSoC/Mentoring ideas
> Reporter: Bertty Contreras
> Priority: Critical
> Labels: gsoc, gsoc2022, machine_learning
> Original Estimate: 350h
> Remaining Estimate: 350h
>
> *Synopsis*
> The current Apache Wayang (Incubating) uses a cost model to choose the right
> platforms and optimize the plans; however, calibrating cost models is one of
> the hardest problems in practice and the main cause of a system
> underperforming. Therefore, the goal is to create a new optimizer component
> that has ML at its core: the entire plan enumeration is guided and powered by
> an ML model.
>
> *Benefits to Community*
> The community will gain an ML-based optimizer, which means that the
> optimization quality will depend on the data used for training the model
> instead of on a human trying to figure out the best calibration of the cost
> model. The ML-based query optimizer is expected to bring more people to
> Apache Wayang (Incubating), since it requires almost no configuration effort.
> It may also inspire other projects to incorporate similar optimization
> modules into their systems.
>
> *Deliverables*
> The expected deliverable is an adaptation of the paper "ML-based
> Cross-Platform Query Optimization" [1], in which the authors propose a
> machine learning model that can be used as the query optimizer inside Apache
> Wayang (Incubating).
>
> The expected steps are the following:
> * Understand the paper [1]
> * Get familiar with the internals of the Apache Wayang (Incubating) optimizer
> * Discuss and design the process for the ML Query Optimizer
> * Implement the new ML-based Query Optimizer
>
> *Related Work*
> [1] [ML-based Cross-Platform Query Optimization|https://wayang.apache.org/assets/pdf/paper/icde20.pdf]
> [2] [RHEEMix in the data jungle: a cost-based optimizer for cross-platform systems|https://wayang.apache.org/assets/pdf/paper/journal_vldb.pdf]
>
> *Biographical Information of possible mentors*
> Rodrigo Pardo-Meza is a Senior Software Engineer at Databloom Inc. He is a
> member of the PPMC of Apache Wayang (Incubating). He has many years of
> experience developing applications that support Big Data processing, with
> experience implementing ETL processes over distributed systems to optimize
> inventories in supply chains. He was a research engineer at the Qatar
> Computing Research Institute, where he specialized in human interface
> interaction with big data analytics. During this time, he co-developed an
> ML-based cross-platform query optimizer.
>
> Bertty Contreras-Rojas is a Senior Software Engineer at Databloom Inc. He is
> a member of the PPMC of Apache Wayang (Incubating). He has many years of
> experience developing data-intensive processing systems for several
> industries, such as banking. He was a research engineer at the Qatar
> Computing Research Institute, where he was responsible for developing the
> declarative query engine for Rheem and adding new underlying platforms to
> Rheem.
>
> Jorge Quiané is the head of the Big Data Systems research group at the Berlin
> Institute for the Foundations of Learning and Data (BIFOLD) and a Principal
> Researcher at DIMA (TU Berlin). He also acts as the Scientific Coordinator of
> the IAM group at the German Research Center for Artificial Intelligence
> (DFKI). His current research is in the broad area of big data, mainly in
> federated data analytics, scalable data infrastructures, and distributed
> query processing. He has published numerous research papers on data
> management and novel system architectures. He has recently been honoured with
> the 2022 ACM SIGMOD Research Highlight Award and the Best Paper Award at ICDE
> 2021 for his work on “Efficient Control Flow in Dataflow Systems”. He holds
> five patents in core database areas and on machine learning. Earlier in his
> career, he was a Senior Scientist at the Qatar Computing Research Institute
> (QCRI) and a Postdoctoral Researcher at Saarland University. He obtained his
> PhD in computer science from INRIA (Nantes University).
>
> *Name and Contact Information*
> Name: Rodrigo Pardo-Meza
> email: rpardomeza (at) apache.org
> community: dev (at) wayang.apache.org
> website: [https://wayang.apache.org|https://wayang.apache.org/]