Re: modifying spark's optimizer for research

2021-04-22 Thread Walter Cai
Hi Cheng Su and All,

Thanks for your reply. The change I'm attempting to make would be a
significant philosophical change to how optimizers currently handle
cardinality estimation. With that in mind, I think it would be wiser to
first build a prototype/proof of concept rather than go through the
traditional pull request and review workflow.

For some more context on my method: the central idea of my work is to lean
heavily towards overestimation during the cardinality estimation process
using the elegant entropic bounding framework. Particularly for multi-join
queries, this avoids the underestimation problem that pervades modern
systems. So far my work has focused on single-node DBs, and scaling to
multi-node systems presents new hurdles, which is why I'm here.
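
As a toy illustration of the under- versus overestimation contrast, here is
a minimal sketch. It does not implement the entropic bounding framework
itself; it swaps in a much simpler max-degree upper bound for a single
equi-join, and every function name and the skewed sample data are
hypothetical.

```python
from collections import Counter

def true_join_size(r_keys, s_keys):
    """Exact size of the equi-join of R and S on the given key columns."""
    r_counts, s_counts = Counter(r_keys), Counter(s_keys)
    return sum(n * s_counts[k] for k, n in r_counts.items())

def uniform_estimate(r_keys, s_keys):
    """Textbook estimate |R| * |S| / max(ndv(R.key), ndv(S.key)).

    Assumes join keys are uniformly distributed, so skew makes it too low.
    """
    if not r_keys or not s_keys:
        return 0.0
    ndv = max(len(set(r_keys)), len(set(s_keys)))
    return len(r_keys) * len(s_keys) / ndv

def max_degree_bound(r_keys, s_keys):
    """Guaranteed upper bound: each row of one side matches at most
    max-degree rows of the other side (hypothetical stand-in for a
    pessimistic estimator)."""
    if not r_keys or not s_keys:
        return 0
    max_deg_r = max(Counter(r_keys).values())
    max_deg_s = max(Counter(s_keys).values())
    return min(len(r_keys) * max_deg_s, len(s_keys) * max_deg_r)

# Hypothetical skewed data: 100 rows per side, 90 of them sharing key 0.
r_keys = [0] * 90 + list(range(1, 11))
s_keys = [0] * 90 + list(range(1, 11))

print("true size:       ", true_join_size(r_keys, s_keys))
print("uniform estimate:", round(uniform_estimate(r_keys, s_keys), 1))
print("max-degree bound:", max_degree_bound(r_keys, s_keys))
```

On this data the uniform estimate comes out well below the true join size,
while the bound is never below it. A bound like this errs in the safe
direction, and chaining several joins is where the underestimates compound.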

Thanks,
Walter

On Wed, Apr 21, 2021 at 11:46 PM Cheng Su wrote:

> Hello Walter,
>
>
>
> Just FYI - https://spark.apache.org/contributing.html is the general
> guide for how to contribute to Spark.
>
>
>
> > implement a prototype modification to Spark's optimizer to
> > exhibit/experiment with some of my PhD work
>
>
>
> Maybe you could share some links or pointers to the work you have done?
> That would give people a basic idea of it and let them offer more
> specific help.
>
>
>
> Thanks,
>
> Cheng Su
>
>
>
> *From: *Walter Cai 
> *Date: *Wednesday, April 21, 2021 at 6:09 PM
> *To: *"dev@spark.apache.org" 
> *Subject: *modifying spark's optimizer for research
>
>
>
> Hi,
>
>
>
> I'm Walter, a PhD student at the University of Washington. My goal is to
> implement a prototype modification to Spark's optimizer to
> exhibit/experiment with some of my PhD work. I was hoping to set up a
> chat with somebody who is familiar with Catalyst and knows the best place
> to start modifying it.
>
>
>
> Thanks,
>
> Walter
>
> wal...@cs.washington.edu
>


Re: modifying spark's optimizer for research

2021-04-22 Thread Cheng Su
Hello Walter,

Just FYI - https://spark.apache.org/contributing.html is the general guide for
how to contribute to Spark.

> implement a prototype modification to Spark's optimizer to exhibit/experiment
> with some of my PhD work

Maybe you could share some links or pointers to the work you have done? That
would give people a basic idea of it and let them offer more specific help.

Thanks,
Cheng Su

From: Walter Cai 
Date: Wednesday, April 21, 2021 at 6:09 PM
To: "dev@spark.apache.org" 
Subject: modifying spark's optimizer for research

Hi,

I'm Walter, a PhD student at the University of Washington. My goal is to
implement a prototype modification to Spark's optimizer to exhibit/experiment
with some of my PhD work. I was hoping to set up a chat with somebody who is
familiar with Catalyst and knows the best place to start modifying it.

Thanks,
Walter
wal...@cs.washington.edu