Absolutely, I'll work on producing some results. Also, it's not just a matter of joining tables; even point queries on individual tables like task_instance, dag_run, fag_failure should be faster with an integer identifier.
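As a starting point for those results, here is a minimal timing harness, assuming Python's built-in sqlite3 and a toy pair of tables. The table names (run_by_name, run_by_int), the dag_pk column and the dag_name() naming pattern are hypothetical, not Airflow's schema. It stores the same rows twice, once keyed by a varchar and once by an integer, and times indexed point lookups on each. SQLite is not the production metastore, so real numbers would still need to come from MySQL or Postgres; no result is implied here.

```python
import sqlite3
import time


def dag_name(d: int) -> str:
    # Hypothetical naming pattern, only to get realistic-length string keys.
    return f"example_team_pipeline_{d:06d}"


def build_db(n_dags: int, runs_per_dag: int) -> sqlite3.Connection:
    """Store the same rows twice: once keyed by a string, once by an integer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE run_by_name (dag_id VARCHAR(250), run_no INTEGER);
        CREATE TABLE run_by_int  (dag_pk INTEGER, run_no INTEGER);
        CREATE INDEX idx_run_by_name ON run_by_name (dag_id);
        CREATE INDEX idx_run_by_int  ON run_by_int (dag_pk);
        """
    )
    for d in range(n_dags):
        conn.executemany("INSERT INTO run_by_name VALUES (?, ?)",
                         [(dag_name(d), r) for r in range(runs_per_dag)])
        conn.executemany("INSERT INTO run_by_int VALUES (?, ?)",
                         [(d, r) for r in range(runs_per_dag)])
    conn.commit()
    return conn


def time_point_lookups(conn: sqlite3.Connection, n_dags: int, repeats: int) -> dict:
    """Time indexed point lookups by string key vs integer key, in seconds."""
    # Keys are precomputed so string formatting is not part of the timing.
    names = [dag_name(i % n_dags) for i in range(repeats)]
    ints = [i % n_dags for i in range(repeats)]
    t0 = time.perf_counter()
    for key in names:
        conn.execute("SELECT COUNT(*) FROM run_by_name WHERE dag_id = ?", (key,)).fetchone()
    t1 = time.perf_counter()
    for key in ints:
        conn.execute("SELECT COUNT(*) FROM run_by_int WHERE dag_pk = ?", (key,)).fetchone()
    t2 = time.perf_counter()
    return {"varchar": t1 - t0, "integer": t2 - t1}


conn = build_db(n_dags=200, runs_per_dag=20)
timings = time_point_lookups(conn, n_dags=200, repeats=2000)
print(timings)  # values vary by machine and database engine
```

For meaningful numbers, the same two lookups would be run against a copy of a real metastore with realistic row counts.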

On Thu, Aug 9, 2018 at 7:59 PM Ash Berlin-Taylor <ash_apa...@firemirror.com> wrote:

> Since this is a big change that would touch much of the code base, before
> we do this we need to see some hard numbers - timing or benchmarks of
> queries etc.
>
> Also how often do we actually do such a join etc?
>
> -ash
>
> On 9 Aug 2018, at 13:04, vardangupta...@gmail.com wrote:
> >
> > Thanks Ash for your reply; I am aligned with what you're saying.
> >
> > I was not proposing to take away the human-readable dag_id. Instead I was
> > thinking: why can't we create another field like dag_name, which would hold
> > this information in all front-facing places, while dag_id is changed to an
> > integer? This would help make joins faster in the metastore. dag_id is
> > currently indexed, but a varchar(250) index still takes more index blocks
> > than an int (4 bytes) index, and therefore more lookup time. Also, if
> > changing dag_id to an int is not trivial, let it stay and introduce another
> > column that is actually integer-typed, and let joins happen on this column
> > across all tables.
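To make the "keep dag_id, add an integer column" option concrete, here is a minimal sketch using Python's built-in sqlite3. The column name dag_pk, the trimmed-down dag and dag_run tables, and the failed_runs_for() helper are all hypothetical and do not reflect Airflow's actual schema. The human-readable dag_id stays unique for display and lookup by name, while joins go through the integer key.

```python
import sqlite3

# Hypothetical schema: dag_pk is an integer surrogate key used for joins;
# dag_id keeps the human-readable name and stays unique.
SCHEMA = """
CREATE TABLE dag (
    dag_pk INTEGER PRIMARY KEY,
    dag_id VARCHAR(250) NOT NULL UNIQUE
);
CREATE TABLE dag_run (
    dag_pk INTEGER NOT NULL REFERENCES dag (dag_pk),
    run_id VARCHAR(250) NOT NULL,
    state  VARCHAR(50)
);
CREATE INDEX idx_dag_run_dag_pk ON dag_run (dag_pk);
"""


def failed_runs_for(conn: sqlite3.Connection, dag_id: str) -> list:
    # Resolve the name once via the unique index on dag.dag_id,
    # then join dag_run on the integer key instead of the string.
    return conn.execute(
        """
        SELECT d.dag_id, r.run_id
        FROM dag AS d
        JOIN dag_run AS r ON r.dag_pk = d.dag_pk
        WHERE d.dag_id = ? AND r.state = 'failed'
        ORDER BY r.run_id
        """,
        (dag_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO dag (dag_pk, dag_id) VALUES (?, ?)",
                 [(1, "etl_daily"), (2, "reporting")])
conn.executemany("INSERT INTO dag_run VALUES (?, ?, ?)",
                 [(1, "run_a", "failed"), (1, "run_b", "success"), (2, "run_c", "failed")])
print(failed_runs_for(conn, "etl_daily"))  # [('etl_daily', 'run_a')]
```

The design choice here is that the UI and API keep addressing DAGs by name, and only the storage layer switches its join and foreign-key columns to the integer.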