The performance impact on the DAG table would be negligible.

Maybe for task_instances (a large table with millions of rows and a
three-field composite key) it *could* be a decent idea. But you'd then have
two indexes to store and maintain, and we might have to change the code to
reference the new surrogate pk in the places where that index is the more
efficient one (SQLAlchemy would handle some of that right out of the box).
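
A minimal sketch of that layout, using SQLite from the Python standard
library only because it runs anywhere. The `id` column and the index name
`ti_natural_key` are hypothetical, and the real Airflow schema and other
database engines will differ in detail. It prints which index the planner
picks for a lookup by surrogate id and for a lookup by (dag_id, task_id):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE task_instance (
    id INTEGER PRIMARY KEY,        -- hypothetical surrogate key
    dag_id TEXT NOT NULL,
    task_id TEXT NOT NULL,
    execution_date TEXT NOT NULL,
    state TEXT
);
-- The natural key still needs its own index (hypothetical name).
CREATE UNIQUE INDEX ti_natural_key
    ON task_instance (dag_id, task_id, execution_date);
""")


def plan(sql, params=()):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


point_lookup = plan(
    "SELECT state FROM task_instance WHERE id = ?", (1,)
)
range_scan = plan(
    "SELECT state FROM task_instance WHERE dag_id = ? AND task_id = ?",
    ("example_dag", "example_task"),
)

print(point_lookup)
print(range_scan)
```

Both indexes exist side by side here, and every insert has to update both.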

This mostly affects index size (btree(id) is much smaller than
btree(dag_id, task_id, execution_date)), not key lookup time, which is
log(n) either way. We'd still have to use the composite btree for range
scans, which we use frequently to get the set of tasks for a dag or for a
specific dag task. Since lookups are log(n), and we need to maintain that
composite btree anyway for range scans, I don't see where this would
really help. It would be a better index (fewer pages, less memory usage,
...) if we didn't need the composite one, which we do.
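
A back-of-the-envelope estimate of the log(n) point. Every number here is
an assumption chosen for illustration (8 KiB pages, about 16 bytes per
entry for an integer key, about 80 bytes per entry for the three-column
key, 10 million rows); real engines add per-page overhead and these are
not measurements:

```python
PAGE_BYTES = 8192          # assumed page size
N_ROWS = 10_000_000        # "millions of rows", assumed
SURROGATE_ENTRY = 16       # assumed bytes per btree(id) entry
COMPOSITE_ENTRY = 80       # assumed bytes per composite-key entry


def btree_depth(n_rows, entry_bytes, page_bytes=PAGE_BYTES):
    """Levels needed so that fanout ** depth covers n_rows entries."""
    fanout = page_bytes // entry_bytes
    depth, capacity = 1, fanout
    while capacity < n_rows:
        capacity *= fanout
        depth += 1
    return depth


def leaf_pages(n_rows, entry_bytes, page_bytes=PAGE_BYTES):
    """Leaf pages needed if every page is packed full (ceiling division)."""
    per_page = page_bytes // entry_bytes
    return -(-n_rows // per_page)


for name, entry in (("btree(id)", SURROGATE_ENTRY),
                    ("composite btree", COMPOSITE_ENTRY)):
    print(name, "depth:", btree_depth(N_ROWS, entry),
          "leaf pages:", leaf_pages(N_ROWS, entry))
```

Under these assumptions the two trees differ by a single level (3 vs 4),
while the leaf page count differs by roughly the ratio of the entry sizes,
which is why the gain is in size rather than in lookup time.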

Max

On Thu, Aug 9, 2018 at 8:05 AM Vardan Gupta <vardangupta...@gmail.com>
wrote:

> Point well taken on backward compatibility. If implemented, we will have
> to handle this change very carefully.
>
> On Thu, Aug 9, 2018 at 7:29 PM Юли Волкова <xnuins...@gmail.com> wrote:
>
> > Because what you described says nothing about backward compatibility.
> > Do you think everyone who uses it needs to change all their DAGs? It's
> > very strange, because you are proposing one of the most critical changes
> > and it will affect everyone. If you want an id, call it dag_metadata_id
> > and add it. But don't propose a change that isn't backward compatible.
> > It's too strange.
> >
> > On Thu, Aug 9, 2018 at 7:04 AM vardangupta...@gmail.com <
> > vardangupta...@gmail.com> wrote:
> >
> > >
> > >
> > > On 2018/08/09 11:55:11, Ash Berlin-Taylor <a...@apache.org> wrote:
> > > > > Absolutely - there will still need to be a human-readable DAG id,
> > > > > even if we end up with an auto-incrementing integer ID column
> > > > > internally for table join performance reasons.
> > > >
> > > > -ash
> > > >
> > > > > On 9 Aug 2018, at 12:35, Юли Волкова <xnuins...@gmail.com> wrote:
> > > > >
> > > > > > How will you know what your DAG 00002 does without opening it? And
> > > > > > for each of 100 of them, for example?
> > > > > > Especially if you are not the developer who created it: say you are
> > > > > > a support team and have 120 DAGs.
> > > > > >
> > > > > > This is the first time I have wanted to also send an answer to the
> > > > > > dev mailing list. Please, don't do it.
> > > > > >
> > > > > > I think it will be really bad for everyone who uses dag_id as a
> > > > > > descriptive name for the dag. If I look at 0329313, it tells me
> > > > > > nothing useful, and it will be very complicated to identify which
> > > > > > process the dag is used for. There could be another id for the
> > > > > > indexes in the DB if it's a real problem for somebody. But please,
> > > > > > do not change dag_id.
> > > > >
> > > > > On Mon, Aug 6, 2018 at 1:32 AM vardangupta...@gmail.com <
> > > > > vardangupta...@gmail.com> wrote:
> > > > >
> > > > >> Hi Everyone,
> > > > >>
> > > > > >> Do we have any plan to change the type of dag_id from String to
> > > > > >> Number? This would make queries on metadata more performant. The
> > > > > >> proposal could be to generate an auto-incremented value in the dag
> > > > > >> table and use this id in the rest of the other tables.
> > > > >>
> > > > >>
> > > > >> Regards,
> > > > >> Vardan Gupta
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > _________
> > > > >
> > > > > > Best regards, Юлия Волкова
> > > > > > Tel.: +7 (911) 116-71-82
> > > >
> > > >
> > >
> > > Thanks Ash for your reply, I am aligned with what you're saying.
> > >
> > > > I was not proposing to take away the human-readable dag_id. Instead I
> > > > was thinking: why can't we create another field like dag_name, which
> > > > will hold this information on all front-facing sites, while dag_id is
> > > > changed to an integer? This will help joins work faster in the
> > > > metastore. Though dag_id is currently indexed, an index on
> > > > varchar(250) is going to take more index blocks than one on int
> > > > (4 bytes), and therefore more lookup time. Also, if dag_id is not
> > > > trivial to change to int, let it remain and let's introduce another
> > > > column that is actually an integer type, and let joins happen on this
> > > > column across all tables.
> > >
> >
> >
> > --
> > _________
> >
> > Best regards, Юлия Волкова
> > Tel.: +7 (911) 116-71-82
>
