These performance characteristics also depend on the metadata database backend. If there are benchmarks, I would hope we look at them across sqlite, mysql, postgresql, and any other supported backends before we take action.
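To make the benchmark request concrete, here is a minimal sketch of the kind of comparison I mean. It covers SQLite only, since that ships with Python. The table names (`dag_str`, `ti_str`, `dag_int`, `ti_int`), the row counts, and the helper functions are all made up for illustration and are not the real Airflow schema. The same two queries would have to be run against MySQL and PostgreSQL with their own drivers before drawing any conclusion.

```python
import sqlite3
import time


def build_db(n_dags=200, tasks_per_dag=50):
    """Build an in-memory SQLite DB holding the same data keyed two ways."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Variant A: tables joined on the string dag_id.
        CREATE TABLE dag_str (dag_id VARCHAR(250) PRIMARY KEY, is_paused INTEGER);
        CREATE TABLE ti_str (dag_id VARCHAR(250), task_id VARCHAR(250), state TEXT);
        CREATE INDEX ti_str_dag ON ti_str (dag_id);

        -- Variant B: tables joined on an integer surrogate key.
        CREATE TABLE dag_int (id INTEGER PRIMARY KEY, dag_id VARCHAR(250) UNIQUE,
                              is_paused INTEGER);
        CREATE TABLE ti_int (dag_pk INTEGER, task_id VARCHAR(250), state TEXT);
        CREATE INDEX ti_int_dag ON ti_int (dag_pk);
        """
    )
    for i in range(n_dags):
        name = f"example_dag_with_a_long_descriptive_name_{i:05d}"
        conn.execute("INSERT INTO dag_str VALUES (?, 0)", (name,))
        conn.execute("INSERT INTO dag_int VALUES (?, ?, 0)", (i + 1, name))
        conn.executemany(
            "INSERT INTO ti_str VALUES (?, ?, ?)",
            [(name, f"task_{t}", "success") for t in range(tasks_per_dag)],
        )
        conn.executemany(
            "INSERT INTO ti_int VALUES (?, ?, ?)",
            [(i + 1, f"task_{t}", "success") for t in range(tasks_per_dag)],
        )
    conn.commit()
    return conn


def time_query(conn, sql, repeats=5):
    """Run a COUNT(*) query several times; return (count, best wall time in seconds)."""
    best = float("inf")
    count = None
    for _ in range(repeats):
        start = time.perf_counter()
        count = conn.execute(sql).fetchone()[0]
        best = min(best, time.perf_counter() - start)
    return count, best


STR_JOIN = "SELECT COUNT(*) FROM ti_str t JOIN dag_str d ON t.dag_id = d.dag_id"
INT_JOIN = "SELECT COUNT(*) FROM ti_int t JOIN dag_int d ON t.dag_pk = d.id"

conn = build_db()
str_count, str_secs = time_query(conn, STR_JOIN)
int_count, int_secs = time_query(conn, INT_JOIN)
print(f"string join:  {str_count} rows in {str_secs:.6f}s")
print(f"integer join: {int_count} rows in {int_secs:.6f}s")
```

The timings will vary by machine and by backend, which is the point: a number measured on one engine says little about the others.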

On Thu, Aug 9, 2018 at 12:41 PM Maxime Beauchemin <maximebeauche...@gmail.com> wrote:

> The perf impact on the DAG table would be negligible.
>
> Maybe for task_instances (large table with millions of rows, 3-field composite key) it *could* be a decent idea. Though you'd then need to have two indexes to store and maintain, and we may have to change the code to actually use and reference that new, more efficient pk in places where it's more efficient to use that index (some of it SQLAlchemy would do right out of the box).
>
> This mostly affects the index size (btree(id) is much smaller than btree(dag_id, task_id, execution_date)), not so much the key lookup time, since that is log(n). We'd still have to use the composite btree when we want to do range scans, which we use frequently to get sets of tasks for a dag or a specific dag task. Since lookups are log(n), and we need to maintain that composite btree anyway for range scans, I don't see where that would really help. It would be a better index (fewer pages, less memory usage, ...) if we didn't need that other composite one, which we do.
>
> Max
>
> On Thu, Aug 9, 2018 at 8:05 AM Vardan Gupta <vardangupta...@gmail.com> wrote:
>
> > Point well taken on backward compatibility; we will have to approach this change very diligently, if it is implemented.
> >
> > On Thu, Aug 9, 2018 at 7:29 PM Юли Волкова <xnuins...@gmail.com> wrote:
> >
> > > Because what you described says nothing about backward compatibility. Do you think everyone who uses it should have to change all their DAGs? It's very strange, because you are proposing one of the most critical changes and it will affect everyone. If you want an id, call it dag_metadata_id and add it. But do not propose a change that is not backward compatible. It's too strange.
> > >
> > > On Thu, Aug 9, 2018 at 7:04 AM vardangupta...@gmail.com <vardangupta...@gmail.com> wrote:
> > >
> > > > On 2018/08/09 11:55:11, Ash Berlin-Taylor <a...@apache.org> wrote:
> > > > > Absolutely - there will still need to be a human-readable DAG id, even if we end up with an auto-incrementing integer ID column internally for table join performance reasons.
> > > > >
> > > > > -ash
> > > > >
> > > > > > On 9 Aug 2018, at 12:35, Юли Волкова <xnuins...@gmail.com> wrote:
> > > > > >
> > > > > > How will you understand what your DAG 00002 is doing without going into it? For each of 100 DAGs, for example? Especially if you are not the developer who created it: you are a support team and have 120 DAGs.
> > > > > >
> > > > > > This is the first time I have wanted to send an answer to the dev mailing list. Please don't do it.
> > > > > >
> > > > > > I think it will be really bad for everyone who uses dag_id as a descriptive name for the DAG. If I am looking at 0329313, it does not tell me anything useful, and it will be very complicated to identify which process the DAG is used for. There could be another id for the indexes in the DB if that is a real problem for somebody. But please, do not change dag_id.
> > > > > >
> > > > > > On Mon, Aug 6, 2018 at 1:32 AM vardangupta...@gmail.com <vardangupta...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Everyone,
> > > > > > >
> > > > > > > Do we have any plan to change the type of dag_id from String to Number? This would make queries on metadata more performant. The proposal could be to generate an auto-incrementing value in the dag table and use this id in the rest of the other tables.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Vardan Gupta
> > > > > >
> > > > > > --
> > > > > > _________
> > > > > >
> > > > > > Best regards, Юлия Волкова
> > > > > > Tel.: +7 (911) 116-71-82
> > > >
> > > > Thanks, Ash, for your reply; I am aligned with what you're saying.
> > > >
> > > > I was not proposing to take away the human-readable dag_id. Instead I was thinking: why can't we create another field, like dag_name, which would hold this information in all front-facing places, while dag_id is changed to an integer? This would help make joins faster in the metastore. dag_id is currently indexed, but an index on varchar(250) is still going to take more index blocks than one on an int (4 bytes), and therefore more lookup time. Also, if dag_id is not trivial to change to int, let it remain, and let's introduce another column that is actually an integer and let joins happen on this column across all tables.
> > >
> > > --
> > > _________
> > >
> > > Best regards, Юлия Волкова
> > > Tel.: +7 (911) 116-71-82
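
For reference, the backward-compatible variant discussed in the quoted thread (keep the human-readable dag_id, add a separate integer column, and join on that) might look roughly like the sketch below. It uses SQLite from the Python standard library. The column name `dag_metadata_id` comes from the thread, but the table definitions and the `register_dag` and `task_states` helpers are hypothetical and much simpler than the actual Airflow schema. Note that the composite key on the task table is still there, as Max points out it would need to be for range scans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- dag_id stays as the human-readable name; the integer key is additive.
    CREATE TABLE dag (
        dag_metadata_id INTEGER PRIMARY KEY AUTOINCREMENT,
        dag_id VARCHAR(250) NOT NULL UNIQUE
    );

    -- Child table references the integer key; the composite key remains
    -- so that per-DAG and per-task range scans still have an index.
    CREATE TABLE task_instance (
        dag_metadata_id INTEGER NOT NULL REFERENCES dag (dag_metadata_id),
        task_id VARCHAR(250) NOT NULL,
        execution_date TEXT NOT NULL,
        state TEXT,
        PRIMARY KEY (dag_metadata_id, task_id, execution_date)
    );
    """
)


def register_dag(conn, dag_id):
    """Insert the DAG if it is new and return its integer key (hypothetical helper)."""
    conn.execute("INSERT OR IGNORE INTO dag (dag_id) VALUES (?)", (dag_id,))
    row = conn.execute(
        "SELECT dag_metadata_id FROM dag WHERE dag_id = ?", (dag_id,)
    ).fetchone()
    return row[0]


def task_states(conn, dag_id):
    """Look up task states by the readable dag_id; the join itself uses the integer key."""
    return conn.execute(
        """
        SELECT d.dag_id, t.task_id, t.state
        FROM task_instance AS t
        JOIN dag AS d ON d.dag_metadata_id = t.dag_metadata_id
        WHERE d.dag_id = ?
        ORDER BY t.task_id
        """,
        (dag_id,),
    ).fetchall()


pk = register_dag(conn, "daily_sales_report")
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
    [
        (pk, "extract", "2018-08-09T00:00:00", "success"),
        (pk, "load", "2018-08-09T00:00:00", "running"),
    ],
)
states = task_states(conn, "daily_sales_report")
print(states)
```

Users and the UI keep addressing DAGs by name, so existing DAG definitions would not change. Whether the smaller join key is worth the extra index is exactly what the benchmarks would need to show.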