Sounds like this may be a good candidate for an Airflow Improvement Proposal (AIP)...
On 20 September 2018 at 22:45, Brian Greene <br...@heisenbergwoodworking.com> wrote:
> Before I'd used Airflow much, on first inspection, I think I might have agreed with you.
>
> After a bit of use, I’d agree with Fokko and others: this isn’t really a problem, and separating them would seem to do more harm than good where deployment is concerned.
>
> I was going to stop there, but why?
>
> You can add a task to a DAG that’s deployed and has run, and still view its history. The “new” task shows up as white squares in the old DAG runs. Nobody said you’re required to also rename the DAG when you do this. If your process, desire, or design determines that you need to rename it, well then, by definition... isn’t it a new thing without a history? Airflow is implementing exactly that.
>
> One could argue that renaming a DAG to reflect its exact purpose is good practice. Yes, I’d agree, but again, following that logic, if it’s a small enough change to “slip in”, then the name likely shouldn’t change. If it’s big enough that I want to change the name, then it’s a big enough change that I’m functionally running something “new”, and I expect to need to account for that. Airflow enforces that logic by coupling the name to the deployment of what you said was a new process.
>
> One might put forth that changing the name to be more descriptive in the UI makes things easier for support staff. If that’s your challenge, perhaps it’s not Airflow that’s the problem. DAGs are of course documented somewhere besides their name, right? Yes, it’s self-documenting (and the graphs are cool), but I have to assume there’s something besides the NAME to tell people what it does. Additionally, far more than the name is required for even an operator or someone watching the monitors to take action. You don’t expect them to know which tasks to rerun or how to troubleshoot failures based only on your “now most descriptive name in the UI”, do you?
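Brian's point that adding a task keeps a DAG's history while renaming the DAG does not can be illustrated with a small model. This is a hypothetical sketch, not Airflow code: `MetadataStore`, `record_run`, and `grid` are invented names, and the only assumption carried over from Airflow is that run and task states are recorded under the `dag_id` string.

```python
# Hypothetical model for illustration only; not Airflow's actual schema or API.
class MetadataStore:
    def __init__(self):
        # dag_id string -> list of past runs; each run maps task_id -> state
        self.runs = {}

    def record_run(self, dag_id, task_states):
        self.runs.setdefault(dag_id, []).append(dict(task_states))

    def grid(self, dag_id, task_ids):
        # One row per past run; None marks a task that did not exist yet
        # (roughly the "white squares" Brian describes).
        return [[run.get(t) for t in task_ids] for run in self.runs.get(dag_id, [])]


store = MetadataStore()
store.record_run("cost_report_daily", {"extract": "success", "report": "success"})

# Adding an "email" task under the same dag_id keeps the old run visible.
print(store.grid("cost_report_daily", ["extract", "report", "email"]))
# [['success', 'success', None]]

# Renaming the dag_id starts from an empty history.
print(store.grid("cost_expenses_reports_daily", ["extract", "report", "email"]))
# []
```

Because the history is looked up by the `dag_id` string, a renamed DAG is, from the store's point of view, simply a different DAG.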
>
> I spent time in an Informatica shop where all the jobs were numbered. Numbered. Let’s be more exact: their NAMES were NUMBERS, like 56709. Terrible, but it 100% worked, because while a descriptive name would have been useful, the name is the thing that’s supposed to NOT CHANGE (see code of Abibarshim), and all the other information can attach to it in places where you write... other information. People would curse a number (“F’ing 6291 failed again”), and everyone knew what they were talking about. I digress.
>
> You might decide to document “DAG ID 12”, or just “12”, on your wiki; I’m going to document “daily_sales_import”. And when things start failing at 3 a.m., it’s not my DAG “56” that’s failing, it’s the sales_export DAG. But if you document “12”, that’s still its name, and it had better be 12 in all your environments and documents. This also means the actual database IDs from your proposal are almost certainly NOT the same across your environments, so whatever you document (the “12”) ends up being the unchangeable name anyway!
>
> There are lots of languages (most of them) where the name of a thing is important and hard to change. That’s not a bad thing, and I’d assume that deploying a thing by name has some significance in many systems. Go rename a class in... pick a language... and tell me how that should be easier to do willy-nilly so it’s easier in the UI.
>
> I suppose you could view it as a limitation, but I don’t think you’ve illustrated a single use case where it’s an actual technical constraint or limitation.
>
> The BEST argument against the current implementation is database performance. It’s a hogwash argument. Basic key indexes on low-cardinality string columns are plenty fast for the Airflow workload. If your task load is so high that Airflow can’t keep up, or you’re running super-fast tasks where Airflow’s database/tracking latency is too much... perhaps a messaging or queue-processing solution is better suited to those workloads.
> We see scheduler bottlenecks long before database ones in our “quick task” scenarios. Additionally, reading through this list, you’ll find people running Airflow at substantial scale, and I’ve not seen anyone complaining of production performance issues caused by this design decision. At first I hated it. String keys are dirty; we’re all taught that as good little programmers. Except when performance won’t be a huge consideration (it’s not OLTP) and ease of queryability matters more because it’s a growing system... then it’s a good decision, whoever made it.
>
> How does the filename matter? Frankly, I wish the filename were REQUIRED to match the DAG name, so people would quit confusing themselves by mismatching them! We’ve renamed DAG files with no issue as long as the content doesn’t change, so again, not a real use case. And really: name your stuff carefully before you get to prod, man.
>
> I gotta ask, airflowuser: are you going to use Airflow for anything, or just poke it with a stick from a distance and ask semi-inane questions of these fine folks who wrote, and spend time working on, this cool piece of kit?
>
> B
>
> Sent from a device with less than stellar autocorrect
>
>> On Sep 20, 2018, at 3:12 PM, Driesprong, Fokko <fo...@driesprong.frl> wrote:
>>
>> I like the dag_id serving as both the name and a unique identifier. If you change the DAG in such a way that it deserves a new name, you probably want to create a new DAG anyway. If you want to give some additional context, you can use the description field:
>> https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L3131-L3132
>>
>> The name of the DAG file does not have any influence.
>>
>> My 2¢
>>
>> Cheers, Fokko
>>
>> On Thu 20 Sep 2018 at 19:40, James Meickle <jmeic...@quantopian.com.invalid> wrote:
>>
>>> I'm personally against having some kind of auto-increment numeric ID for DAGs.
>>> While this makes a lot of sense for systems where creation is a database activity (like a POST request), in Airflow, DAG creation is actually a code-shipping activity. There are all kinds of complex scenarios around that:
>>>
>>> - I revert a commit and a DAG disappears or is renamed
>>> - I run the same file twice, with different parameters, to create two DAGs
>>> - I create the DAG in both staging and prod, but they wind up with different IDs
>>>
>>> It's just too hard to track these scenarios automatically.
>>>
>>> If we really wanted to put something like this in place, it would first make more sense to decouple DAG creation from code shipping, and instead prefer creating a DAG outside of code (but with a definition that references which git repo/committish/file/arguments/etc. to use). Then, if you do something like rename a file, the DAG breaks, but at least it still exists in the database with that ID, and its history still makes sense once you update the DAG definition with the new code location.
>>>
>>> On Thu, Sep 20, 2018 at 4:52 AM airflowuser <airflowu...@protonmail.com.invalid> wrote:
>>>
>>>> Hi,
>>>> Though this could have been raised on Jira, I think it should be discussed here first.
>>>>
>>>> The problem:
>>>> Airflow mixes the DAG name with its ID. It uses the same field for both purposes.
>>>>
>>>> I assume that most of you use the dag_id to describe what the DAG actually does.
>>>> For example:
>>>>
>>>> dag = DAG(
>>>>     dag_id='cost_report_daily',
>>>>     ...
>>>> )
>>>>
>>>> This dag_id is shown in the DAG ID column in the UI.
>>>> Now, let's say that you want to add another task to this specific DAG. You have to be extremely careful when you change the dag_id to represent the new functionality, for example dag_id='cost_expenses_reports_daily'. This will break the history of the DAG.
>>>>
>>>> Or take an even simpler use case: the user just wants to change the name they see in the UI.
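James's staging/prod scenario can be sketched the same way. Assume that numeric IDs were handed out by an auto-increment sequence in each environment's database. `AutoIncrementRegistry`, `dag_ids_from_one_file`, and the `sales_import_*` DAG names are all hypothetical. Under that assumption, the same string `dag_id`s end up with different numbers depending on the order in which each environment first saw them:

```python
import itertools


class AutoIncrementRegistry:
    """Hypothetical per-environment registry that assigns numeric IDs in
    first-seen order, the way a database sequence would."""

    def __init__(self):
        self._next = itertools.count(1)
        self.ids = {}

    def register(self, dag_id):
        # Only the first registration of a given dag_id consumes a number.
        if dag_id not in self.ids:
            self.ids[dag_id] = next(self._next)
        return self.ids[dag_id]


def dag_ids_from_one_file(regions):
    # One file run with different parameters yields several DAGs.
    return [f"sales_import_{region}" for region in regions]


staging = AutoIncrementRegistry()
prod = AutoIncrementRegistry()

# Staging saw the "eu" DAG first (say, it was trialled there earlier);
# prod received both DAGs in a single deploy.
staging.register("sales_import_eu")
for dag_id in dag_ids_from_one_file(["us", "eu"]):
    staging.register(dag_id)
    prod.register(dag_id)

print(staging.ids)  # {'sales_import_eu': 1, 'sales_import_us': 2}
print(prod.ids)     # {'sales_import_us': 1, 'sales_import_eu': 2}
```

The string names agree across environments while the generated numbers do not. This is the mismatch both James and Brian point out.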
>>>>
>>>> I suggest we discuss whether the dag_id should be split into an ID (an actual identifier) and a name that reflects what the DAG does. When the "connection" is made by IDs, names can change as much as you want without breaking anything; the name essentially becomes a field used for display purposes only.
>>>>
>>>> * I didn't mention the issue of the DAG file name, which can also cause trouble if someone wants to change it.
>>>>
>>>> Sent with [ProtonMail](https://protonmail.com) Secure Email.
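The split proposed in the original message might look roughly like the sketch below. None of it is Airflow API: `Catalog`, `DagRecord`, `key`, and `display_name` are hypothetical names. Here the stable key is chosen explicitly by the DAG author rather than generated by the database, which avoids the cross-environment problem James raises. For extra context without any schema change, Fokko's suggestion of the existing `description` argument on `DAG` still applies.

```python
from dataclasses import dataclass, field


@dataclass
class DagRecord:
    # Hypothetical split: `key` never changes, `display_name` is UI-only.
    key: str
    display_name: str
    runs: list = field(default_factory=list)


class Catalog:
    def __init__(self):
        self._by_key = {}

    def upsert(self, key, display_name):
        record = self._by_key.get(key)
        if record is None:
            record = DagRecord(key, display_name)
            self._by_key[key] = record
        else:
            # Renaming only touches the display field; runs stay attached to the key.
            record.display_name = display_name
        return record


catalog = Catalog()
dag = catalog.upsert("dag-0001", "cost_report_daily")
dag.runs.append("2018-09-19")

# Rename for the UI: same key, so the run history survives.
dag = catalog.upsert("dag-0001", "cost_expenses_reports_daily")
print(dag.display_name, dag.runs)  # cost_expenses_reports_daily ['2018-09-19']
```

The trade-off the thread debates is visible here: someone still has to pick and never change `key`, which is exactly the role `dag_id` plays today.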