Which time zone is 1pm Friday in :)? Answering the question: I think there are lots of horror stories about using EFS (regarding quotas and guaranteed throughput). I have not tried NFS, but given its lack of cloud-nativeness and its similar characteristics, the problems might be the same. From an engineering point of view, those are exactly the kind of problems you can expect when you synchronize a bunch of files and hope they stay consistent across a folder. I have also heard of rather bad experiences with GCS-fuse in a similar setup (though mostly anecdotal). The main problem with all of these is how little control you have over transfers and delays; some scenarios result in huge delays and a lack of consistency. Git solves pretty much all of this. It gives a nearly atomic DAG sync: all files have to be present locally before they can be checked out, so while checkout is not transactionally atomic, it is as close as it can get when it comes to remote file sync. Minimizing traffic based on git diff analysis also helps. So I think these are pretty nice characteristics for DAG sync overall.
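To make the "nearly atomic" point a bit more concrete, here is a minimal Python sketch of the publish step (my own illustration, not code taken from git-sync or Airflow). The complete set of DAG files for a revision is written into a fresh directory first, and only then is the symlink that readers use flipped to it with an atomic rename. As far as I know, the git-sync container follows a similar checkout-then-swap-symlink pattern; the function `publish_snapshot` and its parameters are hypothetical.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def publish_snapshot(root: Path, link_name: str, files: dict[str, str], rev: str) -> Path:
    """Write a complete snapshot of DAG files, then atomically repoint a symlink at it.

    `root`, `link_name`, `files` and `rev` are hypothetical parameters for this
    sketch; `rev` stands in for a git commit id.
    """
    # 1. Materialize every file of the new revision in its own directory first.
    snapshot = root / f"rev-{rev}"
    snapshot.mkdir(parents=True)  # raises FileExistsError if rev was already published
    for rel_path, content in files.items():
        target = snapshot / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)

    # 2. Only when the snapshot is complete, create a temporary symlink to it...
    link = root / link_name
    tmp_link = root / f".{link_name}.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()  # leftover from an interrupted earlier run
    os.symlink(snapshot.name, tmp_link)  # relative target, resolved inside `root`

    # 3. ...and rename it over the live link. On POSIX, rename(2) replaces the
    # destination atomically, so readers that go through `link` see either the
    # old snapshot or the new one, never a half-copied mix.
    os.replace(tmp_link, link)
    return snapshot


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        publish_snapshot(root, "dags", {"etl.py": "# v1"}, rev="a1")
        publish_snapshot(root, "dags", {"etl.py": "# v2", "report.py": "# v2"}, rev="b2")
        print(sorted(p.name for p in (root / "dags").iterdir()))  # ['etl.py', 'report.py']
```

A file-by-file sync onto EFS, NFS or GCS-fuse has no equivalent of step 3 unless you build one yourself, which is why readers there can observe a partially updated folder. Old `rev-*` directories are not cleaned up in this sketch; a real setup would prune them once no running task still needs them.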
J.

On Tue, Nov 3, 2020 at 9:55 PM Ry Walker <[email protected]> wrote:

> Let’s go w/ Friday option
>
> On Tue, Nov 3, 2020 at 3:05 PM Alan K Chin <[email protected]> wrote:
>
>> @Tomek - Thanks for the link to the Airflow talk! Checking it out now.
>>
>> @Jarek - It sounds like git-sync is, or rather should be, the default way
>> users add/modify DAGs. With that said, have you had any experience with
>> customers syncing their DAGs to other forms of DAG storage (S3 etc.), and
>> what were the outcomes?
>>
>> I spoke with Luciano and we're both available anytime after noon on
>> Thursday and Friday to chat about this effort.
>> @Ry - Looking at your calendar, Thurs@12:30 and Fri@1pm both look open;
>> which one would fit your schedule best?
>>
>> Look forward to chatting.
>>
>> --
>> Alan Chin
>> CODAIT, San Francisco
>> Email - [email protected]
>>
>>
>> ----- Original message -----
>> From: Tomasz Urbaszek <[email protected]>
>> To: [email protected]
>> Cc: "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
>> Subject: [EXTERNAL] Re: A Visual Editor for Airflow pipelines
>> Date: Tue, Nov 3, 2020 3:29 AM
>>
>> I think the visual DAG editor would be a thing!
>>
>> Not sure if you are aware of this Airflow Summit talk about a visual DAG
>> editor and the integration between Airflow and CWL:
>>
>> https://www.youtube.com/watch?v=I4nFCqEnOJc&list=PLGudixcDaxY3RGLSlWoN_cEEXhIT1OPmj&index=19
>>
>> Cheers,
>> Tomek
>>
>> On Tue, Nov 3, 2020 at 11:42 AM Jarek Potiuk <[email protected]> wrote:
>>
>> Agree - maybe 2.1 or 2.2 :).
>>
>> After some experience with big customers' deployments, I personally think
>> GitSync, at least for now, is the best approach out there. It requires a git
>> repo + authorization, but this has all the added benefits of code change
>> tracking, it is a very standard interface, most git repos provide
>> some way of manual review if needed, and most have some kind of integration
>> with CI/automated code analysis.
>>
>> I personally think it should be the default for any serious deployment,
>> as it provides so many benefits with very limited extra overhead. You just
>> need an extra "box" - a git repo (which is pretty much a given in any
>> organization). It uses a standard interface that is highly customizable
>> (branches, folder structures, whatnot), and we already have git-sync
>> container support in the helm chart.
>>
>> J.
>>
>>
>> On Tue, Nov 3, 2020 at 11:27 AM Ash Berlin-Taylor <[email protected]> wrote:
>>
>> Wishful thinking at the moment, Gerard -- task execution still needs
>> files on disk to run the tasks.
>>
>> This was always in my long-term plan for DAG serialization, but we aren't
>> there yet. And custom operators make this a non-straightforward problem
>> to solve.
>>
>> -ash
>>
>> On Nov 3 2020, at 12:18 am, Gerard Casas Saez <[email protected]> wrote:
>>
>> Would also be interested to know possible ways to do what Luciano
>> described. Hopefully with the serialized DAG and the new API we can start
>> just pushing the DAG to the DB (wishful thinking)?
>>
>> Gerard Casas Saez
>> Twitter | Cortex | @casassaez
>>
>> On Mon, Nov 2, 2020 at 2:06 PM Jarek Potiuk <[email protected]> wrote:
>>
>> Cool! I also think it's an interesting one :).
>> But it would be great to have such an integration possible from Elyra :).
>> Let us know what comes out of it :).
>>
>> J.
>>
>>
>> On Mon, Nov 2, 2020 at 10:02 PM Ry Walker <[email protected]> wrote:
>>
>> Hi Luciano -
>>
>> Elyra looks like an interesting project; we'd love to connect and talk
>> through the opportunity.
>>
>> You can compare your calendar to mine and grab a slot here:
>> https://calendly.com/ryw/60min - and I'll be sure to get a few of the
>> Airflow PMC members to join as well.
>>
>> -Ry
>>
>> Ry Walker
>> Founder/CTO of Astronomer + Airflow Committer
>>
>>
>> On Mon, Nov 2, 2020 at 12:00 AM Luciano Resende <[email protected]> wrote:
>>
>> Hi All,
>>
>> As mentioned in the user list [1], we are working on a visual editor
>> for pipelines and adding Airflow as one of the supported backends.
>>
>> https://elyra.readthedocs.io/en/latest/user_guide/pipelines.html
>>
>> As you are the Airflow devs, we would like to invite you to help us
>> implement the best integration possible, in two steps:
>>
>> 1) Getting a solid integration for building and running pipelines with
>> Python scripts and Jupyter notebooks
>>
>> 2) Expanding the list of available component types and enabling more
>> generic operators
>>
>> One of the questions raised in the original e-mail is how best to submit
>> the pipeline DAG to be executed by the Airflow runtime. We have tried a
>> few different options, starting with the experimental REST API and S3
>> bucket syncs, and these do not seem to be the ideal solution. We will be
>> looking into git-sync next, but would really appreciate some suggestions
>> on the best options, particularly if someone has already done some
>> external integration similar to this.
>>
>> Feel free to create issues for discussion and/or more details:
>>
>> https://github.com/elyra-ai/elyra/issues
>>
>> Or use this thread for suggestions.
>>
>> [1]
>> https://lists.apache.org/thread.html/r19ca5e61a90910a6b5de6feea186d9138a4cd47c91ea34dd4cce6ff9%40%3Cusers.airflow.apache.org%3E
>>
>> --
>> Luciano Resende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
>>
>>
>>
>> --
>>
>> Jarek Potiuk
>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>
>> M: +48 660 796 129 <+48660796129>
