Re: Naming things: What should the imports in dag files for DAG etc. be?

2024-09-03 Thread Julian LaNeve
Chiming in here mostly from the DAG author perspective!

I like `airflow.sdk` best. It makes it super clear what the user is supposed to 
interact with and what Airflow’s “public” interface is. Importing from 
`airflow.models` has always felt weird because it feels like you’re going into 
Airflow’s internals, and importing from things like `airflow.utils` just added 
to the confusion because it was always super unclear what a normal user is 
supposed to interact with vs what’s internal and subject to change.

The only slight downside (imo) to `airflow.sdk` is that an SDK is traditionally 
used to manage/interact with APIs (e.g. the Stripe SDK), so you could make the 
case that an “Airflow SDK” should be a library to interact with Airflow’s API. 
We’ve run into this before with Astro, where we published the Astro SDK as an 
Airflow provider for doing ETL. Then we were considering releasing a separate 
tool for interacting with Astro’s API (creating deployments, etc), which we 
would’ve called an “Astro SDK” but that name was already taken. I don’t think 
we’ll run into that here because we already have the `clients` concept to 
interact with the API.

The `airflow.definitions` pattern feels odd because it’s not something I’ve 
seen elsewhere, so a user would have to learn/remember the pattern just for 
Airflow. The top level option also feels nice but the “user” of Airflow is more 
than just a DAG author, so I wouldn’t want to restrict top-level imports just 
to one audience.
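
To make the "public interface" point above concrete, here is a minimal sketch of a facade module in Python. It is illustrative only: `toyflow`, `toyflow._internal`, and `_SchedulerState` are made-up names rather than Airflow modules, and building the modules in memory is just a way to keep the example in one file.

```python
import sys
import types


class DAG:
    """Stand-in for a workflow definition class that users are meant to import."""

    def __init__(self, dag_id):
        self.dag_id = dag_id


class _SchedulerState:
    """Stand-in for an implementation detail that may change without notice."""


# Hypothetical internal module: where the real implementations live.
internal = types.ModuleType("toyflow._internal")
internal.DAG = DAG
internal._SchedulerState = _SchedulerState

# Hypothetical public facade: re-exports only the supported names.
sdk = types.ModuleType("toyflow.sdk")
sdk.DAG = internal.DAG
sdk.__all__ = ["DAG"]

# Hypothetical top-level package that holds both submodules.
pkg = types.ModuleType("toyflow")
pkg.__path__ = []  # an (empty) __path__ marks the module as a package
pkg.sdk = sdk
pkg._internal = internal

# Register the modules so that ordinary import statements can find them.
sys.modules.update(
    {"toyflow": pkg, "toyflow.sdk": sdk, "toyflow._internal": internal}
)

# A DAG author touches only the facade, never the internals.
from toyflow.sdk import DAG as PublicDAG

dag = PublicDAG("example")
```

With a layout like this, anything importable from the facade is by definition public, and everything else can be treated as private without a separate list of what is and is not supported.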

--
Julian LaNeve
CTO

Email: jul...@astronomer.io
Mobile: 330 509 5792

> On Sep 2, 2024, at 6:46 AM, Jarek Potiuk  wrote:
> 
> Yep so. If we do not have side-effects from import airflow -> my vote would
> be "airflow.sdk" :)
> 
> On Mon, Sep 2, 2024 at 10:29 AM Ash Berlin-Taylor  wrote:
> 
>> Yes, strongly agreed on the “no side-effects from `import airflow`”.
>> 
>> To summarise the options so far:
>> 
>> 1. `from airflow import DAG, TaskGroup` — have the imports be from the top
>> level airflow module
>> 2. `from airflow.definitions import DAG, TaskGroup`
>> 3. `from airflow.sdk import DAG, TaskGroup`
>> 
>>> On 31 Aug 2024, at 23:07, Jarek Potiuk  wrote:
>>> 
>>> Should be:
>>> 
>>> ```
>>> @configure_settings
>>> @configure_worker_plugins
>>> def cli_worker():
>>>   pass
>>> ```
>>> 
>>> On Sun, Sep 1, 2024 at 12:05 AM Jarek Potiuk  wrote:
>>> 
>>>> Personally for me "airflow.sdk" is best and very straightforward. And we
>>>> have not yet used that for other things before, so it's free to use.
>>>> 
>>>> "Models" and similar carried more (often misleading) information - they
>>>> were sometimes database models, sometimes they were not. This caused a
>>>> lot of confusion.
>>>> 
>>>> IMHO explicitly calling something "sdk" is a clear indication "this is
>>>> what you are expected to use". And makes it very clear what is and what
>>>> is not a public interface. We should aim to make everything in
>>>> "airflow." (or whatever we choose) "public" and everything else
>>>> "private". That should also reduce the need of having to have a separate
>>>> description of "what is public and what is not".
>>>> 
>>>> Actually - if we continue doing import initialization as we do today - I
>>>> would even go as far as the "airflow_sdk" package - unless we do
>>>> something else that we have had a problem with for a long time - getting
>>>> rid of side effects of "airflow" import.
>>>> 
>>>> It's a bit tangential but actually related - as part of this work we
>>>> should IMHO get rid of all side-effects of "import airflow" that we
>>>> currently have. If we stick to sub-package of airflow - it is almost a
>>>> given thing since "airflow.sdk" (or whatever we choose) will be
>>>> available to "worker", "dag file processor" and "triggerer" but the
>>>> rest of the "airflow.whatever" will not be, and they won't be able to
>>>> use DB, where scheduler, api_server will.
>>>> 
>>>> So having side effects - such as connecting to the DB, configuring
>>>> settings, plugin manager initialization when you do "import" caused a
>>>> lot of pain, cyclic imports and a number of other problems.

Re: Using AI / Dosu to help us with triaging issues

2024-06-26 Thread Julian LaNeve
Not sure if I get an official vote here but we've been working with Devin and 
the Dosu team for Cosmos ( https://github.com/astronomer/astronomer-cosmos ) 
and it's been working great. Excited to see this in Airflow itself!

Plus they use Airflow to power their data platform behind the scenes so they 
have a vested interest in this one :)

--

*Julian LaNeve*
CTO

Email: jul...@astronomer.io
Mobile: 330 509 5792

On Wed, Jun 26, 2024 at 4:58 PM, Vikram Koka <vik...@astronomer.io.invalid> wrote:

> +1
> 
> Love it!
> 
> On Wed, Jun 26, 2024 at 1:23 PM Vincent Beck <vincb...@apache.org> wrote:
> 
>> Fantastic idea!
>> 
>> On 2024/06/26 20:12:43 Jarek Potiuk wrote:
>> 
>>> Hello everyone,
>>> 
>>> Together with Elad, Kaxil, and the Dosu team [1], we’ve been looking into
>>> employing AI / Natural Language processing to help us triage issues for
>>> Apache Airflow. We do not want to go “all-in” into getting a chatbot to
>>> respond to all our issues because we believe this is not how the
>>> community is being built. We looked at various ways we can start
>>> exploring the capabilities of the new ML/AI/Natural Language processing
>>> available.
>>> 
>>> We worked with the Dosu team. They are approved by the Apache Software
>>> Foundation infrastructure as Github integration and few ASF projects
>>> already use it (including our friends at Superset) - they have a
>>> fantastic offer to provide free service for open-source projects like
>>> Airflow. Together we evaluated what we can start with and initially we
>>> have a proposal to use auto-labeling of issues created in the Airflow
>>> repository.
>>> 
>>> We have a number of rules that are established for the triage team [2]
>>> but those rules are mundane and difficult to follow, and generally a lot
>>> of our issues are either not classified or badly classified, and
>>> currently we cannot rely on the classification.
>>> 
>>> What we want to start with is to re-classify our issues and apply the
>>> labels retro-actively for all past issues as well as start applying them
>>> automatically for new issues.
>>> 
>>> The risk of doing it is low, and it will allow us to explore integration
>>> and follow up with more elaborated integration. We have some options such
>>> as getting automated proposals for answers for similar questions, as well
>>> as “chat-bot generated/maintainer approved” answers - but we definitely
>>> do not want to have bots starting to answer automatically on PRs and
>>> issues.
>>> 
>>> We think that this will allow us to explore more ways how we can make
>>> maintainers and triagers time more efficient - and help us while we are
>>> focusing also on Airflow 3 development soon.
>>> 
>>> The Dosu founder - Devin, will send some more information soon and is
>>> available for questions here and in the #triage-team channel on Slack.
>>> 
>>> Unless we hear some complaints, we will apply labelling changes in a few
>>> days, I think this stage is not really controversial, and we will run a
>>> LAZY CONSENSUS in a few days.
>>> 
>>> J. E. K. (and the Dosu team).
>>> 
>>> [1] https://dosu.dev/
>>> 
>>> [2] https://github.com/apache/airflow/blob/main/ISSUE_TRIAGE_PROCESS.rst#labels

Re: Call with Nielsen team demoing their DAG debugging feature

2024-06-10 Thread Julian LaNeve
+1 I'd love to hear about it but unfortunately can't make the meeting!

--

*Julian LaNeve*
CTO

Email: jul...@astronomer.io
Mobile: 330 509 5792

On Mon, Jun 10, 2024 at 3:05 PM, Constance Martineau <consta...@astronomer.io.invalid> wrote:

> Hello again,
> 
> Given all the enthusiasm - assuming Nielsen is ok with this - what if
> someone recorded the meeting so that it could be shared with those that
> are interested?
> 
> Constance
> 
> On Mon, Jun 10, 2024 at 1:48 PM Constance Martineau <consta...@astronomer.io> wrote:
> 
>> Hi Jarek,
>> 
>> Same :)
>> 
>> Thanks,
>> Constance
>> 
>> On Mon, Jun 10, 2024 at 9:57 AM Amogh Desai <amoghdesai.oss@gmail.com> wrote:
>> 
>>> Hello Jarek,
>>> 
>>> Please add me to the invite as well.
>>> 
>>> Thanks & Regards,
>>> Amogh Desai
>>> 
>>> On Mon, Jun 10, 2024 at 11:22 AM Abhishek Bhakat <abhishek.bha...@astronomer.io.invalid> wrote:
>>> 
>>>> Hi Jarek,
>>>> 
>>>> I would also like to join as well, please.
>>>> 
>>>> Thanks,
>>>> Avi
>>>> 
>>>> On Sat, Jun 8, 2024 at 3:32 PM Buğra Öztürk <ozturkbugr...@gmail.com> wrote:
>>>> 
>>>>> Hello Jarek,
>>>>> 
>>>>> Thanks for sharing! It sounds very interesting. I would like to join.
>>>>> Could you please forward to me as well?
>>>>> 
>>>>> Thanks!
>>>>> 
>>>>> On Sat, 8 Jun 2024, 17:28 Jed Cunningham, <jedcunning...@apache.org> wrote:
>>>>> 
>>>>>> Interesting. Can you forward to me as well Jarek? Thanks!