For context, the Astronomer LLM providers dashboard operates as follows:

1. Fetch the latest source code for providers and system tests/example
   DAGs from the Airflow repository, deploy them to an Airflow instance,
   and execute the DAGs.
2. Use the Airflow API to retrieve the DAG run statuses and produce a
   JSON output of these statuses.
3. The dashboard, hosted on GitHub Pages, consumes the JSON data
   generated in step 2.
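Roughly, step 2 boils down to something like the sketch below (the DAG
ids, credentials, and output path are placeholders rather than our
actual configuration):

    import json

    import requests

    AIRFLOW_URL = "http://localhost:8080"  # the instance from step 1
    DAG_IDS = ["example_openai_dag", "example_cohere_dag"]  # placeholders

    statuses = []
    for dag_id in DAG_IDS:
        # Stable REST API: list DAG runs, newest first, keep the latest.
        resp = requests.get(
            f"{AIRFLOW_URL}/api/v1/dags/{dag_id}/dagRuns",
            params={"order_by": "-execution_date", "limit": 1},
            auth=("admin", "admin"),  # placeholder credentials
        )
        resp.raise_for_status()
        for run in resp.json()["dag_runs"]:
            statuses.append({
                "dag_id": dag_id,
                "state": run["state"],  # e.g. success / failed
                "execution_date": run["execution_date"],
            })

    # Step 3 then has the GitHub Pages dashboard consume this file.
    with open("dashboard.json", "w") as f:
        json.dump(statuses, f, indent=2)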
We are willing to adopt and adhere to a JSON or XML specification and a
model HTML view if one is established.

Best regards,

*Pankaj Koti*
Senior Software Engineer (Airflow OSS Engineering team)
Location: Pune, Maharashtra, India
Timezone: Indian Standard Time (IST)


On Mon, Jun 24, 2024 at 11:40 PM Ferruzzi, Dennis
<ferru...@amazon.com.invalid> wrote:

> > The information in our database is similar to the structure of the
> > AWS providers json file
> > https://aws-mwaa.github.io/open-source/system-tests/dashboard.json
> > + a field for logs. We also have an extra field that specifies the
> > commit-id against which the CI was run, which I believe is helpful
> > in case users want to know whether their PR was merged before or
> > after a failure.
>
> The commit ID is a handy addition for sure; I may look into adding
> that to the AWS dashboard. I haven't had a chance to look into
> junit-xml yet, but I think what we could do is agree on a minimum
> structure and allow for extras. For example, logs are great, but if
> Google provides them and AWS doesn't, that shouldn't break anything
> for the user trying to fetch logs. But the test name, timestamp, and
> success/fail state are definitely among the required minimum fields.
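>
> As a strawman, one record in such a feed might look something like
> this (the field names are purely illustrative, not a concrete
> proposal):
>
>     entry = {
>         # required minimum
>         "test_name": "example_redshift",
>         "timestamp": "2024-06-24T18:10:00Z",
>         "state": "success",  # or "failed"
>         # optional extras - a missing key must not break consumers
>         "logs_url": "https://example.com/logs/example_redshift.txt",
>         "commit_id": "abc1234",  # commit the CI ran against
>     }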
>
> > we could consider enforcing the presence of *some* dashboard that
> > shows results of regular system tests executions for any new
> > provider.
>
> The issue there is that smaller providers come and go, and are often
> added by community members, not even necessarily with the provider's
> knowledge. We can't force them to provide any support. If Random
> Contributor adds support for a new provider, neither the contributor
> nor the provider can be required to provide hosting for a dashboard
> and infrastructure to run the tests. So (for the foreseeable future)
> the dashboards need to be an opt-in project by/for the providers.
> Maybe some day the project might be able to provide hosting for the
> smaller dashboards or something, but I think the infrastructure to run
> the tests will always be optional and at the expense (and effort) of
> some other interested party (almost certainly the provider themselves,
> but who knows...).
>
>
> - ferruzzi
>
>
> ________________________________
> From: Michał Modras <michalmod...@google.com.INVALID>
> Sent: Monday, June 24, 2024 5:20 AM
> To: dev@airflow.apache.org
> Subject: RE: [EXT] System Test Dashboards - Phase Two??
>
> Hi,
>
> +1 to this idea. I think standardizing the format of the presented
> test run results makes sense. I also agree that we don't necessarily
> need to enforce it in any hard way. However, given that we have
> dashboards from these three major providers, we could consider
> enforcing the presence of *some* dashboard that shows results of
> regular system tests executions for any new provider. WDYT?
>
> Best,
> Michal
>
> On Sun, Jun 23, 2024 at 10:09 PM Freddy Demiane
> <fdemi...@google.com.invalid> wrote:
>
> > Hello,
> >
> > Thank you for the comments! Indeed, +1 to the idea; I believe this
> > would be a good step to increase the quality of providers. From our
> > (Google) side, the dashboard's CI writes the results to a database,
> > and those are then used to generate an HTML page. Yet, generating
> > and publishing a JSON or a JUnit XML style file would be a simple
> > task for us. The information in our database is similar to the
> > structure of the AWS providers json file
> > https://aws-mwaa.github.io/open-source/system-tests/dashboard.json
> > + a field for logs. We also have an extra field that specifies the
> > commit-id against which the CI was run, which I believe is helpful
> > in case users want to know whether their PR was merged before or
> > after a failure.
> >
> > If we want to go with the junit-xml style format (I checked this
> > reference:
> > https://www.ibm.com/docs/en/developer-for-zos/16.0?topic=formats-junit-xml-format),
> > one thing I could think of is to make each "Dashboard CI run"
> > generate an xml file where each test is represented by a testcase,
> > which, as Jarek mentioned, could be used in some way in the canary
> > builds. Let me know what you think.
> >
> > Best,
> > Freddy
> >
> >
> > On Fri, Jun 21, 2024 at 11:12 AM Jarek Potiuk <ja...@potiuk.com>
> > wrote:
> >
> > > This is a fantastic idea! I love it!
> > >
> > > It also has some very far-reaching possible spin-offs in the
> > > future - literally a few days ago, when I discussed some of the
> > > future security-related work that we might want to do, there was
> > > a concept of having a sort of CI of all CIs where we (and by we I
> > > mean the wider Python ecosystem) could gather the status of
> > > pre-release versions of dependencies before they hit the release
> > > stage, and some kind of machine-parseable interchange between
> > > those CI systems is pretty much a prerequisite for that. So we
> > > could generally try it out and sort out some issues, see how it
> > > works in our small "airflow" world, but in the future we might be
> > > able to use similar mechanisms to get alerts for a number of our
> > > dependencies - and even further than that, we could make such an
> > > approach much more widespread (I am discussing it with people from
> > > the Python Software Foundation / Packaging team / Python security,
> > > so there is a chance this might actually materialize in the long
> > > term). This would be the first step.
> > >
> > > I think the first step for it could be rather simple, and we do
> > > not have to invent our own standard - we could easily start with
> > > junit-xml style output produced by each dashboard and available
> > > under some URL that we could pull in our canary builds, and have a
> > > step in our canary builds that could aggregate multiple junit-xml
> > > files coming from various dashboards, display them as the output,
> > > and fail the job in case some tests are failing (with maybe some
> > > thresholds). Pytest and a number of other tools natively support
> > > the junit-xml format, it's pretty established as machine-readable
> > > test results, and I think it has all we need to start with:
> > > https://docs.pytest.org/en/latest/how-to/usage.html#creating-junitxml-format-files
> > > There is a lot of tooling around this format - including easy
> > > ways we could integrate it with GitHub Actions output (think links
> > > to the tests that failed directly in the GitHub UI), showing logs
> > > of failed tests, etc.
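> > >
> > > For illustration, that aggregation step could be as simple as
> > > something like this (the dashboard URLs and the threshold are
> > > made up, and it glosses over nested testsuites - just a sketch):
> > >
> > >     import sys
> > >     import xml.etree.ElementTree as ET
> > >
> > >     import requests
> > >
> > >     # Hypothetical junit-xml endpoints published by dashboards.
> > >     DASHBOARDS = {
> > >         "amazon": "https://example.com/aws/system-tests.xml",
> > >         "google": "https://example.com/google/system-tests.xml",
> > >     }
> > >     MAX_FAILURES = 0  # threshold before the canary job goes red
> > >
> > >     failures = 0
> > >     for provider, url in DASHBOARDS.items():
> > >         root = ET.fromstring(requests.get(url, timeout=30).content)
> > >         # Each system test is a <testcase>; a nested <failure> or
> > >         # <error> element marks it as failed.
> > >         for case in root.iter("testcase"):
> > >             failed = (case.find("failure") is not None
> > >                       or case.find("error") is not None)
> > >             print(f"{provider}: {case.get('name')}: "
> > >                   f"{'FAILED' if failed else 'ok'}")
> > >             failures += failed
> > >
> > >     if failures > MAX_FAILURES:
> > >         sys.exit(f"{failures} system test(s) failing across dashboards")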
> > >
> > > If we can get the Astronomer, Amazon and Google teams on board
> > > with it, we could likely implement a simple version quickly and
> > > iterate on it - later we could think about possibly evolving that
> > > into a more extensible approach.
> > >
> > > J.
> > >
> > >
> > > On Thu, Jun 20, 2024 at 11:27 PM Ferruzzi, Dennis
> > > <ferru...@amazon.com.invalid> wrote:
> > >
> > > > Congrats to the Google team for getting their dashboard live, it
> > > > looks great! I've been thinking about something for a while and
> > > > thought I'd mention it here. I'm wearing a few different hats,
> > > > so I'll try my best to clarify the context of my plural
> > > > pronouns.
> > > >
> > > > Now that we [Providers] have a couple of big dashboards up, I'm
> > > > curious whether we [Airflow dev community] might collaborate on
> > > > a community "optional guideline" for a json (or yaml or
> > > > whatever) format output on the dashboards for any providers
> > > > interested in participating. I'm not interested in (or trying
> > > > to) impose any kind of hard-line policy or standard here, but I
> > > > wonder if we [owners of the existing dashboards] might set some
> > > > non-binding precedent for future providers to join. If others
> > > > don't follow suit, then they wouldn't benefit from whatever uses
> > > > folks come up with for the data, but I personally don't think we
> > > > [Airflow] can or should try to impose this on providers.
> > > >
> > > > To my knowledge there are three provider-owned system test
> > > > dashboards currently live, and I look forward to seeing more in
> > > > time:
> > > >
> > > > Astronomer (found this LLM-specific one, not sure if there is
> > > > another one): https://astronomer.github.io/llm-dags-dashboard/
> > > > AWS:
> > > > https://aws-mwaa.github.io/open-source/system-tests/dashboard.html
> > > > and
> > > > https://aws-mwaa.github.io/open-source/system-tests/dashboard.json
> > > > Google:
> > > > https://storage.googleapis.com/providers-dashboard-html/dashboard.html
> > > >
> > > > Each was developed independently, and the path/name of the
> > > > Google one may hint that there is already an alternative to the
> > > > html view that I'm just not familiar with, so maybe we [the
> > > > three providers] could collaborate on some precedent that others
> > > > could follow? We [AWS] already have ours exporting in json, so
> > > > discussion might start there and see where it goes? Either
> > > > way... Even if we [Airflow] don't do anything with the json, I
> > > > bet a user could find interesting things to build if we give
> > > > them the tools. Maybe aggregating a dashboard which monitors
> > > > (and alerts on?) the status of the system tests covering the
> > > > operators their workflow depends on? Who knows what someone may
> > > > come up with once they have the tools to mix and match the data
> > > > from various providers.
> > > >
> > > > Is there any interest in the idea of a "standard json schema"
> > > > for these and any future system test dashboards?
> > > >
> > > >
> > > > - ferruzzi