For context, the Astronomer LLM providers dashboard operates as follows:

1. Fetch the latest source code for providers and system tests/example
   DAGs from the Airflow repository, deploy them to an Airflow instance,
   and execute the DAGs.
2. Use the Airflow API to retrieve the DAG run statuses and produce a
   JSON output of these statuses.
3. The dashboard, hosted on GitHub Pages, consumes the JSON data
   generated in step 2.
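Roughly, step 2 boils down to something like the sketch below (the DAG
ids, credentials, and output path are placeholders rather than our
actual configuration):

    import json

    import requests

    AIRFLOW_URL = "http://localhost:8080"  # the instance from step 1
    DAG_IDS = ["example_openai_dag", "example_cohere_dag"]  # placeholders

    statuses = []
    for dag_id in DAG_IDS:
        # Stable REST API: list DAG runs, newest first, keep the latest.
        resp = requests.get(
            f"{AIRFLOW_URL}/api/v1/dags/{dag_id}/dagRuns",
            params={"order_by": "-execution_date", "limit": 1},
            auth=("admin", "admin"),  # placeholder credentials
        )
        resp.raise_for_status()
        for run in resp.json()["dag_runs"]:
            statuses.append({
                "dag_id": dag_id,
                "state": run["state"],  # e.g. success / failed
                "execution_date": run["execution_date"],
            })

    # Step 3 then has the GitHub Pages dashboard consume this file.
    with open("dashboard.json", "w") as f:
        json.dump(statuses, f, indent=2)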
We are willing to adopt and adhere to a JSON or XML specification and a
model HTML view if one is established.

Best regards,

*Pankaj Koti*
Senior Software Engineer (Airflow OSS Engineering team)
Location: Pune, Maharashtra, India
Timezone: Indian Standard Time (IST)


On Mon, Jun 24, 2024 at 11:40 PM Ferruzzi, Dennis
<ferru...@amazon.com.invalid> wrote:

> > The information in our database is similar to the structure of the
> > AWS providers json file
> > https://aws-mwaa.github.io/open-source/system-tests/dashboard.json
> > + a field for logs. We also have an extra field that specifies the
> > commit-id against which the CI was run, which I believe is helpful
> > in case users want to know whether their PR was merged before or
> > after a failure.
>
> The commit ID is a handy addition for sure; I may look into adding
> that to the AWS dashboard. I haven't had a chance to look into
> junit-xml yet, but I think what we could do is agree on a minimum
> structure and allow for extras. For example, logs are great, but if
> Google provides them and AWS doesn't, that shouldn't break anything
> for the user trying to fetch logs. But the test name, timestamp, and
> success/fail state are definitely among the required minimum fields.
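>
> As a strawman, one record in such a feed might look something like
> this (the field names are purely illustrative, not a concrete
> proposal):
>
>     entry = {
>         # required minimum
>         "test_name": "example_redshift",
>         "timestamp": "2024-06-24T18:10:00Z",
>         "state": "success",  # or "failed"
>         # optional extras - a missing key must not break consumers
>         "logs_url": "https://example.com/logs/example_redshift.txt",
>         "commit_id": "abc1234",  # commit the CI ran against
>     }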
>
> > we could consider enforcing the presence of *some* dashboard that
> > shows results of regular system tests executions for any new
> > provider.
>
> The issue there is that smaller providers come and go, and are often
> added by community members, not even necessarily with the provider's
> knowledge. We can't force them to provide any support. If Random
> Contributor adds support for a new provider, neither the contributor
> nor the provider can be required to provide hosting for a dashboard
> and infrastructure to run the tests. So (for the foreseeable future)
> the dashboards need to be an opt-in project by/for the providers.
> Maybe some day the project might be able to provide hosting for the
> smaller dashboards or something, but I think the infrastructure to run
> the tests will always be optional and at the expense (and effort) of
> some other interested party (almost certainly the provider themselves,
> but who knows...).
>
>
> - ferruzzi
>
>
> ________________________________
> From: Michał Modras <michalmod...@google.com.INVALID>
> Sent: Monday, June 24, 2024 5:20 AM
> To: dev@airflow.apache.org
> Subject: RE: [EXT] System Test Dashboards - Phase Two??
>
> Hi,
>
> +1 to this idea. I think standardizing the format of the presented
> test run results makes sense. I also agree that we don't necessarily
> need to enforce it in any hard way. However, given that we have
> dashboards from these three major providers, we could consider
> enforcing the presence of *some* dashboard that shows results of
> regular system tests executions for any new provider. WDYT?
>
> Best,
> Michal
>
> On Sun, Jun 23, 2024 at 10:09 PM Freddy Demiane
> <fdemi...@google.com.invalid> wrote:
>
> > Hello,
> >
> > Thank you for the comments! Indeed, +1 to the idea; I believe this
> > would be a good step to increase the quality of providers. From our
> > (Google) side, the dashboard's CI writes the results to a database,
> > and those are then used to generate an HTML page. Yet, generating
> > and publishing a JSON or a JUnit XML style file would be a simple
> > task for us. The information in our database is similar to the
> > structure of the AWS providers json file
> > https://aws-mwaa.github.io/open-source/system-tests/dashboard.json
> > + a field for logs. We also have an extra field that specifies the
> > commit-id against which the CI was run, which I believe is helpful
> > in case users want to know whether their PR was merged before or
> > after a failure.
> >
> > If we want to go with the junit-xml style format (I checked this
> > reference:
> > https://www.ibm.com/docs/en/developer-for-zos/16.0?topic=formats-junit-xml-format),
> > one thing I could think of is to make each "Dashboard CI run"
> > generate an xml file where each test is represented by a testcase,
> > which, as Jarek mentioned, could be used in some way in the canary
> > builds. Let me know what you think.
> >
> > Best,
> > Freddy
> >
> >
> > On Fri, Jun 21, 2024 at 11:12 AM Jarek Potiuk <ja...@potiuk.com>
> > wrote:
> >
> > > This is a fantastic idea! I love it!
> > >
> > > It also has some very far-reaching possible spin-offs in the
> > > future - literally a few days ago, when I discussed some of the
> > > future security-related work that we might want to do, there was
> > > a concept of having a sort of CI of all CIs where we (and by we I
> > > mean the wider Python ecosystem) could gather the status of
> > > pre-release versions of dependencies before they hit the release
> > > stage, and some kind of machine-parseable interchange between
> > > those CI systems is pretty much a prerequisite for that. So we
> > > could generally try it out and sort out some issues, see how it
> > > works in our small "airflow" world, but in the future we might be
> > > able to use similar mechanisms to get alerts for a number of our
> > > dependencies - and even further than that, we could make such an
> > > approach much more widespread (I am discussing it with people from
> > > the Python Software Foundation / Packaging team / Python security,
> > > so there is a chance this might actually materialize in the long
> > > term). This would be the first step.
> > >
> > > I think the first step for it could be rather simple, and we do
> > > not have to invent our own standard - we could easily start with
> > > junit-xml style output produced by each dashboard and available
> > > under some URL that we could pull in our canary builds, and have a
> > > step in our canary builds that could aggregate multiple junit-xml
> > > files coming from various dashboards, display them as the output,
> > > and fail the job in case some tests are failing (with maybe some
> > > thresholds). Pytest and a number of other tools natively support
> > > the junit-xml format, it's pretty established as machine-readable
> > > test results, and I think it has all we need to start with:
> > > https://docs.pytest.org/en/latest/how-to/usage.html#creating-junitxml-format-files
> > > There is a lot of tooling around this format - including easy
> > > ways we could integrate it with GitHub Actions output (think links
> > > to the tests that failed directly in the GitHub UI), showing logs
> > > of failed tests, etc.
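> > >
> > > For illustration, that aggregation step could be as simple as
> > > something like this (the dashboard URLs and the threshold are
> > > made up, and it glosses over nested testsuites - just a sketch):
> > >
> > >     import sys
> > >     import xml.etree.ElementTree as ET
> > >
> > >     import requests
> > >
> > >     # Hypothetical junit-xml endpoints published by dashboards.
> > >     DASHBOARDS = {
> > >         "amazon": "https://example.com/aws/system-tests.xml",
> > >         "google": "https://example.com/google/system-tests.xml",
> > >     }
> > >     MAX_FAILURES = 0  # threshold before the canary job goes red
> > >
> > >     failures = 0
> > >     for provider, url in DASHBOARDS.items():
> > >         root = ET.fromstring(requests.get(url, timeout=30).content)
> > >         # Each system test is a <testcase>; a nested <failure> or
> > >         # <error> element marks it as failed.
> > >         for case in root.iter("testcase"):
> > >             failed = (case.find("failure") is not None
> > >                       or case.find("error") is not None)
> > >             print(f"{provider}: {case.get('name')}: "
> > >                   f"{'FAILED' if failed else 'ok'}")
> > >             failures += failed
> > >
> > >     if failures > MAX_FAILURES:
> > >         sys.exit(f"{failures} system test(s) failing across dashboards")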
> > >
> > > If we can get the Astronomer, Amazon and Google teams on board
> > > with it, we could likely implement a simple version quickly and
> > > iterate on it - later we could think about possibly evolving that
> > > into a more extensible approach.
> > >
> > > J.
> > >
> > >
> > > On Thu, Jun 20, 2024 at 11:27 PM Ferruzzi, Dennis
> > > <ferru...@amazon.com.invalid> wrote:
> > >
> > > > Congrats to the Google team for getting their dashboard live, it
> > > > looks great! I've been thinking about something for a while and
> > > > thought I'd mention it here. I'm wearing a few different hats,
> > > > so I'll try my best to clarify the context of my plural
> > > > pronouns.
> > > >
> > > > Now that we [Providers] have a couple of big dashboards up, I'm
> > > > curious whether we [Airflow dev community] might collaborate on
> > > > a community "optional guideline" for a json (or yaml or
> > > > whatever) format output on the dashboards for any providers
> > > > interested in participating. I'm not interested in (or trying
> > > > to) impose any kind of hard-line policy or standard here, but I
> > > > wonder if we [owners of the existing dashboards] might set some
> > > > non-binding precedent for future providers to join. If others
> > > > don't follow suit, then they wouldn't benefit from whatever uses
> > > > folks come up with for the data, but I personally don't think we
> > > > [Airflow] can or should try to impose this on providers.
> > > >
> > > > To my knowledge there are three provider-owned system test
> > > > dashboards currently live, and I look forward to seeing more in
> > > > time:
> > > >
> > > > Astronomer (found this LLM-specific one, not sure if there is
> > > > another one): https://astronomer.github.io/llm-dags-dashboard/
> > > > AWS:
> > > > https://aws-mwaa.github.io/open-source/system-tests/dashboard.html
> > > > and
> > > > https://aws-mwaa.github.io/open-source/system-tests/dashboard.json
> > > > Google:
> > > > https://storage.googleapis.com/providers-dashboard-html/dashboard.html
> > > >
> > > > Each was developed independently, and the path/name of the
> > > > Google one may hint that there is already an alternative to the
> > > > html view that I'm just not familiar with, so maybe we [the
> > > > three providers] could collaborate on some precedent that others
> > > > could follow? We [AWS] already have ours exporting in json, so
> > > > discussion might start there and see where it goes? Either
> > > > way... Even if we [Airflow] don't do anything with the json, I
> > > > bet a user could find interesting things to build if we give
> > > > them the tools. Maybe aggregating a dashboard which monitors
> > > > (and alerts on?) the status of the system tests covering the
> > > > operators their workflow depends on? Who knows what someone may
> > > > come up with once they have the tools to mix and match the data
> > > > from various providers.
> > > >
> > > > Is there any interest in the idea of a "standard json schema"
> > > > for these and any future system test dashboards?
> > > >
> > > >
> > > > - ferruzzi