Great stats, Kamil :). I had not realized there is such a big imbalance when it comes to the number of operators :).
I fully agree 66%+ sounds like great value. And the stats tell me that
maybe we are not that far away from testing everything :)

J.

On Mon, Apr 20, 2020 at 1:01 PM Kamil Breguła <[email protected]> wrote:
>
> Hello,
>
> Thanks, Jarek, for dealing with this topic. It is very important for
> our users. Many users want to use new operators, but this is not
> possible.
>
> In my opinion, we should not only look at the package names; their
> content is more important. We should base our decisions on hard data.
> For this reason, I have prepared some statistics. I counted how many
> operators are in each package.
>
> 298 google
> 49 amazon
> 27 apache
> 13 microsoft
> 6 yandex
> 6 qubole
> 4 mysql
> 3 slack
> 3 redis
> 3 jira
> 3 cncf
> 2 snowflake
> 2 sftp
> 2 salesforce
> 2 oracle
> 2 http
> 2 ftp
> 2 docker
> 2 databricks
> 1 vertica
> 1 ssh
> 1 sqlite
> 1 singularity
> 1 segment
> 1 postgres
> 1 papermill
> 1 opsgenie
> 1 mongo
> 1 jenkins
> 1 jdbc
> 1 imap
> 1 grpc
> 1 exasol
> 1 email
> 1 discord
> 1 dingding
> 1 datadog
> 1 celery
>
> So we have
> 298 operators in google package (66% of total)
> 152 operators in other packages
>
> Here is a list of all operators in Airflow master:
> https://pastebin.com/GyARtGRC
> To generate the statistics I use the following commands:
> cat list-all.txt | grep providers | cut -d "." -f 3 | sort -n | uniq -c | sort -n -r
> cat list-all.txt | grep providers | cut -d "." -f 3 | sort -n | uniq -c | sort -n -r | grep google | awk '{sum += $1} END {print sum}'
> cat list-all.txt | grep providers | cut -d "." -f 3 | sort -n | uniq -c | sort -n -r | grep -v google | awk '{sum += $1} END {print sum}'
>
> Now we can ask another question - should we release packages with 66+%
> of the operators? If not, what percentage will be appropriate?
>
> In my opinion, we should release tested packages as soon as possible.
> This allows users to become better acquainted with this idea, and in
> the long run, encourages more people to test other services as well.
>
> Some operators for Google services that are in Airflow 1.10 have bugs
> that make it difficult or impossible to use them. Many operators have
> also never been released in any Airflow 1.10 release. Many users who
> want to use Airflow 2.0 operators write to me, and I don't have good
> news for them. If I can't solve all the problems, then I would like to
> be able to solve the problem at least for a few people rather than
> stay in one place. Users expect that they will be able to use these
> operators now, so if there are no technical obstacles then we should
> do it as soon as possible.
>
> Best regards,
> Kamil
>
> On Mon, Apr 20, 2020 at 10:06 AM Jarek Potiuk <[email protected]>
> wrote:
> >
> > I would like to focus this week on releasing backport packages. And I
> > would like to ask you for opinions on what should be the first "bunch
> > of packages" to release:
> >
> > The current status snapshot is here:
> > https://cwiki.apache.org/confluence/display/AIRFLOW/Backported+providers+packages+for+Airflow+1.10.*+series
> >
> > We have a project in GitHub:
> > https://github.com/apache/airflow/projects/2 where I keep the status
> > of the packages, and if you drill down to the issues you will see that
> > we have very well defined criteria for each of the packages to be
> > "ready-to-release".
> >
> > I think adding system tests and actual testing is a slow process. We
> > completed it for the "google", "Postgres" and "MySQL" packages, and I
> > am planning to complete it for "HTTP" and possibly a few simpler ones
> > like "sftp" and "ssh" myself this week. We also need to re-test it for
> > 1.10.10, but since we have semi-automated system tests, it will be
> > easy and I might even be able to automate it with GitHub Actions.
> >
> > However, the two important ones, "Microsoft" and "Amazon", are still
> > quite far from completion (or even from starting, for "Microsoft").
> >
> > I might try to engage more people to do the testing, but I think there
> > might also be value in releasing some first packages so that people
> > start using them, and maybe then this will be a bigger incentive to do
> > more testing and implement system tests for other packages.
> >
> > I am thinking about two release scenarios:
> >
> > 1) Google + postgres + mysql + http + ssh + sftp
> >
> > 2) Same as above, but we wait for "amazon" and "microsoft" to complete
> >
> > What do you think - should we release the first bunch of operators
> > now? I personally think we should do that.
> >
> > J.
> >
> > --
> > Jarek Potiuk
> > Polidea | Principal Software Engineer
> >
> > M: +48 660 796 129

--
Jarek Potiuk
Polidea | Principal Software Engineer

M: +48 660 796 129
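
For anyone who wants to re-check the per-package numbers quoted in this thread without the shell pipeline, here is a minimal Python sketch of the same counting. It assumes, as the `cut -d "." -f 3` step does, that each relevant line of list-all.txt is a dotted path whose third field is the package name. The function names `count_by_provider` and `share` and the sample paths in the usage are made up for illustration; they are not part of Airflow. Unlike `cut`, the sketch skips lines with fewer than three fields, so it only roughly mirrors the original commands.

```python
from collections import Counter


def count_by_provider(lines):
    """Count entries per provider package.

    Roughly mirrors `grep providers | cut -d "." -f 3 | sort | uniq -c`:
    keep lines that contain "providers" and count the third
    dot-separated field of each.
    """
    counts = Counter()
    for line in lines:
        parts = line.strip().split(".")
        if "providers" in line and len(parts) >= 3:
            counts[parts[2]] += 1
    return counts


def share(counts, package):
    """Return the percentage of all counted entries that belong to `package`."""
    total = sum(counts.values())
    return 100.0 * counts[package] / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical sample lines; the real input would be list-all.txt.
    sample = [
        "airflow.providers.google.cloud.operators.example.ExampleAOperator",
        "airflow.providers.google.cloud.operators.example.ExampleBOperator",
        "airflow.providers.amazon.aws.operators.example.ExampleCOperator",
        "airflow.operators.example.NotAProviderOperator",
    ]
    for package, n in count_by_provider(sample).most_common():
        print(n, package)

    # The totals from the thread: 298 google operators, 152 in other packages.
    quoted = Counter({"google": 298, "other": 152})
    print(f"{share(quoted, 'google'):.1f}%")  # 298 / 450, prints 66.2%
```

To run it on the real list, pass `open("list-all.txt")` to `count_by_provider` instead of the sample.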
