Re: [DISCUSSION] UTests and embedded backends
Hi guys,

I just submitted the PR: https://github.com/apache/beam/pull/7751. It contains refactorings, test improvements/fixes and a production code fix. I wanted to give a little feedback, because replacing the mock with a real instance allowed me to:
- improve the tests: fix bad tests
- add a missing split test
- and, more importantly, discover and fix a bug in the production code of the split.

=> So I would love it if we all agreed to avoid mocks when possible. Of course, as mentioned, mocks sometimes cannot be avoided, e.g. for hosted backends.

Etienne

On Monday, January 28, 2019 at 11:16 +0100, Etienne Chauchot wrote:
> Guys,
> I will try using mocks where I see they are needed. As there is a current PR
> opened on Cassandra, I will take this opportunity to add the embedded
> Cassandra server (https://github.com/jsevellec/cassandra-unit) to the UTests.
> A ticket was opened a while ago: https://issues.apache.org/jira/browse/BEAM-4164
> Etienne
> On Tuesday, January 22, 2019 at 09:26 +0100, Robert Bradshaw wrote:
> > On Mon, Jan 21, 2019 at 10:42 PM Kenneth Knowles wrote:
> > > Robert - you meant this as a mostly-automatic thing that we would engineer, yes?
> > Yes, something like TestPipeline that buffers up the pipelines and then
> > executes on class teardown (details TBD).
> > > A lighter-weight fake, like using something in-process sharing a Java
> > > interface (versus today a locally running service sharing an RPC interface)
> > > is still much better than a mock.
> > +1
> > > Kenn
> > > On Mon, Jan 21, 2019 at 7:17 AM Jean-Baptiste Onofré wrote:
> > > > Hi,
> > > > it makes sense to use an embedded backend when:
> > > > 1. it's possible to easily embed the backend
> > > > 2. the backend is "predictable".
> > > > If it's easy to embed and the backend behavior is predictable, then it
> > > > makes sense.
> > > > In other cases, we can fall back to a mock.
> > > > Regards
> > > > JB
> > > > On 21/01/2019 10:07, Etienne Chauchot wrote:
> > > > > Hi guys,
> > > > > Lately I have been fixing various Elasticsearch flakiness issues in the
> > > > > UTests by introducing timeouts, countdown latches, force refresh,
> > > > > embedded cluster size decrease...
> > > > > These flakiness issues are due to the embedded Elasticsearch not coping
> > > > > well with the Jenkins overload. Still, IMHO I believe that having
> > > > > embedded backends for UTests is a lot better than mocks. Even if they
> > > > > are less tolerant to load, I prefer having UTests 100% representative of
> > > > > the real backend and adding countermeasures to protect against Jenkins
> > > > > overload.
> > > > > WDYT?
> > > > > Etienne
> > > > --
> > > > Jean-Baptiste Onofré
> > > > jbonofre@apache.org
> > > > http://blog.nanthrax.net
> > > > Talend - http://www.talend.com
Re: [DISCUSSION] UTests and embedded backends
Hi Robert,

Yes, this is something I really believe in: the test coverage offered by embedded instances is worth some temporary flakiness (due to resource over-consumption). I also deeply agree with your point on maintenance: some mocks could hide bugs in production code that would cost a lot in the long term.

Etienne

On Monday, January 28, 2019 at 11:44 +0100, Robert Bradshaw wrote:
> I strongly agree with your original assessment: "IMHO I believe that having
> embedded backends for UTests is a lot better than mocks." Mocks are sometimes
> necessary, but in my experience they are often an expensive (in production and
> maintenance) way to get what amounts to low true coverage.
> On Mon, Jan 28, 2019 at 11:16 AM Etienne Chauchot wrote:
> > Guys,
> > I will try using mocks where I see they are needed. As there is a current PR
> > opened on Cassandra, I will take this opportunity to add the embedded
> > Cassandra server (https://github.com/jsevellec/cassandra-unit) to the UTests.
> > A ticket was opened a while ago: https://issues.apache.org/jira/browse/BEAM-4164
> > Etienne
> > On Tuesday, January 22, 2019 at 09:26 +0100, Robert Bradshaw wrote:
> > > On Mon, Jan 21, 2019 at 10:42 PM Kenneth Knowles wrote:
> > > > Robert - you meant this as a mostly-automatic thing that we would engineer, yes?
> > > Yes, something like TestPipeline that buffers up the pipelines and
> > > then executes on class teardown (details TBD).
> > > > A lighter-weight fake, like using something in-process sharing a Java
> > > > interface (versus today a locally running service sharing an RPC interface)
> > > > is still much better than a mock.
> > > +1
> > > > Kenn
> > > > On Mon, Jan 21, 2019 at 7:17 AM Jean-Baptiste Onofré wrote:
> > > > > Hi,
> > > > > it makes sense to use an embedded backend when:
> > > > > 1. it's possible to easily embed the backend
> > > > > 2. the backend is "predictable".
> > > > > If it's easy to embed and the backend behavior is predictable, then it
> > > > > makes sense.
> > > > > In other cases, we can fall back to a mock.
> > > > > Regards
> > > > > JB
> > > > > On 21/01/2019 10:07, Etienne Chauchot wrote:
> > > > > > Hi guys,
> > > > > > Lately I have been fixing various Elasticsearch flakiness issues in the
> > > > > > UTests by introducing timeouts, countdown latches, force refresh,
> > > > > > embedded cluster size decrease...
> > > > > > These flakiness issues are due to the embedded Elasticsearch not coping
> > > > > > well with the Jenkins overload. Still, IMHO I believe that having
> > > > > > embedded backends for UTests is a lot better than mocks. Even if they
> > > > > > are less tolerant to load, I prefer having UTests 100% representative of
> > > > > > the real backend and adding countermeasures to protect against Jenkins
> > > > > > overload.
> > > > > > WDYT?
> > > > > > Etienne
> > > > > --
> > > > > Jean-Baptiste Onofré
> > > > > jbono...@apache.org
> > > > > http://blog.nanthrax.net
> > > > > Talend - http://www.talend.com
Re: [DISCUSSION] UTests and embedded backends
I strongly agree with your original assessment: "IMHO I believe that having embedded backends for UTests is a lot better than mocks." Mocks are sometimes necessary, but in my experience they are often an expensive (in production and maintenance) way to get what amounts to low true coverage.

On Mon, Jan 28, 2019 at 11:16 AM Etienne Chauchot wrote:
> Guys,
> I will try using mocks where I see they are needed. As there is a current PR
> opened on Cassandra, I will take this opportunity to add the embedded
> Cassandra server (https://github.com/jsevellec/cassandra-unit) to the UTests.
> A ticket was opened a while ago: https://issues.apache.org/jira/browse/BEAM-4164
> Etienne
> On Tuesday, January 22, 2019 at 09:26 +0100, Robert Bradshaw wrote:
> > On Mon, Jan 21, 2019 at 10:42 PM Kenneth Knowles wrote:
> > > Robert - you meant this as a mostly-automatic thing that we would engineer, yes?
> > Yes, something like TestPipeline that buffers up the pipelines and
> > then executes on class teardown (details TBD).
> > > A lighter-weight fake, like using something in-process sharing a Java
> > > interface (versus today a locally running service sharing an RPC interface)
> > > is still much better than a mock.
> > +1
> > > Kenn
> > > On Mon, Jan 21, 2019 at 7:17 AM Jean-Baptiste Onofré wrote:
> > > > Hi,
> > > > it makes sense to use an embedded backend when:
> > > > 1. it's possible to easily embed the backend
> > > > 2. the backend is "predictable".
> > > > If it's easy to embed and the backend behavior is predictable, then it
> > > > makes sense.
> > > > In other cases, we can fall back to a mock.
> > > > Regards
> > > > JB
> > > > On 21/01/2019 10:07, Etienne Chauchot wrote:
> > > > > Hi guys,
> > > > > Lately I have been fixing various Elasticsearch flakiness issues in the
> > > > > UTests by introducing timeouts, countdown latches, force refresh,
> > > > > embedded cluster size decrease...
> > > > > These flakiness issues are due to the embedded Elasticsearch not coping
> > > > > well with the Jenkins overload. Still, IMHO I believe that having
> > > > > embedded backends for UTests is a lot better than mocks. Even if they
> > > > > are less tolerant to load, I prefer having UTests 100% representative of
> > > > > the real backend and adding countermeasures to protect against Jenkins
> > > > > overload.
> > > > > WDYT?
> > > > > Etienne
> > > > --
> > > > Jean-Baptiste Onofré
> > > > jbono...@apache.org
> > > > http://blog.nanthrax.net
> > > > Talend - http://www.talend.com
Re: [DISCUSSION] UTests and embedded backends
Guys,

I will try using mocks where I see they are needed. As there is a current PR opened on Cassandra, I will take this opportunity to add the embedded Cassandra server (https://github.com/jsevellec/cassandra-unit) to the UTests. A ticket was opened a while ago: https://issues.apache.org/jira/browse/BEAM-4164

Etienne

On Tuesday, January 22, 2019 at 09:26 +0100, Robert Bradshaw wrote:
> On Mon, Jan 21, 2019 at 10:42 PM Kenneth Knowles wrote:
> > Robert - you meant this as a mostly-automatic thing that we would engineer, yes?
> Yes, something like TestPipeline that buffers up the pipelines and then
> executes on class teardown (details TBD).
> > A lighter-weight fake, like using something in-process sharing a Java
> > interface (versus today a locally running service sharing an RPC interface)
> > is still much better than a mock.
> +1
> > Kenn
> > On Mon, Jan 21, 2019 at 7:17 AM Jean-Baptiste Onofré wrote:
> > > Hi,
> > > it makes sense to use an embedded backend when:
> > > 1. it's possible to easily embed the backend
> > > 2. the backend is "predictable".
> > > If it's easy to embed and the backend behavior is predictable, then it
> > > makes sense.
> > > In other cases, we can fall back to a mock.
> > > Regards
> > > JB
> > > On 21/01/2019 10:07, Etienne Chauchot wrote:
> > > > Hi guys,
> > > > Lately I have been fixing various Elasticsearch flakiness issues in the
> > > > UTests by introducing timeouts, countdown latches, force refresh,
> > > > embedded cluster size decrease...
> > > > These flakiness issues are due to the embedded Elasticsearch not coping
> > > > well with the Jenkins overload. Still, IMHO I believe that having
> > > > embedded backends for UTests is a lot better than mocks. Even if they
> > > > are less tolerant to load, I prefer having UTests 100% representative of
> > > > the real backend and adding countermeasures to protect against Jenkins
> > > > overload.
> > > > WDYT?
> > > > Etienne
> > > --
> > > Jean-Baptiste Onofré
> > > jbonofre@apache.org
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com
Re: [DISCUSSION] UTests and embedded backends
On Mon, Jan 21, 2019 at 10:42 PM Kenneth Knowles wrote:
>
> Robert - you meant this as a mostly-automatic thing that we would engineer, yes?

Yes, something like TestPipeline that buffers up the pipelines and then executes on class teardown (details TBD).

> A lighter-weight fake, like using something in-process sharing a Java
> interface (versus today a locally running service sharing an RPC interface)
> is still much better than a mock.

+1

> Kenn
>
> On Mon, Jan 21, 2019 at 7:17 AM Jean-Baptiste Onofré wrote:
>>
>> Hi,
>>
>> it makes sense to use an embedded backend when:
>>
>> 1. it's possible to easily embed the backend
>> 2. the backend is "predictable".
>>
>> If it's easy to embed and the backend behavior is predictable, then it
>> makes sense.
>> In other cases, we can fall back to a mock.
>>
>> Regards
>> JB
>>
>> On 21/01/2019 10:07, Etienne Chauchot wrote:
>> > Hi guys,
>> >
>> > Lately I have been fixing various Elasticsearch flakiness issues in the
>> > UTests by introducing timeouts, countdown latches, force refresh,
>> > embedded cluster size decrease...
>> >
>> > These flakiness issues are due to the embedded Elasticsearch not coping
>> > well with the Jenkins overload. Still, IMHO I believe that having
>> > embedded backends for UTests is a lot better than mocks. Even if they
>> > are less tolerant to load, I prefer having UTests 100% representative of
>> > the real backend and adding countermeasures to protect against Jenkins
>> > overload.
>> >
>> > WDYT?
>> >
>> > Etienne
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
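Robert's "buffer and run on teardown" idea can be sketched in plain Java. All names here (`DeferredPipelineRunner`, `register`, `runAllOnTeardown`) are invented for illustration; this is not the actual Beam TestPipeline API, just the buffering mechanism he describes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: instead of executing each test pipeline immediately,
// buffer them and run the whole batch once on class teardown, so the
// expensive backend startup cost is paid once per class, not per test.
class DeferredPipelineRunner {
    private final List<Runnable> buffered = new ArrayList<>();

    // Called by each test: the pipeline (with its asserts) is only recorded.
    public void register(Runnable pipelineWithAsserts) {
        buffered.add(pipelineWithAsserts);
    }

    // Called once from an @AfterClass-style hook: run everything together,
    // collecting failures instead of stopping at the first one.
    public void runAllOnTeardown() {
        List<AssertionError> failures = new ArrayList<>();
        for (Runnable p : buffered) {
            try {
                p.run();
            } catch (AssertionError e) {
                failures.add(e);
            }
        }
        buffered.clear();
        if (!failures.isEmpty()) {
            throw new AssertionError(failures.size() + " deferred pipeline(s) failed");
        }
    }
}
```

In a real implementation the "details TBD" would include reporting each failure against the test that registered it, which is what makes this non-trivial to engineer.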
Re: [DISCUSSION] UTests and embedded backends
Robert - you meant this as a mostly-automatic thing that we would engineer, yes?

A lighter-weight fake, like using something in-process sharing a Java interface (versus today a locally running service sharing an RPC interface) is still much better than a mock.

Kenn

On Mon, Jan 21, 2019 at 7:17 AM Jean-Baptiste Onofré wrote:
> Hi,
>
> it makes sense to use an embedded backend when:
>
> 1. it's possible to easily embed the backend
> 2. the backend is "predictable".
>
> If it's easy to embed and the backend behavior is predictable, then it
> makes sense.
> In other cases, we can fall back to a mock.
>
> Regards
> JB
>
> On 21/01/2019 10:07, Etienne Chauchot wrote:
> > Hi guys,
> >
> > Lately I have been fixing various Elasticsearch flakiness issues in the
> > UTests by introducing timeouts, countdown latches, force refresh,
> > embedded cluster size decrease...
> >
> > These flakiness issues are due to the embedded Elasticsearch not coping
> > well with the Jenkins overload. Still, IMHO I believe that having
> > embedded backends for UTests is a lot better than mocks. Even if they
> > are less tolerant to load, I prefer having UTests 100% representative of
> > the real backend and adding countermeasures to protect against Jenkins
> > overload.
> >
> > WDYT?
> >
> > Etienne
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
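A minimal illustration of the fake-vs-mock distinction Kenn draws (the `DocumentStore` interface and both classes are invented for this sketch, not taken from any Beam IO): production code depends only on a Java interface, and tests swap in an in-process implementation with real behavior rather than a mock with stubbed expectations:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative only: production code depends on this interface, not on a
// concrete client talking RPC to a locally running service.
interface DocumentStore {
    void put(String id, String doc);
    Optional<String> get(String id);
}

// An in-process fake: real behavior (state, lookups) behind the same
// interface. Tests exercise realistic semantics (e.g. a missing key really
// is absent) instead of whatever a mock was stubbed to return.
class InMemoryDocumentStore implements DocumentStore {
    private final Map<String, String> docs = new HashMap<>();

    @Override
    public void put(String id, String doc) {
        docs.put(id, doc);
    }

    @Override
    public Optional<String> get(String id) {
        return Optional.ofNullable(docs.get(id));
    }
}
```

The design point is that the fake carries its own (simple but real) state machine, so tests can catch interaction bugs that a mock, which only echoes its stubbing, would hide.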
Re: [DISCUSSION] UTests and embedded backends
Hi,

it makes sense to use an embedded backend when:

1. it's possible to easily embed the backend
2. the backend is "predictable".

If it's easy to embed and the backend behavior is predictable, then it makes sense.
In other cases, we can fall back to a mock.

Regards
JB

On 21/01/2019 10:07, Etienne Chauchot wrote:
> Hi guys,
>
> Lately I have been fixing various Elasticsearch flakiness issues in the
> UTests by introducing timeouts, countdown latches, force refresh,
> embedded cluster size decrease...
>
> These flakiness issues are due to the embedded Elasticsearch not coping
> well with the Jenkins overload. Still, IMHO I believe that having
> embedded backends for UTests is a lot better than mocks. Even if they
> are less tolerant to load, I prefer having UTests 100% representative of
> the real backend and adding countermeasures to protect against Jenkins
> overload.
>
> WDYT?
>
> Etienne

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
Re: [DISCUSSION] UTests and embedded backends
Thanks Robert for your answer. Grouping tests is a good idea, thanks for the reminder. I'll use that if new flakiness shows up and if I have no countermeasures left :)

Etienne

On Monday, January 21, 2019 at 12:39 +0100, Robert Bradshaw wrote:
> I am of the same opinion; this is the approach we're taking for Flink as well.
> Various mitigations (e.g. capping the parallelism at 2 rather than the default
> of num cores) have helped.
> Several times the idea has been proposed to group unit tests together for
> "expensive" backends. E.g. for self-contained tests one can create a single
> pipeline that contains all the tests with their asserts, and then run that
> once to amortize the overhead (which is quite significant when you're only
> manipulating literally bytes of data). Only on failure would it exercise them
> individually (either sequentially, or via a binary search).
> On Mon, Jan 21, 2019 at 10:07 AM Etienne Chauchot wrote:
> > Hi guys,
> > Lately I have been fixing various Elasticsearch flakiness issues in the
> > UTests by introducing timeouts, countdown latches, force refresh,
> > embedded cluster size decrease...
> > These flakiness issues are due to the embedded Elasticsearch not coping
> > well with the Jenkins overload. Still, IMHO I believe that having
> > embedded backends for UTests is a lot better than mocks. Even if they
> > are less tolerant to load, I prefer having UTests 100% representative of
> > the real backend and adding countermeasures to protect against Jenkins
> > overload.
> > WDYT?
> > Etienne
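One of the countermeasures Etienne mentions, countdown latches with generous timeouts instead of fixed sleeps, might look roughly like this. This is a generic stdlib sketch (the class and method names are invented, and a plain thread stands in for the embedded backend's asynchronous work), not the actual Elasticsearch test code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch of the latch countermeasure: instead of Thread.sleep(fixedDelay),
// the test blocks until the embedded backend signals completion, with a
// generous timeout so an overloaded CI machine slows the test down but
// does not make it fail spuriously.
class LatchedIndexAwait {
    public static boolean indexAndAwait(Runnable asyncIndexing) {
        CountDownLatch done = new CountDownLatch(1);
        // In a real test, the backend's completion listener would count down;
        // here a thread simulates the asynchronous indexing work.
        new Thread(() -> {
            asyncIndexing.run();
            done.countDown();
        }).start();
        try {
            // Generous upper bound: tolerant of CI overload, still bounded.
            return done.await(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The trade-off versus a fixed sleep is that the happy path completes as soon as the work is done, while the timeout only bites when something is genuinely wrong.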
Re: [DISCUSSION] UTests and embedded backends
I am of the same opinion; this is the approach we're taking for Flink as well. Various mitigations (e.g. capping the parallelism at 2 rather than the default of num cores) have helped.

Several times the idea has been proposed to group unit tests together for "expensive" backends. E.g. for self-contained tests one can create a single pipeline that contains all the tests with their asserts, and then run that once to amortize the overhead (which is quite significant when you're only manipulating literally bytes of data). Only on failure would it exercise them individually (either sequentially, or via a binary search).

On Mon, Jan 21, 2019 at 10:07 AM Etienne Chauchot wrote:
> Hi guys,
>
> Lately I have been fixing various Elasticsearch flakiness issues in the
> UTests by introducing timeouts, countdown latches, force refresh,
> embedded cluster size decrease...
>
> These flakiness issues are due to the embedded Elasticsearch not coping
> well with the Jenkins overload. Still, IMHO I believe that having
> embedded backends for UTests is a lot better than mocks. Even if they
> are less tolerant to load, I prefer having UTests 100% representative of
> the real backend and adding countermeasures to protect against Jenkins
> overload.
>
> WDYT?
>
> Etienne
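Robert's grouping-with-bisection idea could be sketched as follows. All names are invented, and `Runnable`s stand in for the self-contained test cases; a real version would combine them into one pipeline sharing a single embedded backend, but the batch-then-bisect control flow is the same:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the amortization idea: run every self-contained test case in
// one combined batch (paying the expensive backend setup once); only when
// the batch fails, bisect the list to isolate the failing case(s).
class BatchedTestRunner {

    // One "batch" run: all cases together, pass/fail as a whole.
    static boolean runBatch(List<Runnable> cases) {
        try {
            for (Runnable c : cases) {
                c.run();
            }
            return true;
        } catch (AssertionError e) {
            return false;
        }
    }

    // Returns the indices of failing cases, found by binary search, so the
    // per-batch overhead is paid O(log n) times instead of once per test.
    static List<Integer> findFailures(List<Runnable> cases, int offset) {
        List<Integer> failing = new ArrayList<>();
        if (cases.isEmpty() || runBatch(cases)) {
            return failing; // whole batch green: nothing to report
        }
        if (cases.size() == 1) {
            failing.add(offset); // isolated the culprit
            return failing;
        }
        int mid = cases.size() / 2;
        failing.addAll(findFailures(cases.subList(0, mid), offset));
        failing.addAll(findFailures(cases.subList(mid, cases.size()), offset + mid));
        return failing;
    }
}
```

In the common all-green case this runs each test exactly once inside a single batch, which is where the amortization Robert describes comes from.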