That summary phrase looks good to me, thanks for writing it up. S
On Thu, Jun 1, 2017 at 3:08 AM Ismaël Mejía <[email protected]> wrote: > Stephen I agree with you that the most important thing is not to lose > functionality for the integration tests, so it is important to keep at > least one of the two (Cassandra or Elasticsearch) to do a real > integration test for HIFIO. > > Your proposal of making the IT tests for the native IOs parallelized > with WriteThenRead seems excellent, we have to sync to see how we can > do this, my only doubt is that the setup takes still long time (at > least for the bounded sources, it will all depend on the size of the > generated data). > > I think we are good in our discussion (at least the missing part of > the tests) so please confirm me if you agree with this phrase to > summarize the thread: > > New IOs based on HIFIO won’t be merged as tests of HIFIO or as > independent IOs. However we still encourage these contributions as > documentation on how to use HIFIO with different data stores, these > contributions will be part of the documentation in the Beam website. > > > On Tue, May 30, 2017 at 11:16 PM, Stephen Sisk <[email protected]> > wrote: > > Ah, thanks for clarifying ismael. > > > > I think you would agree that we need to have integration testing of > HIFIO. > > Cassandra and ES are currently the only ITs for HIFIO. If we want to > write > > ITs for HIFIO that don't rely on ES/Cassandra with the idea that we'd > > remove ES/Cassandra, I could be okay with that. The data store in > question > > would need to have both small & large k8s cluster scripts so that we can > do > > small & large integration tests (since that's what's currently supported > > with HIFIO today and I don't think we should go backwards.) > > > > The reason I hesitate to use a data store that doesn't have a native > > implementation is that we can use ES/Cassandra's native write transform > to > > eventually switch HIFIO ITs to the new writeThenRead style IO IT [1] that > > will *drastically* simplify maintenance requirements for the HIFIO tests. > > WriteThenRead writes the test data inside of the test, thus removing the > > requirement for a separate data loading step outside of the step. We > > *could* write inside the test setup code (thus running only on one > > machine), but for larger data amounts, that takes too long - it's easier > to > > do the write using the IO, which runs in parallel, and thus is a lot > > quicker. That means we need a data store that has a native, > parallelizable > > write. > > > > What do you think? Basically, I agree with you in principal, but given > that > > using a data store without a native implementation either a separate data > > loading step or slower tests, I'd strongly prefer to keep using > > ES/Cassandra. (you could make the case that we should remove one of them. > > I'm not attached to keeping both.) > > > > > >> having [ES/Cassandra HIFIO read-code] in the source code base would > [not] > > be > > consistent with the ideas of the previous paragraph. > > I do agree with this. If we keep the ES/Cassandra HIFIO test code, I'd > > propose that we add comments in there directing people to the correct > > native source. > > > > S > > [1] writeThenRead style IO IT - > > > https://lists.apache.org/thread.html/26ee3ba827c2917c393ab26ce97e7491846594d8f574b5ae29a44551@%3Cdev.beam.apache.org%3E > > > > On Tue, May 30, 2017 at 1:47 PM Ismaël Mejía <[email protected]> wrote: > > > >> The whole goal of this discussion is that we define what shall we do > >> when someone wants to add a new IO that uses HIFIO. The consensus so > >> far following the PR comments + this thread is that it should be > >> discouraged and those contribution be included as documentation in the > >> website, and that we should give priority to the native > >> implementations, which seems reasonable (e,g, to encourage better > >> implementations and avoid the maintenance burden). > >> > >> So, I was wondering what would be a good rule to justify that we have > >> tests for some data stores as part of the tests of HIFIO and I don't > >> see a strong reason to do this, in particular once those have native > >> implementations, to be more clear, in the current case we have HIFIO > >> tests (jdk1.8-tests) for Elasticsearch5 and Cassandra which both are > >> not covered by the native IOs yet. However once the native IOs for > >> both systems are merged I don't see any reason to keep the extra tests > >> in HIFIO, because we will be doing a double effort to test an IO that > >> is not native, and that does not support Write, so I think we should > >> remove those. Also not having this in the source code base would be > >> consistent with the ideas of the previous paragraph. > >> > >> But well maybe I am missing something here, do you see any strong > >> reason to keep them. > >> >
