Stephen I agree with you that the most important thing is not to lose functionality for the integration tests, so it is important to keep at least one of the two (Cassandra or Elasticsearch) to do a real integration test for HIFIO.
Your proposal of making the IT tests for the native IOs parallelized with WriteThenRead seems excellent, we have to sync to see how we can do this, my only doubt is that the setup takes still long time (at least for the bounded sources, it will all depend on the size of the generated data). I think we are good in our discussion (at least the missing part of the tests) so please confirm me if you agree with this phrase to summarize the thread: New IOs based on HIFIO won’t be merged as tests of HIFIO or as independent IOs. However we still encourage these contributions as documentation on how to use HIFIO with different data stores, these contributions will be part of the documentation in the Beam website. On Tue, May 30, 2017 at 11:16 PM, Stephen Sisk <s...@google.com.invalid> wrote: > Ah, thanks for clarifying ismael. > > I think you would agree that we need to have integration testing of HIFIO. > Cassandra and ES are currently the only ITs for HIFIO. If we want to write > ITs for HIFIO that don't rely on ES/Cassandra with the idea that we'd > remove ES/Cassandra, I could be okay with that. The data store in question > would need to have both small & large k8s cluster scripts so that we can do > small & large integration tests (since that's what's currently supported > with HIFIO today and I don't think we should go backwards.) > > The reason I hesitate to use a data store that doesn't have a native > implementation is that we can use ES/Cassandra's native write transform to > eventually switch HIFIO ITs to the new writeThenRead style IO IT [1] that > will *drastically* simplify maintenance requirements for the HIFIO tests. > WriteThenRead writes the test data inside of the test, thus removing the > requirement for a separate data loading step outside of the step. We > *could* write inside the test setup code (thus running only on one > machine), but for larger data amounts, that takes too long - it's easier to > do the write using the IO, which runs in parallel, and thus is a lot > quicker. That means we need a data store that has a native, parallelizable > write. > > What do you think? Basically, I agree with you in principal, but given that > using a data store without a native implementation either a separate data > loading step or slower tests, I'd strongly prefer to keep using > ES/Cassandra. (you could make the case that we should remove one of them. > I'm not attached to keeping both.) > > >> having [ES/Cassandra HIFIO read-code] in the source code base would [not] > be > consistent with the ideas of the previous paragraph. > I do agree with this. If we keep the ES/Cassandra HIFIO test code, I'd > propose that we add comments in there directing people to the correct > native source. > > S > [1] writeThenRead style IO IT - > https://lists.apache.org/thread.html/26ee3ba827c2917c393ab26ce97e7491846594d8f574b5ae29a44551@%3Cdev.beam.apache.org%3E > > On Tue, May 30, 2017 at 1:47 PM Ismaël Mejía <ieme...@gmail.com> wrote: > >> The whole goal of this discussion is that we define what shall we do >> when someone wants to add a new IO that uses HIFIO. The consensus so >> far following the PR comments + this thread is that it should be >> discouraged and those contribution be included as documentation in the >> website, and that we should give priority to the native >> implementations, which seems reasonable (e,g, to encourage better >> implementations and avoid the maintenance burden). >> >> So, I was wondering what would be a good rule to justify that we have >> tests for some data stores as part of the tests of HIFIO and I don't >> see a strong reason to do this, in particular once those have native >> implementations, to be more clear, in the current case we have HIFIO >> tests (jdk1.8-tests) for Elasticsearch5 and Cassandra which both are >> not covered by the native IOs yet. However once the native IOs for >> both systems are merged I don't see any reason to keep the extra tests >> in HIFIO, because we will be doing a double effort to test an IO that >> is not native, and that does not support Write, so I think we should >> remove those. Also not having this in the source code base would be >> consistent with the ideas of the previous paragraph. >> >> But well maybe I am missing something here, do you see any strong >> reason to keep them. >>