That summary phrase looks good to me, thanks for writing it up.

S

On Thu, Jun 1, 2017 at 3:08 AM Ismaël Mejía <[email protected]> wrote:

> Stephen I agree with you that the most important thing is not to lose
> functionality for the integration tests, so it is important to keep at
> least one of the two (Cassandra or Elasticsearch) to do a real
> integration test for HIFIO.
>
> Your proposal of making the IT tests for the native IOs parallelized
> with WriteThenRead seems excellent, we have to sync to see how we can
> do this, my only doubt is that the setup takes still long time (at
> least for the bounded sources, it will all depend on the size of the
> generated data).
>
> I think we are good in our discussion (at least the missing part of
> the tests) so please confirm me if you agree with this phrase to
> summarize the thread:
>
> New IOs based on HIFIO won’t be merged as tests of HIFIO or as
> independent IOs. However we still encourage these contributions as
> documentation on how to use HIFIO with different data stores, these
> contributions will be part of the documentation in the Beam website.
>
>
> On Tue, May 30, 2017 at 11:16 PM, Stephen Sisk <[email protected]>
> wrote:
> > Ah, thanks for clarifying ismael.
> >
> > I think you would agree that we need to have integration testing of
> HIFIO.
> > Cassandra and ES are currently the only ITs for HIFIO. If we want to
> write
> > ITs for HIFIO that don't rely on ES/Cassandra with the idea that we'd
> > remove ES/Cassandra, I could be okay with that. The data store in
> question
> > would need to have both small & large k8s cluster scripts so that we can
> do
> > small & large integration tests (since that's what's currently supported
> > with HIFIO today and I don't think we should go backwards.)
> >
> > The reason I hesitate to use a data store that doesn't have a native
> > implementation is that we can use ES/Cassandra's native write transform
> to
> > eventually switch HIFIO ITs to the new writeThenRead style IO IT [1] that
> > will *drastically* simplify maintenance requirements for the HIFIO tests.
> > WriteThenRead writes the test data inside of the test, thus removing the
> > requirement for a separate data loading step outside of the step. We
> > *could* write inside the test setup code (thus running only on one
> > machine), but for larger data amounts, that takes too long - it's easier
> to
> > do the write using the IO, which runs in parallel, and thus is a lot
> > quicker. That means we need a data store that has a native,
> parallelizable
> > write.
> >
> > What do you think? Basically, I agree with you in principal, but given
> that
> > using a data store without a native implementation either a separate data
> > loading step or slower tests, I'd strongly prefer to keep using
> > ES/Cassandra. (you could make the case that we should remove one of them.
> > I'm not attached to keeping both.)
> >
> >
> >> having [ES/Cassandra HIFIO read-code] in the source code base would
> [not]
> > be
> > consistent with the ideas of the previous paragraph.
> > I do agree with this. If we keep the ES/Cassandra HIFIO test code, I'd
> > propose that we add comments in there directing people to the correct
> > native source.
> >
> > S
> > [1] writeThenRead style IO IT -
> >
> https://lists.apache.org/thread.html/26ee3ba827c2917c393ab26ce97e7491846594d8f574b5ae29a44551@%3Cdev.beam.apache.org%3E
> >
> > On Tue, May 30, 2017 at 1:47 PM Ismaël Mejía <[email protected]> wrote:
> >
> >> The whole goal of this discussion is that we define what shall we do
> >> when someone wants to add a new IO that uses HIFIO. The consensus so
> >> far following the PR comments + this thread is that it should be
> >> discouraged and those contribution be included as documentation in the
> >> website, and that we should give priority to the native
> >> implementations, which seems reasonable (e,g, to encourage better
> >> implementations and avoid the maintenance burden).
> >>
> >> So, I was wondering what would be a good rule to justify that we have
> >> tests for some data stores as part of the tests of HIFIO and I don't
> >> see a strong reason to do this, in particular once those have native
> >> implementations, to be more clear, in the current case we have HIFIO
> >> tests (jdk1.8-tests) for Elasticsearch5 and Cassandra which both are
> >> not covered by the native IOs yet. However once the native IOs for
> >> both systems are merged I don't see any reason to keep the extra tests
> >> in HIFIO, because we will be doing a double effort to test an IO that
> >> is not native, and that does not support Write, so I think we should
> >> remove those. Also not having this in the source code base would be
> >> consistent with the ideas of the previous paragraph.
> >>
> >> But well maybe I am missing something here, do you see any strong
> >> reason to keep them.
> >>
>

Reply via email to