Re: How to implement AbstractRecordWriter

Paul Rogers Fri, 31 May 2019 12:19:35 -0700

Hi Nicolas,

To address your last issue about the wide variety of ways we have to write 
tests... Yes, you are right that there is a wonderful variety of techniques 
that evolved over the life of the project. Unlike Spark, we do not enjoy an 
over-abundance of contributors, so we've pretty much left the old-style tests 
(and code) unchanged, just added newer ones on top. This also explains why 
"plug-ins" are not actually pluggable. Ugly, yes, but the best the team can do 
with limited resources. (We're always looking for volunteers!)



For the MapR DB tests, one approach is to just do whatever your predecessors 
did on the read side.

On the other hand, the path of least resistance is to follow the patterns in 
the CSV tests (and in ExampleTest) to use the newer frameworks for setup, the 
newer tools for running queries and capturing results, and the Row Set 
framework to verify the results. I find I can whip out a unit test in just a 
few minutes using these newer tools.

Also, please do contribute improvements where you can, or at least file JIRA 
tickets with your suggestions so that the project benefits from your experience 
learning how to contribute to Drill.

Thanks,
- Paul

 

    On Friday, May 31, 2019, 10:10:14 AM PDT, Nicolas A Perez 
<anicola...@gmail.com> wrote:  
 
 One of the issues I have is that I haven’t found a way to debug my tests
from IntelliJ. It keeps reporting that some constructs from other modules
are missing.

Also, I haven’t found *simple* examples of how to write *simple* tests.
Every time I look at the existing code, the tests are done in a different
way.

Now, on the other hand, plugins should be independent of the Drill core
modules. If you think about it, I can easily write a library that can be
injected into Spark without touching Spark code. For instance, the
DataSource API will load the required parts from my code at run time. Drill
does the same, but the problem is the coupling between Drill and its
extension points.

On the tests side, you have another problem: you cannot easily test your
new modules unless they are within the Drill core code. Maybe it is time to
decouple the test framework from Drill itself, too.

On Fri, May 31, 2019 at 18:38 Paul Rogers <par0...@yahoo.com.invalid> wrote:

> Hi Nicolas,
>
> Charles outlined the choices quite well.
>
> Let's talk about your observation that you find it annoying to deal with
> the full Drill code. There may be some tricks here that can help you.
>
> As you know, I've been revising the text reader and the "EVF" (row set
> framework). Doing so requires a series of pull requests. To move fast, I've
> found the following workflow to be helpful:
>
> * Use a machine with an SSD. A Mac is ideal. A Linux desktop also works
> (mine uses Linux Mint.) The SSD makes rebuilds very fast.
>
> * Use unit tests for all your testing. For example, I created dozens of
> unit tests for CSV files to exercise the text reader, and many more to
> exercise the EVF. All development and testing consists of adding/changing
> code, adding/changing tests, and stepping through the unit test and
> underlying code to find bugs.
>
> * Use JUnit categories to run selected unit tests as a group.
>
> In most cases, you let your IDE do the build; you don't need Maven nor do
> you need to build jar files. Edit a file, run a unit test from your IDE and
> step through code. My edit/compile/debug cycle tends to be seconds.
>
> If, however, you find yourself using Maven to build Drill, then running
> unit tests from Maven and attaching a debugger, your
> edit/compile/debug cycle will be 5+ minutes, which is going to be
> irritating.
>
> If you are doing a full build so you can use SqlLine to test, then this
> suggests it is time to write a unit test case for that issue so you can run
> it from the IDE. Using the RowSet stuff makes such tests easy. See
> TestCsvWithHeaders [1] for some examples.
>
> If you run from the IDE, and find things don't work then perhaps there is
> a config issue. Do we have code that looks for a file in
> $DRILL_HOME/whatever rather than using the class path? Is a required native
> library not on the LD_LIBRARY_PATH for the IDE?
>
> Most unit tests are designed to be stateless. They read a file stored in
> resources, or they write a test file, read the file, and discard the file
> when done.
>
> You are using MapRDB to insert data, which, of course, is stateful. So,
> perhaps your test can put the DB into a known start state, insert some
> records, read those records, compare them with the expected results, and
> clean up the state so you are ready for the next test run. Your target is
> that edit/compile/debug cycle of a few seconds.
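A minimal sketch of that known-state test cycle, assuming a purely hypothetical in-memory FakeTable in place of a real MapRDB table (none of these names come from Drill or MapR). It only shows the shape: reset, insert, read back, compare, and always clean up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class StatefulWriteTestSketch {

  /** Hypothetical stand-in for the external, stateful table. */
  static class FakeTable {
    private final Map<String, String> rows = new TreeMap<>();

    void clear() { rows.clear(); }

    void insert(String key, String value) { rows.put(key, value); }

    /** Returns rows as "key=value" strings, ordered by key. */
    List<String> scan() {
      List<String> out = new ArrayList<>();
      for (Map.Entry<String, String> e : rows.entrySet()) {
        out.add(e.getKey() + "=" + e.getValue());
      }
      return out;
    }
  }

  /** One full test cycle; returns true if the read matched the expectation. */
  static boolean runOnce(FakeTable table) {
    table.clear();                          // 1. put the store into a known start state
    try {
      table.insert("k2", "beta");           // 2. insert some records
      table.insert("k1", "alpha");
      List<String> actual = table.scan();   // 3. read those records back
      List<String> expected = Arrays.asList("k1=alpha", "k2=beta");
      return expected.equals(actual);       // 4. compare with the expected results
    } finally {
      table.clear();                        // 5. clean up, even if a step above threw
    }
  }

  public static void main(String[] args) {
    FakeTable table = new FakeTable();
    // Running twice shows the test does not depend on leftovers from a prior run.
    System.out.println(runOnce(table));
    System.out.println(runOnce(table));
  }
}
```

Putting the cleanup in a finally block is one way to keep a failed run from poisoning the next one; clearing again at the start covers the case where the previous process was killed before cleanup.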
>
>
> Overall, if you can master the art of running Drill, using unit tests, in
> your IDE, you can move forward very quickly.
>
> Use Maven builds, and run tests via Maven, only when getting ready to
> submit a PR. If you change, say, only the contrib module, you only need
> build and test that module. If you also change exec, say, then you can just
> build those two modules.
>
> To use categories, tag your tests as follows:
>
> @Category(RowSetTests.class) class MyTest ...
>
> (I'll send the Maven command line separately; I'm not on that machine at
> the moment.)
>
>
> Thanks much to the team members who helped make this happen. I've since
> worked on other projects that don't have this power and it is truly a
> grueling experience to wait for long builds and deploys after every change.
>
>
> Thanks,
> - Paul
>
> [1]
> https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/store/easy/text/compliant/TestCsvWithHeaders.java
>
>
>
>
>
>    On Friday, May 31, 2019, 5:17:40 AM PDT, Charles Givre <
> cgi...@gmail.com> wrote:
>
>  Hi Nicolas,
>
> You have two options:
> 1.  You can develop format plugins and UDFs in Drill by adding them to the
> contrib/ folder and then test them with unit tests.  Take a look at this PR
> as an example[1].  If you're intending to submit your work to Drill for
> inclusion, this would be my recommendation as you can write the unit tests
> as you go, and it doesn't take very long to build and you can debug.
> 2.  Alternatively, you can package the code separately as shown here[2].
> However, this option requires you to build it, then copy the jars over to
> DRILL_HOME/jars/3rdparty along with any dependencies, then run Drill.  I'm
> not sure how you could write unit tests this way.
>
> I hope this helps.
>
>
> [1]: https://github.com/apache/drill/pull/1749
> [2]: https://github.com/cgivre/drill-excel-plugin
>
>
> > On May 31, 2019, at 8:06 AM, Nicolas A Perez <anicola...@gmail.com> wrote:
> >
> > Paul,
> >
> > Is it possible to develop my plugin outside of the Drill code, let's say
> > in my own repository, and then package it and add it to the location
> > where the plugins live? Does that work, too? I just find it annoying to
> > deal with the full Drill code in order to develop a plugin. At the same
> > time, I might want to detach the development of plugins from the Drill
> > life cycle itself.
> >
> > Please advise.
> >
> > Best Regards,
> >
> > Nicolas A Perez
> >
> > On Thu, May 30, 2019 at 9:58 PM Paul Rogers <par0...@yahoo.com.invalid>
> > wrote:
> >
> >> Hi Nicolas,
> >>
> >> A quick check of the code suggests that AbstractWriter is a
> >> JSON-serialized description of the physical plan. It represents the
> >> information sent from the planner to the execution engine, and is
> >> interpreted by the scan operator. That is, it is the "physical plan."
> >>
> >> The question is, how does the execution engine create the actual
> >> writer based on the physical plan? The only good example seems to be for
> >> the FileSystemPlugin. That particular storage plugin is complicated by
> >> the additional layer of the format plugins.
> >>
> >> There is a bit of magic here. Briefly, Drill uses a BatchCreator to
> >> create your writer. It does so via some Java introspection magic. Drill
> >> looks for all subclasses of BatchCreator, then uses the type of the
> >> second argument to the getBatch() method to find the correct class.
> >> This may mean that you need to create one with MapRDBFormatPluginConfig
> >> as the type of the second argument.
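The lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the creator and config classes here are hypothetical stand-ins, the candidate list is passed in by hand rather than found by a class-path scan, and Drill's real BatchCreator is generic and takes different argument types.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class BatchCreatorLookupDemo {

  // Hypothetical stand-ins for plan nodes; not Drill classes.
  static class JsonWriterConfig {}
  static class MapRDBWriterConfig {}

  // Hypothetical creators. The second getBatch() argument names the
  // plan-node type each creator knows how to handle.
  static class JsonWriterCreator {
    public String getBatch(Object context, JsonWriterConfig config) {
      return "json writer batch";
    }
  }

  static class MapRDBWriterCreator {
    public String getBatch(Object context, MapRDBWriterConfig config) {
      return "maprdb writer batch";
    }
  }

  /**
   * Returns the first candidate that declares a public getBatch() whose
   * second parameter is exactly configType, or empty if none does.
   */
  static Optional<Class<?>> findCreator(List<Class<?>> candidates, Class<?> configType) {
    for (Class<?> candidate : candidates) {
      for (Method m : candidate.getMethods()) {
        if (m.getName().equals("getBatch")
            && m.getParameterCount() >= 2
            && m.getParameterTypes()[1] == configType) {
          return Optional.of(candidate);
        }
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    List<Class<?>> candidates =
        Arrays.<Class<?>>asList(JsonWriterCreator.class, MapRDBWriterCreator.class);
    // Look up the creator registered for the MapRDB stand-in config.
    System.out.println(findCreator(candidates, MapRDBWriterConfig.class));
    // A config type that no creator accepts yields an empty result.
    System.out.println(findCreator(candidates, String.class));
  }
}
```

The practical consequence is the one stated above: if no creator's second argument matches your config class, nothing is found, so a breakpoint in getBatch() is a quick way to confirm the match.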
> >>
> >> The getBatch() method then creates the CloseableRecordBatch
> >> implementation. This is a full Drill operator, meaning it must handle
> >> the Volcano iterator protocol. Looks like you can perhaps use
> >> WriterRecordBatch as the writer operator itself. (See
> >> EasyWriterBatchCreator and follow the code to understand the plumbing.)
> >>
> >> You create a RecordWriter to do the actual work. AFAIK, MapRDB supports
> >> a JSON data model (at least in some form). If this is the version you
> >> are working on, the fastest development path might just be to copy the
> >> JsonRecordWriter, and replace the writes to JSON with writes to MapRDB.
> >> At least this gives you a place to start looking.
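To illustrate the "copy the JSON writer and swap the output target" idea, here is a small sketch. Everything in it is an assumption made for illustration: SimpleRecordWriter is a hypothetical, much-reduced set of per-record callbacks (not Drill's RecordWriter or AbstractRecordWriter), values are plain strings, and the "document store" is just an in-memory list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RecordWriterSketch {

  /** Hypothetical, simplified per-record callbacks; not Drill's actual API. */
  interface SimpleRecordWriter {
    void startRecord();
    void writeField(String name, String value);
    void endRecord();
  }

  /** Starting point: emits each record as one line of JSON text (no escaping). */
  static class JsonLineWriter implements SimpleRecordWriter {
    final List<String> lines = new ArrayList<>();
    private StringBuilder current;

    public void startRecord() { current = new StringBuilder("{"); }

    public void writeField(String name, String value) {
      if (current.length() > 1) {
        current.append(",");   // separate this field from the previous one
      }
      current.append('"').append(name).append("\":\"").append(value).append('"');
    }

    public void endRecord() { lines.add(current.append("}").toString()); }
  }

  /** The copy: same callbacks, but each record becomes a document in a store. */
  static class DocumentStoreWriter implements SimpleRecordWriter {
    final List<Map<String, String>> store = new ArrayList<>();  // stand-in for the DB
    private Map<String, String> current;

    public void startRecord() { current = new LinkedHashMap<>(); }

    public void writeField(String name, String value) { current.put(name, value); }

    public void endRecord() { store.add(current); }
  }

  /** Drives any writer with one sample record, as the operator would per row. */
  static void writeSample(SimpleRecordWriter writer) {
    writer.startRecord();
    writer.writeField("id", "1");
    writer.writeField("name", "a");
    writer.endRecord();
  }

  public static void main(String[] args) {
    JsonLineWriter json = new JsonLineWriter();
    DocumentStoreWriter docs = new DocumentStoreWriter();
    writeSample(json);
    writeSample(docs);
    System.out.println(json.lines);
    System.out.println(docs.store);
  }
}
```

The point of the sketch is only that the caller stays the same while the body of each callback changes; a real writer also has to deal with typed columns, schema changes, and setup and cleanup, which are left out here.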
> >>
> >>
> >> A more general solution would be to build the writer using some of the
> >> recent additions to Drill such as the row set mechanisms for reading a
> >> record batch. But, since copying the JSON approach provides a quick &
> >> dirty solution, perhaps that is good enough for this particular use case.
> >>
> >>
> >> In our book, we recommend building each step one-by-one and doing a
> >> quick test to verify that each step works as you expect. If you create
> >> your BatchCreator, but not the writer, things won't actually work, but
> >> you can set a breakpoint in the getBatch() method to verify that Drill
> >> did find your class. And so on.
> >>
> >>
> >> Thanks,
> >> - Paul
> >>
> >>
> >>
> >>    On Thursday, May 30, 2019, 3:05:39 AM PDT, Nicolas A Perez <
> >> anicola...@gmail.com> wrote:
> >>
> >> Can anyone give me an overview of how to implement AbstractRecordWriter?
> >>
> >> What are the mechanics it follows, what should I do, and so on? It will
> >> be very helpful.
> >>
> >> Best Regards,
> >>
> >> Nicolas A Perez
> >> --
> >>
> >>
> --------------------------------------------------------------------------------------------
> >> Sent by Nicolas A Perez from my GMAIL account.
> >>
> >>
> --------------------------------------------------------------------------------------------
> >>
> >
> >
> >
> > --
> >
> --------------------------------------------------------------------------------------------
> > Sent by Nicolas A Perez from my GMAIL account.
> >
> --------------------------------------------------------------------------------------------
>

-- 
Nicolas A Perez from GMAIL MOBILE
