See responses inline... > On Jan 3, 2020, at 9:18 PM, Paul Rogers <[email protected]> wrote: > > Hi All, > > Perhaps we can wrap up the HDF5 and REST plugin PRs in their current form. > Making small improvements seems to work better than trying to do too much in > any one PR.
Thanks! > > Over time, we can use the V2 JSON reader in the REST plugin, but we should do > so step-by-step after making sure the basics work. Since Vitalii graciously > ran all the tests with the V2 reader enabled, I've got a list of things to > fix. The REST plugin PR should not be blocked on that work. That was my plan. The current implementation basically extends the V1 reader and made a few mods to it so that it can accept an InputStream. > > The REST plugin is one that Charles picked up from a previous contributor (I > believe.) It uses the same copy-and-paste model as our other plugins. Most of > the filter push-down implementations even have the same bugs. The API plugin was based on a few plugins I found on github. I did a lot of work to get it to work with Drill 1.17. I'd say it's about 50% old code 50% completely new. I also completely redid the configuration and added some new functionality. > > Over time, we should fix these, but again we can do it step-by-step. Let's > commit what we have, then we Charles and I can work together to convert the > REST plugin to use the new "Base" framework. Doing so will probably reveal > changes I need to make to that framework. I removed all the filter pushdown code from the API plugin. My thought was that after you explained how difficult it is to get right AND that you were working on the "Base" framework, it wasn't worth it for me to try to get it to work, only to have the Base framework available in a few months. Also, I just wanted to get the plugin to work. My plan is to get the plugin committed and tested and in the next iteration, add the filter pushdown when the Base framework is ready. > > In short, I'd suggest that we wrap up the in-flight PRs, then plot out the > next round of improvements that builds on this set. > > All that said, I agree with Arina in another e-mail chain: we should enforce > high standards on whatever code we do commit. So we do need to get both > plugins to a high level of quality. I'm hoping we can do that with the > existing code. I completely agree that we should enforce high standards for testing our code. My original question was more relating to the specifics of testing storage plugins. In looking through the plugins that are already there, there was a lot of variation from the Kudu plugin (woefully inadequate), to HBase. I was wondering about what is the best way (IE Docker container, or some other method) to test plugins to external systems. Also, is it preferable to write tests for each of the classes, or is it acceptable to write a suite of tests for each class (IE GroupScan, SubScan etc). > > > I'll make a point to review the PRs in question to help move them along. > > Thanks, > - Paul > > > > On Friday, January 3, 2020, 9:10:18 AM PST, Charles Givre > <[email protected]> wrote: > > Hi Arina, > Thanks for getting back with me. We have 2 separate PRs... > > API Storage: > 1. From the conversation, Paul seemed to be ok with going ahead without > waiting for those PRs. (@Paul, please correct me if I'm wrong, but that was > how I read your comment) > 2. From what I've seen, the V2 JSON reader isn't going to be 100% ready, > even when the PR is committed, and there will be issues such as the complex > data type handling that aren't going to be fully ready. > 3. To get the API plugin to work, I had to make a minor mod to the V1 reader > to enable it to accept InputStreams as input not just files. The V2 reader > does not support this yet either. > 4. I removed (for the time being) the filter pushdown which was the major > advantage of the Base Storage PR. My with this plugin is to get an MVP > committed. If there is interest (and need) I'll add filter pushdown (using > Paul's Base Storage PR) as well as OAUTH2 authentication and other features > as needed. There's been a lot of interest in this at my firm, so I want to > get it committed so we can start using it. > 5. I'm fine with committing to updating this to the V2 reader once it is > 100% ready. > > > HDF5 Format Plugin: > 1. I believe we were just waiting on Drill 1.17 to be released for this. > 2. I've addressed the review comments (and thank you for taking the time to > review this!) I know it is a very lengthy and complicated plugin and I > really appreciate the effort. If I should ever be in Kiev, I'd definitely > buy you a beer or beverage or two of your choice ;-) > 3. The HDF5 library (which this plugin uses), how shall we say, is not the > most usable. Some of the code is less than ideal because I had to work around > the HDF5 library. For example, the HDF5 library does not currently accept > InputStreams as input. You have to pass it a File object. However, to make > this work in Drill, I had to copy the InputStream from Drill to a file in the > temp directory and then open that. I've already spoken with the team that > maintains the HDF5 libraries and they are working on adding InputStream > support, and if and when they make that available, I will update the Drill > plugin accordingly. > > Thanks! > -- C > > > >> On Jan 3, 2020, at 4:48 AM, Arina Yelchiyeva <[email protected]> >> wrote: >> >> I though we agreed in the PR that API storage plugin won’t be submitted >> until #1914 and #1913 are committed. >> Since it would required API plugin code rewrite. I think there is no need >> spending time reviewing it until final changes are done. >> >> Kind regards, >> Arina >> >>> On Jan 2, 2020, at 7:08 PM, Charles Givre <[email protected] >>> <mailto:[email protected]>> wrote: >>> >>> Hello all, >>> Now that we've released Drill 1.17, I wanted to ask if you could please >>> take a look at the HDF5 format plugin >>> (https://github.com/apache/drill/pull/1778 >>> <https://github.com/apache/drill/pull/1778>) I submitted as well as the >>> API storage plugin (https://github.com/apache/drill/pull/1892 >>> <https://github.com/apache/drill/pull/1892>). >>> Thanks, >>> -- C >>> >>> >>> >>
