Re: [DISCUSS] Format plugins in contrib module

2019-02-06 Thread Paul Rogers
+1

Moving forward, we'd like to evolve the format plugin API to use the new scan 
framework based on the result set loader. Doing so will abstract away all the 
vector-twiddling headaches that several people have had fun with over the last 
couple of years. The framework will enable integration with the schema work 
that the team is doing.

I would like to point out one additional consideration: unit testing. We should 
make sure that code in the contrib module has visibility into all the test 
frameworks and test resources available in the exec module. This was not true a 
year ago or so. When I tried to create a simple reader in contrib, I could not 
use the exec test resources, which forced me to move the code into exec. I 
believe Tim may have fixed all this (thanks, Tim!), but let's double-check.

Thanks,
- Paul

 


Re: [DISCUSS] Format plugins in contrib module

2019-02-06 Thread Arina Yelchiyeva
Created Jira - https://issues.apache.org/jira/browse/DRILL-7030

Kind regards,
Arina



Re: [DISCUSS] Format plugins in contrib module

2019-02-05 Thread Vitalii Diravka
Absolutely agree with Arina.

I think the core format plugins for Parquet, JSON, CSV, TSV and PSV files
(which are used for creating Drill tables) can stay in the current config
file, and the rest should be factored out into separate config files, each
in its own module under the Drill *contrib* module.

That would make the process of creating new plugins more transparent.

Kind regards
Vitalii




Re: [DISCUSS] Format plugins in contrib module

2019-02-05 Thread Charles Givre
I’d concur with Arina’s suggestion.  I do think this would be useful and make 
it easier to make plugins “pluggable”.  
In the meantime, should we recommend that developers of format-plugins include 
their plugins in the bootstrap-storage-plugins.json?  I was thinking also that 
we might want to have some guidelines for unit tests for format plugins.  I’m 
doing some work on the HTTPD format plugin and found some issues which cause it 
to throw NPEs.  
— C





[DISCUSS] Format plugins in contrib module

2019-02-05 Thread Arina Yelchiyeva
Hi all,

Previously, we added new formats and plugins to the exec module. Eventually
the exec package grew to the point where it became better to keep plugin and
format contributions in a separate module.
We now have the contrib module for such contributions. Plugins are
pluggable: they are added automatically through a drill-module.conf file
that points to the packages to scan.
Format plugins use the same approach; the only problem is that they are not
added to bootstrap-storage-plugins.json. So when adding a new format plugin,
the developer has to update the bootstrap file, which lives in the exec
module, for the plugin to appear automatically in the Drill Web UI.
My suggestion is to implement functionality that merges the format config
with the bootstrap one. For example, each plugin would have a
bootstrap-format.json file stating which plugin the format should be added
to (same structure as in bootstrap-storage-plugins.json):
Example:

{
  "storage":{
    dfs: {
      formats: {
        "psv" : {
          type: "msgpack",
          extensions: [ "mp" ]
        }
      }
    }
  }
}

Then, during Drill startup, these bootstrap-format.json files would be merged
with bootstrap-storage-plugins.json.
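
To make the proposed merge step concrete, here is a minimal sketch of how a per-plugin format config could be folded into the bootstrap config once both files have been parsed into nested maps. This is not Drill's implementation: the class name BootstrapFormatMerge, the map(...) helper, and the "bootstrap value wins on conflict" rule are all assumptions made for illustration, and the sample entries (including the "msgpack" key) are made up for the demo.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: merges one parsed bootstrap-format.json into the
// parsed bootstrap-storage-plugins.json. Not taken from the Drill code base.
class BootstrapFormatMerge {

  // Illustrative helper: builds an ordered map from alternating key/value arguments.
  public static Map<String, Object> map(Object... keyValues) {
    Map<String, Object> m = new LinkedHashMap<>();
    for (int i = 0; i + 1 < keyValues.length; i += 2) {
      m.put((String) keyValues[i], keyValues[i + 1]);
    }
    return m;
  }

  // Returns a new map containing everything in base plus any entries from
  // overlay that base lacks. Nested maps are merged recursively.
  // Assumption: on a non-map conflict, the value from base is kept.
  @SuppressWarnings("unchecked")
  public static Map<String, Object> merge(Map<String, Object> base,
                                          Map<String, Object> overlay) {
    Map<String, Object> result = new LinkedHashMap<>(base);
    for (Map.Entry<String, Object> entry : overlay.entrySet()) {
      String key = entry.getKey();
      Object existing = result.get(key);
      Object incoming = entry.getValue();
      if (existing instanceof Map && incoming instanceof Map) {
        result.put(key, merge((Map<String, Object>) existing,
                              (Map<String, Object>) incoming));
      } else if (!result.containsKey(key)) {
        result.put(key, incoming);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Stand-in for the bootstrap file shipped in the exec module.
    Map<String, Object> bootstrap = map("storage", map("dfs", map(
        "type", "file",
        "formats", map("csv", map("type", "text",
                                  "extensions", Arrays.asList("csv"))))));

    // Stand-in for a bootstrap-format.json shipped by a contrib plugin.
    Map<String, Object> formatConfig = map("storage", map("dfs", map(
        "formats", map("msgpack", map("type", "msgpack",
                                      "extensions", Arrays.asList("mp"))))));

    System.out.println(merge(bootstrap, formatConfig));
  }
}
```

After the merge, the dfs plugin keeps its existing settings and formats and gains the new format entry. Keeping the bootstrap value on conflict is only one possible policy; a real implementation would need to decide how to report or resolve two plugins claiming the same format name.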


Currently open PRs adding new format plugins:
Format plugin for LTSV files - https://github.com/apache/drill/pull/1627
SYSLOG (RFC-5424) Format Plugin - https://github.com/apache/drill/pull/1530 
Msgpack format reader - https://github.com/apache/drill/pull/1500

Any suggestions?

Kind regards,
Arina