Hi,

Again, I think the whole plugin concept falls outside of Arrow.

It would be much simpler to just allow people to override the
compression codec factory.  Then applications can define "plugins"
themselves if they want to.
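
As a rough illustration of what overriding the codec factory could look
like, here is a minimal C++ sketch.  All names in it (Codec, CodecRegistry,
Override, HardwareGzipCodec) are hypothetical stand-ins for illustration,
not existing Arrow APIs:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-ins; these are NOT the actual Arrow types.
enum class CompressionType { UNCOMPRESSED, GZIP };

class Codec {
 public:
  virtual ~Codec() = default;
  virtual std::string name() const = 0;
};

// Plays the role of the library's built-in GZIP codec.
class BuiltinGzipCodec : public Codec {
 public:
  std::string name() const override { return "builtin-gzip"; }
};

using CodecFactory = std::function<std::unique_ptr<Codec>()>;

// Assumed design: a per-compression-type factory that applications may replace.
class CodecRegistry {
 public:
  // Install a replacement factory for `type`; returns the previous override
  // (empty if there was none) so the caller can restore it later.
  CodecFactory Override(CompressionType type, CodecFactory factory) {
    std::lock_guard<std::mutex> lock(mutex_);
    CodecFactory previous = std::move(factories_[type]);
    factories_[type] = std::move(factory);
    return previous;
  }

  // Create a codec, preferring an application override over the built-in one.
  std::unique_ptr<Codec> Create(CompressionType type) const {
    CodecFactory factory;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = factories_.find(type);
      if (it != factories_.end()) factory = it->second;
    }
    if (factory) return factory();
    return DefaultCreate(type);
  }

 private:
  static std::unique_ptr<Codec> DefaultCreate(CompressionType type) {
    if (type == CompressionType::GZIP) {
      return std::unique_ptr<Codec>(new BuiltinGzipCodec());
    }
    return nullptr;  // no codec needed for UNCOMPRESSED
  }

  mutable std::mutex mutex_;
  std::map<CompressionType, CodecFactory> factories_;
};

// Application side: e.g. a hardware-accelerated GZIP implementation.
class HardwareGzipCodec : public Codec {
 public:
  std::string name() const override { return "hw-gzip"; }
};

// Usage: the application swaps in its own codec without any plugin machinery.
inline void OverrideExample() {
  CodecRegistry registry;
  assert(registry.Create(CompressionType::GZIP)->name() == "builtin-gzip");
  registry.Override(CompressionType::GZIP, [] {
    return std::unique_ptr<Codec>(new HardwareGzipCodec());
  });
  assert(registry.Create(CompressionType::GZIP)->name() == "hw-gzip");
}
```

With a hook like this, whether the replacement codec comes from a
dynamically loaded library or is linked in directly is left entirely to
the application.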

Regards

Antoine.


On 19/10/2020 at 03:30, Xie, Qi wrote:
> Hi, all
> 
> As discussed in the previous email, we are proposing a pluggable API to 
> support user-customized compression codecs in Arrow.
> See the proposal: 
> https://docs.google.com/document/d/1W_TxVRN7WV1wBVOTdbxngzBek1nTolMlJWy6aqC6WG8/edit
> We want to redefine the scope of the pluggable API and discuss it with 
> the community.
> 
> 1. Goal
> Through the plugin API, the end user can use a customized compression codec 
> to override the built-in one, e.g. use a HW-GZIP codec in place of the 
> Arrow built-in GZIP codec to speed up compression and decompression.
> We do not plan to add new compression codecs to Arrow.
> Currently we are focused on the Parquet format; in the future we will 
> support the Arrow format. However, some components should be common to 
> Arrow, such as the plugin manager module, the dynamic library loading 
> module, etc.
> 
> 2. Compatibility with the Java implementation
> Both implementations will write the plugin information to the Parquet 
> key-value metadata, either at the FileMetaData level or at the 
> ColumnMetaData level.
> The plugin information includes the plugin library name used by native 
> Parquet and the plugin class name used by Java Parquet.
> E.g. plugin_library_name:libgzipplugin.so, 
> plugin_class_name:com.intel.icl.customizedGzipCodec
> We are working together with the Parquet community to refine our 
> proposal: https://www.mail-archive.com/dev@parquet.apache.org/msg12463.html
> 
> 3. The end user API.
> For writes, the end user should state explicitly that they want to use a 
> plugin codec, so we add a compression_plugin API to the Parquet 
> WriterProperties builder. When this function is called, the internal 
> Parquet writer will write plugin_library_name and plugin_class_name to the 
> Parquet key-value metadata. The end-user code looks like this:
> parquet::WriterProperties::Builder builder;
> builder.compression(parquet::Compression::GZIP);
> builder.compression_plugin("libGzipPlugin.so");
> std::shared_ptr<parquet::WriterProperties> props = builder.build();
> 
> 
> 
> For reads, the internal Parquet reader will first check whether there is 
> plugin information in the metadata. For native Parquet, it will read 
> plugin_library_name from the key-value metadata; if the key exists, it will 
> load the plugin library automatically and return the plugin codec from 
> GetReadCodec.
> 
> So no code change is needed for reads; the plugin is transparent to the 
> end user on the Parquet read side.
> 
> 
> 
> Looking forward to any other suggestions or feedback.
> 
> Thanks,
> XieQi
> 
> 
