[ 
https://issues.apache.org/jira/browse/HADOOP-13200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949576#comment-15949576
 ] 

Andrew Wang commented on HADOOP-13200:
--------------------------------------

Hi Kai,

It would be great to file separate JIRAs for the small issues we noticed. I 
also spent some more time thinking about this.

High-level problem: determine what raw encoder and decoder to use for a coder. 
Our current system:

{noformat}
coder --------> rawcoder factory method ---------> factory ---------> raw encoder / decoder
       rawcoder                         reflection          hardcode
       factory
       class name
{noformat}

If we replace the reflection step with a registry, we can save the per-rawcoder 
factory classes:

{noformat}
coder ----------> rawcoder factory registry --------> factory ----------> raw encoder + decoder
       rawcoder                              lookup            hardcode
       factory
       name
{noformat}

* Raw coder factories would be identified by an additional getName() method 
on the factory interface.
* The registry is a singleton that maps each coder to a map of rawcoder 
factories, keyed by getName().
* The registry is prepopulated with the built-in factories; these can be 
private nested classes of the registry, or held in a new class.
* The list of pluggable raw coder factory classes is specified in a config 
key. We classload these at startup and trigger their static initializers, which 
register them with the registry. We could enforce namespacing of pluggable raw 
coder names to future-proof.
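
To make the bullets above concrete, here is a rough Java sketch of what such a 
registry might look like. It is only an illustration, not a proposed patch: the 
type names (RawCoderFactoryRegistry, RawEncoder, RawDecoder, BuiltInRsFactory, 
ExampleXorFactory), the getCodecName() method, the "rs_java" and "example:xor" 
names, and the "vendor:name" namespacing rule are all assumptions made for the 
example. Only getName() comes from the proposal itself.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for the raw coder types; not the real Hadoop classes.
interface RawEncoder {}
interface RawDecoder {}

interface RawCoderFactory {
  String getName();       // registry key, as proposed above
  String getCodecName();  // assumption: the coder/codec this factory serves
  RawEncoder createEncoder();
  RawDecoder createDecoder();
}

final class RawCoderFactoryRegistry {
  private static final RawCoderFactoryRegistry INSTANCE =
      new RawCoderFactoryRegistry();

  // codec name -> (factory name -> factory)
  private final Map<String, Map<String, RawCoderFactory>> factories =
      new ConcurrentHashMap<>();

  private RawCoderFactoryRegistry() {
    // Built-in factories are registered up front and skip the namespace check.
    put(new BuiltInRsFactory());
  }

  static RawCoderFactoryRegistry getInstance() {
    return INSTANCE;
  }

  /** Entry point for pluggable factories; names must look like "vendor:name". */
  void register(RawCoderFactory factory) {
    if (factory.getName().indexOf(':') <= 0) {
      throw new IllegalArgumentException(
          "Pluggable factory name must be namespaced: " + factory.getName());
    }
    put(factory);
  }

  /** Returns the factory registered under the given names, or null. */
  RawCoderFactory lookup(String codecName, String factoryName) {
    Map<String, RawCoderFactory> byName = factories.get(codecName);
    return byName == null ? null : byName.get(factoryName);
  }

  /** Loads each configured class so that its static initializer runs. */
  static void loadPluggable(String... classNames) {
    for (String className : classNames) {
      try {
        Class.forName(className);  // one-arg forName initializes the class
      } catch (ClassNotFoundException e) {
        throw new IllegalStateException("Cannot load " + className, e);
      }
    }
  }

  private void put(RawCoderFactory factory) {
    RawCoderFactory previous = factories
        .computeIfAbsent(factory.getCodecName(), k -> new ConcurrentHashMap<>())
        .putIfAbsent(factory.getName(), factory);
    if (previous != null) {
      throw new IllegalArgumentException(
          "Duplicate factory: " + factory.getName());
    }
  }

  // Built-in factory kept as a private nested class of the registry.
  private static final class BuiltInRsFactory implements RawCoderFactory {
    public String getName() { return "rs_java"; }
    public String getCodecName() { return "rs"; }
    public RawEncoder createEncoder() { return new RawEncoder() {}; }
    public RawDecoder createDecoder() { return new RawDecoder() {}; }
  }
}

// Example pluggable factory: its static initializer registers it.
final class ExampleXorFactory implements RawCoderFactory {
  static {
    RawCoderFactoryRegistry.getInstance().register(new ExampleXorFactory());
  }
  public String getName() { return "example:xor"; }
  public String getCodecName() { return "xor"; }
  public RawEncoder createEncoder() { return new RawEncoder() {}; }
  public RawDecoder createDecoder() { return new RawDecoder() {}; }
}
```

In this sketch the class names read from the config key would be passed to 
loadPluggable() once at startup; after that, lookup("xor", "example:xor") is a 
plain map lookup with no reflection involved. Keeping the namespace check in 
register() but not in the private put() is one possible way to reserve the 
un-prefixed names for built-in factories.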

Since nothing in the registry is config-dependent, I think it's a safe 
singleton. Config-specific logic is handled outside the Registry or in static 
methods.

I think this might also help with implementing caching later, since it's 
centralized and avoids reflection after initialization.

Thoughts?

> Seeking a better approach allowing to customize and configure erasure coders
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-13200
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13200
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>            Priority: Blocker
>              Labels: hdfs-ec-3.0-must-do
>
> This is a follow-on task for HADOOP-13010, as discussed there. There may 
> be a better approach to customizing and configuring erasure coders than the 
> current raw coder factory, as [~cmccabe] suggested. Will copy the relevant 
> comments here to continue the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

