[ 
https://issues.apache.org/jira/browse/HADOOP-13200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946376#comment-15946376
 ] 

Kai Zheng commented on HADOOP-13200:
------------------------------------

Thanks Andrew for the nice thoughts.

bq. I saw in a comment that the raw coders are stateless ...
Sorry, I think this should be clarified a bit, since it's only partly true: 1) 
When I said the coders are stateless, I meant that given a coder instance, the 
encode/decode calls themselves are stateless; that is, for a given group of 
data to encode/decode, concurrent threads can call encode/decode to speed 
things up. 2) The coder instance itself does have state, specific to the schema 
and configuration (and, for a decoder, the erasure parameters).
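To illustrate the distinction, here is a minimal sketch (the coder class and its API are hypothetical stand-ins, not Hadoop's actual {{RawErasureEncoder}}): the instance holds fixed state set at construction, while encode() touches only its arguments, so two coding groups can be encoded concurrently through one shared instance.

```java
import java.util.concurrent.*;

// Hypothetical stand-in for a raw coder. The instance has state (fields set
// at construction), but encode() reads and writes only its parameters, so
// concurrent calls on one instance are safe.
class XorRawEncoder {
    final int numDataUnits;    // instance state: fixed after construction

    XorRawEncoder(int numDataUnits) { this.numDataUnits = numDataUnits; }

    // Stateless with respect to the instance: no fields are mutated here.
    void encode(byte[][] inputs, byte[] parity) {
        for (int i = 0; i < parity.length; i++) {
            byte p = 0;
            for (byte[] input : inputs) p ^= input[i];
            parity[i] = p;
        }
    }
}

public class ConcurrentEncodeDemo {
    public static void main(String[] args) throws Exception {
        XorRawEncoder coder = new XorRawEncoder(3);   // one shared instance
        ExecutorService pool = Executors.newFixedThreadPool(2);
        byte[][] group1 = {{1, 2}, {3, 4}, {5, 6}};
        byte[][] group2 = {{7, 8}, {9, 1}, {2, 3}};
        byte[] parity1 = new byte[2], parity2 = new byte[2];
        // Two coding groups encoded concurrently through the same coder.
        Future<?> f1 = pool.submit(() -> coder.encode(group1, parity1));
        Future<?> f2 = pool.submit(() -> coder.encode(group2, parity2));
        f1.get();
        f2.get();
        pool.shutdown();
        System.out.println(parity1[0]);   // 1 ^ 3 ^ 5 = 7
    }
}
```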

The biggest concern with merging the encoder and decoder together is that 
their states would be mixed, incurring runtime overhead. For example, both the 
encoder and the decoder can keep a large coding matrix hot in cache; when 
merged, the memory consumption could double (yes, this can be avoided with 
separate code paths, but that complicates things). On the HDFS side, a coder 
instance serves just one role, either encode or decode, never both at the same 
time.

bq. Maybe this is what Colin wanted, since the factory classes look trivial by 
themselves. 
I guess you're right; that's most likely what Colin thought. The factory 
classes are trivial right now, but I think they can evolve to make more sense: 
1) In HADOOP-13665 I think we can do the fallback elegantly here: creating an 
encoder/decoder can fail, and the factory can then try the next one configured 
in a list; 2) Quite some time ago, when I played with micro-benchmarking the 
coders, I found that caching coder instances helps performance, and it's 
better to do that here than elsewhere, like in {{CodecUtil}}. 
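A rough sketch of how those two points could combine in a factory (all names here are hypothetical, not the HADOOP-13665 API): try each configured coder in preference order, fall through on creation failure, and cache the first instance that works.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: try each configured factory in order, return the
// first coder that can actually be created, and cache the instance.
class FallbackCoderFactory<T> {
    private final List<Supplier<T>> factories;   // configured preference order
    private T cached;                            // cached instance

    FallbackCoderFactory(List<Supplier<T>> factories) {
        this.factories = factories;
    }

    synchronized T create() {
        if (cached != null) return cached;       // caching helped in benchmarks
        for (Supplier<T> f : factories) {
            try {
                cached = f.get();                // e.g. a native coder may fail
                return cached;                   // to load its library
            } catch (RuntimeException e) {
                // fall through to the next configured coder
            }
        }
        throw new IllegalStateException("No usable coder configured");
    }
}

public class FallbackDemo {
    public static void main(String[] args) {
        // First choice fails to create; second succeeds and gets cached.
        List<Supplier<String>> configured = List.of(
            () -> { throw new RuntimeException("native lib missing"); },
            () -> "pure-java-rs-coder");
        FallbackCoderFactory<String> factory =
            new FallbackCoderFactory<>(configured);
        System.out.println(factory.create());    // prints pure-java-rs-coder
    }
}
```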

bq. I experimented by putting the NativeRS raw encoder and raw decoder into 
their Factory class, and it looks okay since they're pretty small.
It's a very interesting experiment. Yes, the native RS encoder/decoder are 
small, but other coders may not be. I think coders and coder factories may 
evolve to become bigger in the future: for coders, if we want to support 
incremental encoding/decoding, more code will be added. As the HH codec 
indicates, if any party supports a complex codec, the encode/decode logic can 
be much more complex.

bq. We also should rename RSRawEncoderLegacy to RSLegacyRawEncoder ...
Right, agreed.

bq. it seems like we should be creating via the appropriate factory whenever 
possible.
Couldn't agree more.

bq. Overall though I think the current system is okay. The factory is the 
single entry point to configuring a RawCoder.
I'm glad the current system works for you. Do you think it's worth fixing the 
above problems in this issue?

Thanks!


> Seeking a better approach allowing to customize and configure erasure coders
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-13200
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13200
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>            Priority: Blocker
>              Labels: hdfs-ec-3.0-must-do
>
> This is a follow-on task for HADOOP-13010 as discussed over there. There may 
> be some better approach allowing to customize and configure erasure coders 
> than the current having raw coder factory, as [~cmccabe] suggested. Will copy 
> the relevant comments here to continue the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
