[jira] [Commented] (HDFS-7344) Erasure Coding worker and support in DataNode

2015-04-17 Thread Tsz Wo Nicholas Sze (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500488#comment-14500488
]

Tsz Wo Nicholas Sze commented on HDFS-7344:
---

bq. ... The required tasks listed here, also under HDFS-7285, were already
started, though no solid progress has shown up yet.

The list here contains some tasks for Phase 2. Let's remove the Phase 2 items,
since this is a Phase 1 JIRA.

We should focus on Phase 1 and not let the Phase 2 items slow down progress.

Erasure Coding worker and support in DataNode
-

Key: HDFS-7344
URL: https://issues.apache.org/jira/browse/HDFS-7344
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode
Reporter: Kai Zheng
Assignee: Li Bo
Attachments: ECWorker-design-v2.pdf, HDFS ECWorker Design.pdf,
hdfs-ec-datanode.0108.zip, hdfs-ec-datanode.0108.zip

According to HDFS-7285 and the design, this handles the DataNode-side extension
and related support for Erasure Coding, and implements ECWorker. The use cases
handled are:
* Converting replica blocks into striped erasure-coded blocks;
* Converting replica blocks into non-striped erasure-coded blocks;
* Recovering erased striped erasure-coded blocks;
* Recovering erased non-striped erasure-coded blocks;
* On-demand recovery to serve client requests for erased non-striped
erasure-coded blocks.
It generally needs to restore BlockGroup and schema information from coding
commands sent by the NameNode or other entities, and construct specific coding
work to execute. The required block reader and writer (local or remote),
encoder, and decoder will be implemented separately as sub-tasks.
This JIRA will track all the linked sub-tasks, and is responsible for general
discussion and integration for ECWorker. It won't be resolved until all the
related tasks are done.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-7344) Erasure Coding worker and support in DataNode

2015-04-16 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498862#comment-14498862
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7344:
---

Is anyone currently working on this? This is one of the main missing features.


[jira] [Commented] (HDFS-7344) Erasure Coding worker and support in DataNode

2015-04-16 Thread Kai Zheng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499030#comment-14499030
 ] 

Kai Zheng commented on HDFS-7344:
-

I forgot to mention that [~zhz] and our side are tracking this aspect. The
required tasks listed here, also under HDFS-7285, were already started, though
no solid progress has shown up yet.



[jira] [Commented] (HDFS-7344) Erasure Coding worker and support in DataNode

2015-04-16 Thread Kai Zheng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499024#comment-14499024
 ] 

Kai Zheng commented on HDFS-7344:
-

Please note that this now serves as the DataNode-side master JIRA, and all of
its work has been broken down and assigned. The tasks required for HDFS-7285
are ongoing as far as I know. I agree with you that this is the main part we
should focus on.


[jira] [Commented] (HDFS-7344) Erasure Coding worker and support in DataNode

2015-03-30 Thread Kai Zheng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386731#comment-14386731
 ] 

Kai Zheng commented on HDFS-7344:
-

Having looked at all the related discussions, and having talked with
[~libo-intel], I made the breakdown and linked all the related tasks to this
JIRA. As we have not fully thought this part through, particularly the
conversion between replica and striping or non-striping forms, every sub-task
is definitely subject to further discussion. Many of these tasks are
challenging and would not be easy for any party, so please forgive me for
assigning them to people who might be willing to help; if you are not, please
feel free to mark them unassigned so that others can consider them.
It would be great if we could have more thoughts and discussion on these
aspects and tasks. I will keep track of them and eventually fold them into the
design docs.


[jira] [Commented] (HDFS-7344) Erasure Coding worker and support in DataNode

2015-03-25 Thread Li Bo (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379428#comment-14379428
 ] 

Li Bo commented on HDFS-7344:
-

Hi Nicholas,
Thanks for the heads-up on this task. Once I finish the client-side striping
encoding and decoding work, I will switch to this task.


 Erasure Coding worker and support in DataNode
 -

 Key: HDFS-7344
 URL: https://issues.apache.org/jira/browse/HDFS-7344
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Kai Zheng
Assignee: Li Bo
 Attachments: HDFS ECWorker Design.pdf, hdfs-ec-datanode.0108.zip, 
 hdfs-ec-datanode.0108.zip


 According to HDFS-7285 and the design, this handles DataNode side extension 
 and related support for Erasure Coding, and implements ECWorker. It mainly 
 covers the following aspects, and separate tasks may be opened to handle each 
 of them.
 * Process encoding work, calculating parity blocks as specified in block 
 groups and codec schema;
 * Process decoding work, recovering data blocks according to block groups and 
 codec schema;
 * Handle client requests for passive recovery blocks data and serving data on 
 demand while reconstructing;
 * Write parity blocks according to storage policy.




[jira] [Commented] (HDFS-7344) Erasure Coding worker and support in DataNode

2015-03-25 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379472#comment-14379472
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7344:
---

Since you are working on the client, how about letting someone else work on
the datanode changes?


[jira] [Commented] (HDFS-7344) Erasure Coding worker and support in DataNode

2015-03-25 Thread Kai Zheng (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379785#comment-14379785
]

Kai Zheng commented on HDFS-7344:
-

Hello [~szetszwo],

Thanks for taking care of this. Let me address your comments together. Please
let me know if this works. Thanks.
bq. For 1 missing block, we may not need to recover it at all since
(6,3)-Reed-Solomon can tolerate 3 missing blocks. Also recovery is more
efficient for 2- or 3- missing blocks.
Good thoughts. I remember we had a related discussion with [~zhz]. The idea is
to give recovery tasks different priorities depending on how urgently the
erased blocks need to be recovered. As you said, 2 or 3 erased blocks are more
urgent than 1 erased block, so they would get higher priority when the NN
schedules. Note that a single erased block still needs to be recovered when
possible because, as seen in existing customer deployments, in most cases only
one block is erased and needs recovering. Recovering 1 erased block can also
be efficient, because in that case a simple XOR calculation can be used and no
RS overhead is incurred.
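
The XOR point above can be illustrated with a minimal Java sketch. It assumes
one parity block that is the plain XOR of the data blocks in a group; whether a
given RS codec provides such a parity is an assumption here, and the names
{{XorRecoverySketch}}, {{xorParity}} and {{recoverOne}} are hypothetical, not
taken from the patch or from HDFS.

```java
import java.util.Arrays;

final class XorRecoverySketch {
    /** Computes the XOR parity of equally sized blocks. */
    static byte[] xorParity(byte[][] blocks) {
        byte[] parity = new byte[blocks[0].length];
        for (byte[] block : blocks) {
            for (int i = 0; i < parity.length; i++) {
                parity[i] ^= block[i];
            }
        }
        return parity;
    }

    /** Rebuilds the single erased block from the surviving blocks and the parity. */
    static byte[] recoverOne(byte[][] survivors, byte[] parity) {
        byte[] recovered = parity.clone();
        for (byte[] block : survivors) {
            for (int i = 0; i < recovered.length; i++) {
                recovered[i] ^= block[i];
            }
        }
        return recovered;
    }

    public static void main(String[] args) {
        byte[][] data = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
        byte[] parity = xorParity(data);
        // Pretend data[1] was erased; rebuild it from the other two blocks.
        byte[] rebuilt = recoverOne(new byte[][] { data[0], data[2] }, parity);
        System.out.println(Arrays.equals(rebuilt, data[1]));
    }
}
```

Each byte of the erased block costs one XOR per surviving input, with no
Galois-field multiplication, which is the sense in which no RS overhead is
incurred in this simplified setting.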

bq. Since you are working on the client, how about letting someone else working
on the datanode changes?
Good suggestion. Having discussed this with [~libo-intel], I will help out
until he can come back to this after finishing the client side. As on the
client side, where [~libo-intel] collaborates with [~jingzhao] and [~zhz] and
the hard part is already done, I believe we need very good community
collaboration here as well. How about this: I update the design doc first,
early next week, after discussing with [~umamaheswararao], [~vinayrpet] and
others, incorporating the discussions here from [~zhz] and you. The doc will be
subject to your review and further discussion here. Meanwhile, I will also
update and refine Bo's code based on the latest design and the branch within
another week, so that we have concrete, doable thoughts for breaking the whole
thing down into smaller tasks; then people other than Bo and me can also help
in parallel, as you suggested. Hope this works.


[jira] [Commented] (HDFS-7344) Erasure Coding worker and support in DataNode

2015-03-24 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379281#comment-14379281
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7344:
---

BTW, any progress on this JIRA and the related tasks?


[jira] [Commented] (HDFS-7344) Erasure Coding worker and support in DataNode

2015-03-24 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379277#comment-14379277
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7344:
---

bq. In most recovery cases, each ECWorker only generates 1 block. ...

For 1 missing block, we may not need to recover it at all since 
(6,3)-Reed-Solomon can tolerate 3 missing blocks.  Also recovery is more 
efficient for 2- or  3- missing blocks.
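
To make the tolerance argument concrete, here is a small Java sketch of how
recovery work could be ordered by remaining fault tolerance under a (6,3)
schema. It is only an illustration: {{RecoveryPrioritySketch}},
{{ErasedGroup}} and {{recoveryOrder}} are hypothetical names, and this is not
how the NameNode actually schedules recovery.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class RecoveryPrioritySketch {
    /** Parity blocks per group in the (6,3) schema discussed in this thread. */
    static final int PARITY_BLOCKS = 3;

    /** Hypothetical record of a block group with some erased blocks. */
    static final class ErasedGroup {
        final String id;
        final int missing;

        ErasedGroup(String id, int missing) {
            this.id = id;
            this.missing = missing;
        }

        /** Number of further erasures the group can survive before data is lost. */
        int remainingTolerance() {
            return PARITY_BLOCKS - missing;
        }
    }

    /** Returns group ids with the least remaining tolerance first. */
    static List<String> recoveryOrder(List<ErasedGroup> groups) {
        PriorityQueue<ErasedGroup> queue = new PriorityQueue<>(
            Comparator.comparingInt(ErasedGroup::remainingTolerance));
        queue.addAll(groups);
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll().id);
        }
        return order;
    }
}
```

A group with 3 missing blocks has no tolerance left and is handled first, while
a group with 1 missing block can still lose 2 more and may wait.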


[jira] [Commented] (HDFS-7344) Erasure Coding worker and support in DataNode

2015-01-12 Thread Zhe Zhang (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273774#comment-14273774
]

Zhe Zhang commented on HDFS-7344:
-

bq. In fact, the uploaded code covers non-striping encode/decode and was
written before we decided to implement striping first. Because the basic idea
is similar and BlockReader/BlockWriter will be reused, I hope we can get some
feedback to help further development.
Thanks for clarifying.

bq. For DN side EC work, multiple blocks may be generated. If we send these
blocks after they're entirely generated, each EC work item will consume a lot
of memory, typically 4*128M~6*128M, and there may be several EC work items for
a DN at the same time. So a better choice is to allocate a buffer for each EC
work item (producer-consumer model). When the buffer is full, the
encoder/decoder will wait for BlockWriter to write the buffer locally or
remotely.
In most recovery cases, each ECWorker only generates 1 block. In conversion,
each ECWorker does need to generate multiple blocks. How about generating all
these blocks locally (on disk) and then sending them to remote DNs? The
downside is increased disk I/O. But my feeling is that the complex
{{DataStreamer}} logic is overkill here. It would be great if other folks could
chime in.

bq. BlockReader and BlockWriter will have several subclasses, that is, ones
that operate on data locally or remotely, and that work in the datanode or the
client. We can refine the logic to get the best efficiency for different
classes.
I see. We can leave it as-is then, and wait until we see the client-side
implementation to discuss further details.

bq. When the DN receives encoding/decoding work from the namenode, it will send
it to the ECWorker.
Then the patch should modify the {{DataNode}} class?


[jira] [Commented] (HDFS-7344) Erasure Coding worker and support in DataNode

2015-01-11 Thread Li Bo (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273239#comment-14273239
]

Li Bo commented on HDFS-7344:
-

Thanks Zhe for your comments.
Yes, you're right, we'd better implement the client-side work first. In fact,
the uploaded code covers non-striping encode/decode and was written before we
decided to implement striping first. Because the basic idea is similar and
BlockReader/BlockWriter will be reused, I hope we can get some feedback to
help further development.
The ec package can be refactored to a proper place. For DN side EC work,
multiple blocks may be generated. If we send these blocks after they're
entirely generated, each EC work item will consume a lot of memory, typically
4*128M~6*128M, and there may be several EC work items for a DN at the same
time. So a better choice is to allocate a buffer for each EC work item
(producer-consumer model). When the buffer is full, the encoder/decoder will
wait for BlockWriter to write the buffer locally or remotely.
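
For scale, with 128 MB blocks, holding 4 to 6 fully generated blocks means
512 MB to 768 MB per EC work item. The bounded-buffer alternative described
above can be sketched in Java as follows. This is a minimal illustration using
{{java.util.concurrent.ArrayBlockingQueue}}; the class {{BoundedEcBufferSketch}},
the method {{pipe}} and the chunk sizes are hypothetical, and the producer and
consumer are stand-ins for the real encoder/decoder and BlockWriter.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class BoundedEcBufferSketch {
    /** Sentinel chunk marking the end of the stream (compared by identity). */
    private static final byte[] END = new byte[0];

    /**
     * Runs a producer (stand-in for the encoder/decoder) and a consumer
     * (stand-in for BlockWriter) over a bounded buffer; returns bytes consumed.
     * At most capacity chunks are queued at any time.
     */
    static long pipe(int chunkCount, int chunkSize, int capacity) {
        BlockingQueue<byte[]> buffer = new ArrayBlockingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < chunkCount; i++) {
                    buffer.put(new byte[chunkSize]); // blocks while the buffer is full
                }
                buffer.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        long written = 0;
        try {
            for (byte[] chunk = buffer.take(); chunk != END; chunk = buffer.take()) {
                // A real writer would send the chunk to local disk or a peer DN here.
                written += chunk.length;
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining buffer", e);
        }
        return written;
    }

    public static void main(String[] args) {
        // 100 chunks of 1 KB through a 4-slot buffer: only about 4 KB queued at once.
        System.out.println(pipe(100, 1024, 4));
    }
}
```

The queued memory is bounded by roughly capacity times chunk size regardless of
how many blocks the work item produces, at the cost of the encoder stalling
whenever the writer falls behind.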
BlockReader and BlockWriter will have several subclasses, that is, ones that
operate on data locally or remotely, and that work in the datanode or the
client. We can refine the logic to get the best efficiency for different
classes. Each DN has one ECWorker. When the DN receives encoding/decoding work
from the namenode, it will send it to the ECWorker.
The DN may contain some logic similar to BlockWriter/BlockReader, but it is
complex to extend or reuse. For example, BlockSender sends a block to a remote
DN, but it reads the block from disk. We could replace the input stream or
extend this class, but I think completely rewriting the logic seems better.


[jira] [Commented] (HDFS-7344) Erasure Coding worker and support in DataNode

2015-01-09 Thread Zhe Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271900#comment-14271900
 ] 

Zhe Zhang commented on HDFS-7344:
-

Some quick comments:
Cosmetics:
# It seems the {{ec}} package should at least be under hdfs/?
# All test classes should be under src/test instead of src/main

Code logic:
# Looks like we need an updated design doc?
# In general I think the client implementation (HDFS-7545) should go before DN:
** Client support is needed in regular I/O, while DN is only involved in 
recovery and conversion
** I see that the DN patch here tries to reuse the client side striping/codec 
logic (e.g., {{ECRemoteBlockWriter}}). It is helpful to first finalize the 
client code itself.
# Apparently {{ECRemoteBlockWriter}} is a copy of {{DFSOutputStream}} now. Many
complex components and much of the logic in {{DFSOutputStream}} (e.g.,
{{DataStreamer}}) are only useful on the client side. For example, it needs a
{{dataQueue}} to buffer packets because the client might write data slowly and
in small units. The client write pipeline is actually very complicated and
should be avoided if possible. How much benefit is there for a DN to transfer
recovered/converted data to peer DNs in small units, rather than after the
entire block is recovered/converted?
# How does DN initiate ECWorker? 
# ECRemoteBlockReader extends ECBlockReaderBase, which implements 
ECBlockReader: is this abstraction necessary? I.e., except for 
ECRemoteBlockReader, what other block readers could extend ECBlockReaderBase?
# In general, rather than referring to and leveraging client side reader/writer 
code, I think we should refer to DN side transfer functions like 
{{DataNode#transferBlock}}, which are much simpler.


[jira] [Commented] (HDFS-7344) Erasure Coding worker and support in DataNode

2015-01-08 Thread Li Bo (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270367#comment-14270367
 ] 

Li Bo commented on HDFS-7344:
-

Thanks for your advice. I will use patch files in the following uploads.


[jira] [Commented] (HDFS-7344) Erasure Coding worker and support in DataNode

2015-01-08 Thread Li Bo (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14269237#comment-14269237
 ] 

Li Bo commented on HDFS-7344:
-

The attachment is the datanode-side code, which only considers non-striping
encoding and decoding. After striping is introduced, the client side also
needs to do encode/decode work similar to the datanode-side process, so I am
trying to implement them under one unified model. The code just uploaded may
change a lot in the future, but the basic idea remains the same.


[jira] [Commented] (HDFS-7344) Erasure Coding worker and support in DataNode

2015-01-08 Thread Zhe Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14269698#comment-14269698
 ] 

Zhe Zhang commented on HDFS-7344:
-

Thanks Bo! 

Quick comment: as Andrew also pointed out in this 
[comment|https://issues.apache.org/jira/browse/HDFS-7337?focusedCommentId=14266816&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14266816],
 let's use patch files instead of raw java/zip files.

