[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate

2014-03-26 Thread Guo Ruijing (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947748#comment-13947748 ]

Guo Ruijing commented on HDFS-6087:
---

Hi, Konstantin,

In fact, I am proposing a new design for HDFS write/append/truncate, writable 
snapshots, and snapshot-on-snapshot. This JIRA was created for HDFS 
write/append/truncate.
Currently, block recovery is very complicated when the pipeline breaks, as 
shown for example by HDFS-5728.



 Unify HDFS write/append/truncate
 

 Key: HDFS-6087
 URL: https://issues.apache.org/jira/browse/HDFS-6087
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Reporter: Guo Ruijing
 Attachments: HDFS Design Proposal.pdf, HDFS Design Proposal_3_14.pdf


 In the existing implementation, an HDFS file can be appended and an HDFS block 
 can be reopened for append. This design introduces complexity, including lease 
 recovery. If we design the HDFS block as immutable, append & truncate become 
 very simple. The idea is that an HDFS block is immutable once it is committed 
 to the namenode. If the block is not committed to the namenode, it is the HDFS 
 client's responsibility to re-add it with a new block ID.





[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate

2014-03-26 Thread Guo Ruijing (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947754#comment-13947754 ]

Guo Ruijing commented on HDFS-6087:
---

Hi, Konstantin,

The truncate semantics in my proposal are the same as in HDFS-3107. We can 
implement them after resolving the design concerns:

1) implement truncate as in the design proposal
2) implement write/append as in the design proposal once truncate is stable



[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate

2014-03-26 Thread Konstantin Shvachko (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948814#comment-13948814 ]

Konstantin Shvachko commented on HDFS-6087:
---

 Currently, block recovery is very complicated when the pipeline breaks

Indeed, the complexity starts when something breaks, and I still don't see how 
you propose to simplify the process.

But truncate as discussed in HDFS-3107 is performed on a closed file and is 
therefore much less related to pipeline recovery, since there is no pipeline 
after the file is closed. That makes truncate a simpler task to me than a 
rewrite of the pipeline.
Nicholas in [his 
comment|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=13235941&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13235941]
 mentioned three ideas for implementing truncate. I was recently thinking about 
another one.

What if we implemented truncate similarly to lease recovery? That is, when a 
client asks to truncate a file, the NN changes the list of blocks by deleting 
some of the tail ones and decrementing the size of the last. Then the NN issues 
a DataNodeCommand to recover the last block. The DNs, as the result of the 
recovery, truncate their replica files and then call 
commitBlockSynchronization() to report the new length to the NN.

Sorry, I didn't want to hijack your jira, so if you intend to proceed with the 
more general design here, I'll re-post my idea under HDFS-3107.
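
A self-contained sketch of the recovery-style truncate described above; the 
names (BlockInfo, scheduleBlockRecovery) are hypothetical stand-ins, not the 
real NameNode classes. It only illustrates trimming the block list and 
scheduling recovery of the shortened last block:

{code:java}
import java.util.ArrayList;
import java.util.List;

class TruncateSketch {
  // Minimal stand-in for the NN's per-block metadata.
  static class BlockInfo {
    final long id;
    long numBytes;
    BlockInfo(long id, long numBytes) { this.id = id; this.numBytes = numBytes; }
  }

  // NN side: keep whole blocks up to newLength, shrink the block that
  // straddles the boundary, and drop the tail blocks entirely.
  static List<BlockInfo> truncate(List<BlockInfo> blocks, long newLength) {
    List<BlockInfo> kept = new ArrayList<>();
    long remaining = newLength;
    for (BlockInfo blk : blocks) {
      if (remaining == 0) {
        break;                        // tail blocks past newLength are deleted
      }
      if (remaining >= blk.numBytes) {
        remaining -= blk.numBytes;    // this block survives intact
      } else {
        blk.numBytes = remaining;     // decrement the size of the new last block
        remaining = 0;
        scheduleBlockRecovery(blk);   // DataNodeCommand: DNs truncate their replica
      }                               // files and report the new length back via
      kept.add(blk);                  // commitBlockSynchronization()
    }
    return kept;
  }

  static void scheduleBlockRecovery(BlockInfo blk) {
    System.out.println("recover block " + blk.id + " down to " + blk.numBytes + " bytes");
  }
}
{code}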



[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate

2014-03-25 Thread Konstantin Shvachko (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946968#comment-13946968 ]

Konstantin Shvachko commented on HDFS-6087:
---

Guo, could you please clarify:
Are you proposing a new design for pipeline handling, or do you just want to 
add truncate?
New pipeline handling is probably going to be hard, while adding truncate could 
be simpler.
If it is truncate you want, do you have any requirements for the APIs or 
semantics that differ from those laid out under HDFS-3107?



[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate

2014-03-17 Thread Todd Lipcon (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938070#comment-13938070 ]

Todd Lipcon commented on HDFS-6087:
---

Even creating a new hard link on every hflush is a no-go performance-wise, I'd 
think. Involving the NN in a round trip on every hflush would also kill the 
scalability of HBase and other applications that hflush hundreds of times per 
second per node.



[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate

2014-03-15 Thread Guo Ruijing (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936378#comment-13936378 ]

Guo Ruijing commented on HDFS-6087:
---

It supports hflush/hsync:

1) sync all buffered data.

2) commit the block to the NN if at a block boundary.

3) if not at a block boundary, copy to a new block, append the buffer to the 
new block, and commit the new block to the NN.
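
A minimal sketch of these three cases on the client side; the NameNodeRpc and 
DataNodeRpc interfaces and their methods (nextBlockId, copyBlock, commitBlock) 
are the proposal's hypothetical new RPCs, not existing HDFS calls:

{code:java}
// Client-side write stream under the immutable-block proposal (sketch only).
class ImmutableBlockOutputStream {
  interface NameNodeRpc {
    long nextBlockId();
    void commitBlock(long blockId, long length);
  }
  interface DataNodeRpc {
    void copyBlock(long srcBlockId, long dstBlockId, long length);
  }

  private final NameNodeRpc nn;
  private final DataNodeRpc dn;
  private final long blockSize;
  private long currentBlockId;
  private long bytesInBlock;   // bytes written to the current block so far

  ImmutableBlockOutputStream(NameNodeRpc nn, DataNodeRpc dn,
                             long blockSize, long firstBlockId) {
    this.nn = nn;
    this.dn = dn;
    this.blockSize = blockSize;
    this.currentBlockId = firstBlockId;
  }

  void hflush() {
    // 1) assume all buffered bytes have already been synced to the DN pipeline
    if (bytesInBlock == blockSize) {
      // 2) at a block boundary: the block is complete, commit it as-is
      nn.commitBlock(currentBlockId, bytesInBlock);
    } else {
      // 3) mid-block: copy the partial data into a fresh block and commit the
      // copy; the old uncommitted block is abandoned, and a later mid-block
      // hflush would copy-on-write again from the committed block
      long newId = nn.nextBlockId();
      dn.copyBlock(currentBlockId, newId, bytesInBlock);
      nn.commitBlock(newId, bytesInBlock);
      currentBlockId = newId;
    }
  }
}
{code}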



[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate

2014-03-15 Thread Guo Ruijing (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936379#comment-13936379 ]

Guo Ruijing commented on HDFS-6087:
---

If the client needs to read the data early, the application flow should be:

1. open (for create/append) 2. write 3. hflush/hsync 4. write 5. close

Note: writing not at a block boundary will trigger a block copy in the DN (we 
may design zero-copy for the block copy).

If the client does not need to read early, the application flow can be:

1. open (for create/append) 2. write 3. write 4. close
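
For reference, the early-read flow maps onto the existing FileSystem client 
API as below (the path is illustrative); under this proposal the hflush() step 
would additionally trigger the mid-block copy described above:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EarlyReadFlow {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // 1. open (for create)
    try (FSDataOutputStream out = fs.create(new Path("/tmp/early-read-example"))) {
      out.write("first batch".getBytes("UTF-8"));   // 2. write
      out.hflush();                                 // 3. readers can now see these bytes
      out.write("second batch".getBytes("UTF-8"));  // 4. write
    }                                               // 5. close finalizes the file
  }
}
{code}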



[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate

2014-03-15 Thread Guo Ruijing (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936383#comment-13936383 ]

Guo Ruijing commented on HDFS-6087:
---

Writing not at a block boundary will trigger block copying in the DN:

1) it won't lead to a lot of small blocks
2) as in most file systems, hflush/hsync/truncate may cause a performance 
penalty.

If we can design zero-copy for the block copy, the performance penalty is 
small:

1) a block is defined as (block data file, block length)
2) the source block is already committed to the NN and immutable
3) a block data file can be created and appended to, but cannot be overwritten 
or truncated
4) the block length may not be equal to the block data file length
5) create a hard link to the block data file if the copy length == the file 
length
6) copy the block data file if the copy length < the file length

Example:

1) Block 1: (blockfile1, 32M), blockfile1 (length: 32M)

2) copy Block 1 to Block 2 with length 32M:

a) hard-link blockfile1 to blockfile2
b) Block 2: (blockfile2, 32M), blockfile2 (length: 32M)

3) write a 16M buffer to Block 2 (blockfile1 and blockfile2 are the same file, 
so appending grows both):

a) Block 1: (blockfile1, 32M), blockfile1 (length: 48M)
b) Block 2: (blockfile2, 48M), blockfile2 (length: 48M)

4) copy Block 2 to Block 3 with length 16M:

a) copy the first 16M of blockfile2 to blockfile3
b) Block 1: (blockfile1, 32M), blockfile1 (length: 48M)
c) Block 2: (blockfile2, 48M), blockfile2 (length: 48M)
d) Block 3: (blockfile3, 16M), blockfile3 (length: 16M)
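
A sketch of rules 5 and 6 on the DN side, assuming plain local files; 
copyBlockFile() and the use of java.nio hard links are illustrative, not the 
actual DataNode storage layout:

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class BlockCopySketch {
  // Copy the first copyLength bytes of src into dst.
  static void copyBlockFile(Path src, Path dst, long copyLength) throws IOException {
    if (copyLength == Files.size(src)) {
      // rule 5: the copy covers the whole data file, so a hard link is a
      // zero-copy "copy" (both names now point at the same data)
      Files.createLink(dst, src);
      return;
    }
    // rule 6: the copy is a strict prefix, so the bytes must be copied
    try (InputStream in = Files.newInputStream(src);
         OutputStream out = Files.newOutputStream(dst)) {
      byte[] buf = new byte[64 * 1024];
      long remaining = copyLength;
      while (remaining > 0) {
        int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
        if (n < 0) {
          break;                      // source shorter than expected
        }
        out.write(buf, 0, n);
        remaining -= n;
      }
    }
  }
}
{code}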



[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate

2014-03-15 Thread Guo Ruijing (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936391#comment-13936391 ]

Guo Ruijing commented on HDFS-6087:
---

Issue: the last block is not available for reading.


Solution 1: if the block is referenced by a client, the block is only moved to 
the NN's remove list after the client unreferences it.

1) GetBlockLocations with a Reference option
2) the client copies the block to a local buffer
3) a new RPC message, UnreferenceBlocks, is sent to the NN

Solution 2: the block is moved to trash in the DN and its deletion is delayed.

In the existing implementation, blocks are deleted in the DN after the 
heartbeat response reaches the DN (lazy block deletion).

If a block is being read by a client when its deletion is requested, the DN 
should delete the block only after the read completes.

In most cases, the client can read the last block:

1) the client requests the block location information

2) the HDFS client copies the blocks to a local buffer.

3) a heartbeat response requests deletion of the block (lazy block deletion)

4) the HDFS application slowly reads the data from the local buffer.

For the race condition between 2) and 3), we can delay block deletion.

Even if the block is deleted, the client can request fresh block information.

I prefer solution 2.
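
A sketch of solution 2's DN-side bookkeeping; TrashingReplicaStore and its 
methods are hypothetical, shown only to illustrate read-reference counting 
with delayed reclamation:

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class TrashingReplicaStore {
  // blockId -> number of readers currently streaming that replica
  private final ConcurrentHashMap<Long, AtomicInteger> readers = new ConcurrentHashMap<>();
  // blocks the NN asked us to delete while readers may still hold them
  private final Set<Long> trashed = ConcurrentHashMap.newKeySet();

  void beginRead(long blockId) {
    readers.computeIfAbsent(blockId, id -> new AtomicInteger()).incrementAndGet();
  }

  void endRead(long blockId) {
    AtomicInteger rc = readers.get(blockId);
    if (rc != null && rc.decrementAndGet() == 0 && trashed.contains(blockId)) {
      reclaim(blockId);            // last reader finished: now safe to delete
    }
  }

  // Called when a heartbeat response asks the DN to delete the block.
  void requestDelete(long blockId) {
    trashed.add(blockId);          // move to trash instead of deleting immediately
    AtomicInteger rc = readers.get(blockId);
    if (rc == null || rc.get() == 0) {
      reclaim(blockId);            // no readers: delete right away
    }
  }

  private void reclaim(long blockId) {
    if (trashed.remove(blockId)) {
      System.out.println("deleting replica files for block " + blockId);
    }
  }
}
{code}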




[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate

2014-03-14 Thread Guo Ruijing (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935017#comment-13935017 ]

Guo Ruijing commented on HDFS-6087:
---

Updated the document according to Konstantin's comments.



[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate

2014-03-14 Thread Konstantin Shvachko (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935686#comment-13935686 ]

Konstantin Shvachko commented on HDFS-6087:
---

Based on what you write, I see two main problems with your approach.
# A block cannot be read by others while under construction, until it is fully 
written and committed.
That would be a step back. Making under-construction blocks readable was one of 
the append design requirements (see HDFS-265 and preceding work). If a slow 
client writes to a block at 1KB/min, others will have to wait for hours until 
they can see any progress on the file.
# Your proposal (if I understand it correctly) will potentially lead to a lot 
of small blocks if appends, fsyncs (and truncates) are used intensively.
Say, in order to overcome problem (1), I write my application so that it closes 
the file after every 1KB written and reopens it for append one minute later. 
You get lots of 1KB blocks, and small blocks are bad for the NameNode, as we 
know.



[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate

2014-03-14 Thread Tsz Wo Nicholas Sze (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935707#comment-13935707 ]

Tsz Wo Nicholas Sze commented on HDFS-6087:
---

 1. A block cannot be read by others while under construction, until it is 
 fully written and committed. ...

It also does not support hflush.

 2. Your proposal (if I understand it correctly) will potentially lead to a 
 lot of small blocks if appends, fsyncs (and truncates) are used intensively. 
 ...

I guess it won't lead to a lot of small blocks, since it does copy-on-write. 
However, there is going to be a lot of block copying if there are many appends, 
hsyncs, etc.


In addition, I think reading the last block would be a problem: if a reader 
opens a file and reads the last block slowly, and a writer then reopens the 
file for append and commits a new last block, the old last block may be deleted 
and become unavailable to the reader.



[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate

2014-03-14 Thread Konstantin Shvachko (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935727#comment-13935727 ]

Konstantin Shvachko commented on HDFS-6087:
---

If it does copy-on-write, then the block is not immutable, at least in the 
sense I understand the term.



[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate

2014-03-13 Thread Konstantin Shvachko (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934316#comment-13934316 ]

Konstantin Shvachko commented on HDFS-6087:
---

Not sure I fully understood what you propose, so please feel free to correct 
me if I am wrong.
# Sounds like you propose to update the blockID every time the pipeline fails, 
and that this will guarantee block immutability.
Isn't that similar to how current HDFS uses the generationStamp? When the 
pipeline fails, HDFS increments the genStamp, making previously created 
replicas outdated.
# Seems you propose to introduce an extra commitBlock() call to the NN.
Current HDFS has similar logic. Block commit is incorporated into the 
addBlock() and complete() calls. E.g., addBlock() changes the state of the 
file's previous block to committed and then allocates the new one.
# I don't see how you get rid of lease recovery, the purpose of which is to 
reconcile different replicas of the incomplete last block, as they can have 
different lengths or genStamps on different DNs as a result of a client or DN 
failure in the middle of a data transfer.
If you propose to discard uncommitted blocks entirely, that will break the 
current semantics, which state that once a byte has been read by one client it 
should be readable by other clients as well.
# I guess it boils down to this: your diagrams show the regular work-flow but 
don't consider failure scenarios.



[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate

2014-03-13 Thread Guo Ruijing (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934485#comment-13934485 ]

Guo Ruijing commented on HDFS-6087:
---

I plan to remove the snapshot part, add one work-flow for 
write/append/truncate, and add more work-flows for exception handling to the 
design proposal.

The basic idea:

1) blocks are immutable: if a block is committed to the NN, we copy the block 
instead of appending to it, and commit the copy to the NN.

2) before a block is committed to the NN, it is the client's responsibility to 
re-add it on failure, and other clients cannot read that block, so we don't 
need a generationStamp to recover the block.

3) after a block is committed to the NN, the file length is updated in the NN, 
so clients cannot see uncommitted blocks.

4) write/append/truncate share the same logic.

1. Update the BlockID on commit failure, including pipeline failure. The design 
proposal tries to remove the generationStamp.

2. An extra copyBlock(oldBlockID, newBlockID, length) call is used for append 
and truncate.

3. commitBlock: a) the block becomes immutable b) remove all blocks after the 
offset to implement truncate & append c) update the file length.

4. If a block is not committed to the namenode, the file length is not updated 
and clients cannot read the block.

5. I will add more failure scenarios.
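
The proposed contract condensed into one hypothetical Java interface; the 
method names follow the comment above and are not an existing HDFS API:

{code:java}
// Sketch of the proposal's client-visible primitives.
interface ImmutableBlockProtocol {

  // The client picks a fresh ID whenever a commit (or the pipeline) fails,
  // replacing the generation-stamp recovery mechanism: uncommitted blocks
  // are simply abandoned and re-added under the new ID.
  long allocateBlockId();

  // Copy-on-write primitive shared by append and truncate: materialize the
  // first `length` bytes of oldBlockId as a new immutable block newBlockId
  // (by hard link if length covers the whole data file, else by copying).
  void copyBlock(long oldBlockId, long newBlockId, long length);

  // Commit: a) the block becomes immutable, b) all blocks after `fileOffset`
  // are dropped (this is how truncate and append converge), c) the file
  // length is updated so readers never observe uncommitted data.
  void commitBlock(long blockId, long fileOffset, long blockLength);
}
{code}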
