[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate

2014-03-15 Thread Guo Ruijing (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936395#comment-13936395
 ] 

Guo Ruijing commented on HDFS-6087:
---

I think I have already addressed Konstantin Shvachko's and Tsz Wo Nicholas Sze's 
comments/concerns. I will wait for any new comments/concerns and update the 
document accordingly.

The design motivation is:

1) unify HDFS write/append/truncate

2) the design is the basis for writable snapshots / snapshot restore (this JIRA is 
not created to track snapshot items)

> Unify HDFS write/append/truncate
> 
>
> Key: HDFS-6087
> URL: https://issues.apache.org/jira/browse/HDFS-6087
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Guo Ruijing
> Attachments: HDFS Design Proposal.pdf, HDFS Design Proposal_3_14.pdf
>
>
> In the existing implementation, an HDFS file can be appended and an HDFS block 
> can be reopened for append. This design introduces complexity, including lease 
> recovery. If we design the HDFS block as immutable, append & truncate become 
> very simple. The idea is that an HDFS block is immutable once the block is 
> committed to the namenode. If the block is not committed to the namenode, it is 
> the HDFS client's responsibility to re-add it with a new block ID.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate

2014-03-15 Thread Guo Ruijing (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936391#comment-13936391
 ] 

Guo Ruijing commented on HDFS-6087:
---

Issue: the last block is not available for reading.


Solution 1: if the block is referenced by a client, the block is moved to the 
NN's remove list only after the client unreferences it.

1) GetBlockLocations with a Reference option
2) Client copies the block to a local buffer
3) A new RPC message, UnreferenceBlocks, is sent to the NN

Solution 2: the block is moved to trash and its deletion in the DN is delayed.

In the existing implementation, blocks are deleted in the DN after the Heartbeat 
response reaches the DN (lazy block deletion).

If a block is being read by a client and the block is requested to be deleted, 
the DN should delete the block only after the read completes.

In most cases, the client can read the last block:

1) The client requests block location information.

2) The HDFS client copies the blocks to a local buffer.

3) A Heartbeat response requests block deletion (lazy block deletion).

4) The HDFS application slowly reads data from the local buffer.

For the race between 2) and 3), we can delay block deletion.

Even if the block has already been deleted, the client can request fresh block 
location information.

I prefer solution 2.
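
For illustration only, here is a rough Java sketch of solution 2: physical 
deletion of a block replica is deferred while a reader still references it. All 
names here (DelayedBlockDeleter, beginRead, requestDelete) are hypothetical and 
not actual DataNode code.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: delay deletion of a block while readers still hold a reference.
class DelayedBlockDeleter {
  private final Map<Long, AtomicInteger> readers = new ConcurrentHashMap<>();
  private final ScheduledExecutorService trash =
      Executors.newSingleThreadScheduledExecutor();
  private final long delayMs;

  DelayedBlockDeleter(long delayMs) { this.delayMs = delayMs; }

  // Called when a client starts/finishes reading a block replica.
  void beginRead(long blockId) {
    readers.computeIfAbsent(blockId, k -> new AtomicInteger()).incrementAndGet();
  }
  void endRead(long blockId) {
    AtomicInteger c = readers.get(blockId);
    if (c != null) c.decrementAndGet();
  }

  // Called when a heartbeat response asks the DN to delete a block (lazy delete).
  void requestDelete(long blockId) {
    trash.schedule(() -> tryDelete(blockId), delayMs, TimeUnit.MILLISECONDS);
  }

  private void tryDelete(long blockId) {
    AtomicInteger c = readers.get(blockId);
    if (c != null && c.get() > 0) {
      // A reader still references the block: check again later.
      trash.schedule(() -> tryDelete(blockId), delayMs, TimeUnit.MILLISECONDS);
    } else {
      deleteBlockFile(blockId);  // physically remove the replica from disk
    }
  }

  private void deleteBlockFile(long blockId) { /* delete the block data file */ }
}
{code}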


> Unify HDFS write/append/truncate
> 
>
> Key: HDFS-6087
> URL: https://issues.apache.org/jira/browse/HDFS-6087
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Guo Ruijing
> Attachments: HDFS Design Proposal.pdf, HDFS Design Proposal_3_14.pdf
>
>
> In the existing implementation, an HDFS file can be appended and an HDFS block 
> can be reopened for append. This design introduces complexity, including lease 
> recovery. If we design the HDFS block as immutable, append & truncate become 
> very simple. The idea is that an HDFS block is immutable once the block is 
> committed to the namenode. If the block is not committed to the namenode, it is 
> the HDFS client's responsibility to re-add it with a new block ID.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate

2014-03-15 Thread Guo Ruijing (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936383#comment-13936383
 ] 

Guo Ruijing commented on HDFS-6087:
---

Writing that does not end on a block boundary will trigger block copying in the DN:

1) it won't lead to a lot of small blocks
2) as in most file systems, hflush/hsync/truncate may cause some performance 
degradation.

If we can design zero-copy block copying, there is little performance 
degradation:

1) a block is defined as (block data file, block length)
2) the source block is already committed to the NN and immutable
3) a block data file can be created/appended but cannot be overwritten or truncated
4) the block length may not be equal to the block data file length
5) create a hardlink to the block data file if the copy block length = file length
6) copy the block data file if the copy block length < file length

Example:

1) Block 1: (blockfile1, 32M); blockfile1 (length: 32M)

2) copy Block 1 to Block 2 with 32M (copy length = file length, so hardlink)

a) hardlink blockfile1 to blockfile2
b) Block 2: (blockfile2, 32M); blockfile2 (length: 32M)

3) write a 16M buffer to Block 2 (since blockfile1 and blockfile2 are hardlinks 
to the same file, appending to blockfile2 also extends blockfile1)

a) Block 1: (blockfile1, 32M); blockfile1 (length: 48M)

b) Block 2: (blockfile2, 48M); blockfile2 (length: 48M)

4) copy Block 2 to Block 3 with 16M (copy length < file length, so a real copy)

a) copy the first 16M of blockfile2 to blockfile3

b) Block 1: (blockfile1, 32M); blockfile1 (length: 48M)

c) Block 2: (blockfile2, 48M); blockfile2 (length: 48M)

d) Block 3: (blockfile3, 16M); blockfile3 (length: 16M)
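
To make the hardlink-vs-copy rule concrete, here is a small illustrative Java 
sketch (the BlockCopier helper and its names are hypothetical, not part of the 
HDFS code base):

{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class BlockCopier {
  /**
   * Copy the first copyLength bytes of srcFile into dstFile.
   * If copyLength equals the source file length, a hardlink is enough,
   * because committed block files are never overwritten or truncated.
   */
  static void copyBlockData(Path srcFile, Path dstFile, long copyLength)
      throws IOException {
    long srcLen = Files.size(srcFile);
    if (copyLength == srcLen) {
      Files.createLink(dstFile, srcFile);   // zero-copy: share the same data file
      return;
    }
    // copyLength < srcLen: materialize a new, shorter data file.
    byte[] buf = new byte[64 * 1024];
    long remaining = copyLength;
    try (InputStream in = Files.newInputStream(srcFile);
         OutputStream out = Files.newOutputStream(dstFile)) {
      while (remaining > 0) {
        int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
        if (n < 0) break;
        out.write(buf, 0, n);
        remaining -= n;
      }
    }
  }
}
{code}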

> Unify HDFS write/append/truncate
> 
>
> Key: HDFS-6087
> URL: https://issues.apache.org/jira/browse/HDFS-6087
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Guo Ruijing
> Attachments: HDFS Design Proposal.pdf, HDFS Design Proposal_3_14.pdf
>
>
> In the existing implementation, an HDFS file can be appended and an HDFS block 
> can be reopened for append. This design introduces complexity, including lease 
> recovery. If we design the HDFS block as immutable, append & truncate become 
> very simple. The idea is that an HDFS block is immutable once the block is 
> committed to the namenode. If the block is not committed to the namenode, it is 
> the HDFS client's responsibility to re-add it with a new block ID.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate

2014-03-15 Thread Guo Ruijing (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936379#comment-13936379
 ] 

Guo Ruijing commented on HDFS-6087:
---

If a client needs to read the data early, the application flow should be:

1. open (for create/append) 2. write 3. hflush/hsync 4. write 5. close

Note: writing that does not end on a block boundary will trigger a block copy in 
the DN (we may design zero-copy block copying for this).

If the client does not need to read the data early, the application flow can be:

1. open (for create/append) 2. write 3. write 4. close
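
For reference, the first flow maps onto the standard HDFS client API roughly as 
below; the file path and payloads are made up for illustration.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EarlyReadWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/tmp/early-read-example");   // hypothetical path

    try (FSDataOutputStream out = fs.create(file)) {   // 1. open (for create)
      out.write(firstBatch());                         // 2. write
      out.hflush();                                    // 3. hflush: data becomes readable
      out.write(secondBatch());                        // 4. write
    }                                                  // 5. close
  }

  private static byte[] firstBatch()  { return new byte[]{1, 2, 3}; }
  private static byte[] secondBatch() { return new byte[]{4, 5, 6}; }
}
{code}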

> Unify HDFS write/append/truncate
> 
>
> Key: HDFS-6087
> URL: https://issues.apache.org/jira/browse/HDFS-6087
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Guo Ruijing
> Attachments: HDFS Design Proposal.pdf, HDFS Design Proposal_3_14.pdf
>
>
> In the existing implementation, an HDFS file can be appended and an HDFS block 
> can be reopened for append. This design introduces complexity, including lease 
> recovery. If we design the HDFS block as immutable, append & truncate become 
> very simple. The idea is that an HDFS block is immutable once the block is 
> committed to the namenode. If the block is not committed to the namenode, it is 
> the HDFS client's responsibility to re-add it with a new block ID.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate

2014-03-15 Thread Guo Ruijing (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936378#comment-13936378
 ] 

Guo Ruijing commented on HDFS-6087:
---

It supports hflush/hsync:

1) flush all buffered data.

2) if the flush ends on a block boundary, commit the buffer to the NN.

3) if it does not end on a block boundary, copy to a new block, append the buffer 
to the new block, and commit the new block to the NN.
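
A hedged sketch of this hflush/hsync handling is below; every name here 
(ImmutableBlockWriter, copyBlock, appendToBlock, commitToNameNode) is hypothetical 
and not part of the real HDFS client.

{code}
class ImmutableBlockWriter {
  private final long blockSize;
  private long currentBlockId;
  private long currentBlockLength;

  ImmutableBlockWriter(long blockSize, long firstBlockId) {
    this.blockSize = blockSize;
    this.currentBlockId = firstBlockId;
  }

  void hflush(byte[] buffered) {
    if (currentBlockLength + buffered.length == blockSize) {
      // 2) flush ends exactly on a block boundary: append and commit as-is.
      appendToBlock(currentBlockId, buffered);
      commitToNameNode(currentBlockId);
    } else {
      // 3) not on a boundary: committed blocks are immutable, so copy into a
      //    new block, append the buffer there, and commit the new block.
      long newBlockId = copyBlock(currentBlockId);
      appendToBlock(newBlockId, buffered);
      commitToNameNode(newBlockId);
      currentBlockId = newBlockId;
    }
    currentBlockLength += buffered.length;
  }

  // Placeholders for the real client <-> DN/NN interactions.
  private long copyBlock(long srcBlockId) { return srcBlockId + 1; }
  private void appendToBlock(long blockId, byte[] data) { }
  private void commitToNameNode(long blockId) { }
}
{code}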

> Unify HDFS write/append/truncate
> 
>
> Key: HDFS-6087
> URL: https://issues.apache.org/jira/browse/HDFS-6087
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Guo Ruijing
> Attachments: HDFS Design Proposal.pdf, HDFS Design Proposal_3_14.pdf
>
>
> In the existing implementation, an HDFS file can be appended and an HDFS block 
> can be reopened for append. This design introduces complexity, including lease 
> recovery. If we design the HDFS block as immutable, append & truncate become 
> very simple. The idea is that an HDFS block is immutable once the block is 
> committed to the namenode. If the block is not committed to the namenode, it is 
> the HDFS client's responsibility to re-add it with a new block ID.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6106) Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms

2014-03-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936218#comment-13936218
 ] 

Hudson commented on HDFS-6106:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1727 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1727/])
HDFS-6106. Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms 
(cmccabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577798)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml


> Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms
> 
>
> Key: HDFS-6106
> URL: https://issues.apache.org/jira/browse/HDFS-6106
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.4.0
>
> Attachments: HDFS-6106.001.patch
>
>
> Reduce the default for {{dfs.namenode.path.based.cache.refresh.interval.ms}} 
> to improve the responsiveness of caching.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6106) Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms

2014-03-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936195#comment-13936195
 ] 

Hudson commented on HDFS-6106:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1702 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1702/])
HDFS-6106. Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms 
(cmccabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577798)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml


> Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms
> 
>
> Key: HDFS-6106
> URL: https://issues.apache.org/jira/browse/HDFS-6106
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.4.0
>
> Attachments: HDFS-6106.001.patch
>
>
> Reduce the default for {{dfs.namenode.path.based.cache.refresh.interval.ms}} 
> to improve the responsiveness of caching.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6106) Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms

2014-03-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936146#comment-13936146
 ] 

Hudson commented on HDFS-6106:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #510 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/510/])
HDFS-6106. Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms 
(cmccabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577798)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml


> Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms
> 
>
> Key: HDFS-6106
> URL: https://issues.apache.org/jira/browse/HDFS-6106
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.4.0
>
> Attachments: HDFS-6106.001.patch
>
>
> Reduce the default for {{dfs.namenode.path.based.cache.refresh.interval.ms}} 
> to improve the responsiveness of caching.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6107) When a block can't be cached due to limited space on the DataNode, that block becomes uncacheable

2014-03-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936112#comment-13936112
 ] 

Hadoop QA commented on HDFS-6107:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12634897/HDFS-6107.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6409//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6409//console

This message is automatically generated.

> When a block can't be cached due to limited space on the DataNode, that block 
> becomes uncacheable
> -
>
> Key: HDFS-6107
> URL: https://issues.apache.org/jira/browse/HDFS-6107
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-6107.001.patch
>
>
> When a block can't be cached due to limited space on the DataNode, that block 
> becomes uncacheable.  This is because the CachingTask fails to reset the 
> block state in this error handling case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users

2014-03-15 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936087#comment-13936087
 ] 

Colin Patrick McCabe commented on HDFS-6093:


bq. Arpit said: In addition to reducing the timeout as you suggested, can we 
add some explanation to the command output, or update 
CentralizedCacheManagement.html in the docs?

I agree, we could add a short comment to the docs about this.  Now that the 
timeout has been reduced, there should be much less discrepancy between the 
output of the two commands, of course.

Taking a more detailed look at the patch now.

{code}
+  public FsStatus getCacheStatus() throws IOException {
{code}

I know it seems clever to reuse the same class for getStatus and 
getCacheStatus, but it could become a problem if someone later adds more fields 
to getStatus that don't apply to getCacheStatus.  I think we need our own type 
for this, to maintain sanity in the future.  It's not that much code.
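
For illustration, a dedicated type could be as small as the sketch below; the 
class and field names are made up, not taken from the actual patch.

{code}
// Hypothetical sketch of a dedicated cache-status type, so getCacheStatus does
// not have to reuse FsStatus.
public class CacheStatus {
  private final long cacheCapacity;   // total cache capacity across DataNodes, in bytes
  private final long cacheUsed;       // bytes currently cached

  public CacheStatus(long cacheCapacity, long cacheUsed) {
    this.cacheCapacity = cacheCapacity;
    this.cacheUsed = cacheUsed;
  }

  public long getCacheCapacity()  { return cacheCapacity; }
  public long getCacheUsed()      { return cacheUsed; }
  public long getCacheRemaining() { return cacheCapacity - cacheUsed; }
}
{code}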

{code}
   public long getMissingBlocksCount() throws IOException {
+statistics.incrementReadOps(1);
 return dfs.getMissingBlocksCount();
{code}

Can we put the non-caching-related incrementReadOps changes in their own JIRA?  
It may seem like a trivial change, but it's kind of distracting from this JIRA. 
 Also I'm not sure I understand when we're "supposed" to increment this...

{code}
  /**
   * Number of replicas pending caching.
   */
  private long numPendingCaching;
  /**
   * Number of replicas pending uncaching.
   */
  private long numPendingUncaching;
{code}

Could use a linebreak after {{numPendingCaching}} for consistency.

Like I said earlier, I'd prefer to decouple the counter(s) that can be read 
from the CRM from the counters that the CRM uses internally during the scan.  
Using the same variable for both just invites bugs like the one Arpit pointed 
out, where rescan zeroes the counter outside the lock.
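
Roughly what I have in mind (names hypothetical, not the actual patch): the rescan 
accumulates into scan-local variables and only publishes them to the externally 
visible counters at the end, so readers never observe a half-zeroed value.

{code}
class CacheReplicationMonitorStats {
  private volatile long publishedPendingCaching;
  private volatile long publishedPendingUncaching;

  // Read path used by metrics / status queries.
  long getPendingCaching()   { return publishedPendingCaching; }
  long getPendingUncaching() { return publishedPendingUncaching; }

  // Called by the monitor thread during a rescan.
  void rescan(Iterable<CachedBlockState> blocks) {
    long pendingCaching = 0;    // scan-local; never visible mid-scan
    long pendingUncaching = 0;
    for (CachedBlockState b : blocks) {
      if (b.needsCaching())   pendingCaching++;
      if (b.needsUncaching()) pendingUncaching++;
    }
    publishedPendingCaching = pendingCaching;       // publish once, at the end
    publishedPendingUncaching = pendingUncaching;
  }

  interface CachedBlockState {
    boolean needsCaching();
    boolean needsUncaching();
  }
}
{code}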

{code}
[CacheManager#processCacheReportImpl changes]
{code}

Incrementally updating the pendingUncached list and stats is a nice idea, but 
it seems too ambitious for 2.4 at this point.  Now that the CRM interval is 30 
seconds, it shouldn't be too bad to just wait for the CRM to update its stats 
and the lists.  Additionally, we don't even know that monitor is non-null at 
this point, so there is an NPE here, I think.  Let's leave this out and revisit 
it later.

> Expose more caching information for debugging by users
> --
>
> Key: HDFS-6093
> URL: https://issues.apache.org/jira/browse/HDFS-6093
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: caching
>Affects Versions: 2.4.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-6093-1.patch
>
>
> When users submit a new cache directive, it's unclear if the NN has 
> recognized it and is actively trying to cache it, or if it's hung for some 
> other reason. It'd be nice to expose a "pending caching/uncaching" count the 
> same way we expose pending replication work.
> It'd also be nice to display the aggregate cache capacity and usage in 
> dfsadmin -report, since we already have it as a metric and expose it 
> per-DN in report output.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6106) Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms

2014-03-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936083#comment-13936083
 ] 

Hudson commented on HDFS-6106:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5335 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5335/])
HDFS-6106. Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms 
(cmccabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577798)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml


> Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms
> 
>
> Key: HDFS-6106
> URL: https://issues.apache.org/jira/browse/HDFS-6106
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.4.0
>
> Attachments: HDFS-6106.001.patch
>
>
> Reduce the default for {{dfs.namenode.path.based.cache.refresh.interval.ms}} 
> to improve the responsiveness of caching.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6106) Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms

2014-03-15 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-6106:
---

   Resolution: Fixed
Fix Version/s: 2.4.0
   Status: Resolved  (was: Patch Available)

> Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms
> 
>
> Key: HDFS-6106
> URL: https://issues.apache.org/jira/browse/HDFS-6106
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.4.0
>
> Attachments: HDFS-6106.001.patch
>
>
> Reduce the default for {{dfs.namenode.path.based.cache.refresh.interval.ms}} 
> to improve the responsiveness of caching.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6106) Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms

2014-03-15 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936073#comment-13936073
 ] 

Colin Patrick McCabe commented on HDFS-6106:


TestHASafeMode failure seems to be HDFS-6094, not related to this patch.  
Committing.  Thanks, Andrew.

> Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms
> 
>
> Key: HDFS-6106
> URL: https://issues.apache.org/jira/browse/HDFS-6106
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.4.0
>
> Attachments: HDFS-6106.001.patch
>
>
> Reduce the default for {{dfs.namenode.path.based.cache.refresh.interval.ms}} 
> to improve the responsiveness of caching.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6107) When a block can't be cached due to limited space on the DataNode, that block becomes uncacheable

2014-03-15 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-6107:
---

Attachment: HDFS-6107.001.patch

I fixed the error handling case and added a unit test.

I noticed that we were incrementing the DN metrics for BlocksCached and 
BlocksUncached as soon as we received the DNA_CACHE and DNA_UNCACHE 
commands.  This is wrong, since if caching takes a while, the NN may send those 
commands more than once.  The commands themselves are idempotent.  I fixed it so 
that FsDatasetCache changes those stats instead.

I think this might fix some flaky unit tests we had, since we'll no longer 
double-count a block if the NN happens to send a DNA_CACHE for it twice.
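
The idea, in a rough sketch (hypothetical names, not the actual FsDatasetCache 
code): count a block only when the caching attempt actually completes, and reset 
the block's state on failure so it stays cacheable.

{code}
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: a re-sent DNA_CACHE command cannot double-count the block, and a
// failed attempt resets state instead of leaving the block "stuck".
class CachingTaskSketch implements Runnable {
  private final long blockId;
  private final Set<Long> cachedBlocks;        // blocks already (being) cached
  private final AtomicLong blocksCachedMetric;

  CachingTaskSketch(long blockId, Set<Long> cachedBlocks,
                    AtomicLong blocksCachedMetric) {
    this.blockId = blockId;
    this.cachedBlocks = cachedBlocks;
    this.blocksCachedMetric = blocksCachedMetric;
  }

  @Override
  public void run() {
    // A duplicate command for an already-tracked block is a no-op.
    if (!cachedBlocks.add(blockId)) {
      return;
    }
    try {
      mmapAndLockBlock(blockId);
      blocksCachedMetric.incrementAndGet();   // count only on successful completion
    } catch (Exception e) {
      cachedBlocks.remove(blockId);           // reset state so the block stays cacheable
    }
  }

  private void mmapAndLockBlock(long blockId) throws Exception { /* ... */ }
}
{code}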

> When a block can't be cached due to limited space on the DataNode, that block 
> becomes uncacheable
> -
>
> Key: HDFS-6107
> URL: https://issues.apache.org/jira/browse/HDFS-6107
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-6107.001.patch
>
>
> When a block can't be cached due to limited space on the DataNode, that block 
> becomes uncacheable.  This is because the CachingTask fails to reset the 
> block state in this error handling case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6107) When a block can't be cached due to limited space on the DataNode, that block becomes uncacheable

2014-03-15 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-6107:
---

Status: Patch Available  (was: Open)

> When a block can't be cached due to limited space on the DataNode, that block 
> becomes uncacheable
> -
>
> Key: HDFS-6107
> URL: https://issues.apache.org/jira/browse/HDFS-6107
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-6107.001.patch
>
>
> When a block can't be cached due to limited space on the DataNode, that block 
> becomes uncacheable.  This is because the CachingTask fails to reset the 
> block state in this error handling case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6107) When a block can't be cached due to limited space on the DataNode, that block becomes uncacheable

2014-03-15 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-6107:
--

 Summary: When a block can't be cached due to limited space on the 
DataNode, that block becomes uncacheable
 Key: HDFS-6107
 URL: https://issues.apache.org/jira/browse/HDFS-6107
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.4.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe


When a block can't be cached due to limited space on the DataNode, that block 
becomes uncacheable.  This is because the CachingTask fails to reset the block 
state in this error handling case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)