[jira] [Created] (HDFS-12888) NameNode web UI shows stale config values after cli refresh

2017-12-04 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-12888:


 Summary: NameNode web UI shows stale config values after cli 
refresh
 Key: HDFS-12888
 URL: https://issues.apache.org/jira/browse/HDFS-12888
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ui
Affects Versions: 2.7.4
Reporter: Zhe Zhang


To reproduce:
# Load the web UI's /conf page
# Use {{hdfs -refresh}} to update a configuration value
# Load the web UI's /conf page again; it still shows the old configuration value
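A minimal, hypothetical sketch of the suspected symptom (none of these names are actual NameNode classes): the /conf page renders from a configuration snapshot taken at startup, so a CLI refresh that updates a separate live copy is never reflected.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the suspected bug: the web UI reads a startup-time snapshot,
// while the refresh command updates a different, live configuration object.
public class StaleConfDemo {
    static final Map<String, String> startupSnapshot = new HashMap<>();
    static final Map<String, String> liveConf = new HashMap<>();

    // Models what the CLI refresh updates.
    static void refresh(String key, String value) {
        liveConf.put(key, value);
    }

    // Models what the /conf page renders: the stale snapshot.
    static String renderConfPage(String key) {
        return startupSnapshot.get(key);
    }
}
```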



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-12502) nntop should support a category based on FilesInGetListingOps

2017-10-24 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-12502:
--

> nntop should support a category based on FilesInGetListingOps
> -
>
> Key: HDFS-12502
> URL: https://issues.apache.org/jira/browse/HDFS-12502
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Fix For: 2.9.0, 2.8.3, 3.0.0, 3.1.0
>
> Attachments: HDFS-12502.00.patch, HDFS-12502.01.patch, 
> HDFS-12502.02.patch, HDFS-12502.03.patch, HDFS-12502.04.patch
>
>
> Large listing ops can often be the main contributor to NameNode slowness. 
> The aggregate cost of listing ops is proportional to {{FilesInGetListingOps}} 
> (the total number of files returned) rather than to the number of listing 
> ops. Therefore it'd be very useful for nntop to support this category.
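A sketch of the distinction (hypothetical names, not nntop's actual API): counting ops ranks a user issuing many tiny listings above a user whose single listing returns a million entries, while a files-returned-weighted metric ranks them by actual cost.

```java
import java.util.HashMap;
import java.util.Map;

// Tracks both per-user op counts and per-user files returned, to show why
// the two metrics can rank users very differently.
public class ListingCost {
    final Map<String, Long> opCount = new HashMap<>();
    final Map<String, Long> filesReturned = new HashMap<>();

    void recordGetListing(String user, long numFilesReturned) {
        opCount.merge(user, 1L, Long::sum);
        filesReturned.merge(user, numFilesReturned, Long::sum);
    }
}
```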






[jira] [Created] (HDFS-12502) nntop should support category based on FilesInGetListingOps

2017-09-20 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-12502:


 Summary: nntop should support category based on 
FilesInGetListingOps
 Key: HDFS-12502
 URL: https://issues.apache.org/jira/browse/HDFS-12502
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Zhe Zhang









[jira] [Created] (HDFS-12379) NameNode getListing should use FileStatus instead of HdfsFileStatus

2017-08-30 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-12379:


 Summary: NameNode getListing should use FileStatus instead of 
HdfsFileStatus
 Key: HDFS-12379
 URL: https://issues.apache.org/jira/browse/HDFS-12379
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Zhe Zhang


The public {{listStatus}} APIs in {{FileSystem}} and {{DistributedFileSystem}} 
expose {{FileStatus}} instead of {{HdfsFileStatus}}. Therefore it is a waste to 
create the more expensive {{HdfsFileStatus}} objects on the NameNode.

It should be a simple change similar to HDFS-11641. Marking this as 
incompatible because the wire protocol change is incompatible. It is not clear 
which downstream apps are affected by this incompatibility -- maybe those 
directly using curl, or those writing their own HDFS client.






[jira] [Created] (HDFS-12345) Scale testing HDFS NameNode

2017-08-23 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-12345:


 Summary: Scale testing HDFS NameNode
 Key: HDFS-12345
 URL: https://issues.apache.org/jira/browse/HDFS-12345
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Zhe Zhang









[jira] [Created] (HDFS-12284) Router support for Kerberos and delegation tokens

2017-08-09 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-12284:


 Summary: Router support for Kerberos and delegation tokens
 Key: HDFS-12284
 URL: https://issues.apache.org/jira/browse/HDFS-12284
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: security
Reporter: Zhe Zhang
Assignee: xiangguang zheng


HDFS Router should support Kerberos authentication and issuing / managing HDFS 
delegation tokens.






[jira] [Created] (HDFS-11743) Revert HDFS-7933 from branch-2.7 (fsck reporting decommissioning replicas)

2017-05-02 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-11743:


 Summary: Revert HDFS-7933 from branch-2.7 (fsck reporting 
decommissioning replicas)
 Key: HDFS-11743
 URL: https://issues.apache.org/jira/browse/HDFS-11743
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Blocker









[jira] [Created] (HDFS-11732) Backport HDFS-8498 to branch-2.7: Blocks can be committed with wrong size

2017-05-01 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-11732:


 Summary: Backport HDFS-8498 to branch-2.7: Blocks can be committed 
with wrong size
 Key: HDFS-11732
 URL: https://issues.apache.org/jira/browse/HDFS-11732
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.3
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Critical









[jira] [Reopened] (HDFS-9005) Provide configuration support for upgrade domain

2017-05-01 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-9005:
-

> Provide configuration support for upgrade domain
> 
>
> Key: HDFS-9005
> URL: https://issues.apache.org/jira/browse/HDFS-9005
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: HDFS-9005-2.patch, HDFS-9005-3.patch, HDFS-9005-4.patch, 
> HDFS-9005.branch-2.8.001.patch, HDFS-9005.patch
>
>
> As part of the upgrade domain feature, we need to provide a mechanism to 
> specify the upgrade domain for each datanode. One way to accomplish that is 
> to allow admins to specify an upgrade domain script that takes a DN ip or 
> hostname as input and returns the upgrade domain. The namenode will then use 
> it at run time to set {{DatanodeInfo}}'s upgrade domain string. The 
> configuration can be something like:
> {noformat}
> <property>
>   <name>dfs.namenode.upgrade.domain.script.file.name</name>
>   <value>/etc/hadoop/conf/upgrade-domain.sh</value>
> </property>
> {noformat}
> just like the topology script.






[jira] [Reopened] (HDFS-8873) Allow the directoryScanner to be rate-limited

2017-04-27 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-8873:
-

> Allow the directoryScanner to be rate-limited
> -
>
> Key: HDFS-8873
> URL: https://issues.apache.org/jira/browse/HDFS-8873
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Daniel Templeton
>  Labels: 2.7.2-candidate
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-8873.001.patch, HDFS-8873.002.patch, 
> HDFS-8873.003.patch, HDFS-8873.004.patch, HDFS-8873.005.patch, 
> HDFS-8873.006.patch, HDFS-8873.007.patch, HDFS-8873.008.patch, 
> HDFS-8873.009.patch, HDFS-8873-branch-2.7.009.patch
>
>
> The new 2-level directory layout can make directory scans expensive in terms 
> of disk seeks (see HDFS-8791 for details). 
> It would be good if the directoryScanner() had a configurable duty cycle that 
> would reduce its impact on disk performance (much like the approach in 
> HDFS-8617). 
> Without such a throttle, disks can go 100% busy for many minutes at a time 
> (assuming the common case of all inodes in cache but no directory blocks 
> cached, 64K seeks are required for a full directory listing, which translates 
> to 655 seconds).
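As a back-of-envelope check of the 655-second figure above (assuming roughly 10 ms per seek on a spinning disk; the per-seek time is an assumed typical value, not stated in the report):

```java
// 64K seeks at ~10 ms each is about 655 seconds of pure seek time.
public class SeekMath {
    static double fullScanSeconds(long seeks, double seekMillis) {
        return seeks * seekMillis / 1000.0;
    }
}
```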






[jira] [Created] (HDFS-11709) StandbyCheckpointer should handle an non-existing legacyOivImageDir gracefully

2017-04-26 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-11709:


 Summary: StandbyCheckpointer should handle an non-existing 
legacyOivImageDir gracefully
 Key: HDFS-11709
 URL: https://issues.apache.org/jira/browse/HDFS-11709
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Affects Versions: 2.6.1
Reporter: Zhe Zhang
Assignee: Erik Krogen
Priority: Critical


In {{StandbyCheckpointer}}, if the legacy OIV directory is not properly 
created, or was deleted for some reason (e.g. mis-operation), all checkpoint 
ops will fail. Not only will the ANN not receive new fsimages, but the JNs 
will fill up with edit log files, eventually causing the NN to crash.
{code}
  // Save the legacy OIV image, if the output dir is defined.
  String outputDir = checkpointConf.getLegacyOivImageDir();
  if (outputDir != null && !outputDir.isEmpty()) {
img.saveLegacyOIVImage(namesystem, outputDir, canceler);
  }
{code}

It doesn't make sense to let such an unimportant step (saving the OIV image) 
abort all checkpoints and crash the NN (and possibly lose data).
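One possible direction, sketched with a hypothetical {{OivSaver}} stand-in for {{img.saveLegacyOIVImage(...)}} (illustrative only, not the actual fix): catch and log a failed OIV save so it cannot abort the checkpoint.

```java
// Isolates the optional legacy-OIV save: a failure is logged and skipped
// instead of propagating and aborting the whole checkpoint.
public class CheckpointStep {
    interface OivSaver {
        void save(String outputDir) throws Exception;
    }

    // Returns true only if the OIV image was actually saved.
    static boolean trySaveLegacyOiv(OivSaver saver, String outputDir) {
        if (outputDir == null || outputDir.isEmpty()) {
            return false; // Not configured; nothing to do.
        }
        try {
            saver.save(outputDir);
            return true;
        } catch (Exception e) {
            // Log and move on; checkpointing itself must still succeed.
            System.err.println("Skipping legacy OIV image: " + e.getMessage());
            return false;
        }
    }
}
```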






[jira] [Reopened] (HDFS-8872) Reporting of missing blocks is different in fsck and namenode ui/metasave

2017-04-26 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-8872:
-

> Reporting of missing blocks is different in fsck and namenode ui/metasave
> -
>
> Key: HDFS-8872
> URL: https://issues.apache.org/jira/browse/HDFS-8872
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>
> Namenode ui and metasave will not report a block as missing if the only 
> replica is on a decommissioning/decommissioned node, while fsck will show it 
> as MISSING.
> Since a decommissioned node can be formatted/removed anytime, we can actually 
> lose the block.
> It's better to alert on the namenode ui if the only copy is on a 
> decommissioned/decommissioning node.






[jira] [Resolved] (HDFS-8872) Reporting of missing blocks is different in fsck and namenode ui/metasave

2017-04-26 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-8872.
-
Resolution: Duplicate

> Reporting of missing blocks is different in fsck and namenode ui/metasave
> -
>
> Key: HDFS-8872
> URL: https://issues.apache.org/jira/browse/HDFS-8872
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>
> Namenode ui and metasave will not report a block as missing if the only 
> replica is on a decommissioning/decommissioned node, while fsck will show it 
> as MISSING.
> Since a decommissioned node can be formatted/removed anytime, we can actually 
> lose the block.
> It's better to alert on the namenode ui if the only copy is on a 
> decommissioned/decommissioning node.






[jira] [Resolved] (HDFS-8015) Erasure Coding: local and remote block writer for coding work in DataNode

2017-01-18 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-8015.
-
Resolution: Duplicate

Resolving this issue since HDFS-9719 already introduced {{StripedBlockWriter}}.

> Erasure Coding: local and remote block writer for coding work in DataNode
> -
>
> Key: HDFS-8015
> URL: https://issues.apache.org/jira/browse/HDFS-8015
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Li Bo
> Attachments: HDFS-8015-000.patch, HDFS-8015-001.patch
>
>
> As a task of HDFS-7344 (ECWorker), for both striped and non-striped erasure 
> coding, we need to be able to write data blocks locally or remotely in order 
> to perform encoding or decoding. This task is to build a block writer 
> facility on the DataNode side. It's worth considering the similar work done 
> on the client side, so the two can be unified in the future.






[jira] [Resolved] (HDFS-8014) Erasure Coding: local and remote block reader for coding work in DataNode

2017-01-18 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-8014.
-
Resolution: Duplicate

Resolving the issue as HDFS-9719 already creates {{StripedBlockReader}}.

> Erasure Coding: local and remote block reader for coding work in DataNode
> -
>
> Key: HDFS-8014
> URL: https://issues.apache.org/jira/browse/HDFS-8014
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Zhe Zhang
>
> As a task of HDFS-7344 (ECWorker), for both striped and non-striped erasure 
> coding, we first need to be able to read data blocks locally or remotely in 
> order to perform encoding or decoding. This task is to build a block reader 
> facility on the DataNode side. It's worth considering the similar work done 
> on the client side, so the two can be unified in the future.






[jira] [Created] (HDFS-11345) Document the configuration key for FSNamesystem lock fairness

2017-01-17 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-11345:


 Summary: Document the configuration key for FSNamesystem lock 
fairness
 Key: HDFS-11345
 URL: https://issues.apache.org/jira/browse/HDFS-11345
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation, namenode
Reporter: Zhe Zhang
Assignee: Erik Krogen
Priority: Minor









[jira] [Created] (HDFS-11157) Enhance documentation around cpLock

2016-11-18 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-11157:


 Summary: Enhance documentation around cpLock
 Key: HDFS-11157
 URL: https://issues.apache.org/jira/browse/HDFS-11157
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation, ha, namenode
Reporter: Zhe Zhang
Priority: Minor


The {{cpLock}} was introduced in HDFS-7097. Among the operations allowed on the 
SbNN, some acquire this lock (e.g. {{restoreFailedStorage}}) and some do not, 
including {{metasave}}, {{refreshNodes}}, and {{setSafeMode}}.

We should enhance the documentation around {{cpLock}} to explain the above. 
Also, maybe {{metasave}} and {{refreshNodes}} do not need the fsn write lock?






[jira] [Reopened] (HDFS-7964) Add support for async edit logging

2016-10-31 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-7964:
-

I think this'd be a very useful addition to branch-2.8 and branch-2.7, 
especially since the patch has been used in a 2.6 production cluster for a long 
time.

Attaching branch-2.8/branch-2.7 patches to verify with Jenkins.

> Add support for async edit logging
> --
>
> Key: HDFS-7964
> URL: https://issues.apache.org/jira/browse/HDFS-7964
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: HDFS-7964-rebase.patch, HDFS-7964.patch, 
> HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch
>
>
> Edit logging is a major source of contention within the NN.  logEdit is 
> called within the namespace write lock, while logSync is called outside of 
> the lock to allow greater concurrency.  The handler thread remains busy until 
> logSync returns, to provide the client with a durability guarantee for the 
> response.
> Write-heavy RPC load and/or slow IO causes handlers to stall in logSync.  
> Although the write lock is not held, readers are limited/starved and the call 
> queue fills.  Combining an edit log thread with the postponed RPC responses 
> from HADOOP-10300 will provide the same durability guarantee but immediately 
> free up the handlers.






[jira] [Resolved] (HDFS-1499) mv the namenode NameSpace and BlocksMap to hbase to save the namenode memory

2016-10-28 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-1499.
-
Resolution: Duplicate

Resolving the old JIRA since many similar JIRAs have been raised, including 
HDFS-8286.

> mv the namenode NameSpace and BlocksMap to hbase to save the namenode memory
> 
>
> Key: HDFS-1499
> URL: https://issues.apache.org/jira/browse/HDFS-1499
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: dl.brain.ln
>
> The NameNode stores all its metadata in the main memory of the machine on 
> which it is deployed. With the file count and block count growing, the 
> namenode machine can't hold any more files and blocks in its memory, and this 
> restricts HDFS cluster growth. Many people are talking and thinking about 
> this problem. Google's next version of GFS uses bigtable to store the 
> metadata of the DFS, and that seems to work. What if we use hbase the same 
> way?
> In the namenode structure, the namespace of the filesystem and the maps of 
> block -> datanodes and datanode -> blocks that are kept in memory consume 
> most of the namenode's heap. What if we store those data structures in hbase 
> to decrease the namenode's memory usage?






[jira] [Created] (HDFS-11051) Test Balancer behavior when some block moves are slow

2016-10-25 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-11051:


 Summary: Test Balancer behavior when some block moves are slow
 Key: HDFS-11051
 URL: https://issues.apache.org/jira/browse/HDFS-11051
 Project: Hadoop HDFS
  Issue Type: Test
  Components: balancer & mover
Reporter: Zhe Zhang









[jira] [Created] (HDFS-10977) Balancer should query NameNode with a timeout

2016-10-06 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-10977:


 Summary: Balancer should query NameNode with a timeout
 Key: HDFS-10977
 URL: https://issues.apache.org/jira/browse/HDFS-10977
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Reporter: Zhe Zhang
Assignee: Zhe Zhang


We found a case where {{Dispatcher}} was stuck at {{getBlockList}} *forever* 
(well, several hours when we found it).






[jira] [Created] (HDFS-10967) Add configuration for BlockPlacementPolicy to deprioritize near-full DataNodes

2016-10-05 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-10967:


 Summary: Add configuration for BlockPlacementPolicy to 
deprioritize near-full DataNodes
 Key: HDFS-10967
 URL: https://issues.apache.org/jira/browse/HDFS-10967
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Zhe Zhang


Large production clusters are likely to have heterogeneous nodes in terms of 
storage capacity, memory, and CPU cores. It is not always possible to 
proportionally ingest data into DataNodes based on their remaining storage 
capacity. Therefore it's possible for a subset of DataNodes to be much closer 
to full capacity than the rest.

Notice that this heterogeneity is most likely rack-by-rack -- i.e. _m_ whole 
racks with low-storage nodes and _n_ whole racks with high-storage nodes. So 
it'd be very useful if we could deprioritize those near-full DataNodes as 
destinations for the 2nd and 3rd replicas.
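A hedged sketch of one way to express "deprioritize": filter near-full DataNodes out of the candidate set first, falling back to all nodes if everything is near-full. This is illustrative only, not {{BlockPlacementPolicy}}'s real API.

```java
import java.util.ArrayList;
import java.util.List;

// Given per-node used-space ratios in [0, 1], keep the indices of nodes
// below the threshold; fall back to every node if all are near-full.
public class NearFullFilter {
    static List<Integer> candidates(double[] usedRatios, double threshold) {
        List<Integer> ok = new ArrayList<>();
        for (int i = 0; i < usedRatios.length; i++) {
            if (usedRatios[i] < threshold) {
                ok.add(i);
            }
        }
        if (ok.isEmpty()) { // Everything near-full: don't block placement.
            for (int i = 0; i < usedRatios.length; i++) {
                ok.add(i);
            }
        }
        return ok;
    }
}
```

A real policy would likely treat this as a soft preference (e.g. a weight in target selection) rather than a hard filter, so rack-awareness still dominates.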






[jira] [Created] (HDFS-10966) Enhance Dispatcher logic on deciding when to give up a source DataNode

2016-10-05 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-10966:


 Summary: Enhance Dispatcher logic on deciding when to give up a 
source DataNode
 Key: HDFS-10966
 URL: https://issues.apache.org/jira/browse/HDFS-10966
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Reporter: Zhe Zhang
Assignee: Mark Wagner


When a {{Dispatcher}} thread works on a source DataNode, in each iteration it 
tries to execute a {{PendingMove}}. If no block is moved after 5 iterations, 
this source (over-utilized) DataNode is given up for this Balancer iteration 
(20 mins). This is problematic if the source DataNode was heavily loaded at the 
beginning of the iteration: it will quickly encounter 5 unsuccessful moves and 
be abandoned.

We should enhance this logic, e.g. by using elapsed time instead of the number 
of iterations.
{code}
// Check if the previous move was successful
} else {
  // source node cannot find a pending block to move, iteration +1
  noPendingMoveIteration++;
  // in case no blocks can be moved for source node's task,
  // jump out of while-loop after 5 iterations.
  if (noPendingMoveIteration >= MAX_NO_PENDING_MOVE_ITERATIONS) {
LOG.info("Failed to find a pending move "  + noPendingMoveIteration
+ " times.  Skipping " + this);
resetScheduledSize();
  }
}
{code}
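The elapsed-time idea could look roughly like the following (hypothetical names, not actual {{Dispatcher}} fields):

```java
// Gives up a source DataNode only after no block has moved for a configured
// amount of wall-clock time, instead of after a fixed iteration count.
public class SourceGiveUpPolicy {
    private final long maxNoMoveMillis;
    private long lastMoveTimeMillis;

    SourceGiveUpPolicy(long maxNoMoveMillis, long nowMillis) {
        this.maxNoMoveMillis = maxNoMoveMillis;
        this.lastMoveTimeMillis = nowMillis;
    }

    // Call when a block move succeeds.
    void recordMove(long nowMillis) {
        lastMoveTimeMillis = nowMillis;
    }

    // True once the quiet period exceeds the threshold.
    boolean shouldGiveUp(long nowMillis) {
        return nowMillis - lastMoveTimeMillis >= maxNoMoveMillis;
    }
}
```

This way a briefly busy source node is only abandoned after a real quiet period, not after a burst of quick failures.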






[jira] [Reopened] (HDFS-10745) Directly resolve paths into INodesInPath

2016-09-14 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-10745:
--

Sorry to reopen the JIRA; testing the branch-2.7 patch.

> Directly resolve paths into INodesInPath
> 
>
> Key: HDFS-10745
> URL: https://issues.apache.org/jira/browse/HDFS-10745
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-10745.2.patch, HDFS-10745.branch-2.patch, 
> HDFS-10745.patch
>
>
> The intermediate resolution to a string, only to be decomposed by 
> {{INodesInPath}} back into a byte[][], can be eliminated by resolving directly 
> to an IIP.  The IIP will contain the resolved path if required.






[jira] [Resolved] (HDFS-10859) TestBalancer#testUnknownDatanodeSimple and testBalancerWithKeytabs are flaky in branch-2.7

2016-09-13 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-10859.
--
Resolution: Duplicate

Thanks for the pointer Xiao. HDFS-10716 indeed solves the problem.

> TestBalancer#testUnknownDatanodeSimple and testBalancerWithKeytabs are flaky 
> in branch-2.7
> --
>
> Key: HDFS-10859
> URL: https://issues.apache.org/jira/browse/HDFS-10859
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover, test
>Affects Versions: 2.7.4
>Reporter: Zhe Zhang
>Priority: Minor
> Attachments: testUnknownDatanodeSimple-failure.log
>
>







[jira] [Reopened] (HDFS-10744) Internally optimize path component resolution

2016-09-12 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-10744:
--

Sorry for re-opening this. Triggering Jenkins for branch-2.7 patch.

> Internally optimize path component resolution
> -
>
> Key: HDFS-10744
> URL: https://issues.apache.org/jira/browse/HDFS-10744
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-10744-branch-2.7.patch, HDFS-10744.patch
>
>
> {{FSDirectory}}'s path resolution currently uses a mixture of string & 
> byte[][]  conversions, back to string, back to byte[][] for {{INodesInPath}}. 
>  Internally all path component resolution should be byte[][]-based as the 
> precursor to instantiating an {{INodesInPath}} w/o the last 2 unnecessary 
> conversions.






[jira] [Reopened] (HDFS-10673) Optimize FSPermissionChecker's internal path usage

2016-09-12 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-10673:
--

Sorry to reopen the JIRA. I want to test the branch-2.7 patch on Jenkins.

> Optimize FSPermissionChecker's internal path usage
> --
>
> Key: HDFS-10673
> URL: https://issues.apache.org/jira/browse/HDFS-10673
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1
>
> Attachments: HDFS-10673-branch-2.7.00.patch, HDFS-10673.1.patch, 
> HDFS-10673.2.patch, HDFS-10673.patch
>
>
> The INodeAttributeProvider and AccessControlEnforcer features degrade 
> performance and generate excessive garbage even when neither is used.  Main 
> issues:
> # A byte[][] of components is unnecessarily created.  Each path component 
> lookup converts a subrange of the byte[][] to a new String[], which is then 
> not used by the default attribute provider.
> # Subaccess checks are insanely expensive.  The full path of every subdir is 
> created by walking up the inode tree, creating an INode[], building a string 
> by converting each inode's byte[] name to a string, etc.  This will only be 
> used if there's an exception.
> The expense of #1 should only be incurred when using the provider/enforcer 
> feature.  For #2, paths should be created on demand for exceptions.






[jira] [Created] (HDFS-10859) TestBalancer#testUnknownDatanodeSimple and testBalancerWithKeytabs fail in branch-2.7

2016-09-12 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-10859:


 Summary: TestBalancer#testUnknownDatanodeSimple and 
testBalancerWithKeytabs fail in branch-2.7
 Key: HDFS-10859
 URL: https://issues.apache.org/jira/browse/HDFS-10859
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover, test
Affects Versions: 2.7
Reporter: Zhe Zhang
Priority: Minor









[jira] [Reopened] (HDFS-8818) Allow Balancer to run faster

2016-09-12 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-8818:
-

Sorry to reopen this one. I think this is a valid improvement for branch-2.7 
and I'm trying to backport it. I'll attach a branch-2.7 patch for Jenkins 
verification.

> Allow Balancer to run faster
> 
>
> Key: HDFS-8818
> URL: https://issues.apache.org/jira/browse/HDFS-8818
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: h8818_20150723.patch, h8818_20150727.patch
>
>
> The original design of Balancer intentionally makes it run slowly so that 
> the balancing activities won't affect normal cluster activities and running 
> jobs.
> There are new use cases where a cluster admin may choose to balance the 
> cluster when the cluster load is low, or in a maintenance window.  So we 
> should have an option to allow Balancer to run faster.






[jira] [Created] (HDFS-10854) Remove createStripedFile and addBlockToFile by creating real EC files

2016-09-09 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-10854:


 Summary: Remove createStripedFile and addBlockToFile by creating 
real EC files
 Key: HDFS-10854
 URL: https://issues.apache.org/jira/browse/HDFS-10854
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: erasure-coding, test
Affects Versions: 3.0.0-alpha2
Reporter: Zhe Zhang


{{DFSTestUtil#createStripedFile}} and {{addBlockToFile}} were developed before 
we completed the EC client. They were used to test the {{NameNode}} EC logic 
when the client was unable to really create/read/write EC files.

They are causing confusion in other issues about the {{NameNode}}. For example, 
in one of the patches under HDFS-10301, 
{{testProcessOverReplicatedAndMissingStripedBlock}} fails because the test 
fakes a block report from a DN with a randomly generated storage ID. The DN 
itself is never aware of that storage. This is not possible in a real 
production environment.
{code}
  DatanodeStorage storage =
      new DatanodeStorage(UUID.randomUUID().toString());
  StorageReceivedDeletedBlocks[] reports = DFSTestUtil
      .makeReportForReceivedBlock(block,
          ReceivedDeletedBlockInfo.BlockStatus.RECEIVING_BLOCK, storage);
  for (StorageReceivedDeletedBlocks report : reports) {
    ns.processIncrementalBlockReport(dn.getDatanodeId(), report);
  }
{code}

Now that we have a fully functional EC client, we should remove the old testing 
logic and use logic similar to the non-EC tests (creating real files and 
emulating missing or corrupt blocks).






[jira] [Reopened] (HDFS-10662) Optimize UTF8 string/byte conversions

2016-08-31 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-10662:
--

Sorry to reopen the JIRA. I'm backporting this to branch-2.7, and the backport was quite messy.

[~daryn] [~kihwal] I'd appreciate it if you could take a look.

> Optimize UTF8 string/byte conversions
> -
>
> Key: HDFS-10662
> URL: https://issues.apache.org/jira/browse/HDFS-10662
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-10662-branch-2.7.00.patch, HDFS-10662.patch, 
> HDFS-10662.patch.1
>
>
> String/byte conversions may take either a Charset instance or its canonical 
> name.  One might think a Charset instance would be faster due to avoiding a 
> lookup and instantiation of a Charset, but it's not.  The canonical string 
> name variants will cache the string encoder/decoder (obtained from a Charset) 
> resulting in better performance.
> LOG4J2-935 describes a real-world performance boost.  I micro-benched a 
> marginal runtime improvement on jdk 7/8.  However for a 16 byte path, using 
> the canonical name generated 50% less garbage.  For a 64 byte path, 25% of 
> the garbage.  Given the sheer number of times that paths are (re)parsed, the 
> cost adds up quickly.






[jira] [Reopened] (HDFS-10655) Fix path related byte array conversion bugs

2016-08-30 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-10655:
--

Sorry to reopen it; I want to backport it to branch-2.7 and get a full Jenkins 
run before doing that.

> Fix path related byte array conversion bugs
> ---
>
> Key: HDFS-10655
> URL: https://issues.apache.org/jira/browse/HDFS-10655
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-10655-branch-2.7.patch, HDFS-10655.patch, 
> HDFS-10655.patch
>
>
> {{DFSUtil.bytes2ByteArray}} does not always properly handle runs of multiple 
> separators, nor does it handle relative paths correctly.
> {{DFSUtil.byteArray2PathString}} does not rebuild the path correctly unless 
> the specified range is the entire component array.
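
The separator-run issue can be illustrated with a self-contained sketch (illustrative names; this is not the {{DFSUtil}} code, which works on byte arrays):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch of the separator-run problem described above
 * (NOT the HDFS DFSUtil implementation). A correct path-to-components
 * conversion must collapse runs of '/' rather than emit empty
 * components.
 */
public class PathComponentsDemo {
    static List<String> components(String path) {
        List<String> out = new ArrayList<>();
        for (String c : path.split("/")) {
            if (!c.isEmpty()) {  // drop empties from leading or repeated '/'
                out.add(c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Runs of separators must not yield empty components.
        System.out.println(components("/a//b///c")); // [a, b, c]
    }
}
```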






[jira] [Created] (HDFS-10809) getNumEncryptionZones causes NPE in branch-2.7

2016-08-26 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-10809:


 Summary: getNumEncryptionZones causes NPE in branch-2.7
 Key: HDFS-10809
 URL: https://issues.apache.org/jira/browse/HDFS-10809
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption, namenode
Affects Versions: 2.7.4
Reporter: Zhe Zhang


This bug was caused by the fact that we backported HDFS-10458 from trunk all 
the way to branch-2.7, while HDFS-8721 initially went only as far back as 
branch-2.8. So from branch-2.8 and up, the order is HDFS-8721 -> HDFS-10458, 
but branch-2.7 has the reverse order; hence the inconsistency.






[jira] [Created] (HDFS-10798) Make the threshold of reporting FSNamesystem lock contention configurable

2016-08-25 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-10798:


 Summary: Make the threshold of reporting FSNamesystem lock 
contention configurable
 Key: HDFS-10798
 URL: https://issues.apache.org/jira/browse/HDFS-10798
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Zhe Zhang


Currently {{FSNamesystem#WRITELOCK_REPORTING_THRESHOLD}} is set at 1 second. In 
a busy system this might add too much overhead. We should make the threshold 
configurable.






[jira] [Reopened] (HDFS-7933) fsck should also report decommissioning replicas.

2016-08-15 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-7933:
-

> fsck should also report decommissioning replicas. 
> --
>
> Key: HDFS-7933
> URL: https://issues.apache.org/jira/browse/HDFS-7933
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Jitendra Nath Pandey
>Assignee: Xiaoyu Yao
> Fix For: 2.8.0
>
> Attachments: HDFS-7933-branch-2.7.00.patch, HDFS-7933.00.patch, 
> HDFS-7933.01.patch, HDFS-7933.02.patch, HDFS-7933.03.patch
>
>
> Fsck doesn't count replicas that are on decommissioning nodes. If all of a 
> block's replicas are on decommissioning nodes, the block will be marked as 
> missing, which is alarming to admins, although the system will re-replicate 
> the replicas before the nodes are decommissioned.
> Fsck output should also show decommissioning replicas along with the live 
> replicas.






[jira] [Reopened] (HDFS-9804) Allow long-running Balancer to login with keytab

2016-08-10 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-9804:
-

> Allow long-running Balancer to login with keytab
> 
>
> Key: HDFS-9804
> URL: https://issues.apache.org/jira/browse/HDFS-9804
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer & mover, security
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>  Labels: supportability
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-9804-branch-2.00.patch, HDFS-9804.01.patch, 
> HDFS-9804.02.patch, HDFS-9804.03.patch
>
>
> From the discussion of HDFS-9698, it might be nice to allow the balancer to 
> run as a daemon and login from a keytab.






[jira] [Created] (HDFS-10680) TestUTF8 fails in branch-2.6

2016-07-22 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-10680:


 Summary: TestUTF8 fails in branch-2.6
 Key: HDFS-10680
 URL: https://issues.apache.org/jira/browse/HDFS-10680
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Zhe Zhang
Priority: Minor









[jira] [Resolved] (HDFS-10653) Optimize conversion from path string to components

2016-07-22 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-10653.
--
Resolution: Fixed

Local test-patch shows 2 failures, but those tests fail even without the patch. 
I'll create a JIRA to address them.

Just committed v03 branch-2.6 patch. Thanks Allen and Daryn for the review and 
comments.

> Optimize conversion from path string to components
> --
>
> Key: HDFS-10653
> URL: https://issues.apache.org/jira/browse/HDFS-10653
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.8.0, 2.6.5, 2.7.4
>
> Attachments: HDFS-10653-branch-2.6.00.patch, 
> HDFS-10653-branch-2.6.01.patch, HDFS-10653-branch-2.6.02.patch, 
> HDFS-10653-branch-2.6.03.patch, HDFS-10653.patch
>
>
> Converting a path String to a byte[][] currently requires an unnecessary 
> intermediate conversion from String to String[].  Removing this will reduce 
> excessive object allocation and byte copying.






[jira] [Reopened] (HDFS-10653) Optimize conversion from path string to components

2016-07-21 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-10653:
--

Sorry, I have to reopen the issue; otherwise Jenkins won't run.

> Optimize conversion from path string to components
> --
>
> Key: HDFS-10653
> URL: https://issues.apache.org/jira/browse/HDFS-10653
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.8.0, 2.9.0, 2.7.4
>
> Attachments: HDFS-10653-branch-2.6.00.patch, HDFS-10653.patch
>
>
> Converting a path String to a byte[][] currently requires an unnecessary 
> intermediate conversion from String to String[].  Removing this will reduce 
> excessive object allocation and byte copying.






[jira] [Reopened] (HDFS-10534) NameNode WebUI should display DataNode usage rate with a certain percentile

2016-06-24 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-10534:
--

> NameNode WebUI should display DataNode usage rate with a certain percentile
> ---
>
> Key: HDFS-10534
> URL: https://issues.apache.org/jira/browse/HDFS-10534
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode, ui
>Reporter: Zhe Zhang
>Assignee: Kai Sasaki
> Attachments: HDFS-10534.01.patch, HDFS-10534.02.patch, 
> HDFS-10534.03.patch, HDFS-10534.04.patch, HDFS-10534.05.patch, Screen Shot 
> 2016-06-23 at 6.25.50 AM.png
>
>
> In addition to *Min/Median/Max*, another meaningful metric for cluster 
> balance is the DN usage rate at a certain percentile (e.g. 90 or 95). We 
> should add a config option, and another field on the NN WebUI, to display this.






[jira] [Created] (HDFS-10534) NameNode WebUI should display DataNode usage rate with a certain percentile

2016-06-15 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-10534:


 Summary: NameNode WebUI should display DataNode usage rate with a 
certain percentile
 Key: HDFS-10534
 URL: https://issues.apache.org/jira/browse/HDFS-10534
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode, ui
Reporter: Zhe Zhang


In addition to *Min/Median/Max*, another meaningful metric for cluster balance 
is the DN usage rate at a certain percentile (e.g. 90 or 95). We should add a 
config option, and another field on the NN WebUI, to display this.
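
The proposed metric can be sketched with the nearest-rank percentile method (the eventual NameNode implementation may differ):

```java
import java.util.Arrays;

/**
 * Sketch of the proposed metric using the nearest-rank percentile
 * method (the eventual NameNode implementation may differ).
 */
public class UsagePercentile {
    /** usages: per-DataNode usage rates; p: percentile in (0, 100]. */
    public static double percentile(double[] usages, int p) {
        double[] sorted = usages.clone();
        Arrays.sort(sorted);
        // nearest-rank: smallest value with at least p% of nodes at or below it
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        double[] usages = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
        System.out.println(percentile(usages, 95)); // 100.0
    }
}
```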






[jira] [Created] (HDFS-10458) getFileEncryptionInfo should return quickly for non-encrypted cluster

2016-05-24 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-10458:


 Summary: getFileEncryptionInfo should return quickly for 
non-encrypted cluster
 Key: HDFS-10458
 URL: https://issues.apache.org/jira/browse/HDFS-10458
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang


{{FSDirectory#getFileEncryptionInfo}} always acquires the {{readLock}} and 
checks whether the path belongs to an EZ. For a busy system with potentially 
many listing operations, this could cause locking contention.

I think we should add a call, {{EncryptionZoneManager#hasEncryptionZone()}}, 
that returns whether the system has any EZ. If there is no EZ at all, 
{{getFileEncryptionInfo}} should return null without taking the {{readLock}}.

If {{hasEncryptionZone}} is only used in the above scenario, it may not need a 
{{readLock}} itself -- if the system doesn't have any EZ when 
{{getFileEncryptionInfo}} is called on a path, that path cannot be encrypted.
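
The proposed fast path can be sketched as follows (illustrative class and method names; not the actual HDFS code): a volatile flag lets readers skip the read lock entirely while no encryption zone exists.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Minimal sketch of the proposed fast path (illustrative names; not
 * the actual HDFS code). A volatile flag lets readers skip the read
 * lock entirely while no encryption zone exists.
 */
public class EzFastPathDemo {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile boolean hasEncryptionZone = false;
    private String encryptionInfo = null;

    public void addEncryptionZone(String info) {
        lock.writeLock().lock();
        try {
            encryptionInfo = info;
            hasEncryptionZone = true;  // publish only after state is set
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Analogue of getFileEncryptionInfo: return null without any
     *  locking when the cluster has no encryption zones at all. */
    public String getEncryptionInfo() {
        if (!hasEncryptionZone) {
            return null;               // lock-free fast path
        }
        lock.readLock().lock();
        try {
            return encryptionInfo;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```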






[jira] [Created] (HDFS-9844) Correct path creation in getTrashRoot to handle root dir

2016-02-22 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-9844:
---

 Summary: Correct path creation in getTrashRoot to handle root dir
 Key: HDFS-9844
 URL: https://issues.apache.org/jira/browse/HDFS-9844
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption
Affects Versions: 2.8.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Blocker


{code}
  if ((ez != null)) {
return this.makeQualified(
new Path(ez.getPath() + "/" + FileSystem.TRASH_PREFIX +
dfs.ugi.getShortUserName()));
{code}
This doesn't handle root dir correctly. The unit test {{testRootDirEZTrash}} in 
the attached patch can reproduce the error.
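
The root-directory problem can be illustrated with plain string handling (a simplified sketch; the real code uses the HDFS {{Path}} API and {{FileSystem.TRASH_PREFIX}}):

```java
/**
 * Illustrates the root-directory problem described above with plain
 * strings (a simplified sketch; the real code uses HDFS Path APIs).
 * Naive concatenation yields a double slash when the encryption zone
 * is the root directory "/".
 */
public class TrashRootDemo {
    private static final String TRASH_PREFIX = ".Trash";  // illustrative constant

    static String naiveTrashRoot(String ezPath, String user) {
        return ezPath + "/" + TRASH_PREFIX + "/" + user;  // breaks for ezPath == "/"
    }

    static String safeTrashRoot(String ezPath, String user) {
        String base = ezPath.endsWith("/") ? ezPath : ezPath + "/";
        return base + TRASH_PREFIX + "/" + user;
    }

    public static void main(String[] args) {
        System.out.println(naiveTrashRoot("/", "alice")); // //.Trash/alice (malformed)
        System.out.println(safeTrashRoot("/", "alice"));  // /.Trash/alice
    }
}
```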





[jira] [Created] (HDFS-9799) Reimplement getCurrentTrashDir to remove incompatibility

2016-02-12 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-9799:
---

 Summary: Reimplement getCurrentTrashDir to remove incompatibility
 Key: HDFS-9799
 URL: https://issues.apache.org/jira/browse/HDFS-9799
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Blocker


HDFS-8831 changed the signature of {{TrashPolicy#getCurrentTrashDir}} by adding 
an IOException. This breaks other applications using this public API. This JIRA 
aims to reimplement the logic to safely handle the IOException within HDFS.





[jira] [Created] (HDFS-9785) Remove unused TrashPolicy#getInstance code

2016-02-09 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-9785:
---

 Summary: Remove unused TrashPolicy#getInstance code
 Key: HDFS-9785
 URL: https://issues.apache.org/jira/browse/HDFS-9785
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Zhe Zhang
Priority: Minor


A follow-on from HDFS-8831: the {{getInstance}} API variant that takes a Path 
is no longer used.





[jira] [Resolved] (HDFS-9172) Erasure Coding: Move DFSStripedIO stream related classes to hadoop-hdfs-client

2016-02-02 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-9172.
-
Resolution: Invalid

> Erasure Coding: Move DFSStripedIO stream related classes to hadoop-hdfs-client
> --
>
> Key: HDFS-9172
> URL: https://issues.apache.org/jira/browse/HDFS-9172
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Zhe Zhang
>
> The idea of this jira is to move the striped stream related classes to 
> {{hadoop-hdfs-client}} project. This will help to be in sync with the 
> HDFS-6200 proposal.
> - DFSStripedInputStream
> - DFSStripedOutputStream
> - StripedDataStreamer





[jira] [Created] (HDFS-9698) Long running Balancer should renew TGT

2016-01-25 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-9698:
---

 Summary: Long running Balancer should renew TGT
 Key: HDFS-9698
 URL: https://issues.apache.org/jira/browse/HDFS-9698
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover, security
Affects Versions: 2.6.3
Reporter: Zhe Zhang
Assignee: Zhe Zhang


When the {{Balancer}} runs beyond the configured TGT lifetime, the current 
logic won't renew the TGT.





[jira] [Created] (HDFS-9688) Test the effect of nested encryption zones in HDFS downgrade

2016-01-22 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-9688:
---

 Summary: Test the effect of nested encryption zones in HDFS 
downgrade
 Key: HDFS-9688
 URL: https://issues.apache.org/jira/browse/HDFS-9688
 Project: Hadoop HDFS
  Issue Type: Test
  Components: encryption, test
Reporter: Zhe Zhang
Assignee: Zhe Zhang








[jira] [Created] (HDFS-9644) Update encryption documentation to reflect nested EZs

2016-01-12 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-9644:
---

 Summary: Update encryption documentation to reflect nested EZs
 Key: HDFS-9644
 URL: https://issues.apache.org/jira/browse/HDFS-9644
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: documentation, encryption
Affects Versions: 2.7.1
Reporter: Zhe Zhang
Assignee: Zhe Zhang








[jira] [Created] (HDFS-9576) HTrace: collect path/offset/length information on read and write operations

2015-12-18 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-9576:
---

 Summary: HTrace: collect path/offset/length information on read 
and write operations
 Key: HDFS-9576
 URL: https://issues.apache.org/jira/browse/HDFS-9576
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client, tracing
Affects Versions: 2.7.1
Reporter: Zhe Zhang
Assignee: Zhe Zhang








[jira] [Resolved] (HDFS-9578) Incorrect default value (typo) in hdfs-default.xml

2015-12-18 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-9578.
-
Resolution: Duplicate

Thanks for reporting this, [~tianyin]. It has already been fixed in trunk.

> Incorrect default value (typo) in hdfs-default.xml
> --
>
> Key: HDFS-9578
> URL: https://issues.apache.org/jira/browse/HDFS-9578
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.6.2, 2.6.3
>Reporter: Tianyin Xu
>Priority: Minor
>
> In {{hdfs-default.xml}}, the default value of 
> {{dfs.datanode.readahead.bytes}} is wrong. 
> The value should be 4MB (4 * 1024 * 1024) according to the code, i.e.,
> {{4194304}}, 
> while in hdfs-default.xml, it is 
> {{4193404}}.
> 
> (src)
> {code:title=DFSConfigKeys.java|borderStyle=solid}
> 134   public static final String  DFS_DATANODE_READAHEAD_BYTES_KEY = 
> "dfs.datanode.readahead.bytes";
> 135   public static final longDFS_DATANODE_READAHEAD_BYTES_DEFAULT = 4 * 
> 1024 * 1024; // 4MB
> {code}
> (hdfs-default.xml)
> https://hadoop.apache.org/docs/r2.6.2/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml





[jira] [Resolved] (HDFS-7691) Handle hflush and hsync in the best optimal way possible during online Erasure encoding

2015-12-15 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-7691.
-
Resolution: Duplicate

Thanks, Vinay, for confirming. Let's track the hflush-related efforts under 
HDFS-7661.

> Handle hflush and hsync in the best optimal way possible during online 
> Erasure encoding
> ---
>
> Key: HDFS-7691
> URL: https://issues.apache.org/jira/browse/HDFS-7691
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>
> As mentioned in the design doc, hsync and hflush tend to make online erasure 
> encoding complex, but they are critical features for ensuring fault tolerance 
> for some users.
> These operations should be supported in the best way possible during online 
> erasure encoding.
> This JIRA is a placeholder for the task; how to solve it will be discussed 
> later.





[jira] [Resolved] (HDFS-9529) Extend Erasure Code to support POWER Chip acceleration

2015-12-09 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-9529.
-
Resolution: Duplicate

Thanks Qijun for confirming this. Resolving this one as dup.

> Extend Erasure Code to support POWER Chip acceleration
> --
>
> Key: HDFS-9529
> URL: https://issues.apache.org/jira/browse/HDFS-9529
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: wqijun
>Assignee: wqijun
> Fix For: 3.0.0
>
>
> Erasure Coding is a very important feature in the new HDFS version. This JIRA 
> will focus on how to extend EC to support multiple types of EC acceleration 
> via C libraries and other hardware methods, like GPUs or FPGAs. Compared with 
> HADOOP-11887, this JIRA will focus more on how to leverage POWER chip 
> capabilities to accelerate EC calculation.





[jira] [Resolved] (HDFS-9496) Erasure coding: an erasure codec throughput benchmark tool

2015-12-02 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-9496.
-
Resolution: Duplicate

Thanks Rui for clarifying this.

> Erasure coding: an erasure codec throughput benchmark tool
> --
>
> Key: HDFS-9496
> URL: https://issues.apache.org/jira/browse/HDFS-9496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding, test
>Reporter: Hui Zheng
>
> We need a tool that can help us decide on and benchmark an erasure codec and 
> schema. Considering that HDFS-8968 has implemented an I/O throughput benchmark 
> tool, maybe we could simply add encode/decode operations to it or implement 
> another tool.





[jira] [Created] (HDFS-9403) Erasure coding: some EC tests are missing timeout

2015-11-09 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-9403:
---

 Summary: Erasure coding: some EC tests are missing timeout
 Key: HDFS-9403
 URL: https://issues.apache.org/jira/browse/HDFS-9403
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: erasure-coding, test
Affects Versions: 3.0.0
Reporter: Zhe Zhang
Priority: Minor


The EC data writing pipeline is still being worked on, and bugs could cause the 
program to hang. We should add a timeout to all tests involving striped 
writing. I see at least the following:

* {{TestErasureCodingPolicies}}
* {{TestFileStatusWithECPolicy}}
* {{TestDFSStripedOutputStream}}






[jira] [Created] (HDFS-9405) When starting a file, NameNode should generate EDEK in a separate thread

2015-11-09 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-9405:
---

 Summary: When starting a file, NameNode should generate EDEK in a 
separate thread
 Key: HDFS-9405
 URL: https://issues.apache.org/jira/browse/HDFS-9405
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: encryption, namenode
Affects Versions: 2.7.1
Reporter: Zhe Zhang


{{generateEncryptedDataEncryptionKey}} involves a non-trivial I/O operation to 
the key provider, which could be slow or time out. It should be done in a 
separate thread so that a proper error message can be returned to the RPC caller.
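
The idea can be sketched with a standard executor (illustrative names; not the actual NameNode code): run the key-provider call in the background so the caller can bound its wait and return a proper error.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Sketch of the idea: run the slow key-provider call on a background
 * executor so the caller can bound its wait and surface a clean error
 * (illustrative names; not the actual NameNode code).
 */
public class EdekAsyncDemo {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    public Future<String> generateEdekAsync(Callable<String> keyProviderCall) {
        return pool.submit(keyProviderCall);
    }

    public String waitBounded(Future<String> f, long timeoutMs) throws Exception {
        try {
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // In the real system this would become a proper RPC error message.
            return null;
        }
    }

    public void shutdown() {
        pool.shutdownNow();
    }
}
```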





[jira] [Created] (HDFS-9386) Erasure coding: updateBlockForPipeline sometimes returns non-striped block for striped file

2015-11-05 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-9386:
---

 Summary: Erasure coding: updateBlockForPipeline sometimes returns 
non-striped block for striped file
 Key: HDFS-9386
 URL: https://issues.apache.org/jira/browse/HDFS-9386
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: erasure-coding
Affects Versions: 3.0.0
Reporter: Zhe Zhang


I've seen this bug a few times. The {{LocatedBlock}} returned from 
{{updateBlockForPipeline}} is sometimes not a {{LocatedStripedBlock}}, even 
though {{FSNamesystem#bumpBlockGenerationStamp}} did return a 
{{LocatedStripedBlock}}. Maybe a bug in PB; I'm still debugging.





[jira] [Created] (HDFS-9344) QUEUE_WITH_CORRUPT_BLOCKS is no longer needed

2015-10-30 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-9344:
---

 Summary: QUEUE_WITH_CORRUPT_BLOCKS is no longer needed
 Key: HDFS-9344
 URL: https://issues.apache.org/jira/browse/HDFS-9344
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.1
Reporter: Zhe Zhang
Priority: Minor


After the change of HDFS-9205, the {{QUEUE_WITH_CORRUPT_BLOCKS}} queue in 
{{UnderReplicatedBlocks}} is no longer needed.





[jira] [Resolved] (HDFS-9344) QUEUE_WITH_CORRUPT_BLOCKS is no longer needed

2015-10-30 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-9344.
-
Resolution: Invalid

Thanks Xiao! Good analysis! Closing this as invalid.

> QUEUE_WITH_CORRUPT_BLOCKS is no longer needed
> -
>
> Key: HDFS-9344
> URL: https://issues.apache.org/jira/browse/HDFS-9344
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Xiao Chen
>Priority: Minor
>
> After the change of HDFS-9205, the {{QUEUE_WITH_CORRUPT_BLOCKS}} queue in 
> {{UnderReplicatedBlocks}} is no longer needed.





[jira] [Created] (HDFS-9329) TestBootstrapStandby#testRateThrottling is flaky because fsimage size is smaller than IO buffer size

2015-10-28 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-9329:
---

 Summary: TestBootstrapStandby#testRateThrottling is flaky because 
fsimage size is smaller than IO buffer size
 Key: HDFS-9329
 URL: https://issues.apache.org/jira/browse/HDFS-9329
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.7.1
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor


{{testRateThrottling}} verifies that a bootstrap transfer should time out with 
a very small {{DFS_IMAGE_TRANSFER_BOOTSTRAP_STANDBY_RATE_KEY}} value. However, 
throttling on the image sender only happens after each IO buffer is sent. 
Therefore, the test sometimes fails if the receiver receives the full fsimage 
(which is smaller than the IO buffer size) before throttling begins.





[jira] [Created] (HDFS-9280) Document NFS gateway export point parameter

2015-10-21 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-9280:
---

 Summary: Document NFS gateway export point parameter
 Key: HDFS-9280
 URL: https://issues.apache.org/jira/browse/HDFS-9280
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.7.1
Reporter: Zhe Zhang
Priority: Trivial


We should document the {{nfs.export.point}} configuration parameter.





[jira] [Resolved] (HDFS-7285) Erasure Coding Support inside HDFS

2015-09-30 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-7285.
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0

I just did a final {{git merge}} to sync with trunk, and [~andrew.wang] helped 
push the HDFS-7285 branch to trunk. Resolving this JIRA now; let's keep working 
on follow-on tasks under HDFS-8031.

Thanks very much to all contributors to EC phase I, as well as for the helpful 
discussions in the community.

> Erasure Coding Support inside HDFS
> --
>
> Key: HDFS-7285
> URL: https://issues.apache.org/jira/browse/HDFS-7285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Weihua Jiang
>Assignee: Zhe Zhang
> Fix For: 3.0.0
>
> Attachments: Compare-consolidated-20150824.diff, 
> Consolidated-20150707.patch, Consolidated-20150806.patch, 
> Consolidated-20150810.patch, ECAnalyzer.py, ECParser.py, 
> HDFS-7285-Consolidated-20150911.patch, HDFS-7285-initial-PoC.patch, 
> HDFS-7285-merge-consolidated-01.patch, 
> HDFS-7285-merge-consolidated-trunk-01.patch, 
> HDFS-7285-merge-consolidated.trunk.03.patch, 
> HDFS-7285-merge-consolidated.trunk.04.patch, 
> HDFS-EC-Merge-PoC-20150624.patch, HDFS-EC-merge-consolidated-01.patch, 
> HDFS-bistriped.patch, HDFSErasureCodingDesign-20141028.pdf, 
> HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, 
> HDFSErasureCodingDesign-20150206.pdf, HDFSErasureCodingPhaseITestPlan.pdf, 
> HDFSErasureCodingSystemTestPlan-20150824.pdf, 
> HDFSErasureCodingSystemTestReport-20150826.pdf, fsimage-analysis-20150105.pdf
>
>
> Erasure Coding (EC) can greatly reduce storage overhead without sacrificing 
> data reliability, compared to the existing HDFS 3-replica approach. For 
> example, if we use 10+4 Reed-Solomon coding, we can tolerate the loss of 4 
> blocks with a storage overhead of only 40%. This makes EC a quite attractive 
> alternative for big data storage, particularly for cold data. 
> Facebook had a related open source project called HDFS-RAID. It used to be 
> one of the contrib packages in HDFS but was removed in Hadoop 2.0 for 
> maintenance reasons. The drawbacks are: 1) it sits on top of HDFS and depends 
> on MapReduce to do encoding and decoding tasks; 2) it can only be used for 
> cold files that will not be appended anymore; 3) the pure-Java EC coding 
> implementation is extremely slow in practical use. Due to these, it might not 
> be a good idea to just bring HDFS-RAID back.
> We (Intel and Cloudera) are working on a design to build EC into HDFS that 
> gets rid of any external dependencies, making it self-contained and 
> independently maintained. This design layers the EC feature on the storage 
> type support and is designed to be compatible with existing HDFS features 
> like caching, snapshots, encryption, and high availability. The design will 
> also support different EC coding schemes, implementations, and policies for 
> different deployment scenarios. By utilizing advanced libraries (e.g. the 
> Intel ISA-L library), an implementation can greatly improve the performance 
> of EC encoding/decoding and make the EC solution even more attractive. We 
> will post the design document soon. 





[jira] [Created] (HDFS-9119) Discrepancy between edit log tailing interval and RPC timeout for transitionToActive

2015-09-21 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-9119:
---

 Summary: Discrepancy between edit log tailing interval and RPC 
timeout for transitionToActive
 Key: HDFS-9119
 URL: https://issues.apache.org/jira/browse/HDFS-9119
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.7.1
Reporter: Zhe Zhang


{{EditLogTailer}} on the standby NameNode tails edits from the active NameNode 
every 2 minutes, but the {{transitionToActive}} RPC call has a timeout of only 
1 minute.

If the active NameNode encounters a very intensive metadata workload (in 
particular, many {{AddOp}} and {{MkDir}} operations creating new files and 
directories), the updates accumulated during the 2-minute edit log tailing 
interval are hard for the standby NameNode to catch up on within the 1-minute 
timeout window. If that happens, the FailoverController will time out and give 
up trying to transition the standby to active. The old ANN will resume adding 
more edits. When the SbNN finally finishes catching up on the edits and tries 
to become active, it will crash.
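One way to narrow the mismatch is to tune the two intervals involved. The property names below ({{dfs.ha.tail-edits.period}} in hdfs-site.xml and {{ha.failover-controller.new-active.rpc-timeout.ms}} in core-site.xml) are the standard Hadoop keys for these knobs; the values shown are illustrative only:

```xml
<!-- Illustrative values only: shrink the tailing interval so the standby
     stays closer to the active, and/or lengthen the failover RPC timeout. -->

<!-- hdfs-site.xml: how often the standby's EditLogTailer polls (seconds) -->
<property>
  <name>dfs.ha.tail-edits.period</name>
  <value>30</value>
</property>

<!-- core-site.xml: timeout for the FailoverController's transitionToActive RPC -->
<property>
  <name>ha.failover-controller.new-active.rpc-timeout.ms</name>
  <value>180000</value>
</property>
```

This only reduces the window; the underlying race between tailing lag and the failover timeout remains.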





[jira] [Created] (HDFS-9097) Erasure coding: update EC command "-s" flag to "-p" when specifying policy

2015-09-17 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-9097:
---

 Summary: Erasure coding: update EC command "-s" flag to "-p" when 
specifying policy
 Key: HDFS-9097
 URL: https://issues.apache.org/jira/browse/HDFS-9097
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7285
Reporter: Zhe Zhang
Assignee: Zhe Zhang


HDFS-8833 missed this update.





[jira] [Created] (HDFS-9098) Erasure coding: emulate race conditions among striped streamers in write pipeline

2015-09-17 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-9098:
---

 Summary: Erasure coding: emulate race conditions among striped 
streamers in write pipeline
 Key: HDFS-9098
 URL: https://issues.apache.org/jira/browse/HDFS-9098
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang


Apparently the interleaving of events among {{StripedDataStreamer}}s is very 
tricky to handle. [~walter.k.su] and [~jingzhao] have discussed several race 
conditions under HDFS-9040.

Let's use FaultInjector to emulate different combinations of interleaved events.





[jira] [Resolved] (HDFS-8373) Ec files can't be deleted into Trash because of that Trash isn't EC zone.

2015-09-15 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-8373.
-
Resolution: Not A Problem

With HDFS-8833 we should be able to delete EC files into Trash.

> Ec files can't be deleted into Trash because of that Trash isn't EC zone.
> -
>
>
> Key: HDFS-8373
> URL: https://issues.apache.org/jira/browse/HDFS-8373
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7285
>Reporter: GAO Rui
>Assignee: Brahma Reddy Battula
>  Labels: EC
>
> When EC files are deleted, they should be moved into the {{Trash}} directory. 
> But EC files can only be placed under an EC zone, so deleted EC files cannot 
> be moved to the {{Trash}} directory.
> The problem could be solved by creating an EC zone (folder) inside {{Trash}} 
> to contain deleted EC files.





[jira] [Created] (HDFS-9079) Erasure coding: preallocate multiple generation stamps when creating striped blocks

2015-09-14 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-9079:
---

 Summary: Erasure coding: preallocate multiple generation stamps 
when creating striped blocks
 Key: HDFS-9079
 URL: https://issues.apache.org/jira/browse/HDFS-9079
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang


A non-striped DataStreamer goes through the following steps in error handling:
{code}
1) Finds error => 2) Asks NN for new GS => 3) Gets new GS from NN => 4) Applies 
new GS to DN (createBlockOutputStream) => 5) Ack from DN => 6) Updates block on 
NN
{code}
To simplify the above, we can preallocate GSs when the NN creates a new striped 
block group ({{FSN#createNewBlock}}). For each new striped block group we can 
reserve {{NUM_PARITY_BLOCKS}} GSs, saving steps 1-3 in the above sequence. 
If more than {{NUM_PARITY_BLOCKS}} errors have happened, we shouldn't try to 
recover further anyway.
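A minimal sketch of the proposed bookkeeping (all names and structure are hypothetical; the real change would live in the NN block management code): reserve a contiguous range of GSs at block-group creation so a streamer can bump the GS locally on error, without the NN round trip of steps 1-3.

```java
// Hypothetical sketch of preallocating generation stamps (GS) for a striped
// block group, as proposed above. Names and structure are illustrative only.
public class GsReservation {
    static final int NUM_PARITY_BLOCKS = 4; // e.g. a 10+4 schema

    private final long firstGs;   // GS assigned at block-group creation
    private int used = 0;         // how many reserved GSs have been consumed

    GsReservation(long firstGs) {
        this.firstGs = firstGs;
    }

    // On a streamer error, hand out the next reserved GS without an NN round trip.
    long nextGsOnError() {
        if (used >= NUM_PARITY_BLOCKS) {
            // More failures than parity blocks: recovery is pointless anyway.
            throw new IllegalStateException("reserved generation stamps exhausted");
        }
        used++;
        return firstGs + used;
    }

    public static void main(String[] args) {
        GsReservation r = new GsReservation(1000L);
        System.out.println(r.nextGsOnError()); // 1001
        System.out.println(r.nextGsOnError()); // 1002
    }
}
```

Capping the reservation at {{NUM_PARITY_BLOCKS}} mirrors the observation in the description: beyond that many failures the block group is unrecoverable regardless.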





[jira] [Created] (HDFS-9050) updatePipeline RPC call should only take new GS as input

2015-09-10 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-9050:
---

 Summary: updatePipeline RPC call should only take new GS as input
 Key: HDFS-9050
 URL: https://issues.apache.org/jira/browse/HDFS-9050
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.7.1
Reporter: Zhe Zhang
Assignee: Zhe Zhang


The only usage of the call is in {{DataStreamer#updatePipeline}}, where 
{{newBlock}} differs from current {{block}} only in GS.

Basically the RPC call is not supposed to update the {{poolID}}, {{ID}}, and 
{{numBytes}} of the block on NN.





[jira] [Created] (HDFS-8996) Remove {{scanEditLog}}

2015-08-31 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-8996:
---

 Summary: Remove {{scanEditLog}}
 Key: HDFS-8996
 URL: https://issues.apache.org/jira/browse/HDFS-8996
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: journal-node, namenode
Affects Versions: 2.0.0-alpha
Reporter: Zhe Zhang


After HDFS-8965 is committed, {{scanEditLog}} will be identical to 
{{validateEditLog}} in {{EditLogInputStream}} and {{FSEditlogLoader}}. This is 
a placeholder for removing the redundant {{scanEditLog}} code.





[jira] [Resolved] (HDFS-8985) Restarted namenode suffer from block report storm

2015-08-29 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-8985.
-
Resolution: Invalid

 Restarted namenode suffer from block report storm
 -

 Key: HDFS-8985
 URL: https://issues.apache.org/jira/browse/HDFS-8985
 Project: Hadoop HDFS
  Issue Type: Test
Affects Versions: 2.4.1
Reporter: Zhihua Deng
Priority: Trivial
  Labels: test







[jira] [Resolved] (HDFS-8987) Erasure coding: MapReduce job failed when I set the / folder to the EC zone

2015-08-28 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-8987.
-
Resolution: Fixed

 Erasure coding: MapReduce job failed when I set the / folder to the EC zone 
 

 Key: HDFS-8987
 URL: https://issues.apache.org/jira/browse/HDFS-8987
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: HDFS
Affects Versions: 3.0.0
Reporter: Lifeng Wang

 The test procedure is as follows:
  * For a new cluster, I format the namenode and then start the HDFS service.
  * After the HDFS service is started, there are no files in HDFS; I set the /
 folder as an EC zone, and the EC zone is created successfully.
  * Start the YARN and MR JobHistoryServer services. All the services start
 successfully.
  * Then run the hadoop example pi program; it fails.
 The following is the exception:
 {noformat}
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.UnsupportedActionException):
  Cannot set replication to a file with striped blocks
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetReplication(FSDirAttrOp.java:391)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setReplication(FSDirAttrOp.java:151)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplication(FSNamesystem.java:2231)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setReplication(NameNodeRpcServer.java:682)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setReplication(ClientNamenodeProtocolServerSideTranslatorPB.java:445)
   at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2171)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2167)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2165)
 {noformat}





[jira] [Reopened] (HDFS-8987) Erasure coding: MapReduce job failed when I set the / folder to the EC zone

2015-08-28 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-8987:
-

 Erasure coding: MapReduce job failed when I set the / folder to the EC zone 
 

 Key: HDFS-8987
 URL: https://issues.apache.org/jira/browse/HDFS-8987
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: HDFS
Affects Versions: 3.0.0
Reporter: Lifeng Wang

 The test procedure is as follows:
  * For a new cluster, I format the namenode and then start the HDFS service.
  * After the HDFS service is started, there are no files in HDFS; I set the /
 folder as an EC zone, and the EC zone is created successfully.
  * Start the YARN and MR JobHistoryServer services. All the services start
 successfully.
  * Then run the hadoop example pi program; it fails.
 The following is the exception:
 {noformat}
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.UnsupportedActionException):
  Cannot set replication to a file with striped blocks
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetReplication(FSDirAttrOp.java:391)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setReplication(FSDirAttrOp.java:151)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplication(FSNamesystem.java:2231)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setReplication(NameNodeRpcServer.java:682)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setReplication(ClientNamenodeProtocolServerSideTranslatorPB.java:445)
   at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2171)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2167)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2165)
 {noformat}





[jira] [Resolved] (HDFS-8987) Erasure coding: MapReduce job failed when I set the / folder to the EC zone

2015-08-28 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-8987.
-
Resolution: Duplicate

 Erasure coding: MapReduce job failed when I set the / folder to the EC zone 
 

 Key: HDFS-8987
 URL: https://issues.apache.org/jira/browse/HDFS-8987
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: HDFS
Affects Versions: 3.0.0
Reporter: Lifeng Wang

 The test procedure is as follows:
  * For a new cluster, I format the namenode and then start the HDFS service.
  * After the HDFS service is started, there are no files in HDFS; I set the /
 folder as an EC zone, and the EC zone is created successfully.
  * Start the YARN and MR JobHistoryServer services. All the services start
 successfully.
  * Then run the hadoop example pi program; it fails.
 The following is the exception:
 {noformat}
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.UnsupportedActionException):
  Cannot set replication to a file with striped blocks
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetReplication(FSDirAttrOp.java:391)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setReplication(FSDirAttrOp.java:151)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplication(FSNamesystem.java:2231)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setReplication(NameNodeRpcServer.java:682)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setReplication(ClientNamenodeProtocolServerSideTranslatorPB.java:445)
   at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2171)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2167)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2165)
 {noformat}





[jira] [Created] (HDFS-8982) Consolidate getFileReplication and getPreferredBlockReplication in INodeFile

2015-08-27 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-8982:
---

 Summary: Consolidate getFileReplication and 
getPreferredBlockReplication in INodeFile
 Key: HDFS-8982
 URL: https://issues.apache.org/jira/browse/HDFS-8982
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.7.1
Reporter: Zhe Zhang


Currently {{INodeFile}} provides both {{getFileReplication}} and 
{{getPreferredBlockReplication}} interfaces. At the very least they should be 
renamed (e.g. {{getCurrentFileReplication}} and 
{{getMaxConfiguredFileReplication}}), with clearer Javadoc.

I also suspect we are not using them correctly in all places right now.





[jira] [Reopened] (HDFS-8985) Restarted namenode suffer from block report storm

2015-08-27 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-8985:
-

[~dengzh] Could you re-close the case with the correct Resolution? I guess 
it's Invalid?

 Restarted namenode suffer from block report storm
 -

 Key: HDFS-8985
 URL: https://issues.apache.org/jira/browse/HDFS-8985
 Project: Hadoop HDFS
  Issue Type: Test
Affects Versions: 2.4.1
Reporter: Zhihua Deng
Priority: Trivial
  Labels: test







[jira] [Reopened] (HDFS-8982) Consolidate getFileReplication and getPreferredBlockReplication in INodeFile

2015-08-27 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-8982:
-

bq. The so-called getPerferredBlockReplication() records the maximum 
replication factor of the file w.r.t. the current and all snapshot state of the 
file.
[~wheat9] By using "so-called" I take it that you agree we should at least give 
it a better name (as suggested in the JIRA description). Reopening the issue 
based on that consensus.

 Consolidate getFileReplication and getPreferredBlockReplication in INodeFile
 

 Key: HDFS-8982
 URL: https://issues.apache.org/jira/browse/HDFS-8982
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.7.1
Reporter: Zhe Zhang

 Currently {{INodeFile}} provides both {{getFileReplication}} and 
 {{getPreferredBlockReplication}} interfaces. At the very least they should be 
 renamed (e.g. {{getCurrentFileReplication}} and 
 {{getMaxConfiguredFileReplication}}), with clearer Javadoc.
 I also suspect we are not using them correctly in all places right now.





[jira] [Created] (HDFS-8964) Provide max TxId when validating in-progress edit log files

2015-08-26 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-8964:
---

 Summary: Provide max TxId when validating in-progress edit log 
files
 Key: HDFS-8964
 URL: https://issues.apache.org/jira/browse/HDFS-8964
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: journal-node, namenode
Affects Versions: 2.7.1
Reporter: Zhe Zhang
Assignee: Zhe Zhang


NN/JN validates in-progress edit log files in multiple scenarios, via 
{{EditLogFile#validateLog}}. The method scans through the edit log file to find 
the last transaction ID.

However, an in-progress edit log file could be actively written to, which 
creates a race condition and causes incorrect data to be read (and later we 
attempt to interpret the data as ops).

Currently {{validateLog}} is used in 3 places:
# NN {{getEditsFromTxid}}
# JN {{getEditLogManifest}}
# NN/JN {{recoverUnfinalizedSegments}}

In the first two scenarios we should provide a maximum TxId to validate in the 
in-progress file. The 3rd scenario won't cause a race condition because only 
non-current in-progress edit log files are validated.
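The fix amounts to threading an upper bound through the scan loop so the reader never runs past the last transaction known to be durable. A hypothetical sketch of that bounding logic (the real code lives in {{EditLogFile#validateLog}} and the op-reading machinery; names here are made up):

```java
// Hypothetical sketch of bounding an edit-log scan by a maximum transaction
// ID, so a concurrently written in-progress file is never read past the last
// known-good op. The txid array stands in for the real op stream.
public class BoundedScan {
    // Returns the last txid seen that is <= maxTxId (or -1 if none).
    static long findLastTxId(long[] opTxIds, long maxTxId) {
        long last = -1;
        for (long txid : opTxIds) {
            if (txid > maxTxId) {
                break; // beyond the bound: may be a partially written op
            }
            last = txid;
        }
        return last;
    }

    public static void main(String[] args) {
        long[] ops = {1, 2, 3, 4, 5, 6};
        // The caller (e.g. getEditLogManifest) knows only txids <= 4 are durable.
        System.out.println(findLastTxId(ops, 4)); // 4
    }
}
```

With such a bound, scenarios 1 and 2 can validate an actively written file safely; scenario 3 needs no bound because only non-current in-progress files are scanned there.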





[jira] [Created] (HDFS-8928) Improvements for BlockUnderConstructionFeature: ReplicaUnderConstruction as a separate class and replicas as an array

2015-08-20 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-8928:
---

 Summary: Improvements for BlockUnderConstructionFeature: 
ReplicaUnderConstruction as a separate class and replicas as an array
 Key: HDFS-8928
 URL: https://issues.apache.org/jira/browse/HDFS-8928
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.8.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang








[jira] [Resolved] (HDFS-8918) Convert BlockUnderConstructionFeature#replicas form list to array

2015-08-19 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-8918.
-
Resolution: Duplicate

 Convert BlockUnderConstructionFeature#replicas form list to array
 -

 Key: HDFS-8918
 URL: https://issues.apache.org/jira/browse/HDFS-8918
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.8.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang

 {{BlockInfoUnderConstruction}} / {{BlockUnderConstructionFeature}} uses a 
 List to store its {{replicas}}. To reduce memory usage, we can use an array 
 instead.





[jira] [Created] (HDFS-8918) Convert BlockUnderConstructionFeature#replicas form list to array

2015-08-18 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-8918:
---

 Summary: Convert BlockUnderConstructionFeature#replicas form list 
to array
 Key: HDFS-8918
 URL: https://issues.apache.org/jira/browse/HDFS-8918
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.8.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang


{{BlockInfoUnderConstruction}} / {{BlockUnderConstructionFeature}} uses a List 
to store its {{replicas}}. To reduce memory usage, we can use an array instead.





[jira] [Resolved] (HDFS-8835) Convert BlockInfoUnderConstruction as an interface

2015-08-18 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-8835.
-
Resolution: Invalid

HDFS-8801 has converted {{BlockInfoUC}} as a feature.

 Convert BlockInfoUnderConstruction as an interface
 --

 Key: HDFS-8835
 URL: https://issues.apache.org/jira/browse/HDFS-8835
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.7.1
Reporter: Zhe Zhang
Assignee: Zhe Zhang

 Per discussion under HDFS-8499, this JIRA aims to convert 
 {{BlockInfoUnderConstruction}} as an interface and 
 {{BlockInfoContiguousUnderConstruction}} as its implementation. The HDFS-7285 
 branch will add {{BlockInfoStripedUnderConstruction}} as another 
 implementation.





[jira] [Created] (HDFS-8917) Cleanup BlockInfoUnderConstruction from comments and tests

2015-08-18 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-8917:
---

 Summary: Cleanup BlockInfoUnderConstruction from comments and tests
 Key: HDFS-8917
 URL: https://issues.apache.org/jira/browse/HDFS-8917
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.8.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor


HDFS-8801 eliminates the {{BlockInfoUnderConstruction}} class. This JIRA is a 
follow-on to cleanup comments and tests which refer to the class.





[jira] [Created] (HDFS-8909) Erasure coding: update BlockInfoContiguousUC and BlockInfoStripedUC to use BlockUnderConstructionFeature

2015-08-17 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-8909:
---

 Summary: Erasure coding: update BlockInfoContiguousUC and 
BlockInfoStripedUC to use BlockUnderConstructionFeature
 Key: HDFS-8909
 URL: https://issues.apache.org/jira/browse/HDFS-8909
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: HDFS-7285
Reporter: Zhe Zhang


HDFS-8801 converts {{BlockInfoUC}} into a feature. We should consolidate the 
{{BlockInfoContiguousUC}} and {{BlockInfoStripedUC}} logic to use this feature.





[jira] [Created] (HDFS-8849) fsck should report number of missing blocks with replication factor 1

2015-08-03 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-8849:
---

 Summary: fsck should report number of missing blocks with 
replication factor 1
 Key: HDFS-8849
 URL: https://issues.apache.org/jira/browse/HDFS-8849
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Affects Versions: 2.7.1
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor


HDFS-7165 supports reporting number of blocks with replication factor 1 in 
{{dfsadmin}} and NN metrics. But it didn't extend {{fsck}} with the same 
support, which is the aim of this JIRA.





[jira] [Resolved] (HDFS-8202) Improve end to end stirpping file test to add erasure recovering test

2015-07-31 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-8202.
-
  Resolution: Fixed
Hadoop Flags: Reviewed
   Fix Version/s: HDFS-7285
Target Version/s: HDFS-7285

+1 on the latest patch. I just committed to the branch. Thanks Xinwei for the 
contribution!

 Improve end to end stirpping file test to add erasure recovering test
 -

 Key: HDFS-8202
 URL: https://issues.apache.org/jira/browse/HDFS-8202
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Kai Zheng
Assignee: Xinwei Qin 
 Fix For: HDFS-7285

 Attachments: HDFS-8202-HDFS-7285.003.patch, 
 HDFS-8202-HDFS-7285.004.patch, HDFS-8202-HDFS-7285.005.patch, 
 HDFS-8202-HDFS-7285.006.patch, HDFS-8202.001.patch, HDFS-8202.002.patch


 This follows on HDFS-8201 to add an erasure recovery test to the end-to-end 
 striping file test:
 * After writing certain blocks to the test file, delete some block files;
 * Read the file content back and compare, checking for recovery issues and 
 verifying that erasure recovery works.





[jira] [Created] (HDFS-8846) Create edit log files with old layout version for upgrade testing

2015-07-31 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-8846:
---

 Summary: Create edit log files with old layout version for upgrade 
testing
 Key: HDFS-8846
 URL: https://issues.apache.org/jira/browse/HDFS-8846
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.1
Reporter: Zhe Zhang
Assignee: Zhe Zhang


Per discussion under HDFS-8480, we should create some edit log files with old 
layout version, to test whether they can be correctly handled in upgrades.





[jira] [Resolved] (HDFS-8839) Erasure Coding: client occasionally gets less block locations when some datanodes fail

2015-07-31 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-8839.
-
Resolution: Duplicate

Thanks Bo for identifying this. I think this is a duplicate of HDFS-8220. 

 Erasure Coding: client occasionally gets less block locations when some 
 datanodes fail 
 ---

 Key: HDFS-8839
 URL: https://issues.apache.org/jira/browse/HDFS-8839
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Li Bo
Assignee: Li Bo

 9 datanodes, write two block groups. A datanode dies while writing the first 
 block group. When the client retrieves the second block group from the 
 namenode, the returned block group occasionally contains only 8 locations.





[jira] [Resolved] (HDFS-8768) Erasure Coding: block group ID displayed in WebUI is not consistent with fsck

2015-07-29 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-8768.
-
Resolution: Duplicate

 Erasure Coding: block group ID displayed in WebUI is not consistent with fsck
 -

 Key: HDFS-8768
 URL: https://issues.apache.org/jira/browse/HDFS-8768
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: GAO Rui
 Attachments: Screen Shot 2015-07-14 at 15.33.08.png, 
 screen-shot-with-HDFS-8779-patch.PNG


 This is duplicated by [HDFS-8779].
 For example, In WebUI( usually, namenode port: 50070) , one Erasure Code   
 file with one block group was displayed as the attached screenshot [^Screen 
 Shot 2015-07-14 at 15.33.08.png]. But, with fsck command, the block group of 
 the same file was displayed like: {{0. 
 BP-1130999596-172.23.38.10-1433791629728:blk_-9223372036854740160_3384 
 len=6438256640}}
>  After checking block file names on the datanodes, we believe the WebUI may 
>  have a problem with its Erasure Code block group display.





[jira] [Created] (HDFS-8833) Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones

2015-07-28 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-8833:
---

 Summary: Erasure coding: store EC schema and cell size with 
INodeFile and eliminate EC zones
 Key: HDFS-8833
 URL: https://issues.apache.org/jira/browse/HDFS-8833
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: HDFS-7285
Reporter: Zhe Zhang
Assignee: Zhe Zhang


We have [discussed | 
https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754]
 storing EC schema with files instead of EC zones and recently revisited the 
discussion under HDFS-8059.

As a recap, the _zone_ concept has severe limitations, including renaming and 
nested configuration. Those limitations are justified for encryption for 
security reasons, but it doesn't make sense to carry them over to EC.

This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For 
simplicity, we should first implement it as an xattr and consider memory 
optimizations (such as moving it to file header) as a follow-on. We should also 
disable changing EC policy on a non-empty file / dir in the first phase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8835) Convert BlockInfoUnderConstruction as an interface

2015-07-28 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-8835:
---

 Summary: Convert BlockInfoUnderConstruction as an interface
 Key: HDFS-8835
 URL: https://issues.apache.org/jira/browse/HDFS-8835
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.7.1
Reporter: Zhe Zhang
Assignee: Zhe Zhang


Per discussion under HDFS-8499, this JIRA aims to convert 
{{BlockInfoUnderConstruction}} as an interface and 
{{BlockInfoContiguousUnderConstruction}} as its implementation. The HDFS-7285 
branch will add {{BlockInfoStripedUnderConstruction}} as another implementation.





[jira] [Resolved] (HDFS-8728) Erasure coding: revisit and simplify BlockInfoStriped and INodeFile

2015-07-28 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-8728.
-
Resolution: Later

Since HDFS-8499 is reopened, closing this one. We should revisit it after 
finalizing the HDFS-8499 discussion.

 Erasure coding: revisit and simplify BlockInfoStriped and INodeFile
 ---

 Key: HDFS-8728
 URL: https://issues.apache.org/jira/browse/HDFS-8728
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Attachments: HDFS-8728-HDFS-7285.00.patch, 
 HDFS-8728-HDFS-7285.01.patch, HDFS-8728-HDFS-7285.02.patch, 
 HDFS-8728-HDFS-7285.03.patch, HDFS-8728.00.patch, HDFS-8728.01.patch, 
 HDFS-8728.02.patch, Merge-1-codec.patch, Merge-2-ecZones.patch, 
 Merge-3-blockInfo.patch, Merge-4-blockmanagement.patch, 
 Merge-5-blockPlacementPolicies.patch, Merge-6-locatedStripedBlock.patch, 
 Merge-7-replicationMonitor.patch, Merge-8-inodeFile.patch








[jira] [Reopened] (HDFS-8499) Refactor BlockInfo class hierarchy with static helper class

2015-07-27 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-8499:
-

 Refactor BlockInfo class hierarchy with static helper class
 ---

 Key: HDFS-8499
 URL: https://issues.apache.org/jira/browse/HDFS-8499
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 2.7.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Fix For: 2.8.0

 Attachments: HDFS-8499.00.patch, HDFS-8499.01.patch, 
 HDFS-8499.02.patch, HDFS-8499.03.patch, HDFS-8499.04.patch, 
 HDFS-8499.05.patch, HDFS-8499.06.patch, HDFS-8499.07.patch, 
 HDFS-8499.UCFeature.patch, HDFS-bistriped.patch


 In HDFS-7285 branch, the {{BlockInfoUnderConstruction}} interface provides a 
 common abstraction for striped and contiguous UC blocks. This JIRA aims to 
 merge it to trunk.





[jira] [Created] (HDFS-8806) Inconsistent metrics: number of missing blocks with replication factor 1 not properly cleared

2015-07-22 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-8806:
---

 Summary: Inconsistent metrics: number of missing blocks with 
replication factor 1 not properly cleared
 Key: HDFS-8806
 URL: https://issues.apache.org/jira/browse/HDFS-8806
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.1
Reporter: Zhe Zhang
Assignee: Zhe Zhang


HDFS-7165 introduced a new metric for the _number of missing blocks with 
replication factor 1_. It is maintained as 
{{UnderReplicatedBlocks#corruptReplOneBlocks}}. However, that counter is not 
reset when the {{UnderReplicatedBlocks}} queues are cleared.
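The inconsistency is the classic pattern of a side counter maintained next to a collection but skipped on clear. A minimal sketch of the pattern (class and method names are modeled on {{UnderReplicatedBlocks}} but are illustrative, not the actual HDFS code):

```java
import java.util.HashSet;
import java.util.Set;

class UnderReplicatedBlocksSketch {
    private final Set<Long> corruptBlocks = new HashSet<>();
    // Side counter for corrupt blocks whose replication factor is 1.
    private int corruptReplOneBlocks = 0;

    void addCorrupt(long blockId, int replication) {
        if (corruptBlocks.add(blockId) && replication == 1) {
            corruptReplOneBlocks++;
        }
    }

    // Buggy shape: clears the queue but forgets the side counter,
    // so the metric goes stale, as reported above.
    void clearBuggy() {
        corruptBlocks.clear();
    }

    // Fixed shape: the counter is reset together with the queue.
    void clearFixed() {
        corruptBlocks.clear();
        corruptReplOneBlocks = 0;
    }

    int getCorruptReplOneBlocks() { return corruptReplOneBlocks; }
    int size() { return corruptBlocks.size(); }
}
```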





[jira] [Created] (HDFS-8801) Convert BlockInfoUnderConstruction as a feature

2015-07-21 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-8801:
---

 Summary: Convert BlockInfoUnderConstruction as a feature
 Key: HDFS-8801
 URL: https://issues.apache.org/jira/browse/HDFS-8801
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.7.1
Reporter: Zhe Zhang


Per discussion under HDFS-8499, with the erasure coding feature there will be 
4 types of {{BlockInfo}} forming a multi-inheritance hierarchy: 
{{complete+contiguous}}, {{complete+striped}}, {{UC+contiguous}}, 
{{UC+striped}}. We faced the same challenge with {{INodeFile}}, and the 
solution was to build feature classes like {{FileUnderConstructionFeature}}. 
This JIRA aims to apply the same idea to {{BlockInfo}}.
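The feature-class idea can be sketched as follows: the under-construction state lives in a small composed object rather than a subclass, so only the storage layout needs inheritance. Names are modeled on HDFS but hypothetical, not the HDFS-8801 patch:

```java
// The UC state as a feature object instead of a subclass, so "UC"
// no longer multiplies the BlockInfo class hierarchy.
class BlockUnderConstructionFeature {
    private final long generationStamp;
    BlockUnderConstructionFeature(long generationStamp) {
        this.generationStamp = generationStamp;
    }
    long getGenerationStamp() { return generationStamp; }
}

// Only the storage layout (contiguous vs. striped) needs subclasses.
class BlockInfoSketch {
    private BlockUnderConstructionFeature ucFeature; // null when complete

    boolean isComplete() { return ucFeature == null; }

    void convertToUnderConstruction(long genStamp) {
        ucFeature = new BlockUnderConstructionFeature(genStamp);
    }

    void convertToComplete() { ucFeature = null; }
}

class BlockInfoContiguousSketch extends BlockInfoSketch { }
class BlockInfoStripedSketch extends BlockInfoSketch { }
```

With this shape the 2x2 matrix (complete/UC x contiguous/striped) collapses to two subclasses plus an optional feature.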





[jira] [Created] (HDFS-8784) BlockInfo#numNodes should be numStorages

2015-07-15 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-8784:
---

 Summary: BlockInfo#numNodes should be numStorages
 Key: HDFS-8784
 URL: https://issues.apache.org/jira/browse/HDFS-8784
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.7.1
Reporter: Zhe Zhang


The method actually returns the number of storages holding a block.





[jira] [Created] (HDFS-8786) Erasure coding: DataNode should transfer striped blocks before being decommissioned

2015-07-15 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-8786:
---

 Summary: Erasure coding: DataNode should transfer striped blocks 
before being decommissioned
 Key: HDFS-8786
 URL: https://issues.apache.org/jira/browse/HDFS-8786
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang


Per [discussion | 
https://issues.apache.org/jira/browse/HDFS-8697?focusedCommentId=14609004&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14609004]
 under HDFS-8697, it is too expensive to reconstruct block groups for 
decommissioning purposes.





[jira] [Created] (HDFS-8787) Erasure coding: rename BlockInfoContiguousUC and BlockInfoStripedUC to be consistent with trunk

2015-07-15 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-8787:
---

 Summary: Erasure coding: rename BlockInfoContiguousUC and 
BlockInfoStripedUC to be consistent with trunk
 Key: HDFS-8787
 URL: https://issues.apache.org/jira/browse/HDFS-8787
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: HDFS-7285
Reporter: Zhe Zhang
Assignee: Zhe Zhang


As Nicholas suggested under HDFS-8728, we should split the patch on 
{{BlockInfo}} structure into smaller pieces.





[jira] [Created] (HDFS-8751) Remove setBlocks API from INodeFile and misc code cleanup

2015-07-10 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-8751:
---

 Summary: Remove setBlocks API from INodeFile and misc code cleanup
 Key: HDFS-8751
 URL: https://issues.apache.org/jira/browse/HDFS-8751
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Zhe Zhang
Assignee: Zhe Zhang


The public {{INodeFile#setBlocks}} API, when used outside {{INodeFile}}, is 
always called with {{null}}. Therefore we should replace it with a safer 
{{clearBlocks}} API. This JIRA also merges miscellaneous code cleanups from 
the HDFS-7285 branch.
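A hypothetical before/after sketch of the API change described above; {{INodeFileSketch}} stands in for {{INodeFile}} and is not the committed patch:

```java
import java.util.ArrayList;
import java.util.List;

class INodeFileSketch {
    private final List<Long> blocks = new ArrayList<>();

    void addBlock(long blockId) { blocks.add(blockId); }

    // Before: external callers invoked a public setBlocks(null), an API
    // that would also permit arbitrary (unsafe) replacement of the list.
    // After: the only operation callers actually need gets its own
    // narrow, intention-revealing method.
    void clearBlocks() { blocks.clear(); }

    int numBlocks() { return blocks.size(); }
}
```

Narrowing a setter that is only ever called with one value into a dedicated method removes the unsafe general case from the public surface.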





[jira] [Resolved] (HDFS-8497) ErasureCodingWorker fails to do decode work

2015-07-09 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-8497.
-
Resolution: Duplicate

Thanks for the comment Jing. Closing this as a duplicate of HDFS-8328.

 ErasureCodingWorker fails to do decode work
 ---

 Key: HDFS-8497
 URL: https://issues.apache.org/jira/browse/HDFS-8497
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Li Bo
Assignee: Li Bo
 Attachments: HDFS-8497-HDFS-7285-01.patch


 When I run the unit test in HDFS-8449, it fails due to the decode error in 
 ErasureCodingWorker.





[jira] [Reopened] (HDFS-8058) Erasure coding: use BlockInfo[] for both striped and contiguous blocks in INodeFile

2015-07-08 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-8058:
-

Per discussion under HDFS-7285 and HDFS-8728, we should revisit the use of 
{{BlockInfoStriped}} and {{BlockInfoContiguous}} before merging into trunk.

 Erasure coding: use BlockInfo[] for both striped and contiguous blocks in 
 INodeFile
 ---

 Key: HDFS-8058
 URL: https://issues.apache.org/jira/browse/HDFS-8058
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7285
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: HDFS-8058.001.patch, HDFS-8058.002.patch


 This JIRA is to use {{BlockInfo[] blocks}} for both striped and contiguous 
 blocks in INodeFile.
 Currently {{FileWithStripedBlocksFeature}} keeps a separate list for striped 
 blocks; its methods duplicate those in INodeFile, and the current code needs 
 to check {{isStriped}} and then do different things. Also, if a file is 
 striped, the {{blocks}} field in INodeFile still occupies the memory of a 
 reference.
 These are not necessary, and we can use the same {{blocks}} to make the code 
 clearer.
 I keep {{FileWithStripedBlocksFeature}} empty for follow-up use: I will file 
 a new JIRA to move {{dataBlockNum}} and {{parityBlockNum}} from 
 *BlockInfoStriped* to INodeFile, since ideally they are the same for all 
 striped blocks in a file, and storing them in each block would waste 
 NameNode memory.
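A rough sketch of the layout proposed above: one blocks array serves both striped and contiguous files, with the striping parameters held per file rather than per block. Class and field names are illustrative, not the actual patch:

```java
class BlockSketch {
    final long id;
    BlockSketch(long id) { this.id = id; }
}

class INodeFileBlocksSketch {
    // One array shared by both layouts; no separate striped-block list.
    private BlockSketch[] blocks = new BlockSketch[0];
    private final boolean striped;
    // Per-file striping parameters: they are the same for every striped
    // block in a file, so storing a copy in each block would waste
    // NameNode memory.
    private final short dataBlockNum;
    private final short parityBlockNum;

    INodeFileBlocksSketch(boolean striped, short dataBlockNum,
                          short parityBlockNum) {
        this.striped = striped;
        this.dataBlockNum = dataBlockNum;
        this.parityBlockNum = parityBlockNum;
    }

    void addBlock(BlockSketch b) {
        BlockSketch[] grown = new BlockSketch[blocks.length + 1];
        System.arraycopy(blocks, 0, grown, 0, blocks.length);
        grown[blocks.length] = b;
        blocks = grown;
    }

    boolean isStriped() { return striped; }
    int numBlocks() { return blocks.length; }
}
```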




