date:20101007

[jira] Updated: (HDFS-1444) Test related code of build.xml is error-prone and needs to be re-aligned.

2010-10-07 Thread Konstantin Boudnik (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-1444:
-

Attachment: HDFS-1444.patch

Here's the patch, which merges the two different properties for test classes' 
locations into one and removes the redundant 'copy' logic from the test-jar target. 

All tests have passed (except for the 6 or 7 well-known failures). The jar files 
for test classes and their source code are the same before and after this 
patch is applied.

> Test related code of build.xml is error-prone and needs to be re-aligned.
> -
>
> Key: HDFS-1444
> URL: https://issues.apache.org/jira/browse/HDFS-1444
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: build
> Affects Versions: 0.21.1
> Reporter: Konstantin Boudnik
> Assignee: Konstantin Boudnik
> Priority: Minor
> Attachments: HDFS-1444.patch
>
>
> The test-related parts of build.xml define at least two (effectively 
> different) destination directories for compiled test classes.
> Extra logic is then applied at, say, the test-jar creation step, where the 
> content of one is copied over to the other, and so on.
> This seems overcomplicated and should be fixed to prevent possible 
> issues with future build modifications.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HDFS-1443) Improve Datanode startup time

2010-10-07 Thread Matt Foley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1443:
-

Description: 
One of the factors slowing down cluster restart is the startup time for the 
Datanodes.  In particular, if Upgrade is needed, the Datanodes must do a 
Snapshot and this can take 5-15 minutes per volume, serially.  Thus, for a 
4-disk datanode, it may be 45 minutes before it is ready to send its initial 
Block Report to the Namenode.  This is an umbrella bug for the following four 
pieces of work to improve Datanode startup time:

1. Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it 
once per directory instead of once per file.  This is the biggest villain, 
responsible for 90% of that 45 minute delay.  See subordinate bug for details.

2. Refactor Upgrade process in DataStorage to run volume-parallel.  There is 
already a bug open for this, HDFS-270, and the volume-parallel work in 
DirectoryScanner from HDFS-854 is a good foundation to build on.

3. Refactor the FSDir() and getVolumeMap() call chains in FSDataset, so they 
share data and run volume-parallel.  Currently the two constructors for 
in-memory directory tree and replicas map run THREE full scans of the entire 
disk - once in FSDir(), once in recoverTempUnlinkedBlock(), and once in 
addToReplicasMap().  During each scan, a new File object is created for each of 
the 100,000 or so items in the native file system (for a 50,000-block node).  
This impacts GC as well as disk traffic.

4. Make getGenerationStampFromFile() more efficient.  Currently this routine is 
called by addToReplicasMap() for every blockfile in the directory tree, and it 
walks the listing of each file's containing directory on every call.  There is 
a simple refactoring that makes this unnecessary.



  was:
One of the factors slowing down cluster restart is the startup time for the 
Datanodes.  In particular, if Upgrade is needed, the Datanodes must do a 
Snapshot and this can take 5-15 minutes per volume, serially.  Thus, for a 
4-disk datanode, it may be 45 minutes before it is ready to send its initial 
Block Report to the Namenode.  This is an umbrella bug for the following four 
pieces of work to improve Datanode startup time:

1. Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it 
once per directory instead of once per file.  This is the biggest villain, 
responsible for 90% of that 45 minute delay.  See subordinate bug for details.

2. Refactor Upgrade process in DataStorage to run volume-parallel.  There is 
already a bug open for this, HDFS-270, and the volume-parallel work in 
DirectoryScanner from HDFS-854 is a good foundation to build on.

3. Refactor the FSDir() and getVolumeMap() call chains in FSDataset, so they 
share data and run volume-parallel.  Currently the two constructors for 
in-memory directory tree and replicas map run THREE full scans of the entire 
disk - once in FSDir(), once in recoverTempUnlinkedBlock(), and once in 
addToReplicasMap().  During each scan, a new File object is created for each of 
the 100,000 or so items in the native file system (for a 50,000-block node).  
This impacts GC as well as disk traffic.

4. Make getGenerationStampFromFile() more efficient.  Currently this routine is 
called by addToReplicasMap() for every blockfile in the directory tree, and it 
does a full listing of each file's containing directory on every call.  This is 
the equivalent of doing lots MORE full disk scans.  The underlying disk i/o 
buffers probably prevent disk thrashing, but we are still creating bazillions 
of unnecessary File objects that need to be GC'ed.  There is a simple 
refactoring that prevents this.




> Improve Datanode startup time
> -
>
> Key: HDFS-1443
> URL: https://issues.apache.org/jira/browse/HDFS-1443
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: data-node
> Affects Versions: 0.20.2
> Reporter: Matt Foley
> Assignee: Matt Foley
> Fix For: 0.22.0
>
>
> One of the factors slowing down cluster restart is the startup time for the 
> Datanodes.  In particular, if Upgrade is needed, the Datanodes must do a 
> Snapshot and this can take 5-15 minutes per volume, serially.  Thus, for a 
> 4-disk datanode, it may be 45 minutes before it is ready to send its initial 
> Block Report to the Namenode.  This is an umbrella bug for the following four 
> pieces of work to improve Datanode startup time:
> 1. Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it 
> once per directory instead of once per file.  This is the biggest villain, 
> responsible for 90% of that 45 minute delay.  See subordinate bug for details.
> 2. Refactor Upgrade process in DataStorage to run volume-parallel.  There is 
> already a bug open for this, HDFS-270

[jira] Updated: (HDFS-1445) Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file

2010-10-07 Thread Matt Foley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1445:
-

Component/s: data-node
Affects Version/s: 0.20.2
Fix Version/s: 0.22.0
Assignee: Matt Foley

> Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it 
> once per directory instead of once per file
> --
>
> Key: HDFS-1445
> URL: https://issues.apache.org/jira/browse/HDFS-1445
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: data-node
> Affects Versions: 0.20.2
> Reporter: Matt Foley
> Assignee: Matt Foley
> Fix For: 0.22.0
>
>
> It was a bit of a puzzle why we can do a full scan of a disk in about 30 
> seconds during FSDir() or getVolumeMap(), but the same disk took 11 minutes 
> to do Upgrade replication via hardlinks.  It turns out that the 
> org.apache.hadoop.fs.FileUtil.createHardLink() method does an outcall to 
> Runtime.getRuntime().exec(), to utilize native filesystem hardlink 
> capability.  So it is forking a full-weight external process, and we call it 
> on each individual file to be replicated.
> As a simple check on the possible cost of this approach, I built a Perl test 
> script (under Linux on a production-class datanode).  Perl also uses a 
> compiled and optimized p-code engine, and it has both native support for 
> hardlinks and the ability to do "exec".  
> -  A simple script to create 256,000 files in a directory tree organized like 
> the Datanode, took 10 seconds to run.
> -  Replicating that directory tree using hardlinks, the same way as the 
> Datanode, took 12 seconds using native hardlink support.
> -  The same replication using outcalls to exec, one per file, took 256 
> seconds!
> -  Batching the calls, and doing 'exec' once per directory instead of once 
> per file, took 16 seconds.
> Obviously, your mileage will vary based on the number of blocks per volume.  
> A volume with less than about 4000 blocks will have only 65 directories.  A 
> volume with more than 4K and less than about 250K blocks will have 4200 
> directories (more or less).  And there are two files per block (the data file 
> and the .meta file).  So the average number of files per directory may vary 
> from 2:1 to 500:1.  A node with 50K blocks and four volumes will have 25K 
> files per volume, or an average of about 6:1.  So this change may be expected 
> to take it down from, say, 12 minutes per volume to 2.


[jira] Created: (HDFS-1447) Make getGenerationStampFromFile() more efficient, so it doesn't reprocess full directory listing for every block

2010-10-07 Thread Matt Foley (JIRA)

Make getGenerationStampFromFile() more efficient, so it doesn't reprocess full 
directory listing for every block


Key: HDFS-1447
URL: https://issues.apache.org/jira/browse/HDFS-1447
Project: Hadoop HDFS
Issue Type: Sub-task
Components: data-node
Affects Versions: 0.20.2
Reporter: Matt Foley
Assignee: Matt Foley
Fix For: 0.22.0


Make getGenerationStampFromFile() more efficient. Currently this routine is 
called by addToReplicasMap() for every blockfile in the directory tree, and it 
walks each file's containing directory on every call. There is a simple 
refactoring that should make it more efficient.
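
A minimal sketch of one possible refactoring, not the actual FSDataset code. It assumes the naming convention in which block file blk_<id> has a metadata file blk_<id>_<genstamp>.meta in the same directory; the class and method names are hypothetical. The directory is listed once, and the generation stamps are indexed by block file name, so each later lookup is a map access rather than another directory walk.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class GenStampIndex {
    private static final String PREFIX = "blk_";
    private static final String META_SUFFIX = ".meta";

    /**
     * Builds a map from block file name (e.g. "blk_123") to generation stamp
     * from a single directory listing. Names that are not well-formed
     * metadata file names are ignored.
     */
    static Map<String, Long> index(String[] namesInDir) {
        Map<String, Long> stamps = new HashMap<>();
        for (String name : namesInDir) {
            if (!name.startsWith(PREFIX) || !name.endsWith(META_SUFFIX)) {
                continue;
            }
            String base = name.substring(0, name.length() - META_SUFFIX.length());
            int sep = base.lastIndexOf('_');
            if (sep < PREFIX.length()) {
                continue; // no "_<genstamp>" part after the block id
            }
            try {
                stamps.put(base.substring(0, sep), Long.parseLong(base.substring(sep + 1)));
            } catch (NumberFormatException e) {
                // not a numeric generation stamp; skip this entry
            }
        }
        return stamps;
    }
}
```

A caller would pass the result of one File.list() call per directory (checking for null first) and then look up each block file name in the returned map.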

This work item is one of four sub-tasks for HDFS-1443, Improve Datanode startup 
time.
The fix will probably be folded into sibling task HDFS-1446, which is already 
refactoring the method that calls getGenerationStampFromFile().



[jira] Created: (HDFS-1446) Refactor the start-time Directory Tree and Replicas Map constructors to share data and run volume-parallel

2010-10-07 Thread Matt Foley (JIRA)

Refactor the start-time Directory Tree and Replicas Map constructors to share 
data and run volume-parallel
--

Key: HDFS-1446
URL: https://issues.apache.org/jira/browse/HDFS-1446
Project: Hadoop HDFS
Issue Type: Sub-task
Components: data-node
Affects Versions: 0.20.2
Reporter: Matt Foley
Assignee: Matt Foley
Fix For: 0.22.0


Refactor the FSDir() and getVolumeMap() call chains in FSDataset, so they share 
data and run volume-parallel. Currently the two constructors for in-memory 
directory tree and replicas map run THREE full scans of the entire disk - once 
in FSDir(), once in recoverTempUnlinkedBlock(), and once in addToReplicasMap(). 
During each scan, a new File object is created for each of the 100,000 or so 
items in the native file system (for a 50,000-block node). This impacts GC as 
well as disk traffic.
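
To illustrate the "share data" part, here is a rough sketch of a single recursive walk that fills both an in-memory directory list and a block-file map in one pass, so each File object is created once. The class, its fields, and the "blk_" / ".meta" name checks are assumptions made for the example; this is not the FSDataset implementation.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-pass scanner, for illustration only.
class SinglePassScan {
    /** Result of one walk: every directory seen, plus block files by name. */
    static class Result {
        final List<File> dirs = new ArrayList<>();
        final Map<String, File> blockFiles = new HashMap<>();
    }

    static Result scan(File root) {
        Result r = new Result();
        walk(root, r);
        return r;
    }

    private static void walk(File dir, Result r) {
        File[] children = dir.listFiles(); // one listing per directory
        if (children == null) {
            return; // not a directory, or an I/O error occurred
        }
        r.dirs.add(dir);
        for (File f : children) {
            String name = f.getName();
            if (f.isDirectory()) {
                walk(f, r);
            } else if (name.startsWith("blk_") && !name.endsWith(".meta")) {
                r.blockFiles.put(name, f);
            }
        }
    }
}
```

Both consumers (the directory tree and the replicas map) could then be built from the same Result instead of each rescanning the disk.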

This work item is one of four sub-tasks for HDFS-1443, Improve Datanode startup 
time.


[jira] Updated: (HDFS-1442) Api to get delegation token in Hdfs

2010-10-07 Thread Jitendra Nath Pandey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-1442:
---

Attachment: HDFS-1442.2.patch

> Api to get delegation token in Hdfs
> ---
>
> Key: HDFS-1442
> URL: https://issues.apache.org/jira/browse/HDFS-1442
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Jitendra Nath Pandey
> Assignee: Jitendra Nath Pandey
> Attachments: HDFS-1442.2.patch
>
>
> FileContext uses Hdfs instead of DistributedFileSystem. We need to add 
> delegation token APIs to the Hdfs class as well.


[jira] Updated: (HDFS-270) DFS Upgrade should process dfs.data.dirs in parallel

2010-10-07 Thread Matt Foley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-270:


Component/s: data-node
Affects Version/s: 0.20.2  (was: 0.22.0)
Tags: datanode startup, volume parallel
Fix Version/s: 0.22.0

The principal issue with datanode upgrade speed turned out to be the per-file 
outcall to "exec"; see HDFS-1445.
However, running the upgrade volume-parallel is still very worthwhile, 
especially as we are moving to 12-disk standard nodes.
Placing this work item under the umbrella bug HDFS-1443.
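
As a sketch of what "volume-parallel" could look like (the class and method names are hypothetical, and this is not the DataStorage code), one task per volume can be handed to a thread pool so that, when the volumes sit on separate disks, the wall-clock time approaches that of the slowest volume rather than the sum of all volumes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, for illustration only.
class VolumeParallel {
    /**
     * Runs one task per volume, each on its own thread, and returns the
     * results in volume order. Blocks until every volume has finished.
     */
    static <T> List<T> runPerVolume(List<Callable<T>> perVolumeTasks)
            throws InterruptedException, ExecutionException {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, perVolumeTasks.size()));
        try {
            List<T> results = new ArrayList<>();
            for (Future<T> f : pool.invokeAll(perVolumeTasks)) {
                results.add(f.get()); // rethrows a volume's failure, if any
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether one thread per volume is the right degree of parallelism depends on the hardware; if several dfs.data.dir entries share one physical device, the gain may be smaller.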

> DFS Upgrade should process dfs.data.dirs in parallel
> 
>
> Key: HDFS-270
> URL: https://issues.apache.org/jira/browse/HDFS-270
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: data-node
> Affects Versions: 0.20.2
> Reporter: Stu Hood
> Assignee: Matt Foley
> Fix For: 0.22.0
>
>
> I just upgraded from 0.14.2 to 0.15.0, and things went very smoothly, if a 
> little slowly.
> The main reason the upgrade took so long was the block upgrades on the 
> datanodes. Each of our datanodes has 3 drives listed for the dfs.data.dir 
> parameter. From looking at the logs, it is fairly clear that the upgrade 
> procedure does not attempt to upgrade all listed dfs.data.dir's in parallel.
> I think even if all of your dfs.data.dir's are on the same physical device, 
> there would still be an advantage to performing the upgrade process in 
> parallel. The less downtime, the better: especially if it is potentially 20 
> minutes versus 60 minutes.


[jira] Updated: (HDFS-270) DFS Upgrade should process dfs.data.dirs in parallel

2010-10-07 Thread Matt Foley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-270:


Issue Type: Sub-task  (was: Improvement)
Parent: HDFS-1443

> DFS Upgrade should process dfs.data.dirs in parallel
> 
>
> Key: HDFS-270
> URL: https://issues.apache.org/jira/browse/HDFS-270
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: 0.22.0
> Reporter: Stu Hood
> Assignee: Matt Foley
>
> I just upgraded from 0.14.2 to 0.15.0, and things went very smoothly, if a 
> little slowly.
> The main reason the upgrade took so long was the block upgrades on the 
> datanodes. Each of our datanodes has 3 drives listed for the dfs.data.dir 
> parameter. From looking at the logs, it is fairly clear that the upgrade 
> procedure does not attempt to upgrade all listed dfs.data.dir's in parallel.
> I think even if all of your dfs.data.dir's are on the same physical device, 
> there would still be an advantage to performing the upgrade process in 
> parallel. The less downtime, the better: especially if it is potentially 20 
> minutes versus 60 minutes.


[jira] Created: (HDFS-1445) Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file

2010-10-07 Thread Matt Foley (JIRA)

Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once 
per directory instead of once per file
--

Key: HDFS-1445
URL: https://issues.apache.org/jira/browse/HDFS-1445
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Matt Foley


It was a bit of a puzzle why we can do a full scan of a disk in about 30 
seconds during FSDir() or getVolumeMap(), but the same disk took 11 minutes to 
do Upgrade replication via hardlinks.  It turns out that the 
org.apache.hadoop.fs.FileUtil.createHardLink() method does an outcall to 
Runtime.getRuntime().exec(), to utilize native filesystem hardlink capability.  
So it is forking a full-weight external process, and we call it on each 
individual file to be replicated.

As a simple check on the possible cost of this approach, I built a Perl test 
script (under Linux on a production-class datanode).  Perl also uses a compiled 
and optimized p-code engine, and it has both native support for hardlinks and 
the ability to do "exec".  
-  A simple script to create 256,000 files in a directory tree organized like 
the Datanode, took 10 seconds to run.
-  Replicating that directory tree using hardlinks, the same way as the 
Datanode, took 12 seconds using native hardlink support.
-  The same replication using outcalls to exec, one per file, took 256 seconds!
-  Batching the calls, and doing 'exec' once per directory instead of once per 
file, took 16 seconds.
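
The batched variant in the last measurement can be sketched in Java roughly as follows. This is an illustration under assumptions, not the FileUtil code: the class and method names are hypothetical, and it relies on the POSIX form "ln source_file... target_dir", which links every listed file into the target directory with a single process.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class BatchedHardLinks {
    /**
     * Builds a single command that links every named file in srcDir into
     * dstDir: ln srcDir/a srcDir/b ... dstDir
     */
    static List<String> buildCommand(File srcDir, String[] fileNames, File dstDir) {
        List<String> argv = new ArrayList<>();
        argv.add("ln");
        for (String name : fileNames) {
            argv.add(new File(srcDir, name).getPath());
        }
        argv.add(dstDir.getPath());
        return argv;
    }

    /** Forks one process for the whole directory instead of one per file. */
    static void linkDirectory(File srcDir, String[] fileNames, File dstDir)
            throws IOException, InterruptedException {
        if (fileNames.length == 0) {
            return; // "ln" with no source files would fail
        }
        Process p = new ProcessBuilder(buildCommand(srcDir, fileNames, dstDir))
                .inheritIO()
                .start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("ln exited with status " + exit);
        }
    }
}
```

A real implementation would probably need to split very large directories into several calls to stay under the operating system's command-line length limit. On JDKs that provide java.nio.file.Files.createLink, native hardlinks without any exec may be an alternative.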

Obviously, your mileage will vary based on the number of blocks per volume.  A 
volume with less than about 4000 blocks will have only 65 directories.  A 
volume with more than 4K and less than about 250K blocks will have 4200 
directories (more or less).  And there are two files per block (the data file 
and the .meta file).  So the average number of files per directory may vary 
from 2:1 to 500:1.  A node with 50K blocks and four volumes will have 25K files 
per volume, or an average of about 6:1.  So this change may be expected to take 
it down from, say, 12 minutes per volume to 2.
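
The 25K-files and 6:1 figures above can be checked with a few lines of arithmetic. The class below is only a worked example; its names are made up, and it assumes the blocks are spread evenly across volumes.

```java
// Hypothetical worked example of the files-per-directory arithmetic.
class FilesPerDirectory {
    /** Two files per block: the data file and its .meta file. */
    static long filesPerVolume(long blocksPerNode, int volumes) {
        return 2 * blocksPerNode / volumes;
    }

    static double averageFilesPerDirectory(long filesPerVolume, int directories) {
        return (double) filesPerVolume / directories;
    }

    public static void main(String[] args) {
        long files = filesPerVolume(50_000, 4);               // 25000 files per volume
        double ratio = averageFilesPerDirectory(files, 4200); // about 5.95, i.e. roughly 6:1
        System.out.println(files + " files per volume, about "
                + Math.round(ratio) + " files per directory");
    }
}
```

With roughly six files per directory, batching removes about five out of every six process launches, which is where the expected saving comes from.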



[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-07 Thread Doug Cutting (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919079#action_12919079
 ] 

Doug Cutting commented on HDFS-1435:


Hairong, Avro's file format has little overhead.  It supports compression.  
However, it assumes that a file is composed of a sequence of entries with the 
same schema.  The fsimage has various sections.  The header information could 
be added as Avro file metadata.  The files and directories, datanodes and files 
under construction are currently written as separate blocks.  Instead, the 
schema for every item might be something like a union of [File, Directory, 
Symlink, DataNode, FileUnderConstruction].

> Provide an option to store fsimage compressed
> -
>
> Key: HDFS-1435
> URL: https://issues.apache.org/jira/browse/HDFS-1435
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: name-node
> Affects Versions: 0.22.0
> Reporter: Hairong Kuang
> Assignee: Hairong Kuang
> Fix For: 0.22.0
>
>
> Our HDFS has an fsimage as big as 20 GB. It consumes a lot of network 
> bandwidth when the secondary NN uploads a new fsimage to the primary NN.
> If we could store the fsimage compressed, the problem could be greatly alleviated.
> I plan to provide a new configuration, hdfs.image.compressed, with a default 
> value of false. If it is set to true, the fsimage is stored compressed.
> The fsimage will have a new layout with a new field "compressed" in its 
> header, indicating whether the namespace is stored compressed or not.
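
As an illustration of the header-flag idea in the quoted proposal, the sketch below writes a one-byte "compressed" flag followed by the payload, and the reader picks its decoder from that flag. Everything here is an assumption made for the example: the class name, the one-byte layout, and the use of gzip (the issue text does not name a codec). It is not the fsimage format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical image layout, for illustration only.
class CompressedImageSketch {
    /** Writes a one-byte flag, then the payload, gzipped if requested. */
    static byte[] write(byte[] namespace, boolean compressed) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream header = new DataOutputStream(bytes);
        header.writeBoolean(compressed);
        header.flush();
        OutputStream body = compressed ? new GZIPOutputStream(bytes) : bytes;
        body.write(namespace);
        body.close(); // writes the gzip trailer when compressing
        return bytes.toByteArray();
    }

    /** Reads the flag first, then decodes the rest accordingly. */
    static byte[] read(byte[] image) throws IOException {
        ByteArrayInputStream bytes = new ByteArrayInputStream(image);
        boolean compressed = new DataInputStream(bytes).readBoolean();
        InputStream body = compressed ? new GZIPInputStream(bytes) : bytes;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int n; (n = body.read(buf)) != -1; ) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

Keeping the flag outside the compressed region lets a reader handle both old-style and compressed images without knowing the configuration that produced them.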


[jira] Created: (HDFS-1444) Test related code of build.xml is error-prone and needs to be re-aligned.

2010-10-07 Thread Konstantin Boudnik (JIRA)

Test related code of build.xml is error-prone and needs to be re-aligned.
-

Key: HDFS-1444
URL: https://issues.apache.org/jira/browse/HDFS-1444
Project: Hadoop HDFS
Issue Type: Bug
Components: build
Affects Versions: 0.21.1
Reporter: Konstantin Boudnik
Assignee: Konstantin Boudnik
Priority: Minor


The test-related parts of build.xml define at least two (effectively 
different) destination directories for compiled test classes.
Extra logic is then applied at, say, the test-jar creation step, where the 
content of one is copied over to the other, and so on.

This seems overcomplicated and should be fixed to prevent possible 
issues with future build modifications.


[jira] Updated: (HDFS-1150) Verify datanodes' identities to clients in secure clusters

2010-10-07 Thread Jakob Homan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1150:
--

Attachment: RequireSecurePorts.patch

Small follow-up patch.  Our Ops team had requested that, during the transition 
to secure ports, the secure datanode give a warning rather than bail out if 
non-privileged ports were specified.  Now that this has been done, this patch 
changes the secure datanode to throw a RuntimeException (RTE) if provided with 
non-privileged ports.  This is for Y!20 only; trunk already has this behavior.
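
The attached patch is not reproduced in this message. As a rough sketch of the behavior described, assuming the usual Unix rule that ports below 1024 are privileged, a fail-fast check might look like this (the class and method names are hypothetical):

```java
// Hypothetical check, for illustration only; not the code in the patch.
class SecurePortCheck {
    /** On Unix-like systems, only root can bind ports below 1024. */
    static boolean isPrivileged(int port) {
        return port > 0 && port < 1024;
    }

    /** Fails fast, rather than only warning, when a port is not privileged. */
    static void requirePrivileged(String name, int port) {
        if (!isPrivileged(port)) {
            throw new RuntimeException("Cannot start secure datanode: " + name
                    + " port " + port + " is not privileged (must be below 1024)");
        }
    }
}
```

The earlier warn-only behavior would correspond to logging the same message instead of throwing.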

> Verify datanodes' identities to clients in secure clusters
> --
>
> Key: HDFS-1150
> URL: https://issues.apache.org/jira/browse/HDFS-1150
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: data-node
> Affects Versions: 0.22.0
> Reporter: Jakob Homan
> Assignee: Jakob Homan
> Fix For: 0.22.0
>
> Attachments: commons-daemon-1.0.2-src.tar.gz, 
> HDFS-1150-BF-Y20-LOG-DIRS-2.patch, HDFS-1150-BF-Y20-LOG-DIRS.patch, 
> HDFS-1150-BF1-Y20.patch, hdfs-1150-bugfix-1.1.patch, 
> hdfs-1150-bugfix-1.2.patch, hdfs-1150-bugfix-1.patch, 
> HDFS-1150-trunk-2.patch, HDFS-1150-trunk-3.patch, HDFS-1150-trunk.patch, 
> HDFS-1150-Y20-BetterJsvcHandling.patch, HDFS-1150-y20.build-script.patch, 
> HDFS-1150-Y20S-ready-5.patch, HDFS-1150-Y20S-ready-6.patch, 
> HDFS-1150-Y20S-ready-7.patch, HDFS-1150-Y20S-ready-8.patch, 
> HDFS-1150-Y20S-Rough-2.patch, HDFS-1150-Y20S-Rough-3.patch, 
> HDFS-1150-Y20S-Rough-4.patch, HDFS-1150-Y20S-Rough.txt, 
> RequireSecurePorts.patch
>
>
> Currently we use block access tokens to allow datanodes to verify clients' 
> identities, however we don't have a way for clients to verify the 
> authenticity of the datanodes themselves.


[jira] Created: (HDFS-1443) Improve Datanode startup time

2010-10-07 Thread Matt Foley (JIRA)

Improve Datanode startup time
-

Key: HDFS-1443
URL: https://issues.apache.org/jira/browse/HDFS-1443
Project: Hadoop HDFS
Issue Type: Improvement
Components: data-node
Affects Versions: 0.20.2
Reporter: Matt Foley
Assignee: Matt Foley
Fix For: 0.22.0


One of the factors slowing down cluster restart is the startup time for the 
Datanodes.  In particular, if Upgrade is needed, the Datanodes must do a 
Snapshot and this can take 5-15 minutes per volume, serially.  Thus, for a 
4-disk datanode, it may be 45 minutes before it is ready to send its initial 
Block Report to the Namenode.  This is an umbrella bug for the following four 
pieces of work to improve Datanode startup time:

1. Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it 
once per directory instead of once per file.  This is the biggest villain, 
responsible for 90% of that 45 minute delay.  See subordinate bug for details.

2. Refactor Upgrade process in DataStorage to run volume-parallel.  There is 
already a bug open for this, HDFS-270, and the volume-parallel work in 
DirectoryScanner from HDFS-854 is a good foundation to build on.

3. Refactor the FSDir() and getVolumeMap() call chains in FSDataset, so they 
share data and run volume-parallel.  Currently the two constructors for 
in-memory directory tree and replicas map run THREE full scans of the entire 
disk - once in FSDir(), once in recoverTempUnlinkedBlock(), and once in 
addToReplicasMap().  During each scan, a new File object is created for each of 
the 100,000 or so items in the native file system (for a 50,000-block node).  
This impacts GC as well as disk traffic.

4. Make getGenerationStampFromFile() more efficient.  Currently this routine is 
called by addToReplicasMap() for every blockfile in the directory tree, and it 
does a full listing of each file's containing directory on every call.  This is 
the equivalent of doing lots MORE full disk scans.  The underlying disk i/o 
buffers probably prevent disk thrashing, but we are still creating bazillions 
of unnecessary File objects that need to be GC'ed.  There is a simple 
refactoring that prevents this.




[jira] Created: (HDFS-1442) Api to get delegation token in Hdfs

2010-10-07 Thread Jitendra Nath Pandey (JIRA)

Api to get delegation token in Hdfs
---

Key: HDFS-1442
URL: https://issues.apache.org/jira/browse/HDFS-1442
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey


FileContext uses Hdfs instead of DistributedFileSystem. We need to add 
delegation token APIs to the Hdfs class as well.


[jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed

2010-10-07 Thread Jeff Hammerbacher (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918800#action_12918800
 ] 

Jeff Hammerbacher commented on HDFS-1435:
-

bq. Jeff, I'd like to take a look at the Avro file format. Do you know if the Avro 
file format has any more overhead than the current fsimage format?

I don't know about the current fsimage format. The Avro format, however, is 
detailed in the Avro spec: 
http://avro.apache.org/docs/current/spec.html#Object+Container+Files

> Provide an option to store fsimage compressed
> -
>
> Key: HDFS-1435
> URL: https://issues.apache.org/jira/browse/HDFS-1435
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: name-node
> Affects Versions: 0.22.0
> Reporter: Hairong Kuang
> Assignee: Hairong Kuang
> Fix For: 0.22.0
>
>
> Our HDFS has an fsimage as big as 20 GB. It consumes a lot of network 
> bandwidth when the secondary NN uploads a new fsimage to the primary NN.
> If we could store the fsimage compressed, the problem could be greatly alleviated.
> I plan to provide a new configuration, hdfs.image.compressed, with a default 
> value of false. If it is set to true, the fsimage is stored compressed.
> The fsimage will have a new layout with a new field "compressed" in its 
> header, indicating whether the namespace is stored compressed or not.

