[jira] Commented: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2008-01-18 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560643#action_12560643
 ] 

Konstantin Shvachko commented on HADOOP-2404:
-

Nah, this won't work either. Sorry, I missed that point when we talked.
The new property actually combines values from two old names, and the old names 
have different value types. E.g., 
{code}
dfs.http.bindAddress = dfs.info.bindAddress + dfs.info.port
{code}

So I need to pass two deprecated names and one new one. That already does not sound 
very generic; besides, I would need to specify the values of the old parameters, 
which makes it messy.
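For illustration, a minimal sketch of the kind of combined lookup this implies, assuming hypothetical class and method names (only the three property names come from the comment above):
{code}
// Hypothetical illustration only: the new key combines a host from one
// deprecated key with a port from another, so a generic one-to-one
// deprecation mapping cannot express it.
import org.apache.hadoop.conf.Configuration;

public class CombinedAddressExample {
  static String getHttpAddress(Configuration conf) {
    // Prefer the new combined key if it is set.
    String addr = conf.get("dfs.http.bindAddress");
    if (addr != null) {
      return addr;
    }
    // Fall back to the two deprecated keys, which have different value
    // types (a host string and an integer port).
    String host = conf.get("dfs.info.bindAddress", "0.0.0.0");
    int port = conf.getInt("dfs.info.port", 50070);
    return host + ":" + port;
  }
}
{code}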

> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: ConfigConvert.patch, ConfigConvert2.patch, 
> ConfigurationConverter.patch
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots of applications/cluster-configurations are prone to fail; hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them in 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2659) The commands in DFSAdmin should require admin privilege

2008-01-18 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560616#action_12560616
 ] 

Konstantin Shvachko commented on HADOOP-2659:
-

+1

> The commands in DFSAdmin should require admin privilege
> ---
>
> Key: HADOOP-2659
> URL: https://issues.apache.org/jira/browse/HADOOP-2659
> Project: Hadoop
>  Issue Type: Bug
>  Components: dfs
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: 2659_20080118.patch, 2659_20080118b.patch
>
>
> The commands in DFSAdmin and the corresponding RPC calls should require admin 
> privilege.
> DFSAdmin commands:
> -report
> -safemode
> -refreshNodes
> -finalizeUpgrade
> -upgradeProgress
> -metasave
> ClientProtocol:
> {code}
> public void renewLease(String clientName) throws IOException;
> public long[] getStats() throws IOException;
> public DatanodeInfo[] getDatanodeReport(FSConstants.DatanodeReportType type) 
> throws IOException;
> public boolean setSafeMode(FSConstants.SafeModeAction action) throws 
> IOException;
> public void refreshNodes() throws IOException;
> public void finalizeUpgrade() throws IOException;
> public UpgradeStatusReport distributedUpgradeProgress(UpgradeAction action) 
> throws IOException;
> public void metaSave(String filename) throws IOException;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2659) The commands in DFSAdmin should require admin privilege

2008-01-18 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560604#action_12560604
 ] 

Konstantin Shvachko commented on HADOOP-2659:
-

- renewLease() does not seem to be an admin command.
- distributedUpgradeProgress() is called by DFSAdmin and by JspHelper. 
In the DFSAdmin case it should be protected, but the web UI does not need to have 
super-user privileges. 
For consistency I would propose treating this operation as available to all 
users in all cases.
I do not see how knowing the upgrade stage can threaten system security. Or 
does it?
- I'd prefer a full name checkSuperuserPermissions() instead of checkIsSuper().
- import of FSConstants.SafeModeAction is redundant because FSNamesystem 
inherits FSConstants.
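For illustration only, a rough sketch of the distinction being proposed, with invented class and field names:
{code}
import java.io.IOException;

// Illustrative-only sketch with hypothetical types: admin-only RPCs call a
// descriptively named superuser check, while the upgrade-progress query is
// left readable by everyone so the web UI keeps working.
class NamesystemSketch {
  interface UpgradeStatusReport {}
  enum UpgradeAction { GET_STATUS }

  private final boolean callerIsSuperuser;   // stand-in for the real user lookup

  NamesystemSketch(boolean callerIsSuperuser) {
    this.callerIsSuperuser = callerIsSuperuser;
  }

  private void checkSuperuserPermissions() throws IOException {
    if (!callerIsSuperuser) {
      throw new IOException("Superuser privilege is required");
    }
  }

  void refreshNodes() throws IOException {
    checkSuperuserPermissions();             // admin-only operation
    // ... re-read the include/exclude node lists ...
  }

  UpgradeStatusReport distributedUpgradeProgress(UpgradeAction action) {
    // deliberately unchecked: upgrade status is not security sensitive
    return null;                             // placeholder
  }
}
{code}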

> The commands in DFSAdmin should require admin privilege
> ---
>
> Key: HADOOP-2659
> URL: https://issues.apache.org/jira/browse/HADOOP-2659
> Project: Hadoop
>  Issue Type: Bug
>  Components: dfs
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: 2659_20080118.patch
>
>
> The commands in DFSAdmin and the corresponding RPC calls should require admin 
> privilege.
> DFSAdmin commands:
> -report
> -safemode
> -refreshNodes
> -finalizeUpgrade
> -upgradeProgress
> -metasave
> ClientProtocol:
> {code}
> public void renewLease(String clientName) throws IOException;
> public long[] getStats() throws IOException;
> public DatanodeInfo[] getDatanodeReport(FSConstants.DatanodeReportType type) 
> throws IOException;
> public boolean setSafeMode(FSConstants.SafeModeAction action) throws 
> IOException;
> public void refreshNodes() throws IOException;
> public void finalizeUpgrade() throws IOException;
> public UpgradeStatusReport distributedUpgradeProgress(UpgradeAction action) 
> throws IOException;
> public void metaSave(String filename) throws IOException;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

2008-01-18 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560590#action_12560590
 ] 

Konstantin Shvachko commented on HADOOP-2549:
-

Yes, if there is doubt about whether we should remove these two warnings, let's not 
do it as part of this patch.
But let's not forget to investigate later on.
+1

> hdfs does not honor dfs.du.reserved setting
> ---
>
> Key: HADOOP-2549
> URL: https://issues.apache.org/jira/browse/HADOOP-2549
> Project: Hadoop
>  Issue Type: Bug
>  Components: dfs
>Affects Versions: 0.14.4
> Environment: FC Linux.
>Reporter: Joydeep Sen Sarma
>Assignee: Hairong Kuang
>Priority: Critical
> Fix For: 0.16.0
>
> Attachments: diskfull.patch, diskfull1.patch, diskfull2.patch
>
>
> running 0.14.4. one of our drives is smaller and is always getting disk full. 
> i reset the disk reservation to 1Gig - but it was filled quickly again.
> i put in some tracing in getnextvolume. the blocksize argument is 0. so every 
> volume (regardless of available space) qualifies. here's the trace:
> /* root disk chosen with 0 available bytes. format is 
> :*/
> 2008-01-08 08:08:51,918 WARN org.apache.hadoop.dfs.DataNode: Volume 
> /var/hadoop/tmp/dfs/data/current:0:0
> /* some other disk chosen with 300G space. */
> 2008-01-08 08:09:21,974 WARN org.apache.hadoop.dfs.DataNode: Volume 
> /mnt/d1/hdfs/current:304725631026:0
> i am going to default blocksize to something reasonable when it's zero for 
> now.
> this is driving us nuts since our automounter starts failing when we run out 
> of space. so everything's broke.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2633) Revert change to fsck made as part of permissions implementation

2008-01-17 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560202#action_12560202
 ] 

Konstantin Shvachko commented on HADOOP-2633:
-

+1

> Revert change to fsck made as part of permissions implementation
> 
>
> Key: HADOOP-2633
> URL: https://issues.apache.org/jira/browse/HADOOP-2633
> Project: Hadoop
>  Issue Type: Bug
>  Components: dfs
>Affects Versions: 0.16.0
>Reporter: Robert Chansler
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: 2633_20080116.patch, 2633_20080117.patch, 
> 2633_20080117b.patch, 2633_20080117c.patch
>
>
> Earlier change has unacceptable performance behavior.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

2008-01-17 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560189#action_12560189
 ] 

Konstantin Shvachko commented on HADOOP-2549:
-

Yes, that should turn on volume switching if one of them is full.
Some comments.
- It is better to move the declaration of estimateBlockSize up, together with all 
the other member declarations.
- Use a JavaDoc-style comment for estimateBlockSize instead of a regular one. 
That way I can see the description whenever I move the cursor over the variable 
in Eclipse.
- Do we plan to apply it to previous releases 0.14 or 0.15? If not, then could 
you please also remove the unused pieces of code:
-# import org.apache.hadoop.io.Text;
-# private void enumerateThreadGroup()
-# short opStatus
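As an illustration of these two points, a hypothetical sketch (only the name estimateBlockSize is taken from the discussion; everything else is invented):
{code}
// Hypothetical sketch; the surrounding code is illustrative only.
class VolumeChooserSketch {
  /**
   * Estimated size of a block, used in place of a requested size of 0 so
   * that a full volume is not selected just because zero bytes "fit".
   * Declared together with the other members, as suggested above.
   */
  private final long estimateBlockSize;

  VolumeChooserSketch(long defaultBlockSize) {
    this.estimateBlockSize = defaultBlockSize;
  }

  long effectiveRequestSize(long requestedBlockSize) {
    // A request of 0 would make every volume qualify regardless of free
    // space (the bug described in this issue), so substitute the estimate.
    return requestedBlockSize <= 0 ? estimateBlockSize : requestedBlockSize;
  }
}
{code}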



> hdfs does not honor dfs.du.reserved setting
> ---
>
> Key: HADOOP-2549
> URL: https://issues.apache.org/jira/browse/HADOOP-2549
> Project: Hadoop
>  Issue Type: Bug
>  Components: dfs
>Affects Versions: 0.14.4
> Environment: FC Linux.
>Reporter: Joydeep Sen Sarma
>Assignee: Hairong Kuang
>Priority: Critical
> Attachments: diskfull.patch, diskfull1.patch
>
>
> running 0.14.4. one of our drives is smaller and is always getting disk full. 
> i reset the disk reservation to 1Gig - but it was filled quickly again.
> i put in some tracing in getnextvolume. the blocksize argument is 0. so every 
> volume (regardless of available space) qualifies. here's the trace:
> /* root disk chosen with 0 available bytes. format is 
> :*/
> 2008-01-08 08:08:51,918 WARN org.apache.hadoop.dfs.DataNode: Volume 
> /var/hadoop/tmp/dfs/data/current:0:0
> /* some other disk chosen with 300G space. */
> 2008-01-08 08:09:21,974 WARN org.apache.hadoop.dfs.DataNode: Volume 
> /mnt/d1/hdfs/current:304725631026:0
> i am going to default blocksize to something reasonable when it's zero for 
> now.
> this is driving us nuts since our automounter starts failing when we run out 
> of space. so everything's broke.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2633) Revert change to fsck made as part of permissions implementation

2008-01-17 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560163#action_12560163
 ] 

Konstantin Shvachko commented on HADOOP-2633:
-

- nn.namesystem.now() should be FSNamesystem.now()
- Methods should be separated by a blank line.
- There are too many methods called getBlockLocationsInternal(). It took me at 
least 20 minutes to understand who is calling whom. Traditionally, the general 
idea of methods and their *Internal counterparts is to distinguish between the 
API methods and their synchronized parts. The synchronized part of the 
implementation is usually called *Internal, and it is also supposed to be private.

I propose the following modifications here:
- getBlockLocationsInternal(String src,long,long) should be renamed to 
getBlockLocations(String src,long,long) because you need to call it in 
NamenodeFsck.
- getBlockLocationsInternal(String clientMachine,String src,long,long)  should 
be removed and the sorting part of it should be placed directly into 
getBlockLocations(String clientMachine,String src,long,long).
- the private getBlockLocationInternal(INodeFile, ...) should be renamed to 
getBlockLocationsInternal(INodeFile, ...) with an 's' in the middle. This was 
probably my fault.

As a result you will have only one private synchronized 
getBlockLocationsInternal() and two getBlockLocations().
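A skeletal sketch of the proposed structure, with placeholder types and shortened parameter lists, just to show who calls whom after the renaming:
{code}
// Hypothetical skeleton of the structure proposed above: two public
// getBlockLocations() entry points and a single private synchronized
// getBlockLocationsInternal(). Types and bodies are placeholders.
class BlockLocationSketch {
  static class LocatedBlocks {}
  static class INodeFile {}

  /** Entry point also usable by NamenodeFsck. */
  LocatedBlocks getBlockLocations(String src, long offset, long length) {
    INodeFile inode = lookup(src);                    // placeholder lookup
    return getBlockLocationsInternal(inode, offset, length);
  }

  /** Client-facing entry point: reuses the method above, then sorts. */
  LocatedBlocks getBlockLocations(String clientMachine, String src,
                                  long offset, long length) {
    LocatedBlocks blocks = getBlockLocations(src, offset, length);
    // ... sort block locations by proximity to clientMachine ...
    return blocks;
  }

  /** The only synchronized internal implementation. */
  private synchronized LocatedBlocks getBlockLocationsInternal(
      INodeFile inode, long offset, long length) {
    return new LocatedBlocks();                       // placeholder
  }

  private INodeFile lookup(String src) { return new INodeFile(); }
}
{code}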

> Revert change to fsck made as part of permissions implementation
> 
>
> Key: HADOOP-2633
> URL: https://issues.apache.org/jira/browse/HADOOP-2633
> Project: Hadoop
>  Issue Type: Bug
>  Components: dfs
>Affects Versions: 0.16.0
>Reporter: Robert Chansler
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: 2633_20080116.patch
>
>
> Earlier change has unacceptable performance behavior.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2634) Deprecate exists() and isDir() to simplify ClientProtocol.

2008-01-17 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560094#action_12560094
 ] 

Konstantin Shvachko commented on HADOOP-2634:
-

- Right now getFileInfo() - an *HDFS* variant of getFileStatus() - throws
{code} IOException("File does not exist: " + srcs); {code}
- *LocalFileSystem* does not throw anything but actually returns a valid 
FileStatus with some default values.
- *S3FileSystem* throws 
{code} IOException(f.toString() + ": No such file or directory."); {code} 
- And *kfs* does not seem to be throwing anything, just like LocalFileSystem; 
please correct me if I'm wrong.

So this is all really inconsistent. To make it consistent I would vote for 
throwing rather than returning null, but throwing FileNotFoundException instead 
of the base IOException. That would make the implementation of exists() rather 
simple.
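A minimal sketch of how exists() becomes trivial once getFileStatus() throws FileNotFoundException for missing paths; the abstract class here is only a stand-in for the real FileSystem:
{code}
import java.io.FileNotFoundException;
import java.io.IOException;

// Illustrative sketch only: if getFileStatus() throws FileNotFoundException
// for missing paths (instead of returning null or a base IOException),
// exists() reduces to a trivial convenience wrapper.
abstract class FileSystemSketch {
  abstract Object getFileStatus(String path) throws IOException;

  boolean exists(String path) throws IOException {
    try {
      getFileStatus(path);
      return true;
    } catch (FileNotFoundException e) {
      return false;
    }
  }
}
{code}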

> Deprecate exists() and isDir() to simplify ClientProtocol.
> --
>
> Key: HADOOP-2634
> URL: https://issues.apache.org/jira/browse/HADOOP-2634
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
>
> ClientProtocol can be simplified by removing two methods
> {code}
> public boolean exists(String src) throws IOException;
> public boolean isDir(String src) throws IOException;
> {code}
> This is a redundant api, which can be implemented in DFSClient as convenience 
> methods using
> {code}
> public DFSFileInfo getFileInfo(String src) throws IOException;
> {code}
> Note that we already deprecated several FileSystem methods and advised to use 
> getFileStatus() instead.
> Should we deprecate them in 0.16?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-1742) FSNamesystem.startFile() javadoc is inconsistent

2008-01-16 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-1742:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed this.

> FSNamesystem.startFile()  javadoc is inconsistent
> -
>
> Key: HADOOP-1742
> URL: https://issues.apache.org/jira/browse/HADOOP-1742
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Affects Versions: 0.14.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Minor
> Fix For: 0.16.0
>
> Attachments: JDocCorrect.patch, JDocCorrect.patch, JDocCorrect.patch, 
> JDocCorrect2.patch
>
>
> FSNamesystem.startFile()  description should be updated. 
> It talks about arrays of blocks that are supposed to be returned, but returns 
> void.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HADOOP-2634) Deprecate exists() and isDir() to simplify ClientProtocol.

2008-01-16 Thread Konstantin Shvachko (JIRA)
Deprecate exists() and isDir() to simplify ClientProtocol.
--

 Key: HADOOP-2634
 URL: https://issues.apache.org/jira/browse/HADOOP-2634
 Project: Hadoop
  Issue Type: Improvement
  Components: dfs
Affects Versions: 0.15.0
Reporter: Konstantin Shvachko


ClientProtocol can be simplified by removing two methods
{code}
public boolean exists(String src) throws IOException;
public boolean isDir(String src) throws IOException;
{code}
This is a redundant api, which can be implemented in DFSClient as convenience 
methods using
{code}
public DFSFileInfo getFileInfo(String src) throws IOException;
{code}
Note that we already deprecated several FileSystem methods and advised to use 
getFileStatus() instead.
Should we deprecate them in 0.16?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-1742) FSNamesystem.startFile() javadoc is inconsistent

2008-01-16 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-1742:


Attachment: JDocCorrect2.patch

I incorporated all the comments, added some formatting to the JavaDoc, and clarified 
the rename and delete documentation. Thanks, Nicholas.

> FSNamesystem.startFile()  javadoc is inconsistent
> -
>
> Key: HADOOP-1742
> URL: https://issues.apache.org/jira/browse/HADOOP-1742
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Affects Versions: 0.14.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Minor
> Fix For: 0.16.0
>
> Attachments: JDocCorrect.patch, JDocCorrect.patch, JDocCorrect.patch, 
> JDocCorrect2.patch
>
>
> FSNamesystem.startFile()  description should be updated. 
> It talks about arrays of blocks that are supposed to be returned, but returns 
> void.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

2008-01-16 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-1989:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed this. Thank you Sanjay!

> Add support for simulated Data Nodes  - helpful for testing and performance 
> benchmarking of the Name Node without having a large cluster
> 
>
> Key: HADOOP-1989
> URL: https://issues.apache.org/jira/browse/HADOOP-1989
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Affects Versions: 0.16.0
>Reporter: Sanjay Radia
>Assignee: Sanjay Radia
>Priority: Minor
> Fix For: 0.16.0
>
> Attachments: SimulatedStoragePatchSubmit.txt, 
> SimulatedStoragePatchSubmit5.txt, SimulatedStoragePatchSubmit6.txt, 
> SimulatedStoragePatchSubmit7.txt, SimulatedStoragePatchSubmit8.txt, 
> SimulatedStoragePatchSubmit9.patch
>
>
> Proposal is to add an implementation for a Simulated Data Node.
> This will 
>   - allow one to test certain parts of the system (especially the Name Node, 
> protocols) much more easily and efficiently.
>   - allow one to run performance benchmarks on the Name node without having a 
> large cluster.
>   - Inject faults for testing (e.g. one can add random faults based on 
> probability parameters).
> The idea is that the Simulated Data Node will
>  - discard any data written to blocks (but remember the blocks and their 
> sizes)
>  - generate fixed data on the fly when blocks are read (e.g. block is fixed 
> set of bytes or repeated sequence of strings).
> The Simulated Data Node can also be used for fault injection.
> The data node can be parameterized with probabilities that allow one to 
> control:
>   - Delays on reads and writes, creates, etc
>   - IO Exceptions
>  - Loss of blocks 
>  - Failures
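A hypothetical sketch of the simulated-block idea described above (names invented; data is discarded on write and regenerated as a fixed pattern on read):
{code}
import java.io.InputStream;

// Hypothetical sketch only: a "simulated block" remembers the length of the
// data written to it, discards the bytes themselves, and reads back a fixed
// repeating byte pattern of that length.
class SimulatedBlockSketch {
  private long length;                 // only the size is remembered
  private static final byte FILL = (byte) 'S';

  void write(byte[] data, int off, int len) {
    length += len;                     // discard the bytes, keep the count
  }

  InputStream readStream() {
    return new InputStream() {
      private long pos;
      @Override public int read() {
        return pos++ < length ? (FILL & 0xff) : -1;
      }
    };
  }
}
{code}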

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2449) Restore the old NN Bench that was replaced by a MR NN Bench

2008-01-16 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2449:


   Resolution: Fixed
Fix Version/s: 0.16.0
   Status: Resolved  (was: Patch Available)

I just committed this. Thank you Sanjay!

> Restore the  old NN Bench that was replaced by a MR NN Bench
> 
>
> Key: HADOOP-2449
> URL: https://issues.apache.org/jira/browse/HADOOP-2449
> Project: Hadoop
>  Issue Type: Test
>Reporter: Sanjay Radia
>Assignee: Sanjay Radia
> Fix For: 0.16.0
>
> Attachments: fixNNBenchPatch.txt
>
>
> The old NN Bench did not use Map Reduce.
> It was replaced by a new NN Bench that uses Map reduce.
> The old NN Bench is useful and should be restored.
>   - useful for simulated data nodes, which do not work for Map Reduce since 
> the job configs need to be persistent.
>   - a NN test that is independent of map reduce can be useful as it is one 
> less variable in figuring out bottlenecks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2561) /tmp/hadoop-${user}/dfs/tmp/tmp/client-${long}.tmp is not cleanup correctly

2008-01-15 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559325#action_12559325
 ] 

Konstantin Shvachko commented on HADOOP-2561:
-

Good catch.
It looks like the problem here is more general than what your solution does.
The client writes to the backup file until it reaches the block size. 
So if the client fails anywhere between two individual writes, the backupFile will 
not be deleted. A simple try-catch will not solve it, and there is not much you 
can do about it.
This issue should probably be resolved as a part of HADOOP-1707, which intends 
to eliminate the local backup file completely.
Benjamin, could you please take a look at HADOOP-1707 to make sure the solution 
works for you?

> /tmp/hadoop-${user}/dfs/tmp/tmp/client-${long}.tmp is not cleanup correctly
> ---
>
> Key: HADOOP-2561
> URL: https://issues.apache.org/jira/browse/HADOOP-2561
> Project: Hadoop
>  Issue Type: Bug
>  Components: dfs, fs
>Affects Versions: 0.14.0
>Reporter: Benjamin Francisoud
> Attachments: patch-DFSClient-HADOOP-2561.diff
>
>
> Directory "/tmp/hadoop-${user}/dfs/tmp/tmp" is being filled with these kinds 
> of files: client-226966559287638337420857.tmp
> I tried to look at the code and found:
> h3. DFSClient.java
> src/java/org/apache/hadoop/dfs/DFSClient.java
> {code:java}
> private void closeBackupStream() throws IOException {...}
> /* Similar to closeBackupStream(). Theoritically deleting a file
>  * twice could result in deleting a file that we should not.
>  */
> private void deleteBackupFile() {...}
> private File newBackupFile() throws IOException {
> String name = "tmp" + File.separator +
>  "client-" + Math.abs(r.nextLong());
> File result = dirAllocator.createTmpFileForWrite(name,
>2 * blockSize,
>conf);
> return result;
> }
> {code}
> h3. LocalDirAllocator
> src/java/org/apache/hadoop/fs/LocalDirAllocator.java#AllocatorPerContext.java
> {code:java}
> /** Creates a file on the local FS. Pass size as -1 if not known apriori. We
>  *  round-robin over the set of disks (via the configured dirs) and return
>  *  a file on the first path which has enough space. The file is guaranteed
>  *  to go away when the JVM exits.
>  */
> public File createTmpFileForWrite(String pathStr, long size,
> Configuration conf) throws IOException {
> // find an appropriate directory
> Path path = getLocalPathForWrite(pathStr, size, conf);
> File dir = new File(path.getParent().toUri().getPath());
> String prefix = path.getName();
> // create a temp file on this directory
> File result = File.createTempFile(prefix, null, dir);
> result.deleteOnExit();
> return result;
> }
> {code}
> First, it seems to me it's a bit of a mess here: I don't know whether it's 
> DFSClient.java#deleteBackupFile() or 
> LocalDirAllocator#createTmpFileForWrite() (deleteOnExit()) that is called ... 
> or both. Why not keep it DRY and delete it only once?
> But the most important point is the "deleteOnExit();", since it means that if 
> the process is never restarted, the files will never be deleted :(

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2605) leading slash in mapred.task.tracker.report.bindAddress

2008-01-15 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2605:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed this.

> leading slash in mapred.task.tracker.report.bindAddress
> ---
>
> Key: HADOOP-2605
> URL: https://issues.apache.org/jira/browse/HADOOP-2605
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf, mapred
>Affects Versions: 0.16.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: TrackerBindAddress.patch
>
>
> TaskTracker incorrectly sets mapred.task.tracker.report.bindAddress with a 
> slash in front of the host:port pair.
> This is described in more detail here: 
> [Deveraj|http://issues.apache.org/jira/browse/HADOOP-2404#action_12554551] 
> and 
> [Konstantin|http://issues.apache.org/jira/browse/HADOOP-2404#action_12554859]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-1742) FSNamesystem.startFile() javadoc is inconsistent

2008-01-15 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-1742:


Status: Patch Available  (was: Open)

> FSNamesystem.startFile()  javadoc is inconsistent
> -
>
> Key: HADOOP-1742
> URL: https://issues.apache.org/jira/browse/HADOOP-1742
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Affects Versions: 0.14.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Minor
> Fix For: 0.16.0
>
> Attachments: JDocCorrect.patch, JDocCorrect.patch, JDocCorrect.patch
>
>
> FSNamesystem.startFile()  description should be updated. 
> It talks about arrays of blocks that are supposed to be returned, but returns 
> void.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-1742) FSNamesystem.startFile() javadoc is inconsistent

2008-01-15 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-1742:


Attachment: JDocCorrect.patch

This reflects current trunk.

> FSNamesystem.startFile()  javadoc is inconsistent
> -
>
> Key: HADOOP-1742
> URL: https://issues.apache.org/jira/browse/HADOOP-1742
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Affects Versions: 0.14.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Minor
> Fix For: 0.16.0
>
> Attachments: JDocCorrect.patch, JDocCorrect.patch, JDocCorrect.patch
>
>
> FSNamesystem.startFile()  description should be updated. 
> It talks about arrays of blocks that are supposed to be returned, but returns 
> void.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-1742) FSNamesystem.startFile() javadoc is inconsistent

2008-01-15 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-1742:


Status: Open  (was: Patch Available)

> FSNamesystem.startFile()  javadoc is inconsistent
> -
>
> Key: HADOOP-1742
> URL: https://issues.apache.org/jira/browse/HADOOP-1742
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Affects Versions: 0.14.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Minor
> Fix For: 0.16.0
>
> Attachments: JDocCorrect.patch, JDocCorrect.patch, JDocCorrect.patch
>
>
> FSNamesystem.startFile()  description should be updated. 
> It talks about arrays of blocks that are supposed to be returned, but returns 
> void.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2601) TestNNThroughput should not use a fixed namenode port

2008-01-14 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2601:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed this.

> TestNNThroughput should not use a fixed namenode port
> -
>
> Key: HADOOP-2601
> URL: https://issues.apache.org/jira/browse/HADOOP-2601
> Project: Hadoop
>  Issue Type: Bug
>  Components: dfs
>Affects Versions: 0.15.2
>Reporter: Hairong Kuang
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: TestThrghputBenchZeroPort.patch
>
>
> TestNNThroughput failed with the following error: 
> Address already in use
> java.net.BindException: Address already in use
> at java.net.PlainSocketImpl.socketBind(Native Method)
> at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:359)
> at java.net.ServerSocket.bind(ServerSocket.java:319)
> at java.net.ServerSocket.(ServerSocket.java:185)
> at 
> org.mortbay.util.ThreadedServer.newServerSocket(ThreadedServer.java:391)
> at org.mortbay.util.ThreadedServer.open(ThreadedServer.java:477)
> at org.mortbay.util.ThreadedServer.start(ThreadedServer.java:503)
> at org.mortbay.http.SocketListener.start(SocketListener.java:203)
> at org.mortbay.http.HttpServer.doStart(HttpServer.java:761)
> at org.mortbay.util.Container.start(Container.java:72)
> at 
> org.apache.hadoop.mapred.StatusHttpServer.start(StatusHttpServer.java:182)
> at 
> org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)
> at org.apache.hadoop.dfs.FSNamesystem.(FSNamesystem.java:223)
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:129)
> at org.apache.hadoop.dfs.NameNode.(NameNode.java:174)
> at org.apache.hadoop.dfs.NameNode.(NameNode.java:160)
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:849)
> at 
> org.apache.hadoop.dfs.NNThroughputBenchmark.(NNThroughputBenchmark.java:57)
> at 
> org.apache.hadoop.dfs.NNThroughputBenchmark.runBenchmark(NNThroughputBenchmark.java:752)
> at 
> org.apache.hadoop.dfs.TestNNThroughputBenchmark.testNNThroughput(TestNNThroughputBenchmark.java:15)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (HADOOP-2385) Validate configuration parameters

2008-01-14 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558880#action_12558880
 ] 

shv edited comment on HADOOP-2385 at 1/14/08 4:44 PM:
--

I agree, visibility of accessors should depend on whether they are used 
publicly or internally.
Say getters for parameters used only in FSNamesystem should be package private.
Some getters may be public, but the corresponding setters may not.

> find the most-specific public class that encompasses the use and add the 
> accessor there.

I would prefer creating new classes solely dedicated to configuration logic 
rather than including the implementation of accessors in existing public classes. 
IMO this makes for better-structured code.


  was (Author: shv):
I agree, visibility of accessors should depend on whether they are used 
publicly or internally.
Say getters for parameters used only in FSNamesystem should package private.
Some getters may be public, but the corresponding setters may not.

> find the most-specific public class that encompasses the use and add the 
> accessor there.

I would prefer creating new classes solely dedicated to configuration logic 
rather then including implementation of accessors in existing public classes. 
Imo this makes a better structured code.

  
> Validate configuration parameters
> -
>
> Key: HADOOP-2385
> URL: https://issues.apache.org/jira/browse/HADOOP-2385
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Affects Versions: 0.16.0
>Reporter: Robert Chansler
>
> Configuration parameters should be fully validated before name nodes or data 
> nodes begin service.
> Required parameters must be present.
> Required and optional parameters must have values of proper type and range.
> Undefined parameters must not be present.
> (I was recently observing some confusion whose root cause was a mis-spelled 
> parameter.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2385) Validate configuration parameters

2008-01-14 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558880#action_12558880
 ] 

Konstantin Shvachko commented on HADOOP-2385:
-

I agree, visibility of accessors should depend on whether they are used 
publicly or internally.
Say getters for parameters used only in FSNamesystem should be package private.
Some getters may be public, but the corresponding setters may not.

> find the most-specific public class that encompasses the use and add the 
> accessor there.

I would prefer creating new classes solely dedicated to configuration logic 
rather than including the implementation of accessors in existing public classes. 
IMO this makes for better-structured code.
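As an illustration of the dedicated-configuration-class idea, a hypothetical sketch; the class name and configuration keys here are invented:
{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical example of a class dedicated to configuration logic, as
// suggested above: package-private getters for parameters that only the
// name-node package needs, instead of accessors spread over public classes.
// The key names and defaults are illustrative, not actual parameters.
class NamesystemConfSketch {
  private final Configuration conf;

  NamesystemConfSketch(Configuration conf) {
    this.conf = conf;
  }

  /** Package-private: used only inside the name-node package. */
  int getHandlerCount() {
    return conf.getInt("example.namenode.handler.count", 10);
  }

  /** Public getter without a matching public setter. */
  public long getBlockSize() {
    return conf.getLong("example.block.size", 64 * 1024 * 1024);
  }
}
{code}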


> Validate configuration parameters
> -
>
> Key: HADOOP-2385
> URL: https://issues.apache.org/jira/browse/HADOOP-2385
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Affects Versions: 0.16.0
>Reporter: Robert Chansler
>
> Configuration parameters should be fully validated before name nodes or data 
> nodes begin service.
> Required parameters must be present.
> Required and optional parameters must have values of proper type and range.
> Undefined parameters must not be present.
> (I was recently observing some confusion whose root cause was a mis-spelled 
> parameter.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-1742) FSNamesystem.startFile() javadoc is inconsistent

2008-01-14 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-1742:


Attachment: JDocCorrect.patch

A few spelling corrections.

> FSNamesystem.startFile()  javadoc is inconsistent
> -
>
> Key: HADOOP-1742
> URL: https://issues.apache.org/jira/browse/HADOOP-1742
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Affects Versions: 0.14.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Minor
> Fix For: 0.16.0
>
> Attachments: JDocCorrect.patch, JDocCorrect.patch
>
>
> FSNamesystem.startFile()  description should be updated. 
> It talks about arrays of blocks that are supposed to be returned, but returns 
> void.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2605) leading slash in mapred.task.tracker.report.bindAddress

2008-01-14 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2605:


Status: Patch Available  (was: Open)

> leading slash in mapred.task.tracker.report.bindAddress
> ---
>
> Key: HADOOP-2605
> URL: https://issues.apache.org/jira/browse/HADOOP-2605
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf, mapred
>Affects Versions: 0.16.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: TrackerBindAddress.patch
>
>
> TaskTracker incorrectly sets mapred.task.tracker.report.bindAddress with a 
> slash in front of the host:port pair.
> This is described in more detail here: 
> [Deveraj|http://issues.apache.org/jira/browse/HADOOP-2404#action_12554551] 
> and 
> [Konstantin|http://issues.apache.org/jira/browse/HADOOP-2404#action_12554859]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2605) leading slash in mapred.task.tracker.report.bindAddress

2008-01-14 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2605:


Attachment: TrackerBindAddress.patch

This is a one line patch that fixes the problem.

> leading slash in mapred.task.tracker.report.bindAddress
> ---
>
> Key: HADOOP-2605
> URL: https://issues.apache.org/jira/browse/HADOOP-2605
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf, mapred
>Affects Versions: 0.16.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: TrackerBindAddress.patch
>
>
> TaskTracker incorrectly sets mapred.task.tracker.report.bindAddress with a 
> slash in front of the host:port pair.
> This is described in more detail here: 
> [Deveraj|http://issues.apache.org/jira/browse/HADOOP-2404#action_12554551] 
> and 
> [Konstantin|http://issues.apache.org/jira/browse/HADOOP-2404#action_12554859]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HADOOP-2605) leading slash in mapred.task.tracker.report.bindAddress

2008-01-14 Thread Konstantin Shvachko (JIRA)
leading slash in mapred.task.tracker.report.bindAddress
---

 Key: HADOOP-2605
 URL: https://issues.apache.org/jira/browse/HADOOP-2605
 Project: Hadoop
  Issue Type: Bug
  Components: conf, mapred
Affects Versions: 0.16.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Fix For: 0.16.0


TaskTracker incorrectly sets mapred.task.tracker.report.bindAddress with a 
slash in front of the host:port pair.
This is described in more detail here: 
[Deveraj|http://issues.apache.org/jira/browse/HADOOP-2404#action_12554551] and 
[Konstantin|http://issues.apache.org/jira/browse/HADOOP-2404#action_12554859]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-1742) FSNamesystem.startFile() javadoc is inconsistent

2008-01-14 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-1742:


Fix Version/s: 0.16.0
 Assignee: Konstantin Shvachko
   Status: Patch Available  (was: Open)

> FSNamesystem.startFile()  javadoc is inconsistent
> -
>
> Key: HADOOP-1742
> URL: https://issues.apache.org/jira/browse/HADOOP-1742
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Affects Versions: 0.14.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Minor
> Fix For: 0.16.0
>
> Attachments: JDocCorrect.patch
>
>
> FSNamesystem.startFile()  description should be updated. 
> It talks about arrays of blocks that are supposed to be returned, but returns 
> void.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-1742) FSNamesystem.startFile() javadoc is inconsistent

2008-01-14 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-1742:


Attachment: JDocCorrect.patch

This is a JavaDoc-only patch.
I changed documentation for startFile() and some other FSNamesystem methods.
I also noticed that our ClientProtocol JavaDoc is outdated in some places. So I 
wrote some new comments.
DFSClient had a lot of JavaDoc warnings after security patches. I corrected 
that too.

> FSNamesystem.startFile()  javadoc is inconsistent
> -
>
> Key: HADOOP-1742
> URL: https://issues.apache.org/jira/browse/HADOOP-1742
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Affects Versions: 0.14.0
>Reporter: Konstantin Shvachko
>Priority: Minor
> Fix For: 0.16.0
>
> Attachments: JDocCorrect.patch
>
>
> FSNamesystem.startFile()  description should be updated. 
> It talks about arrays of blocks that are supposed to be returned, but returns 
> void.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HADOOP-2586) Add version to servers' startup message.

2008-01-14 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HADOOP-2586.
-

   Resolution: Duplicate
Fix Version/s: 0.16.0

> Add version to servers' startup message.
> 
>
> Key: HADOOP-2586
> URL: https://issues.apache.org/jira/browse/HADOOP-2586
> Project: Hadoop
>  Issue Type: Improvement
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
> Fix For: 0.16.0
>
>
> It would be useful if hadoop servers printed hadoop version as a part of the 
> startup message:
> {code}
> /
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = my-hadoop-host
> STARTUP_MSG:   args = [-upgrade]
> STARTUP_MSG: Version = 0.15.1, r599161
> /
> {code}
> This would simplify understanding the logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2601) TestNNThroughput should not use a fixed namenode port

2008-01-14 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2601:


Attachment: TestThrghputBenchZeroPort.patch

I agree. Here is the patch to fix it.

> TestNNThroughput should not use a fixed namenode port
> -
>
> Key: HADOOP-2601
> URL: https://issues.apache.org/jira/browse/HADOOP-2601
> Project: Hadoop
>  Issue Type: Bug
>  Components: dfs
>Affects Versions: 0.15.2
>Reporter: Hairong Kuang
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: TestThrghputBenchZeroPort.patch
>
>
> TestNNThroughput failed with the following error: 
> Address already in use
> java.net.BindException: Address already in use
> at java.net.PlainSocketImpl.socketBind(Native Method)
> at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:359)
> at java.net.ServerSocket.bind(ServerSocket.java:319)
> at java.net.ServerSocket.(ServerSocket.java:185)
> at 
> org.mortbay.util.ThreadedServer.newServerSocket(ThreadedServer.java:391)
> at org.mortbay.util.ThreadedServer.open(ThreadedServer.java:477)
> at org.mortbay.util.ThreadedServer.start(ThreadedServer.java:503)
> at org.mortbay.http.SocketListener.start(SocketListener.java:203)
> at org.mortbay.http.HttpServer.doStart(HttpServer.java:761)
> at org.mortbay.util.Container.start(Container.java:72)
> at 
> org.apache.hadoop.mapred.StatusHttpServer.start(StatusHttpServer.java:182)
> at 
> org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)
> at org.apache.hadoop.dfs.FSNamesystem.(FSNamesystem.java:223)
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:129)
> at org.apache.hadoop.dfs.NameNode.(NameNode.java:174)
> at org.apache.hadoop.dfs.NameNode.(NameNode.java:160)
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:849)
> at 
> org.apache.hadoop.dfs.NNThroughputBenchmark.(NNThroughputBenchmark.java:57)
> at 
> org.apache.hadoop.dfs.NNThroughputBenchmark.runBenchmark(NNThroughputBenchmark.java:752)
> at 
> org.apache.hadoop.dfs.TestNNThroughputBenchmark.testNNThroughput(TestNNThroughputBenchmark.java:15)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2601) TestNNThroughput should not use a fixed namenode port

2008-01-14 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2601:


Status: Patch Available  (was: Open)

> TestNNThroughput should not use a fixed namenode port
> -
>
> Key: HADOOP-2601
> URL: https://issues.apache.org/jira/browse/HADOOP-2601
> Project: Hadoop
>  Issue Type: Bug
>  Components: dfs
>Affects Versions: 0.15.2
>Reporter: Hairong Kuang
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: TestThrghputBenchZeroPort.patch
>
>
> TestNNThroughput failed with the following error: 
> Address already in use
> java.net.BindException: Address already in use
> at java.net.PlainSocketImpl.socketBind(Native Method)
> at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:359)
> at java.net.ServerSocket.bind(ServerSocket.java:319)
> at java.net.ServerSocket.(ServerSocket.java:185)
> at 
> org.mortbay.util.ThreadedServer.newServerSocket(ThreadedServer.java:391)
> at org.mortbay.util.ThreadedServer.open(ThreadedServer.java:477)
> at org.mortbay.util.ThreadedServer.start(ThreadedServer.java:503)
> at org.mortbay.http.SocketListener.start(SocketListener.java:203)
> at org.mortbay.http.HttpServer.doStart(HttpServer.java:761)
> at org.mortbay.util.Container.start(Container.java:72)
> at 
> org.apache.hadoop.mapred.StatusHttpServer.start(StatusHttpServer.java:182)
> at 
> org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)
> at org.apache.hadoop.dfs.FSNamesystem.(FSNamesystem.java:223)
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:129)
> at org.apache.hadoop.dfs.NameNode.(NameNode.java:174)
> at org.apache.hadoop.dfs.NameNode.(NameNode.java:160)
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:849)
> at 
> org.apache.hadoop.dfs.NNThroughputBenchmark.(NNThroughputBenchmark.java:57)
> at 
> org.apache.hadoop.dfs.NNThroughputBenchmark.runBenchmark(NNThroughputBenchmark.java:752)
> at 
> org.apache.hadoop.dfs.TestNNThroughputBenchmark.testNNThroughput(TestNNThroughputBenchmark.java:15)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HADOOP-2601) TestNNThroughput should not use a fixed namenode port

2008-01-14 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko reassigned HADOOP-2601:
---

Assignee: Konstantin Shvachko

> TestNNThroughput should not use a fixed namenode port
> -
>
> Key: HADOOP-2601
> URL: https://issues.apache.org/jira/browse/HADOOP-2601
> Project: Hadoop
>  Issue Type: Bug
>  Components: dfs
>Affects Versions: 0.15.2
>Reporter: Hairong Kuang
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: TestThrghputBenchZeroPort.patch
>
>
> TestNNThroughput failed with the following error: 
> Address already in use
> java.net.BindException: Address already in use
> at java.net.PlainSocketImpl.socketBind(Native Method)
> at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:359)
> at java.net.ServerSocket.bind(ServerSocket.java:319)
> at java.net.ServerSocket.(ServerSocket.java:185)
> at 
> org.mortbay.util.ThreadedServer.newServerSocket(ThreadedServer.java:391)
> at org.mortbay.util.ThreadedServer.open(ThreadedServer.java:477)
> at org.mortbay.util.ThreadedServer.start(ThreadedServer.java:503)
> at org.mortbay.http.SocketListener.start(SocketListener.java:203)
> at org.mortbay.http.HttpServer.doStart(HttpServer.java:761)
> at org.mortbay.util.Container.start(Container.java:72)
> at 
> org.apache.hadoop.mapred.StatusHttpServer.start(StatusHttpServer.java:182)
> at 
> org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)
> at org.apache.hadoop.dfs.FSNamesystem.(FSNamesystem.java:223)
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:129)
> at org.apache.hadoop.dfs.NameNode.(NameNode.java:174)
> at org.apache.hadoop.dfs.NameNode.(NameNode.java:160)
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:849)
> at 
> org.apache.hadoop.dfs.NNThroughputBenchmark.(NNThroughputBenchmark.java:57)
> at 
> org.apache.hadoop.dfs.NNThroughputBenchmark.runBenchmark(NNThroughputBenchmark.java:752)
> at 
> org.apache.hadoop.dfs.TestNNThroughputBenchmark.testNNThroughput(TestNNThroughputBenchmark.java:15)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HADOOP-1015) slaves are not recognized by name

2008-01-11 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HADOOP-1015.
-

   Resolution: Cannot Reproduce
Fix Version/s: 0.16.0

This looks like a stale issue. It should not matter whether you specify slaves 
by names or IP addresses, as long as your shell recognizes where to ssh. I don't 
have Ubuntu to try this, but it seems to work in my environment with current 
trunk.
I am closing it, but please feel free to reopen and describe the problem in more 
detail if it persists.

> slaves are not recognized by name
> -
>
> Key: HADOOP-1015
> URL: https://issues.apache.org/jira/browse/HADOOP-1015
> Project: Hadoop
>  Issue Type: Bug
>  Components: dfs
>Affects Versions: 0.10.1
> Environment: Ubuntu 6.06 
>Reporter: moz devil
>Priority: Minor
> Fix For: 0.16.0
>
>
> After upgrading from nutch 0.8.1 (which has Hadoop 0.4.0) to nutch 0.9.0 (with 
> hadoop 0.10.1), the datanodes were started with bin/start-all.sh but did 
> not appear in the Hadoop Map/Reduce Administration screen. Only the datanode 
> where the namenode is also running appeared. I was using local DNS names, 
> which worked fine with hadoop 0.4.0. Now I use IP addresses, which cause no 
> problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HADOOP-2586) Add version to servers' startup message.

2008-01-11 Thread Konstantin Shvachko (JIRA)
Add version to servers' startup message.


 Key: HADOOP-2586
 URL: https://issues.apache.org/jira/browse/HADOOP-2586
 Project: Hadoop
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Konstantin Shvachko


It would be useful if hadoop servers printed hadoop version as a part of the 
startup message:
{code}
/
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = my-hadoop-host
STARTUP_MSG:   args = [-upgrade]
STARTUP_MSG: Version = 0.15.1, r599161
/
{code}

This would simplify understanding the logs.
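A hedged sketch of what the extra line could look like, assuming Hadoop's VersionInfo utility (getVersion()/getRevision()) is available; treat the exact API as an assumption:
{code}
import org.apache.hadoop.util.VersionInfo;

// Sketch only: assumes VersionInfo.getVersion()/getRevision() are available,
// and appends one version line to the startup banner shown above.
public class StartupMessageSketch {
  public static void main(String[] args) {
    System.out.println("STARTUP_MSG: Starting NameNode");
    System.out.println("STARTUP_MSG:   host = " + getHostName());
    System.out.println("STARTUP_MSG:   args = " + java.util.Arrays.asList(args));
    System.out.println("STARTUP_MSG:   version = " + VersionInfo.getVersion()
        + ", r" + VersionInfo.getRevision());
  }

  private static String getHostName() {
    try {
      return java.net.InetAddress.getLocalHost().getHostName();
    } catch (java.net.UnknownHostException e) {
      return "unknown-host";
    }
  }
}
{code}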

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2585) Automatic namespace recovery from the secondary image.

2008-01-11 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558173#action_12558173
 ] 

Konstantin Shvachko commented on HADOOP-2585:
-

We had a real example of such a failure on one of our clusters,
and we were able to reconstruct the namespace image from the secondary node 
using the following manual procedure, which might be useful for those who find 
themselves in the same type of trouble.

h4. Manual recovery procedure from the secondary image.
# Stop the cluster to make sure all data-nodes and *-tracker are down.
# Select a node where you will run the new name-node, and set it up as usual 
for a name-node.
# Format the new name-node.
# cd /current
# You will see the file VERSION in there. You will need to provide the namespaceID 
of the old cluster in it. 
The old namespaceID can be obtained from one of the data-nodes: 
just copy it from /current/VERSION.namespaceID
# rm /current/fsimage
# scp :/destimage.tmp ./fsimage
# Start the cluster. Upgrade is recommended, so that you can roll back if 
something goes wrong.
# Run fsck, and remove files with missing blocks if any.

h4. Automatic recovery proposal.
The proposal has two parts.
# The secondary node should store the latest check-pointed image file in 
compliance with the name-node storage directory structure. It is best if the 
secondary node uses the Storage class 
(or FSImage if code re-use makes sense here) in order to maintain the 
checkpoint directory.
This should ensure that the checkpointed image is always ready to be read by a 
name-node
if the directory is listed in its "dfs.name.dir" list.
# The name-node should consider the configuration variable "fs.checkpoint.dir" 
as a possible
location of the image, available for read-only access during startup.
This means that if the name-node finds all directories listed in "dfs.name.dir" 
unavailable, or
finds their images corrupted, then it should turn to the "fs.checkpoint.dir" 
directory
and try to fetch the image from there. I think this should not be the default 
behavior but 
rather be triggered by a name-node startup option, something like:
{code}
hadoop namenode -fromCheckpoint
{code}
So the name-node can start with the secondary image as long as the secondary 
node drive is mounted.
And the name-node will never attempt to write anything to this drive.
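A very rough control-flow sketch of the fallback described in part 2; all names and the option handling here are hypothetical, and the real image-loading logic is far more involved:
{code}
import java.io.File;
import java.io.IOException;
import java.util.List;

// Hypothetical control-flow sketch of the proposal above; none of these
// names exist in the code base.
class ImageRecoverySketch {
  File chooseImage(List<File> nameDirs, File checkpointDir,
                   boolean fromCheckpointOption) throws IOException {
    for (File dir : nameDirs) {
      File image = new File(dir, "current/fsimage");
      if (image.exists() && isReadable(image)) {
        return image;                       // normal case: dfs.name.dir
      }
    }
    if (fromCheckpointOption) {
      // Started as: hadoop namenode -fromCheckpoint (the proposed option).
      // Read-only fallback to the secondary's checkpoint directory.
      return new File(checkpointDir, "current/fsimage");
    }
    throw new IOException("No usable namespace image found");
  }

  private boolean isReadable(File image) {
    return image.canRead() && image.length() > 0;
  }
}
{code}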

h4. Added bonuses provided by this approach
- One can choose to restart a failed name-node directly on the node where the 
secondary node ran.
This brings us a step closer to a hot standby.
- Replication of the image to NFS can be delegated to the secondary name-node 
if we support multiple entries in "fs.checkpoint.dir". This is, of course, only 
if the administrator
chooses to accept outdated images in order to boost name-node performance.


> Automatic namespace recovery from the secondary image.
> --
>
> Key: HADOOP-2585
> URL: https://issues.apache.org/jira/browse/HADOOP-2585
> Project: Hadoop
>  Issue Type: New Feature
>  Components: dfs
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
>
> Hadoop has a three-way (configuration-controlled) protection from losing the 
> namespace image.
> # image can be replicated on different hard-drives of the same node;
> # image can be replicated on a nfs mounted drive on an independent node;
> # a stale replica of the image is created during periodic checkpointing and 
> stored on the secondary name-node.
> Currently during startup the name-node examines all configured storage 
> directories, selects the
> most up to date image, reads it, merges with the corresponding edits, and 
> writes to the new image back 
> into all storage directories. Everything is done automatically.
> If, due to multiple hardware failures, none of those images on mounted hard 
> drives (local or remote) 
> are available, the secondary image, although stale (up to one hour old by 
> default), can still be 
> used to recover the majority of the file system data.
> Currently one can reconstruct a valid name-node image from the secondary one 
> manually.
> It would be nice to support an automatic recovery.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HADOOP-2585) Automatic namespace recovery from the secondary image.

2008-01-11 Thread Konstantin Shvachko (JIRA)
Automatic namespace recovery from the secondary image.
--

 Key: HADOOP-2585
 URL: https://issues.apache.org/jira/browse/HADOOP-2585
 Project: Hadoop
  Issue Type: New Feature
  Components: dfs
Affects Versions: 0.15.0
Reporter: Konstantin Shvachko


Hadoop has a three-way (configuration-controlled) protection from losing the 
namespace image.
# image can be replicated on different hard-drives of the same node;
# image can be replicated on a nfs mounted drive on an independent node;
# a stale replica of the image is created during periodic checkpointing and 
stored on the secondary name-node.

Currently during startup the name-node examines all configured storage 
directories, selects the
most up to date image, reads it, merges with the corresponding edits, and 
writes to the new image back 
into all storage directories. Everything is done automatically.

If due to multiple hardware failures none of those images on mounted hard 
drives (local or remote) 
are available the secondary image although stale (up to one hour old by 
default) can be still 
used in order to recover the majority of the file system data.
Currently one can reconstruct a valid name-node image from the secondary one 
manually.
It would be nice to support an automatic recovery.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-1873) User permissions for Map/Reduce

2008-01-10 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-1873:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed this. Thank you Hairong.

> User permissions for Map/Reduce
> ---
>
> Key: HADOOP-1873
> URL: https://issues.apache.org/jira/browse/HADOOP-1873
> Project: Hadoop
>  Issue Type: Improvement
>Affects Versions: 0.15.1
>Reporter: Raghu Angadi
>Assignee: Hairong Kuang
> Fix For: 0.16.0
>
> Attachments: mapred.patch, mapred2.patch, mapred3.patch, 
> mapred4.patch, mapred5.patch, mapred6.patch, mapred7.patch, mapred8.patch
>
>
> HADOOP-1298 and HADOOP-1701 add permissions and pluggable security for DFS 
> files and DFS accesses. Same users permission should work for Map/Reduce jobs 
> as well. 
> User persmission should propegate from client to map/reduce tasks and all the 
> file operations should be subject to user permissions. This is transparent to 
> the user (i.e. no changes to user code should be required). 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2540) Empty blocks make fsck report corrupt, even when it isn't

2008-01-10 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557861#action_12557861
 ] 

Konstantin Shvachko commented on HADOOP-2540:
-

> The namenode was not cleaning up the last block on lease recovery.

You mean it was not cleaning up the last block if it is a one-block file, right?
This looks right to me.
The only thing I don't like is exposing the ability to change lease intervals directly 
through NameNode calls.
I'd rather introduce undocumented configuration variables; we used to do this 
in the past, afair.

- FSDataOutputStream stm in TestFileCreation.testFileCreationError2() is not 
used anywhere.
Could you please also remove other warnings in this file.
- import org.apache.hadoop.fs.FsShell; is redundant
  import org.apache.hadoop.util.StringUtils; is redundant
- TEST_ROOT_DIR is never read locally.
- "this.assertEquals" should be replaced by simply "assertEquals" because 
assertEquals() is a static method.

> Empty blocks make fsck report corrupt, even when it isn't
> -
>
> Key: HADOOP-2540
> URL: https://issues.apache.org/jira/browse/HADOOP-2540
> Project: Hadoop
>  Issue Type: Bug
>  Components: dfs
>Affects Versions: 0.15.1
>Reporter: Allen Wittenauer
>Assignee: dhruba borthakur
>Priority: Blocker
> Fix For: 0.15.3
>
> Attachments: recoverLastBlock.patch
>
>
> If the name node crashes after blocks have been allocated and before the 
> content has been uploaded, fsck will report the zero sized files as corrupt 
> upon restart:
> /user/rajive/rand0/_task_200712121358_0001_m_000808_0/part-00808: MISSING 1 
> blocks of total size 0 B
> ... even though all blocks are accounted for:
> Status: CORRUPT
>  Total size:2932802658847 B
>  Total blocks:  26603 (avg. block size 110243305 B)
>  Total dirs:419
>  Total files:   5031
>  Over-replicated blocks:197 (0.740518 %)
>  Under-replicated blocks:   0 (0.0 %)
>  Target replication factor: 3
>  Real replication factor:   3.0074053
> The filesystem under path '/' is CORRUPT
> In UFS and related filesystems, such files would get put into lost+found 
> after an fsck and the filesystem would return back to normal.  It would be 
> super if HDFS could do a similar thing.  Perhaps if all of the nodes stored 
> in the name node's 'includes' file have reported in, HDFS could automatically 
> run a fsck and store these not-necessarily-broken files in something like 
> lost+found.  
> Files that are actually missing blocks, however, should not be touched.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2385) Validate configuration parameters

2008-01-09 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557471#action_12557471
 ] 

Konstantin Shvachko commented on HADOOP-2385:
-

JobConf does not set hdfs parameters, but I agree users might want to 
set/access inter-component parameters.
My idea was to use a carefully designed hierarchy of configuration classes and 
interfaces.
For example, TaskTrackerConfiguration could be inherited from DFSClientConfiguration.
But I see that the static approach you propose is simpler.
This means that the configuration classes should be public then, right?
And it doesn't matter where the get/setters are. In particular, we could combine 
all of them in one class
or even place them in the Configuration class. Is that what you want?

> I don't find the argument that FSNamesystem is already too big compelling.

Yes, the size is not important. What I meant is that we keep placing logically 
independent
code inside, e.g., FSNamesystem, which makes it bigger, while it could easily be 
made a separate class.
Configuration is just one example of such a logically independent part.
If you write a converter for a parameter or add a verification constraint, 
those changes belong to
the configuration only, namely to the implementation of the getters; why modify 
FSNamesystem if it only calls them?
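
As a minimal sketch of that separation (the class name, getter, and default 
value are hypothetical; the point is only that parsing and validation live in 
the configuration class, not in FSNamesystem):
{code}
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;

// Hypothetical component-specific configuration class: conversion and
// validation of "dfs.http.bindAddress" happen here, so the component code
// only ever calls the typed getter.
public class NameNodeConfiguration extends Configuration {
  public InetSocketAddress getHttpBindAddress() {
    String addr = get("dfs.http.bindAddress", "0.0.0.0:50070");
    int colon = addr.indexOf(':');
    if (colon < 0) {
      throw new IllegalArgumentException(
          "dfs.http.bindAddress must be host:port, got " + addr);
    }
    return new InetSocketAddress(addr.substring(0, colon),
                                 Integer.parseInt(addr.substring(colon + 1)));
  }
}
{code}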


> Validate configuration parameters
> -
>
> Key: HADOOP-2385
> URL: https://issues.apache.org/jira/browse/HADOOP-2385
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Affects Versions: 0.16.0
>Reporter: Robert Chansler
>
> Configuration parameters should be fully validated before name nodes or data 
> nodes begin service.
> Required parameters must be present.
> Required and optional parameters must have values of proper type and range.
> Undefined parameters must not be present.
> (I was recently observing some confusion whose root cause was a mis-spelled 
> parameter.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2008-01-09 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557455#action_12557455
 ] 

Konstantin Shvachko commented on HADOOP-2404:
-

> If you decline to fix it in a way that others approve,

I submitted two patches, and I do not see anybody except for you disapproving 
the last one.
I am not declining to make the changes to the configuration that we talked about here, 
just in a different jira,
because they require more discussion while this issue is considered a 
blocker by many.

> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: ConfigConvert.patch, ConfigConvert2.patch, 
> ConfigurationConverter.patch
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots are applications/cluster-configurations are prone to fail hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them for 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2008-01-09 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557433#action_12557433
 ] 

Konstantin Shvachko commented on HADOOP-2404:
-

> That's a change from what you stated in action_12550831.

I don't see any contradiction. Besides, I have already said that many times since 
then; see a little bit further down,
[#action_12551673] and [#action_12553565].

Providing partial conversion is a compromise which I am willing to accept.
It is temporary, for one release only; that is why all the related code is deprecated 
in my patch.

> You imply that I am asking this issue to fix a few instances of a widespread 
> problem unrelated to the issue.

"some processing" of exactly these parameters was introduced in HADOOP-1085. [I 
opposed it 
then.|https://issues.apache.org/jira/browse/HADOOP-1085?focusedCommentId=12483800#action_12483800]
 You just committed it.
If dozens of files is not widespread then what is.

The only thing I agree with you upon is that configuration needs accessors.
But I do not agree that they should be introdueced in this patch, which will 
lead to massive changes,
and I do not support *static* accessors, and I do not see any of these is 
supported by anybody else.

This argument is going on for almost a month now. I do not find it productive.
Unless the compromise proposed by Dhruba is acceptable for you, 
I am planning to submit the bug fix mentioned by Devaraj in a separate issue, 
and let somebody else deal with this one.
I mean, people can have different opinions, what do you do with that.

> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: ConfigConvert.patch, ConfigConvert2.patch, 
> ConfigurationConverter.patch
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots are applications/cluster-configurations are prone to fail hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them for 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2385) Validate configuration parameters

2008-01-09 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557424#action_12557424
 ] 

Konstantin Shvachko commented on HADOOP-2385:
-

Why do setters need to be static? What is wrong with this:
{code}
class NameNodeConfig extends Configuration {
  /** Sets the combined host:port value of "dfs.http.bindAddress". */
  void setHttpBindAddress(String host, int port) {
    set("dfs.http.bindAddress", host + ":" + port);
  }
}
{code}
Why per-package, not per-component?

Serialization is common for all of them: it is the one defined in Configuration.
And the configuration files (hadoop-default.xml, hadoop-site.xml) are all the 
same.
Just the classes are different, because their accessors are different.
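
For example, a caller might then do something like this (the address and port 
are just illustrative values):
{code}
// The wrapper reads the same hadoop-default.xml / hadoop-site.xml resources
// as any Configuration; only its accessor surface is component-specific.
NameNodeConfig conf = new NameNodeConfig();
conf.setHttpBindAddress("0.0.0.0", 50070);
{code}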

> Validate configuration parameters
> -
>
> Key: HADOOP-2385
> URL: https://issues.apache.org/jira/browse/HADOOP-2385
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Affects Versions: 0.16.0
>Reporter: Robert Chansler
>
> Configuration parameters should be fully validated before name nodes or data 
> nodes begin service.
> Required parameters must be present.
> Required and optional parameters must have values of proper type and range.
> Undefined parameters must not be present.
> (I was recently observing some confusion whose root cause was a mis-spelled 
> parameter.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2008-01-08 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557109#action_12557109
 ] 

Konstantin Shvachko commented on HADOOP-2404:
-

It sounds like you are under the impression that the original patch was just 
about renaming a bunch of configuration parameters. 
It was not. It was about prohibiting port rolling. This is a security issue. 
Security changes are always like that: people are irritated when you impose 
more restrictions on them. 
And it is an incompatible change by its nature. What is the point of 
restricting something if you let everybody go around it?

Suppose that I had left all the parameters unchanged; then people would have had a 
hard (even harder) time understanding what was going wrong with their code. 
That is why I thought it was appropriate to change the names at the same time the 
semantics of the parameters were changing.
I thought, and still think, it is fairer not to provide any backward 
compatibility at all in this case than to provide partial compatibility 
in the form of old-name recognition.
I will not repeat all the arguments again, but it looks like they convinced 
Hemanth and turned Arun into a willing-to-listen state.

I understand your irritation with the configuration issues, but I don't 
understand why you blame my patch, or equally any other patch, for not dealing with them.

I do not favor the idea of creating static getters for configuration parameters 
in the NameNode, TaskTracker, etc. classes. I just commented on that in HADOOP-2385.



> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: ConfigConvert.patch, ConfigConvert2.patch, 
> ConfigurationConverter.patch
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots are applications/cluster-configurations are prone to fail hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them for 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2385) Validate configuration parameters

2008-01-08 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557108#action_12557108
 ] 

Konstantin Shvachko commented on HADOOP-2385:
-

We already have JobConf derived from Configuration.
I am just proposing to extend this approach to the other components.
The Configuration itself should remain the same for every component.
That is, it reads the same config files with the same override rules, and with 
the same properties inside.
It just exposes get methods specific to the component.
Conversion from one class to another can easily be provided by constructors 
like 
{code} FsConfiguration(Configuration conf) {code}
And then you can pass an RPCConfiguration into it as a parameter, since the latter 
is a subclass of Configuration and contains all the properties related to 
rpc.
So the configuration class for each component is just a wrapper that exposes methods and
provides validation and, potentially, conversion of the parameters related to 
the component.
I do not support the idea of placing static getters for configuration 
parameters in the (top-level) component 
classes, because I would prefer to separate all configuration issues from the 
component code.
FSNamesystem is again almost 4K lines long, and JobTracker and TaskTracker are over 2K 
lines.
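
As a minimal sketch of that wrapper idea (FsConfiguration and the getter shown 
are illustrative names, not existing classes):
{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical component-specific wrapper: the copy constructor lets any
// Configuration (including another component's subclass, e.g. an
// RPCConfiguration) be handed to code expecting an FsConfiguration.
public class FsConfiguration extends Configuration {
  public FsConfiguration(Configuration conf) {
    super(conf);                       // copies the other configuration's settings
  }
  public String getDefaultFsName() {
    return get("fs.default.name", "file:///");
  }
}
{code}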


> Validate configuration parameters
> -
>
> Key: HADOOP-2385
> URL: https://issues.apache.org/jira/browse/HADOOP-2385
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Affects Versions: 0.16.0
>Reporter: Robert Chansler
>
> Configuration parameters should be fully validated before name nodes or data 
> nodes begin service.
> Required parameters must be present.
> Required and optional parameters must have values of proper type and range.
> Undefined parameters must not be present.
> (I was recently observing some confusion whose root cause was a mis-spelled 
> parameter.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2447) HDFS should be capable of limiting the total number of inodes in the system

2008-01-08 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557087#action_12557087
 ] 

Konstantin Shvachko commented on HADOOP-2447:
-

+1

> HDFS should be capable of limiting the total number of inodes in the system
> ---
>
> Key: HADOOP-2447
> URL: https://issues.apache.org/jira/browse/HADOOP-2447
> Project: Hadoop
>  Issue Type: New Feature
>  Components: dfs
>Reporter: Sameer Paranjpye
>Assignee: dhruba borthakur
> Fix For: 0.16.0
>
> Attachments: fileLimit.patch, fileLimit2.patch, fileLimit3.patch
>
>
> The HDFS Namenode should be capable of limiting the total number of Inodes 
> (files + directories). The can be done through a config variable, settable in 
> hadoop-site.xml. The default should be no limit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2446) TestHDFSServerPorts fails.

2008-01-08 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557009#action_12557009
 ] 

Konstantin Shvachko commented on HADOOP-2446:
-

+1 on Nigel's patch.
Yes, this is the right fix. 
The test failed not because something was running on a specific port but because 
the server did not shut down after the previous run.

> TestHDFSServerPorts fails.
> --
>
> Key: HADOOP-2446
> URL: https://issues.apache.org/jira/browse/HADOOP-2446
> Project: Hadoop
>  Issue Type: Bug
>  Components: dfs, test
>Affects Versions: 0.16.0
>Reporter: Raghu Angadi
>Assignee: Nigel Daley
> Fix For: 0.16.0
>
> Attachments: HADOOP-2446.patch, HADOOP-2446.patch, 
> TEST-org.apache.hadoop.dfs.TestHDFSServerPorts.txt
>
>
> This might be because I already have Namenode running on my machine. Its 
> better if the unit tests could tolerate another DFS instance running on the 
> same machine.  Otherwise we might get used to seeing unit test failures and 
> miss the new failures.
> I will attach the test output.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2447) HDFS should be capable of limiting the total number of inodes in the system

2008-01-07 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556807#action_12556807
 ] 

Konstantin Shvachko commented on HADOOP-2447:
-

Yes I set dfs.max.objects to 15, which is less than the number of objects in my 
file system.


> HDFS should be capable of limiting the total number of inodes in the system
> ---
>
> Key: HADOOP-2447
> URL: https://issues.apache.org/jira/browse/HADOOP-2447
> Project: Hadoop
>  Issue Type: New Feature
>  Components: dfs
>Reporter: Sameer Paranjpye
>Assignee: dhruba borthakur
> Fix For: 0.16.0
>
> Attachments: fileLimit.patch, fileLimit2.patch, fileLimit3.patch
>
>
> The HDFS Namenode should be capable of limiting the total number of Inodes 
> (files + directories). The can be done through a config variable, settable in 
> hadoop-site.xml. The default should be no limit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2447) HDFS should be capable of limiting the total number of inodes in the system

2008-01-07 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556766#action_12556766
 ] 

Konstantin Shvachko commented on HADOOP-2447:
-

{code}
  volatile private long totalNodes = 1;   // number of inodes, for rootdir
{code}
totalNodes should be totalINodes; otherwise it is not clear which nodes are 
being referred to, 
e.g. data-nodes or nodes related to network topology.
{code}
  private long maxFsObjects = 0;  // maximum allowed inodes.
{code}
The comment should say "objects" rather than "inodes". Also this member 
logically belongs to FSNamesystem, because

- FSDirectory has knowledge only about INodes, but not blocks.
- Traditionally we were trying to keep all configurable parameters inside 
FSNamesystem and set them 
using setConfigurationParameters(). I don't see why we should do any different 
here.

Then the next step would be to move checkFsObjectLimit() to FSNamesystem from 
FSDirectory.
It also looks that you can call checkFsObjectLimit() in the FSNamesystem methods
rather than inside FSDirectory after that.
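
A minimal sketch of what that could look like (the field names and counting 
details are assumptions for illustration, not the attached patch):
{code}
// Hypothetical FSNamesystem fragment: the limit is kept with the other
// configurable parameters, and the check is called from FSNamesystem
// methods before new files, directories, or blocks are allocated.
private long maxFsObjects = 0;        // dfs.max.objects; 0 means "no limit"
private long totalFilesAndDirs = 1;   // counting the root directory
private long totalBlocks = 0;

private void checkFsObjectLimit() throws IOException {
  long total = totalFilesAndDirs + totalBlocks;
  if (maxFsObjects > 0 && total >= maxFsObjects) {
    throw new IOException("Exceeded the configured limit of " + maxFsObjects
        + " objects (files + directories + blocks) in the file system");
  }
}
{code}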

The statistics are a really good idea. Should we display them the same way the 
other stat fields are displayed?
Something like:
{code}
DFS Used%   : 0 %
Live Nodes  : 0
Dead Nodes  : 0
Files and directories   : 49
Blocks  : 36
Total objects   : 85 (100%) out of max allowed 50
Name-node Heap Size: 74.38 MB / 733.81 MB (10%) 
{code}

The number of files and directories displayed is inconsistent with the number 
reported by fsck.
Fsck apparently does not count the root directory as an entry.
I'd say fsck is wrong, but the important thing is that they should be consistent.

The percentage of objects should not exceed 100%. Right now it is reported as:
{code}
 20 files and directories, 17 blocks = 37 total / 15 (246%). Heap Size is 50.94 
MB / 733 MB (6%)
{code}

I was not able to apply this patch to the current trunk after the permissions 
patch.

> HDFS should be capable of limiting the total number of inodes in the system
> ---
>
> Key: HADOOP-2447
> URL: https://issues.apache.org/jira/browse/HADOOP-2447
> Project: Hadoop
>  Issue Type: New Feature
>  Components: dfs
>Reporter: Sameer Paranjpye
>Assignee: dhruba borthakur
> Fix For: 0.16.0
>
> Attachments: fileLimit.patch, fileLimit2.patch
>
>
> The HDFS Namenode should be capable of limiting the total number of Inodes 
> (files + directories). The can be done through a config variable, settable in 
> hadoop-site.xml. The default should be no limit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-1298) adding user info to file

2007-12-28 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554874
 ] 

Konstantin Shvachko commented on HADOOP-1298:
-

The NNThroughputBenchmark is pretty much useless for evaluating performance with 
the latest attachment.
The problem is that NameNode uses the static Server.getUserInfo() method in order 
to get the credentials of the user.
NNThroughputBenchmark is not supposed to have any servers, since it directly 
calls the name-node methods.
This was totally broken with the previous patch; I fixed it so that Server 
performs a login if there are no current calls.
But now the benchmark runs forever because it performs a login, which is a shell 
command, on each call.
So the performance of creates measured by the benchmark drops 100-fold.
I hacked it to return my identity instead of doing the login in order to get the right 
performance numbers.

The real numbers for creating 10,000 files are around *11 - 12%* lower with this 
patch compared to the trunk.
I think this is expected and acceptable. We will be able to optimize it from 
here.

But the credential problem should be fixed.
IMHO we should have an explicit login() call in the ClientProtocol, and the 
name-node should cache credentials
for each client. This will also work for the NNThroughputBenchmark, which will 
be able to call login() directly on the NameNode.
Besides, passing credentials with each RPC call is less secure than sending them 
just once at login.
We can make it even bulletproof in the future by exchanging private keys 
between the client and the server, for
the purpose of just this one call, when they perform a handshake.
Anyway, the current solution for passing client credentials looks more like 
delivering them through a back door.
This was not introduced by this patch (HADOOP-2184), but it should be fixed here 
before the patch is committed.
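
A rough sketch of the shape of that change (the method signature is an 
assumption for illustration, not an agreed design):
{code}
import java.io.IOException;

// Hypothetical addition to ClientProtocol: the client authenticates once,
// the name-node caches the resulting identity per client, and subsequent
// RPC calls no longer need to carry credentials.
public interface ClientProtocol {
  /** Authenticates the caller; the name-node caches the user's identity. */
  void login(String userName, String[] groupNames) throws IOException;

  // ... existing ClientProtocol methods remain unchanged ...
}
{code}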

> adding user info to file
> 
>
> Key: HADOOP-1298
> URL: https://issues.apache.org/jira/browse/HADOOP-1298
> Project: Hadoop
>  Issue Type: New Feature
>  Components: dfs, fs
>Affects Versions: 0.16.0
>Reporter: Kurtis Heimerl
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.16.0
>
> Attachments: 1298_2007-09-22_1.patch, 1298_2007-10-04_1.patch, 
> 1298_20071221b.patch, 1298_20071228s.patch, hadoop-user-munncha.patch17, 
> HDFSPermissionSpecification5.pdf
>
>
> I'm working on adding a permissions model to hadoop's DFS. The first step is 
> this change, which associates user info with files. Following this I'll 
> assoicate permissions info, then block methods based on that user info, then 
> authorization of the user info. 
> So, right now i've implemented adding user info to files. I'm looking for 
> feedback before I clean this up and make it offical. 
> I wasn't sure what release, i'm working off trunk. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2007-12-28 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2404:


Status: Patch Available  (was: Open)

> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: ConfigConvert.patch, ConfigConvert2.patch, 
> ConfigurationConverter.patch
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots are applications/cluster-configurations are prone to fail hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them for 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2007-12-28 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2404:


Attachment: ConfigConvert2.patch

Yes, the old code was setting (by mistake, I am sure) 
"mapred.task.tracker.report.address" instead of "...bindAddress".
The former is a non-existent configuration parameter; setting it to 
anything would not cause any problems since 
it is not used anywhere. But I agree that fixing one bug does not release you from 
the responsibility of fixing another one
on the same line. Here is the patch that takes care of this problem.

> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: ConfigConvert.patch, ConfigConvert2.patch, 
> ConfigurationConverter.patch
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots are applications/cluster-configurations are prone to fail hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them for 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-1298) adding user info to file

2007-12-28 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-1298:


Fix Version/s: 0.16.0
Affects Version/s: 0.16.0
   Status: Patch Available  (was: Open)

> adding user info to file
> 
>
> Key: HADOOP-1298
> URL: https://issues.apache.org/jira/browse/HADOOP-1298
> Project: Hadoop
>  Issue Type: New Feature
>  Components: dfs, fs
>Affects Versions: 0.16.0
>Reporter: Kurtis Heimerl
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.16.0
>
> Attachments: 1298_2007-09-22_1.patch, 1298_2007-10-04_1.patch, 
> 1298_20071221b.patch, 1298_20071228s.patch, hadoop-user-munncha.patch17, 
> HDFSPermissionSpecification5.pdf
>
>
> I'm working on adding a permissions model to hadoop's DFS. The first step is 
> this change, which associates user info with files. Following this I'll 
> assoicate permissions info, then block methods based on that user info, then 
> authorization of the user info. 
> So, right now i've implemented adding user info to files. I'm looking for 
> feedback before I clean this up and make it offical. 
> I wasn't sure what release, i'm working off trunk. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-1298) adding user info to file

2007-12-28 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-1298:


Attachment: 1298_20071228s.patch

This synchronizes with the current trunk.
I also fixed a problem that caused the FSNamesystem constructor to throw an 
IOException 
without closing the namesystem.
Also, local calls to the name-node methods did not work.

> adding user info to file
> 
>
> Key: HADOOP-1298
> URL: https://issues.apache.org/jira/browse/HADOOP-1298
> Project: Hadoop
>  Issue Type: New Feature
>  Components: dfs, fs
>Affects Versions: 0.16.0
>Reporter: Kurtis Heimerl
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.16.0
>
> Attachments: 1298_2007-09-22_1.patch, 1298_2007-10-04_1.patch, 
> 1298_20071221b.patch, 1298_20071228s.patch, hadoop-user-munncha.patch17, 
> HDFSPermissionSpecification5.pdf
>
>
> I'm working on adding a permissions model to hadoop's DFS. The first step is 
> this change, which associates user info with files. Following this I'll 
> assoicate permissions info, then block methods based on that user info, then 
> authorization of the user info. 
> So, right now i've implemented adding user info to files. I'm looking for 
> feedback before I clean this up and make it offical. 
> I wasn't sure what release, i'm working off trunk. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-1298) adding user info to file

2007-12-28 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-1298:


Status: Open  (was: Patch Available)

> adding user info to file
> 
>
> Key: HADOOP-1298
> URL: https://issues.apache.org/jira/browse/HADOOP-1298
> Project: Hadoop
>  Issue Type: New Feature
>  Components: dfs, fs
>Reporter: Kurtis Heimerl
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: 1298_2007-09-22_1.patch, 1298_2007-10-04_1.patch, 
> 1298_20071221b.patch, hadoop-user-munncha.patch17, 
> HDFSPermissionSpecification5.pdf
>
>
> I'm working on adding a permissions model to hadoop's DFS. The first step is 
> this change, which associates user info with files. Following this I'll 
> assoicate permissions info, then block methods based on that user info, then 
> authorization of the user info. 
> So, right now i've implemented adding user info to files. I'm looking for 
> feedback before I clean this up and make it offical. 
> I wasn't sure what release, i'm working off trunk. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HADOOP-2149) Pure name-node benchmarks.

2007-12-24 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HADOOP-2149.
-

Resolution: Fixed

I just committed this.

> Pure name-node benchmarks.
> --
>
> Key: HADOOP-2149
> URL: https://issues.apache.org/jira/browse/HADOOP-2149
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Affects Versions: 0.16.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: NNThroughput.patch, NNThroughput.patch
>
>
> h3. Pure name-node benchmark.
> This patch starts a series of name-node benchmarks.
> The intention is to have a separate benchmark for every important name-node 
> operation.
> The purpose of benchmarks is
> # to measure the throughput for each name-node operation, and
> # to evaluate changes in the name-node performance (gain or degradation) when 
> optimization
> or new functionality patches are introduced.
> The benchmarks measure name-node throughput (ops per second) and the average 
> execution time.
> The benchmark does not involve any other hadoop components except for the 
> name-node.
> The name-node server is real, other components are simulated.
> There is no RPC overhead. Each operation is executed by calling directly the 
> respective name-node method.
> The benchmark is multi-threaded, that is one can start multiple threads 
> competing for the
> name-node resources by executing concurrently the same operation but with 
> different data.
> See javadoc for more details.
> The patch contains implementation for two name-node operations: file creates 
> and block reports.
> Implementation of other operations will follow.
> h3. File creation benchmark.
> I've ran two series of the file create benchmarks on the name-node with 
> different number of threads.
> The first series is run on the regular name-node performing an edits log 
> transaction on every create.
> The transaction includes a synch to the disk.
> In the second series the name-node is modified so that the synchs are turned 
> off.
> Each run of the benchmark performs the same number 10,000 of creates equally 
> distributed between
> running threads. I used a 4 core 2.8Ghz machine.
> The following two tables summarized the results. Time is in milliseconds.
> || threads || time (msec)\\with synch || ops/sec\\with synch ||
> | 1 | 13074 | 764 |
> | 2 | 8883 | 1125 |
> | 4 | 7319 | 1366 |
> | 10 | 7094 | 1409 |
> | 20 | 6785 | 1473 |
> | 40 | 6776 | 1475 |
> | 100 | 6899 | 1449 |
> | 200 | 7131 | 1402 |
> | 400 | 7084 | 1411 |
> | 1000 | 7181 | 1392 |
> || threads || time (msec)\\no synch || ops/sec\\no synch ||
> | 1 | 4559 | 2193 |
> | 2 | 4979 | 2008 |
> | 4 | 5617 | 1780 |
> | 10 | 5679 | 1760 |
> | 20 | 5550 | 1801 |
> | 40 | 5804 | 1722 |
> | 100 | 5871 | 1703 |
> | 200 | 6037 | 1656 |
> | 400 | 5855 | 1707 |
> | 1000 | 6069 | 1647 |
> The results show:
> # (Table 1) The new synchronization mechanism that batches synch calls from 
> different threads works well.
> For one thread every synch causes a real IO, making it slow. The more threads are 
> used, the more synchs are
> batched, resulting in better performance. The performance grows up to a 
> certain point and then stabilizes
> at about 1450 ops/sec.
> # (Table 2) Operations that do not require disk IO are constrained by memory 
> locks.
> Without synchs the single-threaded execution is the fastest, because there are 
> no waits.
> More threads start to interfere with each other and have to wait.
> Again the performance stabilizes at about 1700 ops/sec and does not degrade 
> further.
> # Our default of 10 handlers per name-node is not the best choice for either 
> the IO-bound or the pure
> in-memory operations. We should increase the default to 20 handlers, and on big 
> clusters 100 handlers
> or more can be used without loss of performance. In fact, with more handlers 
> more operations can be handled
> simultaneously, which prevents the name-node from dropping calls that are 
> close to timing out.
> h3. Block report benchmark.
> In this benchmarks each thread pretends it is a data-node and calls 
> blockReport() with the same blocks.
> All blocks are real, that is they were previously allocated by the name-node 
> and assigned to the data-nodes.
> Some reports can contain fake blocks, and some can have missing blocks.
> Each block report consists of 10,000 blocks. The total number of reports sent 
> is 1000.
> The reports are equally divided between the data-nodes so that each of them 
> sends equal number of reports.
> Here is the table with the results.
> || data-nodes || time (msec) || ops/sec ||
> | 1 | 42234 | 24 |
> | 2 | 9412 | 106 |
> | 4 | 11465 | 87 |
> | 10 | 15632 | 64 |
> | 20 | 17623 | 57 |
> | 40 | 19563 | 51 |
> | 100 | 24315 | 41 |
> | 200 | 2

[jira] Resolved: (HADOOP-2115) Task cwds should be distributed across partitions

2007-12-24 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HADOOP-2115.
-

Resolution: Duplicate

Closing this issue since it has already been fixed.

> Task cwds should be distributed across partitions
> -
>
> Key: HADOOP-2115
> URL: https://issues.apache.org/jira/browse/HADOOP-2115
> Project: Hadoop
>  Issue Type: Improvement
>  Components: mapred
>Affects Versions: 0.14.3
> Environment: All
>Reporter: Milind Bhandarkar
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
>
> Even when mapred.local.dir specifies a comma-separated list of partitions 
> (typically one per physical disk), all tasks of the same job have current 
> working directories that belong to only one partition. For side-effect tasks, 
> that use local cwd as a scratch space, this overloads a single disk while 
> other disks may be idle. Idially, each task should get a cwd on different 
> partition. This is related to HADOOP-1991, but emphasizes performance impact.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HADOOP-2115) Task cwds should be distributed across partitions

2007-12-24 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko reassigned HADOOP-2115:
---

Assignee: Konstantin Shvachko

> Task cwds should be distributed across partitions
> -
>
> Key: HADOOP-2115
> URL: https://issues.apache.org/jira/browse/HADOOP-2115
> Project: Hadoop
>  Issue Type: Improvement
>  Components: mapred
>Affects Versions: 0.14.3
> Environment: All
>Reporter: Milind Bhandarkar
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
>
> Even when mapred.local.dir specifies a comma-separated list of partitions 
> (typically one per physical disk), all tasks of the same job have current 
> working directories that belong to only one partition. For side-effect tasks, 
> that use local cwd as a scratch space, this overloads a single disk while 
> other disks may be idle. Idially, each task should get a cwd on different 
> partition. This is related to HADOOP-1991, but emphasizes performance impact.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2007-12-21 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2404:


Status: Patch Available  (was: Open)

> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: ConfigConvert.patch, ConfigurationConverter.patch
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots are applications/cluster-configurations are prone to fail hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them for 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2007-12-21 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2404:


Attachment: ConfigConvert.patch

This is the implementation of variant (4).

> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: ConfigConvert.patch, ConfigurationConverter.patch
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots are applications/cluster-configurations are prone to fail hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them for 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks

2007-12-21 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554006
 ] 

Konstantin Shvachko commented on HADOOP-2116:
-

This is also practically fixed by HADOOP-2227. The only thing left is to expose 
the shared directory through the configuration.
JobConf now has a property "mapred.jar" accessible through getJar() method, 
which points to the jar file located in the jobcache 
directory, which in fact is in the common shared directory for the job tasks.
Namely,
{code}
"mapred.jar" = "mapred.local.dir"[i]/taskTracker/jobcache//job.jar
{code}

So we can replace configuration parameter "mapred.jar" by "job.local.dir", 
which will point to the parent of "mapred.jar".
JobConf.getJar() can be implemented then as
{code}
String getJar() {
return get("job.local.dir") + "job.xml";
}
{code}

Will that work?

With respect to all the above I wonder why do we need to use LocalDirAllocator 
in TaskRunner.run()
if job cache directory (jobCacheDir) can be obtained directly from 
TaskRunner.conf
{code}
File jobCacheDir = new File(new File(conf.getJar()).getParentFile(), "work");
{code}


> Job.local.dir to be exposed to tasks
> 
>
> Key: HADOOP-2116
> URL: https://issues.apache.org/jira/browse/HADOOP-2116
> Project: Hadoop
>  Issue Type: Improvement
>  Components: mapred
>Affects Versions: 0.14.3
> Environment: All
>Reporter: Milind Bhandarkar
> Fix For: 0.16.0
>
>
> Currently, since all task cwds are created under a jobcache directory, users 
> that need a job-specific shared directory for use as scratch space, create 
> ../work. This is hacky, and will break when HADOOP-2115 is addressed. For 
> such jobs, hadoop mapred should expose job.local.dir via localized 
> configuration.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (HADOOP-2116) Job.local.dir to be exposed to tasks

2007-12-21 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554006
 ] 

shv edited comment on HADOOP-2116 at 12/21/07 11:32 AM:


This is also practically fixed by HADOOP-2227. The only thing left is to expose 
the shared directory through the configuration.
JobConf now has a property "mapred.jar" accessible through getJar() method, 
which points to the jar file located in the jobcache 
directory, which in fact is in the common shared directory for the job tasks.
Namely,
{code}
"mapred.jar" = "mapred.local.dir"[i]/taskTracker/jobcache//job.jar
{code}

So we can replace configuration parameter "mapred.jar" by "job.local.dir", 
which will point to the parent of "mapred.jar".
JobConf.getJar() can be implemented then as
{code}
String getJar() {
return get("job.local.dir") + "/job.jar";
}
{code}

Will that work?

With respect to all the above I wonder why do we need to use LocalDirAllocator 
in TaskRunner.run()
if job cache directory (jobCacheDir) can be obtained directly from 
TaskRunner.conf
{code}
File jobCacheDir = new File(new File(conf.getJar()).getParentFile(), "work");
{code}


  was (Author: shv):
This is also practically fixed by HADOOP-2227. The only thing left is to 
expose the shared directory through the configuration.
JobConf now has a property "mapred.jar" accessible through getJar() method, 
which points to the jar file located in the jobcache 
directory, which in fact is in the common shared directory for the job tasks.
Namely,
{code}
"mapred.jar" = "mapred.local.dir"[i]/taskTracker/jobcache//job.jar
{code}

So we can replace configuration parameter "mapred.jar" by "job.local.dir", 
which will point to the parent of "mapred.jar".
JobConf.getJar() can be implemented then as
{code}
String getJar() {
return get("job.local.dir") + "job.xml";
}
{code}

Will that work?

With respect to all the above I wonder why do we need to use LocalDirAllocator 
in TaskRunner.run()
if job cache directory (jobCacheDir) can be obtained directly from 
TaskRunner.conf
{code}
File jobCacheDir = new File(new File(conf.getJar()).getParentFile(), "work");
{code}

  
> Job.local.dir to be exposed to tasks
> 
>
> Key: HADOOP-2116
> URL: https://issues.apache.org/jira/browse/HADOOP-2116
> Project: Hadoop
>  Issue Type: Improvement
>  Components: mapred
>Affects Versions: 0.14.3
> Environment: All
>Reporter: Milind Bhandarkar
> Fix For: 0.16.0
>
>
> Currently, since all task cwds are created under a jobcache directory, users 
> that need a job-specific shared directory for use as scratch space, create 
> ../work. This is hacky, and will break when HADOOP-2115 is addressed. For 
> such jobs, hadoop mapred should expose job.local.dir via localized 
> configuration.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2115) Task cwds should be distributed across partitions

2007-12-21 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12553998
 ] 

Konstantin Shvachko commented on HADOOP-2115:
-

I think this was fixed by HADOOP-2227 and we can close this issue.
Amareshwari, could you please confirm?

> Task cwds should be distributed across partitions
> -
>
> Key: HADOOP-2115
> URL: https://issues.apache.org/jira/browse/HADOOP-2115
> Project: Hadoop
>  Issue Type: Improvement
>  Components: mapred
>Affects Versions: 0.14.3
> Environment: All
>Reporter: Milind Bhandarkar
> Fix For: 0.16.0
>
>
> Even when mapred.local.dir specifies a comma-separated list of partitions 
> (typically one per physical disk), all tasks of the same job have current 
> working directories that belong to only one partition. For side-effect tasks, 
> that use local cwd as a scratch space, this overloads a single disk while 
> other disks may be idle. Idially, each task should get a cwd on different 
> partition. This is related to HADOOP-1991, but emphasizes performance impact.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2007-12-19 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12553565
 ] 

Konstantin Shvachko commented on HADOOP-2404:
-

We are not converging here. Your arguments are mostly about how the 
configuration should be structured.
Doug, you are trying to kill two birds with this one:
- provide compatibility and 
- clean up the configuration mess.

I prefer to keep them separate and just concentrate on compatibility, 
because, as this issue reveals,
common ground on the general configuration question will be harder to achieve.

> I also have misgivings about mutating the configuration. Won't that confuse 
> folks?

Interesting point. If somebody uses the default configuration and expects to be 
able to 
obtain a non-existent parameter, like "dfs.info.port", from it in some legacy 
code, that code will fail.
And there isn't much you can do about it.
That is another reason for declaring this an incompatible change.

To summarize the above, we have 4 proposals:
# Declare the port patch an incompatible change and ask people to change their 
configurations.
# Accept the current conversion patch, with all changes localized in one class, and 
revert it after release 0.16, 
although making application-specific changes in 
Configuration.java is a bad practice.
# Provide static configuration parameter getters (and setters?) related to each 
component and call 
them consistently within the code.
# Create a separate class ConfigurationConverter (packaged in hadoop.util?) 
with a deprecated 
static method for conversion, and call the method in the NameNode, DataNode, 
JobTracker, TaskTracker,
SecondaryNameNode, and DFSClient constructors.

I am in favor of 1, but I am ok with 2 and 4.
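
A minimal sketch of what proposal 4 might look like (the exact parameter 
mapping and method shape are assumptions for illustration, not the attached 
patch):
{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical hadoop.util.ConfigurationConverter: translates a deprecated
// host/port parameter pair into the new combined "bindAddress" form.
// The method is deprecated so it can be removed after one release.
public class ConfigurationConverter {
  /** @deprecated only for backward compatibility with 0.15 configurations. */
  @Deprecated
  public static void convert(Configuration conf) {
    String host = conf.get("dfs.info.bindAddress");
    String port = conf.get("dfs.info.port");
    if (host != null && port != null && conf.get("dfs.http.bindAddress") == null) {
      conf.set("dfs.http.bindAddress", host + ":" + port);
    }
    // ... similar translations for the other deprecated parameters ...
  }
}
{code}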


> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: ConfigurationConverter.patch
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots are applications/cluster-configurations are prone to fail hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them for 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2007-12-19 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12553473
 ] 

Konstantin Shvachko commented on HADOOP-2404:
-

> It's a good idea to always access parameters through accessor methods. 

Nobody is arguing with that, but it is not part of this issue.

I like Dhruba's proposal better. It will still change several files, but 
definitely fewer than 27.

> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: ConfigurationConverter.patch
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots are applications/cluster-configurations are prone to fail hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them for 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2007-12-19 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12553467
 ] 

Konstantin Shvachko commented on HADOOP-2404:
-

> But not destabilizing.

It is destabilizing for most of the patches currently submitted.

> That I don't see. Won't we still want all of these methods in the next 
> release?

I am not sure we will need them in this form.
# I'd prefer to have a separate configuration class per component, combining its 
configuration logic, rather than having static methods all around, but I'd rather 
continue this discussion in the other issue.
# I'd prefer to have (a) explicit and (b) deprecated methods for conversion in 
order to make it clear (a) which parameters are backward compatible now, and 
(b) that the feature is not going to be supported forever.

> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: ConfigurationConverter.patch
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots are applications/cluster-configurations are prone to fail hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them for 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2007-12-18 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12553196
 ] 

Konstantin Shvachko commented on HADOOP-2404:
-

I thought you proposed to get rid of "directly reading the Configuration 
properties", all of them. Now I understand you do not.

Where does the count of six parameters come from? 
A total of 14 parameters were affected by HADOOP-2185; 5 of them were removed, 
which leaves us with 9 methods.

I counted at least 12 files that directly read *.http.bindAddress parameters 
from the configuration,
and 27 files that directly read "fs.default.name". So the change is not hard, 
it's just massive.

Taking into account the amount of testing, which involves old, new, and mixed 
values of the parameters, 
and that all changes are born to be removed in the next release, I am hesitant 
whether 
"establishing the precedent of making application-specific changes in 
Configuration.java"
is really such a bad thing as to outweigh the advantage of having all changes in 
precisely one class.


> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: ConfigurationConverter.patch
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots are applications/cluster-configurations are prone to fail hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them for 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2007-12-18 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552940
 ] 

Konstantin Shvachko commented on HADOOP-2404:
-

Do you propose to introduce here per-parameter (back-compatible) methods only 
for 
the affected address and port parameters or for all of them?
The latter is a big change presumably to be addressed by HADOOP-2385 after a 
reasonable design effort.
Backward compatibility for port parameters is a more urgent and much less 
permanent matter.

> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: ConfigurationConverter.patch
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots are applications/cluster-configurations are prone to fail hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them for 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HADOOP-2462) MiniMRCluster does not utilize multiple local directories in "mapred.local.dir"

2007-12-18 Thread Konstantin Shvachko (JIRA)
MiniMRCluster does not utilize multiple local directories in "mapred.local.dir"
---

 Key: HADOOP-2462
 URL: https://issues.apache.org/jira/browse/HADOOP-2462
 Project: Hadoop
  Issue Type: Bug
  Components: test
Affects Versions: 0.15.0
Reporter: Konstantin Shvachko
 Fix For: 0.16.0


My hadoop-site.xml specifies 4 local directories
{code}
<property>
  <name>mapred.local.dir</name>
  <value>${hadoop.tmp.dir}/mapred/local1, ${hadoop.tmp.dir}/mapred/local2,
         ${hadoop.tmp.dir}/mapred/local3, ${hadoop.tmp.dir}/mapred/local4</value>
</property>
{code}
and I am looking at MiniMRCluster.TaskTrackerRunner

There are several things here:
# localDirBase value is set to
{code}
"/tmp/h/mapred/local1, /tmp/h/mapred/local2, /tmp/h/mapred/local3, 
/tmp/h/mapred/local4"
{code}
and I get a hierarchy of directories with commas and spaces in the names. 
I think this was not designed to work with multiple dirs.
# Further down, all new directories are generated with the same name
{code}
File ttDir = new File(localDirBase, 
  Integer.toString(trackerId) + "_" + 0);
{code}
So in fact only one directory is created. I think the intention was to have i 
instead of 0
{code}
File ttDir = new File(localDirBase, 
  Integer.toString(trackerId) + "_" + i);
{code}
# On Windows, MiniMRCluster.TaskTrackerRunner in this case throws an IOException, 
which is silently ignored by all MiniMR tests except TestMiniMRMapRedDebugScript.
{code}
java.io.IOException: Mkdirs failed to create 
/tmp/h/mapred/local1, /tmp/h/mapred/local2, /tmp/h/mapred/local3, 
/tmp/h/mapred/local4/0_0
at 
org.apache.hadoop.mapred.MiniMRCluster$TaskTrackerRunner.<init>(MiniMRCluster.java:124)
at org.apache.hadoop.mapred.MiniMRCluster.<init>(MiniMRCluster.java:293)
at org.apache.hadoop.mapred.MiniMRCluster.<init>(MiniMRCluster.java:244)
at 
org.apache.hadoop.mapred.TestMiniMRClasspath.testClassPath(TestMiniMRClasspath.java:163)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at junit.framework.TestCase.runTest(TestCase.java:154)
at junit.framework.TestCase.runBare(TestCase.java:127)
at junit.framework.TestResult$1.protect(TestResult.java:106)
at junit.framework.TestResult.runProtected(TestResult.java:124)
at junit.framework.TestResult.run(TestResult.java:109)
at junit.framework.TestCase.run(TestCase.java:118)
at junit.framework.TestSuite.runTest(TestSuite.java:208)
at junit.framework.TestSuite.run(TestSuite.java:203)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:478)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:344)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
{code}

I am marking it as "Major" because we actually do not test multiple local 
directories.
Looks like it was introduced rather recently by HADOOP-1819.
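For reference, a hedged, method-level sketch of how the per-tracker directories could 
be created so that every configured local dir is actually used; the helper name, its 
parameters, and the round-robin placement are assumptions, not the committed fix.
{code}
import java.io.File;
import java.io.IOException;

// Hypothetical helper: split the comma-separated "mapred.local.dir" value
// and create one numbered subdirectory per tracker under each local dir.
static String createTrackerLocalDirs(String localDirBase, int trackerId,
                                     int numDir) throws IOException {
  String[] localDirs = localDirBase.split(",");
  StringBuilder result = new StringBuilder();
  for (int i = 0; i < numDir; i++) {
    File ttDir = new File(localDirs[i % localDirs.length].trim(),
                          trackerId + "_" + i);
    if (!ttDir.mkdirs() && !ttDir.isDirectory()) {
      throw new IOException("Mkdirs failed to create " + ttDir);
    }
    if (result.length() > 0) {
      result.append(",");
    }
    result.append(ttDir.getPath());
  }
  return result.toString();
}
{code}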

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2007-12-18 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552904
 ] 

Konstantin Shvachko commented on HADOOP-2404:
-

Yes, this is exactly what this patch does, including warnings.
Where do we want the configuration converter method; which class should it 
belong to?
I plan to use the same static converter in HDFS and MR and call it whenever 
appropriate.

> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: ConfigurationConverter.patch
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots are applications/cluster-configurations are prone to fail hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them for 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2007-12-13 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551673
 ] 

shv edited comment on HADOOP-2404 at 12/13/07 5:51 PM:
---

Maybe we should just declare that this change is incompatible and not provide 
any conversion of the changed parameters at all.
We cannot provide true compatibility. Whoever relied on the default port usage, 
expecting (consciously or unconsciously) the ports 
to be rolled if they are busy, would have to explicitly specify 0-ports. This 
means the configuration should be changed, 
so users might change the names as well.

> instead of directly reading "dfs.http.bindAddress", add a package-private 
> static method getHttpBindAddress(Configuration)
+1  I really hate that we keep using raw names rather than providing 
getters/setters.
This should be done for all configuration parameters imo, and probably belongs 
to HADOOP-2385.
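For illustration, a hedged sketch of what such a getter might look like; the 
placement, the default value, and the parsing are assumptions.
{code}
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;

// Hypothetical package-private accessor (e.g. in NameNode); the property
// name follows the HADOOP-2185 renaming, the default shown is illustrative.
static InetSocketAddress getHttpBindAddress(Configuration conf) {
  String addr = conf.get("dfs.http.bindAddress", "0.0.0.0:50070");
  int colon = addr.lastIndexOf(':');
  return new InetSocketAddress(addr.substring(0, colon),
                               Integer.parseInt(addr.substring(colon + 1)));
}
{code}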

  was (Author: shv):
May be should just declare that this change is incompatible and not provide 
any conversion of the changed parameters at all.
We cannot provide true compatibility. Whoever relied on the default port usage 
expecting (consciously or unconsciously) them 
to be rolled if they have are busy, would have to explicitly specify 0-ports. 
This means the configuration should be changed, 
so users might change the names as well.

> instead of directly reading "dfs.http.bindAddress", add a package-private 
> static method getHttpBindAddress(Configuration)
+1  I really hate that we keep using raw names rather than providing 
getters/setters/
This should be done for all configuration parameters imo, and probably belongs 
to HADOOP-2385.
  
> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: ConfigurationConverter.patch
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots are applications/cluster-configurations are prone to fail hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them for 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2007-12-13 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551673
 ] 

Konstantin Shvachko commented on HADOOP-2404:
-

Maybe we should just declare that this change is incompatible and not provide any 
conversion of the changed parameters at all.
We cannot provide true compatibility. Whoever relied on the default port usage, 
expecting (consciously or unconsciously) the ports 
to be rolled if they are busy, would have to explicitly specify 0-ports. 
This means the configuration should be changed, 
so users might change the names as well.

> instead of directly reading "dfs.http.bindAddress", add a package-private 
> static method getHttpBindAddress(Configuration)
+1  I really hate that we keep using raw names rather than providing 
getters/setters.
This should be done for all configuration parameters imo, and probably belongs 
to HADOOP-2385.

> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: ConfigurationConverter.patch
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots are applications/cluster-configurations are prone to fail hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them for 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2007-12-13 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2404:


Comment: was deleted

> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: ConfigurationConverter.patch
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots are applications/cluster-configurations are prone to fail hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them for 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2007-12-13 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2404:


Comment: was deleted

> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: ConfigurationConverter.patch
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots are applications/cluster-configurations are prone to fail hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them for 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2385) Validate configuration parameters

2007-12-13 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551670
 ] 

Konstantin Shvachko commented on HADOOP-2385:
-

I think this should be related to all hadoop components including hdfs client, 
secondary-node, job-tracker and task-tracker.
They all should independently verify parameters they depend upon.

I'd prefer if we had a separate configuration class for each component, say 
NamenodeConfiguration, DatanodeConfiguration, ...
derived from the base Configuration class, with explicit getters and setters 
for each parameter used in the component.
That way it will be easy to provide verification and backward compatibility.
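As a rough illustration of that idea, a minimal sketch of such a subclass; the class 
name is taken from the comment above, while the getter, the property name, and the 
validation rules are assumptions.
{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical per-component wrapper with explicit, validating getters.
public class NamenodeConfiguration extends Configuration {
  public NamenodeConfiguration(Configuration conf) {
    super(conf);  // start from the already loaded resources
  }

  /** Example getter for one parameter, with validation. */
  public String getHttpBindAddress() {
    String addr = get("dfs.http.bindAddress");
    if (addr == null) {
      throw new IllegalArgumentException("dfs.http.bindAddress is not set");
    }
    if (addr.indexOf(':') < 0) {
      throw new IllegalArgumentException(
          "dfs.http.bindAddress must have the form host:port, got: " + addr);
    }
    return addr;
  }
}
{code}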

That reminds me of an abandoned issue, HADOOP-24 - an attempt to make a 
configuration interface.
What was the reason it was never committed?

We should also use an XML schema for verification purposes. That would make 
verification automatic, and there would be no
need to write verification code.

> Validate configuration parameters
> -
>
> Key: HADOOP-2385
> URL: https://issues.apache.org/jira/browse/HADOOP-2385
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Affects Versions: 0.16.0
>Reporter: Robert Chansler
>
> Configuration parameters should be fully validated before name nodes or data 
> nodes begin service.
> Required parameters must be present.
> Required and optional parameters must have values of proper type and range.
> Undefined parameters must not be present.
> (I was recently observing some confusion whose root cause was a mis-spelled 
> parameter.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2007-12-12 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551099
 ] 

Konstantin Shvachko commented on HADOOP-2404:
-

Dhruba, this means that there is something else running on the same port. Most 
likely another data-node on the default port.
As I mentioned before, this patch does not bring back the port rolling behavior; it 
just provides support for the deprecated config parameters.
And the "deprecated configuration parameter" message does not appear because your 
configuration does not contain the old names.
The application you use should explicitly specify port 0 if the actual port # 
is meant to be arbitrary.
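For example (a hedged sketch; the host value is illustrative), an application that 
wants an arbitrary data-node port would now request it explicitly:
{code}
// conf is an org.apache.hadoop.conf.Configuration; port 0 asks the OS for
// any free port, no port rolling is involved
conf.set("dfs.datanode.bindAddress", "0.0.0.0:0");
{code}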

Doug, I agree this would be a better solution. But this will be a much bigger 
change now and then when we remove
deprecation. There are 5 servers involved. So at least 5 places should be 
patched if we go your way.
The problem is only with HOD for now, as I understand. HOD does not use the 
Configuration class, but rather
generates xml files by writing into them directly. So there is no way it can 
call any methods at all.

> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: ConfigurationConverter.patch
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots are applications/cluster-configurations are prone to fail hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them for 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2007-12-11 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2404:


Attachment: ConfigurationConverter.patch

This patch performs configuration conversion as discussed.
I tested it with new, old, and mixed configurations. It works for me.
Could anybody please verify it with HOD?

> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
> Attachments: ConfigurationConverter.patch
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots are applications/cluster-configurations are prone to fail hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them for 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2007-12-11 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550831
 ] 

Konstantin Shvachko commented on HADOOP-2404:
-

HADOOP-2185 introduced two types of changes:
# Change in semantics: no port rolling for any ports.
# Configuration api changes: 5 properties have been removed and 5 renamed.

Here is the compatibility plan.
# Semantic changes will not be backward compatible.
# Old configuration variables will be recognized in 0.16 as described below.

If a new name of a configuration parameter is present, then old names 
corresponding
to this parameter will be ignored.

If old names of a configuration parameter are present but not the new one,
the old parameters will be converted to the new ones according to the conversion
table provided in HADOOP-2185.

The conversion will be done right after loading all Configuration resources
(see Configuration.loadResources()) and Hadoop will further work with the
converted configuration.

I was assured this will work fine for HOD and Pig.
But if your application uses tricks like the one introduced by HADOOP-1085 -
create a config, call new JobTracker(config), then get the job-tracker info port
by calling config.get("mapred.job.tracker.info.port") -
this will not work. You will need to change your application
to use config.get("mapred.job.tracker.http.bindAddress") and then extract the port.
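A hedged sketch of that application-side change; the property names are taken from 
the text above, while the commented-out default and the parsing are assumptions.
{code}
// 0.15-style: read the (possibly rolled) info port directly - gone in 0.16
// int infoPort = conf.getInt("mapred.job.tracker.info.port", 50030);

// 0.16-style: read the combined host:port value and extract the port;
// conf is an org.apache.hadoop.conf.Configuration
String addr = conf.get("mapred.job.tracker.http.bindAddress");
int infoPort = Integer.parseInt(addr.substring(addr.lastIndexOf(':') + 1));
{code}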

> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots are applications/cluster-configurations are prone to fail hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them for 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2404) HADOOP-2185 breaks compatibility with hadoop-0.15.0

2007-12-11 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550717
 ] 

Konstantin Shvachko commented on HADOOP-2404:
-

Arun, there are two +1s from you on HADOOP-2185.
It also clarifies the port rolling behavior, etc.

> HADOOP-2185 breaks compatibility with hadoop-0.15.0
> ---
>
> Key: HADOOP-2404
> URL: https://issues.apache.org/jira/browse/HADOOP-2404
> Project: Hadoop
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 0.16.0
>Reporter: Arun C Murthy
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.16.0
>
>
> HADOOP-2185 removed the following configuration parameters:
> {noformat}
> dfs.secondary.info.port
> dfs.datanode.port
> dfs.info.port
> mapred.job.tracker.info.port
> tasktracker.http.port
> {noformat}
> and changed the following configuration parameters:
> {noformat}
> dfs.secondary.info.bindAddress
> dfs.datanode.bindAddress
> dfs.info.bindAddress
> mapred.job.tracker.info.bindAddress
> mapred.task.tracker.report.bindAddress
> tasktracker.http.bindAddress
> {noformat}
> without a backward-compatibility story.
> Lots are applications/cluster-configurations are prone to fail hence, we need 
> a way to keep things working as-is for 0.16.0 and remove them for 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2260) TestMiniMRMapRedDebugScript times out

2007-12-10 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550214
 ] 

Konstantin Shvachko commented on HADOOP-2260:
-

This is interesting.
printStatistics() is called in 2 places: FSEditLog.close() and 
FSEditLog.logSync().
Both methods wait() if FSEditLog is locked by another thread.
Is it possible that we have some kind of a deadlock there?

> TestMiniMRMapRedDebugScript times out
> -
>
> Key: HADOOP-2260
> URL: https://issues.apache.org/jira/browse/HADOOP-2260
> Project: Hadoop
>  Issue Type: Bug
>  Components: mapred
>Affects Versions: 0.16.0
> Environment: Linux
>Reporter: Konstantin Shvachko
>Assignee: Amareshwari Sri Ramadasu
> Fix For: 0.16.0
>
> Attachments: Hadoop-2260.log, testrun-2260.log
>
>
> I am running TestMiniMRMapRedDebugScript from trunc.
> This is what I see in the stdout:
> {code}
> 2007-11-22 02:21:23,494 WARN  conf.Configuration 
> (Configuration.java:loadResource(808)) - 
> hadoop/build/test/mapred/local/1_0/taskTracker/jobcache/job_200711220217_0001/task_200711220217_0001_m_00_0/job.xml:a
>  attempt to override final parameter: hadoop.tmp.dir;  Ignoring.
> 2007-11-22 02:21:28,940 INFO  jvm.JvmMetrics (JvmMetrics.java:init(56)) - 
> Initializing JVM Metrics with processName=MAP, sessionId=
> 2007-11-22 02:22:09,504 INFO  mapred.MapTask (MapTask.java:run(127)) - 
> numReduceTasks: 0
> 2007-11-22 02:22:42,434 WARN  mapred.TaskTracker 
> (TaskTracker.java:main(1982)) - Error running child
> java.io.IOException
>   at 
> org.apache.hadoop.mapred.TestMiniMRMapRedDebugScript$MapClass.map(TestMiniMRMapRedDebugScript.java:41)
>   at 
> org.apache.hadoop.mapred.TestMiniMRMapRedDebugScript$MapClass.map(TestMiniMRMapRedDebugScript.java:35)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
>   at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1977)
> {code}
> Stderr and debugout both say: Bailing out.
> BTW on Windows everything works just fine.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2000) Re-write NNBench to use MapReduce

2007-12-10 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550178
 ] 

Konstantin Shvachko commented on HADOOP-2000:
-

+1

> Re-write NNBench to use MapReduce
> -
>
> Key: HADOOP-2000
> URL: https://issues.apache.org/jira/browse/HADOOP-2000
> Project: Hadoop
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.15.0
>Reporter: Mukund Madhugiri
>Assignee: Mukund Madhugiri
> Fix For: 0.16.0
>
> Attachments: HADOOP-2000.patch, HADOOP-2000.patch, HADOOP-2000.patch, 
> HADOOP-2000.patch, HADOOP-2000.patch, HADOOP-2000.patch, HADOOP-2000.patch
>
>
> The proposal is to re-write the NNBench benchmark/test to measure Namenode 
> operations using MapReduce. Two buckets of measurements will be done:
> 1. Transactions per second 
> 2. Average latency
> for these operations
> - Create and Close file
> - Open file
> - Rename file
> - Delete file

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2363) Unit tests fail if there is another instance of Hadoop

2007-12-06 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2363:


Attachment: TestZeroPort.patch

This should fix the problem.


> Unit tests fail if there is another instance of Hadoop
> --
>
> Key: HADOOP-2363
> URL: https://issues.apache.org/jira/browse/HADOOP-2363
> Project: Hadoop
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.16.0
>Reporter: Raghu Angadi
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: TestZeroPort.patch
>
>
> If you are running another Hadoop cluster or DFS, many unit tests fail 
> because Namenode in MiniDFSCluster fails to bind to the right port. Most 
> likely HADOOP-2185 forgot to set right defaults for MiniDFSCluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2363) Unit tests fail if there is another instance of Hadoop

2007-12-06 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2363:


Fix Version/s: 0.16.0
Affects Version/s: 0.16.0
   Status: Patch Available  (was: Open)

> Unit tests fail if there is another instance of Hadoop
> --
>
> Key: HADOOP-2363
> URL: https://issues.apache.org/jira/browse/HADOOP-2363
> Project: Hadoop
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.16.0
>Reporter: Raghu Angadi
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: TestZeroPort.patch
>
>
> If you are running another Hadoop cluster or DFS, many unit tests fail 
> because Namenode in MiniDFSCluster fails to bind to the right port. Most 
> likely HADOOP-2185 forgot to set right defaults for MiniDFSCluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2000) Re-write NNBench to use MapReduce

2007-12-06 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549273
 ] 

Konstantin Shvachko commented on HADOOP-2000:
-

A few minor comments:
- Redundant imports:
{code}
   import java.text.DateFormat;
   import org.apache.hadoop.io.UTF8;
   import org.apache.hadoop.mapred.Reducer;
{code}
- Usage should specify mandatory options, like -operation, and optional ones.
- I get an ArrayIndexOutOfBoundsException if I run either of
{code}
NNBench -operation
NNBench -bytesToWrite
{code}

> Re-write NNBench to use MapReduce
> -
>
> Key: HADOOP-2000
> URL: https://issues.apache.org/jira/browse/HADOOP-2000
> Project: Hadoop
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.15.0
>Reporter: Mukund Madhugiri
>Assignee: Mukund Madhugiri
> Fix For: 0.16.0
>
> Attachments: HADOOP-2000.patch, HADOOP-2000.patch, HADOOP-2000.patch, 
> HADOOP-2000.patch, HADOOP-2000.patch, HADOOP-2000.patch
>
>
> The proposal is to re-write the NNBench benchmark/test to measure Namenode 
> operations using MapReduce. Two buckets of measurements will be done:
> 1. Transactions per second 
> 2. Average latency
> for these operations
> - Create and Close file
> - Open file
> - Rename file
> - Delete file

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-1707) Remove the DFS Client disk-based cache

2007-12-05 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548911
 ] 

Konstantin Shvachko commented on HADOOP-1707:
-

I think this patch has been tested quite thoroughly, and I don't see any 
algorithmic flaws in it. 
The logic is fairly complicated though, so imo
# we need better documentation either in JavaDoc or at least in Jira. 
# it would be good if you could extract common actions for the client and the 
data-node into 
separate classes, not inner ones.

===   DFSClient.java
- DFSClient: 4 unused variables, members.
- DFSOutputStream.lb should be local variable.
- processDatanodeError() and DFSOutputStream.close() have common code.
- BlockReader.readChunk()
{code}
07/12/04 18:36:22 INFO fs.FSInputChecker: DFSClient readChunk got seqno 14 
offsetInBlock 7168
{code}
Should be DEBUG.
- More comments: What is e.g. dataQueue, ackQueue, bytesCurBlock?
- Some new members in DFSOutputStream can be calculated from the others. 
No need to store them all. See e.g.
{code}
private int packetSize = 0;
private int chunksPerPacket = 0;
private int chunksPerBlock = 0;
private int chunkSize = 0;
{code}
- In the line below "8" should be defined as a constant. Otherwise, the meaning 
of that is not clear.
{code}
  chunkSize = bytesPerChecksum + 8; // user data + checksum
{code}
- currentPacket should be a local variable of writeChunk()
- The 4 in the code snippet below looks mysterious:
{code}
  if (len + cklen + 4 > chunkSize) {
{code}
- why start ResponseProcessor in processDatanodeError()
- some methods should be moved into new inner classes, like 
nextBlockOutputStream() should be a part of DataStreamer
- Packet should be factored out to a separate class (probably named DataPacket).
  It should have serialization/deserialization methods for the packet header, which 
should 
  be reused in DFSClient and DataNodes for consistency in data transfer.
  It also should have methods readPacket() and writePacket().

===   DataNode.java
- import org.apache.hadoop.io.Text; is redundant.
- My Eclipse shows 5 variables that are "never read".
- Rather than using "4" on several occasions a constant should be defined
{code}
SIZE_OF_INTEGER = Integer.SIZE / Byte.SIZE;
{code}
and used whenever required.
- lastDataNodeRun() should not be public

===   FSDataset.java
- writeToBlock(): These are two searches in a map instead of one.
{code}
  if (ongoingCreates.containsKey(b)) {
ActiveFile activeFile = ongoingCreates.get(b);
{code}
- unfinalizeBlock() I kinda find the name funny.

===   General
- Convert comments like   // ..  to JavaDoc   /**  ...  */ style 
comments 
  when used as method or class headers even if they are private.
- Formatting. Tabs should be replaced by 2 spaces. Eg: ResponseProcessor.run(), 
DataStreamer.run()
- Formatting. Long lines.


> Remove the DFS Client disk-based cache
> --
>
> Key: HADOOP-1707
> URL: https://issues.apache.org/jira/browse/HADOOP-1707
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.16.0
>
> Attachments: clientDiskBuffer.patch, clientDiskBuffer10.patch, 
> clientDiskBuffer11.patch, clientDiskBuffer2.patch, clientDiskBuffer6.patch, 
> clientDiskBuffer7.patch, clientDiskBuffer8.patch, clientDiskBuffer9.patch, 
> DataTransferProtocol.doc, DataTransferProtocol.html
>
>
> The DFS client currently uses a staging file on local disk to cache all 
> user-writes to a file. When the staging file accumulates 1 block worth of 
> data, its contents are flushed to a HDFS datanode. These operations occur 
> sequentially.
> A simple optimization of allowing the user to write to another staging file 
> while simultaneously uploading the contents of the first staging file to HDFS 
> will improve file-upload performance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2012) Periodic verification at the Datanode

2007-12-05 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548797
 ] 

Konstantin Shvachko commented on HADOOP-2012:
-

The question here is whether we would go with our current decision if we knew it 
would not be supported on Windows.
If we let the balancer write log-type data (verified block #s) into a special 
file "balancer.log" instead of modifying
meta-data files, will that be a problem? Looks like Eric already had a proposal 
of scanning blocks in a predetermined
order. Should we reconsider this?

> Periodic verification at the Datanode
> -
>
> Key: HADOOP-2012
> URL: https://issues.apache.org/jira/browse/HADOOP-2012
> Project: Hadoop
>  Issue Type: New Feature
>  Components: dfs
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 0.16.0
>
> Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, 
> HADOOP-2012.patch
>
>
> Currently on-disk data corruption on data blocks is detected only when it is 
> read by the client or by another datanode.  These errors are detected much 
> earlier if datanode can periodically verify the data checksums for the local 
> blocks.
> Some of the issues to consider :
> - How should we check the blocks ( no more often than once every couple of 
> weeks ?)
> - How do we keep track of when a block was last verfied ( there is a .meta 
> file associcated with each lock ).
> - What action to take once a corruption is detected
> - Scanning should be done as a very low priority with rest of the datanode 
> disk traffic in mind.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2260) TestMiniMRMapRedDebugScript times out

2007-12-04 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548469
 ] 

Konstantin Shvachko commented on HADOOP-2260:
-

The issue is not resolved. Do you have an explanation?

> TestMiniMRMapRedDebugScript times out
> -
>
> Key: HADOOP-2260
> URL: https://issues.apache.org/jira/browse/HADOOP-2260
> Project: Hadoop
>  Issue Type: Bug
>  Components: mapred
>Affects Versions: 0.16.0
> Environment: Linux
>Reporter: Konstantin Shvachko
>Assignee: Amareshwari Sri Ramadasu
> Fix For: 0.16.0
>
> Attachments: Hadoop-2260.log
>
>
> I am running TestMiniMRMapRedDebugScript from trunc.
> This is what I see in the stdout:
> {code}
> 2007-11-22 02:21:23,494 WARN  conf.Configuration 
> (Configuration.java:loadResource(808)) - 
> hadoop/build/test/mapred/local/1_0/taskTracker/jobcache/job_200711220217_0001/task_200711220217_0001_m_00_0/job.xml:a
>  attempt to override final parameter: hadoop.tmp.dir;  Ignoring.
> 2007-11-22 02:21:28,940 INFO  jvm.JvmMetrics (JvmMetrics.java:init(56)) - 
> Initializing JVM Metrics with processName=MAP, sessionId=
> 2007-11-22 02:22:09,504 INFO  mapred.MapTask (MapTask.java:run(127)) - 
> numReduceTasks: 0
> 2007-11-22 02:22:42,434 WARN  mapred.TaskTracker 
> (TaskTracker.java:main(1982)) - Error running child
> java.io.IOException
>   at 
> org.apache.hadoop.mapred.TestMiniMRMapRedDebugScript$MapClass.map(TestMiniMRMapRedDebugScript.java:41)
>   at 
> org.apache.hadoop.mapred.TestMiniMRMapRedDebugScript$MapClass.map(TestMiniMRMapRedDebugScript.java:35)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
>   at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1977)
> {code}
> Stderr and debugout both say: Bailing out.
> BTW on Windows everything works just fine.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-12-04 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Attachment: (was: FixedPorts2.patch)

> Server ports: to roll or not to roll.
> -
>
> Key: HADOOP-2185
> URL: https://issues.apache.org/jira/browse/HADOOP-2185
> Project: Hadoop
>  Issue Type: Improvement
>  Components: conf, dfs, mapred
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: FixedPorts3.patch, FixedPorts4.patch, port.stack
>
>
> Looked at the issues related to port rolling. My impression is that port 
> rolling is required only for the unit tests to run.
> Even the name-node port should roll there, which we don't have now, in order 
> to be able to start 2 cluster for testing say dist cp.
> For real clusters on the contrary port rolling is not desired and some times 
> even prohibited.
> So we should have a way of to ban port rolling. My proposition is to
> # use ephemeral port 0 if port rolling is desired
> # if a specific port is specified then port rolling should not happen at all, 
> meaning that a 
> server is either able or not able to start on that particular port.
> The desired port is specified via configuration parameters.
> - Name-node: fs.default.name = host:port
> - Data-node: dfs.datanode.port
> - Job-tracker: mapred.job.tracker = host:port
> - Task-tracker: mapred.task.tracker.report.bindAddress = host
>   Task-tracker currently does not have an option to specify port, it always 
> uses the ephemeral port 0, 
>   and therefore I propose to add one.
> - Secondary node does not need a port to listen on.
> For info servers we have two sets of config variables *.info.bindAddress and 
> *.info.port
> except for the task tracker, which calls them *.http.bindAddress and 
> *.http.port instead of "info".
> With respect to the info servers I propose to completely eliminate the port 
> parameters, and form 
> *.info.bindAddress = host:port
> Info servers should do the same thing, namely start or fail on the specified 
> port if it is not 0,
> and start on any free port if it is ephemeral.
> For the task-tracker I would rename tasktracker.http.bindAddress to 
> mapred.task.tracker.info.bindAddress
> For the data-node the info dfs.datanode.info.bindAddress should be included 
> into the default config.
> Is there a reason why it is not there?
> This is the summary of proposed changes:
> || Server || current name = value || proposed name = value ||
> | NameNode | fs.default.name = host:port | same |
> | | dfs.info.bindAddress = host | dfs.http.bindAddress = host:port |
> | DataNode | dfs.datanode.bindAddress = host | dfs.datanode.bindAddress = 
> host:port |
> | | dfs.datanode.port = port | eliminate |
> | | dfs.datanode.info.bindAddress = host | dfs.datanode.http.bindAddress = 
> host:port |
> | | dfs.datanode.info.port = port | eliminate |
> | JobTracker | mapred.job.tracker = host:port | same |
> | | mapred.job.tracker.info.bindAddress = host | 
> mapred.job.tracker.http.bindAddress = host:port |
> | | mapred.job.tracker.info.port = port | eliminate |
> | TaskTracker | mapred.task.tracker.report.bindAddress = host | 
> mapred.task.tracker.report.bindAddress = host:port |
> | | tasktracker.http.bindAddress = host | 
> mapred.task.tracker.http.bindAddress = host:port |
> | | tasktracker.http.port = port | eliminate |
> | SecondaryNameNode | dfs.secondary.info.bindAddress = host | 
> dfs.secondary.http.bindAddress = host:port |
> | | dfs.secondary.info.port = port | eliminate |
> Do we also want to set some uniform naming convention for the configuration 
> variables?
> Like having hdfs instead of dfs, or info instead of http, or systematically 
> using either datanode
> or data.node would make that look better in my opinion.
> So these are all +*api*+ changes. I would +*really*+ like some feedback on 
> this, especially from 
> people who deal with configuration issues on practice.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-12-04 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Status: Patch Available  (was: Open)

> Server ports: to roll or not to roll.
> -
>
> Key: HADOOP-2185
> URL: https://issues.apache.org/jira/browse/HADOOP-2185
> Project: Hadoop
>  Issue Type: Improvement
>  Components: conf, dfs, mapred
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: FixedPorts3.patch, FixedPorts4.patch, port.stack
>
>
> Looked at the issues related to port rolling. My impression is that port 
> rolling is required only for the unit tests to run.
> Even the name-node port should roll there, which we don't have now, in order 
> to be able to start 2 cluster for testing say dist cp.
> For real clusters on the contrary port rolling is not desired and some times 
> even prohibited.
> So we should have a way of to ban port rolling. My proposition is to
> # use ephemeral port 0 if port rolling is desired
> # if a specific port is specified then port rolling should not happen at all, 
> meaning that a 
> server is either able or not able to start on that particular port.
> The desired port is specified via configuration parameters.
> - Name-node: fs.default.name = host:port
> - Data-node: dfs.datanode.port
> - Job-tracker: mapred.job.tracker = host:port
> - Task-tracker: mapred.task.tracker.report.bindAddress = host
>   Task-tracker currently does not have an option to specify port, it always 
> uses the ephemeral port 0, 
>   and therefore I propose to add one.
> - Secondary node does not need a port to listen on.
> For info servers we have two sets of config variables *.info.bindAddress and 
> *.info.port
> except for the task tracker, which calls them *.http.bindAddress and 
> *.http.port instead of "info".
> With respect to the info servers I propose to completely eliminate the port 
> parameters, and form 
> *.info.bindAddress = host:port
> Info servers should do the same thing, namely start or fail on the specified 
> port if it is not 0,
> and start on any free port if it is ephemeral.
> For the task-tracker I would rename tasktracker.http.bindAddress to 
> mapred.task.tracker.info.bindAddress
> For the data-node the info dfs.datanode.info.bindAddress should be included 
> into the default config.
> Is there a reason why it is not there?
> This is the summary of proposed changes:
> || Server || current name = value || proposed name = value ||
> | NameNode | fs.default.name = host:port | same |
> | | dfs.info.bindAddress = host | dfs.http.bindAddress = host:port |
> | DataNode | dfs.datanode.bindAddress = host | dfs.datanode.bindAddress = 
> host:port |
> | | dfs.datanode.port = port | eliminate |
> | | dfs.datanode.info.bindAddress = host | dfs.datanode.http.bindAddress = 
> host:port |
> | | dfs.datanode.info.port = port | eliminate |
> | JobTracker | mapred.job.tracker = host:port | same |
> | | mapred.job.tracker.info.bindAddress = host | 
> mapred.job.tracker.http.bindAddress = host:port |
> | | mapred.job.tracker.info.port = port | eliminate |
> | TaskTracker | mapred.task.tracker.report.bindAddress = host | 
> mapred.task.tracker.report.bindAddress = host:port |
> | | tasktracker.http.bindAddress = host | 
> mapred.task.tracker.http.bindAddress = host:port |
> | | tasktracker.http.port = port | eliminate |
> | SecondaryNameNode | dfs.secondary.info.bindAddress = host | 
> dfs.secondary.http.bindAddress = host:port |
> | | dfs.secondary.info.port = port | eliminate |
> Do we also want to set some uniform naming convention for the configuration 
> variables?
> Like having hdfs instead of dfs, or info instead of http, or systematically 
> using either datanode
> or data.node would make that look better in my opinion.
> So these are all +*api*+ changes. I would +*really*+ like some feedback on 
> this, especially from 
> people who deal with configuration issues on practice.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-12-04 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Attachment: FixedPorts4.patch

This is a newer version.

> Server ports: to roll or not to roll.
> -
>
> Key: HADOOP-2185
> URL: https://issues.apache.org/jira/browse/HADOOP-2185
> Project: Hadoop
>  Issue Type: Improvement
>  Components: conf, dfs, mapred
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: FixedPorts3.patch, FixedPorts4.patch, port.stack
>
>
> Looked at the issues related to port rolling. My impression is that port 
> rolling is required only for the unit tests to run.
> Even the name-node port should roll there, which we don't have now, in order 
> to be able to start 2 cluster for testing say dist cp.
> For real clusters on the contrary port rolling is not desired and some times 
> even prohibited.
> So we should have a way of to ban port rolling. My proposition is to
> # use ephemeral port 0 if port rolling is desired
> # if a specific port is specified then port rolling should not happen at all, 
> meaning that a 
> server is either able or not able to start on that particular port.
> The desired port is specified via configuration parameters.
> - Name-node: fs.default.name = host:port
> - Data-node: dfs.datanode.port
> - Job-tracker: mapred.job.tracker = host:port
> - Task-tracker: mapred.task.tracker.report.bindAddress = host
>   Task-tracker currently does not have an option to specify port, it always 
> uses the ephemeral port 0, 
>   and therefore I propose to add one.
> - Secondary node does not need a port to listen on.
> For info servers we have two sets of config variables *.info.bindAddress and 
> *.info.port
> except for the task tracker, which calls them *.http.bindAddress and 
> *.http.port instead of "info".
> With respect to the info servers I propose to completely eliminate the port 
> parameters, and form 
> *.info.bindAddress = host:port
> Info servers should do the same thing, namely start or fail on the specified 
> port if it is not 0,
> and start on any free port if it is ephemeral.
> For the task-tracker I would rename tasktracker.http.bindAddress to 
> mapred.task.tracker.info.bindAddress
> For the data-node the info dfs.datanode.info.bindAddress should be included 
> into the default config.
> Is there a reason why it is not there?
> This is the summary of proposed changes:
> || Server || current name = value || proposed name = value ||
> | NameNode | fs.default.name = host:port | same |
> | | dfs.info.bindAddress = host | dfs.http.bindAddress = host:port |
> | DataNode | dfs.datanode.bindAddress = host | dfs.datanode.bindAddress = 
> host:port |
> | | dfs.datanode.port = port | eliminate |
> | | dfs.datanode.info.bindAddress = host | dfs.datanode.http.bindAddress = 
> host:port |
> | | dfs.datanode.info.port = port | eliminate |
> | JobTracker | mapred.job.tracker = host:port | same |
> | | mapred.job.tracker.info.bindAddress = host | 
> mapred.job.tracker.http.bindAddress = host:port |
> | | mapred.job.tracker.info.port = port | eliminate |
> | TaskTracker | mapred.task.tracker.report.bindAddress = host | 
> mapred.task.tracker.report.bindAddress = host:port |
> | | tasktracker.http.bindAddress = host | 
> mapred.task.tracker.http.bindAddress = host:port |
> | | tasktracker.http.port = port | eliminate |
> | SecondaryNameNode | dfs.secondary.info.bindAddress = host | 
> dfs.secondary.http.bindAddress = host:port |
> | | dfs.secondary.info.port = port | eliminate |
> Do we also want to set some uniform naming convention for the configuration 
> variables?
> Having hdfs instead of dfs, or info instead of http, or systematically
> using either datanode or data.node throughout would look better in my opinion.
> So these are all +*api*+ changes. I would +*really*+ like some feedback on
> this, especially from
> people who deal with configuration issues in practice.
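
To make the proposed semantics concrete, here is a minimal sketch, not taken from the patch, of how a server could honor a host:port value under these rules: bind exactly to a non-zero port and fail if it is busy, or let the OS pick a free port when the configured value ends in 0. The class name and the example addresses are made up for illustration.

{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BindOrFail {
  /**
   * Binds according to the proposed rules: port 0 means "pick any free
   * (ephemeral) port", any other port means "bind exactly here or fail".
   * The address format "host:port" is the one proposed in the table above.
   */
  public static ServerSocket bind(String hostPort) throws IOException {
    String host = hostPort.substring(0, hostPort.lastIndexOf(':'));
    int port = Integer.parseInt(hostPort.substring(hostPort.lastIndexOf(':') + 1));
    ServerSocket socket = new ServerSocket();
    // No retry loop: if the port is busy and non-zero, this throws a BindException.
    socket.bind(new InetSocketAddress(host, port));
    return socket;
  }

  public static void main(String[] args) throws IOException {
    ServerSocket fixed = bind("0.0.0.0:50070");   // production: start here or fail
    ServerSocket rolled = bind("0.0.0.0:0");      // unit tests: any free port
    System.out.println("fixed=" + fixed.getLocalPort()
        + " ephemeral=" + rolled.getLocalPort());
    fixed.close();
    rolled.close();
  }
}
{code}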

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-12-03 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Attachment: FixedPorts3.patch

Dhruba, thanks for the feedback. I finally realized why the new tests were 
sometimes failing.
The problem is with the clients.

Example 1: The name-node instantiates Trash, which creates a DFSClient (even if
trash is disabled).
When the name-node stops, this DFSClient remains up and the secondary name-node
will not start, because it cannot create a client. Namely, the secondary name-node
just hangs trying to connect to the main name-node (RPC.waitForProxy()).

Example 2: A similar thing happens with the JobTracker, which also creates a
DFSClient in order to remove a file but never closes it. So the next start of the
JobTracker would hang the same way as in the previous example.

In both cases, if you wait long enough the client eventually dies, which is why
the failure is not stable.

I am closing the clients inside my tests now. Closing clients within Trash or
JobTracker breaks other unit tests, because the clients are static objects, and
closing a client once would destroy that object for everybody else who opened the
client inside the same JVM.
Fixing that is beyond the scope of this patch; I'll open another issue for the problem.

All tests pass now.
As I mentioned before, the findBugs warning about assigning to static fields 
will remain unfixed.
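
As a rough illustration of what "closing the clients inside my tests" can look like, here is a hedged JUnit-style sketch, not the actual FixedPorts3.patch code; the class and test names are invented. The point is only that the test owns the FileSystem it opens and closes it in tearDown, so no dangling client keeps a connection to a name-node that has already been shut down.

{code}
import junit.framework.TestCase;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical test skeleton, not code from the patch.
public class TestClientShutdown extends TestCase {
  private FileSystem fs;

  protected void setUp() throws Exception {
    Configuration conf = new Configuration();
    fs = FileSystem.get(conf);          // client created by the test itself
  }

  protected void tearDown() throws Exception {
    if (fs != null) {
      fs.close();                       // do not leave a dangling DFSClient behind
    }
  }

  public void testSomethingAgainstTheCluster() throws Exception {
    assertTrue(fs.exists(new Path("/")));
  }
}
{code}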

> Server ports: to roll or not to roll.
> -
>
> Key: HADOOP-2185
> URL: https://issues.apache.org/jira/browse/HADOOP-2185
> Project: Hadoop
>  Issue Type: Improvement
>  Components: conf, dfs, mapred
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: FixedPorts2.patch, FixedPorts3.patch, port.stack
>
>
> Looked at the issues related to port rolling. My impression is that port 
> rolling is required only for the unit tests to run.
> Even the name-node port should roll there, which we don't have now, in order
> to be able to start 2 clusters for testing, say, distcp.
> For real clusters, on the contrary, port rolling is not desired and sometimes
> even prohibited.
> So we should have a way to ban port rolling. My proposal is to
> # use ephemeral port 0 if port rolling is desired
> # if a specific port is specified then port rolling should not happen at all, 
> meaning that a 
> server is either able or not able to start on that particular port.
> The desired port is specified via configuration parameters.
> - Name-node: fs.default.name = host:port
> - Data-node: dfs.datanode.port
> - Job-tracker: mapred.job.tracker = host:port
> - Task-tracker: mapred.task.tracker.report.bindAddress = host
>   The task-tracker currently does not have an option to specify a port; it always
> uses the ephemeral port 0,
>   and therefore I propose to add one.
> - Secondary node does not need a port to listen on.
> For info servers we have two sets of config variables *.info.bindAddress and 
> *.info.port
> except for the task tracker, which calls them *.http.bindAddress and 
> *.http.port instead of "info".
> With respect to the info servers I propose to completely eliminate the port 
> parameters, and form 
> *.info.bindAddress = host:port
> Info servers should do the same thing, namely start or fail on the specified 
> port if it is not 0,
> and start on any free port if it is ephemeral.
> For the task-tracker I would rename tasktracker.http.bindAddress to 
> mapred.task.tracker.info.bindAddress
> For the data-node, dfs.datanode.info.bindAddress should be included
> in the default config.
> Is there a reason why it is not there?
> This is the summary of proposed changes:
> || Server || current name = value || proposed name = value ||
> | NameNode | fs.default.name = host:port | same |
> | | dfs.info.bindAddress = host | dfs.http.bindAddress = host:port |
> | DataNode | dfs.datanode.bindAddress = host | dfs.datanode.bindAddress = 
> host:port |
> | | dfs.datanode.port = port | eliminate |
> | | dfs.datanode.info.bindAddress = host | dfs.datanode.http.bindAddress = 
> host:port |
> | | dfs.datanode.info.port = port | eliminate |
> | JobTracker | mapred.job.tracker = host:port | same |
> | | mapred.job.tracker.info.bindAddress = host | 
> mapred.job.tracker.http.bindAddress = host:port |
> | | mapred.job.tracker.info.port = port | eliminate |
> | TaskTracker | mapred.task.tracker.report.bindAddress = host | 
> mapred.task.tracker.report.bindAddress = host:port |
> | | tasktracker.http.bindAddress = host | 
> mapred.task.tracker.http.bindAddress = host:port |
> | | tasktracker.http.port = port | eliminate |
> | SecondaryNameNode | dfs.secondary.info.bindAddress = host | 
> dfs.secondary.http.bindAddress = host:port |
> | | dfs.secondary.info.port = port | eliminate |
> Do we also want to set some uniform naming convention for the configuration
> variables?
> Having hdfs instead of dfs, or info instead of http, or systematically
> using either datanode or data.node throughout would look better in my opinion.
> So these are all +*api*+ changes. I would +*really*+ like some feedback on
> this, especially from
> people who deal with configuration issues in practice.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-12-03 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Status: Patch Available  (was: Open)

> Server ports: to roll or not to roll.
> -
>
> Key: HADOOP-2185
> URL: https://issues.apache.org/jira/browse/HADOOP-2185
> Project: Hadoop
>  Issue Type: Improvement
>  Components: conf, dfs, mapred
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: FixedPorts2.patch, FixedPorts3.patch, port.stack
>
>
> Looked at the issues related to port rolling. My impression is that port 
> rolling is required only for the unit tests to run.
> Even the name-node port should roll there, which we don't have now, in order
> to be able to start 2 clusters for testing, say, distcp.
> For real clusters, on the contrary, port rolling is not desired and sometimes
> even prohibited.
> So we should have a way to ban port rolling. My proposal is to
> # use ephemeral port 0 if port rolling is desired
> # if a specific port is specified then port rolling should not happen at all, 
> meaning that a 
> server is either able or not able to start on that particular port.
> The desired port is specified via configuration parameters.
> - Name-node: fs.default.name = host:port
> - Data-node: dfs.datanode.port
> - Job-tracker: mapred.job.tracker = host:port
> - Task-tracker: mapred.task.tracker.report.bindAddress = host
>   The task-tracker currently does not have an option to specify a port; it always
> uses the ephemeral port 0,
>   and therefore I propose to add one.
> - Secondary node does not need a port to listen on.
> For info servers we have two sets of config variables *.info.bindAddress and 
> *.info.port
> except for the task tracker, which calls them *.http.bindAddress and 
> *.http.port instead of "info".
> With respect to the info servers I propose to completely eliminate the port 
> parameters, and form 
> *.info.bindAddress = host:port
> Info servers should do the same thing, namely start or fail on the specified 
> port if it is not 0,
> and start on any free port if it is ephemeral.
> For the task-tracker I would rename tasktracker.http.bindAddress to 
> mapred.task.tracker.info.bindAddress
> For the data-node, dfs.datanode.info.bindAddress should be included
> in the default config.
> Is there a reason why it is not there?
> This is the summary of proposed changes:
> || Server || current name = value || proposed name = value ||
> | NameNode | fs.default.name = host:port | same |
> | | dfs.info.bindAddress = host | dfs.http.bindAddress = host:port |
> | DataNode | dfs.datanode.bindAddress = host | dfs.datanode.bindAddress = 
> host:port |
> | | dfs.datanode.port = port | eliminate |
> | | dfs.datanode.info.bindAddress = host | dfs.datanode.http.bindAddress = 
> host:port |
> | | dfs.datanode.info.port = port | eliminate |
> | JobTracker | mapred.job.tracker = host:port | same |
> | | mapred.job.tracker.info.bindAddress = host | 
> mapred.job.tracker.http.bindAddress = host:port |
> | | mapred.job.tracker.info.port = port | eliminate |
> | TaskTracker | mapred.task.tracker.report.bindAddress = host | 
> mapred.task.tracker.report.bindAddress = host:port |
> | | tasktracker.http.bindAddress = host | 
> mapred.task.tracker.http.bindAddress = host:port |
> | | tasktracker.http.port = port | eliminate |
> | SecondaryNameNode | dfs.secondary.info.bindAddress = host | 
> dfs.secondary.http.bindAddress = host:port |
> | | dfs.secondary.info.port = port | eliminate |
> Do we also want to set some uniform naming convention for the configuration 
> variables?
> Having hdfs instead of dfs, or info instead of http, or systematically
> using either datanode or data.node throughout would look better in my opinion.
> So these are all +*api*+ changes. I would +*really*+ like some feedback on
> this, especially from
> people who deal with configuration issues in practice.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-12-03 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Attachment: (was: FixedPorts.patch)

> Server ports: to roll or not to roll.
> -
>
> Key: HADOOP-2185
> URL: https://issues.apache.org/jira/browse/HADOOP-2185
> Project: Hadoop
>  Issue Type: Improvement
>  Components: conf, dfs, mapred
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: FixedPorts2.patch, FixedPorts3.patch, port.stack
>
>
> Looked at the issues related to port rolling. My impression is that port 
> rolling is required only for the unit tests to run.
> Even the name-node port should roll there, which we don't have now, in order
> to be able to start 2 clusters for testing, say, distcp.
> For real clusters, on the contrary, port rolling is not desired and sometimes
> even prohibited.
> So we should have a way to ban port rolling. My proposal is to
> # use ephemeral port 0 if port rolling is desired
> # if a specific port is specified then port rolling should not happen at all, 
> meaning that a 
> server is either able or not able to start on that particular port.
> The desired port is specified via configuration parameters.
> - Name-node: fs.default.name = host:port
> - Data-node: dfs.datanode.port
> - Job-tracker: mapred.job.tracker = host:port
> - Task-tracker: mapred.task.tracker.report.bindAddress = host
>   The task-tracker currently does not have an option to specify a port; it always
> uses the ephemeral port 0,
>   and therefore I propose to add one.
> - Secondary node does not need a port to listen on.
> For info servers we have two sets of config variables *.info.bindAddress and 
> *.info.port
> except for the task tracker, which calls them *.http.bindAddress and 
> *.http.port instead of "info".
> With respect to the info servers I propose to completely eliminate the port 
> parameters, and form 
> *.info.bindAddress = host:port
> Info servers should do the same thing, namely start or fail on the specified 
> port if it is not 0,
> and start on any free port if it is ephemeral.
> For the task-tracker I would rename tasktracker.http.bindAddress to 
> mapred.task.tracker.info.bindAddress
> For the data-node, dfs.datanode.info.bindAddress should be included
> in the default config.
> Is there a reason why it is not there?
> This is the summary of proposed changes:
> || Server || current name = value || proposed name = value ||
> | NameNode | fs.default.name = host:port | same |
> | | dfs.info.bindAddress = host | dfs.http.bindAddress = host:port |
> | DataNode | dfs.datanode.bindAddress = host | dfs.datanode.bindAddress = 
> host:port |
> | | dfs.datanode.port = port | eliminate |
> | | dfs.datanode.info.bindAddress = host | dfs.datanode.http.bindAddress = 
> host:port |
> | | dfs.datanode.info.port = port | eliminate |
> | JobTracker | mapred.job.tracker = host:port | same |
> | | mapred.job.tracker.info.bindAddress = host | 
> mapred.job.tracker.http.bindAddress = host:port |
> | | mapred.job.tracker.info.port = port | eliminate |
> | TaskTracker | mapred.task.tracker.report.bindAddress = host | 
> mapred.task.tracker.report.bindAddress = host:port |
> | | tasktracker.http.bindAddress = host | 
> mapred.task.tracker.http.bindAddress = host:port |
> | | tasktracker.http.port = port | eliminate |
> | SecondaryNameNode | dfs.secondary.info.bindAddress = host | 
> dfs.secondary.http.bindAddress = host:port |
> | | dfs.secondary.info.port = port | eliminate |
> Do we also want to set some uniform naming convention for the configuration 
> variables?
> Having hdfs instead of dfs, or info instead of http, or systematically
> using either datanode or data.node throughout would look better in my opinion.
> So these are all +*api*+ changes. I would +*really*+ like some feedback on
> this, especially from
> people who deal with configuration issues in practice.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HADOOP-2337) Trash never closes FileSystem

2007-12-03 Thread Konstantin Shvachko (JIRA)
Trash never closes FileSystem
-

 Key: HADOOP-2337
 URL: https://issues.apache.org/jira/browse/HADOOP-2337
 Project: Hadoop
  Issue Type: Bug
  Components: dfs
Affects Versions: 0.15.0
Reporter: Konstantin Shvachko
 Fix For: 0.16.0


Trash opens FileSystem using Path.getFileSystem() but never closes it.
This happens even if Trash is disabled (trash.interval == 0). 
I think Trash should not open the file system if it is disabled.
I also think that the NameNode should not create a trash Thread when trash is
disabled; see NameNode.init().
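
A hedged sketch of the suggested behavior, not the real org.apache.hadoop.fs.Trash code; the class, field, and property names below are illustrative only. The idea is simply that the FileSystem is opened lazily and never opened at all when trash is disabled.

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative only: shows the "don't open the FileSystem when trash is
// disabled" idea, not the actual Trash implementation.
class TrashSketch {
  private final Configuration conf;
  private final long intervalMinutes;
  private FileSystem fs;            // opened lazily, only if trash is enabled

  TrashSketch(Configuration conf) {
    this.conf = conf;
    // Property name is an assumption for the sketch.
    this.intervalMinutes = conf.getLong("fs.trash.interval", 0);
  }

  boolean isEnabled() {
    return intervalMinutes > 0;
  }

  /** Moves a path to trash; a no-op (and no FileSystem is opened) when disabled. */
  boolean moveToTrash(Path path) throws IOException {
    if (!isEnabled()) {
      return false;                 // nothing opened, nothing left to close later
    }
    if (fs == null) {
      fs = path.getFileSystem(conf);
    }
    // ... rename 'path' under the trash directory using 'fs' ...
    return true;
  }
}
{code}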


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-1707) Remove the DFS Client disk-based cache

2007-11-29 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546917
 ] 

Konstantin Shvachko commented on HADOOP-1707:
-

Since you have just encountered that:
the same problem potentially exists in the following 3 methods
- nextBlockOutputStream()
- locateFollowingBlock()
- DFSOutputStream.close()

where the client sleeps under a lock. In general a thread should wait() instead 
of sleep() under a lock.
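
For illustration, a small generic example of the point about sleeping under a lock (this is not DFSClient code): Thread.sleep() keeps the monitor held for the whole interval, while Object.wait() releases it and can be woken as soon as the condition changes.

{code}
class RetryExample {
  private final Object lock = new Object();
  private boolean blockLocated = false;

  // Anti-pattern: the monitor stays held for the full second, so any other
  // thread that needs 'lock' is stuck even if it could make progress.
  void pollHoldingLock() throws InterruptedException {
    synchronized (lock) {
      while (!blockLocated) {
        Thread.sleep(1000);
      }
    }
  }

  // Preferred: wait() releases the monitor while waiting and returns as soon
  // as another thread calls notifyAll() after setting the condition.
  void pollReleasingLock() throws InterruptedException {
    synchronized (lock) {
      while (!blockLocated) {
        lock.wait(1000);
      }
    }
  }

  void blockFound() {
    synchronized (lock) {
      blockLocated = true;
      lock.notifyAll();
    }
  }
}
{code}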

> Remove the DFS Client disk-based cache
> --
>
> Key: HADOOP-1707
> URL: https://issues.apache.org/jira/browse/HADOOP-1707
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.16.0
>
> Attachments: clientDiskBuffer.patch, clientDiskBuffer2.patch, 
> clientDiskBuffer6.patch, clientDiskBuffer7.patch, clientDiskBuffer8.patch, 
> DataTransferProtocol.doc, DataTransferProtocol.html
>
>
> The DFS client currently uses a staging file on local disk to cache all 
> user-writes to a file. When the staging file accumulates 1 block worth of 
> data, its contents are flushed to an HDFS datanode. These operations occur 
> sequentially.
> A simple optimization of allowing the user to write to another staging file 
> while simultaneously uploading the contents of the first staging file to HDFS 
> will improve file-upload performance.
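
A rough sketch of the double-buffering idea described above, purely illustrative and not the DFSClient implementation; all class and method names are made up. While the uploader thread ships one filled block, the writer keeps filling the next one.

{code}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Made-up illustration of overlapping "fill the next block" with "ship the
// previous block"; not the real staging-file code.
public class DoubleBufferedUpload {
  // Holds at most one filled block: the writer only ever runs one block ahead.
  private final BlockingQueue<byte[]> fullBlocks = new ArrayBlockingQueue<byte[]>(1);
  private final Thread uploader;
  private volatile boolean closed = false;

  public DoubleBufferedUpload() {
    uploader = new Thread(new Runnable() {
      public void run() {
        try {
          while (!closed || !fullBlocks.isEmpty()) {
            byte[] block = fullBlocks.poll(100, TimeUnit.MILLISECONDS);
            if (block != null) {
              sendToDatanode(block);   // ships a block while the writer fills the next
            }
          }
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
        }
      }
    });
    uploader.start();
  }

  /** Writer hands over a filled block and immediately continues with a fresh one. */
  public void blockFull(byte[] block) throws InterruptedException {
    fullBlocks.put(block);             // blocks only if the uploader is a full block behind
  }

  public void close() throws InterruptedException {
    closed = true;
    uploader.join();
  }

  private void sendToDatanode(byte[] block) {
    // placeholder for the actual transfer of one block to a datanode
  }
}
{code}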

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2154) Non-interleaved checksums would optimize block transfers.

2007-11-29 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546909
 ] 

Konstantin Shvachko commented on HADOOP-2154:
-

Yes on both comments.

> Non-interleaved checksums would optimize block transfers.
> -
>
> Key: HADOOP-2154
> URL: https://issues.apache.org/jira/browse/HADOOP-2154
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Affects Versions: 0.14.0
>Reporter: Konstantin Shvachko
>Assignee: Rajagopal Natarajan
> Fix For: 0.16.0
>
>
> Currently when a block is transferred to a data-node the client interleaves 
> data chunks with the respective checksums. 
> This requires creating an extra copy of the original data in a new buffer 
> interleaved with the crcs.
> We can avoid extra copying if the data and the crc are fed to the socket one 
> after another.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (HADOOP-2154) Non-interleaved checksums would optimize block transfers.

2007-11-29 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546850
 ] 

shv edited comment on HADOOP-2154 at 11/29/07 1:36 PM:
---

Rajagopal, I do not see how the data:header ratio is decreasing here.

This issue is mainly about removing the interleaving buffer layout. Namely, now 
we partition the original data into chunks, 
calculate crc for each chunk and create the following buffer, which 
subsequently is transferred to a data-node:
| data chunk 1 | crc for data chunk 1 |  data chunk 2 | crc for data chunk 2 |  
... | data chunk n | crc for data chunk n | 
I propose to change it [back] to 
| the original data (+not+ partitioned into chunks) | crcs for the original 
data |

If you add a header before each data and crc chunk then in the current approach you 
will have 2*n headers, while in the proposed 
approach there will be only 2. So the data:header ratio will increase: (|data| 
+ |crc|) / 2n < (|data| + |crc|) / 2

This should let us get rid of that extra buffer that is used to collect all the 
interleaved pieces together.

And thus the issue is not about "writing the chunks to the socket directly",
but rather about removing chunks altogether.
Imo, this is related to both reads and writes. Maybe reads and writes should
even share this code.
Removing other redundant buffers is a part of a different issue.

Eric, why do you think transferring crc before the data would require less RAM 
on the client?
If it does then it definitely makes sense to send crcs before the data bytes.
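
To make the two layouts concrete, here is a small illustrative sketch, not the actual DataTransferProtocol code; the 512-byte chunk size and the use of CRC32 are assumptions. The first method builds the current interleaved buffer (which forces a copy of the data), the second computes only the trailing block of crcs so the original data buffer can be written to the socket untouched.

{code}
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class ChecksumLayouts {
  static final int CHUNK = 512;   // bytes per checksum chunk (assumed)

  /** Current layout: | chunk 1 | crc 1 | chunk 2 | crc 2 | ... requires a data copy. */
  static ByteBuffer interleaved(byte[] data) {
    int chunks = (data.length + CHUNK - 1) / CHUNK;
    ByteBuffer out = ByteBuffer.allocate(data.length + 4 * chunks);
    CRC32 crc = new CRC32();
    for (int off = 0; off < data.length; off += CHUNK) {
      int len = Math.min(CHUNK, data.length - off);
      out.put(data, off, len);                 // copy the chunk...
      crc.reset();
      crc.update(data, off, len);
      out.putInt((int) crc.getValue());        // ...followed by its crc
    }
    out.flip();
    return out;
  }

  /** Proposed layout: the untouched data first, then all crcs; no data copy needed. */
  static ByteBuffer trailingCrcs(byte[] data) {
    int chunks = (data.length + CHUNK - 1) / CHUNK;
    ByteBuffer crcs = ByteBuffer.allocate(4 * chunks);
    CRC32 crc = new CRC32();
    for (int off = 0; off < data.length; off += CHUNK) {
      int len = Math.min(CHUNK, data.length - off);
      crc.reset();
      crc.update(data, off, len);
      crcs.putInt((int) crc.getValue());
    }
    crcs.flip();
    // 'data' itself would be written to the socket as-is, followed by 'crcs'.
    return crcs;
  }
}
{code}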

  was (Author: shv):
Rajagopal, I do not see how the data:header ratio is decreasing here.

This issue is mainly about removing the interleaving buffer layout. Namely, now 
we partition the original data into chunks, 
calculate crc for each chunk and create the following buffer, which 
subsequently is transferred to a data-node:
| data chunk 1 | crc for data chunk 1 |  data chunk 2 | crc for data chunk 2 |  
... | data chunk n | crc for data chunk n | 
I propose to change it [back] to 
| the original data (+not+ partitioned into chunks) | crc for the original 
data |

If you add a header before each data and crc chunk then in the current approach you 
will have 2*n headers, while in the proposed 
approach there will be only 2. So the data:header ratio will increase: (|data| 
+ |crc|) / 2n < (|data| + |crc|) / 2

This should let us get rid of that extra buffer that is used to collect all the 
interleaved pieces together.

And thus the issue is not about "writing the chunks to the socket directly",
but rather about removing chunks altogether.
Imo, this is related to both reads and writes. Maybe reads and writes should
even share this code.
Removing other redundant buffers is a part of a different issue.

Eric, why do you think transferring crc before the data would require less RAM 
on the client?
If it does then it definitely makes sense to send crcs before the data bytes.
  
> Non-interleaved checksums would optimize block transfers.
> -
>
> Key: HADOOP-2154
> URL: https://issues.apache.org/jira/browse/HADOOP-2154
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Affects Versions: 0.14.0
>Reporter: Konstantin Shvachko
>Assignee: Rajagopal Natarajan
> Fix For: 0.16.0
>
>
> Currently when a block is transfered to a data-node the client interleaves 
> data chunks with the respective checksums. 
> This requires creating an extra copy of the original data in a new buffer 
> interleaved with the crcs.
> We can avoid extra copying if the data and the crc are fed to the socket one 
> after another.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2260) TestMiniMRMapRedDebugScript times out

2007-11-29 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546901
 ] 

Konstantin Shvachko commented on HADOOP-2260:
-

Yes, I tried; it's a different issue.

> TestMiniMRMapRedDebugScript times out
> -
>
> Key: HADOOP-2260
> URL: https://issues.apache.org/jira/browse/HADOOP-2260
> Project: Hadoop
>  Issue Type: Bug
>  Components: mapred
>Affects Versions: 0.16.0
> Environment: Linux
>Reporter: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: Hadoop-2260.log
>
>
> I am running TestMiniMRMapRedDebugScript from trunk.
> This is what I see in the stdout:
> {code}
> 2007-11-22 02:21:23,494 WARN  conf.Configuration 
> (Configuration.java:loadResource(808)) - 
> hadoop/build/test/mapred/local/1_0/taskTracker/jobcache/job_200711220217_0001/task_200711220217_0001_m_00_0/job.xml:a
>  attempt to override final parameter: hadoop.tmp.dir;  Ignoring.
> 2007-11-22 02:21:28,940 INFO  jvm.JvmMetrics (JvmMetrics.java:init(56)) - 
> Initializing JVM Metrics with processName=MAP, sessionId=
> 2007-11-22 02:22:09,504 INFO  mapred.MapTask (MapTask.java:run(127)) - 
> numReduceTasks: 0
> 2007-11-22 02:22:42,434 WARN  mapred.TaskTracker 
> (TaskTracker.java:main(1982)) - Error running child
> java.io.IOException
>   at 
> org.apache.hadoop.mapred.TestMiniMRMapRedDebugScript$MapClass.map(TestMiniMRMapRedDebugScript.java:41)
>   at 
> org.apache.hadoop.mapred.TestMiniMRMapRedDebugScript$MapClass.map(TestMiniMRMapRedDebugScript.java:35)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
>   at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1977)
> {code}
> Stderr and debugout both say: Bailing out.
> BTW on Windows everything works just fine.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2154) Non-interleaved checksums would optimize block transfers.

2007-11-29 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546850
 ] 

Konstantin Shvachko commented on HADOOP-2154:
-

Rajagopal, I do not see how the data:header ratio is decreasing here.

This issue is mainly about removing the interleaving buffer layout. Namely, now 
we partition the original data into chunks, 
calculate crc for each chunk and create the following buffer, which 
subsequently is transferred to a data-node:
| data chunk 1 | crc for data chunk 1 |  data chunk 2 | crc for data chunk 2 |  
... | data chunk n | crc for data chunk n | 
I propose to change it [back] to 
| the original data (+not+ partitioned into chunks) | crc for the original 
data |

If you add a header before each data and crc chunk then in the current approach you 
will have 2*n headers, while in the proposed 
approach there will be only 2. So the data:header ratio will increase: (|data| 
+ |crc|) / 2n < (|data| + |crc|) / 2

This should let us get rid of that extra buffer that is used to collect all the 
interleaved pieces together.

And thus the issue is not about "writing the chunks to the socket directly",
but rather about removing chunks altogether.
Imo, this is related to both reads and writes. Maybe reads and writes should
even share this code.
Removing other redundant buffers is a part of a different issue.

Eric, why do you think transferring crc before the data would require less RAM 
on the client?
If it does then it definitely makes sense to send crcs before the data bytes.

> Non-interleaved checksums would optimize block transfers.
> -
>
> Key: HADOOP-2154
> URL: https://issues.apache.org/jira/browse/HADOOP-2154
> Project: Hadoop
>  Issue Type: Improvement
>  Components: dfs
>Affects Versions: 0.14.0
>Reporter: Konstantin Shvachko
>Assignee: Rajagopal Natarajan
> Fix For: 0.16.0
>
>
> Currently when a block is transferred to a data-node the client interleaves 
> data chunks with the respective checksums. 
> This requires creating an extra copy of the original data in a new buffer 
> interleaved with the crcs.
> We can avoid extra copying if the data and the crc are fed to the socket one 
> after another.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2260) TestMiniMRMapRedDebugScript times out

2007-11-27 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2260:


Attachment: Hadoop-2260.log

Attaching a complete log of the test run. It contains a thread dump taken in 
the middle of the wait.

> TestMiniMRMapRedDebugScript times out
> -
>
> Key: HADOOP-2260
> URL: https://issues.apache.org/jira/browse/HADOOP-2260
> Project: Hadoop
>  Issue Type: Bug
>  Components: mapred
>Affects Versions: 0.16.0
> Environment: Linux
>Reporter: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: Hadoop-2260.log
>
>
> I am running TestMiniMRMapRedDebugScript from trunk.
> This is what I see in the stdout:
> {code}
> 2007-11-22 02:21:23,494 WARN  conf.Configuration 
> (Configuration.java:loadResource(808)) - 
> hadoop/build/test/mapred/local/1_0/taskTracker/jobcache/job_200711220217_0001/task_200711220217_0001_m_00_0/job.xml:a
>  attempt to override final parameter: hadoop.tmp.dir;  Ignoring.
> 2007-11-22 02:21:28,940 INFO  jvm.JvmMetrics (JvmMetrics.java:init(56)) - 
> Initializing JVM Metrics with processName=MAP, sessionId=
> 2007-11-22 02:22:09,504 INFO  mapred.MapTask (MapTask.java:run(127)) - 
> numReduceTasks: 0
> 2007-11-22 02:22:42,434 WARN  mapred.TaskTracker 
> (TaskTracker.java:main(1982)) - Error running child
> java.io.IOException
>   at 
> org.apache.hadoop.mapred.TestMiniMRMapRedDebugScript$MapClass.map(TestMiniMRMapRedDebugScript.java:41)
>   at 
> org.apache.hadoop.mapred.TestMiniMRMapRedDebugScript$MapClass.map(TestMiniMRMapRedDebugScript.java:35)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
>   at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1977)
> {code}
> Stderr and debugout both say: Bailing out.
> BTW on Windows everything works just fine.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2260) TestMiniMRMapRedDebugScript times out

2007-11-27 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546071
 ] 

Konstantin Shvachko commented on HADOOP-2260:
-

Yes, I do, on my Linux machine. It is stable. I see the job get stuck doing
something at 100% CPU, then tasks get rescheduled, and finally the job fails.
What could be the problem?

> TestMiniMRMapRedDebugScript times out
> -
>
> Key: HADOOP-2260
> URL: https://issues.apache.org/jira/browse/HADOOP-2260
> Project: Hadoop
>  Issue Type: Bug
>  Components: mapred
>Affects Versions: 0.16.0
> Environment: Linux
>Reporter: Konstantin Shvachko
> Fix For: 0.16.0
>
>
> I am running TestMiniMRMapRedDebugScript from trunk.
> This is what I see in the stdout:
> {code}
> 2007-11-22 02:21:23,494 WARN  conf.Configuration 
> (Configuration.java:loadResource(808)) - 
> hadoop/build/test/mapred/local/1_0/taskTracker/jobcache/job_200711220217_0001/task_200711220217_0001_m_00_0/job.xml:a
>  attempt to override final parameter: hadoop.tmp.dir;  Ignoring.
> 2007-11-22 02:21:28,940 INFO  jvm.JvmMetrics (JvmMetrics.java:init(56)) - 
> Initializing JVM Metrics with processName=MAP, sessionId=
> 2007-11-22 02:22:09,504 INFO  mapred.MapTask (MapTask.java:run(127)) - 
> numReduceTasks: 0
> 2007-11-22 02:22:42,434 WARN  mapred.TaskTracker 
> (TaskTracker.java:main(1982)) - Error running child
> java.io.IOException
>   at 
> org.apache.hadoop.mapred.TestMiniMRMapRedDebugScript$MapClass.map(TestMiniMRMapRedDebugScript.java:41)
>   at 
> org.apache.hadoop.mapred.TestMiniMRMapRedDebugScript$MapClass.map(TestMiniMRMapRedDebugScript.java:35)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
>   at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1977)
> {code}
> Stderr and debugout both say: Bailing out.
> BTW on Windows everything works just fine.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-11-26 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Status: Patch Available  (was: Open)

> Server ports: to roll or not to roll.
> -
>
> Key: HADOOP-2185
> URL: https://issues.apache.org/jira/browse/HADOOP-2185
> Project: Hadoop
>  Issue Type: Improvement
>  Components: conf, dfs, mapred
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: FixedPorts.patch, FixedPorts2.patch
>
>
> Looked at the issues related to port rolling. My impression is that port 
> rolling is required only for the unit tests to run.
> Even the name-node port should roll there, which we don't have now, in order
> to be able to start 2 clusters for testing, say, distcp.
> For real clusters, on the contrary, port rolling is not desired and sometimes
> even prohibited.
> So we should have a way to ban port rolling. My proposal is to
> # use ephemeral port 0 if port rolling is desired
> # if a specific port is specified then port rolling should not happen at all, 
> meaning that a 
> server is either able or not able to start on that particular port.
> The desired port is specified via configuration parameters.
> - Name-node: fs.default.name = host:port
> - Data-node: dfs.datanode.port
> - Job-tracker: mapred.job.tracker = host:port
> - Task-tracker: mapred.task.tracker.report.bindAddress = host
>   The task-tracker currently does not have an option to specify a port; it always
> uses the ephemeral port 0,
>   and therefore I propose to add one.
> - Secondary node does not need a port to listen on.
> For info servers we have two sets of config variables *.info.bindAddress and 
> *.info.port
> except for the task tracker, which calls them *.http.bindAddress and 
> *.http.port instead of "info".
> With respect to the info servers I propose to completely eliminate the port 
> parameters, and form 
> *.info.bindAddress = host:port
> Info servers should do the same thing, namely start or fail on the specified 
> port if it is not 0,
> and start on any free port if it is ephemeral.
> For the task-tracker I would rename tasktracker.http.bindAddress to 
> mapred.task.tracker.info.bindAddress
> For the data-node, dfs.datanode.info.bindAddress should be included
> in the default config.
> Is there a reason why it is not there?
> This is the summary of proposed changes:
> || Server || current name = value || proposed name = value ||
> | NameNode | fs.default.name = host:port | same |
> | | dfs.info.bindAddress = host | dfs.http.bindAddress = host:port |
> | DataNode | dfs.datanode.bindAddress = host | dfs.datanode.bindAddress = 
> host:port |
> | | dfs.datanode.port = port | eliminate |
> | | dfs.datanode.info.bindAddress = host | dfs.datanode.http.bindAddress = 
> host:port |
> | | dfs.datanode.info.port = port | eliminate |
> | JobTracker | mapred.job.tracker = host:port | same |
> | | mapred.job.tracker.info.bindAddress = host | 
> mapred.job.tracker.http.bindAddress = host:port |
> | | mapred.job.tracker.info.port = port | eliminate |
> | TaskTracker | mapred.task.tracker.report.bindAddress = host | 
> mapred.task.tracker.report.bindAddress = host:port |
> | | tasktracker.http.bindAddress = host | 
> mapred.task.tracker.http.bindAddress = host:port |
> | | tasktracker.http.port = port | eliminate |
> | SecondaryNameNode | dfs.secondary.info.bindAddress = host | 
> dfs.secondary.http.bindAddress = host:port |
> | | dfs.secondary.info.port = port | eliminate |
> Do we also want to set some uniform naming convention for the configuration 
> variables?
> Having hdfs instead of dfs, or info instead of http, or systematically
> using either datanode or data.node throughout would look better in my opinion.
> So these are all +*api*+ changes. I would +*really*+ like some feedback on
> this, especially from
> people who deal with configuration issues in practice.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


