[jira] Created: (HDFS-1488) hadoop will terminate Web service process when a hadoop mapreduce task is finished.

2010-11-05 Thread oliverboss (JIRA)
hadoop will terminate Web service process when a hadoop mapreduce task is 
finished.
---

 Key: HDFS-1488
 URL: https://issues.apache.org/jira/browse/HDFS-1488
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.2
 Environment: OS: Windows XP + Cygwin + Hadoop 0.20.2 + MyEclipse 8.5
Reporter: oliverboss
 Fix For: 0.20.2


1. In the MyEclipse 8.5 environment, I created a new Map/Reduce project named 
"wordcount".
2. Created a class named WordCount containing "public static void 
main(String[] args)".
3. Copied the Hadoop WordCount example code from the Hadoop distribution into 
the wordcount project.
4. In the main() method, I added a Jetty server and started it; the code is 
shown below.
5. When I build and run it, I find the Jetty server is terminated after the 
Hadoop task finishes.
6. The Hadoop JobTracker logs show the following.
==
 logs
2010-11-05 16:47:41,968 INFO org.apache.hadoop.ipc.Server: IPC Server listener 
on 9001: readAndProcess threw exception java.io.IOException: connection was 
forcibly closed. Count of bytes read: 0
java.io.IOException: connection was forcibly closed
        at sun.nio.ch.SocketDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:25)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
        at org.apache.hadoop.ipc.Server.channelRead(Server.java:1214)
        at org.apache.hadoop.ipc.Server.access$16(Server.java:1210)
        at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:801)
        at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
        at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)
==
codes:

public static void main(String[] args) throws Exception {

    Handler handler = new AbstractHandler() {
        @Override
        public void handle(String target, HttpServletRequest request,
                HttpServletResponse response, int dispatch)
                throws IOException, ServletException {
            response.setContentType("text/html");
            response.setStatus(HttpServletResponse.SC_OK);
            response.getWriter().println("--start---");
            // ---
            // ---
            response.getWriter().println("--end1---");
            ((Request) request).setHandled(true);
            // request.getRequestDispatcher("/WebRoot/result.jsp").forward(request,
            //         response);
        }
    };

    // Start the Jetty server
    Server server = new Server(8086);
    server.setHandler(handler);
    server.start();
    // server.join();

    SimpleDateFormat tempDate = new SimpleDateFormat("_MM_dd" + "_hh_mm_ss");
    String datetime = tempDate.format(new java.util.Date());

    String out4 = "out" + datetime;
    args = new String[] { "in", out4 };
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
        System.err.println("Usage: wordcount <in> <out>");
        System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}




[jira] Commented: (HDFS-1488) hadoop will terminate Web service process when a hadoop mapreduce task is finished.

2010-11-05 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928619#action_12928619
 ] 

Doug Cutting commented on HDFS-1488:


System.exit(), which your program calls above, kills all threads in the 
process.  So this is expected.  Did I misunderstand your question?
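
A minimal sketch of a fix, assuming the goal is to keep the Jetty server alive 
after the job completes: report the job's status without calling System.exit(), 
and block on the server instead.

    // Replace the final System.exit(...) line with something like this
    // (sketch; reuses the job and server variables from the code above):
    boolean success = job.waitForCompletion(true);
    System.out.println("word count " + (success ? "succeeded" : "failed"));
    server.join(); // blocks until the Jetty server is stopped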


[jira] Commented: (HDFS-1073) Simpler model for Namenode's fs Image and edit Logs

2010-11-05 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928672#action_12928672
 ] 

Sanjay Radia commented on HDFS-1073:


>... but then you get the problem of synchronizing the start of a checkpoint 
>and the edits roll event. Otherwise checkpoints may become way behind the 
>current namespace state.
I guess I am missing this. 
We should avoid the synchronization that has been there in the original design 
of the secondary NN.
The BN can checkpoint whenever it feels that the set of rolled edits since 
previous checkpoint is large enough. It may be simpler to do it on every roll 
if we have configured the
NN to roll say every 10K transactions.
Perhaps what I am proposing works for the checkpointer but not for the BN 
because of some property of the BN that I am missing.
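As a sketch, the trigger could be as simple as this (field names are 
illustrative assumptions, not the actual BN code):

    // On every roll notification: checkpoint once enough rolled,
    // uncheckpointed transactions have accumulated.
    if (lastRolledTxId - lastCheckpointTxId >= checkpointTxThreshold) {
        doCheckpoint(); // snapshot the namespace up to lastRolledTxId
    }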

> Simpler model for Namenode's fs Image and edit Logs 
> 
>
> Key: HDFS-1073
> URL: https://issues.apache.org/jira/browse/HDFS-1073
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Sanjay Radia
>Assignee: Todd Lipcon
> Attachments: hdfs-1073.txt, hdfs1073.pdf
>
>
> The naming and handling of NN's fsImage and edit logs can be significantly 
> improved, resulting in simpler and more robust code.




[jira] Commented: (HDFS-1073) Simpler model for Namenode's fs Image and edit Logs

2010-11-05 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928682#action_12928682
 ] 

Sanjay Radia commented on HDFS-1073:


>.. not understand the rationale for the "I'm quitting!" record. Why should NN 
>care whether the last record was lost or not, just keep going with what it has.
The quitting record basically shows that the NN did a shutdown and did not die. 
This is useful to know. The NN will still continue to keep going as before.

If we were to add a similar "rolled" transaction at the end of every roll, then 
we could avoid the edits_100-100 case, since it would become edits_100-101.
Also the "rolled" transaction is a nice way to tell the BN that the primary 
did a roll without any special message from the NN to the BNN.







[jira] Updated: (HDFS-903) NN should verify images and edit logs on startup

2010-11-05 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-903:
---

Attachment: trunkChecksumImage3.patch

This patch made the change in Checkpointer as Konstantin suggested. Actually 
TestBackupNode caught this.

The patch also fixed a subtle bug in TestSaveNameSpace caused by using a spy. A 
spied object is only a shallow copy of the original object. So when a new 
checksum is generated while saving the image to disk, the new value is set in 
the spyImage, but when the signature is saved into the VERSION file using 
StorageDirectory, the value set in originalImage is used. So reloading the 
image would fail. I fixed it by explicitly setting the storage directories in 
spyImage.
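
For illustration, the shallow-copy pitfall in miniature (a generic Mockito 
sketch, not the actual FSImage code):

    import static org.mockito.Mockito.spy;

    class Counter {
        int value = 0;
        void increment() { value++; }
    }

    Counter original = new Counter();
    Counter spied = spy(original); // copies fields into a new proxy instance
    spied.increment();             // mutates only the spy's copy of the field
    // original.value is still 0: code that holds the original reference
    // (like StorageDirectory holding originalImage) never sees the update.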

> NN should verify images and edit logs on startup
> 
>
> Key: HDFS-903
> URL: https://issues.apache.org/jira/browse/HDFS-903
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Reporter: Eli Collins
>Assignee: Hairong Kuang
>Priority: Critical
> Fix For: 0.22.0
>
> Attachments: trunkChecksumImage.patch, trunkChecksumImage1.patch, 
> trunkChecksumImage2.patch, trunkChecksumImage3.patch
>
>
> I was playing around with corrupting the fsimage and edits logs when there 
> are multiple dfs.name.dirs specified. I noticed that:
> * As long as the corruption does not make the image invalid (e.g. changing an 
> opcode to an invalid opcode), HDFS doesn't notice and happily uses the 
> corrupt image or applies the corrupt edit.
> * If the first image in dfs.name.dir is "valid", it replaces the other copies 
> in the other name.dirs with this first image, even if they are different; 
> i.e. if the first image is actually invalid/old/corrupt metadata, then you've 
> lost your valid metadata, which can result in data loss if the namenode 
> garbage collects blocks that it thinks are no longer used.
> How about we maintain a checksum as part of the image and edit log, check 
> those on startup, and refuse to start up if they differ? Or at least provide 
> a configuration option to do so if people are worried about the overhead of 
> maintaining checksums of these files. Even if we assume dfs.name.dir is 
> reliable storage, this guards against operator errors.




[jira] Commented: (HDFS-1457) Limit transmission rate when transferring image between primary and secondary NNs

2010-11-05 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928731#action_12928731
 ] 

Hairong Kuang commented on HDFS-1457:
-

Another option is to initialize the throttler in the NameNode and then pass it 
to the HTTP servlet through the context.

> Limit transmission rate when transferring image between primary and secondary 
> NNs
> 
>
> Key: HDFS-1457
> URL: https://issues.apache.org/jira/browse/HDFS-1457
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.22.0
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.22.0
>
> Attachments: checkpoint-limitandcompress.patch, 
> trunkThrottleImage.patch, trunkThrottleImage1.patch
>
>
> If the fsimage is very big, the network saturates quickly when the 
> SecondaryNamenode does a checkpoint, causing the JobTracker's requests to the 
> NameNode for file data to fail during the job initialization phase. So we 
> limit the transmission speed and compress the transmission to resolve the 
> problem.




[jira] Resolved: (HDFS-1065) Secondary Namenode fails to fetch image and edits files

2010-11-05 Thread Dmytro Molkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytro Molkov resolved HDFS-1065.
-

Resolution: Duplicate

This issue is being worked on in HDFS-1481, so closing this one as a duplicate.

> Secondary Namenode fails to fetch image and edits files
> ---
>
> Key: HDFS-1065
> URL: https://issues.apache.org/jira/browse/HDFS-1065
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.2
>Reporter: Dmytro Molkov
>
> We recently started experiencing problems where the Secondary Namenode fails 
> to fetch the image from the NameNode. The basic problem is described in 
> HDFS-1024, but that JIRA was only dealing with possible data corruption; 
> since then we got to a place where we could not compact the fsimage anymore 
> because the failures happened 100% of the time.
> Here is what we have found out:
> The fetch still fails with the same exception as in HDFS-1024 (Jetty closes 
> the connection before the file is sent).
> We suspect the underlying reason to be extensive garbage collection on the 
> NameNode (1/5 of all time is being spent in garbage collection). The reason 
> for that might be the bug that is solved by HADOOP-6577 (we have a lot of 
> large RPC requests, which means we allocate and free a lot of memory all the 
> time).
> Because of GC, the speed of the transfer drops to 700Kb/s.
> Having said all of that, the current mechanism of fetching the image is still 
> potentially flawed. When dealing with large images, the namenode is under the 
> stress of sending multi-gig files over the wire to the client while still 
> serving requests.
> This JIRA is to discuss possible ways of separating the NameNode from the 
> image fetching done by the secondary namenode.
> One thought we had was fetching the image using SCP rather than an HTTP 
> download from the NameNode.
> This way the NameNode will have less pressure on it; on the other hand, this 
> will introduce new components that are not exactly under Hadoop's control 
> (the ssh client and server).
> To deal with possible data corruption in the SCP copy, we would also want to 
> extend CheckpointSignature to carry a checksum of the file, so it can be 
> checked on the client side.
> Please let me know what you think.
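
For the checksum idea, a minimal sketch of the client-side check (the digest 
algorithm and method name are assumptions, not the actual CheckpointSignature 
API):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.security.MessageDigest;

    static boolean verifyFetchedImage(File image, byte[] expectedDigest)
            throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = new FileInputStream(image)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0) {
                md.update(buf, 0, n);
            }
        }
        // Refuse the checkpoint if the fetched copy does not match the
        // checksum advertised by the NameNode.
        return MessageDigest.isEqual(md.digest(), expectedDigest);
    }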




[jira] Created: (HDFS-1489) breaking the dependency between FSEditLog and FSImage

2010-11-05 Thread Diego Marron (JIRA)
breaking the dependency between FSEditLog and FSImage
-

 Key: HDFS-1489
 URL: https://issues.apache.org/jira/browse/HDFS-1489
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.21.0
Reporter: Diego Marron


This is a refactoring patch whose main concerns are:
- Breaking the dependency between FSEditLog and FSImage.
- Splitting out and abstracting the error handling and directory management.
- Decoupling Storage from FSImage.

In order to accomplish the above goals, we will need to introduce new classes:
- NNStorage: Takes care of the storage. It extends the Storage class and will 
contain the StorageDirectories.
- NNUtils: Some static utility methods on FSImage and FSEditLog will be moved 
here.
- PersistenceManager: FSNamesystem will now be responsible for managing the 
FSImage & FSEditLog objects. Some logic will have to be moved out of FSImage 
to facilitate this. For this we propose a PersistenceManager object.

For more detail, see the uploaded design document.
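
As a rough skeleton of the proposed split (class names from the description 
above; the members shown are illustrative assumptions, not the actual patch):

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    // Owns the storage directories and directory-level error handling.
    // In the proposal this extends the existing Storage class.
    class NNStorage {
        private final List<File> storageDirectories = new ArrayList<File>();
        List<File> getStorageDirectories() { return storageDirectories; }
    }

    // Static helpers moved out of FSImage and FSEditLog.
    final class NNUtils {
        private NNUtils() {}
    }

    // FSNamesystem drives this object instead of touching FSImage directly.
    class PersistenceManager {
        private final Object image;   // stands in for FSImage
        private final Object editLog; // stands in for FSEditLog
        PersistenceManager(Object image, Object editLog) {
            this.image = image;
            this.editLog = editLog;
        }
    }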





[jira] Updated: (HDFS-1489) breaking the dependency between FSEditLog and FSImage

2010-11-05 Thread Diego Marron (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Marron updated HDFS-1489:
---

Attachment: ph1p2.pdf

The design document with more details about the work on this patch





[jira] Commented: (HDFS-903) NN should verify images and edit logs on startup

2010-11-05 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928735#action_12928735
 ] 

Hairong Kuang commented on HDFS-903:


antPatch.sh passed except for
 [exec] -1 release audit.  The applied patch generated 97 release audit 
warnings (more than the trunk's current 1 warnings).
The release warnings are all about license headers, but my patch does not add 
any new files. I do think that there is a bug in the script.

All unit tests passed except for the known failures.





[jira] Commented: (HDFS-1457) Limit transmission rate when transferring image between primary and secondary NNs

2010-11-05 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928736#action_12928736
 ] 

Ramkumar Vadali commented on HDFS-1457:
---

@Hairong, I see a lot of release audit warnings in a clean MR checkout too. I 
think this is due to HADOOP-7008. Please see MAPREDUCE-2172 for this.





[jira] Updated: (HDFS-1489) breaking the dependency between FSEditLog and FSImage

2010-11-05 Thread Diego Marron (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Marron updated HDFS-1489:
---

Attachment: (was: ph1p2.pdf)





[jira] Created: (HDFS-1490) TransferFSImage should timeout

2010-11-05 Thread Dmytro Molkov (JIRA)
TransferFSImage should timeout
--

 Key: HDFS-1490
 URL: https://issues.apache.org/jira/browse/HDFS-1490
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Dmytro Molkov
Assignee: Dmytro Molkov
Priority: Minor


Sometimes when the primary crashes during an image transfer, the secondary 
namenode hangs forever trying to read the image from the HTTP connection.
It would be great to set timeouts on the connection, so that if something like 
that happens there is no need to restart the secondary itself.
In our case restarting components is handled by a set of scripts, and since 
the Secondary process is still running, it just stays hung until we get an 
alarm saying the checkpointing doesn't happen.
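
A minimal sketch of the fix, assuming the transfer goes over HttpURLConnection 
(the timeout values are placeholders):

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    static InputStream openImageStream(URL imageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) imageUrl.openConnection();
        conn.setConnectTimeout(60 * 1000); // fail fast if the primary is unreachable
        conn.setReadTimeout(60 * 1000);    // fail if the stream stalls mid-transfer
        return conn.getInputStream();
    }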




[jira] Updated: (HDFS-1489) breaking the dependency between FSEditLog and FSImage

2010-11-05 Thread Diego Marron (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Marron updated HDFS-1489:
---

Attachment: HDFS-1489.pdf

The design doc with more details about the work on this patch





[jira] Commented: (HDFS-1073) Simpler model for Namenode's fs Image and edit Logs

2010-11-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928764#action_12928764
 ] 

Todd Lipcon commented on HDFS-1073:
---

Hey all. Back in town after a few weeks in Japan, sorry for the relative 
absence.

bq. I do not see or did not understand the rationale for the "I'm quitting!" 
record. Why should NN care whether the last record was lost or not, just keep 
going with what it has. Worked so far.

I think one complication here is that we currently never have to re-open an 
edits file for append, since when we start, we always save a "fresh" checkpoint 
image and empty "edits" if there were any edits to apply. One advantage of the 
new design is that we no longer have to do this - we just bump the edits log 
number to the next one in sequence - ie we roll on startup if the latest edit 
log is non-empty.

bq. Also the "rolled" transaction is a nice way to to tell the BN that the 
primary did a roll without any special message from NN to BNN

The patch currently does exactly that - we just don't write down the special 
"roll" entry in any file streams. We certainly could, though, if it's useful to 
know that a file was completely written.

bq. Todd, I briefly looked at the patch. It looks like you are trying to get 
rid of the Journal Spool in BN. Correct me if I am wrong. I don't think you can

In the patch, the spooling has just become a bit more of a general case. Rather 
than spooling to a special file, we simply ask the primary NN to roll, and then 
wait for the roll to happen. While waiting for the roll, we continue to apply 
edits. Once we get the special "roll" record, we stop applying edits and make a 
checkpoint at that point. Once the checkpoint completes, we "converge" by 
continuing to read forward in the sequence of log files until we hit the end 
and are back "in sync".

bq. A backup NN should not ask for a roll. The primary should roll when it 
feels it is necessary.

I think the simplest will be if anyone may ask for a roll - i.e. CN, BN, or NN. 
The NN of course is the one that actually makes the decision, but the decision 
may be in response to a request from one of the other nodes. I think this 
ability is useful not just for the CN, BN, and NN, but also, for example, in 
backup scripts - you may ask the NN to roll right before making a tarball of 
the edits directory, and thus be sure that you get all of the current edits in 
"finalized" files.








[jira] Commented: (HDFS-1489) breaking the dependency between FSEditLog and FSImage

2010-11-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928768#action_12928768
 ] 

Todd Lipcon commented on HDFS-1489:
---

big +1, I was trying to work on this last week as well, but it gets really 
hairy.

HDFS-1473 is also related - I had a patch for that one, but HDFS-1435 
complicated stuff a bit more.





[jira] Updated: (HDFS-1457) Limit transmission rate when transferring image between primary and secondary NNs

2010-11-05 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-1457:


Attachment: trunkThrottleImage2.patch

After talking with Konstantin, we decided to create a throttler on the fly on 
each file transfer. This patch does this.
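
For reference, a minimal sketch of such an on-the-fly throttler (illustrative 
only; the details of the throttler in the patch may differ):

    // Caps a transfer at roughly bytesPerSec by sleeping between chunks.
    class TransferThrottler {
        private static final long PERIOD_MS = 500;
        private final long bytesPerPeriod;
        private long bytesThisPeriod = 0;
        private long periodStart = System.currentTimeMillis();

        TransferThrottler(long bytesPerSec) {
            this.bytesPerPeriod = bytesPerSec * PERIOD_MS / 1000;
        }

        // Call after sending numBytes; sleeps if we are ahead of budget.
        synchronized void throttle(long numBytes) {
            bytesThisPeriod += numBytes;
            while (bytesThisPeriod >= bytesPerPeriod) {
                long wait = periodStart + PERIOD_MS - System.currentTimeMillis();
                if (wait > 0) {
                    try {
                        Thread.sleep(wait);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
                periodStart = System.currentTimeMillis();
                bytesThisPeriod -= bytesPerPeriod;
            }
        }
    }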





[jira] Commented: (HDFS-1073) Simpler model for Namenode's fs Image and edit Logs

2010-11-05 Thread Robert Chansler (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928794#action_12928794
 ] 

Robert Chansler commented on HDFS-1073:
---

bq. Worked so far.
How would you know? I just feel better having some check that the log is 
complete, especially in the new world where the log is a sequence of files. 
It's conceivable that not only could the last log file be truncated, any 
number of log _files_ at the end of the log could be missing entirely. Of 
course, if the log files were being written to a more robust file system like 
HDFS, the need for integrity checks would be less.





[jira] Updated: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially

2010-11-05 Thread Erik Steffl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Steffl updated HDFS-1448:
--

Attachment: HDFS-1448-0.22-1.patch

> Create multi-format parser for edits logs file, support binary and XML 
> formats initially
> 
>
> Key: HDFS-1448
> URL: https://issues.apache.org/jira/browse/HDFS-1448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 0.22.0
>Reporter: Erik Steffl
>Assignee: Erik Steffl
> Fix For: 0.22.0
>
> Attachments: editsStored, HDFS-1448-0.22-1.patch, 
> HDFS-1448-0.22.patch, Viewer hierarchy.pdf
>
>
> Create multi-format parser for edits logs file, support binary and XML 
> formats initially.
> Parsing should work from any supported format to any other supported format 
> (e.g. from binary to XML and from XML to binary).
> The binary format is the format used by FSEditLog class to read/write edits 
> file.
> The primary reason to develop this tool is to help with troubleshooting; the 
> binary format is hard to read and edit (for human troubleshooters).
> Longer term it could be used to clean up and minimize parsers for fsimage and 
> edits files. Edits parser OfflineEditsViewer is written in a very similar 
> fashion to OfflineImageViewer. Next step would be to merge OfflineImageViewer 
> and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This 
> is subject to change, specifically depending on adoption of avro (which would 
> completely change how objects are serialized as well as provide ways to 
> convert files to different formats).
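
The any-format-to-any-format requirement reduces to a reader/writer split, 
roughly like this sketch (type names are hypothetical, not the actual 
OfflineEditsViewer classes):

    import java.io.IOException;

    interface EditsOp { }                  // one decoded edit-log record
    interface EditsReader {                // binary or XML source
        EditsOp next() throws IOException; // returns null at end of stream
    }
    interface EditsWriter {                // binary or XML sink
        void write(EditsOp op) throws IOException;
    }

    // Convert any supported input format to any supported output format.
    static void convert(EditsReader in, EditsWriter out) throws IOException {
        EditsOp op;
        while ((op = in.next()) != null) {
            out.write(op);
        }
    }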




[jira] Commented: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially

2010-11-05 Thread Erik Steffl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928893#action_12928893
 ] 

Erik Steffl commented on HDFS-1448:
---

Patch HDFS-1448-0.22-1.patch addresses the review comments 
https://issues.apache.org/jira/browse/HDFS-1448?focusedCommentId=12920717&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12920717

It also has a few minor updates:
  - uses MiniDFSCluster.Builder instead of a constructor
  - handles edits that were not properly closed (e.g. the namenode crashed) better
  - tests check that all op codes were tested

