[jira] Commented: (HADOOP-2572) TaskLogServlet returns 410 when trying to access log early in task life
[ https://issues.apache.org/jira/browse/HADOOP-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560192#action_12560192 ] Michael Bieniosek commented on HADOOP-2572: --- This should be re-written to test if the file exists rather than catching the exception. Exceptions should be saved for unexpected problems. This would make the patch much more complicated, because the code that synthesizes the filename is buried in the TaskLog.Reader constructor. TaskLogServlet returns 410 when trying to access log early in task life --- Key: HADOOP-2572 URL: https://issues.apache.org/jira/browse/HADOOP-2572 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.15.0 Reporter: Michael Bieniosek Fix For: 0.16.0 Attachments: hadoop-2572.patch Early in a map task life, or for tasks that died quickly, the file $task/syslog might not exist. In this case, the TaskLogServlet gives a status 410. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
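For context, the existence-check shape being suggested would look roughly like this (a hypothetical sketch, not the attached patch; it assumes the filename synthesis is lifted out of the TaskLog.Reader constructor into an accessor the servlet can call):

{code}
// Sketch only; signatures assumed. Presumes a helper that exposes the
// synthesized log path, which today is buried in the TaskLog.Reader constructor.
java.io.File logFile = TaskLog.getTaskLogFile(taskId, filter);
if (logFile.exists()) {
  printTaskLog(response, out, taskId, start, end, filter); // read and emit the log
} else {
  out.write("Log not yet available".getBytes()); // no exception needed
}
{code}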
[jira] Created: (HADOOP-2612) Mysterious ArrayOutOfBoundsException in HTable.commit
Mysterious ArrayOutOfBoundsException in HTable.commit - Key: HADOOP-2612 URL: https://issues.apache.org/jira/browse/HADOOP-2612 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Michael Bieniosek I got this exception using a post-0.15.0 hbase trunk:

Caused by: java.io.IOException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
 at java.lang.reflect.Constructor.newInstance(Unknown Source)
 at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
 at org.apache.hadoop.hbase.HTable.commit(HTable.java:904)
 at org.apache.hadoop.hbase.HTable.commit(HTable.java:875)
 at xxx.PutHbase$HbaseUploader.writeHbaseNoRetry(PutHbase.java:107)

Where writeHbaseNoRetry looks like:

private void writeHbaseNoRetry(HTable table, String column, String row, File contents) throws IOException {
  long lockid = table.startUpdate(new Text(row));
  try {
    table.put(lockid, new Text(column), FileUtil.readFile(contents));
    table.commit(lockid);
  } finally {
    table.abort(lockid);
  }
}

I found this in my error logs -- it is rare, and I am not sure how to reproduce it. Contents could be 1kb-100kb long. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
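Worth noting: the finally block above calls table.abort(lockid) even after a successful commit. Whether or not that is related to this exception, a defensive variant that only aborts when the commit did not complete might look like this (a sketch reusing the HTable calls shown above, not a confirmed fix):

{code}
private void writeHbaseNoRetry(HTable table, String column, String row, File contents)
    throws IOException {
  long lockid = table.startUpdate(new Text(row));
  boolean committed = false;
  try {
    table.put(lockid, new Text(column), FileUtil.readFile(contents));
    table.commit(lockid);
    committed = true;
  } finally {
    if (!committed) {
      table.abort(lockid); // only abort a lock that was never committed
    }
  }
}
{code}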
[jira] Created: (HADOOP-2572) TaskLogServlet returns 410 when trying to access log early in task life
TaskLogServlet returns 410 when trying to access log early in task life --- Key: HADOOP-2572 URL: https://issues.apache.org/jira/browse/HADOOP-2572 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.15.0 Reporter: Michael Bieniosek Fix For: 0.16.0 Early in a map task life, or for tasks that died quickly, the file $task/syslog might not exist. In this case, the TaskLogServlet gives a status 410. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2572) TaskLogServlet returns 410 when trying to access log early in task life
[ https://issues.apache.org/jira/browse/HADOOP-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2572: -- Status: Patch Available (was: Open) TaskLogServlet returns 410 when trying to access log early in task life --- Key: HADOOP-2572 URL: https://issues.apache.org/jira/browse/HADOOP-2572 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.15.0 Reporter: Michael Bieniosek Fix For: 0.16.0 Early in a map task life, or for tasks that died quickly, the file $task/syslog might not exist. In this case, the TaskLogServlet gives a status 410. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2572) TaskLogServlet returns 410 when trying to access log early in task life
[ https://issues.apache.org/jira/browse/HADOOP-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2572: -- Attachment: hadoop-2572.patch Here is a patch that ignores FileNotFoundExceptions, and just creates empty boxes for portions of the output log that are not present. I marked this issue fix for 0.16.0 -- I'm not sure if I missed the cutoff, but this is a low-impact issue (only affects the TaskLogServlet), and it makes it less convenient for me to debug failed map tasks, so it is important for me. TaskLogServlet returns 410 when trying to access log early in task life --- Key: HADOOP-2572 URL: https://issues.apache.org/jira/browse/HADOOP-2572 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.15.0 Reporter: Michael Bieniosek Fix For: 0.16.0 Attachments: hadoop-2572.patch Early in a map task life, or for tasks that died quickly, the file $task/syslog might not exist. In this case, the TaskLogServlet gives a status 410. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
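The shape of that approach, as a sketch (the actual hadoop-2572.patch may differ, and the Reader constructor arguments are assumed):

{code}
// Sketch: treat a missing per-filter log file as empty output instead of failing.
try {
  TaskLog.Reader reader = new TaskLog.Reader(taskId, filter, start, end);
  // ... copy the log bytes into this filter's box in the response ...
} catch (java.io.FileNotFoundException fnfe) {
  // e.g. $task/syslog does not exist yet -- render an empty box for this log
}
{code}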
[jira] Updated: (HADOOP-2538) NPE in TaskLog.java
[ https://issues.apache.org/jira/browse/HADOOP-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2538: -- Status: Patch Available (was: Open) NPE in TaskLog.java --- Key: HADOOP-2538 URL: https://issues.apache.org/jira/browse/HADOOP-2538 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.15.0 Reporter: Michael Bieniosek Attachments: hadoop-2538.patch In the tasktracker web ui, if I go to /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true which corresponds to a short log (4k), I get a 500 in the web ui, and this NPE in the tasktracker log: 2008-01-07 21:02:13,935 WARN /: /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true: java.lang.NullPointerException at org.apache.hadoop.mapred.TaskLog.getTaskLogFile(TaskLog.java:48) at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:124) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:44) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:134) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2538) NPE in TaskLog.java
[ https://issues.apache.org/jira/browse/HADOOP-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2538: -- Attachment: hadoop-2538.patch Add a null check and warn the user appropriately. NPE in TaskLog.java --- Key: HADOOP-2538 URL: https://issues.apache.org/jira/browse/HADOOP-2538 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.15.0 Reporter: Michael Bieniosek Attachments: hadoop-2538.patch In the tasktracker web ui, if I go to /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true which corresponds to a short log (4k), I get a 500 in the web ui, and this NPE in the tasktracker log: 2008-01-07 21:02:13,935 WARN /: /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true: java.lang.NullPointerException at org.apache.hadoop.mapred.TaskLog.getTaskLogFile(TaskLog.java:48) at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:124) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:44) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:134) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
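A minimal sketch of that null check (the attached hadoop-2538.patch may differ; the parameter names are taken from the URLs discussed in this issue):

{code}
String taskId = request.getParameter("taskid");
String filter = request.getParameter("filter");
if (taskId == null || filter == null) {
  // warn instead of letting getTaskLogFile NPE on a missing parameter
  response.sendError(HttpServletResponse.SC_BAD_REQUEST,
      "tasklog requires both 'taskid' and 'filter' parameters");
  return;
}
{code}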
[jira] Updated: (HADOOP-2538) NPE in TaskLog.java
[ https://issues.apache.org/jira/browse/HADOOP-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2538: -- Priority: Trivial (was: Major) NPE in TaskLog.java --- Key: HADOOP-2538 URL: https://issues.apache.org/jira/browse/HADOOP-2538 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.15.0 Reporter: Michael Bieniosek Priority: Trivial Attachments: hadoop-2538.patch In the tasktracker web ui, if I go to /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true which corresponds to a short log (4k), I get a 500 in the web ui, and this NPE in the tasktracker log: 2008-01-07 21:02:13,935 WARN /: /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true: java.lang.NullPointerException at org.apache.hadoop.mapred.TaskLog.getTaskLogFile(TaskLog.java:48) at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:124) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:44) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:134) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2538) NPE in TaskLog.java
[ https://issues.apache.org/jira/browse/HADOOP-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2538: -- Description: In the tasktracker web ui, if I go to /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true which corresponds to a short log (4k), I get a 500 in the web ui, and this NPE in the tasktracker log: 2008-01-07 21:02:13,935 WARN /: /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true: java.lang.NullPointerException at org.apache.hadoop.mapred.TaskLog.getTaskLogFile(TaskLog.java:48) at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:124) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:44) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:134) Note that /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true is an invalid url; the url should look like plaintext=true&filter=STDOUT was: In the tasktracker web ui, if I go to /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true which corresponds to a short log (4k), I get a 500 in the web ui, and this NPE in the tasktracker log: 2008-01-07 21:02:13,935 WARN /: /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true: java.lang.NullPointerException at org.apache.hadoop.mapred.TaskLog.getTaskLogFile(TaskLog.java:48) at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:124) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:44) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:134) NPE in TaskLog.java --- Key: HADOOP-2538 URL: https://issues.apache.org/jira/browse/HADOOP-2538 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.15.0 Reporter: Michael Bieniosek Priority: Trivial Attachments: hadoop-2538.patch In the tasktracker web ui, if I go to /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true which corresponds to a short log (4k), I get a 500 in the web ui, and this NPE in the tasktracker log: 2008-01-07 21:02:13,935 WARN /: /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true: java.lang.NullPointerException at org.apache.hadoop.mapred.TaskLog.getTaskLogFile(TaskLog.java:48) at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:124) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:44) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:134) Note that /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true is an invalid url; the url should look like plaintext=true&filter=STDOUT -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1770) Illegal state exception in printTaskLog - sendError
[ https://issues.apache.org/jira/browse/HADOOP-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557506#action_12557506 ] Michael Bieniosek commented on HADOOP-1770: --- This is because response.sendError is called after out.write in TaskLogServlet. Illegal state exception in printTaskLog - sendError Key: HADOOP-1770 URL: https://issues.apache.org/jira/browse/HADOOP-1770 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.14.0 Reporter: Michael Bieniosek This error shows up in my logs: 2007-08-23 16:40:08,028 WARN /: /tasklog?taskid=task_200708212126_0043_m_000100_0&all=true: java.lang.IllegalStateException: Committed at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212) at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:61) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:125) at javax.servlet.http.HttpServlet.service(HttpServlet.java:689) at javax.servlet.http.HttpServlet.service(HttpServlet.java:802) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427) at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567) at org.mortbay.http.HttpContext.handle(HttpContext.java:1565) at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635) at org.mortbay.http.HttpContext.handle(HttpContext.java:1517) at org.mortbay.http.HttpServer.service(HttpServer.java:954) at org.mortbay.http.HttpConnection.service(HttpConnection.java:814) at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981) at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831) at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244) at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357) at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
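The usual guard for that pattern (a sketch, not necessarily how this issue was eventually fixed) is to check whether the response is already committed before attempting sendError:

{code}
// Once out.write has flushed anything, sendError cannot reset the response,
// which is exactly the IllegalStateException: Committed seen above.
if (!response.isCommitted()) {
  response.sendError(HttpServletResponse.SC_GONE,
      "Failed to retrieve " + filter + " log for task: " + taskId);
}
{code}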
[jira] Commented: (HADOOP-2546) PHP class for Rest Interface
[ https://issues.apache.org/jira/browse/HADOOP-2546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557052#action_12557052 ] Michael Bieniosek commented on HADOOP-2546: --- Hi Billy, It might be easier for you to use php's built-in http client, rather than writing the http client yourself. That way, you won't have to deal with chunked-encoding, keepalives, etc. (which you ignore, but you might get some performance improvements from them). To do the cell fetching, for example, you could just do:

{code}
$xml = simplexml_load_file("http://$host:60050/api/$table/row/$row");
$content = array();
foreach ($xml->column as $column) {
  $content[base64_decode($column->name)] = base64_decode($column->value);
}
return $content;
{code}

PHP class for Rest Interface Key: HADOOP-2546 URL: https://issues.apache.org/jira/browse/HADOOP-2546 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: Billy Pearson Priority: Trivial Attachments: hbase_rest.php This is a php class to interact with the rest interface. This is my first copy so there could be bugs and changes to come as the rest interface changes. I will make this into a patch once I am done with it. There are lots of comments in the file and notes on usage, but here is some basic stuff to get you started. You are welcome to suggest changes to make it faster or more usable. Basic usage (more details in the notes with each function):

// open a new connection to rest server. Hbase Master default port is 60010
$hbase = new hbase_rest($ip, $port);
// get list of tables
$tables = $hbase->list_tables();
// get table column family names and compression stuff
$table_info = $hbase->table_schema("search_index");
// get start and end row keys of each region
$regions = $hbase->regions($table);
// select data from hbase
$results = $hbase->select($table, $row_key);
// insert data into hbase; the $column and $data can be arrays with more than one column inserted in one request
$hbase->insert($table, $row, $column(s), $data(s));
// delete a column from a row. Can not use * at this point to remove all; I think there are plans to add this.
$hbase->remove($table, $row, $column);
// start a scanner on a set range of table
$handle = $hbase->scanner_start($table, $cols, $start_row, $end_row);
// pull the next row of data for a scanner handle
$results = $hbase->scanner_get($handle);
// delete a scanner handle
$hbase->scanner_delete($handle);

Example of using a scanner; this will loop over each row until it runs out of rows:

include("hbase_rest.php");
$hbase = new hbase_rest($ip, $port);
$handle = $hbase->scanner_start($table, $cols, $start_row, $end_row);
$results = true;
while ($results) {
  $results = $hbase->scanner_get($handle);
  if ($results) {
    foreach ($results['column'] as $key => $value) {
      // code here to work with the $key/column name and the $value of the column
    } // end foreach
  } // end if
} // end while
$hbase->scanner_delete($handle);

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2364) when hbase regionserver restarts, it says impossible state for createLease()
[ https://issues.apache.org/jira/browse/HADOOP-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556658#action_12556658 ] Michael Bieniosek commented on HADOOP-2364: --- I can't remember exactly. I believe the regionserver keeps trying to connect, until eventually the old lease times out on the master. when hbase regionserver restarts, it says impossible state for createLease() -- Key: HADOOP-2364 URL: https://issues.apache.org/jira/browse/HADOOP-2364 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Michael Bieniosek Priority: Minor I restarted a regionserver, and got this error in its logs: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.AssertionError: Impossible state for createLease(): Lease -435227488/-435227488 is still held. at org.apache.hadoop.hbase.Leases.createLease(Leases.java:145) at org.apache.hadoop.hbase.HMaster.regionServerStartup(HMaster.java:1278) at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) at org.apache.hadoop.ipc.Client.call(Client.java:482) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184) at $Proxy0.regionServerStartup(Unknown Source) at org.apache.hadoop.hbase.HRegionServer.reportForDuty(HRegionServer.java:1025) at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:659) at java.lang.Thread.run(Unknown Source) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1770) Illegal state exception in printTaskLog - sendError
[ https://issues.apache.org/jira/browse/HADOOP-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556689#action_12556689 ] Michael Bieniosek commented on HADOOP-1770: --- This makes it impossible for me to view logs for tasks with a small amount of output. This is actually masking another exception, since the exception comes from a catch block. Illegal state exception in printTaskLog - sendError Key: HADOOP-1770 URL: https://issues.apache.org/jira/browse/HADOOP-1770 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.14.0 Reporter: Michael Bieniosek This error shows up in my logs: 2007-08-23 16:40:08,028 WARN /: /tasklog?taskid=task_200708212126_0043_m_000100_0&all=true: java.lang.IllegalStateException: Committed at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212) at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:61) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:125) at javax.servlet.http.HttpServlet.service(HttpServlet.java:689) at javax.servlet.http.HttpServlet.service(HttpServlet.java:802) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427) at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567) at org.mortbay.http.HttpContext.handle(HttpContext.java:1565) at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635) at org.mortbay.http.HttpContext.handle(HttpContext.java:1517) at org.mortbay.http.HttpServer.service(HttpServer.java:954) at org.mortbay.http.HttpConnection.service(HttpConnection.java:814) at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981) at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831) at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244) at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357) at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (HADOOP-1770) Illegal state exception in printTaskLog - sendError
[ https://issues.apache.org/jira/browse/HADOOP-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556689#action_12556689 ] bien edited comment on HADOOP-1770 at 1/7/08 12:59 PM: This makes it difficult for me to view logs for tasks with a small amount of output (I have to log in to the machine in question). This is masking another exception, since the exception comes from a catch block. was (Author: bien): This makes it impossible for me to view logs for tasks with a small amount of output. This is actually masking another exception, since the exception comes from a catch block. Illegal state exception in printTaskLog - sendError Key: HADOOP-1770 URL: https://issues.apache.org/jira/browse/HADOOP-1770 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.14.0 Reporter: Michael Bieniosek This error shows up in my logs: 2007-08-23 16:40:08,028 WARN /: /tasklog?taskid=task_200708212126_0043_m_000100_0&all=true: java.lang.IllegalStateException: Committed at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212) at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:61) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:125) at javax.servlet.http.HttpServlet.service(HttpServlet.java:689) at javax.servlet.http.HttpServlet.service(HttpServlet.java:802) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427) at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567) at org.mortbay.http.HttpContext.handle(HttpContext.java:1565) at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635) at org.mortbay.http.HttpContext.handle(HttpContext.java:1517) at org.mortbay.http.HttpServer.service(HttpServer.java:954) at org.mortbay.http.HttpConnection.service(HttpConnection.java:814) at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981) at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831) at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244) at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357) at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1770) Illegal state exception in printTaskLog - sendError
[ https://issues.apache.org/jira/browse/HADOOP-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556690#action_12556690 ] Michael Bieniosek commented on HADOOP-1770: --- This corresponds with an error in the web ui: HTTP ERROR: 410 Failed to retrieve syslog log for task: task_200801020752_0383_m_00_0 Illegal state exception in printTaskLog - sendError Key: HADOOP-1770 URL: https://issues.apache.org/jira/browse/HADOOP-1770 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.14.0 Reporter: Michael Bieniosek This error shows up in my logs: 2007-08-23 16:40:08,028 WARN /: /tasklog?taskid=task_200708212126_0043_m_000100_0&all=true: java.lang.IllegalStateException: Committed at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212) at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:61) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:125) at javax.servlet.http.HttpServlet.service(HttpServlet.java:689) at javax.servlet.http.HttpServlet.service(HttpServlet.java:802) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427) at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567) at org.mortbay.http.HttpContext.handle(HttpContext.java:1565) at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635) at org.mortbay.http.HttpContext.handle(HttpContext.java:1517) at org.mortbay.http.HttpServer.service(HttpServer.java:954) at org.mortbay.http.HttpConnection.service(HttpConnection.java:814) at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981) at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831) at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244) at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357) at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2538) NPE in TaskLog.java
NPE in TaskLog.java --- Key: HADOOP-2538 URL: https://issues.apache.org/jira/browse/HADOOP-2538 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.15.0 Reporter: Michael Bieniosek In the tasktracker web ui, if I go to /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true which corresponds to a short log (4k), I get a 500 in the web ui, and this NPE in the tasktracker log: 2008-01-07 21:02:13,935 WARN /: /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true: java.lang.NullPointerException at org.apache.hadoop.mapred.TaskLog.getTaskLogFile(TaskLog.java:48) at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:124) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:44) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:134) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2068) [hbase] RESTful interface
[ https://issues.apache.org/jira/browse/HADOOP-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556797#action_12556797 ] Michael Bieniosek commented on HADOOP-2068: --- Hey Billy, Could you post your PHP class? I need to use hbase from a PHP client and was wondering if I could start from yours. Thanks. [hbase] RESTful interface - Key: HADOOP-2068 URL: https://issues.apache.org/jira/browse/HADOOP-2068 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: stack Assignee: Bryan Duxbury Priority: Minor Fix For: 0.16.0 Attachments: rest-11-27-07-v2.patch, rest-11-27-07.3.patc, rest-11-27-07.patch, rest-11-28-07.2.patch, rest-11-28-07.3.patch, rest-11-28-07.patch, rest.patch A RESTful interface would be one means of making hbase accessible to clients that are not java. It might look something like the below:

+ An HTTP GET of http://MASTER:PORT/ outputs the master's attributes: online meta regions, list of tables, etc.: i.e. what you see now when you go to http://MASTER:PORT/master.jsp.
+ An HTTP GET of http://MASTER:PORT/TABLENAME: 200 if table exists and HTableDescription (mimetype: text/plain or text/xml) or 401 if no such table. HTTP DELETE would drop the table. HTTP PUT would add one.
+ An HTTP GET of http://MASTER:PORT/TABLENAME/ROW: 200 if row exists and 401 if not.
+ An HTTP GET of http://MASTER:PORT/TABLENAME/ROW/COLUMNFAMILY: HColumnDescriptor (mimetype: text/plain or text/xml) or 401 if no such table.
+ An HTTP GET of http://MASTER:PORT/TABLENAME/ROW/COLUMNNAME/: 200 and latest version (mimetype: binary/octet-stream) or 401 if no such cell. HTTP DELETE would delete the cell. HTTP PUT would add a new version.
+ An HTTP GET of http://MASTER:PORT/TABLENAME/ROW/COLUMNNAME/TIMESTAMP: 200 (mimetype: binary/octet-stream) or 401 if no such cell. HTTP DELETE would remove. HTTP PUT would put this record.
+ Browser originally goes against master but master then redirects to the hosting region server to serve, update, delete, etc. the addressed cell

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
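For a non-PHP illustration, a GET against the addressing scheme proposed above could be as simple as this (a sketch; the host, port, table, row, and column names are placeholders, not part of the proposal):

{code}
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class HBaseRestGet {
  public static void main(String[] args) throws Exception {
    // GET http://MASTER:PORT/TABLENAME/ROW/COLUMNNAME/ -> latest cell version
    URL url = new URL("http://master:60010/mytable/myrow/mycolumn/");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    if (conn.getResponseCode() == 200) {
      InputStream in = conn.getInputStream(); // mimetype: binary/octet-stream
      // ... read the latest cell version from 'in' ...
    }
    conn.disconnect();
  }
}
{code}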
[jira] Created: (HADOOP-2545) hbase rest server should be started with hbase-daemon.sh
hbase rest server should be started with hbase-daemon.sh Key: HADOOP-2545 URL: https://issues.apache.org/jira/browse/HADOOP-2545 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: Michael Bieniosek Currently, the hbase rest server is started with the hbase script. But it should be started with the hbase-daemon script, which allows better configuration options. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2545) hbase rest server should be started with hbase-daemon.sh
[ https://issues.apache.org/jira/browse/HADOOP-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556806#action_12556806 ] Michael Bieniosek commented on HADOOP-2545: --- Currently I can't specify a config to the hbase rest servlet. The hbase command line script only looks at hbase-config.sh in the same directory as hbase. I'd like to be able to specify an arbitrary location for my hbase-env.sh and hbase-site.xml files. hbase rest server should be started with hbase-daemon.sh Key: HADOOP-2545 URL: https://issues.apache.org/jira/browse/HADOOP-2545 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: Michael Bieniosek Priority: Minor Currently, the hbase rest server is started with the hbase script. But it should be started with the hbase-daemon script, which allows better configuration options. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2325) Require Java 6 for release 0.17
[ https://issues.apache.org/jira/browse/HADOOP-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12555757#action_12555757 ] Michael Bieniosek commented on HADOOP-2325: --- Do we believe that the licensing situation is likely to change in the near future? I think we just have to wait for Apple to do an official 1.6 release for OSX. There was a lot of speculation that they would release shortly after OSX 10.5 came out, but then they didn't. Require Java 6 for release 0.17 --- Key: HADOOP-2325 URL: https://issues.apache.org/jira/browse/HADOOP-2325 Project: Hadoop Issue Type: Improvement Components: build Reporter: Doug Cutting Fix For: 0.17.0 We should require Java 6 for release 0.17. Java 6 is now available for OS/X. Hadoop performs much better on Java 6. And, finally, there are features of Java 6 (like 'df') that would be nice to use. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2510) Map-Reduce 2.0
[ https://issues.apache.org/jira/browse/HADOOP-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12555412#action_12555412 ] Michael Bieniosek commented on HADOOP-2510: --- A couple points:

1) the job client currently submits a job, then exits. This means that the machine where the job client runs does not need to be reliable (it could be my laptop, for example). I think this is a valuable feature. The JobManager you suggest cannot be run on an unreliable machine -- I think you mention this in brownie point #3.

2) One of our problems is that we have substantial amounts of per-job software that is installed via rpm. Our current solution is to create a job-private mapreduce cluster (not using HoD), install a bunch of software, then start the job. This won't work if a machine might be running tasks from multiple jobs simultaneously. This proposal doesn't seem to affect our ability to run private mapreduce clusters. But it does make it less useful for us. You suggest xen, which would let us configure per-task; that might work but it will increase the task overhead. Another possibility is allocating a machine-at-a-time to jobs, so we only have to configure the machine once per job.

I'm not totally sure what the point is here -- it seems like you mainly want to separate the jobtracker's scheduling and monitoring functions. Is there a scaling problem with the jobtracker currently? You discuss the jobtracker being a single point of failure, but the namenode is already a more serious point of failure, since it is much more work to rebuild a namenode if it dies. Are you also trying to replace HoD?

Map-Reduce 2.0 -- Key: HADOOP-2510 URL: https://issues.apache.org/jira/browse/HADOOP-2510 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Arun C Murthy We, at Yahoo!, have been using Hadoop-On-Demand as the resource provisioning/scheduling mechanism. With HoD the user uses a self-service system to ask for a set of nodes. HoD allocates these from a global pool and also provisions a private Map-Reduce cluster for the user. She then runs her jobs and shuts the cluster down via HoD when done. All user-private clusters use the same humongous, static HDFS (e.g. 2k node HDFS). More details about HoD are available here: HADOOP-1301.

h3. Motivation

The current deployment (Hadoop + HoD) has a couple of implications:

* _Non-optimal Cluster Utilization_ 1. Job-private Map-Reduce clusters imply that the user-cluster potentially could be *idle* for at least a while before being detected and shut-down. 2. Elastic Jobs: Map-Reduce jobs, typically, have lots of maps with much-smaller no. of reduces; with maps being light and quick and reduces being i/o heavy and longer-running. Users typically allocate clusters depending on the no. of maps (i.e. input size) which leads to the scenario where all the maps are done (idle nodes in the cluster) and the few reduces are chugging along. Right now, we do not have the ability to shrink the HoD'ed Map-Reduce clusters which would alleviate this issue.
* _Impact on data-locality_ With the current setup of a static, large HDFS and much smaller (5/10/20/50 node) clusters there is a good chance of losing one of Map-Reduce's primary features: ability to execute tasks on the datanodes where the input splits are located.
In fact, we have seen the data-local tasks go down to 20-25 percent in the GridMix benchmarks, from the 95-98 percent we see on the randomwriter+sort runs run as part of the hadoopqa benchmarks (admittedly a synthetic benchmark, but yet). Admittedly, HADOOP-1985 (rack-aware Map-Reduce) helps significantly here. Primarily, the notion of *job-level scheduling* leading to private clusters, as opposed to *task-level scheduling*, is a good peg to hang the majority of the blame on. Keeping the above factors in mind, here are some thoughts on how to re-structure Hadoop Map-Reduce to solve some of these issues.

h3. State of the Art

As it exists today, a large, static, Hadoop Map-Reduce cluster (forget HoD for a bit) does provide task-level scheduling; however as it exists today, its scalability to tens-of-thousands of user-jobs, per-week, is in question. Let's review its current architecture and main components:

* JobTracker: It does both *task-scheduling* and *task-monitoring* (tasktrackers send task-statuses via periodic heartbeats), which implies it is fairly loaded. It is also a _single-point of failure_ in the Map-Reduce framework i.e. its failure implies that all the jobs in the system fail. This means a static, large Map-Reduce cluster is fairly susceptible and a definite suspect. Clearly HoD solves this by having per-job clusters, albeit with the
[jira] Commented: (HADOOP-1336) turn on speculative execution by default
[ https://issues.apache.org/jira/browse/HADOOP-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552577 ] Michael Bieniosek commented on HADOOP-1336: --- -1 I agree with Marco -- I don't think speculative execution is something the average user wants by default. turn on speculative execution by default --- Key: HADOOP-1336 URL: https://issues.apache.org/jira/browse/HADOOP-1336 Project: Hadoop Issue Type: Task Components: mapred Affects Versions: 0.12.3 Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.14.0 Attachments: spec-exec.patch Now that speculative execution is working again, we should enable it by default. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1650) Upgrade Jetty to 6.x
[ https://issues.apache.org/jira/browse/HADOOP-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551998 ] Michael Bieniosek commented on HADOOP-1650: --- Any news on this? Upgrade Jetty to 6.x Key: HADOOP-1650 URL: https://issues.apache.org/jira/browse/HADOOP-1650 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Devaraj Das Assignee: Devaraj Das Attachments: hadoop-1650-jetty6.1.5.patch, hadoop-jetty6.1.4-lib.tar.gz, jetty6.1.4.patch This is the third attempt at moving to jetty6. Apparently, the jetty-6.1.4 has fixed some of the issues we discovered in jetty during HADOOP-736 and HADOOP-1273. I'd like to keep this issue open for sometime so that we have enough time to test out things. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2341) Datanode active connections never returns to 0
[ https://issues.apache.org/jira/browse/HADOOP-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551061 ] Michael Bieniosek commented on HADOOP-2341: --- How about idle connections? Is that ok? Because the datanode does not use select for io, each idle connection seems to consume a thread. It is the threads that get expensive, not necessarily the connections themselves. Datanode active connections never returns to 0 -- Key: HADOOP-2341 URL: https://issues.apache.org/jira/browse/HADOOP-2341 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.16.0 Reporter: Paul Saab Attachments: dfsclient.patch, hregionserver-stack.txt, stacks-XX.XX.XX.XXX.txt, stacks-YY.YY.YY.YY.txt On trunk I continue to see the following in my data node logs:

2007-12-03 15:46:47,696 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 42
2007-12-03 15:46:48,135 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 41
2007-12-03 15:46:48,439 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 40
2007-12-03 15:46:48,479 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 39
2007-12-03 15:46:48,611 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 38
2007-12-03 15:46:48,898 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 37
2007-12-03 15:46:48,989 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 36
2007-12-03 15:46:51,010 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 35
2007-12-03 15:46:51,758 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 34
2007-12-03 15:46:52,148 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 33

This number never returns to 0, even after many hours of no new data being manipulated or added into the DFS. Looking at netstat -tn I see a significant amount of data in the send-q that never goes away:

tcp 0 34240 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:55792 ESTABLISHED
tcp 0 38968 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:38169 ESTABLISHED
tcp 0 38456 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:35456 ESTABLISHED
tcp 0 29640 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:59845 ESTABLISHED
tcp 0 50168 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:44584 ESTABLISHED

When sniffing the network I see that the remote side (YY.YY.YY.YY) is returning a window size of 0:

16:11:41.760474 IP XX.XX.XX.XXX.50010 > YY.YY.YY.YY.44584: . ack 3339984123 win 46 <nop,nop,timestamp 1786247180 885681789>
16:11:41.761597 IP YY.YY.YY.YY.44584 > XX.XX.XX.XXX.50010: . ack 1 win 0 <nop,nop,timestamp 885801786 1775711351>

When we look at the stack traces on each datanode, I have tons of threads that *never* go away in the following trace:

{code}
Thread 6516 ([EMAIL PROTECTED]):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.net.SocketOutputStream.socketWrite0(Native Method)
    java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
    java.io.DataOutputStream.write(DataOutputStream.java:90)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1400)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1433)
    org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:904)
    org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:849)
    java.lang.Thread.run(Thread.java:619)
{code}

Unfortunately there's very little in the logs with exceptions that could point to this. I have some exceptions like the following, but nothing that points to problems between XX and YY:

{code}
2007-12-02 11:19:47,889 WARN dfs.DataNode - Unexpected error trying to delete block blk_4515246476002110310. Block not found in blockMap.
2007-12-02 11:19:47,922 WARN dfs.DataNode - java.io.IOException: Error in deleting blocks.
  at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:750)
  at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:675)
  at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:569)
  at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1720)
  at java.lang.Thread.run(Thread.java:619)
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
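For anyone unfamiliar with the distinction being drawn in this thread: with select-style io a single thread multiplexes all connections, so an idle socket costs only a selector registration rather than a blocked thread. A generic sketch in plain java.nio (not DataNode code):

{code}
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class SelectLoop {
  public static void main(String[] args) throws Exception {
    Selector selector = Selector.open();
    ServerSocketChannel server = ServerSocketChannel.open();
    server.socket().bind(new InetSocketAddress(50010));
    server.configureBlocking(false);
    server.register(selector, SelectionKey.OP_ACCEPT);
    while (true) {
      selector.select(); // one thread blocks here for *all* connections
      for (Iterator<SelectionKey> it = selector.selectedKeys().iterator(); it.hasNext();) {
        SelectionKey key = it.next();
        it.remove();
        if (key.isAcceptable()) {
          SocketChannel c = server.accept();
          c.configureBlocking(false);
          c.register(selector, SelectionKey.OP_READ); // idle from here on costs no thread
        } else if (key.isReadable()) {
          // ... service this one ready connection, then loop back to select() ...
        }
      }
    }
  }
}
{code}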
[jira] Commented: (HADOOP-2341) Datanode active connections never returns to 0
[ https://issues.apache.org/jira/browse/HADOOP-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551072 ] Michael Bieniosek commented on HADOOP-2341: --- Well, hbase does use select, so it doesn't consume threads to have idle connections. Datanode active connections never returns to 0 -- Key: HADOOP-2341 URL: https://issues.apache.org/jira/browse/HADOOP-2341 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.16.0 Reporter: Paul Saab Attachments: dfsclient.patch, hregionserver-stack.txt, stacks-XX.XX.XX.XXX.txt, stacks-YY.YY.YY.YY.txt On trunk I continue to see the following in my data node logs:

2007-12-03 15:46:47,696 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 42
2007-12-03 15:46:48,135 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 41
2007-12-03 15:46:48,439 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 40
2007-12-03 15:46:48,479 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 39
2007-12-03 15:46:48,611 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 38
2007-12-03 15:46:48,898 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 37
2007-12-03 15:46:48,989 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 36
2007-12-03 15:46:51,010 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 35
2007-12-03 15:46:51,758 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 34
2007-12-03 15:46:52,148 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 33

This number never returns to 0, even after many hours of no new data being manipulated or added into the DFS. Looking at netstat -tn I see a significant amount of data in the send-q that never goes away:

tcp 0 34240 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:55792 ESTABLISHED
tcp 0 38968 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:38169 ESTABLISHED
tcp 0 38456 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:35456 ESTABLISHED
tcp 0 29640 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:59845 ESTABLISHED
tcp 0 50168 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:44584 ESTABLISHED

When sniffing the network I see that the remote side (YY.YY.YY.YY) is returning a window size of 0:

16:11:41.760474 IP XX.XX.XX.XXX.50010 > YY.YY.YY.YY.44584: . ack 3339984123 win 46 <nop,nop,timestamp 1786247180 885681789>
16:11:41.761597 IP YY.YY.YY.YY.44584 > XX.XX.XX.XXX.50010: . ack 1 win 0 <nop,nop,timestamp 885801786 1775711351>

When we look at the stack traces on each datanode, I have tons of threads that *never* go away in the following trace:

{code}
Thread 6516 ([EMAIL PROTECTED]):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.net.SocketOutputStream.socketWrite0(Native Method)
    java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
    java.io.DataOutputStream.write(DataOutputStream.java:90)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1400)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1433)
    org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:904)
    org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:849)
    java.lang.Thread.run(Thread.java:619)
{code}

Unfortunately there's very little in the logs with exceptions that could point to this. I have some exceptions like the following, but nothing that points to problems between XX and YY:

{code}
2007-12-02 11:19:47,889 WARN dfs.DataNode - Unexpected error trying to delete block blk_4515246476002110310. Block not found in blockMap.
2007-12-02 11:19:47,922 WARN dfs.DataNode - java.io.IOException: Error in deleting blocks.
  at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:750)
  at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:675)
  at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:569)
  at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1720)
  at java.lang.Thread.run(Thread.java:619)
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2396) NPE in HMaster.cancelLease
NPE in HMaster.cancelLease -- Key: HADOOP-2396 URL: https://issues.apache.org/jira/browse/HADOOP-2396 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Michael Bieniosek Priority: Minor When I shut down the master, one regionserver fails to notify the master that it shut down: 2007-12-10 19:59:17,080 WARN org.apache.hadoop.hbase.HRegionServer: Failed to send exiting message to master: java.io.IOException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hbase.HMaster.cancelLease(HMaster.java:1463) at org.apache.hadoop.hbase.HMaster.regionServerReport(HMaster.java:1331) at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) at java.lang.reflect.Constructor.newInstance(Unknown Source) at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82) at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48) at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:863) at java.lang.Thread.run(Unknown Source) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2400) Where hbase/mapreduce have analogous configuration parameters, they should be named similarly
Where hbase/mapreduce have analogous configuration parameters, they should be named similarly - Key: HADOOP-2400 URL: https://issues.apache.org/jira/browse/HADOOP-2400 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Michael Bieniosek Priority: Minor mapreduce has a configuration property called mapred.system.dir which determines where in the DFS a jobtracker stores its data. Similarly, hbase has a configuration property called hbase.rootdir which does something very similar. These should have the same name, eg. hbase.system.dir or mapred.rootdir. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2400) Where hbase/mapreduce have analogous configuration parameters, they should be named similarly
[ https://issues.apache.org/jira/browse/HADOOP-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2400: -- Description: mapreduce has a configuration property called mapred.system.dir which determines where in the DFS a jobtracker stores its data. Similarly, hbase has a configuration property called hbase.rootdir which does something very similar. These should have the same name, eg. hbase.system.dir and mapred.system.dir was: mapreduce has a configuration property called mapred.system.dir which determines where in the DFS a jobtracker stores its data. Similarly, hbase has a configuration property called hbase.rootdir which does something very similar. These should have the same name, eg. hbase.system.dir or mapred.rootdir. Where hbase/mapreduce have analogous configuration parameters, they should be named similarly - Key: HADOOP-2400 URL: https://issues.apache.org/jira/browse/HADOOP-2400 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Michael Bieniosek Priority: Minor mapreduce has a configuration property called mapred.system.dir which determines where in the DFS a jobtracker stores its data. Similarly, hbase has a configuration property called hbase.rootdir which does something very similar. These should have the same name, eg. hbase.system.dir and mapred.system.dir -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2325) Require Java 6 for release 0.16.
[ https://issues.apache.org/jira/browse/HADOOP-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549511 ] Michael Bieniosek commented on HADOOP-2325: --- Because java 6 (compiler + runtime) is not available under an acceptable license on all platforms, I think references to functions like File.getFreeSpace should be done in a way that is backwards compatible with java 5. I suggest you put all the references to java 6 functions in an optional jar, with fallback methods for java 5 users. Require Java 6 for release 0.16. Key: HADOOP-2325 URL: https://issues.apache.org/jira/browse/HADOOP-2325 Project: Hadoop Issue Type: Improvement Components: build Reporter: Doug Cutting Fix For: 0.16.0 We should require Java 6 for release 0.16. Java 6 is now available for OS/X. Hadoop performs much better on Java 6. And, finally, there are features of Java 6 (like 'df') that would be nice to use. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
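One way to keep that backwards compatible, sketched below (a suggestion shape, not a committed approach): resolve the Java 6 method reflectively at runtime and fall back when it is absent.

{code}
import java.io.File;
import java.lang.reflect.Method;

public class FreeSpace {
  /** Returns free bytes on the partition of f, or -1 when the caller
   *  should fall back to a Java 5 path such as shelling out to 'df'. */
  public static long freeSpace(File f) {
    try {
      Method m = File.class.getMethod("getFreeSpace"); // exists only on Java 6
      return ((Long) m.invoke(f)).longValue();
    } catch (Exception e) {
      return -1L;
    }
  }
}
{code}

Caching the Method lookup in a static field would keep the reflection overhead negligible.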
[jira] Commented: (HADOOP-2341) Datanode active connections never returns to 0
[ https://issues.apache.org/jira/browse/HADOOP-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549128 ] Michael Bieniosek commented on HADOOP-2341: --- I've noticed that the regionserver is using select for its io, which doesn't tie up threads, whereas the datanode seems to have one thread open/blocked for each write. Is this intentional? Shouldn't they be using the same mechanism? Datanode active connections never returns to 0 -- Key: HADOOP-2341 URL: https://issues.apache.org/jira/browse/HADOOP-2341 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.16.0 Reporter: Paul Saab Attachments: hregionserver-stack.txt, stacks-XX.XX.XX.XXX.txt, stacks-YY.YY.YY.YY.txt On trunk I continue to see the following in my data node logs:

2007-12-03 15:46:47,696 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 42
2007-12-03 15:46:48,135 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 41
2007-12-03 15:46:48,439 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 40
2007-12-03 15:46:48,479 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 39
2007-12-03 15:46:48,611 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 38
2007-12-03 15:46:48,898 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 37
2007-12-03 15:46:48,989 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 36
2007-12-03 15:46:51,010 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 35
2007-12-03 15:46:51,758 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 34
2007-12-03 15:46:52,148 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 33

This number never returns to 0, even after many hours of no new data being manipulated or added into the DFS. Looking at netstat -tn I see a significant amount of data in the send-q that never goes away:

tcp 0 34240 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:55792 ESTABLISHED
tcp 0 38968 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:38169 ESTABLISHED
tcp 0 38456 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:35456 ESTABLISHED
tcp 0 29640 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:59845 ESTABLISHED
tcp 0 50168 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:44584 ESTABLISHED

When sniffing the network I see that the remote side (YY.YY.YY.YY) is returning a window size of 0:

16:11:41.760474 IP XX.XX.XX.XXX.50010 > YY.YY.YY.YY.44584: . ack 3339984123 win 46 <nop,nop,timestamp 1786247180 885681789>
16:11:41.761597 IP YY.YY.YY.YY.44584 > XX.XX.XX.XXX.50010: . ack 1 win 0 <nop,nop,timestamp 885801786 1775711351>

When we look at the stack traces on each datanode, I have tons of threads that *never* go away in the following trace:

{code}
Thread 6516 ([EMAIL PROTECTED]):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.net.SocketOutputStream.socketWrite0(Native Method)
    java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
    java.io.DataOutputStream.write(DataOutputStream.java:90)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1400)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1433)
    org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:904)
    org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:849)
    java.lang.Thread.run(Thread.java:619)
{code}

Unfortunately there's very little in the logs with exceptions that could point to this. I have some exceptions like the following, but nothing that points to problems between XX and YY:

{code}
2007-12-02 11:19:47,889 WARN dfs.DataNode - Unexpected error trying to delete block blk_4515246476002110310. Block not found in blockMap.
2007-12-02 11:19:47,922 WARN dfs.DataNode - java.io.IOException: Error in deleting blocks.
  at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:750)
  at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:675)
  at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:569)
  at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1720)
  at java.lang.Thread.run(Thread.java:619)
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2350) hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows
[ https://issues.apache.org/jira/browse/HADOOP-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2350: -- Priority: Critical (was: Major) Bumping priority because this is a correctness issue. hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows --- Key: HADOOP-2350 URL: https://issues.apache.org/jira/browse/HADOOP-2350 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Michael Bieniosek Assignee: stack Priority: Critical Fix For: 0.16.0 Attachments: TestScannerAPI.java I'm attaching a test case that fails. I noticed that if I create a table with two column families, and start a scanner on a row which only has an entry for one column family, the scanner will skip ahead to the row name for which the other column family has an entry. E.g., if I insert rows so my table will look like this:

{code}
row   a:a   b:b
aaa   a:1   nil
bbb   a:2   b:2
ccc   a:3   b:3
{code}

The scanner will tell me my table looks something like this:

{code}
row   a:a   b:b
bbb   a:1   b:2
bbb   a:2   b:3
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2364) when hbase regionserver restarts, it says impossible state for createLease()
when hbase regionserver restarts, it says impossible state for createLease() -- Key: HADOOP-2364 URL: https://issues.apache.org/jira/browse/HADOOP-2364 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Michael Bieniosek Priority: Minor I restarted a regionserver, and got this error in its logs:

{code}
org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.AssertionError: Impossible state for createLease(): Lease -435227488/-435227488 is still held.
 at org.apache.hadoop.hbase.Leases.createLease(Leases.java:145)
 at org.apache.hadoop.hbase.HMaster.regionServerStartup(HMaster.java:1278)
 at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

 at org.apache.hadoop.ipc.Client.call(Client.java:482)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
 at $Proxy0.regionServerStartup(Unknown Source)
 at org.apache.hadoop.hbase.HRegionServer.reportForDuty(HRegionServer.java:1025)
 at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:659)
 at java.lang.Thread.run(Unknown Source)
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2341) Datanode active connections never returns to 0
[ https://issues.apache.org/jira/browse/HADOOP-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548343 ] Michael Bieniosek commented on HADOOP-2341: --- It seems like these socket reads/writes should time out eventually. But the 70 threads on my datanode are still waiting on write. Datanode active connections never returns to 0 -- Key: HADOOP-2341 URL: https://issues.apache.org/jira/browse/HADOOP-2341 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.16.0 Reporter: Paul Saab Attachments: hregionserver-stack.txt, stacks-XX.XX.XX.XXX.txt, stacks-YY.YY.YY.YY.txt On trunk I continue to see the following in my data node logs:

2007-12-03 15:46:47,696 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 42
2007-12-03 15:46:48,135 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 41
2007-12-03 15:46:48,439 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 40
2007-12-03 15:46:48,479 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 39
2007-12-03 15:46:48,611 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 38
2007-12-03 15:46:48,898 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 37
2007-12-03 15:46:48,989 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 36
2007-12-03 15:46:51,010 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 35
2007-12-03 15:46:51,758 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 34
2007-12-03 15:46:52,148 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 33

This number never returns to 0, even after many hours of no new data being manipulated or added into the DFS. Looking at netstat -tn I see a significant amount of data in the send-q that never goes away:

tcp 0 34240 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:55792 ESTABLISHED
tcp 0 38968 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:38169 ESTABLISHED
tcp 0 38456 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:35456 ESTABLISHED
tcp 0 29640 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:59845 ESTABLISHED
tcp 0 50168 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:44584 ESTABLISHED

When sniffing the network I see that the remote side (YY.YY.YY.YY) is returning a window size of 0:

16:11:41.760474 IP XX.XX.XX.XXX.50010 > YY.YY.YY.YY.44584: . ack 3339984123 win 46 <nop,nop,timestamp 1786247180 885681789>
16:11:41.761597 IP YY.YY.YY.YY.44584 > XX.XX.XX.XXX.50010: . ack 1 win 0 <nop,nop,timestamp 885801786 1775711351>

When we look at the stack traces on each datanode, there are tons of threads that *never* go away, all stuck in the following trace:

{code}
Thread 6516 ([EMAIL PROTECTED]):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.net.SocketOutputStream.socketWrite0(Native Method)
    java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
    java.io.DataOutputStream.write(DataOutputStream.java:90)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1400)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1433)
    org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:904)
    org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:849)
    java.lang.Thread.run(Thread.java:619)
{code}

Unfortunately there's very little in the logs with exceptions that could point to this. I have some exceptions like the following, but nothing that points to problems between XX and YY:

{code}
2007-12-02 11:19:47,889 WARN dfs.DataNode - Unexpected error trying to delete block blk_4515246476002110310. Block not found in blockMap.
2007-12-02 11:19:47,922 WARN dfs.DataNode - java.io.IOException: Error in deleting blocks.
  at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:750)
  at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:675)
  at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:569)
  at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1720)
  at java.lang.Thread.run(Thread.java:619)
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
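One reason those writes never time out: SO_TIMEOUT on a blocking java.net.Socket only bounds reads, so a write into a zero-window connection can block in socketWrite0 indefinitely. A hedged sketch, in plain JDK NIO (not DataNode code), of how a write deadline can be imposed:

{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

public class TimedWrite {
  /** Writes buf fully, failing if the socket stays unwritable for timeoutMs. */
  static void write(SocketChannel ch, ByteBuffer buf, long timeoutMs) throws IOException {
    Selector sel = Selector.open();
    try {
      ch.configureBlocking(false);
      ch.register(sel, SelectionKey.OP_WRITE);
      while (buf.hasRemaining()) {
        // select() returns 0 when the deadline passes with no writability
        if (sel.select(timeoutMs) == 0) {
          throw new IOException("write timed out after " + timeoutMs + "ms");
        }
        sel.selectedKeys().clear();
        ch.write(buf); // may make partial progress; loop until drained
      }
    } finally {
      sel.close();
    }
  }
}
{code}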
[jira] Commented: (HADOOP-2325) Require Java 6 for release 0.16.
[ https://issues.apache.org/jira/browse/HADOOP-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548403 ] Michael Bieniosek commented on HADOOP-2325: --- That Java 6 requires agreement to the Java Research License, which is unacceptable for commercial use: http://java.net/jrl.csp Require Java 6 for release 0.16. Key: HADOOP-2325 URL: https://issues.apache.org/jira/browse/HADOOP-2325 Project: Hadoop Issue Type: Improvement Components: build Reporter: Doug Cutting Fix For: 0.16.0 We should require Java 6 for release 0.16. Java 6 is now available for OS/X. Hadoop performs much better on Java 6. And, finally, there are features of Java 6 (like 'df') that would be nice to use. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2325) Require Java 6 for release 0.16.
[ https://issues.apache.org/jira/browse/HADOOP-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548405 ] Michael Bieniosek commented on HADOOP-2325: --- To quote from the bikemonkey link: {quote} Licensing The Mac OS X work is based heavily on the BSD Java port, which is licensed under the JRL. The BSDs develop Java under the JRL; FreeBSD has negotiated a license with Sun to distribute FreeBSD Java binaries based on the JRL sources. As the Mac port stabilizes, I am merging my work upstream into the BSD port, and in turn, it is a goal of the FreeBSD Java project to merge their work into OpenJDK. I've signed a Sun Contributor Agreement in preparation for this, and an OpenJDK Porters group has been proposed: http://thread.gmane.org/gmane.comp.java.openjdk.general/630 While the JRL makes this initial port possible, OpenJDK's GPLv2+CE licensing makes development and distribution far simpler. I hope to contribute this work to OpenJDK as soon as is feasible. {quote} So, while IANAL, it appears this can only be used under JRL. But JRL only permits Research Use, defining it as: {quote} Research Use means research, evaluation, or development for the purpose of advancing knowledge, teaching, learning, or customizing the Technology or Modifications for personal use. Research Use expressly excludes use or distribution for direct or indirect commercial (including strategic) gain or advantage. {quote} So I suspect that running hadoop client libraries at a commercial search engine does not fall under Research Use. Require Java 6 for release 0.16. Key: HADOOP-2325 URL: https://issues.apache.org/jira/browse/HADOOP-2325 Project: Hadoop Issue Type: Improvement Components: build Reporter: Doug Cutting Fix For: 0.16.0 We should require Java 6 for release 0.16. Java 6 is now available for OS/X. Hadoop performs much better on Java 6. And, finally, there are features of Java 6 (like 'df') that would be nice to use. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (HADOOP-2325) Require Java 6 for release 0.16.
[ https://issues.apache.org/jira/browse/HADOOP-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548405 ] bien edited comment on HADOOP-2325 at 12/4/07 1:02 PM: To quote from the bikemonkey link: {quote} By downloading these binaries, you certify that you are a Licensee in good standing under the Java Research License of the Java 2 SDK, and that your access, use, and distribution of code and information you may obtain at this site is subject to the License. Please review the license at http://java.net/jrl.csp, and submit your license acceptance to Sun. {quote} {quote} Licensing The Mac OS X work is based heavily on the BSD Java port, which is licensed under the JRL. The BSDs develop Java under the JRL; FreeBSD has negotiated a license with Sun to distribute FreeBSD Java binaries based on the JRL sources. As the Mac port stabilizes, I am merging my work upstream into the BSD port, and in turn, it is a goal of the FreeBSD Java project to merge their work into OpenJDK. I've signed a Sun Contributor Agreement in preparation for this, and an OpenJDK Porters group has been proposed: http://thread.gmane.org/gmane.comp.java.openjdk.general/630 While the JRL makes this initial port possible, OpenJDK's GPLv2+CE licensing makes development and distribution far simpler. I hope to contribute this work to OpenJDK as soon as is feasible. {quote} So, while IANAL, it appears this can only be used under JRL. But JRL only permits Research Use, defining it as: {quote} Research Use means research, evaluation, or development for the purpose of advancing knowledge, teaching, learning, or customizing the Technology or Modifications for personal use. Research Use expressly excludes use or distribution for direct or indirect commercial (including strategic) gain or advantage. {quote} So I suspect that running hadoop client libraries at a commercial search engine does not fall under Research Use. was (Author: bien): To quote from the bikemonkey link: {quote} Licensing The Mac OS X work is based heavily on the BSD Java port, which is licensed under the JRL. The BSDs develop Java under the JRL; FreeBSD has negotiated a license with Sun to distribute FreeBSD Java binaries based on the JRL sources. As the Mac port stabilizes, I am merging my work upstream into the BSD port, and in turn, it is a goal of the FreeBSD Java project to merge their work into OpenJDK. I've signed a Sun Contributor Agreement in preparation for this, and an OpenJDK Porters group has been proposed: http://thread.gmane.org/gmane.comp.java.openjdk.general/630 While the JRL makes this initial port possible, OpenJDK's GPLv2+CE licensing makes development and distribution far simpler. I hope to contribute this work to OpenJDK as soon as is feasible. {quote} So, while IANAL, it appears this can only be used under JRL. But JRL only permits Research Use, defining it as: {quote} Research Use means research, evaluation, or development for the purpose of advancing knowledge, teaching, learning, or customizing the Technology or Modifications for personal use. Research Use expressly excludes use or distribution for direct or indirect commercial (including strategic) gain or advantage. {quote} So I suspect that running hadoop client libraries at a commercial search engine does not fall under Research Use. Require Java 6 for release 0.16. 
Key: HADOOP-2325 URL: https://issues.apache.org/jira/browse/HADOOP-2325 Project: Hadoop Issue Type: Improvement Components: build Reporter: Doug Cutting Fix For: 0.16.0 We should require Java 6 for release 0.16. Java 6 is now available for OS/X. Hadoop performs much better on Java 6. And, finally, there are features of Java 6 (like 'df') that would be nice to use. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2350) hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows
hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows --- Key: HADOOP-2350 URL: https://issues.apache.org/jira/browse/HADOOP-2350 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Michael Bieniosek Fix For: 0.16.0 Attachments: TestScannerAPI.java I'm attaching a test case that fails. I noticed that if I create a table with two column families, and start a scanner on a row which only has an entry for one column family, the scanner will skip ahead to the row name for which the other column family has an entry. E.g., if I insert rows so my table will look like this:

{code}
row   a:a   b:b
aaa   a:1   nil
bbb   a:2   b:2
ccc   a:3   b:3
{code}

The scanner will tell me my table looks something like this:

{code}
row   a:a   b:b
bbb   a:1   b:2
bbb   a:2   b:3
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
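For context, a minimal sketch of the failing pattern. This is not the attached TestScannerAPI.java; it assumes the 0.15-era HBase client API (startUpdate/put/commit and obtainScanner), so names may differ slightly, and imports are elided.

{code}
HTable table = new HTable(conf, new Text("scannertest"));

// row "aaa" gets a value in family a: only
// (inserts for "bbb" and "ccc", which get both families, omitted for brevity)
long lockid = table.startUpdate(new Text("aaa"));
table.put(lockid, new Text("a:a"), "1".getBytes());
table.commit(lockid);

HScannerInterface scanner =
    table.obtainScanner(new Text[] { new Text("a:"), new Text("b:") }, new Text(""));
HStoreKey key = new HStoreKey();
TreeMap<Text, byte[]> results = new TreeMap<Text, byte[]>();
try {
  while (scanner.next(key, results)) {
    // expected first row: "aaa" with only a:a set
    // observed: the scanner skips ahead and reports "bbb"
    System.out.println(key.getRow() + " -> " + results.keySet());
    results.clear();
  }
} finally {
  scanner.close();
}
{code}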
[jira] Updated: (HADOOP-2350) hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows
[ https://issues.apache.org/jira/browse/HADOOP-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2350: -- Attachment: TestScannerAPI.java Here's a test case which illustrates the problem. hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows --- Key: HADOOP-2350 URL: https://issues.apache.org/jira/browse/HADOOP-2350 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Michael Bieniosek Fix For: 0.16.0 Attachments: TestScannerAPI.java I'm attaching a test case that fails. I noticed that if I create a table with two column families, and start a scanner on a row which only has an entry for one column family, the scanner will skip ahead to the row name for which the other column family has an entry. E.g., if I insert rows so my table will look like this:

{code}
row   a:a   b:b
aaa   a:1   nil
bbb   a:2   b:2
ccc   a:3   b:3
{code}

The scanner will tell me my table looks something like this:

{code}
row   a:a   b:b
bbb   a:1   b:2
bbb   a:2   b:3
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2350) hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows
[ https://issues.apache.org/jira/browse/HADOOP-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548506 ] Michael Bieniosek commented on HADOOP-2350: --- A secondary problem is that HScannerInterface.iterator().next() sometimes returns a Map.Entry whose key and value are both null. I think this may have something to do with the health of my cluster. hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows --- Key: HADOOP-2350 URL: https://issues.apache.org/jira/browse/HADOOP-2350 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Michael Bieniosek Fix For: 0.16.0 Attachments: TestScannerAPI.java I'm attaching a test case that fails. I noticed that if I create a table with two column families, and start a scanner on a row which only has an entry for one column family, the scanner will skip ahead to the row name for which the other column family has an entry. E.g., if I insert rows so my table will look like this:

{code}
row   a:a   b:b
aaa   a:1   nil
bbb   a:2   b:2
ccc   a:3   b:3
{code}

The scanner will tell me my table looks something like this:

{code}
row   a:a   b:b
bbb   a:1   b:2
bbb   a:2   b:3
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2350) hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows
[ https://issues.apache.org/jira/browse/HADOOP-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2350: -- Attachment: TestScannerAPI.java Oops, re-uploaded without the hardcoded hostname. hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows --- Key: HADOOP-2350 URL: https://issues.apache.org/jira/browse/HADOOP-2350 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Michael Bieniosek Fix For: 0.16.0 Attachments: TestScannerAPI.java, TestScannerAPI.java I'm attaching a test case that fails. I noticed that if I create a table with two column families, and start a scanner on a row which only has an entry for one column family, the scanner will skip ahead to the row name for which the other column family has an entry. E.g., if I insert rows so my table will look like this:

{code}
row   a:a   b:b
aaa   a:1   nil
bbb   a:2   b:2
ccc   a:3   b:3
{code}

The scanner will tell me my table looks something like this:

{code}
row   a:a   b:b
bbb   a:1   b:2
bbb   a:2   b:3
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2350) hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows
[ https://issues.apache.org/jira/browse/HADOOP-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548507 ] Michael Bieniosek commented on HADOOP-2350: --- This was not a problem in release 0.15; it has only occurred since we moved to trunk. hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows --- Key: HADOOP-2350 URL: https://issues.apache.org/jira/browse/HADOOP-2350 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Michael Bieniosek Fix For: 0.16.0 Attachments: TestScannerAPI.java, TestScannerAPI.java I'm attaching a test case that fails. I noticed that if I create a table with two column families, and start a scanner on a row which only has an entry for one column family, the scanner will skip ahead to the row name for which the other column family has an entry. E.g., if I insert rows so my table will look like this:

{code}
row   a:a   b:b
aaa   a:1   nil
bbb   a:2   b:2
ccc   a:3   b:3
{code}

The scanner will tell me my table looks something like this:

{code}
row   a:a   b:b
bbb   a:1   b:2
bbb   a:2   b:3
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2350) hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows
[ https://issues.apache.org/jira/browse/HADOOP-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2350: -- Attachment: (was: TestScannerAPI.java) hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows --- Key: HADOOP-2350 URL: https://issues.apache.org/jira/browse/HADOOP-2350 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Michael Bieniosek Fix For: 0.16.0 Attachments: TestScannerAPI.java I'm attaching a test case that fails. I noticed that if I create a table with two column families, and start a scanner on a row which only has an entry for one column family, the scanner will skip ahead to the row name for which the other column family has an entry. E.g., if I insert rows so my table will look like this:

{code}
row   a:a   b:b
aaa   a:1   nil
bbb   a:2   b:2
ccc   a:3   b:3
{code}

The scanner will tell me my table looks something like this:

{code}
row   a:a   b:b
bbb   a:1   b:2
bbb   a:2   b:3
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2341) Datanode active connections never returns to 0
[ https://issues.apache.org/jira/browse/HADOOP-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548092 ] Michael Bieniosek commented on HADOOP-2341: --- I am seeing this too. If I look at datanode:50075/stacks, I see 70 threads stuck with:

{code}
Thread 3023977 ([EMAIL PROTECTED]):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.net.SocketOutputStream.socketWrite0(Native Method)
    java.net.SocketOutputStream.socketWrite(Unknown Source)
    java.net.SocketOutputStream.write(Unknown Source)
    java.io.BufferedOutputStream.flushBuffer(Unknown Source)
    java.io.BufferedOutputStream.write(Unknown Source)
    java.io.DataOutputStream.write(Unknown Source)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1175)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1208)
    org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:850)
    org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:801)
    java.lang.Thread.run(Unknown Source)
{code}

Datanode active connections never returns to 0 -- Key: HADOOP-2341 URL: https://issues.apache.org/jira/browse/HADOOP-2341 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.16.0 Reporter: Paul Saab On trunk I continue to see the following in my data node logs:

2007-12-03 15:46:47,696 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 42
2007-12-03 15:46:48,135 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 41
2007-12-03 15:46:48,439 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 40
2007-12-03 15:46:48,479 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 39
2007-12-03 15:46:48,611 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 38
2007-12-03 15:46:48,898 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 37
2007-12-03 15:46:48,989 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 36
2007-12-03 15:46:51,010 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 35
2007-12-03 15:46:51,758 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 34
2007-12-03 15:46:52,148 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 33

This number never returns to 0, even after many hours of no new data being manipulated or added into the DFS. Looking at netstat -tn I see a significant amount of data in the send-q that never goes away:

tcp 0 34240 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:55792 ESTABLISHED
tcp 0 38968 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:38169 ESTABLISHED
tcp 0 38456 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:35456 ESTABLISHED
tcp 0 29640 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:59845 ESTABLISHED
tcp 0 50168 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:44584 ESTABLISHED

When sniffing the network I see that the remote side (YY.YY.YY.YY) is returning a window size of 0:

16:11:41.760474 IP XX.XX.XX.XXX.50010 > YY.YY.YY.YY.44584: . ack 3339984123 win 46 <nop,nop,timestamp 1786247180 885681789>
16:11:41.761597 IP YY.YY.YY.YY.44584 > XX.XX.XX.XXX.50010: . ack 1 win 0 <nop,nop,timestamp 885801786 1775711351>

When we look at the stack traces on each datanode, there are tons of threads that *never* go away, all stuck in the following trace:

{code}
Thread 6516 ([EMAIL PROTECTED]):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.net.SocketOutputStream.socketWrite0(Native Method)
    java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
    java.io.DataOutputStream.write(DataOutputStream.java:90)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1400)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1433)
    org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:904)
    org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:849)
    java.lang.Thread.run(Thread.java:619)
{code}

Unfortunately there's very little in the logs with exceptions that could point to this. I have some exceptions like the following, but nothing that points to problems between XX and YY:

{code}
2007-12-02 11:19:47,889 WARN dfs.DataNode - Unexpected error trying to delete block blk_4515246476002110310. Block not found in blockMap.
2007-12-02 11:19:47,922 WARN dfs.DataNode - java.io.IOException: Error in deleting blocks.
{code}
[jira] Commented: (HADOOP-2341) Datanode active connections never returns to 0
[ https://issues.apache.org/jira/browse/HADOOP-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548101 ] Michael Bieniosek commented on HADOOP-2341: --- In DataNode.BlockSender.sendBlock, we have

{code}
while (endOffset > offset) {
  // Write one data chunk per loop.
  long len = sendChunk();
  offset += len;
  totalRead += len + checksumSize;
}
{code}

In the BlockSender constructor, we have

{code}
if (length >= 0) {
  // Make sure endOffset points to end of a checksumed chunk.
  long tmpLen = startOffset + length + (startOffset - offset);
  if (tmpLen % bytesPerChecksum != 0) {
    tmpLen += (bytesPerChecksum - tmpLen % bytesPerChecksum);
  }
  if (tmpLen < endOffset) {
    endOffset = tmpLen;
  }
}
{code}

So in some cases, endOffset can include extra bytes for checksums in the constructor. However, checksum bytes are never added to offset when it is compared to endOffset in the sendBlock method. I believe this may be the problem. Datanode active connections never returns to 0 -- Key: HADOOP-2341 URL: https://issues.apache.org/jira/browse/HADOOP-2341 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.16.0 Reporter: Paul Saab On trunk I continue to see the following in my data node logs:

2007-12-03 15:46:47,696 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 42
2007-12-03 15:46:48,135 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 41
2007-12-03 15:46:48,439 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 40
2007-12-03 15:46:48,479 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 39
2007-12-03 15:46:48,611 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 38
2007-12-03 15:46:48,898 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 37
2007-12-03 15:46:48,989 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 36
2007-12-03 15:46:51,010 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 35
2007-12-03 15:46:51,758 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 34
2007-12-03 15:46:52,148 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 33

This number never returns to 0, even after many hours of no new data being manipulated or added into the DFS. Looking at netstat -tn I see a significant amount of data in the send-q that never goes away:

tcp 0 34240 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:55792 ESTABLISHED
tcp 0 38968 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:38169 ESTABLISHED
tcp 0 38456 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:35456 ESTABLISHED
tcp 0 29640 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:59845 ESTABLISHED
tcp 0 50168 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:44584 ESTABLISHED

When sniffing the network I see that the remote side (YY.YY.YY.YY) is returning a window size of 0:

16:11:41.760474 IP XX.XX.XX.XXX.50010 > YY.YY.YY.YY.44584: . ack 3339984123 win 46 <nop,nop,timestamp 1786247180 885681789>
16:11:41.761597 IP YY.YY.YY.YY.44584 > XX.XX.XX.XXX.50010: . ack 1 win 0 <nop,nop,timestamp 885801786 1775711351>

When we look at the stack traces on each datanode, there are tons of threads that *never* go away, all stuck in the following trace:

{code}
Thread 6516 ([EMAIL PROTECTED]):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.net.SocketOutputStream.socketWrite0(Native Method)
    java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
    java.io.DataOutputStream.write(DataOutputStream.java:90)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1400)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1433)
    org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:904)
    org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:849)
    java.lang.Thread.run(Thread.java:619)
{code}

Unfortunately there's very little in the logs with exceptions that could point to this. I have some exceptions like the following, but nothing that points to problems between XX and YY:

{code}
2007-12-02 11:19:47,889 WARN dfs.DataNode - Unexpected error trying to delete block blk_4515246476002110310. Block not found in blockMap.
2007-12-02 11:19:47,922 WARN dfs.DataNode -
{code}
[jira] Updated: (HADOOP-2297) [Hbase Shell] System.exit() Handling in Jar command
[ https://issues.apache.org/jira/browse/HADOOP-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2297: -- Attachment: Capture.java Hey Edward, I think I figured out how to suppress System.exit and capture stdout/stderr. Here, System.exit throws a SecurityException, which can be caught further up the stack. In this case, I catch it in a Thread.UncaughtExceptionHandler since I am running misbehaved threads. I also wrote a class that captures stdout/stderr. Since Java only lets me set one PrintStream to capture stdout per JVM, I have to check Thread.currentThread() and then decide where to write the captured output. I am hoping to incorporate some of this code into my custom jetty server that submits hadoop jobs. [Hbase Shell] System.exit() Handling in Jar command --- Key: HADOOP-2297 URL: https://issues.apache.org/jira/browse/HADOOP-2297 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.15.0 Reporter: Edward Yoon Assignee: Edward Yoon Fix For: 0.16.0 Attachments: 2297_v02.patch, 2297_v03.patch, Capture.java I'd like to block the exitVM triggered by System.exit(); the shell should terminate only via the quit command. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
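A minimal sketch of the System.exit-suppression technique described above; the attached Capture.java may differ, and the class and message strings here are illustrative. A SecurityManager turns exit into a catchable SecurityException, which a Thread.UncaughtExceptionHandler can then observe:

{code}
public class NoExit {
  static class NoExitSecurityManager extends SecurityManager {
    @Override public void checkPermission(java.security.Permission perm) {
      // permit everything except exitVM
    }
    @Override public void checkExit(int status) {
      throw new SecurityException("blocked System.exit(" + status + ")");
    }
  }

  public static void main(String[] args) {
    System.setSecurityManager(new NoExitSecurityManager());
    Thread t = new Thread(new Runnable() {
      public void run() { System.exit(1); }  // stands in for misbehaved job code
    });
    t.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
      public void uncaughtException(Thread th, Throwable e) {
        System.out.println("caught: " + e);  // the SecurityException lands here
      }
    });
    t.start();
  }
}
{code}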
[jira] Commented: (HADOOP-1905) Addition of unix ls command to FS command
[ https://issues.apache.org/jira/browse/HADOOP-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546328 ] Michael Bieniosek commented on HADOOP-1905: --- I agree with Enis. I think ftp lets you do {code}! ls{code} and inside gdb you can do {code}shell ls{code} Addition of unix ls command to FS command - Key: HADOOP-1905 URL: https://issues.apache.org/jira/browse/HADOOP-1905 Project: Hadoop Issue Type: Sub-task Components: contrib/hbase Affects Versions: 0.14.1 Environment: All environments Reporter: Edward Yoon Priority: Minor Attachments: shell_fs.patch, shell_fs_v01.patch I think an ls command would be a useful one.

{code}
Hbase> fs;
FS  Filesystem commands
    Syntax: Hadoop FsShell operations
    DFS [-option] arguments...;

    Unix ls command
    LS [-option] arguments...;

Hbase> dfs;
Usage: java FsShell
  [-ls <path>]
  [-lsr <path>]
  ...

Hbase> ls;
bin  build  build.xml  CHANGES.txt  conf  docs  index.html  lib  LICENSE.txt  NOTICE.txt  output  README.txt  src
...

Hbase> ls -a ./conf;
.svn  commons-logging.properties  configuration.xsl  hadoop-default.xml  hadoop-env.sh  hadoop-env.sh.template
...
...

Hbase> ls -l ./build;
rwd  0     Sep 10, 2007 11:05 AM  ant
rw-  6662  Sep 10, 2007 2:35 PM   ant-hadoop-0.15.0-dev.jar
rwd  0     Sep 5, 2007 10:05 AM   c++
rwd  0     Sep 6, 2007 2:15 PM    classes
rwd  0     Sep 5, 2007 10:07 AM   contrib
rwd  0     Sep 17, 2007 9:17 AM   docs
...
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
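A hedged sketch of the shell-escape style being suggested, using only the JDK; the "!" parsing and the /bin/sh choice are assumptions, not the patch's behavior:

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class ShellEscape {
  /** Runs a local shell command and echoes its output, like ftp's "! ls". */
  public static void run(String cmd) throws Exception {
    Process p = new ProcessBuilder("/bin/sh", "-c", cmd)
        .redirectErrorStream(true)   // merge stderr into stdout
        .start();
    BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
    for (String line; (line = r.readLine()) != null; ) {
      System.out.println(line);
    }
    p.waitFor();
  }

  public static void main(String[] args) throws Exception {
    run("ls -l");  // what "! ls -l" at the shell prompt would dispatch to
  }
}
{code}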
[jira] Commented: (HADOOP-2266) Provide a command line option to check if a Hadoop jobtracker is idle
[ https://issues.apache.org/jira/browse/HADOOP-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545943 ] Michael Bieniosek commented on HADOOP-2266: --- It's possible to do this through the JobClient API currently. Provide a command line option to check if a Hadoop jobtracker is idle - Key: HADOOP-2266 URL: https://issues.apache.org/jira/browse/HADOOP-2266 Project: Hadoop Issue Type: New Feature Components: mapred Reporter: Hemanth Yamijala Fix For: 0.16.0 This is an RFE for providing a way to determine from the hadoop command line whether a jobtracker is idle. One possibility is to have something like hadoop jobtracker -idle <time>. Hadoop would return true (maybe via some stdout output) if the jobtracker has had no work to do (jobs running / prepared) for the last <time> seconds, false otherwise. This would be useful for management / provisioning systems like Hadoop-On-Demand [HADOOP-1301], which can then deallocate the idle, provisioned clusters automatically, and release resources. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
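A hedged sketch of the JobClient route the comment mentions, assuming the era's API exposes jobsToComplete() (running plus prep jobs). Treat it as an approximation of "idle", since it checks the current instant rather than the <time> window the RFE asks for:

{code}
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobStatus;

public class JobTrackerIdle {
  public static void main(String[] args) throws Exception {
    JobClient client = new JobClient(new JobConf());
    JobStatus[] pending = client.jobsToComplete();  // running + prepared jobs
    boolean idle = (pending == null || pending.length == 0);
    System.out.println(idle ? "idle" : "busy: " + pending.length + " job(s)");
    System.exit(idle ? 0 : 1);  // exit code usable from provisioning scripts
  }
}
{code}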
[jira] Created: (HADOOP-2292) HBaseAdmin.disableTable/enableTable aren't synchronous
HBaseAdmin.disableTable/enableTable aren't synchronous -- Key: HADOOP-2292 URL: https://issues.apache.org/jira/browse/HADOOP-2292 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.15.0 Reporter: Michael Bieniosek I'm trying to programmatically add a column family to a table. I have code that looks like:

{code}
admin.disableTable(table);
try {
  admin.addColumn(table, new HColumnDescriptor(columnName));
} finally {
  admin.enableTable(table);
}
HTable ht = new HTable(config, table);
{code}

Two things sometimes go wrong here:
1. addColumn fails because the table is not disabled
2. new HTable() fails because the table is not enabled
I suspect that the enableTable/disableTable calls are not synchronous, i.e. they return before they are finished. I can work around this problem by inserting Thread.sleep() calls after the enableTable and disableTable calls. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
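A hedged sketch of the Thread.sleep workaround described above, with a bounded retry loop instead of a single fixed sleep. HBaseAdmin, Text, and HColumnDescriptor are the era's client classes from the report; the retry loop itself is illustrative, not HBase API:

{code}
void addColumnWithRetry(HBaseAdmin admin, Text table, String columnName)
    throws IOException, InterruptedException {
  admin.disableTable(table);
  try {
    IOException last = null;
    for (int attempt = 0; attempt < 10; attempt++) {
      try {
        admin.addColumn(table, new HColumnDescriptor(columnName));
        return;                  // success; the finally block still re-enables
      } catch (IOException e) {
        last = e;                // the disable may not have completed yet
        Thread.sleep(1000);
      }
    }
    throw last;
  } finally {
    admin.enableTable(table);
  }
}
{code}

The same pattern (retry with backoff) would be needed around the new HTable() call until the enable completes, which is why a truly synchronous disableTable/enableTable is the better fix.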
[jira] Commented: (HADOOP-2261) [hbase] Change abort to finalize; does nothing if commit ran successfully
[ https://issues.apache.org/jira/browse/HADOOP-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546006 ] Michael Bieniosek commented on HADOOP-2261: --- No, I don't think you understand my complaint. The problem with the current API is that I have to explicitly catch all the exceptions that my code can throw, so I can catch them and call abort(). It is error-prone to list all the exceptions my code could possibly throw and attempt to catch and rethrow them. For example, it is easy to forget RuntimeException and Error. You can't just catch Throwable, because then you have to rethrow a Throwable, which changes the method's exception signature. When I wrote code against Hibernate's API, I did (as far as I remember) something like:

{code}
Transaction transaction = session.beginTransaction();
try {
  ...
  transaction.commit();
} finally {
  transaction.rollback();
}
{code}

This had the nice property that calling rollback() after commit() was a no-op. This is what I'd like to see in hbase. [hbase] Change abort to finalize; does nothing if commit ran successfully - Key: HADOOP-2261 URL: https://issues.apache.org/jira/browse/HADOOP-2261 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack Assignee: Jim Kellerman Fix For: 0.16.0 Attachments: patch.txt From Michael Bieniosek:

{code}
I'm trying to do a row update, so I write code like:

long lockid = table.startUpdate(new Text(article.getName()));
try {
  for (File articleInfo: article.listFiles(new NonDirectories())) {
    articleTable.put(lockid, columnName(articleInfo.getName()), readFile(articleInfo));
  }
  table.commit(lockid);
} finally {
  table.abort(lockid);
}

This doesn't work, because in the normal case it calls abort after commit. But I'm not sure what the code should be, e.g.:

long lockid = table.startUpdate(new Text(article.getName()));
try {
  for (File articleInfo: article.listFiles(new NonDirectories())) {
    articleTable.put(lockid, columnName(articleInfo.getName()), readFile(articleInfo));
  }
  table.commit(lockid);
} catch (IOException e) {
  table.abort(lockid);
  throw e;
} catch (RuntimeException e) {
  table.abort(lockid);
  throw e;
}

This gets unwieldy very quickly. Could you maybe change abort() to finalize() which either aborts or does nothing if a commit was successful?
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
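A hedged sketch of that no-op-after-commit behavior as a client-side wrapper over the era's HTable lock API (startUpdate/put/commit/abort); the RowUpdate class is hypothetical, not part of HBase:

{code}
class RowUpdate {
  private final HTable table;
  private final long lockid;
  private boolean committed = false;

  RowUpdate(HTable table, Text row) throws IOException {
    this.table = table;
    this.lockid = table.startUpdate(row);
  }

  void put(Text column, byte[] value) throws IOException {
    table.put(lockid, column, value);
  }

  void commit() throws IOException {
    table.commit(lockid);
    committed = true;
  }

  /** Safe to call unconditionally from finally: does nothing after commit(). */
  void close() throws IOException {
    if (!committed) {
      table.abort(lockid);
    }
  }
}
{code}

With such a wrapper the description's pattern becomes commit() in the try body and close() in the finally block, and a close() after a successful commit is a no-op, which is exactly the Hibernate-style behavior the comment asks for.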
[jira] Created: (HADOOP-2296) hbase shell: phantom columns show up from select command
hbase shell: phantom columns show up from select command Key: HADOOP-2296 URL: https://issues.apache.org/jira/browse/HADOOP-2296 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.15.0 Reporter: Michael Bieniosek

{code}
Hbase> select * from hbase_test where row='2';
+--------+------+
| Column | Cell |
+--------+------+
| test:a | a    |
+--------+------+
| test:c | c    |
+--------+------+
2 row(s) in set (0.00 sec)

Hbase> select * from hbase_test where row='1';
+--------+------+
| Column | Cell |
+--------+------+
| test:a | a    |
+--------+------+
| test:b | b    |
+--------+------+
2 row(s) in set (0.00 sec)

Hbase> select * from hbase_test;
+-----+--------+------+
| Row | Column | Cell |
+-----+--------+------+
| 1   | test:a | a    |
+-----+--------+------+
| 1   | test:b | b    |
+-----+--------+------+
| 2   | test:a | a    |
+-----+--------+------+
| 2   | test:b | b    |
+-----+--------+------+
| 2   | test:c | c    |
+-----+--------+------+
5 row(s) in set (0.14 sec)
{code}

Note the phantom value for test:b in row 2. I looked at the code, and it looks like SelectCommand.scanPrint incorrectly fails to call results.clear() every time it calls scan.next(). However, I also think that HScannerInterface.next(HStoreKey key, SortedMap<Text, byte[]> results) is confusing, since it requires the user to call results.clear() and key.clear() before calling next each time. Since the Iterable interface that provides the zero-arg next has been added, I suggest that it might be worthwhile to deprecate the two-arg next. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
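A sketch of the loop shape the two-arg next() requires (types per the era's API, imports elided), showing where the missing clear() lets stale cells leak:

{code}
HStoreKey key = new HStoreKey();
TreeMap<Text, byte[]> results = new TreeMap<Text, byte[]>();
while (scanner.next(key, results)) {
  System.out.println(key.getRow() + ": " + results.keySet());
  results.clear();  // without this, the previous row's cells bleed into the next row
}
{code}

Dropping that clear() is precisely what produces the phantom test:b for row 2 above, which is why the zero-arg Iterable-style next is the less error-prone interface.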
[jira] Commented: (HADOOP-2261) [hbase] Change abort to finalize; does nothing if commit ran successfully
[ https://issues.apache.org/jira/browse/HADOOP-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545614 ] Michael Bieniosek commented on HADOOP-2261: --- The problem with your suggestion is that the enclosing function then needs to throw Exception. [hbase] Change abort to finalize; does nothing if commit ran successfully - Key: HADOOP-2261 URL: https://issues.apache.org/jira/browse/HADOOP-2261 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack Assignee: Jim Kellerman From Michael Bieniosek:

{code}
I'm trying to do a row update, so I write code like:

long lockid = table.startUpdate(new Text(article.getName()));
try {
  for (File articleInfo: article.listFiles(new NonDirectories())) {
    articleTable.put(lockid, columnName(articleInfo.getName()), readFile(articleInfo));
  }
  table.commit(lockid);
} finally {
  table.abort(lockid);
}

This doesn't work, because in the normal case it calls abort after commit. But I'm not sure what the code should be, e.g.:

long lockid = table.startUpdate(new Text(article.getName()));
try {
  for (File articleInfo: article.listFiles(new NonDirectories())) {
    articleTable.put(lockid, columnName(articleInfo.getName()), readFile(articleInfo));
  }
  table.commit(lockid);
} catch (IOException e) {
  table.abort(lockid);
  throw e;
} catch (RuntimeException e) {
  table.abort(lockid);
  throw e;
}

This gets unwieldy very quickly. Could you maybe change abort() to finalize() which either aborts or does nothing if a commit was successful?
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2262) [hbase] Updating a row on non-existent table runs all the retries and timeouts instead of failing fast
[ https://issues.apache.org/jira/browse/HADOOP-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545674 ] Michael Bieniosek commented on HADOOP-2262: --- Yes, but it contacts several machines and takes a couple of minutes to throw the TableNotFoundException. [hbase] Updating a row on non-existent table runs all the retries and timeouts instead of failing fast -- Key: HADOOP-2262 URL: https://issues.apache.org/jira/browse/HADOOP-2262 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack Assignee: Jim Kellerman Priority: Minor If you try to access a row in a non-existent table, the client hangs waiting on all the timeouts and retries. Rather, it should fail fast if there is no such table. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1650) Upgrade Jetty to 6.x
[ https://issues.apache.org/jira/browse/HADOOP-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543302 ] Michael Bieniosek commented on HADOOP-1650: --- The test failed because Hudson used the old Jetty jar. Upgrade Jetty to 6.x Key: HADOOP-1650 URL: https://issues.apache.org/jira/browse/HADOOP-1650 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Devaraj Das Assignee: Devaraj Das Attachments: hadoop-1650-jetty6.1.5.patch, hadoop-jetty6.1.4-lib.tar.gz, jetty6.1.4.patch This is the third attempt at moving to jetty6. Apparently, jetty-6.1.4 has fixed some of the issues we discovered in jetty during HADOOP-736 and HADOOP-1273. I'd like to keep this issue open for some time so that we have enough time to test things out. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2218) Generalize StatusHttpServer so webdav server can use it
Generalize StatusHttpServer so webdav server can use it --- Key: HADOOP-2218 URL: https://issues.apache.org/jira/browse/HADOOP-2218 Project: Hadoop Issue Type: New Feature Components: fs Affects Versions: 0.15.0 Reporter: Michael Bieniosek I'd like to make HADOOP-496 stand alone, so that I can make a hadoop-webdav jar that works against stock hadoop. The latest HADOOP-496 patch has only a small patch against StatusHttpServer, which generalizes it a little bit to make some private methods protected and changes HttpServlet to Servlet -- the rest is new files. I'd like to get the part against StatusHttpServer committed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HADOOP-2218) Generalize StatusHttpServer so webdav server can use it
[ https://issues.apache.org/jira/browse/HADOOP-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek resolved HADOOP-2218. --- Resolution: Won't Fix Actually, now that I look at it, it's easier just to rewrite the webdav code so it doesn't use StatusHttpServer and instead calls the Jetty API directly. I think a better fix would be to rewrite StatusHttpServer so that it uses a JettyWrapper superclass and doesn't contain any mapreduce-specific code. Generalize StatusHttpServer so webdav server can use it --- Key: HADOOP-2218 URL: https://issues.apache.org/jira/browse/HADOOP-2218 Project: Hadoop Issue Type: New Feature Components: fs Affects Versions: 0.15.0 Reporter: Michael Bieniosek I'd like to make HADOOP-496 stand alone, so that I can make a hadoop-webdav jar that works against stock hadoop. The latest HADOOP-496 patch has only a small patch against StatusHttpServer, which generalizes it a little bit to make some private methods protected and changes HttpServlet to Servlet -- the rest is new files. I'd like to get the part against StatusHttpServer committed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
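For reference, a minimal sketch of driving the Jetty 6 API directly, in the spirit of the proposed rewrite; the port is a placeholder and the inline servlet stands in for whatever servlet the webdav code provides. This is not the JettyWrapper design itself.

{code}
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.mortbay.jetty.Server;
import org.mortbay.jetty.servlet.Context;
import org.mortbay.jetty.servlet.ServletHolder;

public class MiniJetty {
  public static void main(String[] args) throws Exception {
    Server server = new Server(9800);                     // placeholder port
    Context root = new Context(server, "/", Context.SESSIONS);
    root.addServlet(new ServletHolder(new HttpServlet() {
      protected void doGet(HttpServletRequest req, HttpServletResponse resp)
          throws java.io.IOException {
        resp.getWriter().println("ok");                   // stand-in handler
      }
    }), "/*");
    server.start();
    server.join();
  }
}
{code}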
[jira] Commented: (HADOOP-2024) Make StatusHttpServer (usefully) subclassable
[ https://issues.apache.org/jira/browse/HADOOP-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543176 ] Michael Bieniosek commented on HADOOP-2024: --- See also the StatusHttpServer portion of the webdav_wip2 patch on HADOOP-496 for more things that can be generalized. Also, the constructor should probably not add the TaskGraphServlet. Make StatusHttpServer (usefully) subclassable - Key: HADOOP-2024 URL: https://issues.apache.org/jira/browse/HADOOP-2024 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: stack Priority: Minor Attachments: statushttpserver.patch hbase puts up webapps modelled on those deployed by dfs and mapreduce. Currently it does this by copying the bulk of StatusHttpServer down to hbase util as a new class named InfoServer. StatusHttpServer is copied rather than subclassed because I need access to the currently-private resource loading. As is, understandably, all webapp-related resources are presumed under the first 'webapps' directory found. It doesn't allow for the new condition where some resources can be found in hadoop and then others in hbase. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-496) Expose HDFS as a WebDAV store
[ https://issues.apache.org/jira/browse/HADOOP-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-496: - Attachment: hadoop-496-4.patch I deleted some unused code, removed some excess logging, removed the usage of StatusHttpServer so the patch no longer modifies core hadoop code, and added a separate start script that can take the namenode as a command-line argument to the server. I also moved the webdav package to org.apache.hadoop.fs.webdav. Currently writing to the DFS does not work, but I can browse and copy files out of the DFS (with a Mac OS X webdav mount). I think this could become a separate (small) contrib project, since hadoop proper does not rely on it. Expose HDFS as a WebDAV store - Key: HADOOP-496 URL: https://issues.apache.org/jira/browse/HADOOP-496 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Michel Tourn Assignee: Enis Soztutar Attachments: hadoop-496-3.patch, hadoop-496-4.patch, hadoop-496-spool-cleanup.patch, hadoop-webdav.zip, jetty-slide.xml, lib.webdav.tar.gz, screenshot-1.jpg, slideusers.properties, webdav_wip1.patch, webdav_wip2.patch WebDAV stands for Distributed Authoring and Versioning. It is a set of extensions to the HTTP protocol that lets users collaboratively edit and manage files on a remote web server. It is often considered a replacement for NFS or SAMBA. HDFS (Hadoop Distributed File System) needs a friendly file system interface. DFSShell commands are unfamiliar. Instead it is more convenient for Hadoop users to use a mountable network drive. A friendly interface to HDFS will be used both for casual browsing of data and for bulk import/export. The FUSE provider for HDFS is already available ( http://issues.apache.org/jira/browse/HADOOP-17 ) but it had scalability problems. WebDAV is a popular alternative. The typical licensing terms for WebDAV tools are also attractive: GPL for Linux client tools that Hadoop would not redistribute anyway. More importantly, Apache Project/Apache license for Java tools and for server components. This allows for a tighter integration with the HDFS code base. There are some interesting Apache projects that support WebDAV, but these are probably too heavyweight for the needs of Hadoop:
Tomcat servlet: http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/servlets/WebdavServlet.html
Slide: http://jakarta.apache.org/slide/
Being HTTP-based and backwards-compatible with Web Browser clients, the WebDAV server protocol could even be piggy-backed on the existing Web UI ports of the Hadoop name node / data nodes. WebDAV can be hosted as (Jetty) servlets. This minimizes server code bloat and avoids additional network traffic between HDFS and the WebDAV server.
General clients (read-only): any web browser
Linux clients: mountable (GPL) davfs2 http://dav.sourceforge.net/ ; FTP-like (GPL) Cadaver http://www.webdav.org/cadaver/
Server protocol compliance tests: http://www.webdav.org/neon/litmus/ (a goal is for Hadoop HDFS to pass this test, minus support for Properties)
Pure Java clients: DAV Explorer (Apache license) http://www.ics.uci.edu/~webdav/
WebDAV also makes it convenient to add advanced features in an incremental fashion: file locking, access control lists, hard links, symbolic links. New WebDAV standards get accepted and more or less featured WebDAV clients exist.
core: http://www.webdav.org/specs/rfc2518.html
ACLs: http://www.webdav.org/specs/rfc3744.html
redirects / soft links: http://greenbytes.de/tech/webdav/rfc4437.html
BIND / hard links: http://www.webdav.org/bind/
quota: http://tools.ietf.org/html/rfc4331
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1650) Upgrade Jetty to 6.x
[ https://issues.apache.org/jira/browse/HADOOP-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1650: -- Status: Patch Available (was: Open) Upgrade Jetty to 6.x Key: HADOOP-1650 URL: https://issues.apache.org/jira/browse/HADOOP-1650 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Devaraj Das Assignee: Devaraj Das Attachments: hadoop-1650-jetty6.1.5.patch, hadoop-jetty6.1.4-lib.tar.gz, jetty6.1.4.patch This is the third attempt at moving to jetty6. Apparently, jetty-6.1.4 has fixed some of the issues we discovered in jetty during HADOOP-736 and HADOOP-1273. I'd like to keep this issue open for some time so that we have enough time to test things out. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1650) Upgrade Jetty to 6.x
[ https://issues.apache.org/jira/browse/HADOOP-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1650: -- Attachment: hadoop-1650-jetty6.1.5.patch Updated to trunk and jetty-6.1.5, and did some cleanup based on findbugs/javac warnings in the previous patch. Upgrade Jetty to 6.x Key: HADOOP-1650 URL: https://issues.apache.org/jira/browse/HADOOP-1650 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Devaraj Das Assignee: Devaraj Das Attachments: hadoop-1650-jetty6.1.5.patch, hadoop-jetty6.1.4-lib.tar.gz, jetty6.1.4.patch This is the third attempt at moving to jetty6. Apparently, jetty-6.1.4 has fixed some of the issues we discovered in jetty during HADOOP-736 and HADOOP-1273. I'd like to keep this issue open for some time so that we have enough time to test things out. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1650) Upgrade Jetty to 6.x
[ https://issues.apache.org/jira/browse/HADOOP-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543242 ] Michael Bieniosek commented on HADOOP-1650: --- I didn't port the hbase code in my latest patch. Upgrade Jetty to 6.x Key: HADOOP-1650 URL: https://issues.apache.org/jira/browse/HADOOP-1650 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Devaraj Das Assignee: Devaraj Das Attachments: hadoop-1650-jetty6.1.5.patch, hadoop-jetty6.1.4-lib.tar.gz, jetty6.1.4.patch This is the third attempt at moving to jetty6. Apparently, jetty-6.1.4 has fixed some of the issues we discovered in jetty during HADOOP-736 and HADOOP-1273. I'd like to keep this issue open for some time so that we have enough time to test things out. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2212) java.lang.ArithmeticException: / by zero in ChecksumFileSystem.open
[ https://issues.apache.org/jira/browse/HADOOP-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2212: -- Attachment: hadoop-2212.patch java.lang.ArithmeticException: / by zero in ChecksumFileSystem.open --- Key: HADOOP-2212 URL: https://issues.apache.org/jira/browse/HADOOP-2212 Project: Hadoop Issue Type: Bug Components: fs Affects Versions: 0.15.0 Reporter: Michael Bieniosek Priority: Critical Attachments: hadoop-2212.patch The ChecksumFileSystem uses a default bytesPerChecksum value of zero. This number appears as a divisor in ChecksumFileSystem.getSumBufferSize if it is not overridden in the config. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2212) java.lang.ArithmeticException: / by zero in ChecksumFileSystem.open
[ https://issues.apache.org/jira/browse/HADOOP-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2212: -- Status: Patch Available (was: Open) Here is a patch that applies against the 0.15.0 release. java.lang.ArithmeticException: / by zero in ChecksumFileSystem.open --- Key: HADOOP-2212 URL: https://issues.apache.org/jira/browse/HADOOP-2212 Project: Hadoop Issue Type: Bug Components: fs Affects Versions: 0.15.0 Reporter: Michael Bieniosek Priority: Critical Attachments: hadoop-2212.patch The ChecksumFileSystem uses a default bytesPerChecksum value of zero. This number appears as a divisor in ChecksumFileSystem.getSumBufferSize if it is not overridden in the config. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2212) java.lang.ArithmeticException: / by zero in ChecksumFileSystem.open
java.lang.ArithmeticException: / by zero in ChecksumFileSystem.open --- Key: HADOOP-2212 URL: https://issues.apache.org/jira/browse/HADOOP-2212 Project: Hadoop Issue Type: Bug Components: fs Affects Versions: 0.15.0 Reporter: Michael Bieniosek Priority: Critical Attachments: hadoop-2212.patch The ChecksumFileSystem uses a default bytesPerChecksum value of zero. This number appears as a divisor in ChecksumFileSystem.getSumBufferSize if it is not overridden in the config. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
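An illustrative sketch of the failure mode (not the ChecksumFileSystem source; the surrounding arithmetic is assumed): a config key read with a default of 0 becomes a divisor, so any site that never sets the key divides by zero.

{code}
import org.apache.hadoop.conf.Configuration;

public class SumBufferDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    int bufferSize = 4096;
    // With a default of 0 and no value set in the config files, the
    // division below throws java.lang.ArithmeticException: / by zero.
    int bytesPerSum = conf.getInt("io.bytes.per.checksum", 0);
    int sumBufferSize = bufferSize / bytesPerSum;  // getSumBufferSize-style divide
    System.out.println(sumBufferSize);
    // The fix is a sane non-zero default, e.g.:
    // int bytesPerSum = conf.getInt("io.bytes.per.checksum", 512);
  }
}
{code}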
[jira] Updated: (HADOOP-496) Expose HDFS as a WebDAV store
[ https://issues.apache.org/jira/browse/HADOOP-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-496: - Attachment: hadoop-496-3.patch I implemented the DFSDavResource.spool method. This allows data to be copied out (previously a GET on any file returned an empty body). I also ported to hadoop trunk. On an unrelated note, I think the webdav sources should be in the org.apache.hadoop.fs directory, not org.apache.hadoop.dfs, since there is nothing specific to the dfs about this patch. Expose HDFS as a WebDAV store - Key: HADOOP-496 URL: https://issues.apache.org/jira/browse/HADOOP-496 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Michel Tourn Assignee: Enis Soztutar Attachments: hadoop-496-3.patch, hadoop-webdav.zip, jetty-slide.xml, lib.webdav.tar.gz, screenshot-1.jpg, slideusers.properties, webdav_wip1.patch, webdav_wip2.patch WebDAV stands for Distributed Authoring and Versioning. It is a set of extensions to the HTTP protocol that lets users collaboratively edit and manage files on a remote web server. It is often considered as a replacement for NFS or SAMBA HDFS (Hadoop Distributed File System) needs a friendly file system interface. DFSShell commands are unfamiliar. Instead it is more convenient for Hadoop users to use a mountable network drive. A friendly interface to HDFS will be used both for casual browsing of data and for bulk import/export. The FUSE provider for HDFS is already available ( http://issues.apache.org/jira/browse/HADOOP-17 ) but it had scalability problems. WebDAV is a popular alternative. The typical licensing terms for WebDAV tools are also attractive: GPL for Linux client tools that Hadoop would not redistribute anyway. More importantly, Apache Project/Apache license for Java tools and for server components. This allows for a tighter integration with the HDFS code base. There are some interesting Apache projects that support WebDAV. But these are probably too heavyweight for the needs of Hadoop: Tomcat servlet: http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/servlets/WebdavServlet.html Slide: http://jakarta.apache.org/slide/ Being HTTP-based and backwards-compatible with Web Browser clients, the WebDAV server protocol could even be piggy-backed on the existing Web UI ports of the Hadoop name node / data nodes. WebDAV can be hosted as (Jetty) servlets. This minimizes server code bloat and this avoids additional network traffic between HDFS and the WebDAV server. General Clients (read-only): Any web browser Linux Clients: Mountable GPL davfs2 http://dav.sourceforge.net/ FTP-like GPL Cadaver http://www.webdav.org/cadaver/ Server Protocol compliance tests: http://www.webdav.org/neon/litmus/ A goal is for Hadoop HDFS to pass this test (minus support for Properties) Pure Java clients: DAV Explorer Apache lic. http://www.ics.uci.edu/~webdav/ WebDAV also makes it convenient to add advanced features in an incremental fashion: file locking, access control lists, hard links, symbolic links. New WebDAV standards get accepted and more or less featured WebDAV clients exist. core http://www.webdav.org/specs/rfc2518.html ACLs http://www.webdav.org/specs/rfc3744.html redirects soft links http://greenbytes.de/tech/webdav/rfc4437.html BIND hard links http://www.webdav.org/bind/ quota http://tools.ietf.org/html/rfc4331 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
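As a rough idea of what a spool implementation does, the following sketch copies a file's bytes to the response stream. The class and method names here are assumptions for illustration, not the patch's actual DFSDavResource.spool code:
{{{
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch of a spool-style copy loop.
public class SpoolSketch {
  static void spool(InputStream in, OutputStream out) throws IOException {
    byte[] buf = new byte[4096];
    for (int n; (n = in.read(buf)) != -1; ) {
      out.write(buf, 0, n); // before the patch nothing was written, hence empty GET bodies
    }
    out.flush();
  }
}
}}}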
[jira] Updated: (HADOOP-496) Expose HDFS as a WebDAV store
[ https://issues.apache.org/jira/browse/HADOOP-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-496: - Attachment: hadoop-496-spool-cleanup.patch Here's a cleaner version of the patch. Expose HDFS as a WebDAV store - Key: HADOOP-496 URL: https://issues.apache.org/jira/browse/HADOOP-496 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Michel Tourn Assignee: Enis Soztutar Attachments: hadoop-496-3.patch, hadoop-496-spool-cleanup.patch, hadoop-webdav.zip, jetty-slide.xml, lib.webdav.tar.gz, screenshot-1.jpg, slideusers.properties, webdav_wip1.patch, webdav_wip2.patch WebDAV stands for Distributed Authoring and Versioning. It is a set of extensions to the HTTP protocol that lets users collaboratively edit and manage files on a remote web server. It is often considered as a replacement for NFS or SAMBA HDFS (Hadoop Distributed File System) needs a friendly file system interface. DFSShell commands are unfamiliar. Instead it is more convenient for Hadoop users to use a mountable network drive. A friendly interface to HDFS will be used both for casual browsing of data and for bulk import/export. The FUSE provider for HDFS is already available ( http://issues.apache.org/jira/browse/HADOOP-17 ) but it had scalability problems. WebDAV is a popular alternative. The typical licensing terms for WebDAV tools are also attractive: GPL for Linux client tools that Hadoop would not redistribute anyway. More importantly, Apache Project/Apache license for Java tools and for server components. This allows for a tighter integration with the HDFS code base. There are some interesting Apache projects that support WebDAV. But these are probably too heavyweight for the needs of Hadoop: Tomcat servlet: http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/servlets/WebdavServlet.html Slide: http://jakarta.apache.org/slide/ Being HTTP-based and backwards-compatible with Web Browser clients, the WebDAV server protocol could even be piggy-backed on the existing Web UI ports of the Hadoop name node / data nodes. WebDAV can be hosted as (Jetty) servlets. This minimizes server code bloat and this avoids additional network traffic between HDFS and the WebDAV server. General Clients (read-only): Any web browser Linux Clients: Mountable GPL davfs2 http://dav.sourceforge.net/ FTP-like GPL Cadaver http://www.webdav.org/cadaver/ Server Protocol compliance tests: http://www.webdav.org/neon/litmus/ A goal is for Hadoop HDFS to pass this test (minus support for Properties) Pure Java clients: DAV Explorer Apache lic. http://www.ics.uci.edu/~webdav/ WebDAV also makes it convenient to add advanced features in an incremental fashion: file locking, access control lists, hard links, symbolic links. New WebDAV standards get accepted and more or less featured WebDAV clients exist. core http://www.webdav.org/specs/rfc2518.html ACLs http://www.webdav.org/specs/rfc3744.html redirects soft links http://greenbytes.de/tech/webdav/rfc4437.html BIND hard links http://www.webdav.org/bind/ quota http://tools.ietf.org/html/rfc4331 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-496) Expose HDFS as a WebDAV store
[ https://issues.apache.org/jira/browse/HADOOP-496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540906 ] Michael Bieniosek commented on HADOOP-496: -- Pete, how are you bridging between fuse and dfs? There is a tgz for fusej-hadoop floating around somewhere, though it is out of date. Expose HDFS as a WebDAV store - Key: HADOOP-496 URL: https://issues.apache.org/jira/browse/HADOOP-496 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Michel Tourn Assignee: Enis Soztutar Attachments: hadoop-webdav.zip, jetty-slide.xml, lib.webdav.tar.gz, screenshot-1.jpg, slideusers.properties, webdav_wip1.patch, webdav_wip2.patch WebDAV stands for Distributed Authoring and Versioning. It is a set of extensions to the HTTP protocol that lets users collaboratively edit and manage files on a remote web server. It is often considered as a replacement for NFS or SAMBA HDFS (Hadoop Distributed File System) needs a friendly file system interface. DFSShell commands are unfamiliar. Instead it is more convenient for Hadoop users to use a mountable network drive. A friendly interface to HDFS will be used both for casual browsing of data and for bulk import/export. The FUSE provider for HDFS is already available ( http://issues.apache.org/jira/browse/HADOOP-17 ) but it had scalability problems. WebDAV is a popular alternative. The typical licensing terms for WebDAV tools are also attractive: GPL for Linux client tools that Hadoop would not redistribute anyway. More importantly, Apache Project/Apache license for Java tools and for server components. This allows for a tighter integration with the HDFS code base. There are some interesting Apache projects that support WebDAV. But these are probably too heavyweight for the needs of Hadoop: Tomcat servlet: http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/servlets/WebdavServlet.html Slide: http://jakarta.apache.org/slide/ Being HTTP-based and backwards-compatible with Web Browser clients, the WebDAV server protocol could even be piggy-backed on the existing Web UI ports of the Hadoop name node / data nodes. WebDAV can be hosted as (Jetty) servlets. This minimizes server code bloat and this avoids additional network traffic between HDFS and the WebDAV server. General Clients (read-only): Any web browser Linux Clients: Mountable GPL davfs2 http://dav.sourceforge.net/ FTP-like GPL Cadaver http://www.webdav.org/cadaver/ Server Protocol compliance tests: http://www.webdav.org/neon/litmus/ A goal is for Hadoop HDFS to pass this test (minus support for Properties) Pure Java clients: DAV Explorer Apache lic. http://www.ics.uci.edu/~webdav/ WebDAV also makes it convenient to add advanced features in an incremental fashion: file locking, access control lists, hard links, symbolic links. New WebDAV standards get accepted and more or less featured WebDAV clients exist. core http://www.webdav.org/specs/rfc2518.html ACLs http://www.webdav.org/specs/rfc3744.html redirects soft links http://greenbytes.de/tech/webdav/rfc4437.html BIND hard links http://www.webdav.org/bind/ quota http://tools.ietf.org/html/rfc4331 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-496) Expose HDFS as a WebDAV store
[ https://issues.apache.org/jira/browse/HADOOP-496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539463 ] Michael Bieniosek commented on HADOOP-496: -- I added some extra debugging to DFSDavResource.java, and it looks like the getHref() function is returning malformed urls: 07/11/01 13:34:56 INFO webdav.DFSDavResource: getHref() for path:/dirs/to/my/file.tgz - http://localhost:20015hdfs%3a//dfs.cluster.powerset.com%3a1/dirs/to/my/file.tgz Expose HDFS as a WebDAV store - Key: HADOOP-496 URL: https://issues.apache.org/jira/browse/HADOOP-496 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Michel Tourn Assignee: Enis Soztutar Attachments: hadoop-webdav.zip, jetty-slide.xml, lib.webdav.tar.gz, slideusers.properties, webdav_wip1.patch, webdav_wip2.patch WebDAV stands for Distributed Authoring and Versioning. It is a set of extensions to the HTTP protocol that lets users collaboratively edit and manage files on a remote web server. It is often considered as a replacement for NFS or SAMBA HDFS (Hadoop Distributed File System) needs a friendly file system interface. DFSShell commands are unfamiliar. Instead it is more convenient for Hadoop users to use a mountable network drive. A friendly interface to HDFS will be used both for casual browsing of data and for bulk import/export. The FUSE provider for HDFS is already available ( http://issues.apache.org/jira/browse/HADOOP-17 ) but it had scalability problems. WebDAV is a popular alternative. The typical licensing terms for WebDAV tools are also attractive: GPL for Linux client tools that Hadoop would not redistribute anyway. More importantly, Apache Project/Apache license for Java tools and for server components. This allows for a tighter integration with the HDFS code base. There are some interesting Apache projects that support WebDAV. But these are probably too heavyweight for the needs of Hadoop: Tomcat servlet: http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/servlets/WebdavServlet.html Slide: http://jakarta.apache.org/slide/ Being HTTP-based and backwards-compatible with Web Browser clients, the WebDAV server protocol could even be piggy-backed on the existing Web UI ports of the Hadoop name node / data nodes. WebDAV can be hosted as (Jetty) servlets. This minimizes server code bloat and this avoids additional network traffic between HDFS and the WebDAV server. General Clients (read-only): Any web browser Linux Clients: Mountable GPL davfs2 http://dav.sourceforge.net/ FTP-like GPL Cadaver http://www.webdav.org/cadaver/ Server Protocol compliance tests: http://www.webdav.org/neon/litmus/ A goal is for Hadoop HDFS to pass this test (minus support for Properties) Pure Java clients: DAV Explorer Apache lic. http://www.ics.uci.edu/~webdav/ WebDAV also makes it convenient to add advanced features in an incremental fashion: file locking, access control lists, hard links, symbolic links. New WebDAV standards get accepted and more or less featured WebDAV clients exist. core http://www.webdav.org/specs/rfc2518.html ACLs http://www.webdav.org/specs/rfc3744.html redirects soft links http://greenbytes.de/tech/webdav/rfc4437.html BIND hard links http://www.webdav.org/bind/ quota http://tools.ietf.org/html/rfc4331 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
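The malformed href in the log above suggests the full DFS URI (hdfs://...) is being appended to the servlet's host:port instead of just the path. A hypothetical fix resolves only the path against the servlet base; the class below is an illustration of that idea, not DFSDavResource code:
{{{
import java.net.URI;

// Hypothetical illustration of well-formed href construction.
public class HrefSketch {
  public static void main(String[] args) throws Exception {
    URI base = new URI("http://localhost:20015/"); // servlet base, taken from the log above
    String path = "/dirs/to/my/file.tgz";          // DFS path only, no scheme or authority
    System.out.println(base.resolve(path));        // http://localhost:20015/dirs/to/my/file.tgz
  }
}
}}}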
[jira] Reopened: (HADOOP-1825) hadoop-daemon.sh script fails if HADOOP_PID_DIR doesn't exist
[ https://issues.apache.org/jira/browse/HADOOP-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek reopened HADOOP-1825: --- This patch is wrong. The first line should be:
{{{
if [ ! -d $HADOOP_PID_DIR ]; then
}}}
Note the $ in front of HADOOP_PID_DIR. hadoop-daemon.sh script fails if HADOOP_PID_DIR doesn't exist - Key: HADOOP-1825 URL: https://issues.apache.org/jira/browse/HADOOP-1825 Project: Hadoop Issue Type: Bug Components: scripts Affects Versions: 0.14.0 Reporter: Michael Bieniosek Priority: Minor Fix For: 0.15.0 Attachments: hadoop-1825.patch If I try to bring up a datanode on a fresh machine, it will fail with this error message: starting datanode, logging to /b/hadoop/logs/hadoop-me-datanode-example.com.out /p/share/hadoop/bin/hadoop-daemon.sh: line 99: /b/hadoop/pid/hadoop-me-datanode.pid: No such file or directory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1825) hadoop-daemon.sh script fails if HADOOP_PID_DIR doesn't exist
[ https://issues.apache.org/jira/browse/HADOOP-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1825: -- Fix Version/s: (was: 0.15.0) Status: Patch Available (was: Reopened) hadoop-daemon.sh script fails if HADOOP_PID_DIR doesn't exist - Key: HADOOP-1825 URL: https://issues.apache.org/jira/browse/HADOOP-1825 Project: Hadoop Issue Type: Bug Components: scripts Affects Versions: 0.14.0 Reporter: Michael Bieniosek Priority: Minor Attachments: hadoop-1825-refix.patch, hadoop-1825.patch If I try to bring up a datanode on a fresh machine, it will fail with this error message: starting datanode, logging to /b/hadoop/logs/hadoop-me-datanode-example.com.out /p/share/hadoop/bin/hadoop-daemon.sh: line 99: /b/hadoop/pid/hadoop-me-datanode.pid: No such file or directory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-496) Expose HDFS as a WebDAV store
[ https://issues.apache.org/jira/browse/HADOOP-496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536733 ] Michael Bieniosek commented on HADOOP-496: -- You can use telnet to connect to the server and send handmade http requests. I tried telnetting to the server and doing a GET /path/to/file.tgz. This gave me a 200 with an empty body. If I try to GET a file that doesn't exist, I get a 404 with an html error page. Expose HDFS as a WebDAV store - Key: HADOOP-496 URL: https://issues.apache.org/jira/browse/HADOOP-496 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Michel Tourn Assignee: Enis Soztutar Attachments: hadoop-webdav.zip, jetty-slide.xml, lib.webdav.tar.gz, slideusers.properties, webdav_wip1.patch, webdav_wip2.patch WebDAV stands for Distributed Authoring and Versioning. It is a set of extensions to the HTTP protocol that lets users collaboratively edit and manage files on a remote web server. It is often considered as a replacement for NFS or SAMBA HDFS (Hadoop Distributed File System) needs a friendly file system interface. DFSShell commands are unfamiliar. Instead it is more convenient for Hadoop users to use a mountable network drive. A friendly interface to HDFS will be used both for casual browsing of data and for bulk import/export. The FUSE provider for HDFS is already available ( http://issues.apache.org/jira/browse/HADOOP-17 ) but it had scalability problems. WebDAV is a popular alternative. The typical licensing terms for WebDAV tools are also attractive: GPL for Linux client tools that Hadoop would not redistribute anyway. More importantly, Apache Project/Apache license for Java tools and for server components. This allows for a tighter integration with the HDFS code base. There are some interesting Apache projects that support WebDAV. But these are probably too heavyweight for the needs of Hadoop: Tomcat servlet: http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/servlets/WebdavServlet.html Slide: http://jakarta.apache.org/slide/ Being HTTP-based and backwards-compatible with Web Browser clients, the WebDAV server protocol could even be piggy-backed on the existing Web UI ports of the Hadoop name node / data nodes. WebDAV can be hosted as (Jetty) servlets. This minimizes server code bloat and this avoids additional network traffic between HDFS and the WebDAV server. General Clients (read-only): Any web browser Linux Clients: Mountable GPL davfs2 http://dav.sourceforge.net/ FTP-like GPL Cadaver http://www.webdav.org/cadaver/ Server Protocol compliance tests: http://www.webdav.org/neon/litmus/ A goal is for Hadoop HDFS to pass this test (minus support for Properties) Pure Java clients: DAV Explorer Apache lic. http://www.ics.uci.edu/~webdav/ WebDAV also makes it convenient to add advanced features in an incremental fashion: file locking, access control lists, hard links, symbolic links. New WebDAV standards get accepted and more or less featured WebDAV clients exist. core http://www.webdav.org/specs/rfc2518.html ACLs http://www.webdav.org/specs/rfc3744.html redirects soft links http://greenbytes.de/tech/webdav/rfc4437.html BIND hard links http://www.webdav.org/bind/ quota http://tools.ietf.org/html/rfc4331 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
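For anyone who would rather script the handmade request than type it into telnet, here is a minimal raw-socket sketch. The host, port, and path are placeholders (the port is taken from the log earlier in this thread):
{{{
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;

// Sends the same handmade GET described in the comment above.
public class RawGet {
  public static void main(String[] args) throws IOException {
    try (Socket s = new Socket("localhost", 20015)) { // placeholder webdav host:port
      Writer w = new OutputStreamWriter(s.getOutputStream(), "US-ASCII");
      w.write("GET /path/to/file.tgz HTTP/1.0\r\n\r\n");
      w.flush();
      BufferedReader r = new BufferedReader(new InputStreamReader(s.getInputStream()));
      for (String line; (line = r.readLine()) != null; ) {
        System.out.println(line); // status line, headers, blank line, then body
      }
    }
  }
}
}}}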
[jira] Commented: (HADOOP-496) Expose HDFS as a WebDAV store
[ https://issues.apache.org/jira/browse/HADOOP-496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536531 ] Michael Bieniosek commented on HADOOP-496: -- Hey, I tried this patch out, and I noticed a few things: 1. The webdav server is hardcoded to bind to localhost, so I changed it to bind to 0.0.0.0 instead. I'd prefer if clients didn't all have to run their own server: if the DNS doesn't match, or the client doesn't want to set up hadoop and configure it, it's much easier. 2. When I actually tried to copy files out, I get a funny error in the client (on Mac OSX, it says There is a problem with the file and it cannot be copied). I wish I could be more helpful, but I don't know how to issue raw HTTP to the webdav server and there's nothing indicative in the webdav server log. 3. If I point an ordinary browser (or wget) at the webdav server, I get a 200 with an empty body for files that exist, and a 404 for files that don't exist. Again, I don't know much about webdav, but it would be nice if you could browse and download with an ordinary browser, as in subversion. It was nice to see this almost work, though it's not really usable for me because of problem 2. Thanks! Expose HDFS as a WebDAV store - Key: HADOOP-496 URL: https://issues.apache.org/jira/browse/HADOOP-496 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Michel Tourn Assignee: Enis Soztutar Attachments: hadoop-webdav.zip, jetty-slide.xml, lib.webdav.tar.gz, slideusers.properties, webdav_wip1.patch, webdav_wip2.patch WebDAV stands for Distributed Authoring and Versioning. It is a set of extensions to the HTTP protocol that lets users collaboratively edit and manage files on a remote web server. It is often considered as a replacement for NFS or SAMBA HDFS (Hadoop Distributed File System) needs a friendly file system interface. DFSShell commands are unfamiliar. Instead it is more convenient for Hadoop users to use a mountable network drive. A friendly interface to HDFS will be used both for casual browsing of data and for bulk import/export. The FUSE provider for HDFS is already available ( http://issues.apache.org/jira/browse/HADOOP-17 ) but it had scalability problems. WebDAV is a popular alternative. The typical licensing terms for WebDAV tools are also attractive: GPL for Linux client tools that Hadoop would not redistribute anyway. More importantly, Apache Project/Apache license for Java tools and for server components. This allows for a tighter integration with the HDFS code base. There are some interesting Apache projects that support WebDAV. But these are probably too heavyweight for the needs of Hadoop: Tomcat servlet: http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/servlets/WebdavServlet.html Slide: http://jakarta.apache.org/slide/ Being HTTP-based and backwards-compatible with Web Browser clients, the WebDAV server protocol could even be piggy-backed on the existing Web UI ports of the Hadoop name node / data nodes. WebDAV can be hosted as (Jetty) servlets. This minimizes server code bloat and this avoids additional network traffic between HDFS and the WebDAV server. General Clients (read-only): Any web browser Linux Clients: Mountable GPL davfs2 http://dav.sourceforge.net/ FTP-like GPL Cadaver http://www.webdav.org/cadaver/ Server Protocol compliance tests: http://www.webdav.org/neon/litmus/ A goal is for Hadoop HDFS to pass this test (minus support for Properties) Pure Java clients: DAV Explorer Apache lic. 
http://www.ics.uci.edu/~webdav/ WebDAV also makes it convenient to add advanced features in an incremental fashion: file locking, access control lists, hard links, symbolic links. New WebDAV standards get accepted and more or less featured WebDAV clients exist. core http://www.webdav.org/specs/rfc2518.html ACLs http://www.webdav.org/specs/rfc3744.html redirects soft links http://greenbytes.de/tech/webdav/rfc4437.html BIND hard links http://www.webdav.org/bind/ quota http://tools.ietf.org/html/rfc4331 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file
[ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536344 ] Michael Bieniosek commented on HADOOP-2073: --- I am attaching a patch that changes file size after writing the data rather than before. That's still not perfect, though, since the datanode could die while the file is being rewritten, or before the file is resized. Datanode corruption if machine dies while writing VERSION file -- Key: HADOOP-2073 URL: https://issues.apache.org/jira/browse/HADOOP-2073 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.14.0 Reporter: Michael Bieniosek Assignee: Raghu Angadi Attachments: versionFileSize.patch Yesterday, due to a bad mapreduce job, some of my machines went on OOM killing sprees and killed a bunch of datanodes, among other processes. Since my monitoring software kept trying to bring up the datanodes, only to have the kernel kill them off again, each machine's datanode was probably killed many times. A large percentage of these datanodes will not come up now, and write this message to the logs: 2007-10-18 00:23:28,076 ERROR org.apache.hadoop.dfs.DataNode: org.apache.hadoop.dfs.InconsistentFSStateException: Directory /hadoop/dfs/data is in an inconsistent state: file VERSION is invalid. When I check, /hadoop/dfs/data/current/VERSION is an empty file. Consequently, I have to delete all the blocks on the datanode and start over. Since the OOM killing sprees happened simultaneously on several datanodes in my DFS cluster, this could have crippled my dfs cluster. I checked the hadoop code, and in org.apache.hadoop.dfs.Storage, I see this:
{{{
/**
 * Write version file.
 *
 * @throws IOException
 */
void write() throws IOException {
  corruptPreUpgradeStorage(root);
  write(getVersionFile());
}

void write(File to) throws IOException {
  Properties props = new Properties();
  setFields(props, this);
  RandomAccessFile file = new RandomAccessFile(to, "rws");
  FileOutputStream out = null;
  try {
    file.setLength(0);
    file.seek(0);
    out = new FileOutputStream(file.getFD());
    props.store(out, null);
  } finally {
    if (out != null) {
      out.close();
    }
    file.close();
  }
}
}}}
So if the datanode dies after file.setLength(0), but before props.store(out, null), the VERSION file will get trashed in the corrupted state I see. Maybe it would be better if this method created a temporary file VERSION.tmp, and then copied it to VERSION, then deleted VERSION.tmp? That way, if VERSION was detected to be corrupt, the datanode could look at VERSION.tmp to recover the data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
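The VERSION.tmp suggestion in the description amounts to a write-then-rename protocol. Here is a minimal sketch of that idea; the class name, the fsync step, and the error handling are illustrative assumptions, not the committed fix:
{{{
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;

// Hypothetical write-temp-then-rename version of Storage.write(File).
public class AtomicVersionWrite {
  static void write(File versionFile, Properties props) throws IOException {
    File tmp = new File(versionFile.getParentFile(), versionFile.getName() + ".tmp");
    FileOutputStream out = new FileOutputStream(tmp);
    try {
      props.store(out, null);
      out.getFD().sync(); // force the bytes to disk before exposing the file
    } finally {
      out.close();
    }
    // On POSIX, rename over an existing file is atomic, so VERSION is always
    // either the old complete copy or the new complete copy.
    if (!tmp.renameTo(versionFile)) {
      throw new IOException("could not rename " + tmp + " to " + versionFile);
    }
  }
}
}}}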
[jira] Commented: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file
[ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536383 ] Michael Bieniosek commented on HADOOP-2073: --- [...] The point is that although this approach does not work for arbitrary file modifications, it works for what we do with the version file. Konstantin, could you put a comment in your patch explaining this argument? Thanks. Datanode corruption if machine dies while writing VERSION file -- Key: HADOOP-2073 URL: https://issues.apache.org/jira/browse/HADOOP-2073 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.14.0 Reporter: Michael Bieniosek Assignee: Raghu Angadi Attachments: versionFileSize.patch Yesterday, due to a bad mapreduce job, some of my machines went on OOM killing sprees and killed a bunch of datanodes, among other processes. Since my monitoring software kept trying to bring up the datanodes, only to have the kernel kill them off again, each machine's datanode was probably killed many times. A large percentage of these datanodes will not come up now, and write this message to the logs: 2007-10-18 00:23:28,076 ERROR org.apache.hadoop.dfs.DataNode: org.apache.hadoop.dfs.InconsistentFSStateException: Directory /hadoop/dfs/data is in an inconsistent state: file VERSION is invalid. When I check, /hadoop/dfs/data/current/VERSION is an empty file. Consequently, I have to delete all the blocks on the datanode and start over. Since the OOM killing sprees happened simultaneously on several datanodes in my DFS cluster, this could have crippled my dfs cluster. I checked the hadoop code, and in org.apache.hadoop.dfs.Storage, I see this:
{{{
/**
 * Write version file.
 *
 * @throws IOException
 */
void write() throws IOException {
  corruptPreUpgradeStorage(root);
  write(getVersionFile());
}

void write(File to) throws IOException {
  Properties props = new Properties();
  setFields(props, this);
  RandomAccessFile file = new RandomAccessFile(to, "rws");
  FileOutputStream out = null;
  try {
    file.setLength(0);
    file.seek(0);
    out = new FileOutputStream(file.getFD());
    props.store(out, null);
  } finally {
    if (out != null) {
      out.close();
    }
    file.close();
  }
}
}}}
So if the datanode dies after file.setLength(0), but before props.store(out, null), the VERSION file will get trashed in the corrupted state I see. Maybe it would be better if this method created a temporary file VERSION.tmp, and then copied it to VERSION, then deleted VERSION.tmp? That way, if VERSION was detected to be corrupt, the datanode could look at VERSION.tmp to recover the data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file
[ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535990 ] Michael Bieniosek commented on HADOOP-2073: --- There's no Exception in my logs: I'm assuming the linux OOM killer sends the jvm a SIGKILL (http://lxr.linux.no/source/mm/oom_kill.c#L271), so the jvm prints out the shutdown message and exits without giving exception handlers a chance to do anything. There aren't any upgrades involved here. FWIW, my logs look like (repeated over and over again):
{{{
2007-10-17 00:19:07,051 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = (the hostname)
STARTUP_MSG:   args = []
************************************************************/
2007-10-17 00:19:08,338 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null
2007-10-17 00:19:08,439 INFO org.apache.hadoop.dfs.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at (the hostname)
************************************************************/
}}}
Note that it didn't take long before the process was killed. Datanode corruption if machine dies while writing VERSION file -- Key: HADOOP-2073 URL: https://issues.apache.org/jira/browse/HADOOP-2073 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.14.0 Reporter: Michael Bieniosek Yesterday, due to a bad mapreduce job, some of my machines went on OOM killing sprees and killed a bunch of datanodes, among other processes. Since my monitoring software kept trying to bring up the datanodes, only to have the kernel kill them off again, each machine's datanode was probably killed many times. A large percentage of these datanodes will not come up now, and write this message to the logs: 2007-10-18 00:23:28,076 ERROR org.apache.hadoop.dfs.DataNode: org.apache.hadoop.dfs.InconsistentFSStateException: Directory /hadoop/dfs/data is in an inconsistent state: file VERSION is invalid. When I check, /hadoop/dfs/data/current/VERSION is an empty file. Consequently, I have to delete all the blocks on the datanode and start over. Since the OOM killing sprees happened simultaneously on several datanodes in my DFS cluster, this could have crippled my dfs cluster. I checked the hadoop code, and in org.apache.hadoop.dfs.Storage, I see this:
{{{
/**
 * Write version file.
 *
 * @throws IOException
 */
void write() throws IOException {
  corruptPreUpgradeStorage(root);
  write(getVersionFile());
}

void write(File to) throws IOException {
  Properties props = new Properties();
  setFields(props, this);
  RandomAccessFile file = new RandomAccessFile(to, "rws");
  FileOutputStream out = null;
  try {
    file.setLength(0);
    file.seek(0);
    out = new FileOutputStream(file.getFD());
    props.store(out, null);
  } finally {
    if (out != null) {
      out.close();
    }
    file.close();
  }
}
}}}
So if the datanode dies after file.setLength(0), but before props.store(out, null), the VERSION file will get trashed in the corrupted state I see. Maybe it would be better if this method created a temporary file VERSION.tmp, and then copied it to VERSION, then deleted VERSION.tmp? That way, if VERSION was detected to be corrupt, the datanode could look at VERSION.tmp to recover the data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file
Datanode corruption if machine dies while writing VERSION file -- Key: HADOOP-2073 URL: https://issues.apache.org/jira/browse/HADOOP-2073 Project: Hadoop Issue Type: Bug Affects Versions: 0.14.0 Reporter: Michael Bieniosek Yesterday, due to a bad mapreduce job, some of my machines went on OOM killing sprees and killed a bunch of datanodes, among other processes. Since my monitoring software kept trying to bring up the datanodes, only to have the kernel kill them off again, each machine's datanode was probably killed many times. A large percentage of these datanodes will not come up now, and write this message to the logs: 2007-10-18 00:23:28,076 ERROR org.apache.hadoop.dfs.DataNode: org.apache.hadoop.dfs.InconsistentFSStateException: Directory /hadoop/dfs/data is in an inconsistent state: file VERSION is invalid. When I check, /hadoop/dfs/data/current/VERSION is an empty file. Consequently, I have to delete all the blocks on the datanode and start over. Since the OOM killing sprees happened simultaneously on several datanodes in my DFS cluster, this could have crippled my dfs cluster. I checked the hadoop code, and in org.apache.hadoop.dfs.Storage, I see this:
{{{
/**
 * Write version file.
 *
 * @throws IOException
 */
void write() throws IOException {
  corruptPreUpgradeStorage(root);
  write(getVersionFile());
}

void write(File to) throws IOException {
  Properties props = new Properties();
  setFields(props, this);
  RandomAccessFile file = new RandomAccessFile(to, "rws");
  FileOutputStream out = null;
  try {
    file.setLength(0);
    file.seek(0);
    out = new FileOutputStream(file.getFD());
    props.store(out, null);
  } finally {
    if (out != null) {
      out.close();
    }
    file.close();
  }
}
}}}
So if the datanode dies after file.setLength(0), but before props.store(out, null), the VERSION file will get trashed in the corrupted state I see. Maybe it would be better if this method created a temporary file VERSION.tmp, and then copied it to VERSION, then deleted VERSION.tmp? That way, if VERSION was detected to be corrupt, the datanode could look at VERSION.tmp to recover the data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file
[ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2073: -- Component/s: dfs Datanode corruption if machine dies while writing VERSION file -- Key: HADOOP-2073 URL: https://issues.apache.org/jira/browse/HADOOP-2073 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.14.0 Reporter: Michael Bieniosek Yesterday, due to a bad mapreduce job, some of my machines went on OOM killing sprees and killed a bunch of datanodes, among other processes. Since my monitoring software kept trying to bring up the datanodes, only to have the kernel kill them off again, each machine's datanode was probably killed many times. A large percentage of these datanodes will not come up now, and write this message to the logs: 2007-10-18 00:23:28,076 ERROR org.apache.hadoop.dfs.DataNode: org.apache.hadoop.dfs.InconsistentFSStateException: Directory /hadoop/dfs/data is in an inconsistent state: file VERSION is invalid. When I check, /hadoop/dfs/data/current/VERSION is an empty file. Consequently, I have to delete all the blocks on the datanode and start over. Since the OOM killing sprees happened simultaneously on several datanodes in my DFS cluster, this could have crippled my dfs cluster. I checked the hadoop code, and in org.apache.hadoop.dfs.Storage, I see this:
{{{
/**
 * Write version file.
 *
 * @throws IOException
 */
void write() throws IOException {
  corruptPreUpgradeStorage(root);
  write(getVersionFile());
}

void write(File to) throws IOException {
  Properties props = new Properties();
  setFields(props, this);
  RandomAccessFile file = new RandomAccessFile(to, "rws");
  FileOutputStream out = null;
  try {
    file.setLength(0);
    file.seek(0);
    out = new FileOutputStream(file.getFD());
    props.store(out, null);
  } finally {
    if (out != null) {
      out.close();
    }
    file.close();
  }
}
}}}
So if the datanode dies after file.setLength(0), but before props.store(out, null), the VERSION file will get trashed in the corrupted state I see. Maybe it would be better if this method created a temporary file VERSION.tmp, and then copied it to VERSION, then deleted VERSION.tmp? That way, if VERSION was detected to be corrupt, the datanode could look at VERSION.tmp to recover the data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1245) value for mapred.tasktracker.tasks.maximum taken from jobtracker, not tasktracker
[ https://issues.apache.org/jira/browse/HADOOP-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533868 ] Michael Bieniosek commented on HADOOP-1245: --- It might be enough to use the value at the jobtracker as a default, and override with the tasktracker value if it's present (along with the compatibility note). If a tasktracker is configured differently than the jobtracker, it's more likely the configurer intended the tasktracker's value to be used, as opposed to the configurer expecting the tasktracker value to be ignored. So I doubt using the jobtracker as an overridable default will break anybody. value for mapred.tasktracker.tasks.maximum taken from jobtracker, not tasktracker - Key: HADOOP-1245 URL: https://issues.apache.org/jira/browse/HADOOP-1245 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.12.3 Reporter: Michael Bieniosek Assignee: Michael Bieniosek Attachments: tasktracker-max-tasks-1245.patch I want to create a cluster with machines with different numbers of CPUs. Consequently, each machine should have a different value for mapred.tasktracker.tasks.maximum, since my map tasks are CPU bound. When a new job starts up, the jobtracker uses its (single) value for mapred.tasktracker.tasks.maximum to assign tasks. This means that each tasktracker gets the same number of tasks, regardless of how I configured that particular machine. The jobtracker should not consult its config for the value of mapred.tasktracker.tasks.maximum. It should assign tasks (or allow tasktrackers to request tasks) according to each tasktracker's value of mapred.tasktracker.tasks.maximum. Originally, I thought the behavior was slightly different, so this issue contained this text: After the first task finishes on each tasktracker, the tasktracker will request new tasks from the jobtracker according to the tasktracker's value for mapred.tasktracker.tasks.maximum. So after the first round of map tasks is done, the cluster reverts to a mode that works well for heterogeneous clusters. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
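A hypothetical helper for the overridable-default idea might look like the following. The config key is the real one under discussion, but the helper itself and the fallback default of 2 are assumptions, not JobTracker/TaskTracker code:
{{{
import org.apache.hadoop.conf.Configuration;

// Illustration of "jobtracker value as default, tasktracker value wins if set".
public class MaxTasksResolver {
  static int resolveMaxTasks(Configuration jobTrackerConf, Configuration taskTrackerConf) {
    int jobTrackerValue = jobTrackerConf.getInt("mapred.tasktracker.tasks.maximum", 2);
    // If the tasktracker config sets the key, its value is used; otherwise the
    // jobtracker-wide value serves as the default.
    return taskTrackerConf.getInt("mapred.tasktracker.tasks.maximum", jobTrackerValue);
  }
}
}}}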
[jira] Commented: (HADOOP-1245) value for mapred.tasktracker.tasks.maximum taken from jobtracker, not tasktracker
[ https://issues.apache.org/jira/browse/HADOOP-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533881 ] Michael Bieniosek commented on HADOOP-1245: --- Rick, I forgot about that. So my suggestion would be rather useless. value for mapred.tasktracker.tasks.maximum taken from jobtracker, not tasktracker - Key: HADOOP-1245 URL: https://issues.apache.org/jira/browse/HADOOP-1245 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.12.3 Reporter: Michael Bieniosek Assignee: Michael Bieniosek Fix For: 0.16.0 Attachments: tasktracker-max-tasks-1245.patch I want to create a cluster with machines with different numbers of CPUs. Consequently, each machine should have a different value for mapred.tasktracker.tasks.maximum, since my map tasks are CPU bound. When a new job starts up, the jobtracker uses its (single) value for mapred.tasktracker.tasks.maximum to assign tasks. This means that each tasktracker gets the same number of tasks, regardless of how I configured that particular machine. The jobtracker should not consult its config for the value of mapred.tasktracker.tasks.maximum. It should assign tasks (or allow tasktrackers to request tasks) according to each tasktracker's value of mapred.tasktracker.tasks.maximum. Originally, I thought the behavior was slightly different, so this issue contained this text: After the first task finishes on each tasktracker, the tasktracker will request new tasks from the jobtracker according to the tasktracker's value for mapred.tasktracker.tasks.maximum. So after the first round of map tasks is done, the cluster reverts to a mode that works well for heterogeneous clusters. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1245) value for mapred.tasktracker.tasks.maximum taken from jobtracker, not tasktracker
[ https://issues.apache.org/jira/browse/HADOOP-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1245: -- Description: I want to create a cluster with machines with different numbers of CPUs. Consequently, each machine should have a different value for mapred.tasktracker.tasks.maximum, since my map tasks are CPU bound. When a new job starts up, the jobtracker uses its (single) value for mapred.tasktracker.tasks.maximum to assign tasks. This means that each tasktracker gets the same number of tasks, regardless of how I configured that particular machine. The jobtracker should not consult its config for the value of mapred.tasktracker.tasks.maximum. It should assign tasks (or allow tasktrackers to request tasks) according to each tasktracker's value of mapred.tasktracker.tasks.maximum. Originally, I thought the behavior was slightly different, so this issue contained this text: After the first task finishes on each tasktracker, the tasktracker will request new tasks from the jobtracker according to the tasktracker's value for mapred.tasktracker.tasks.maximum. So after the first round of map tasks is done, the cluster reverts to a mode that works well for heterogeneous clusters. was: I want to create a cluster with machines with different numbers of CPUs. Consequently, each machine should have a different value for mapred.tasktracker.tasks.maximum, since my map tasks are CPU bound. However, hadoop uses BOTH the values for mapred.tasktracker.tasks.maximum on the jobtracker and the tasktracker. When a new job starts up, the jobtracker uses its (single) value for mapred.tasktracker.tasks.maximum to assign tasks. This means that each tasktracker gets the same number of tasks, regardless of how I configured that particular machine. After the first task finishes on each tasktracker, the tasktracker will request new tasks from the jobtracker according to the tasktracker's value for mapred.tasktracker.tasks.maximum. So after the first round of map tasks is done, the cluster reverts to a mode that works well for heterogeneous clusters. The jobtracker should not consult its config for the value of mapred.tasktracker.tasks.maximum. It should assign tasks (or allow tasktrackers to request tasks) according to each tasktracker's value of mapred.tasktracker.tasks.maximum. Summary: value for mapred.tasktracker.tasks.maximum taken from jobtracker, not tasktracker (was: value for mapred.tasktracker.tasks.maximum taken from two different sources) Fixing issue description to reflect reality as reported by others value for mapred.tasktracker.tasks.maximum taken from jobtracker, not tasktracker - Key: HADOOP-1245 URL: https://issues.apache.org/jira/browse/HADOOP-1245 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.12.3 Reporter: Michael Bieniosek Attachments: tasktracker-max-tasks-1245.patch I want to create a cluster with machines with different numbers of CPUs. Consequently, each machine should have a different value for mapred.tasktracker.tasks.maximum, since my map tasks are CPU bound. When a new job starts up, the jobtracker uses its (single) value for mapred.tasktracker.tasks.maximum to assign tasks. This means that each tasktracker gets the same number of tasks, regardless of how I configured that particular machine. The jobtracker should not consult its config for the value of mapred.tasktracker.tasks.maximum. 
It should assign tasks (or allow tasktrackers to request tasks) according to each tasktracker's value of mapred.tasktracker.tasks.maximum. Originally, I thought the behavior was slightly different, so this issue contained this text: After the first task finishes on each tasktracker, the tasktracker will request new tasks from the jobtracker according to the tasktracker's value for mapred.tasktracker.tasks.maximum. So after the first round of map tasks is done, the cluster reverts to a mode that works well for heterogeneous clusters. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1245) value for mapred.tasktracker.tasks.maximum taken from two different sources
[ https://issues.apache.org/jira/browse/HADOOP-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533164 ] Michael Bieniosek commented on HADOOP-1245: --- Patch looks reasonable. +1 value for mapred.tasktracker.tasks.maximum taken from two different sources --- Key: HADOOP-1245 URL: https://issues.apache.org/jira/browse/HADOOP-1245 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.12.3 Reporter: Michael Bieniosek Attachments: tasktracker-max-tasks-1245.patch I want to create a cluster with machines with different numbers of CPUs. Consequently, each machine should have a different value for mapred.tasktracker.tasks.maximum, since my map tasks are CPU bound. However, hadoop uses BOTH the values for mapred.tasktracker.tasks.maximum on the jobtracker and the tasktracker. When a new job starts up, the jobtracker uses its (single) value for mapred.tasktracker.tasks.maximum to assign tasks. This means that each tasktracker gets the same number of tasks, regardless of how I configured that particular machine. After the first task finishes on each tasktracker, the tasktracker will request new tasks from the jobtracker according to the tasktracker's value for mapred.tasktracker.tasks.maximum. So after the first round of map tasks is done, the cluster reverts to a mode that works well for heterogeneous clusters. The jobtracker should not consult its config for the value of mapred.tasktracker.tasks.maximum. It should assign tasks (or allow tasktrackers to request tasks) according to each tasktracker's value of mapred.tasktracker.tasks.maximum. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1245) value for mapred.tasktracker.tasks.maximum taken from two different sources
[ https://issues.apache.org/jira/browse/HADOOP-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1245: -- Status: Patch Available (was: Open) See what hudson thinks... value for mapred.tasktracker.tasks.maximum taken from two different sources --- Key: HADOOP-1245 URL: https://issues.apache.org/jira/browse/HADOOP-1245 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.12.3 Reporter: Michael Bieniosek Attachments: tasktracker-max-tasks-1245.patch I want to create a cluster with machines with different numbers of CPUs. Consequently, each machine should have a different value for mapred.tasktracker.tasks.maximum, since my map tasks are CPU bound. However, hadoop uses BOTH the values for mapred.tasktracker.tasks.maximum on the jobtracker and the tasktracker. When a new job starts up, the jobtracker uses its (single) value for mapred.tasktracker.tasks.maximum to assign tasks. This means that each tasktracker gets the same number of tasks, regardless of how I configured that particular machine. After the first task finishes on each tasktracker, the tasktracker will request new tasks from the jobtracker according to the tasktracker's value for mapred.tasktracker.tasks.maximum. So after the first round of map tasks is done, the cluster reverts to a mode that works well for heterogeneous clusters. The jobtracker should not consult its config for the value of mapred.tasktracker.tasks.maximum. It should assign tasks (or allow tasktrackers to request tasks) according to each tasktracker's value of mapred.tasktracker.tasks.maximum. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2001) Deadlock in jobtracker
[ https://issues.apache.org/jira/browse/HADOOP-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532930 ] Michael Bieniosek commented on HADOOP-2001: --- That seems wrong (though, like I say, I don't really know this code). If you have to hold the jobtracker lock in order to hold the per-job lock, then why bother acquiring the per-job lock at all? Deadlock in jobtracker -- Key: HADOOP-2001 URL: https://issues.apache.org/jira/browse/HADOOP-2001 Project: Hadoop Issue Type: Bug Affects Versions: 0.14.0 Reporter: Michael Bieniosek Assignee: Devaraj Das Priority: Blocker Fix For: 0.15.0 Attachments: 2001.patch, 2001.patch My jobtracker deadlocked; the output from kill -QUIT is: Found one Java-level deadlock: = IPC Server handler 2 on 10001: waiting to lock monitor 0x0813724c (object 0xd5175488, a org.apache.hadoop.mapred.JobInProgress), which is held by SocketListener0-1 SocketListener0-1: waiting to lock monitor 0x081146d4 (object 0xd24d9c50, a org.apache.hadoop.mapred.JobTracker), which is held by IPC Server handler 2 on 10001 Java stack information for the threads listed above: === IPC Server handler 2 on 10001: at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:367) - waiting to lock 0xd5175488 (a org.apache.hadoop.mapred.JobInProgress) at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:1719) at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:1240) - locked 0xd24d9c50 (a org.apache.hadoop.mapred.JobTracker) at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1116) - locked 0xd24d9c50 (a org.apache.hadoop.mapred.JobTracker) at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566) SocketListener0-1: at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:907) - waiting to lock 0xd24d9c50 (a org.apache.hadoop.mapred.JobTracker) at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:1059) - locked 0xd5175488 (a org.apache.hadoop.mapred.JobInProgress) at org.apache.hadoop.mapred.JobInProgress.kill(JobInProgress.java:891) - locked 0xd5175488 (a org.apache.hadoop.mapred.JobInProgress) at org.apache.hadoop.mapred.jobdetails_jsp._jspService(jobdetails_jsp.java:158) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:94) at javax.servlet.http.HttpServlet.service(HttpServlet.java:802) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427) at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567) at org.mortbay.http.HttpContext.handle(HttpContext.java:1565) at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635) at org.mortbay.http.HttpContext.handle(HttpContext.java:1517) at org.mortbay.http.HttpServer.service(HttpServer.java:954) at org.mortbay.http.HttpConnection.service(HttpConnection.java:814) at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981) at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831) at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244) at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357) at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534) Found 
1 deadlock. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
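The two stacks above show a classic inverted acquisition order: the heartbeat path takes the JobTracker lock and then wants the JobInProgress lock, while the kill path takes them in the reverse order. A toy illustration of the fix-by-consistent-ordering idea follows; none of this is actual Hadoop code:
{{{
// Toy illustration of avoiding deadlock by fixing a single lock order.
public class LockOrderSketch {
  private final Object trackerLock = new Object(); // stands in for JobTracker
  private final Object jobLock = new Object();     // stands in for JobInProgress

  void heartbeat() {
    synchronized (trackerLock) {   // tracker lock first...
      synchronized (jobLock) {     // ...then the per-job lock
        // update task statuses
      }
    }
  }

  void kill() {
    synchronized (trackerLock) {   // same order as heartbeat(), instead of the
      synchronized (jobLock) {     // job-then-tracker order shown in the trace
        // garbageCollect / finalizeJob work
      }
    }
  }
}
}}}
Because both paths now acquire trackerLock before jobLock, neither thread can hold the lock the other is waiting for.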
[jira] Created: (HADOOP-2001) Deadlock in jobtracker
Deadlock in jobtracker -- Key: HADOOP-2001 URL: https://issues.apache.org/jira/browse/HADOOP-2001 Project: Hadoop Issue Type: Bug Affects Versions: 0.14.0 Reporter: Michael Bieniosek Priority: Critical My jobtracker deadlocked; the output from kill -QUIT is: Found one Java-level deadlock: = IPC Server handler 2 on 10001: waiting to lock monitor 0x0813724c (object 0xd5175488, a org.apache.hadoop.mapred.JobInProgress), which is held by SocketListener0-1 SocketListener0-1: waiting to lock monitor 0x081146d4 (object 0xd24d9c50, a org.apache.hadoop.mapred.JobTracker), which is held by IPC Server handler 2 on 10001 Java stack information for the threads listed above: === IPC Server handler 2 on 10001: at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:367) - waiting to lock 0xd5175488 (a org.apache.hadoop.mapred.JobInProgress) at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:1719) at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:1240) - locked 0xd24d9c50 (a org.apache.hadoop.mapred.JobTracker) at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1116) - locked 0xd24d9c50 (a org.apache.hadoop.mapred.JobTracker) at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566) SocketListener0-1: at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:907) - waiting to lock 0xd24d9c50 (a org.apache.hadoop.mapred.JobTracker) at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:1059) - locked 0xd5175488 (a org.apache.hadoop.mapred.JobInProgress) at org.apache.hadoop.mapred.JobInProgress.kill(JobInProgress.java:891) - locked 0xd5175488 (a org.apache.hadoop.mapred.JobInProgress) at org.apache.hadoop.mapred.jobdetails_jsp._jspService(jobdetails_jsp.java:158) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:94) at javax.servlet.http.HttpServlet.service(HttpServlet.java:802) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427) at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567) at org.mortbay.http.HttpContext.handle(HttpContext.java:1565) at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635) at org.mortbay.http.HttpContext.handle(HttpContext.java:1517) at org.mortbay.http.HttpServer.service(HttpServer.java:954) at org.mortbay.http.HttpConnection.service(HttpConnection.java:814) at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981) at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831) at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244) at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357) at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534) Found 1 deadlock. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2001) Deadlock in jobtracker
[ https://issues.apache.org/jira/browse/HADOOP-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532798 ] Michael Bieniosek commented on HADOOP-2001: --- I could submit a quick fix patch that unmarks JobTracker.finalizeJob synchronized, but I don't really know if that would break other things, or if it could miss other deadlock paths. Anybody else know more about this code? Deadlock in jobtracker -- Key: HADOOP-2001 URL: https://issues.apache.org/jira/browse/HADOOP-2001 Project: Hadoop Issue Type: Bug Affects Versions: 0.14.0 Reporter: Michael Bieniosek Priority: Critical My jobtracker deadlocked; the output from kill -QUIT is: Found one Java-level deadlock: = IPC Server handler 2 on 10001: waiting to lock monitor 0x0813724c (object 0xd5175488, a org.apache.hadoop.mapred.JobInProgress), which is held by SocketListener0-1 SocketListener0-1: waiting to lock monitor 0x081146d4 (object 0xd24d9c50, a org.apache.hadoop.mapred.JobTracker), which is held by IPC Server handler 2 on 10001 Java stack information for the threads listed above: === IPC Server handler 2 on 10001: at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:367) - waiting to lock 0xd5175488 (a org.apache.hadoop.mapred.JobInProgress) at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:1719) at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:1240) - locked 0xd24d9c50 (a org.apache.hadoop.mapred.JobTracker) at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1116) - locked 0xd24d9c50 (a org.apache.hadoop.mapred.JobTracker) at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566) SocketListener0-1: at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:907) - waiting to lock 0xd24d9c50 (a org.apache.hadoop.mapred.JobTracker) at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:1059) - locked 0xd5175488 (a org.apache.hadoop.mapred.JobInProgress) at org.apache.hadoop.mapred.JobInProgress.kill(JobInProgress.java:891) - locked 0xd5175488 (a org.apache.hadoop.mapred.JobInProgress) at org.apache.hadoop.mapred.jobdetails_jsp._jspService(jobdetails_jsp.java:158) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:94) at javax.servlet.http.HttpServlet.service(HttpServlet.java:802) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427) at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567) at org.mortbay.http.HttpContext.handle(HttpContext.java:1565) at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635) at org.mortbay.http.HttpContext.handle(HttpContext.java:1517) at org.mortbay.http.HttpServer.service(HttpServer.java:954) at org.mortbay.http.HttpConnection.service(HttpConnection.java:814) at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981) at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831) at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244) at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357) at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534) Found 1 deadlock. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
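The thread dump above shows a textbook lock-ordering inversion: the heartbeat path locks the JobTracker monitor and then tries to lock a JobInProgress, while the web-UI kill path locks the JobInProgress and then tries to lock the JobTracker. Below is a minimal, self-contained sketch of that pattern -- the two monitor objects and method bodies are stand-ins chosen to mirror the report, not the actual Hadoop source. Removing the synchronized modifier from finalizeJob, as the comment proposes, would break this particular cycle by dropping the second lock acquisition on the kill path, though (as the comment also notes) it might miss other deadlock paths.

// A minimal sketch of the deadlock pattern, not Hadoop source: two plain
// monitor objects stand in for the JobTracker and JobInProgress instances.
public class DeadlockSketch {
    static final Object JOB_TRACKER = new Object();
    static final Object JOB_IN_PROGRESS = new Object();

    // Heartbeat path: JobTracker.heartbeat (locks the tracker) eventually
    // calls JobInProgress.updateTaskStatus (locks the job).
    static void heartbeatPath() {
        synchronized (JOB_TRACKER) {
            synchronized (JOB_IN_PROGRESS) {
                // update task statuses
            }
        }
    }

    // Kill path: JobInProgress.kill (locks the job) eventually calls
    // JobTracker.finalizeJob (locks the tracker) -- the opposite order.
    static void killPath() {
        synchronized (JOB_IN_PROGRESS) {
            synchronized (JOB_TRACKER) {
                // finalize job
            }
        }
    }

    public static void main(String[] args) {
        // Run both paths concurrently; with unlucky timing each thread
        // acquires its first lock and then blocks forever on the second.
        new Thread(DeadlockSketch::heartbeatPath).start();
        new Thread(DeadlockSketch::killPath).start();
    }
}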
[jira] Updated: (HADOOP-2001) Deadlock in jobtracker
[ https://issues.apache.org/jira/browse/HADOOP-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2001: -- Fix Version/s: 0.15.0 It would be nice to get this fixed for 0.15.0, since it does deadlock the jobtracker. I can submit the patch I described above if someone thinks that's a good idea.

Deadlock in jobtracker -- Key: HADOOP-2001 URL: https://issues.apache.org/jira/browse/HADOOP-2001 Project: Hadoop Issue Type: Bug Affects Versions: 0.14.0 Reporter: Michael Bieniosek Priority: Critical Fix For: 0.15.0

My jobtracker deadlocked; the output from kill -QUIT is:

Found one Java-level deadlock:
==============================
IPC Server handler 2 on 10001:
  waiting to lock monitor 0x0813724c (object 0xd5175488, a org.apache.hadoop.mapred.JobInProgress),
  which is held by SocketListener0-1
SocketListener0-1:
  waiting to lock monitor 0x081146d4 (object 0xd24d9c50, a org.apache.hadoop.mapred.JobTracker),
  which is held by IPC Server handler 2 on 10001

Java stack information for the threads listed above:
=====================================================
IPC Server handler 2 on 10001:
  at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:367)
  - waiting to lock <0xd5175488> (a org.apache.hadoop.mapred.JobInProgress)
  at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:1719)
  at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:1240)
  - locked <0xd24d9c50> (a org.apache.hadoop.mapred.JobTracker)
  at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1116)
  - locked <0xd24d9c50> (a org.apache.hadoop.mapred.JobTracker)
  at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
  at java.lang.reflect.Method.invoke(Unknown Source)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566)
SocketListener0-1:
  at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:907)
  - waiting to lock <0xd24d9c50> (a org.apache.hadoop.mapred.JobTracker)
  at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:1059)
  - locked <0xd5175488> (a org.apache.hadoop.mapred.JobInProgress)
  at org.apache.hadoop.mapred.JobInProgress.kill(JobInProgress.java:891)
  - locked <0xd5175488> (a org.apache.hadoop.mapred.JobInProgress)
  at org.apache.hadoop.mapred.jobdetails_jsp._jspService(jobdetails_jsp.java:158)
  at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:94)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
  at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
  at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
  at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
  at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
  at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
  at org.mortbay.http.HttpServer.service(HttpServer.java:954)
  at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
  at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
  at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
  at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
  at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
  at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

Found 1 deadlock.

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HADOOP-1319) NPE in TaskLog.getTaskLogDir, as called from tasklog.jsp
[ https://issues.apache.org/jira/browse/HADOOP-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek resolved HADOOP-1319. --- Resolution: Invalid I believe this code has been rewritten, so this bug is no longer valid.

NPE in TaskLog.getTaskLogDir, as called from tasklog.jsp Key: HADOOP-1319 URL: https://issues.apache.org/jira/browse/HADOOP-1319 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.12.3 Reporter: Michael Bieniosek Priority: Minor

Calling TaskCompletionEvent.getTaskTrackerHttp() gives me a URL that looks like http://tasktracker.host:50060/tasklog.jsp?plaintext=true&taskid=task_0264_m_83_0&all=true. If I try to access that URL, I get a Jetty 500 error. In my tasktracker logs, I then see:

2007-05-02 21:32:16,107 WARN /: /tasklog.jsp?taskid=task_0261_m_00_0&all=true&plaintext=true: java.lang.NullPointerException
  at org.apache.hadoop.mapred.TaskLog.getTaskLogDir(TaskLog.java:49)
  at org.apache.hadoop.mapred.TaskLog.access$000(TaskLog.java:33)
  at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:313)
  at org.apache.hadoop.mapred.tasklog_jsp.printTaskLog(tasklog_jsp.java:26)
  at org.apache.hadoop.mapred.tasklog_jsp._jspService(tasklog_jsp.java:232)
  at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:94)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
  at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
  at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
  at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
  at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
  at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
  at org.mortbay.http.HttpServer.service(HttpServer.java:954)
  at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
  at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
  at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
  at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
  at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
  at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
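The resolution stands on its own, but the failure mode, a request parameter going missing and surfacing later as a NullPointerException deep in TaskLog, suggests a simple guard at the servlet boundary. A hedged sketch follows; the servlet class and error message here are hypothetical, not the rewritten Hadoop code.

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical guard: validate the taskid parameter before touching the
// log machinery, so a mangled query string yields a clear 400 instead of
// an NPE that Jetty reports as a 500.
public class TaskLogGuardServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        String taskId = request.getParameter("taskid");
        if (taskId == null || taskId.length() == 0) {
            response.sendError(HttpServletResponse.SC_BAD_REQUEST,
                               "Missing required parameter: taskid");
            return;
        }
        // ... look up and stream the task log for taskId ...
    }
}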
[jira] Created: (HADOOP-1825) hadoop-daemon.sh script fails if HADOOP_PID_DIR doesn't exist
hadoop-daemon.sh script fails if HADOOP_PID_DIR doesn't exist - Key: HADOOP-1825 URL: https://issues.apache.org/jira/browse/HADOOP-1825 Project: Hadoop Issue Type: Bug Components: scripts Affects Versions: 0.14.0 Reporter: Michael Bieniosek Priority: Minor If I try to bring up a datanode on a fresh machine, it will fail with this error message: starting datanode, logging to /b/hadoop/logs/hadoop-me-datanode-example.com.out /p/share/hadoop/bin/hadoop-daemon.sh: line 99: /b/hadoop/pid/hadoop-me-datanode.pid: No such file or directory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1825) hadoop-daemon.sh script fails if HADOOP_PID_DIR doesn't exist
[ https://issues.apache.org/jira/browse/HADOOP-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1825: -- Status: Patch Available (was: Open) Here's a patch that automatically creates the pid directory if it doesn't exist. hadoop-daemon.sh script fails if HADOOP_PID_DIR doesn't exist - Key: HADOOP-1825 URL: https://issues.apache.org/jira/browse/HADOOP-1825 Project: Hadoop Issue Type: Bug Components: scripts Affects Versions: 0.14.0 Reporter: Michael Bieniosek Priority: Minor Attachments: hadoop-1825.patch If I try to bring up a datanode on a fresh machine, it will fail with this error message: starting datanode, logging to /b/hadoop/logs/hadoop-me-datanode-example.com.out /p/share/hadoop/bin/hadoop-daemon.sh: line 99: /b/hadoop/pid/hadoop-me-datanode.pid: No such file or directory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1825) hadoop-daemon.sh script fails if HADOOP_PID_DIR doesn't exist
[ https://issues.apache.org/jira/browse/HADOOP-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1825: -- Attachment: hadoop-1825.patch hadoop-daemon.sh script fails if HADOOP_PID_DIR doesn't exist - Key: HADOOP-1825 URL: https://issues.apache.org/jira/browse/HADOOP-1825 Project: Hadoop Issue Type: Bug Components: scripts Affects Versions: 0.14.0 Reporter: Michael Bieniosek Priority: Minor Attachments: hadoop-1825.patch If I try to bring up a datanode on a fresh machine, it will fail with this error message: starting datanode, logging to /b/hadoop/logs/hadoop-me-datanode-example.com.out /p/share/hadoop/bin/hadoop-daemon.sh: line 99: /b/hadoop/pid/hadoop-me-datanode.pid: No such file or directory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1781) Need more complete API of JobClient class
[ https://issues.apache.org/jira/browse/HADOOP-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522657 ] Michael Bieniosek commented on HADOOP-1781: --- Most of these are available currently. You need to call client.getJob(jobid) to get a RunningJob, and the RunningJob object then provides your desired APIs 2, 3, 4, and 5. This works for completed jobs as well as running jobs.

Need more complete API of JobClient class - Key: HADOOP-1781 URL: https://issues.apache.org/jira/browse/HADOOP-1781 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Runping Qi

We need a programmatic way to find out information about a map/reduce cluster and the jobs on the cluster. The current API is not complete. In particular, the following API functions are needed:
1. jobs() -- currently, there is an API function JobsToComplete, which returns running/waiting jobs only; jobs() should return the complete list.
2. TaskReport[] getMap/ReduceTaskReports(String jobid)
3. getStartTime()
4. getJobStatus(String jobid)
5. getJobProfile(String jobid)

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
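Concretely, the usage the comment describes looks roughly like the following, assuming the 0.15-era org.apache.hadoop.mapred API (JobClient.getJob returning a RunningJob); the class name and the printed fields are illustrative choices, not part of the issue.

import java.io.IOException;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

// Probe a job by id through JobClient; per the comment above, getJob works
// for completed jobs as well as running ones.
public class JobStatusProbe {
    public static void main(String[] args) throws IOException {
        JobClient client = new JobClient(new JobConf());
        RunningJob job = client.getJob(args[0]);
        if (job != null) {
            System.out.println("map progress:    " + job.mapProgress());
            System.out.println("reduce progress: " + job.reduceProgress());
            System.out.println("complete:        " + job.isComplete());
        }
    }
}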
[jira] Created: (HADOOP-1770) Illegal state exception in printTaskLog - sendError
Illegal state exception in printTaskLog - sendError Key: HADOOP-1770 URL: https://issues.apache.org/jira/browse/HADOOP-1770 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.14.0 Reporter: Michael Bieniosek

This error shows up in my logs:

2007-08-23 16:40:08,028 WARN /: /tasklog?taskid=task_200708212126_0043_m_000100_0&all=true: java.lang.IllegalStateException: Committed
  at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212)
  at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375)
  at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:61)
  at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:125)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
  at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
  at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
  at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
  at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
  at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
  at org.mortbay.http.HttpServer.service(HttpServer.java:954)
  at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
  at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
  at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
  at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
  at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
  at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
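The IllegalStateException: Committed arises because sendError has to reset the response buffer, which is impossible once part of the response has already been flushed to the client. A common guard, sketched here as a hypothetical helper rather than the eventual HADOOP-1770 fix, is to check isCommitted() before calling sendError:

import java.io.IOException;
import javax.servlet.http.HttpServletResponse;

public final class ResponseErrors {
    private ResponseErrors() {}

    // sendError resets the response buffer, which throws
    // IllegalStateException once any output has reached the client.
    // Checking isCommitted() first avoids that secondary exception.
    public static void sendErrorIfPossible(HttpServletResponse response,
                                           int status, String message)
            throws IOException {
        if (!response.isCommitted()) {
            response.sendError(status, message);
        }
        // If the response is already committed, the best we can do is
        // log the failure; the partial page has already been sent.
    }
}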
[jira] Updated: (HADOOP-1745) userlogs not showing up for new jobs
[ https://issues.apache.org/jira/browse/HADOOP-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1745: -- Component/s: mapred userlogs not showing up for new jobs Key: HADOOP-1745 URL: https://issues.apache.org/jira/browse/HADOOP-1745 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.14.0 Reporter: Michael Bieniosek When I start a new hadoop job, the logs do not show up for a while. If I check on the filesystem, the file userlogs/$task/stdout is a regular file with size 0. This was supposed to be fixed in 0.14 by HADOOP-1524. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1245) value for mapred.tasktracker.tasks.maximum taken from two different sources
[ https://issues.apache.org/jira/browse/HADOOP-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1245: -- Description: I want to create a cluster with machines with different numbers of CPUs. Consequently, each machine should have a different value for mapred.tasktracker.tasks.maximum, since my map tasks are CPU bound. However, hadoop uses BOTH the values for mapred.tasktracker.tasks.maximum on the jobtracker and the tasktracker. When a new job starts up, the jobtracker uses its (single) value for mapred.tasktracker.tasks.maximum to assign tasks. This means that each tasktracker gets the same number of tasks, regardless of how I configured that particular machine. After the first task finishes on each tasktracker, the tasktracker will request new tasks from the jobtracker according to the tasktracker's value for mapred.tasktracker.tasks.maximum. So after the first round of map tasks is done, the cluster reverts to a mode that works well for heterogeneous clusters. The jobtracker should not consult its config for the value of mapred.tasktracker.tasks.maximum. It should assign tasks (or allow tasktrackers to request tasks) according to each tasktracker's value of mapred.tasktracker.tasks.maximum. was: When I start a job, hadoop uses mapred.tasktracker.tasks.maximum on the jobtracker. Once these tasks finish, it is the tasktracker's value of mapred.tasktracker.tasks.maximum that decides how many new tasks are created for each host. This would probably be fixed if HADOOP-785 were implemented. value for mapred.tasktracker.tasks.maximum taken from two different sources --- Key: HADOOP-1245 URL: https://issues.apache.org/jira/browse/HADOOP-1245 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.12.3 Reporter: Michael Bieniosek I want to create a cluster with machines with different numbers of CPUs. Consequently, each machine should have a different value for mapred.tasktracker.tasks.maximum, since my map tasks are CPU bound. However, hadoop uses BOTH the values for mapred.tasktracker.tasks.maximum on the jobtracker and the tasktracker. When a new job starts up, the jobtracker uses its (single) value for mapred.tasktracker.tasks.maximum to assign tasks. This means that each tasktracker gets the same number of tasks, regardless of how I configured that particular machine. After the first task finishes on each tasktracker, the tasktracker will request new tasks from the jobtracker according to the tasktracker's value for mapred.tasktracker.tasks.maximum. So after the first round of map tasks is done, the cluster reverts to a mode that works well for heterogeneous clusters. The jobtracker should not consult its config for the value of mapred.tasktracker.tasks.maximum. It should assign tasks (or allow tasktrackers to request tasks) according to each tasktracker's value of mapred.tasktracker.tasks.maximum. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
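In code terms, the problem is that both daemons resolve the same key against their own local configuration, so the jobtracker's copy silently wins for the first wave of tasks. A hypothetical snippet (not the actual JobTracker/TaskTracker source) showing the read that, per the report, should only ever happen on the tasktracker side:

import org.apache.hadoop.conf.Configuration;

// Illustration of the mismatch: the same lookup returns different values
// depending on whose local config files are on the classpath.
public class TaskSlotConfig {
    public static int maxTasks(Configuration conf) {
        // On the jobtracker this returns the jobtracker's local value;
        // on a tasktracker it returns that machine's value. Only the
        // tasktracker's value reflects the machine's actual CPU count.
        return conf.getInt("mapred.tasktracker.tasks.maximum", 2);
    }
}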
[jira] Commented: (HADOOP-416) Web UI JSP: need to HTML-Escape log file contents
[ https://issues.apache.org/jira/browse/HADOOP-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_1251 ] Michael Bieniosek commented on HADOOP-416: -- I've noticed that occasionally snippets of web pages make it to the log pages. This could potentially be a security problem, so we should fix this. I don't think <pre> is a great solution, since there could be a </pre> in the text. It's probably better to escape < and >, or to set the content-type to text/plain.

Web UI JSP: need to HTML-Escape log file contents - Key: HADOOP-416 URL: https://issues.apache.org/jira/browse/HADOOP-416 Project: Hadoop Issue Type: Bug Components: mapred Reporter: Michel Tourn Assignee: Owen O'Malley

Web UI JSP: need to HTML-escape log (file) contents. When displaying the task's error log or the mapred.Reporter status String, the content should have all < and > converted to &lt; and &gt;, or use a <pre> tag. Otherwise, any HTML/XML tags within will not be displayed correctly. This problem occurs, for example, when using hadoopStreaming and a MapRed record is a chunk of HTML/XML content (and a task fails). Example of a problematic view: http://jobtracker:50030/taskdetails.jsp?jobid=job_0009&taskid=tip_0009_m_00 Other jsp pages may also need a change.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
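A minimal escaping helper of the kind the comment suggests might look like the following; this is a sketch, not the patch that eventually addressed HADOOP-416. Setting the Content-Type to text/plain, the other option mentioned, sidesteps escaping entirely for raw log output.

// Hypothetical helper: escape the three characters that can change how a
// browser parses the page. Order matters: '&' must be replaced first, or
// the entities produced by the later replacements would be re-escaped.
public final class HtmlEscape {
    private HtmlEscape() {}

    public static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }
}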
[jira] Commented: (HADOOP-785) Divide the server and client configurations
[ https://issues.apache.org/jira/browse/HADOOP-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12517343 ] Michael Bieniosek commented on HADOOP-785: -- Arun, Your proposal sounds reasonable. Thanks for looking at this issue. Currently, hadoop-default.xml is not supposed to be changed by users. Would you relax this convention in your proposal? There might be a few variables that I'd like to set for client and server at the same time (e.g. the namenode address). Why don't you want to split up namenode vs. jobtracker and datanode vs. tasktracker? I understand that it's desirable to keep things simple, but dfs and mapreduce don't interact very much in terms of their configs, so there is a natural separation. Instead of dividing configs into beginner and advanced, we should think about dividing them into things you probably need to change (at the top of the file) and things you probably don't need to change (at the bottom of the file). This division could be done with XML comments -- I don't think it needs to be so formal as to need a new field.

Divide the server and client configurations --- Key: HADOOP-785 URL: https://issues.apache.org/jira/browse/HADOOP-785 Project: Hadoop Issue Type: Improvement Components: conf Affects Versions: 0.9.0 Reporter: Owen O'Malley Assignee: Arun C Murthy Fix For: 0.15.0

The configuration system is easy to misconfigure, and I think we need to strongly divide the server from client configs. An example of the problem was a configuration where the task tracker had a hadoop-site.xml that set mapred.reduce.tasks to 1. Therefore, the job tracker had the right number of reduces, but the map task thought there was a single reduce. This led to a hard-to-diagnose failure. Therefore, I propose separating out the configuration types as:

class Configuration;                        // reads site-default.xml, hadoop-default.xml
class ServerConf extends Configuration;     // reads hadoop-server.xml, $super
class DfsServerConf extends ServerConf;     // reads dfs-server.xml, $super
class MapRedServerConf extends ServerConf;  // reads mapred-server.xml, $super
class ClientConf extends Configuration;     // reads hadoop-client.xml, $super
class JobConf extends ClientConf;           // reads job.xml, $super

Note in particular that nothing corresponds to hadoop-site.xml, which overrides both client and server configs. Furthermore, the properties from the *-default.xml files should never be saved into the job.xml.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
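As a rough illustration of how the proposed hierarchy could layer its resources, here is a sketch built on Configuration's resource-loading mechanism. Treat the exact loading call as an assumption: addResource is the name in later Hadoop releases, while 0.15-era code spelled similar calls addDefaultResource; the class names follow the proposal, and the wiring is illustrative, not the implementation.

import org.apache.hadoop.conf.Configuration;

// Sketch of the proposed layering: each subclass adds its own resource on
// top of what the parent loaded, so later files override earlier ones,
// mirroring the "reads hadoop-server.xml, $super" notation above.
public class ServerConf extends Configuration {
    public ServerConf() {
        addResource("hadoop-server.xml");
    }
}

class DfsServerConf extends ServerConf {
    public DfsServerConf() {
        addResource("dfs-server.xml");
    }
}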
[jira] Commented: (HADOOP-1638) Master node unable to bind to DNS hostname
[ https://issues.apache.org/jira/browse/HADOOP-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514374 ] Michael Bieniosek commented on HADOOP-1638: --- I abandoned HADOOP-1202 because people didn't seem to see any value in it, and I changed the way I use hadoop on ec2 around the same time. You're welcome to pick it up and port the patch to trunk; it shouldn't be too much work. -Michael

Master node unable to bind to DNS hostname -- Key: HADOOP-1638 URL: https://issues.apache.org/jira/browse/HADOOP-1638 Project: Hadoop Issue Type: Bug Components: contrib/ec2 Affects Versions: 0.13.0, 0.13.1, 0.14.0, 0.15.0 Reporter: Stu Hood Priority: Minor Fix For: 0.13.1, 0.14.0, 0.15.0 Attachments: hadoop-1638.patch

With a release package of Hadoop 0.13.0 or with latest SVN, the Hadoop contrib/ec2 scripts fail to start Hadoop correctly. After working around issues HADOOP-1634 and HADOOP-1635, and setting up a DynDNS address pointing to the master's IP, the ec2/bin/start-hadoop script completes. But the cluster is unusable because the namenode and tasktracker have not started successfully. Looking at the namenode log on the master reveals the following error:

{quote}
2007-07-19 16:54:53,156 ERROR org.apache.hadoop.dfs.NameNode: java.net.BindException: Cannot assign requested address
  at sun.nio.ch.Net.bind(Native Method)
  at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
  at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
  at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:186)
  at org.apache.hadoop.ipc.Server.<init>(Server.java:631)
  at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:325)
  at org.apache.hadoop.ipc.RPC.getServer(RPC.java:295)
  at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
  at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:211)
  at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:803)
  at org.apache.hadoop.dfs.NameNode.main(NameNode.java:811)
{quote}

The master node refuses to bind to the DynDNS hostname in the generated hadoop-site.xml. Here is the relevant part of the generated file:

{quote}
<property>
  <name>fs.default.name</name>
  <value>blah-ec2.gotdns.org:50001</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>blah-ec2.gotdns.org:50002</value>
</property>
{quote}

I'll attach a patch against hadoop-trunk that fixes the issue for me, but I'm not sure if this issue is something that someone can fix more thoroughly.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-1636) constant should be user-configurable: MAX_COMPLETE_USER_JOBS_IN_MEMORY
constant should be user-configurable: MAX_COMPLETE_USER_JOBS_IN_MEMORY -- Key: HADOOP-1636 URL: https://issues.apache.org/jira/browse/HADOOP-1636 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.13.0 Reporter: Michael Bieniosek In JobTracker.java: static final int MAX_COMPLETE_USER_JOBS_IN_MEMORY = 100; This should be configurable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1636) constant should be user-configurable: MAX_COMPLETE_USER_JOBS_IN_MEMORY
[ https://issues.apache.org/jira/browse/HADOOP-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1636: -- Status: Patch Available (was: Open) constant should be user-configurable: MAX_COMPLETE_USER_JOBS_IN_MEMORY -- Key: HADOOP-1636 URL: https://issues.apache.org/jira/browse/HADOOP-1636 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.13.0 Reporter: Michael Bieniosek In JobTracker.java: static final int MAX_COMPLETE_USER_JOBS_IN_MEMORY = 100; This should be configurable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1636) constant should be user-configurable: MAX_COMPLETE_USER_JOBS_IN_MEMORY
[ https://issues.apache.org/jira/browse/HADOOP-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1636: -- Attachment: configure-max-completed-jobs.patch This patch creates a new configurable variable mapred.jobtracker.completeuserjobs.maximum, which defaults to 100 (the current hard-coded value). When this many jobs are completed (failed or succeeded), hadoop deletes finished jobs from memory, making them accessible only through the information-poor jobhistory page. This limit is supposedly per user, but I submit all jobs as the same user. I have tested this patch, and it seems to work. constant should be user-configurable: MAX_COMPLETE_USER_JOBS_IN_MEMORY -- Key: HADOOP-1636 URL: https://issues.apache.org/jira/browse/HADOOP-1636 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.13.0 Reporter: Michael Bieniosek Attachments: configure-max-completed-jobs.patch In JobTracker.java: static final int MAX_COMPLETE_USER_JOBS_IN_MEMORY = 100; This should be configurable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
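In outline, the change amounts to replacing the hard-coded constant with a lookup like the following. The wrapper class here is hypothetical, but the key name and default come from the patch description above:

import org.apache.hadoop.conf.Configuration;

// Sketch of the configurable replacement for the hard-coded
// MAX_COMPLETE_USER_JOBS_IN_MEMORY constant in JobTracker.java.
public class CompletedJobsLimit {
    public static int maxCompleteUserJobsInMemory(Configuration conf) {
        // Defaults to 100, the previously hard-coded value.
        return conf.getInt("mapred.jobtracker.completeuserjobs.maximum", 100);
    }
}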
[jira] Issue Comment Edited: (HADOOP-1636) constant should be user-configurable: MAX_COMPLETE_USER_JOBS_IN_MEMORY
[ https://issues.apache.org/jira/browse/HADOOP-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514076 ] Michael Bieniosek edited comment on HADOOP-1636 at 7/19/07 7:02 PM: This patch creates a new configurable variable mapred.jobtracker.completeuserjobs.maximum, which defaults to 100 (the current hard-coded value). When this many jobs are completed (failed or succeeded), hadoop deletes finished jobs from memory, making them accessible only through the information-poor jobhistory page. This limit is supposedly per user, but I submit all jobs as the same user. (this is the current behavior, which is unchanged by my patch) I have tested this patch, and it seems to work. was: This patch creates a new configurable variable mapred.jobtracker.completeuserjobs.maximum, which defaults to 100 (the current hard-coded value). When this many jobs are completed (failed or succeeded), hadoop deletes finished jobs from memory, making them accessible only through the information-poor jobhistory page. This limit is supposedly per user, but I submit all jobs as the same user. I have tested this patch, and it seems to work. constant should be user-configurable: MAX_COMPLETE_USER_JOBS_IN_MEMORY -- Key: HADOOP-1636 URL: https://issues.apache.org/jira/browse/HADOOP-1636 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.13.0 Reporter: Michael Bieniosek Attachments: configure-max-completed-jobs.patch In JobTracker.java: static final int MAX_COMPLETE_USER_JOBS_IN_MEMORY = 100; This should be configurable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.