[jira] Commented: (HADOOP-2572) TaskLogServlet returns 410 when trying to access log early in task life
[ https://issues.apache.org/jira/browse/HADOOP-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560192#action_12560192 ] Michael Bieniosek commented on HADOOP-2572: --- This should be re-written to test if the file exists rather than catching the exception. Exceptions should be saved for unexpected problems. This would make the patch much more complicated, because the code that synthesizes the filename is buried in the TaskLog.Reader constructor. TaskLogServlet returns 410 when trying to access log early in task life --- Key: HADOOP-2572 URL: https://issues.apache.org/jira/browse/HADOOP-2572 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.15.0 Reporter: Michael Bieniosek Fix For: 0.16.0 Attachments: hadoop-2572.patch Early in a map task life, or for tasks that died quickly, the file $task/syslog might not exist. In this case, the TaskLogServlet gives a status 410. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
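For context, the existence-check shape being suggested would look roughly like this (a hypothetical sketch, not the attached patch; it assumes the filename synthesis is lifted out of the TaskLog.Reader constructor into an accessor the servlet can call):

{code}
// Sketch only; signatures assumed. Presumes a helper that exposes the
// synthesized log path, which today is buried in the TaskLog.Reader constructor.
java.io.File logFile = TaskLog.getTaskLogFile(taskId, filter);
if (logFile.exists()) {
  printTaskLog(response, out, taskId, start, end, filter); // read and emit the log
} else {
  out.write("Log not yet available".getBytes()); // no exception needed
}
{code}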
[jira] Created: (HADOOP-2612) Mysterious ArrayOutOfBoundsException in HTable.commit
Mysterious ArrayOutOfBoundsException in HTable.commit - Key: HADOOP-2612 URL: https://issues.apache.org/jira/browse/HADOOP-2612 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Michael Bieniosek I got this exception using a post-0.15.0 hbase trunk:

Caused by: java.io.IOException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
 at java.lang.reflect.Constructor.newInstance(Unknown Source)
 at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
 at org.apache.hadoop.hbase.HTable.commit(HTable.java:904)
 at org.apache.hadoop.hbase.HTable.commit(HTable.java:875)
 at xxx.PutHbase$HbaseUploader.writeHbaseNoRetry(PutHbase.java:107)

Where writeHbaseNoRetry looks like:

private void writeHbaseNoRetry(HTable table, String column, String row, File contents) throws IOException {
  long lockid = table.startUpdate(new Text(row));
  try {
    table.put(lockid, new Text(column), FileUtil.readFile(contents));
    table.commit(lockid);
  } finally {
    table.abort(lockid);
  }
}

I found this in my error logs -- it is rare, and I am not sure how to reproduce it. Contents could be 1kb-100kb long. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
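Worth noting: the finally block above calls table.abort(lockid) even after a successful commit. Whether or not that is related to this exception, a defensive variant that only aborts when the commit did not complete might look like this (a sketch reusing the HTable calls shown above, not a confirmed fix):

{code}
private void writeHbaseNoRetry(HTable table, String column, String row, File contents)
    throws IOException {
  long lockid = table.startUpdate(new Text(row));
  boolean committed = false;
  try {
    table.put(lockid, new Text(column), FileUtil.readFile(contents));
    table.commit(lockid);
    committed = true;
  } finally {
    if (!committed) {
      table.abort(lockid); // only abort a lock that was never committed
    }
  }
}
{code}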
[jira] Created: (HADOOP-2572) TaskLogServlet returns 410 when trying to access log early in task life
TaskLogServlet returns 410 when trying to access log early in task life --- Key: HADOOP-2572 URL: https://issues.apache.org/jira/browse/HADOOP-2572 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.15.0 Reporter: Michael Bieniosek Fix For: 0.16.0 Early in a map task life, or for tasks that died quickly, the file $task/syslog might not exist. In this case, the TaskLogServlet gives a status 410. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2572) TaskLogServlet returns 410 when trying to access log early in task life
[ https://issues.apache.org/jira/browse/HADOOP-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2572: -- Status: Patch Available (was: Open) TaskLogServlet returns 410 when trying to access log early in task life --- Key: HADOOP-2572 URL: https://issues.apache.org/jira/browse/HADOOP-2572 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.15.0 Reporter: Michael Bieniosek Fix For: 0.16.0 Early in a map task life, or for tasks that died quickly, the file $task/syslog might not exist. In this case, the TaskLogServlet gives a status 410. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2572) TaskLogServlet returns 410 when trying to access log early in task life
[ https://issues.apache.org/jira/browse/HADOOP-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2572: -- Attachment: hadoop-2572.patch Here is a patch that ignores FileNotFoundExceptions, and just creates empty boxes for portions of the output log that are not present. I marked this issue fix for 0.16.0 -- I'm not sure if I missed the cutoff, but this is a low-impact issue (only affects the TaskLogServlet), and it makes it less convenient for me to debug failed map tasks, so it is important for me. TaskLogServlet returns 410 when trying to access log early in task life --- Key: HADOOP-2572 URL: https://issues.apache.org/jira/browse/HADOOP-2572 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.15.0 Reporter: Michael Bieniosek Fix For: 0.16.0 Attachments: hadoop-2572.patch Early in a map task life, or for tasks that died quickly, the file $task/syslog might not exist. In this case, the TaskLogServlet gives a status 410. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
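The shape of that approach, as a sketch (the actual hadoop-2572.patch may differ, and the Reader constructor arguments are assumed):

{code}
// Sketch: treat a missing per-filter log file as empty output instead of failing.
try {
  TaskLog.Reader reader = new TaskLog.Reader(taskId, filter, start, end);
  // ... copy the log bytes into this filter's box in the response ...
} catch (java.io.FileNotFoundException fnfe) {
  // e.g. $task/syslog does not exist yet -- render an empty box for this log
}
{code}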
[jira] Updated: (HADOOP-2538) NPE in TaskLog.java
[ https://issues.apache.org/jira/browse/HADOOP-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2538: -- Status: Patch Available (was: Open) NPE in TaskLog.java --- Key: HADOOP-2538 URL: https://issues.apache.org/jira/browse/HADOOP-2538 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.15.0 Reporter: Michael Bieniosek Attachments: hadoop-2538.patch In the tasktracker web ui, if I go to /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true which corresponds to a short log (4k), I get a 500 in the web ui, and this NPE in the tasktracker log: 2008-01-07 21:02:13,935 WARN /: /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true: java.lang.NullPointerException at org.apache.hadoop.mapred.TaskLog.getTaskLogFile(TaskLog.java:48) at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:124) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:44) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:134) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2538) NPE in TaskLog.java
[ https://issues.apache.org/jira/browse/HADOOP-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2538: -- Attachment: hadoop-2538.patch Add a null check and warn the user appropriately. NPE in TaskLog.java --- Key: HADOOP-2538 URL: https://issues.apache.org/jira/browse/HADOOP-2538 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.15.0 Reporter: Michael Bieniosek Attachments: hadoop-2538.patch In the tasktracker web ui, if I go to /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true which corresponds to a short log (4k), I get a 500 in the web ui, and this NPE in the tasktracker log: 2008-01-07 21:02:13,935 WARN /: /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true: java.lang.NullPointerException at org.apache.hadoop.mapred.TaskLog.getTaskLogFile(TaskLog.java:48) at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:124) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:44) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:134) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
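A minimal sketch of that null check (the attached hadoop-2538.patch may differ; the parameter names are taken from the URLs discussed in this issue):

{code}
String taskId = request.getParameter("taskid");
String filter = request.getParameter("filter");
if (taskId == null || filter == null) {
  // warn instead of letting getTaskLogFile NPE on a missing parameter
  response.sendError(HttpServletResponse.SC_BAD_REQUEST,
      "tasklog requires both 'taskid' and 'filter' parameters");
  return;
}
{code}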
[jira] Updated: (HADOOP-2538) NPE in TaskLog.java
[ https://issues.apache.org/jira/browse/HADOOP-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2538: -- Priority: Trivial (was: Major) NPE in TaskLog.java --- Key: HADOOP-2538 URL: https://issues.apache.org/jira/browse/HADOOP-2538 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.15.0 Reporter: Michael Bieniosek Priority: Trivial Attachments: hadoop-2538.patch In the tasktracker web ui, if I go to /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true which corresponds to a short log (4k), I get a 500 in the web ui, and this NPE in the tasktracker log: 2008-01-07 21:02:13,935 WARN /: /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true: java.lang.NullPointerException at org.apache.hadoop.mapred.TaskLog.getTaskLogFile(TaskLog.java:48) at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:124) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:44) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:134) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2538) NPE in TaskLog.java
[ https://issues.apache.org/jira/browse/HADOOP-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2538: -- Description: In the tasktracker web ui, if I go to /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true which corresponds to a short log (4k), I get a 500 in the web ui, and this NPE in the tasktracker log: 2008-01-07 21:02:13,935 WARN /: /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true: java.lang.NullPointerException at org.apache.hadoop.mapred.TaskLog.getTaskLogFile(TaskLog.java:48) at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:124) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:44) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:134) Note that /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true is an invalid url; the url should look like plaintext=true&filter=STDOUT was: In the tasktracker web ui, if I go to /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true which corresponds to a short log (4k), I get a 500 in the web ui, and this NPE in the tasktracker log: 2008-01-07 21:02:13,935 WARN /: /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true: java.lang.NullPointerException at org.apache.hadoop.mapred.TaskLog.getTaskLogFile(TaskLog.java:48) at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:124) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:44) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:134) NPE in TaskLog.java --- Key: HADOOP-2538 URL: https://issues.apache.org/jira/browse/HADOOP-2538 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.15.0 Reporter: Michael Bieniosek Priority: Trivial Attachments: hadoop-2538.patch In the tasktracker web ui, if I go to /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true which corresponds to a short log (4k), I get a 500 in the web ui, and this NPE in the tasktracker log: 2008-01-07 21:02:13,935 WARN /: /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true: java.lang.NullPointerException at org.apache.hadoop.mapred.TaskLog.getTaskLogFile(TaskLog.java:48) at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:124) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:44) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:134) Note that /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true is an invalid url; the url should look like plaintext=true&filter=STDOUT -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1770) Illegal state exception in printTaskLog - sendError
[ https://issues.apache.org/jira/browse/HADOOP-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557506#action_12557506 ] Michael Bieniosek commented on HADOOP-1770: --- This is because response.sendError is called after out.write in TaskLogServlet. Illegal state exception in printTaskLog - sendError Key: HADOOP-1770 URL: https://issues.apache.org/jira/browse/HADOOP-1770 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.14.0 Reporter: Michael Bieniosek This error shows up in my logs: 2007-08-23 16:40:08,028 WARN /: /tasklog?taskid=task_200708212126_0043_m_000100_0&all=true: java.lang.IllegalStateException: Committed at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212) at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:61) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:125) at javax.servlet.http.HttpServlet.service(HttpServlet.java:689) at javax.servlet.http.HttpServlet.service(HttpServlet.java:802) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427) at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567) at org.mortbay.http.HttpContext.handle(HttpContext.java:1565) at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635) at org.mortbay.http.HttpContext.handle(HttpContext.java:1517) at org.mortbay.http.HttpServer.service(HttpServer.java:954) at org.mortbay.http.HttpConnection.service(HttpConnection.java:814) at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981) at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831) at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244) at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357) at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
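The usual guard for that pattern (a sketch, not necessarily how this issue was eventually fixed) is to check whether the response is already committed before attempting sendError:

{code}
// Once out.write has flushed anything, sendError cannot reset the response,
// which is exactly the IllegalStateException: Committed seen above.
if (!response.isCommitted()) {
  response.sendError(HttpServletResponse.SC_GONE,
      "Failed to retrieve " + filter + " log for task: " + taskId);
}
{code}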
[jira] Commented: (HADOOP-2546) PHP class for Rest Interface
[ https://issues.apache.org/jira/browse/HADOOP-2546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557052#action_12557052 ] Michael Bieniosek commented on HADOOP-2546: --- Hi Billy, It might be easier for you to use php's built-in http client, rather than writing the http client yourself. That way, you won't have to deal with chunked-encoding, keepalives, etc. (which you ignore, but you might get some performance improvements from them). To do the cell fetching, for example, you could just do:

{code}
$xml = simplexml_load_file("http://$host:60050/api/$table/row/$row");
$content = array();
foreach ($xml->column as $column) {
  $content[base64_decode($column->name)] = base64_decode($column->value);
}
return $content;
{code}

PHP class for Rest Interface Key: HADOOP-2546 URL: https://issues.apache.org/jira/browse/HADOOP-2546 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: Billy Pearson Priority: Trivial Attachments: hbase_rest.php This is a php class to interact with the rest interface. This is my first copy so there could be bugs and changes to come as the rest interface changes. I will make this into a patch once I am done with it. There are lots of comments in the file and notes on usage, but here is some basic stuff to get you started. You are welcome to suggest changes to make it faster or more usable. Basic usage (more details in the notes with each function):

// open a new connection to rest server. Hbase Master default port is 60010
$hbase = new hbase_rest($ip, $port);
// get list of tables
$tables = $hbase->list_tables();
// get table column family names and compression stuff
$table_info = $hbase->table_schema("search_index");
// get start and end row keys of each region
$regions = $hbase->regions($table);
// select data from hbase
$results = $hbase->select($table, $row_key);
// insert data into hbase; the $column and $data can be arrays with more than one column inserted in one request
$hbase->insert($table, $row, $column(s), $data(s));
// delete a column from a row. Can not use * at this point to remove all; I think there are plans to add this.
$hbase->remove($table, $row, $column);
// start a scanner on a set range of table
$handle = $hbase->scanner_start($table, $cols, $start_row, $end_row);
// pull the next row of data for a scanner handle
$results = $hbase->scanner_get($handle);
// delete a scanner handle
$hbase->scanner_delete($handle);

Example of using a scanner; this will loop over each row until it runs out of rows:

include("hbase_rest.php");
$hbase = new hbase_rest($ip, $port);
$handle = $hbase->scanner_start($table, $cols, $start_row, $end_row);
$results = true;
while ($results) {
  $results = $hbase->scanner_get($handle);
  if ($results) {
    foreach ($results['column'] as $key => $value) {
      // code here to work with the $key/column name and the $value of the column
    } // end foreach
  } // end if
} // end while
$hbase->scanner_delete($handle);

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2364) when hbase regionserver restarts, it says impossible state for createLease()
[ https://issues.apache.org/jira/browse/HADOOP-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556658#action_12556658 ] Michael Bieniosek commented on HADOOP-2364: --- I can't remember exactly. I believe the regionserver keeps trying to connect, until eventually the old lease times out on the master. when hbase regionserver restarts, it says impossible state for createLease() -- Key: HADOOP-2364 URL: https://issues.apache.org/jira/browse/HADOOP-2364 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Michael Bieniosek Priority: Minor I restarted a regionserver, and got this error in its logs: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.AssertionError: Impossible state for createLease(): Lease -435227488/-435227488 is still held. at org.apache.hadoop.hbase.Leases.createLease(Leases.java:145) at org.apache.hadoop.hbase.HMaster.regionServerStartup(HMaster.java:1278) at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) at org.apache.hadoop.ipc.Client.call(Client.java:482) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184) at $Proxy0.regionServerStartup(Unknown Source) at org.apache.hadoop.hbase.HRegionServer.reportForDuty(HRegionServer.java:1025) at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:659) at java.lang.Thread.run(Unknown Source) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1770) Illegal state exception in printTaskLog - sendError
[ https://issues.apache.org/jira/browse/HADOOP-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556689#action_12556689 ] Michael Bieniosek commented on HADOOP-1770: --- This makes it impossible for me to view logs for tasks with a small amount of output. This is actually masking another exception, since the exception comes from a catch block. Illegal state exception in printTaskLog - sendError Key: HADOOP-1770 URL: https://issues.apache.org/jira/browse/HADOOP-1770 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.14.0 Reporter: Michael Bieniosek This error shows up in my logs: 2007-08-23 16:40:08,028 WARN /: /tasklog?taskid=task_200708212126_0043_m_000100_0&all=true: java.lang.IllegalStateException: Committed at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212) at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:61) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:125) at javax.servlet.http.HttpServlet.service(HttpServlet.java:689) at javax.servlet.http.HttpServlet.service(HttpServlet.java:802) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427) at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567) at org.mortbay.http.HttpContext.handle(HttpContext.java:1565) at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635) at org.mortbay.http.HttpContext.handle(HttpContext.java:1517) at org.mortbay.http.HttpServer.service(HttpServer.java:954) at org.mortbay.http.HttpConnection.service(HttpConnection.java:814) at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981) at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831) at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244) at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357) at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (HADOOP-1770) Illegal state exception in printTaskLog - sendError
[ https://issues.apache.org/jira/browse/HADOOP-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556689#action_12556689 ] bien edited comment on HADOOP-1770 at 1/7/08 12:59 PM: This makes it difficult for me to view logs for tasks with a small amount of output (I have to log in to the machine in question). This is masking another exception, since the exception comes from a catch block. was (Author: bien): This makes it impossible for me to view logs for tasks with a small amount of output. This is actually masking another exception, since the exception comes from a catch block. Illegal state exception in printTaskLog - sendError Key: HADOOP-1770 URL: https://issues.apache.org/jira/browse/HADOOP-1770 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.14.0 Reporter: Michael Bieniosek This error shows up in my logs: 2007-08-23 16:40:08,028 WARN /: /tasklog?taskid=task_200708212126_0043_m_000100_0&all=true: java.lang.IllegalStateException: Committed at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212) at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:61) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:125) at javax.servlet.http.HttpServlet.service(HttpServlet.java:689) at javax.servlet.http.HttpServlet.service(HttpServlet.java:802) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427) at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567) at org.mortbay.http.HttpContext.handle(HttpContext.java:1565) at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635) at org.mortbay.http.HttpContext.handle(HttpContext.java:1517) at org.mortbay.http.HttpServer.service(HttpServer.java:954) at org.mortbay.http.HttpConnection.service(HttpConnection.java:814) at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981) at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831) at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244) at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357) at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1770) Illegal state exception in printTaskLog - sendError
[ https://issues.apache.org/jira/browse/HADOOP-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556690#action_12556690 ] Michael Bieniosek commented on HADOOP-1770: --- This corresponds with an error in the web ui: HTTP ERROR: 410 Failed to retrieve syslog log for task: task_200801020752_0383_m_00_0 Illegal state exception in printTaskLog - sendError Key: HADOOP-1770 URL: https://issues.apache.org/jira/browse/HADOOP-1770 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.14.0 Reporter: Michael Bieniosek This error shows up in my logs: 2007-08-23 16:40:08,028 WARN /: /tasklog?taskid=task_200708212126_0043_m_000100_0&all=true: java.lang.IllegalStateException: Committed at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212) at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:61) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:125) at javax.servlet.http.HttpServlet.service(HttpServlet.java:689) at javax.servlet.http.HttpServlet.service(HttpServlet.java:802) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427) at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567) at org.mortbay.http.HttpContext.handle(HttpContext.java:1565) at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635) at org.mortbay.http.HttpContext.handle(HttpContext.java:1517) at org.mortbay.http.HttpServer.service(HttpServer.java:954) at org.mortbay.http.HttpConnection.service(HttpConnection.java:814) at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981) at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831) at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244) at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357) at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2538) NPE in TaskLog.java
NPE in TaskLog.java --- Key: HADOOP-2538 URL: https://issues.apache.org/jira/browse/HADOOP-2538 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.15.0 Reporter: Michael Bieniosek In the tasktracker web ui, if I go to /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true which corresponds to a short log (4k), I get a 500 in the web ui, and this NPE in the tasktracker log: 2008-01-07 21:02:13,935 WARN /: /tasklog?taskid=task_200801020752_0383_m_00_0&all=true&plaintext=true: java.lang.NullPointerException at org.apache.hadoop.mapred.TaskLog.getTaskLogFile(TaskLog.java:48) at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:124) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:44) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:134) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2068) [hbase] RESTful interface
[ https://issues.apache.org/jira/browse/HADOOP-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556797#action_12556797 ] Michael Bieniosek commented on HADOOP-2068: --- Hey Billy, Could you post your PHP class? I need to use hbase from a PHP client and was wondering if I could start from yours. Thanks. [hbase] RESTful interface - Key: HADOOP-2068 URL: https://issues.apache.org/jira/browse/HADOOP-2068 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: stack Assignee: Bryan Duxbury Priority: Minor Fix For: 0.16.0 Attachments: rest-11-27-07-v2.patch, rest-11-27-07.3.patc, rest-11-27-07.patch, rest-11-28-07.2.patch, rest-11-28-07.3.patch, rest-11-28-07.patch, rest.patch A RESTful interface would be one means of making hbase accessible to clients that are not java. It might look something like the below:

+ An HTTP GET of http://MASTER:PORT/ outputs the master's attributes: online meta regions, list of tables, etc.: i.e. what you see now when you go to http://MASTER:PORT/master.jsp.
+ An HTTP GET of http://MASTER:PORT/TABLENAME: 200 if table exists and HTableDescription (mimetype: text/plain or text/xml) or 401 if no such table. HTTP DELETE would drop the table. HTTP PUT would add one.
+ An HTTP GET of http://MASTER:PORT/TABLENAME/ROW: 200 if row exists and 401 if not.
+ An HTTP GET of http://MASTER:PORT/TABLENAME/ROW/COLUMNFAMILY: HColumnDescriptor (mimetype: text/plain or text/xml) or 401 if no such table.
+ An HTTP GET of http://MASTER:PORT/TABLENAME/ROW/COLUMNNAME/: 200 and latest version (mimetype: binary/octet-stream) or 401 if no such cell. HTTP DELETE would delete the cell. HTTP PUT would add a new version.
+ An HTTP GET of http://MASTER:PORT/TABLENAME/ROW/COLUMNNAME/TIMESTAMP: 200 (mimetype: binary/octet-stream) or 401 if no such cell. HTTP DELETE would remove. HTTP PUT would put this record.
+ Browser originally goes against master but master then redirects to the hosting region server to serve, update, delete, etc. the addressed cell

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
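For a non-PHP illustration, a GET against the addressing scheme proposed above could be as simple as this (a sketch; the host, port, table, row, and column names are placeholders, not part of the proposal):

{code}
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class HBaseRestGet {
  public static void main(String[] args) throws Exception {
    // GET http://MASTER:PORT/TABLENAME/ROW/COLUMNNAME/ -> latest cell version
    URL url = new URL("http://master:60010/mytable/myrow/mycolumn/");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    if (conn.getResponseCode() == 200) {
      InputStream in = conn.getInputStream(); // mimetype: binary/octet-stream
      // ... read the latest cell version from 'in' ...
    }
    conn.disconnect();
  }
}
{code}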
[jira] Created: (HADOOP-2545) hbase rest server should be started with hbase-daemon.sh
hbase rest server should be started with hbase-daemon.sh Key: HADOOP-2545 URL: https://issues.apache.org/jira/browse/HADOOP-2545 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: Michael Bieniosek Currently, the hbase rest server is started with the hbase script. But it should be started with the hbase-daemon script, which allows better configuration options. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2545) hbase rest server should be started with hbase-daemon.sh
[ https://issues.apache.org/jira/browse/HADOOP-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556806#action_12556806 ] Michael Bieniosek commented on HADOOP-2545: --- Currently I can't specify a config to the hbase rest servlet. The hbase command line script only looks at hbase-config.sh in the same directory as hbase. I'd like to be able to specify an arbitrary location for my hbase-env.sh and hbase-site.xml files. hbase rest server should be started with hbase-daemon.sh Key: HADOOP-2545 URL: https://issues.apache.org/jira/browse/HADOOP-2545 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: Michael Bieniosek Priority: Minor Currently, the hbase rest server is started with the hbase script. But it should be started with the hbase-daemon script, which allows better configuration options. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2325) Require Java 6 for release 0.17
[ https://issues.apache.org/jira/browse/HADOOP-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12555757#action_12555757 ] Michael Bieniosek commented on HADOOP-2325: --- Do we believe that the licensing situation is likely to change in the near future? I think we just have to wait for Apple to do an official 1.6 release for OSX. There was a lot of speculation that they would release shortly after OSX 10.5 came out, but then they didn't. Require Java 6 for release 0.17 --- Key: HADOOP-2325 URL: https://issues.apache.org/jira/browse/HADOOP-2325 Project: Hadoop Issue Type: Improvement Components: build Reporter: Doug Cutting Fix For: 0.17.0 We should require Java 6 for release 0.17. Java 6 is now available for OS/X. Hadoop performs much better on Java 6. And, finally, there are features of Java 6 (like 'df') that would be nice to use. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2510) Map-Reduce 2.0
[ https://issues.apache.org/jira/browse/HADOOP-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12555412#action_12555412 ] Michael Bieniosek commented on HADOOP-2510: --- A couple points:

1) the job client currently submits a job, then exits. This means that the machine where the job client runs does not need to be reliable (it could be my laptop, for example). I think this is a valuable feature. The JobManager you suggest cannot be run on an unreliable machine -- I think you mention this in brownie point #3.

2) One of our problems is that we have substantial amounts of per-job software that is installed via rpm. Our current solution is to create a job-private mapreduce cluster (not using HoD), install a bunch of software, then start the job. This won't work if a machine might be running tasks from multiple jobs simultaneously. This proposal doesn't seem to affect our ability to run private mapreduce clusters. But it does make it less useful for us. You suggest xen, which would let us configure per-task; that might work but it will increase the task overhead. Another possibility is allocating a machine-at-a-time to jobs, so we only have to configure the machine once per job.

I'm not totally sure what the point is here -- it seems like you mainly want to separate the jobtracker's scheduling and monitoring functions. Is there a scaling problem with the jobtracker currently? You discuss the jobtracker being a single point of failure, but the namenode is already a more serious point of failure, since it is much more work to rebuild a namenode if it dies. Are you also trying to replace HoD?

Map-Reduce 2.0 -- Key: HADOOP-2510 URL: https://issues.apache.org/jira/browse/HADOOP-2510 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Arun C Murthy We, at Yahoo!, have been using Hadoop-On-Demand as the resource provisioning/scheduling mechanism. With HoD the user uses a self-service system to ask for a set of nodes. HoD allocates these from a global pool and also provisions a private Map-Reduce cluster for the user. She then runs her jobs and shuts the cluster down via HoD when done. All user-private clusters use the same humongous, static HDFS (e.g. 2k node HDFS). More details about HoD are available here: HADOOP-1301.

h3. Motivation

The current deployment (Hadoop + HoD) has a couple of implications:

* _Non-optimal Cluster Utilization_ 1. Job-private Map-Reduce clusters imply that the user-cluster potentially could be *idle* for at least a while before being detected and shut-down. 2. Elastic Jobs: Map-Reduce jobs, typically, have lots of maps with much-smaller no. of reduces; with maps being light and quick and reduces being i/o heavy and longer-running. Users typically allocate clusters depending on the no. of maps (i.e. input size) which leads to the scenario where all the maps are done (idle nodes in the cluster) and the few reduces are chugging along. Right now, we do not have the ability to shrink the HoD'ed Map-Reduce clusters which would alleviate this issue.
* _Impact on data-locality_ With the current setup of a static, large HDFS and much smaller (5/10/20/50 node) clusters there is a good chance of losing one of Map-Reduce's primary features: ability to execute tasks on the datanodes where the input splits are located.
In fact, we have seen the data-local tasks go down to 20-25 percent in the GridMix benchmarks, from the 95-98 percent we see on the randomwriter+sort runs run as part of the hadoopqa benchmarks (admittedly a synthetic benchmark, but yet). Admittedly, HADOOP-1985 (rack-aware Map-Reduce) helps significantly here. Primarily, the notion of *job-level scheduling* leading to private clusters, as opposed to *task-level scheduling*, is a good peg to hang the majority of the blame on. Keeping the above factors in mind, here are some thoughts on how to re-structure Hadoop Map-Reduce to solve some of these issues.

h3. State of the Art

As it exists today, a large, static, Hadoop Map-Reduce cluster (forget HoD for a bit) does provide task-level scheduling; however as it exists today, its scalability to tens-of-thousands of user-jobs, per-week, is in question. Let's review its current architecture and main components:

* JobTracker: It does both *task-scheduling* and *task-monitoring* (tasktrackers send task-statuses via periodic heartbeats), which implies it is fairly loaded. It is also a _single-point of failure_ in the Map-Reduce framework i.e. its failure implies that all the jobs in the system fail. This means a static, large Map-Reduce cluster is fairly susceptible and a definite suspect. Clearly HoD solves this by having per-job clusters, albeit with the
[jira] Commented: (HADOOP-1336) turn on speculative execution by default
[ https://issues.apache.org/jira/browse/HADOOP-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552577 ] Michael Bieniosek commented on HADOOP-1336: --- -1 I agree with Marco -- I don't think speculative execution is something the average user wants by default. turn on speculative execution by default --- Key: HADOOP-1336 URL: https://issues.apache.org/jira/browse/HADOOP-1336 Project: Hadoop Issue Type: Task Components: mapred Affects Versions: 0.12.3 Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.14.0 Attachments: spec-exec.patch Now that speculative execution is working again, we should enable it by default. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1650) Upgrade Jetty to 6.x
[ https://issues.apache.org/jira/browse/HADOOP-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551998 ] Michael Bieniosek commented on HADOOP-1650: --- Any news on this? Upgrade Jetty to 6.x Key: HADOOP-1650 URL: https://issues.apache.org/jira/browse/HADOOP-1650 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Devaraj Das Assignee: Devaraj Das Attachments: hadoop-1650-jetty6.1.5.patch, hadoop-jetty6.1.4-lib.tar.gz, jetty6.1.4.patch This is the third attempt at moving to jetty6. Apparently, the jetty-6.1.4 has fixed some of the issues we discovered in jetty during HADOOP-736 and HADOOP-1273. I'd like to keep this issue open for sometime so that we have enough time to test out things. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2341) Datanode active connections never returns to 0
[ https://issues.apache.org/jira/browse/HADOOP-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551061 ] Michael Bieniosek commented on HADOOP-2341: --- How about idle connections? Is that ok? Because the datanode does not use select for io, each idle connection seems to consume a thread. It is the threads that get expensive, not necessarily the connections themselves. Datanode active connections never returns to 0 -- Key: HADOOP-2341 URL: https://issues.apache.org/jira/browse/HADOOP-2341 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.16.0 Reporter: Paul Saab Attachments: dfsclient.patch, hregionserver-stack.txt, stacks-XX.XX.XX.XXX.txt, stacks-YY.YY.YY.YY.txt On trunk I continue to see the following in my data node logs:

2007-12-03 15:46:47,696 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 42
2007-12-03 15:46:48,135 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 41
2007-12-03 15:46:48,439 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 40
2007-12-03 15:46:48,479 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 39
2007-12-03 15:46:48,611 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 38
2007-12-03 15:46:48,898 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 37
2007-12-03 15:46:48,989 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 36
2007-12-03 15:46:51,010 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 35
2007-12-03 15:46:51,758 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 34
2007-12-03 15:46:52,148 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 33

This number never returns to 0, even after many hours of no new data being manipulated or added into the DFS. Looking at netstat -tn I see a significant amount of data in the send-q that never goes away:

tcp 0 34240 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:55792 ESTABLISHED
tcp 0 38968 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:38169 ESTABLISHED
tcp 0 38456 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:35456 ESTABLISHED
tcp 0 29640 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:59845 ESTABLISHED
tcp 0 50168 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:44584 ESTABLISHED

When sniffing the network I see that the remote side (YY.YY.YY.YY) is returning a window size of 0:

16:11:41.760474 IP XX.XX.XX.XXX.50010 > YY.YY.YY.YY.44584: . ack 3339984123 win 46 <nop,nop,timestamp 1786247180 885681789>
16:11:41.761597 IP YY.YY.YY.YY.44584 > XX.XX.XX.XXX.50010: . ack 1 win 0 <nop,nop,timestamp 885801786 1775711351>

When we look at the stack traces on each datanode, I have tons of threads that *never* go away in the following trace:

{code}
Thread 6516 ([EMAIL PROTECTED]):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.net.SocketOutputStream.socketWrite0(Native Method)
    java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
    java.io.DataOutputStream.write(DataOutputStream.java:90)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1400)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1433)
    org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:904)
    org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:849)
    java.lang.Thread.run(Thread.java:619)
{code}

Unfortunately there's very little in the logs with exceptions that could point to this. I have some exceptions like the following, but nothing that points to problems between XX and YY:

{code}
2007-12-02 11:19:47,889 WARN dfs.DataNode - Unexpected error trying to delete block blk_4515246476002110310. Block not found in blockMap.
2007-12-02 11:19:47,922 WARN dfs.DataNode - java.io.IOException: Error in deleting blocks.
  at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:750)
  at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:675)
  at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:569)
  at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1720)
  at java.lang.Thread.run(Thread.java:619)
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
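For anyone unfamiliar with the distinction being drawn in this thread: with select-style io a single thread multiplexes all connections, so an idle socket costs only a selector registration rather than a blocked thread. A generic sketch in plain java.nio (not DataNode code):

{code}
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class SelectLoop {
  public static void main(String[] args) throws Exception {
    Selector selector = Selector.open();
    ServerSocketChannel server = ServerSocketChannel.open();
    server.socket().bind(new InetSocketAddress(50010));
    server.configureBlocking(false);
    server.register(selector, SelectionKey.OP_ACCEPT);
    while (true) {
      selector.select(); // one thread blocks here for *all* connections
      for (Iterator<SelectionKey> it = selector.selectedKeys().iterator(); it.hasNext();) {
        SelectionKey key = it.next();
        it.remove();
        if (key.isAcceptable()) {
          SocketChannel c = server.accept();
          c.configureBlocking(false);
          c.register(selector, SelectionKey.OP_READ); // idle from here on costs no thread
        } else if (key.isReadable()) {
          // ... service this one ready connection, then loop back to select() ...
        }
      }
    }
  }
}
{code}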
[jira] Commented: (HADOOP-2341) Datanode active connections never returns to 0
[ https://issues.apache.org/jira/browse/HADOOP-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551072 ] Michael Bieniosek commented on HADOOP-2341: --- Well, hbase does use select, so it doesn't consume threads to have idle connections. Datanode active connections never returns to 0 -- Key: HADOOP-2341 URL: https://issues.apache.org/jira/browse/HADOOP-2341 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.16.0 Reporter: Paul Saab Attachments: dfsclient.patch, hregionserver-stack.txt, stacks-XX.XX.XX.XXX.txt, stacks-YY.YY.YY.YY.txt On trunk I continue to see the following in my data node logs:

2007-12-03 15:46:47,696 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 42
2007-12-03 15:46:48,135 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 41
2007-12-03 15:46:48,439 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 40
2007-12-03 15:46:48,479 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 39
2007-12-03 15:46:48,611 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 38
2007-12-03 15:46:48,898 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 37
2007-12-03 15:46:48,989 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 36
2007-12-03 15:46:51,010 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 35
2007-12-03 15:46:51,758 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 34
2007-12-03 15:46:52,148 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 33

This number never returns to 0, even after many hours of no new data being manipulated or added into the DFS. Looking at netstat -tn I see a significant amount of data in the send-q that never goes away:

tcp 0 34240 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:55792 ESTABLISHED
tcp 0 38968 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:38169 ESTABLISHED
tcp 0 38456 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:35456 ESTABLISHED
tcp 0 29640 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:59845 ESTABLISHED
tcp 0 50168 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:44584 ESTABLISHED

When sniffing the network I see that the remote side (YY.YY.YY.YY) is returning a window size of 0:

16:11:41.760474 IP XX.XX.XX.XXX.50010 > YY.YY.YY.YY.44584: . ack 3339984123 win 46 <nop,nop,timestamp 1786247180 885681789>
16:11:41.761597 IP YY.YY.YY.YY.44584 > XX.XX.XX.XXX.50010: . ack 1 win 0 <nop,nop,timestamp 885801786 1775711351>

When we look at the stack traces on each datanode, I have tons of threads that *never* go away in the following trace:

{code}
Thread 6516 ([EMAIL PROTECTED]):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.net.SocketOutputStream.socketWrite0(Native Method)
    java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
    java.io.DataOutputStream.write(DataOutputStream.java:90)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1400)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1433)
    org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:904)
    org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:849)
    java.lang.Thread.run(Thread.java:619)
{code}

Unfortunately there's very little in the logs with exceptions that could point to this. I have some exceptions like the following, but nothing that points to problems between XX and YY:

{code}
2007-12-02 11:19:47,889 WARN dfs.DataNode - Unexpected error trying to delete block blk_4515246476002110310. Block not found in blockMap.
2007-12-02 11:19:47,922 WARN dfs.DataNode - java.io.IOException: Error in deleting blocks.
  at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:750)
  at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:675)
  at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:569)
  at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1720)
  at java.lang.Thread.run(Thread.java:619)
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2396) NPE in HMaster.cancelLease
NPE in HMaster.cancelLease -- Key: HADOOP-2396 URL: https://issues.apache.org/jira/browse/HADOOP-2396 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Michael Bieniosek Priority: Minor When I shut down the master, one regionserver fails to notify the master that it shut down: 2007-12-10 19:59:17,080 WARN org.apache.hadoop.hbase.HRegionServer: Failed to send exiting message to master: java.io.IOException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hbase.HMaster.cancelLease(HMaster.java:1463) at org.apache.hadoop.hbase.HMaster.regionServerReport(HMaster.java:1331) at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) at java.lang.reflect.Constructor.newInstance(Unknown Source) at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82) at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48) at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:863) at java.lang.Thread.run(Unknown Source) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2400) Where hbase/mapreduce have analogous configuration parameters, they should be named similarly
Where hbase/mapreduce have analogous configuration parameters, they should be named similarly - Key: HADOOP-2400 URL: https://issues.apache.org/jira/browse/HADOOP-2400 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Michael Bieniosek Priority: Minor mapreduce has a configuration property called mapred.system.dir which determines where in the DFS a jobtracker stores its data. Similarly, hbase has a configuration property called hbase.rootdir which does something very similar. These should have the same name, eg. hbase.system.dir or mapred.rootdir. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2400) Where hbase/mapreduce have analogous configuration parameters, they should be named similarly
[ https://issues.apache.org/jira/browse/HADOOP-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2400: -- Description: mapreduce has a configuration property called mapred.system.dir which determines where in the DFS a jobtracker stores its data. Similarly, hbase has a configuration property called hbase.rootdir which does something very similar. These should have the same name, eg. hbase.system.dir and mapred.system.dir was: mapreduce has a configuration property called mapred.system.dir which determines where in the DFS a jobtracker stores its data. Similarly, hbase has a configuration property called hbase.rootdir which does something very similar. These should have the same name, eg. hbase.system.dir or mapred.rootdir. Where hbase/mapreduce have analogous configuration parameters, they should be named similarly - Key: HADOOP-2400 URL: https://issues.apache.org/jira/browse/HADOOP-2400 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Michael Bieniosek Priority: Minor mapreduce has a configuration property called mapred.system.dir which determines where in the DFS a jobtracker stores its data. Similarly, hbase has a configuration property called hbase.rootdir which does something very similar. These should have the same name, eg. hbase.system.dir and mapred.system.dir -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2325) Require Java 6 for release 0.16.
[ https://issues.apache.org/jira/browse/HADOOP-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549511 ] Michael Bieniosek commented on HADOOP-2325: --- Because java 6 (compiler + runtime) is not available under an acceptable license on all platforms, I think references to functions like File.getFreeSpace should be done in a way that is backwards compatible with java 5. I suggest you put all the references to java 6 functions in an optional jar, with fallback methods for java 5 users. Require Java 6 for release 0.16. Key: HADOOP-2325 URL: https://issues.apache.org/jira/browse/HADOOP-2325 Project: Hadoop Issue Type: Improvement Components: build Reporter: Doug Cutting Fix For: 0.16.0 We should require Java 6 for release 0.16. Java 6 is now available for OS/X. Hadoop performs much better on Java 6. And, finally, there are features of Java 6 (like 'df') that would be nice to use. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
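One way to keep that backwards compatible, sketched below (a suggestion shape, not a committed approach): resolve the Java 6 method reflectively at runtime and fall back when it is absent.

{code}
import java.io.File;
import java.lang.reflect.Method;

public class FreeSpace {
  /** Returns free bytes on the partition of f, or -1 when the caller
   *  should fall back to a Java 5 path such as shelling out to 'df'. */
  public static long freeSpace(File f) {
    try {
      Method m = File.class.getMethod("getFreeSpace"); // exists only on Java 6
      return ((Long) m.invoke(f)).longValue();
    } catch (Exception e) {
      return -1L;
    }
  }
}
{code}

Caching the Method lookup in a static field would keep the reflection overhead negligible.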
[jira] Commented: (HADOOP-2341) Datanode active connections never returns to 0
[ https://issues.apache.org/jira/browse/HADOOP-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549128 ] Michael Bieniosek commented on HADOOP-2341: --- I've noticed that the regionserver is using select for its io, which doesn't tie up threads, whereas the datanode seems to have one thread open/blocked for each write. Is this intentional? Shouldn't they be using the same mechanism? Datanode active connections never returns to 0 -- Key: HADOOP-2341 URL: https://issues.apache.org/jira/browse/HADOOP-2341 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.16.0 Reporter: Paul Saab Attachments: hregionserver-stack.txt, stacks-XX.XX.XX.XXX.txt, stacks-YY.YY.YY.YY.txt On trunk I continue to see the following in my data node logs:

2007-12-03 15:46:47,696 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 42
2007-12-03 15:46:48,135 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 41
2007-12-03 15:46:48,439 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 40
2007-12-03 15:46:48,479 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 39
2007-12-03 15:46:48,611 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 38
2007-12-03 15:46:48,898 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 37
2007-12-03 15:46:48,989 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 36
2007-12-03 15:46:51,010 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 35
2007-12-03 15:46:51,758 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 34
2007-12-03 15:46:52,148 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 33

This number never returns to 0, even after many hours of no new data being manipulated or added into the DFS. Looking at netstat -tn I see a significant amount of data in the send-q that never goes away:

tcp 0 34240 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:55792 ESTABLISHED
tcp 0 38968 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:38169 ESTABLISHED
tcp 0 38456 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:35456 ESTABLISHED
tcp 0 29640 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:59845 ESTABLISHED
tcp 0 50168 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:44584 ESTABLISHED

When sniffing the network I see that the remote side (YY.YY.YY.YY) is returning a window size of 0:

16:11:41.760474 IP XX.XX.XX.XXX.50010 > YY.YY.YY.YY.44584: . ack 3339984123 win 46 <nop,nop,timestamp 1786247180 885681789>
16:11:41.761597 IP YY.YY.YY.YY.44584 > XX.XX.XX.XXX.50010: . ack 1 win 0 <nop,nop,timestamp 885801786 1775711351>

When we look at the stack traces on each datanode, I have tons of threads that *never* go away in the following trace:

{code}
Thread 6516 ([EMAIL PROTECTED]):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.net.SocketOutputStream.socketWrite0(Native Method)
    java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
    java.io.DataOutputStream.write(DataOutputStream.java:90)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1400)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1433)
    org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:904)
    org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:849)
    java.lang.Thread.run(Thread.java:619)
{code}

Unfortunately there's very little in the logs with exceptions that could point to this. I have some exceptions like the following, but nothing that points to problems between XX and YY:

{code}
2007-12-02 11:19:47,889 WARN dfs.DataNode - Unexpected error trying to delete block blk_4515246476002110310. Block not found in blockMap.
2007-12-02 11:19:47,922 WARN dfs.DataNode - java.io.IOException: Error in deleting blocks.
  at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:750)
  at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:675)
  at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:569)
  at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1720)
  at java.lang.Thread.run(Thread.java:619)
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2350) hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows
[ https://issues.apache.org/jira/browse/HADOOP-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2350: -- Priority: Critical (was: Major) Bumping priority because this is a correctness issue. hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows --- Key: HADOOP-2350 URL: https://issues.apache.org/jira/browse/HADOOP-2350 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Michael Bieniosek Assignee: stack Priority: Critical Fix For: 0.16.0 Attachments: TestScannerAPI.java I'm attaching a test case that fails. I noticed that if I create a table with two column families, and start a scanner on a row which only has an entry for one column family, the scanner will skip ahead to the row name for which the other column family has an entry. E.g., if I insert rows so my table will look like this:

{code}
row   a:a   b:b
aaa   a:1   nil
bbb   a:2   b:2
ccc   a:3   b:3
{code}

The scanner will tell me my table looks something like this:

{code}
row   a:a   b:b
bbb   a:1   b:2
bbb   a:2   b:3
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2364) when hbase regionserver restarts, it says impossible state for createLease()
when hbase regionserver restarts, it says impossible state for createLease() -- Key: HADOOP-2364 URL: https://issues.apache.org/jira/browse/HADOOP-2364 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Michael Bieniosek Priority: Minor I restarted a regionserver, and got this error in its logs:

{code}
org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.AssertionError: Impossible state for createLease(): Lease -435227488/-435227488 is still held.
 at org.apache.hadoop.hbase.Leases.createLease(Leases.java:145)
 at org.apache.hadoop.hbase.HMaster.regionServerStartup(HMaster.java:1278)
 at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

 at org.apache.hadoop.ipc.Client.call(Client.java:482)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
 at $Proxy0.regionServerStartup(Unknown Source)
 at org.apache.hadoop.hbase.HRegionServer.reportForDuty(HRegionServer.java:1025)
 at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:659)
 at java.lang.Thread.run(Unknown Source)
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2341) Datanode active connections never returns to 0
[ https://issues.apache.org/jira/browse/HADOOP-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548343 ] Michael Bieniosek commented on HADOOP-2341: --- It seems like these socket reads/writes should time out eventually. But the 70 threads on my datanode are still waiting on write. Datanode active connections never returns to 0 -- Key: HADOOP-2341 URL: https://issues.apache.org/jira/browse/HADOOP-2341 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.16.0 Reporter: Paul Saab Attachments: hregionserver-stack.txt, stacks-XX.XX.XX.XXX.txt, stacks-YY.YY.YY.YY.txt On trunk I continue to see the following in my data node logs:

2007-12-03 15:46:47,696 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 42
2007-12-03 15:46:48,135 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 41
2007-12-03 15:46:48,439 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 40
2007-12-03 15:46:48,479 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 39
2007-12-03 15:46:48,611 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 38
2007-12-03 15:46:48,898 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 37
2007-12-03 15:46:48,989 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 36
2007-12-03 15:46:51,010 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 35
2007-12-03 15:46:51,758 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 34
2007-12-03 15:46:52,148 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 33

This number never returns to 0, even after many hours of no new data being manipulated or added into the DFS. Looking at netstat -tn I see a significant amount of data in the send-q that never goes away:

tcp 0 34240 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:55792 ESTABLISHED
tcp 0 38968 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:38169 ESTABLISHED
tcp 0 38456 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:35456 ESTABLISHED
tcp 0 29640 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:59845 ESTABLISHED
tcp 0 50168 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:44584 ESTABLISHED

When sniffing the network I see that the remote side (YY.YY.YY.YY) is returning a window size of 0:

16:11:41.760474 IP XX.XX.XX.XXX.50010 > YY.YY.YY.YY.44584: . ack 3339984123 win 46 <nop,nop,timestamp 1786247180 885681789>
16:11:41.761597 IP YY.YY.YY.YY.44584 > XX.XX.XX.XXX.50010: . ack 1 win 0 <nop,nop,timestamp 885801786 1775711351>

When we look at the stack traces on each datanode, there are tons of threads that *never* go away, all stuck in the following trace:

{code}
Thread 6516 ([EMAIL PROTECTED]):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.net.SocketOutputStream.socketWrite0(Native Method)
    java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
    java.io.DataOutputStream.write(DataOutputStream.java:90)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1400)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1433)
    org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:904)
    org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:849)
    java.lang.Thread.run(Thread.java:619)
{code}

Unfortunately there's very little in the logs with exceptions that could point to this. I have some exceptions like the following, but nothing that points to problems between XX and YY:

{code}
2007-12-02 11:19:47,889 WARN dfs.DataNode - Unexpected error trying to delete block blk_4515246476002110310. Block not found in blockMap.
2007-12-02 11:19:47,922 WARN dfs.DataNode - java.io.IOException: Error in deleting blocks.
  at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:750)
  at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:675)
  at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:569)
  at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1720)
  at java.lang.Thread.run(Thread.java:619)
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
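One reason those writes never time out: SO_TIMEOUT on a blocking java.net.Socket only bounds reads, so a write into a zero-window connection can block in socketWrite0 indefinitely. A hedged sketch, in plain JDK NIO (not DataNode code), of how a write deadline can be imposed:

{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

public class TimedWrite {
  /** Writes buf fully, failing if the socket stays unwritable for timeoutMs. */
  static void write(SocketChannel ch, ByteBuffer buf, long timeoutMs) throws IOException {
    Selector sel = Selector.open();
    try {
      ch.configureBlocking(false);
      ch.register(sel, SelectionKey.OP_WRITE);
      while (buf.hasRemaining()) {
        // select() returns 0 when the deadline passes with no writability
        if (sel.select(timeoutMs) == 0) {
          throw new IOException("write timed out after " + timeoutMs + "ms");
        }
        sel.selectedKeys().clear();
        ch.write(buf); // may make partial progress; loop until drained
      }
    } finally {
      sel.close();
    }
  }
}
{code}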
[jira] Commented: (HADOOP-2325) Require Java 6 for release 0.16.
[ https://issues.apache.org/jira/browse/HADOOP-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548403 ] Michael Bieniosek commented on HADOOP-2325: --- That Java 6 requires agreement to the Java Research License, which is unacceptable for commercial use: http://java.net/jrl.csp Require Java 6 for release 0.16. Key: HADOOP-2325 URL: https://issues.apache.org/jira/browse/HADOOP-2325 Project: Hadoop Issue Type: Improvement Components: build Reporter: Doug Cutting Fix For: 0.16.0 We should require Java 6 for release 0.16. Java 6 is now available for OS/X. Hadoop performs much better on Java 6. And, finally, there are features of Java 6 (like 'df') that would be nice to use. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2325) Require Java 6 for release 0.16.
[ https://issues.apache.org/jira/browse/HADOOP-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548405 ] Michael Bieniosek commented on HADOOP-2325: --- To quote from the bikemonkey link: {quote} Licensing The Mac OS X work is based heavily on the BSD Java port, which is licensed under the JRL. The BSDs develop Java under the JRL; FreeBSD has negotiated a license with Sun to distribute FreeBSD Java binaries based on the JRL sources. As the Mac port stabilizes, I am merging my work upstream into the BSD port, and in turn, it is a goal of the FreeBSD Java project to merge their work into OpenJDK. I've signed a Sun Contributor Agreement in preparation for this, and an OpenJDK Porters group has been proposed: http://thread.gmane.org/gmane.comp.java.openjdk.general/630 While the JRL makes this initial port possible, OpenJDK's GPLv2+CE licensing makes development and distribution far simpler. I hope to contribute this work to OpenJDK as soon as is feasible. {quote} So, while IANAL, it appears this can only be used under JRL. But JRL only permits Research Use, defining it as: {quote} Research Use means research, evaluation, or development for the purpose of advancing knowledge, teaching, learning, or customizing the Technology or Modifications for personal use. Research Use expressly excludes use or distribution for direct or indirect commercial (including strategic) gain or advantage. {quote} So I suspect that running hadoop client libraries at a commercial search engine does not fall under Research Use. Require Java 6 for release 0.16. Key: HADOOP-2325 URL: https://issues.apache.org/jira/browse/HADOOP-2325 Project: Hadoop Issue Type: Improvement Components: build Reporter: Doug Cutting Fix For: 0.16.0 We should require Java 6 for release 0.16. Java 6 is now available for OS/X. Hadoop performs much better on Java 6. And, finally, there are features of Java 6 (like 'df') that would be nice to use. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (HADOOP-2325) Require Java 6 for release 0.16.
[ https://issues.apache.org/jira/browse/HADOOP-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548405 ] bien edited comment on HADOOP-2325 at 12/4/07 1:02 PM: To quote from the bikemonkey link: {quote} By downloading these binaries, you certify that you are a Licensee in good standing under the Java Research License of the Java 2 SDK, and that your access, use, and distribution of code and information you may obtain at this site is subject to the License. Please review the license at http://java.net/jrl.csp, and submit your license acceptance to Sun. {quote} {quote} Licensing The Mac OS X work is based heavily on the BSD Java port, which is licensed under the JRL. The BSDs develop Java under the JRL; FreeBSD has negotiated a license with Sun to distribute FreeBSD Java binaries based on the JRL sources. As the Mac port stabilizes, I am merging my work upstream into the BSD port, and in turn, it is a goal of the FreeBSD Java project to merge their work into OpenJDK. I've signed a Sun Contributor Agreement in preparation for this, and an OpenJDK Porters group has been proposed: http://thread.gmane.org/gmane.comp.java.openjdk.general/630 While the JRL makes this initial port possible, OpenJDK's GPLv2+CE licensing makes development and distribution far simpler. I hope to contribute this work to OpenJDK as soon as is feasible. {quote} So, while IANAL, it appears this can only be used under JRL. But JRL only permits Research Use, defining it as: {quote} Research Use means research, evaluation, or development for the purpose of advancing knowledge, teaching, learning, or customizing the Technology or Modifications for personal use. Research Use expressly excludes use or distribution for direct or indirect commercial (including strategic) gain or advantage. {quote} So I suspect that running hadoop client libraries at a commercial search engine does not fall under Research Use. was (Author: bien): To quote from the bikemonkey link: {quote} Licensing The Mac OS X work is based heavily on the BSD Java port, which is licensed under the JRL. The BSDs develop Java under the JRL; FreeBSD has negotiated a license with Sun to distribute FreeBSD Java binaries based on the JRL sources. As the Mac port stabilizes, I am merging my work upstream into the BSD port, and in turn, it is a goal of the FreeBSD Java project to merge their work into OpenJDK. I've signed a Sun Contributor Agreement in preparation for this, and an OpenJDK Porters group has been proposed: http://thread.gmane.org/gmane.comp.java.openjdk.general/630 While the JRL makes this initial port possible, OpenJDK's GPLv2+CE licensing makes development and distribution far simpler. I hope to contribute this work to OpenJDK as soon as is feasible. {quote} So, while IANAL, it appears this can only be used under JRL. But JRL only permits Research Use, defining it as: {quote} Research Use means research, evaluation, or development for the purpose of advancing knowledge, teaching, learning, or customizing the Technology or Modifications for personal use. Research Use expressly excludes use or distribution for direct or indirect commercial (including strategic) gain or advantage. {quote} So I suspect that running hadoop client libraries at a commercial search engine does not fall under Research Use. Require Java 6 for release 0.16. 
Key: HADOOP-2325 URL: https://issues.apache.org/jira/browse/HADOOP-2325 Project: Hadoop Issue Type: Improvement Components: build Reporter: Doug Cutting Fix For: 0.16.0 We should require Java 6 for release 0.16. Java 6 is now available for OS/X. Hadoop performs much better on Java 6. And, finally, there are features of Java 6 (like 'df') that would be nice to use. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2350) hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows
hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows --- Key: HADOOP-2350 URL: https://issues.apache.org/jira/browse/HADOOP-2350 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Michael Bieniosek Fix For: 0.16.0 Attachments: TestScannerAPI.java I'm attaching a test case that fails. I noticed that if I create a table with two column families, and start a scanner on a row which only has an entry for one column family, the scanner will skip ahead to the row name for which the other column family has an entry. E.g., if I insert rows so my table will look like this:

{code}
row   a:a   b:b
aaa   a:1   nil
bbb   a:2   b:2
ccc   a:3   b:3
{code}

The scanner will tell me my table looks something like this:

{code}
row   a:a   b:b
bbb   a:1   b:2
bbb   a:2   b:3
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
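For context, a minimal sketch of the failing pattern. This is not the attached TestScannerAPI.java; it assumes the 0.15-era HBase client API (startUpdate/put/commit and obtainScanner), so names may differ slightly, and imports are elided.

{code}
HTable table = new HTable(conf, new Text("scannertest"));

// row "aaa" gets a value in family a: only
// (inserts for "bbb" and "ccc", which get both families, omitted for brevity)
long lockid = table.startUpdate(new Text("aaa"));
table.put(lockid, new Text("a:a"), "1".getBytes());
table.commit(lockid);

HScannerInterface scanner =
    table.obtainScanner(new Text[] { new Text("a:"), new Text("b:") }, new Text(""));
HStoreKey key = new HStoreKey();
TreeMap<Text, byte[]> results = new TreeMap<Text, byte[]>();
try {
  while (scanner.next(key, results)) {
    // expected first row: "aaa" with only a:a set
    // observed: the scanner skips ahead and reports "bbb"
    System.out.println(key.getRow() + " -> " + results.keySet());
    results.clear();
  }
} finally {
  scanner.close();
}
{code}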
[jira] Updated: (HADOOP-2350) hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows
[ https://issues.apache.org/jira/browse/HADOOP-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2350: -- Attachment: TestScannerAPI.java Here's a test case which illustrates the problem. hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows --- Key: HADOOP-2350 URL: https://issues.apache.org/jira/browse/HADOOP-2350 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Michael Bieniosek Fix For: 0.16.0 Attachments: TestScannerAPI.java I'm attaching a test case that fails. I noticed that if I create a table with two column families, and start a scanner on a row which only has an entry for one column family, the scanner will skip ahead to the row name for which the other column family has an entry. E.g., if I insert rows so my table will look like this:

{code}
row   a:a   b:b
aaa   a:1   nil
bbb   a:2   b:2
ccc   a:3   b:3
{code}

The scanner will tell me my table looks something like this:

{code}
row   a:a   b:b
bbb   a:1   b:2
bbb   a:2   b:3
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2350) hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows
[ https://issues.apache.org/jira/browse/HADOOP-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548506 ] Michael Bieniosek commented on HADOOP-2350: --- A secondary problem is that HScannerInterface.iterator().next() sometimes returns a Map.Entry whose key and value are both null. I think this may have something to do with the health of my cluster. hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows --- Key: HADOOP-2350 URL: https://issues.apache.org/jira/browse/HADOOP-2350 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Michael Bieniosek Fix For: 0.16.0 Attachments: TestScannerAPI.java I'm attaching a test case that fails. I noticed that if I create a table with two column families, and start a scanner on a row which only has an entry for one column family, the scanner will skip ahead to the row name for which the other column family has an entry. E.g., if I insert rows so my table will look like this:

{code}
row   a:a   b:b
aaa   a:1   nil
bbb   a:2   b:2
ccc   a:3   b:3
{code}

The scanner will tell me my table looks something like this:

{code}
row   a:a   b:b
bbb   a:1   b:2
bbb   a:2   b:3
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2350) hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows
[ https://issues.apache.org/jira/browse/HADOOP-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2350: -- Attachment: TestScannerAPI.java Oops, re-uploaded without the hardcoded hostname. hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows --- Key: HADOOP-2350 URL: https://issues.apache.org/jira/browse/HADOOP-2350 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Michael Bieniosek Fix For: 0.16.0 Attachments: TestScannerAPI.java, TestScannerAPI.java I'm attaching a test case that fails. I noticed that if I create a table with two column families, and start a scanner on a row which only has an entry for one column family, the scanner will skip ahead to the row name for which the other column family has an entry. E.g., if I insert rows so my table will look like this:

{code}
row   a:a   b:b
aaa   a:1   nil
bbb   a:2   b:2
ccc   a:3   b:3
{code}

The scanner will tell me my table looks something like this:

{code}
row   a:a   b:b
bbb   a:1   b:2
bbb   a:2   b:3
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2350) hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows
[ https://issues.apache.org/jira/browse/HADOOP-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548507 ] Michael Bieniosek commented on HADOOP-2350: --- This was not a problem in release 0.15; it has only occurred since we moved to trunk. hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows --- Key: HADOOP-2350 URL: https://issues.apache.org/jira/browse/HADOOP-2350 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Michael Bieniosek Fix For: 0.16.0 Attachments: TestScannerAPI.java, TestScannerAPI.java I'm attaching a test case that fails. I noticed that if I create a table with two column families, and start a scanner on a row which only has an entry for one column family, the scanner will skip ahead to the row name for which the other column family has an entry. E.g., if I insert rows so my table will look like this:

{code}
row   a:a   b:b
aaa   a:1   nil
bbb   a:2   b:2
ccc   a:3   b:3
{code}

The scanner will tell me my table looks something like this:

{code}
row   a:a   b:b
bbb   a:1   b:2
bbb   a:2   b:3
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2350) hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows
[ https://issues.apache.org/jira/browse/HADOOP-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2350: -- Attachment: (was: TestScannerAPI.java) hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows --- Key: HADOOP-2350 URL: https://issues.apache.org/jira/browse/HADOOP-2350 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Michael Bieniosek Fix For: 0.16.0 Attachments: TestScannerAPI.java I'm attaching a test case that fails. I noticed that if I create a table with two column families, and start a scanner on a row which only has an entry for one column family, the scanner will skip ahead to the row name for which the other column family has an entry. E.g., if I insert rows so my table will look like this:

{code}
row   a:a   b:b
aaa   a:1   nil
bbb   a:2   b:2
ccc   a:3   b:3
{code}

The scanner will tell me my table looks something like this:

{code}
row   a:a   b:b
bbb   a:1   b:2
bbb   a:2   b:3
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2341) Datanode active connections never returns to 0
[ https://issues.apache.org/jira/browse/HADOOP-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548092 ] Michael Bieniosek commented on HADOOP-2341: --- I am seeing this too. If I look at datanode:50075/stacks, I see 70 threads stuck with:

{code}
Thread 3023977 ([EMAIL PROTECTED]):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.net.SocketOutputStream.socketWrite0(Native Method)
    java.net.SocketOutputStream.socketWrite(Unknown Source)
    java.net.SocketOutputStream.write(Unknown Source)
    java.io.BufferedOutputStream.flushBuffer(Unknown Source)
    java.io.BufferedOutputStream.write(Unknown Source)
    java.io.DataOutputStream.write(Unknown Source)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1175)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1208)
    org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:850)
    org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:801)
    java.lang.Thread.run(Unknown Source)
{code}

Datanode active connections never returns to 0 -- Key: HADOOP-2341 URL: https://issues.apache.org/jira/browse/HADOOP-2341 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.16.0 Reporter: Paul Saab On trunk I continue to see the following in my data node logs:

2007-12-03 15:46:47,696 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 42
2007-12-03 15:46:48,135 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 41
2007-12-03 15:46:48,439 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 40
2007-12-03 15:46:48,479 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 39
2007-12-03 15:46:48,611 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 38
2007-12-03 15:46:48,898 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 37
2007-12-03 15:46:48,989 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 36
2007-12-03 15:46:51,010 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 35
2007-12-03 15:46:51,758 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 34
2007-12-03 15:46:52,148 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 33

This number never returns to 0, even after many hours of no new data being manipulated or added into the DFS. Looking at netstat -tn I see a significant amount of data in the send-q that never goes away:

tcp 0 34240 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:55792 ESTABLISHED
tcp 0 38968 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:38169 ESTABLISHED
tcp 0 38456 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:35456 ESTABLISHED
tcp 0 29640 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:59845 ESTABLISHED
tcp 0 50168 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:44584 ESTABLISHED

When sniffing the network I see that the remote side (YY.YY.YY.YY) is returning a window size of 0:

16:11:41.760474 IP XX.XX.XX.XXX.50010 > YY.YY.YY.YY.44584: . ack 3339984123 win 46 <nop,nop,timestamp 1786247180 885681789>
16:11:41.761597 IP YY.YY.YY.YY.44584 > XX.XX.XX.XXX.50010: . ack 1 win 0 <nop,nop,timestamp 885801786 1775711351>

When we look at the stack traces on each datanode, there are tons of threads that *never* go away, all stuck in the following trace:

{code}
Thread 6516 ([EMAIL PROTECTED]):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.net.SocketOutputStream.socketWrite0(Native Method)
    java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
    java.io.DataOutputStream.write(DataOutputStream.java:90)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1400)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1433)
    org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:904)
    org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:849)
    java.lang.Thread.run(Thread.java:619)
{code}

Unfortunately there's very little in the logs with exceptions that could point to this. I have some exceptions like the following, but nothing that points to problems between XX and YY:

{code}
2007-12-02 11:19:47,889 WARN dfs.DataNode - Unexpected error trying to delete block blk_4515246476002110310. Block not found in blockMap.
2007-12-02 11:19:47,922 WARN dfs.DataNode - java.io.IOException: Error in deleting blocks.
{code}
[jira] Commented: (HADOOP-2341) Datanode active connections never returns to 0
[ https://issues.apache.org/jira/browse/HADOOP-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548101 ] Michael Bieniosek commented on HADOOP-2341: --- In DataNode.BlockSender.sendBlock, we have

{code}
while (endOffset > offset) {
  // Write one data chunk per loop.
  long len = sendChunk();
  offset += len;
  totalRead += len + checksumSize;
}
{code}

In the BlockSender constructor, we have

{code}
if (length >= 0) {
  // Make sure endOffset points to end of a checksumed chunk.
  long tmpLen = startOffset + length + (startOffset - offset);
  if (tmpLen % bytesPerChecksum != 0) {
    tmpLen += (bytesPerChecksum - tmpLen % bytesPerChecksum);
  }
  if (tmpLen < endOffset) {
    endOffset = tmpLen;
  }
}
{code}

So in some cases, endOffset can include extra bytes for checksums in the constructor. However, checksum bytes are never added to offset when it is compared to endOffset in the sendBlock method. I believe this may be the problem. Datanode active connections never returns to 0 -- Key: HADOOP-2341 URL: https://issues.apache.org/jira/browse/HADOOP-2341 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.16.0 Reporter: Paul Saab On trunk I continue to see the following in my data node logs:

2007-12-03 15:46:47,696 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 42
2007-12-03 15:46:48,135 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 41
2007-12-03 15:46:48,439 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 40
2007-12-03 15:46:48,479 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 39
2007-12-03 15:46:48,611 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 38
2007-12-03 15:46:48,898 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 37
2007-12-03 15:46:48,989 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 36
2007-12-03 15:46:51,010 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 35
2007-12-03 15:46:51,758 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 34
2007-12-03 15:46:52,148 DEBUG dfs.DataNode - XX.XX.XX.XXX:50010:Number of active connections is: 33

This number never returns to 0, even after many hours of no new data being manipulated or added into the DFS. Looking at netstat -tn I see a significant amount of data in the send-q that never goes away:

tcp 0 34240 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:55792 ESTABLISHED
tcp 0 38968 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:38169 ESTABLISHED
tcp 0 38456 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:35456 ESTABLISHED
tcp 0 29640 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:59845 ESTABLISHED
tcp 0 50168 :::XX.XX.XX.XXX:50010 :::YY.YY.YY.YY:44584 ESTABLISHED

When sniffing the network I see that the remote side (YY.YY.YY.YY) is returning a window size of 0:

16:11:41.760474 IP XX.XX.XX.XXX.50010 > YY.YY.YY.YY.44584: . ack 3339984123 win 46 <nop,nop,timestamp 1786247180 885681789>
16:11:41.761597 IP YY.YY.YY.YY.44584 > XX.XX.XX.XXX.50010: . ack 1 win 0 <nop,nop,timestamp 885801786 1775711351>

When we look at the stack traces on each datanode, there are tons of threads that *never* go away, all stuck in the following trace:

{code}
Thread 6516 ([EMAIL PROTECTED]):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.net.SocketOutputStream.socketWrite0(Native Method)
    java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
    java.io.DataOutputStream.write(DataOutputStream.java:90)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1400)
    org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1433)
    org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:904)
    org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:849)
    java.lang.Thread.run(Thread.java:619)
{code}

Unfortunately there's very little in the logs with exceptions that could point to this. I have some exceptions like the following, but nothing that points to problems between XX and YY:

{code}
2007-12-02 11:19:47,889 WARN dfs.DataNode - Unexpected error trying to delete block blk_4515246476002110310. Block not found in blockMap.
2007-12-02 11:19:47,922 WARN dfs.DataNode -
{code}
[jira] Updated: (HADOOP-2297) [Hbase Shell] System.exit() Handling in Jar command
[ https://issues.apache.org/jira/browse/HADOOP-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2297: -- Attachment: Capture.java Hey Edward, I think I figured out how to suppress System.exit and capture stdout/stderr. Here, System.exit throws a SecurityException, which can be caught further up the stack. In this case, I catch it in a Thread.UncaughtExceptionHandler since I am running misbehaved threads. I also wrote a class that captures stdout/stderr. Since Java only lets me set one PrintStream to capture stdout per JVM, I have to check Thread.currentThread() and then decide where to write the captured output. I am hoping to incorporate some of this code into my custom jetty server that submits hadoop jobs. [Hbase Shell] System.exit() Handling in Jar command --- Key: HADOOP-2297 URL: https://issues.apache.org/jira/browse/HADOOP-2297 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.15.0 Reporter: Edward Yoon Assignee: Edward Yoon Fix For: 0.16.0 Attachments: 2297_v02.patch, 2297_v03.patch, Capture.java I'd like to block the exitVM triggered by System.exit(); the shell should terminate only via the quit command. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
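A minimal sketch of the System.exit-suppression technique described above; the attached Capture.java may differ, and the class and message strings here are illustrative. A SecurityManager turns exit into a catchable SecurityException, which a Thread.UncaughtExceptionHandler can then observe:

{code}
public class NoExit {
  static class NoExitSecurityManager extends SecurityManager {
    @Override public void checkPermission(java.security.Permission perm) {
      // permit everything except exitVM
    }
    @Override public void checkExit(int status) {
      throw new SecurityException("blocked System.exit(" + status + ")");
    }
  }

  public static void main(String[] args) {
    System.setSecurityManager(new NoExitSecurityManager());
    Thread t = new Thread(new Runnable() {
      public void run() { System.exit(1); }  // stands in for misbehaved job code
    });
    t.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
      public void uncaughtException(Thread th, Throwable e) {
        System.out.println("caught: " + e);  // the SecurityException lands here
      }
    });
    t.start();
  }
}
{code}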
[jira] Commented: (HADOOP-1905) Addition of unix ls command to FS command
[ https://issues.apache.org/jira/browse/HADOOP-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546328 ] Michael Bieniosek commented on HADOOP-1905: --- I agree with Enis. I think ftp lets you do {code}! ls{code} and inside gdb you can do {code}shell ls{code} Addition of unix ls command to FS command - Key: HADOOP-1905 URL: https://issues.apache.org/jira/browse/HADOOP-1905 Project: Hadoop Issue Type: Sub-task Components: contrib/hbase Affects Versions: 0.14.1 Environment: All environments Reporter: Edward Yoon Priority: Minor Attachments: shell_fs.patch, shell_fs_v01.patch I think an ls command would be a useful one.

{code}
Hbase> fs;
FS  Filesystem commands
    Syntax: Hadoop FsShell operations
    DFS [-option] arguments...;

    Unix ls command
    LS [-option] arguments...;

Hbase> dfs;
Usage: java FsShell
  [-ls <path>]
  [-lsr <path>]
  ...

Hbase> ls;
bin  build  build.xml  CHANGES.txt  conf  docs  index.html  lib  LICENSE.txt  NOTICE.txt  output  README.txt  src
...

Hbase> ls -a ./conf;
.svn  commons-logging.properties  configuration.xsl  hadoop-default.xml  hadoop-env.sh  hadoop-env.sh.template
...
...

Hbase> ls -l ./build;
rwd  0     Sep 10, 2007 11:05 AM  ant
rw-  6662  Sep 10, 2007 2:35 PM   ant-hadoop-0.15.0-dev.jar
rwd  0     Sep 5, 2007 10:05 AM   c++
rwd  0     Sep 6, 2007 2:15 PM    classes
rwd  0     Sep 5, 2007 10:07 AM   contrib
rwd  0     Sep 17, 2007 9:17 AM   docs
...
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
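A hedged sketch of the shell-escape style being suggested, using only the JDK; the "!" parsing and the /bin/sh choice are assumptions, not the patch's behavior:

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class ShellEscape {
  /** Runs a local shell command and echoes its output, like ftp's "! ls". */
  public static void run(String cmd) throws Exception {
    Process p = new ProcessBuilder("/bin/sh", "-c", cmd)
        .redirectErrorStream(true)   // merge stderr into stdout
        .start();
    BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
    for (String line; (line = r.readLine()) != null; ) {
      System.out.println(line);
    }
    p.waitFor();
  }

  public static void main(String[] args) throws Exception {
    run("ls -l");  // what "! ls -l" at the shell prompt would dispatch to
  }
}
{code}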
[jira] Commented: (HADOOP-2266) Provide a command line option to check if a Hadoop jobtracker is idle
[ https://issues.apache.org/jira/browse/HADOOP-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545943 ] Michael Bieniosek commented on HADOOP-2266: --- It's possible to do this through the JobClient API currently. Provide a command line option to check if a Hadoop jobtracker is idle - Key: HADOOP-2266 URL: https://issues.apache.org/jira/browse/HADOOP-2266 Project: Hadoop Issue Type: New Feature Components: mapred Reporter: Hemanth Yamijala Fix For: 0.16.0 This is an RFE for providing a way to determine from the hadoop command line whether a jobtracker is idle. One possibility is to have something like hadoop jobtracker -idle <time>. Hadoop would return true (maybe via some stdout output) if the jobtracker has had no work to do (jobs running / prepared) for the last <time> seconds, false otherwise. This would be useful for management / provisioning systems like Hadoop-On-Demand [HADOOP-1301], which can then deallocate the idle, provisioned clusters automatically, and release resources. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
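A hedged sketch of the JobClient route the comment mentions, assuming the era's API exposes jobsToComplete() (running plus prep jobs). Treat it as an approximation of "idle", since it checks the current instant rather than the <time> window the RFE asks for:

{code}
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobStatus;

public class JobTrackerIdle {
  public static void main(String[] args) throws Exception {
    JobClient client = new JobClient(new JobConf());
    JobStatus[] pending = client.jobsToComplete();  // running + prepared jobs
    boolean idle = (pending == null || pending.length == 0);
    System.out.println(idle ? "idle" : "busy: " + pending.length + " job(s)");
    System.exit(idle ? 0 : 1);  // exit code usable from provisioning scripts
  }
}
{code}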
[jira] Created: (HADOOP-2292) HBaseAdmin.disableTable/enableTable aren't synchronous
HBaseAdmin.disableTable/enableTable aren't synchronous -- Key: HADOOP-2292 URL: https://issues.apache.org/jira/browse/HADOOP-2292 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.15.0 Reporter: Michael Bieniosek I'm trying to programmatically add a column family to a table. I have code that looks like:

{code}
admin.disableTable(table);
try {
  admin.addColumn(table, new HColumnDescriptor(columnName));
} finally {
  admin.enableTable(table);
}
HTable ht = new HTable(config, table);
{code}

Two things sometimes go wrong here:
1. addColumn fails because the table is not disabled
2. new HTable() fails because the table is not enabled
I suspect that the enableTable/disableTable calls are not synchronous, i.e. they return before they are finished. I can work around this problem by inserting Thread.sleep() calls after the enableTable and disableTable calls. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
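A hedged sketch of the Thread.sleep workaround described above, with a bounded retry loop instead of a single fixed sleep. HBaseAdmin, Text, and HColumnDescriptor are the era's client classes from the report; the retry loop itself is illustrative, not HBase API:

{code}
void addColumnWithRetry(HBaseAdmin admin, Text table, String columnName)
    throws IOException, InterruptedException {
  admin.disableTable(table);
  try {
    IOException last = null;
    for (int attempt = 0; attempt < 10; attempt++) {
      try {
        admin.addColumn(table, new HColumnDescriptor(columnName));
        return;                  // success; the finally block still re-enables
      } catch (IOException e) {
        last = e;                // the disable may not have completed yet
        Thread.sleep(1000);
      }
    }
    throw last;
  } finally {
    admin.enableTable(table);
  }
}
{code}

The same pattern (retry with backoff) would be needed around the new HTable() call until the enable completes, which is why a truly synchronous disableTable/enableTable is the better fix.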
[jira] Commented: (HADOOP-2261) [hbase] Change abort to finalize; does nothing if commit ran successfully
[ https://issues.apache.org/jira/browse/HADOOP-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546006 ] Michael Bieniosek commented on HADOOP-2261: --- No, I don't think you understand my complaint. The problem with the current API is that I have to explicitly catch all the exceptions that my code can throw, so I can catch them and call abort(). It is error-prone to list all the exceptions my code could possibly throw and attempt to catch and rethrow them. For example, it is easy to forget RuntimeException and Error. You can't just catch Throwable, because then you have to rethrow a Throwable, which changes the method's exception signature. When I wrote code against Hibernate's API, I did (as far as I remember) something like:

{code}
Transaction transaction = session.beginTransaction();
try {
  ...
  transaction.commit();
} finally {
  transaction.rollback();
}
{code}

This had the nice property that calling rollback() after commit() was a no-op. This is what I'd like to see in hbase. [hbase] Change abort to finalize; does nothing if commit ran successfully - Key: HADOOP-2261 URL: https://issues.apache.org/jira/browse/HADOOP-2261 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack Assignee: Jim Kellerman Fix For: 0.16.0 Attachments: patch.txt From Michael Bieniosek:

{code}
I'm trying to do a row update, so I write code like:

long lockid = table.startUpdate(new Text(article.getName()));
try {
  for (File articleInfo: article.listFiles(new NonDirectories())) {
    articleTable.put(lockid, columnName(articleInfo.getName()), readFile(articleInfo));
  }
  table.commit(lockid);
} finally {
  table.abort(lockid);
}

This doesn't work, because in the normal case it calls abort after commit. But I'm not sure what the code should be, e.g.:

long lockid = table.startUpdate(new Text(article.getName()));
try {
  for (File articleInfo: article.listFiles(new NonDirectories())) {
    articleTable.put(lockid, columnName(articleInfo.getName()), readFile(articleInfo));
  }
  table.commit(lockid);
} catch (IOException e) {
  table.abort(lockid);
  throw e;
} catch (RuntimeException e) {
  table.abort(lockid);
  throw e;
}

This gets unwieldy very quickly. Could you maybe change abort() to finalize() which either aborts or does nothing if a commit was successful?
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
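A hedged sketch of that no-op-after-commit behavior as a client-side wrapper over the era's HTable lock API (startUpdate/put/commit/abort); the RowUpdate class is hypothetical, not part of HBase:

{code}
class RowUpdate {
  private final HTable table;
  private final long lockid;
  private boolean committed = false;

  RowUpdate(HTable table, Text row) throws IOException {
    this.table = table;
    this.lockid = table.startUpdate(row);
  }

  void put(Text column, byte[] value) throws IOException {
    table.put(lockid, column, value);
  }

  void commit() throws IOException {
    table.commit(lockid);
    committed = true;
  }

  /** Safe to call unconditionally from finally: does nothing after commit(). */
  void close() throws IOException {
    if (!committed) {
      table.abort(lockid);
    }
  }
}
{code}

With such a wrapper the description's pattern becomes commit() in the try body and close() in the finally block, and a close() after a successful commit is a no-op, which is exactly the Hibernate-style behavior the comment asks for.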
[jira] Created: (HADOOP-2296) hbase shell: phantom columns show up from select command
hbase shell: phantom columns show up from select command Key: HADOOP-2296 URL: https://issues.apache.org/jira/browse/HADOOP-2296 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.15.0 Reporter: Michael Bieniosek

{code}
Hbase> select * from hbase_test where row='2';
+--------+------+
| Column | Cell |
+--------+------+
| test:a | a    |
+--------+------+
| test:c | c    |
+--------+------+
2 row(s) in set (0.00 sec)

Hbase> select * from hbase_test where row='1';
+--------+------+
| Column | Cell |
+--------+------+
| test:a | a    |
+--------+------+
| test:b | b    |
+--------+------+
2 row(s) in set (0.00 sec)

Hbase> select * from hbase_test;
+-----+--------+------+
| Row | Column | Cell |
+-----+--------+------+
| 1   | test:a | a    |
+-----+--------+------+
| 1   | test:b | b    |
+-----+--------+------+
| 2   | test:a | a    |
+-----+--------+------+
| 2   | test:b | b    |
+-----+--------+------+
| 2   | test:c | c    |
+-----+--------+------+
5 row(s) in set (0.14 sec)
{code}

Note the phantom value for test:b in row 2. I looked at the code, and it looks like SelectCommand.scanPrint incorrectly fails to call results.clear() every time it calls scan.next(). However, I also think that HScannerInterface.next(HStoreKey key, SortedMap<Text, byte[]> results) is confusing, since it requires the user to call results.clear() and key.clear() before calling next each time. Since the Iterable interface that provides the zero-arg next has been added, I suggest that it might be worthwhile to deprecate the two-arg next. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
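A sketch of the loop shape the two-arg next() requires (types per the era's API, imports elided), showing where the missing clear() lets stale cells leak:

{code}
HStoreKey key = new HStoreKey();
TreeMap<Text, byte[]> results = new TreeMap<Text, byte[]>();
while (scanner.next(key, results)) {
  System.out.println(key.getRow() + ": " + results.keySet());
  results.clear();  // without this, the previous row's cells bleed into the next row
}
{code}

Dropping that clear() is precisely what produces the phantom test:b for row 2 above, which is why the zero-arg Iterable-style next is the less error-prone interface.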
[jira] Commented: (HADOOP-2261) [hbase] Change abort to finalize; does nothing if commit ran successfully
[ https://issues.apache.org/jira/browse/HADOOP-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545614 ] Michael Bieniosek commented on HADOOP-2261: --- The problem with your suggestion is that the enclosing function then needs to throw Exception. [hbase] Change abort to finalize; does nothing if commit ran successfully - Key: HADOOP-2261 URL: https://issues.apache.org/jira/browse/HADOOP-2261 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack Assignee: Jim Kellerman From Michael Bieniosek:

{code}
I'm trying to do a row update, so I write code like:

long lockid = table.startUpdate(new Text(article.getName()));
try {
  for (File articleInfo: article.listFiles(new NonDirectories())) {
    articleTable.put(lockid, columnName(articleInfo.getName()), readFile(articleInfo));
  }
  table.commit(lockid);
} finally {
  table.abort(lockid);
}

This doesn't work, because in the normal case it calls abort after commit. But I'm not sure what the code should be, e.g.:

long lockid = table.startUpdate(new Text(article.getName()));
try {
  for (File articleInfo: article.listFiles(new NonDirectories())) {
    articleTable.put(lockid, columnName(articleInfo.getName()), readFile(articleInfo));
  }
  table.commit(lockid);
} catch (IOException e) {
  table.abort(lockid);
  throw e;
} catch (RuntimeException e) {
  table.abort(lockid);
  throw e;
}

This gets unwieldy very quickly. Could you maybe change abort() to finalize() which either aborts or does nothing if a commit was successful?
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2262) [hbase] Updating a row on non-existent table runs all the retries and timeouts instead of failing fast
[ https://issues.apache.org/jira/browse/HADOOP-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545674 ] Michael Bieniosek commented on HADOOP-2262: --- Yes, but it contacts several machines and takes a couple of minutes to throw the TableNotFoundException. [hbase] Updating a row on non-existent table runs all the retries and timeouts instead of failing fast -- Key: HADOOP-2262 URL: https://issues.apache.org/jira/browse/HADOOP-2262 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack Assignee: Jim Kellerman Priority: Minor If you try to access a row in a non-existent table, the client hangs waiting on all the timeouts and retries. Rather, it should fail fast if there is no such table. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1650) Upgrade Jetty to 6.x
[ https://issues.apache.org/jira/browse/HADOOP-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543302 ] Michael Bieniosek commented on HADOOP-1650: --- The test failed because Hudson used the old Jetty jar. Upgrade Jetty to 6.x Key: HADOOP-1650 URL: https://issues.apache.org/jira/browse/HADOOP-1650 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Devaraj Das Assignee: Devaraj Das Attachments: hadoop-1650-jetty6.1.5.patch, hadoop-jetty6.1.4-lib.tar.gz, jetty6.1.4.patch This is the third attempt at moving to jetty6. Apparently, jetty-6.1.4 has fixed some of the issues we discovered in jetty during HADOOP-736 and HADOOP-1273. I'd like to keep this issue open for some time so that we have enough time to test things out. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2218) Generalize StatusHttpServer so webdav server can use it
Generalize StatusHttpServer so webdav server can use it --- Key: HADOOP-2218 URL: https://issues.apache.org/jira/browse/HADOOP-2218 Project: Hadoop Issue Type: New Feature Components: fs Affects Versions: 0.15.0 Reporter: Michael Bieniosek I'd like to make HADOOP-496 stand alone, so that I can make a hadoop-webdav jar that works against stock hadoop. The latest HADOOP-496 patch has only a small patch against StatusHttpServer, which generalizes it a little bit to make some private methods protected and changes HttpServlet to Servlet -- the rest is new files. I'd like to get the part against StatusHttpServer committed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HADOOP-2218) Generalize StatusHttpServer so webdav server can use it
[ https://issues.apache.org/jira/browse/HADOOP-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek resolved HADOOP-2218. --- Resolution: Won't Fix Actually, now that I look at it, it's easier just to rewrite the webdav code so it doesn't use StatusHttpServer and instead calls the Jetty API directly. I think a better fix would be to rewrite StatusHttpServer so that it uses a JettyWrapper superclass and doesn't contain any mapreduce-specific code. Generalize StatusHttpServer so webdav server can use it --- Key: HADOOP-2218 URL: https://issues.apache.org/jira/browse/HADOOP-2218 Project: Hadoop Issue Type: New Feature Components: fs Affects Versions: 0.15.0 Reporter: Michael Bieniosek I'd like to make HADOOP-496 stand alone, so that I can make a hadoop-webdav jar that works against stock hadoop. The latest HADOOP-496 patch has only a small patch against StatusHttpServer, which generalizes it a little bit to make some private methods protected and changes HttpServlet to Servlet -- the rest is new files. I'd like to get the part against StatusHttpServer committed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
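For reference, a minimal sketch of driving the Jetty 6 API directly, in the spirit of the proposed rewrite; the port is a placeholder and the inline servlet stands in for whatever servlet the webdav code provides. This is not the JettyWrapper design itself.

{code}
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.mortbay.jetty.Server;
import org.mortbay.jetty.servlet.Context;
import org.mortbay.jetty.servlet.ServletHolder;

public class MiniJetty {
  public static void main(String[] args) throws Exception {
    Server server = new Server(9800);                     // placeholder port
    Context root = new Context(server, "/", Context.SESSIONS);
    root.addServlet(new ServletHolder(new HttpServlet() {
      protected void doGet(HttpServletRequest req, HttpServletResponse resp)
          throws java.io.IOException {
        resp.getWriter().println("ok");                   // stand-in handler
      }
    }), "/*");
    server.start();
    server.join();
  }
}
{code}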
[jira] Commented: (HADOOP-2024) Make StatusHttpServer (usefully) subclassable
[ https://issues.apache.org/jira/browse/HADOOP-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543176 ] Michael Bieniosek commented on HADOOP-2024: --- See also the StatusHttpServer portion of the webdav_wip2 patch on HADOOP-496 for more things that can be generalized. Also, the constructor should probably not add the TaskGraphServlet. Make StatusHttpServer (usefully) subclassable - Key: HADOOP-2024 URL: https://issues.apache.org/jira/browse/HADOOP-2024 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: stack Priority: Minor Attachments: statushttpserver.patch hbase puts up webapps modelled on those deployed by dfs and mapreduce. Currently it does this by copying the bulk of StatusHttpServer down to hbase util as a new class named InfoServer. StatusHttpServer is copied rather than subclassed because I need access to the currently-private resource loading. As is, understandably, all webapp-related resources are presumed under the first 'webapps' directory found. It doesn't allow for the new condition where some resources can be found in hadoop and then others in hbase. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-496) Expose HDFS as a WebDAV store
[ https://issues.apache.org/jira/browse/HADOOP-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-496: - Attachment: hadoop-496-4.patch I deleted some unused code, removed some excess logging, removed the usage of StatusHttpServer so the patch no longer modifies core hadoop code, and added a separate start script that can take the namenode as a command-line argument to the server. I also moved the webdav package to org.apache.hadoop.fs.webdav. Currently writing to the DFS does not work, but I can browse and copy files out of the DFS (with a Mac OS X webdav mount). I think this could become a separate (small) contrib project, since hadoop proper does not rely on it. Expose HDFS as a WebDAV store - Key: HADOOP-496 URL: https://issues.apache.org/jira/browse/HADOOP-496 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Michel Tourn Assignee: Enis Soztutar Attachments: hadoop-496-3.patch, hadoop-496-4.patch, hadoop-496-spool-cleanup.patch, hadoop-webdav.zip, jetty-slide.xml, lib.webdav.tar.gz, screenshot-1.jpg, slideusers.properties, webdav_wip1.patch, webdav_wip2.patch WebDAV stands for Distributed Authoring and Versioning. It is a set of extensions to the HTTP protocol that lets users collaboratively edit and manage files on a remote web server. It is often considered a replacement for NFS or SAMBA. HDFS (Hadoop Distributed File System) needs a friendly file system interface. DFSShell commands are unfamiliar. Instead it is more convenient for Hadoop users to use a mountable network drive. A friendly interface to HDFS will be used both for casual browsing of data and for bulk import/export. The FUSE provider for HDFS is already available ( http://issues.apache.org/jira/browse/HADOOP-17 ) but it had scalability problems. WebDAV is a popular alternative. The typical licensing terms for WebDAV tools are also attractive: GPL for Linux client tools that Hadoop would not redistribute anyway. More importantly, Apache Project/Apache license for Java tools and for server components. This allows for a tighter integration with the HDFS code base. There are some interesting Apache projects that support WebDAV, but these are probably too heavyweight for the needs of Hadoop:
Tomcat servlet: http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/servlets/WebdavServlet.html
Slide: http://jakarta.apache.org/slide/
Being HTTP-based and backwards-compatible with Web Browser clients, the WebDAV server protocol could even be piggy-backed on the existing Web UI ports of the Hadoop name node / data nodes. WebDAV can be hosted as (Jetty) servlets. This minimizes server code bloat and avoids additional network traffic between HDFS and the WebDAV server.
General clients (read-only): any web browser
Linux clients: mountable (GPL) davfs2 http://dav.sourceforge.net/ ; FTP-like (GPL) Cadaver http://www.webdav.org/cadaver/
Server protocol compliance tests: http://www.webdav.org/neon/litmus/ (a goal is for Hadoop HDFS to pass this test, minus support for Properties)
Pure Java clients: DAV Explorer (Apache license) http://www.ics.uci.edu/~webdav/
WebDAV also makes it convenient to add advanced features in an incremental fashion: file locking, access control lists, hard links, symbolic links. New WebDAV standards get accepted and more or less featured WebDAV clients exist.
core: http://www.webdav.org/specs/rfc2518.html
ACLs: http://www.webdav.org/specs/rfc3744.html
redirects / soft links: http://greenbytes.de/tech/webdav/rfc4437.html
BIND / hard links: http://www.webdav.org/bind/
quota: http://tools.ietf.org/html/rfc4331
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1650) Upgrade Jetty to 6.x
[ https://issues.apache.org/jira/browse/HADOOP-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1650: -- Status: Patch Available (was: Open) Upgrade Jetty to 6.x Key: HADOOP-1650 URL: https://issues.apache.org/jira/browse/HADOOP-1650 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Devaraj Das Assignee: Devaraj Das Attachments: hadoop-1650-jetty6.1.5.patch, hadoop-jetty6.1.4-lib.tar.gz, jetty6.1.4.patch This is the third attempt at moving to jetty6. Apparently, jetty-6.1.4 has fixed some of the issues we discovered in jetty during HADOOP-736 and HADOOP-1273. I'd like to keep this issue open for some time so that we have enough time to test things out. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1650) Upgrade Jetty to 6.x
[ https://issues.apache.org/jira/browse/HADOOP-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1650: -- Attachment: hadoop-1650-jetty6.1.5.patch Updated to trunk and jetty-6.1.5, and did some cleanup based on findbugs/javac warnings in the previous patch. Upgrade Jetty to 6.x Key: HADOOP-1650 URL: https://issues.apache.org/jira/browse/HADOOP-1650 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Devaraj Das Assignee: Devaraj Das Attachments: hadoop-1650-jetty6.1.5.patch, hadoop-jetty6.1.4-lib.tar.gz, jetty6.1.4.patch This is the third attempt at moving to jetty6. Apparently, jetty-6.1.4 has fixed some of the issues we discovered in jetty during HADOOP-736 and HADOOP-1273. I'd like to keep this issue open for some time so that we have enough time to test things out. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1650) Upgrade Jetty to 6.x
[ https://issues.apache.org/jira/browse/HADOOP-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543242 ] Michael Bieniosek commented on HADOOP-1650: --- I didn't port the hbase code in my latest patch. Upgrade Jetty to 6.x Key: HADOOP-1650 URL: https://issues.apache.org/jira/browse/HADOOP-1650 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Devaraj Das Assignee: Devaraj Das Attachments: hadoop-1650-jetty6.1.5.patch, hadoop-jetty6.1.4-lib.tar.gz, jetty6.1.4.patch This is the third attempt at moving to jetty6. Apparently, jetty-6.1.4 has fixed some of the issues we discovered in jetty during HADOOP-736 and HADOOP-1273. I'd like to keep this issue open for some time so that we have enough time to test things out. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2212) java.lang.ArithmeticException: / by zero in ChecksumFileSystem.open
[ https://issues.apache.org/jira/browse/HADOOP-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2212: -- Attachment: hadoop-2212.patch java.lang.ArithmeticException: / by zero in ChecksumFileSystem.open --- Key: HADOOP-2212 URL: https://issues.apache.org/jira/browse/HADOOP-2212 Project: Hadoop Issue Type: Bug Components: fs Affects Versions: 0.15.0 Reporter: Michael Bieniosek Priority: Critical Attachments: hadoop-2212.patch The ChecksumFileSystem uses a default bytesPerChecksum value of zero. This number appears as a divisor in ChecksumFileSystem.getSumBufferSize if it is not overridden in the config. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2212) java.lang.ArithmeticException: / by zero in ChecksumFileSystem.open
[ https://issues.apache.org/jira/browse/HADOOP-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2212: -- Status: Patch Available (was: Open) Here is a patch that applies against the 0.15.0 release. java.lang.ArithmeticException: / by zero in ChecksumFileSystem.open --- Key: HADOOP-2212 URL: https://issues.apache.org/jira/browse/HADOOP-2212 Project: Hadoop Issue Type: Bug Components: fs Affects Versions: 0.15.0 Reporter: Michael Bieniosek Priority: Critical Attachments: hadoop-2212.patch The ChecksumFileSystem uses a default bytesPerChecksum value of zero. This number appears as a divisor in ChecksumFileSystem.getSumBufferSize if it is not overridden in the config. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2212) java.lang.ArithmeticException: / by zero in ChecksumFileSystem.open
java.lang.ArithmeticException: / by zero in ChecksumFileSystem.open --- Key: HADOOP-2212 URL: https://issues.apache.org/jira/browse/HADOOP-2212 Project: Hadoop Issue Type: Bug Components: fs Affects Versions: 0.15.0 Reporter: Michael Bieniosek Priority: Critical Attachments: hadoop-2212.patch The ChecksumFileSystem uses a default bytesPerChecksum value of zero. This number appears as a divisor in ChecksumFileSystem.getSumBufferSize if it is not overridden in the config. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
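An illustrative sketch of the failure mode (not the ChecksumFileSystem source; the surrounding arithmetic is assumed): a config key read with a default of 0 becomes a divisor, so any site that never sets the key divides by zero.

{code}
import org.apache.hadoop.conf.Configuration;

public class SumBufferDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    int bufferSize = 4096;
    // With a default of 0 and no value set in the config files, the
    // division below throws java.lang.ArithmeticException: / by zero.
    int bytesPerSum = conf.getInt("io.bytes.per.checksum", 0);
    int sumBufferSize = bufferSize / bytesPerSum;  // getSumBufferSize-style divide
    System.out.println(sumBufferSize);
    // The fix is a sane non-zero default, e.g.:
    // int bytesPerSum = conf.getInt("io.bytes.per.checksum", 512);
  }
}
{code}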
[jira] Updated: (HADOOP-496) Expose HDFS as a WebDAV store
[ https://issues.apache.org/jira/browse/HADOOP-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-496: - Attachment: hadoop-496-3.patch I implemented the DFSDavResource.spool method. This allows data to be copied out (previously a GET on any file returned an empty body). I also ported to hadoop trunk. On an unrelated note, I think the webdav sources should be in the org.apache.hadoop.fs directory, not org.apache.hadoop.dfs, since there is nothing specific to the dfs about this patch. Expose HDFS as a WebDAV store - Key: HADOOP-496 URL: https://issues.apache.org/jira/browse/HADOOP-496 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Michel Tourn Assignee: Enis Soztutar Attachments: hadoop-496-3.patch, hadoop-webdav.zip, jetty-slide.xml, lib.webdav.tar.gz, screenshot-1.jpg, slideusers.properties, webdav_wip1.patch, webdav_wip2.patch WebDAV stands for Distributed Authoring and Versioning. It is a set of extensions to the HTTP protocol that lets users collaboratively edit and manage files on a remote web server. It is often considered as a replacement for NFS or SAMBA HDFS (Hadoop Distributed File System) needs a friendly file system interface. DFSShell commands are unfamiliar. Instead it is more convenient for Hadoop users to use a mountable network drive. A friendly interface to HDFS will be used both for casual browsing of data and for bulk import/export. The FUSE provider for HDFS is already available ( http://issues.apache.org/jira/browse/HADOOP-17 ) but it had scalability problems. WebDAV is a popular alternative. The typical licensing terms for WebDAV tools are also attractive: GPL for Linux client tools that Hadoop would not redistribute anyway. More importantly, Apache Project/Apache license for Java tools and for server components. This allows for a tighter integration with the HDFS code base. There are some interesting Apache projects that support WebDAV. But these are probably too heavyweight for the needs of Hadoop: Tomcat servlet: http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/servlets/WebdavServlet.html Slide: http://jakarta.apache.org/slide/ Being HTTP-based and backwards-compatible with Web Browser clients, the WebDAV server protocol could even be piggy-backed on the existing Web UI ports of the Hadoop name node / data nodes. WebDAV can be hosted as (Jetty) servlets. This minimizes server code bloat and this avoids additional network traffic between HDFS and the WebDAV server. General Clients (read-only): Any web browser Linux Clients: Mountable GPL davfs2 http://dav.sourceforge.net/ FTP-like GPL Cadaver http://www.webdav.org/cadaver/ Server Protocol compliance tests: http://www.webdav.org/neon/litmus/ A goal is for Hadoop HDFS to pass this test (minus support for Properties) Pure Java clients: DAV Explorer Apache lic. http://www.ics.uci.edu/~webdav/ WebDAV also makes it convenient to add advanced features in an incremental fashion: file locking, access control lists, hard links, symbolic links. New WebDAV standards get accepted and more or less featured WebDAV clients exist. core http://www.webdav.org/specs/rfc2518.html ACLs http://www.webdav.org/specs/rfc3744.html redirects soft links http://greenbytes.de/tech/webdav/rfc4437.html BIND hard links http://www.webdav.org/bind/ quota http://tools.ietf.org/html/rfc4331 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
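As a rough idea of what a spool implementation does, the following sketch copies a file's bytes to the response stream. The class and method names here are assumptions for illustration, not the patch's actual DFSDavResource.spool code:
{{{
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch of a spool-style copy loop.
public class SpoolSketch {
  static void spool(InputStream in, OutputStream out) throws IOException {
    byte[] buf = new byte[4096];
    for (int n; (n = in.read(buf)) != -1; ) {
      out.write(buf, 0, n); // before the patch nothing was written, hence empty GET bodies
    }
    out.flush();
  }
}
}}}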
[jira] Updated: (HADOOP-496) Expose HDFS as a WebDAV store
[ https://issues.apache.org/jira/browse/HADOOP-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-496: - Attachment: hadoop-496-spool-cleanup.patch Here's a cleaner version of the patch. Expose HDFS as a WebDAV store - Key: HADOOP-496 URL: https://issues.apache.org/jira/browse/HADOOP-496 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Michel Tourn Assignee: Enis Soztutar Attachments: hadoop-496-3.patch, hadoop-496-spool-cleanup.patch, hadoop-webdav.zip, jetty-slide.xml, lib.webdav.tar.gz, screenshot-1.jpg, slideusers.properties, webdav_wip1.patch, webdav_wip2.patch WebDAV stands for Distributed Authoring and Versioning. It is a set of extensions to the HTTP protocol that lets users collaboratively edit and manage files on a remote web server. It is often considered as a replacement for NFS or SAMBA HDFS (Hadoop Distributed File System) needs a friendly file system interface. DFSShell commands are unfamiliar. Instead it is more convenient for Hadoop users to use a mountable network drive. A friendly interface to HDFS will be used both for casual browsing of data and for bulk import/export. The FUSE provider for HDFS is already available ( http://issues.apache.org/jira/browse/HADOOP-17 ) but it had scalability problems. WebDAV is a popular alternative. The typical licensing terms for WebDAV tools are also attractive: GPL for Linux client tools that Hadoop would not redistribute anyway. More importantly, Apache Project/Apache license for Java tools and for server components. This allows for a tighter integration with the HDFS code base. There are some interesting Apache projects that support WebDAV. But these are probably too heavyweight for the needs of Hadoop: Tomcat servlet: http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/servlets/WebdavServlet.html Slide: http://jakarta.apache.org/slide/ Being HTTP-based and backwards-compatible with Web Browser clients, the WebDAV server protocol could even be piggy-backed on the existing Web UI ports of the Hadoop name node / data nodes. WebDAV can be hosted as (Jetty) servlets. This minimizes server code bloat and this avoids additional network traffic between HDFS and the WebDAV server. General Clients (read-only): Any web browser Linux Clients: Mountable GPL davfs2 http://dav.sourceforge.net/ FTP-like GPL Cadaver http://www.webdav.org/cadaver/ Server Protocol compliance tests: http://www.webdav.org/neon/litmus/ A goal is for Hadoop HDFS to pass this test (minus support for Properties) Pure Java clients: DAV Explorer Apache lic. http://www.ics.uci.edu/~webdav/ WebDAV also makes it convenient to add advanced features in an incremental fashion: file locking, access control lists, hard links, symbolic links. New WebDAV standards get accepted and more or less featured WebDAV clients exist. core http://www.webdav.org/specs/rfc2518.html ACLs http://www.webdav.org/specs/rfc3744.html redirects soft links http://greenbytes.de/tech/webdav/rfc4437.html BIND hard links http://www.webdav.org/bind/ quota http://tools.ietf.org/html/rfc4331 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-496) Expose HDFS as a WebDAV store
[ https://issues.apache.org/jira/browse/HADOOP-496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540906 ] Michael Bieniosek commented on HADOOP-496: -- Pete, how are you bridging between fuse and dfs? There is a tgz for fusej-hadoop floating around somewhere, though it is out of date. Expose HDFS as a WebDAV store - Key: HADOOP-496 URL: https://issues.apache.org/jira/browse/HADOOP-496 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Michel Tourn Assignee: Enis Soztutar Attachments: hadoop-webdav.zip, jetty-slide.xml, lib.webdav.tar.gz, screenshot-1.jpg, slideusers.properties, webdav_wip1.patch, webdav_wip2.patch WebDAV stands for Distributed Authoring and Versioning. It is a set of extensions to the HTTP protocol that lets users collaboratively edit and manage files on a remote web server. It is often considered as a replacement for NFS or SAMBA HDFS (Hadoop Distributed File System) needs a friendly file system interface. DFSShell commands are unfamiliar. Instead it is more convenient for Hadoop users to use a mountable network drive. A friendly interface to HDFS will be used both for casual browsing of data and for bulk import/export. The FUSE provider for HDFS is already available ( http://issues.apache.org/jira/browse/HADOOP-17 ) but it had scalability problems. WebDAV is a popular alternative. The typical licensing terms for WebDAV tools are also attractive: GPL for Linux client tools that Hadoop would not redistribute anyway. More importantly, Apache Project/Apache license for Java tools and for server components. This allows for a tighter integration with the HDFS code base. There are some interesting Apache projects that support WebDAV. But these are probably too heavyweight for the needs of Hadoop: Tomcat servlet: http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/servlets/WebdavServlet.html Slide: http://jakarta.apache.org/slide/ Being HTTP-based and backwards-compatible with Web Browser clients, the WebDAV server protocol could even be piggy-backed on the existing Web UI ports of the Hadoop name node / data nodes. WebDAV can be hosted as (Jetty) servlets. This minimizes server code bloat and this avoids additional network traffic between HDFS and the WebDAV server. General Clients (read-only): Any web browser Linux Clients: Mountable GPL davfs2 http://dav.sourceforge.net/ FTP-like GPL Cadaver http://www.webdav.org/cadaver/ Server Protocol compliance tests: http://www.webdav.org/neon/litmus/ A goal is for Hadoop HDFS to pass this test (minus support for Properties) Pure Java clients: DAV Explorer Apache lic. http://www.ics.uci.edu/~webdav/ WebDAV also makes it convenient to add advanced features in an incremental fashion: file locking, access control lists, hard links, symbolic links. New WebDAV standards get accepted and more or less featured WebDAV clients exist. core http://www.webdav.org/specs/rfc2518.html ACLs http://www.webdav.org/specs/rfc3744.html redirects soft links http://greenbytes.de/tech/webdav/rfc4437.html BIND hard links http://www.webdav.org/bind/ quota http://tools.ietf.org/html/rfc4331 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-496) Expose HDFS as a WebDAV store
[ https://issues.apache.org/jira/browse/HADOOP-496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539463 ] Michael Bieniosek commented on HADOOP-496: -- I added some extra debugging to DFSDavResource.java, and it looks like the getHref() function is returning malformed urls: 07/11/01 13:34:56 INFO webdav.DFSDavResource: getHref() for path:/dirs/to/my/file.tgz - http://localhost:20015hdfs%3a//dfs.cluster.powerset.com%3a1/dirs/to/my/file.tgz Expose HDFS as a WebDAV store - Key: HADOOP-496 URL: https://issues.apache.org/jira/browse/HADOOP-496 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Michel Tourn Assignee: Enis Soztutar Attachments: hadoop-webdav.zip, jetty-slide.xml, lib.webdav.tar.gz, slideusers.properties, webdav_wip1.patch, webdav_wip2.patch WebDAV stands for Distributed Authoring and Versioning. It is a set of extensions to the HTTP protocol that lets users collaboratively edit and manage files on a remote web server. It is often considered as a replacement for NFS or SAMBA HDFS (Hadoop Distributed File System) needs a friendly file system interface. DFSShell commands are unfamiliar. Instead it is more convenient for Hadoop users to use a mountable network drive. A friendly interface to HDFS will be used both for casual browsing of data and for bulk import/export. The FUSE provider for HDFS is already available ( http://issues.apache.org/jira/browse/HADOOP-17 ) but it had scalability problems. WebDAV is a popular alternative. The typical licensing terms for WebDAV tools are also attractive: GPL for Linux client tools that Hadoop would not redistribute anyway. More importantly, Apache Project/Apache license for Java tools and for server components. This allows for a tighter integration with the HDFS code base. There are some interesting Apache projects that support WebDAV. But these are probably too heavyweight for the needs of Hadoop: Tomcat servlet: http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/servlets/WebdavServlet.html Slide: http://jakarta.apache.org/slide/ Being HTTP-based and backwards-compatible with Web Browser clients, the WebDAV server protocol could even be piggy-backed on the existing Web UI ports of the Hadoop name node / data nodes. WebDAV can be hosted as (Jetty) servlets. This minimizes server code bloat and this avoids additional network traffic between HDFS and the WebDAV server. General Clients (read-only): Any web browser Linux Clients: Mountable GPL davfs2 http://dav.sourceforge.net/ FTP-like GPL Cadaver http://www.webdav.org/cadaver/ Server Protocol compliance tests: http://www.webdav.org/neon/litmus/ A goal is for Hadoop HDFS to pass this test (minus support for Properties) Pure Java clients: DAV Explorer Apache lic. http://www.ics.uci.edu/~webdav/ WebDAV also makes it convenient to add advanced features in an incremental fashion: file locking, access control lists, hard links, symbolic links. New WebDAV standards get accepted and more or less featured WebDAV clients exist. core http://www.webdav.org/specs/rfc2518.html ACLs http://www.webdav.org/specs/rfc3744.html redirects soft links http://greenbytes.de/tech/webdav/rfc4437.html BIND hard links http://www.webdav.org/bind/ quota http://tools.ietf.org/html/rfc4331 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
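The malformed href in the log above suggests the full DFS URI (hdfs://...) is being appended to the servlet's host:port instead of just the path. A hypothetical fix resolves only the path against the servlet base; the class below is an illustration of that idea, not DFSDavResource code:
{{{
import java.net.URI;

// Hypothetical illustration of well-formed href construction.
public class HrefSketch {
  public static void main(String[] args) throws Exception {
    URI base = new URI("http://localhost:20015/"); // servlet base, taken from the log above
    String path = "/dirs/to/my/file.tgz";          // DFS path only, no scheme or authority
    System.out.println(base.resolve(path));        // http://localhost:20015/dirs/to/my/file.tgz
  }
}
}}}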
[jira] Reopened: (HADOOP-1825) hadoop-daemon.sh script fails if HADOOP_PID_DIR doesn't exist
[ https://issues.apache.org/jira/browse/HADOOP-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek reopened HADOOP-1825: --- This patch is wrong. The first line should be:
{{{
if [ ! -d $HADOOP_PID_DIR ]; then
}}}
Note the $ in front of HADOOP_PID_DIR. hadoop-daemon.sh script fails if HADOOP_PID_DIR doesn't exist - Key: HADOOP-1825 URL: https://issues.apache.org/jira/browse/HADOOP-1825 Project: Hadoop Issue Type: Bug Components: scripts Affects Versions: 0.14.0 Reporter: Michael Bieniosek Priority: Minor Fix For: 0.15.0 Attachments: hadoop-1825.patch If I try to bring up a datanode on a fresh machine, it will fail with this error message: starting datanode, logging to /b/hadoop/logs/hadoop-me-datanode-example.com.out /p/share/hadoop/bin/hadoop-daemon.sh: line 99: /b/hadoop/pid/hadoop-me-datanode.pid: No such file or directory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1825) hadoop-daemon.sh script fails if HADOOP_PID_DIR doesn't exist
[ https://issues.apache.org/jira/browse/HADOOP-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1825: -- Fix Version/s: (was: 0.15.0) Status: Patch Available (was: Reopened) hadoop-daemon.sh script fails if HADOOP_PID_DIR doesn't exist - Key: HADOOP-1825 URL: https://issues.apache.org/jira/browse/HADOOP-1825 Project: Hadoop Issue Type: Bug Components: scripts Affects Versions: 0.14.0 Reporter: Michael Bieniosek Priority: Minor Attachments: hadoop-1825-refix.patch, hadoop-1825.patch If I try to bring up a datanode on a fresh machine, it will fail with this error message: starting datanode, logging to /b/hadoop/logs/hadoop-me-datanode-example.com.out /p/share/hadoop/bin/hadoop-daemon.sh: line 99: /b/hadoop/pid/hadoop-me-datanode.pid: No such file or directory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-496) Expose HDFS as a WebDAV store
[ https://issues.apache.org/jira/browse/HADOOP-496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536733 ] Michael Bieniosek commented on HADOOP-496: -- You can use telnet to connect to the server and send handmade http requests. I tried telnetting to the server and doing a GET /path/to/file.tgz. This gave me a 200 with an empty body. If I try to GET a file that doesn't exist, I get a 404 with an html error page. Expose HDFS as a WebDAV store - Key: HADOOP-496 URL: https://issues.apache.org/jira/browse/HADOOP-496 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Michel Tourn Assignee: Enis Soztutar Attachments: hadoop-webdav.zip, jetty-slide.xml, lib.webdav.tar.gz, slideusers.properties, webdav_wip1.patch, webdav_wip2.patch WebDAV stands for Distributed Authoring and Versioning. It is a set of extensions to the HTTP protocol that lets users collaboratively edit and manage files on a remote web server. It is often considered as a replacement for NFS or SAMBA HDFS (Hadoop Distributed File System) needs a friendly file system interface. DFSShell commands are unfamiliar. Instead it is more convenient for Hadoop users to use a mountable network drive. A friendly interface to HDFS will be used both for casual browsing of data and for bulk import/export. The FUSE provider for HDFS is already available ( http://issues.apache.org/jira/browse/HADOOP-17 ) but it had scalability problems. WebDAV is a popular alternative. The typical licensing terms for WebDAV tools are also attractive: GPL for Linux client tools that Hadoop would not redistribute anyway. More importantly, Apache Project/Apache license for Java tools and for server components. This allows for a tighter integration with the HDFS code base. There are some interesting Apache projects that support WebDAV. But these are probably too heavyweight for the needs of Hadoop: Tomcat servlet: http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/servlets/WebdavServlet.html Slide: http://jakarta.apache.org/slide/ Being HTTP-based and backwards-compatible with Web Browser clients, the WebDAV server protocol could even be piggy-backed on the existing Web UI ports of the Hadoop name node / data nodes. WebDAV can be hosted as (Jetty) servlets. This minimizes server code bloat and this avoids additional network traffic between HDFS and the WebDAV server. General Clients (read-only): Any web browser Linux Clients: Mountable GPL davfs2 http://dav.sourceforge.net/ FTP-like GPL Cadaver http://www.webdav.org/cadaver/ Server Protocol compliance tests: http://www.webdav.org/neon/litmus/ A goal is for Hadoop HDFS to pass this test (minus support for Properties) Pure Java clients: DAV Explorer Apache lic. http://www.ics.uci.edu/~webdav/ WebDAV also makes it convenient to add advanced features in an incremental fashion: file locking, access control lists, hard links, symbolic links. New WebDAV standards get accepted and more or less featured WebDAV clients exist. core http://www.webdav.org/specs/rfc2518.html ACLs http://www.webdav.org/specs/rfc3744.html redirects soft links http://greenbytes.de/tech/webdav/rfc4437.html BIND hard links http://www.webdav.org/bind/ quota http://tools.ietf.org/html/rfc4331 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
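For anyone who would rather script the handmade request than type it into telnet, here is a minimal raw-socket sketch. The host, port, and path are placeholders (the port is taken from the log earlier in this thread):
{{{
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;

// Sends the same handmade GET described in the comment above.
public class RawGet {
  public static void main(String[] args) throws IOException {
    try (Socket s = new Socket("localhost", 20015)) { // placeholder webdav host:port
      Writer w = new OutputStreamWriter(s.getOutputStream(), "US-ASCII");
      w.write("GET /path/to/file.tgz HTTP/1.0\r\n\r\n");
      w.flush();
      BufferedReader r = new BufferedReader(new InputStreamReader(s.getInputStream()));
      for (String line; (line = r.readLine()) != null; ) {
        System.out.println(line); // status line, headers, blank line, then body
      }
    }
  }
}
}}}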
[jira] Commented: (HADOOP-496) Expose HDFS as a WebDAV store
[ https://issues.apache.org/jira/browse/HADOOP-496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536531 ] Michael Bieniosek commented on HADOOP-496: -- Hey, I tried this patch out, and I noticed a few things: 1. The webdav server is hardcoded to bind to localhost, so I changed it to bind to 0.0.0.0 instead. I'd prefer if clients didn't all have to run their own server: if the DNS doesn't match, or the client doesn't want to set up hadoop and configure it, it's much easier. 2. When I actually tried to copy files out, I get a funny error in the client (on Mac OSX, it says There is a problem with the file and it cannot be copied). I wish I could be more helpful, but I don't know how to issue raw HTTP to the webdav server and there's nothing indicative in the webdav server log. 3. If I point an ordinary browser (or wget) at the webdav server, I get a 200 with an empty body for files that exist, and a 404 for files that don't exist. Again, I don't know much about webdav, but it would be nice if you could browse and download with an ordinary browser, as in subversion. It was nice to see this almost work, though it's not really usable for me because of problem 2. Thanks! Expose HDFS as a WebDAV store - Key: HADOOP-496 URL: https://issues.apache.org/jira/browse/HADOOP-496 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Michel Tourn Assignee: Enis Soztutar Attachments: hadoop-webdav.zip, jetty-slide.xml, lib.webdav.tar.gz, slideusers.properties, webdav_wip1.patch, webdav_wip2.patch WebDAV stands for Distributed Authoring and Versioning. It is a set of extensions to the HTTP protocol that lets users collaboratively edit and manage files on a remote web server. It is often considered as a replacement for NFS or SAMBA HDFS (Hadoop Distributed File System) needs a friendly file system interface. DFSShell commands are unfamiliar. Instead it is more convenient for Hadoop users to use a mountable network drive. A friendly interface to HDFS will be used both for casual browsing of data and for bulk import/export. The FUSE provider for HDFS is already available ( http://issues.apache.org/jira/browse/HADOOP-17 ) but it had scalability problems. WebDAV is a popular alternative. The typical licensing terms for WebDAV tools are also attractive: GPL for Linux client tools that Hadoop would not redistribute anyway. More importantly, Apache Project/Apache license for Java tools and for server components. This allows for a tighter integration with the HDFS code base. There are some interesting Apache projects that support WebDAV. But these are probably too heavyweight for the needs of Hadoop: Tomcat servlet: http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/servlets/WebdavServlet.html Slide: http://jakarta.apache.org/slide/ Being HTTP-based and backwards-compatible with Web Browser clients, the WebDAV server protocol could even be piggy-backed on the existing Web UI ports of the Hadoop name node / data nodes. WebDAV can be hosted as (Jetty) servlets. This minimizes server code bloat and this avoids additional network traffic between HDFS and the WebDAV server. General Clients (read-only): Any web browser Linux Clients: Mountable GPL davfs2 http://dav.sourceforge.net/ FTP-like GPL Cadaver http://www.webdav.org/cadaver/ Server Protocol compliance tests: http://www.webdav.org/neon/litmus/ A goal is for Hadoop HDFS to pass this test (minus support for Properties) Pure Java clients: DAV Explorer Apache lic. 
http://www.ics.uci.edu/~webdav/ WebDAV also makes it convenient to add advanced features in an incremental fashion: file locking, access control lists, hard links, symbolic links. New WebDAV standards get accepted and more or less featured WebDAV clients exist. core http://www.webdav.org/specs/rfc2518.html ACLs http://www.webdav.org/specs/rfc3744.html redirects soft links http://greenbytes.de/tech/webdav/rfc4437.html BIND hard links http://www.webdav.org/bind/ quota http://tools.ietf.org/html/rfc4331 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file
[ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536344 ] Michael Bieniosek commented on HADOOP-2073: --- I am attaching a patch that changes file size after writing the data rather than before. That's still not perfect, though, since the datanode could die while the file is being rewritten, or before the file is resized. Datanode corruption if machine dies while writing VERSION file -- Key: HADOOP-2073 URL: https://issues.apache.org/jira/browse/HADOOP-2073 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.14.0 Reporter: Michael Bieniosek Assignee: Raghu Angadi Attachments: versionFileSize.patch Yesterday, due to a bad mapreduce job, some of my machines went on OOM killing sprees and killed a bunch of datanodes, among other processes. Since my monitoring software kept trying to bring up the datanodes, only to have the kernel kill them off again, each machine's datanode was probably killed many times. A large percentage of these datanodes will not come up now, and write this message to the logs: 2007-10-18 00:23:28,076 ERROR org.apache.hadoop.dfs.DataNode: org.apache.hadoop.dfs.InconsistentFSStateException: Directory /hadoop/dfs/data is in an inconsistent state: file VERSION is invalid. When I check, /hadoop/dfs/data/current/VERSION is an empty file. Consequently, I have to delete all the blocks on the datanode and start over. Since the OOM killing sprees happened simultaneously on several datanodes in my DFS cluster, this could have crippled my dfs cluster. I checked the hadoop code, and in org.apache.hadoop.dfs.Storage, I see this:
{{{
/**
 * Write version file.
 *
 * @throws IOException
 */
void write() throws IOException {
  corruptPreUpgradeStorage(root);
  write(getVersionFile());
}

void write(File to) throws IOException {
  Properties props = new Properties();
  setFields(props, this);
  RandomAccessFile file = new RandomAccessFile(to, "rws");
  FileOutputStream out = null;
  try {
    file.setLength(0);
    file.seek(0);
    out = new FileOutputStream(file.getFD());
    props.store(out, null);
  } finally {
    if (out != null) {
      out.close();
    }
    file.close();
  }
}
}}}
So if the datanode dies after file.setLength(0), but before props.store(out, null), the VERSION file will get trashed in the corrupted state I see. Maybe it would be better if this method created a temporary file VERSION.tmp, and then copied it to VERSION, then deleted VERSION.tmp? That way, if VERSION was detected to be corrupt, the datanode could look at VERSION.tmp to recover the data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
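The VERSION.tmp suggestion in the description amounts to a write-then-rename protocol. Here is a minimal sketch of that idea; the class name, the fsync step, and the error handling are illustrative assumptions, not the committed fix:
{{{
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;

// Hypothetical write-temp-then-rename version of Storage.write(File).
public class AtomicVersionWrite {
  static void write(File versionFile, Properties props) throws IOException {
    File tmp = new File(versionFile.getParentFile(), versionFile.getName() + ".tmp");
    FileOutputStream out = new FileOutputStream(tmp);
    try {
      props.store(out, null);
      out.getFD().sync(); // force the bytes to disk before exposing the file
    } finally {
      out.close();
    }
    // On POSIX, rename over an existing file is atomic, so VERSION is always
    // either the old complete copy or the new complete copy.
    if (!tmp.renameTo(versionFile)) {
      throw new IOException("could not rename " + tmp + " to " + versionFile);
    }
  }
}
}}}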
[jira] Commented: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file
[ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536383 ] Michael Bieniosek commented on HADOOP-2073: --- [...] The point is that although this approach does not work for arbitrary file modifications, it works for what we do with the version file. Konstantin, could you put a comment in your patch explaining this argument? Thanks. Datanode corruption if machine dies while writing VERSION file -- Key: HADOOP-2073 URL: https://issues.apache.org/jira/browse/HADOOP-2073 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.14.0 Reporter: Michael Bieniosek Assignee: Raghu Angadi Attachments: versionFileSize.patch Yesterday, due to a bad mapreduce job, some of my machines went on OOM killing sprees and killed a bunch of datanodes, among other processes. Since my monitoring software kept trying to bring up the datanodes, only to have the kernel kill them off again, each machine's datanode was probably killed many times. A large percentage of these datanodes will not come up now, and write this message to the logs: 2007-10-18 00:23:28,076 ERROR org.apache.hadoop.dfs.DataNode: org.apache.hadoop.dfs.InconsistentFSStateException: Directory /hadoop/dfs/data is in an inconsistent state: file VERSION is invalid. When I check, /hadoop/dfs/data/current/VERSION is an empty file. Consequently, I have to delete all the blocks on the datanode and start over. Since the OOM killing sprees happened simultaneously on several datanodes in my DFS cluster, this could have crippled my dfs cluster. I checked the hadoop code, and in org.apache.hadoop.dfs.Storage, I see this:
{{{
/**
 * Write version file.
 *
 * @throws IOException
 */
void write() throws IOException {
  corruptPreUpgradeStorage(root);
  write(getVersionFile());
}

void write(File to) throws IOException {
  Properties props = new Properties();
  setFields(props, this);
  RandomAccessFile file = new RandomAccessFile(to, "rws");
  FileOutputStream out = null;
  try {
    file.setLength(0);
    file.seek(0);
    out = new FileOutputStream(file.getFD());
    props.store(out, null);
  } finally {
    if (out != null) {
      out.close();
    }
    file.close();
  }
}
}}}
So if the datanode dies after file.setLength(0), but before props.store(out, null), the VERSION file will get trashed in the corrupted state I see. Maybe it would be better if this method created a temporary file VERSION.tmp, and then copied it to VERSION, then deleted VERSION.tmp? That way, if VERSION was detected to be corrupt, the datanode could look at VERSION.tmp to recover the data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file
[ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535990 ] Michael Bieniosek commented on HADOOP-2073: --- There's no Exception in my logs: I'm assuming the linux OOM killer sends the jvm a SIGKILL (http://lxr.linux.no/source/mm/oom_kill.c#L271), so the jvm prints out the shutdown message and exits without giving exception handlers a chance to do anything. There aren't any upgrades involved here. FWIW, my logs look like (repeated over and over again):
{{{
2007-10-17 00:19:07,051 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = (the hostname)
STARTUP_MSG:   args = []
************************************************************/
2007-10-17 00:19:08,338 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null
2007-10-17 00:19:08,439 INFO org.apache.hadoop.dfs.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at (the hostname)
************************************************************/
}}}
Note that it didn't take long before the process was killed. Datanode corruption if machine dies while writing VERSION file -- Key: HADOOP-2073 URL: https://issues.apache.org/jira/browse/HADOOP-2073 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.14.0 Reporter: Michael Bieniosek Yesterday, due to a bad mapreduce job, some of my machines went on OOM killing sprees and killed a bunch of datanodes, among other processes. Since my monitoring software kept trying to bring up the datanodes, only to have the kernel kill them off again, each machine's datanode was probably killed many times. A large percentage of these datanodes will not come up now, and write this message to the logs: 2007-10-18 00:23:28,076 ERROR org.apache.hadoop.dfs.DataNode: org.apache.hadoop.dfs.InconsistentFSStateException: Directory /hadoop/dfs/data is in an inconsistent state: file VERSION is invalid. When I check, /hadoop/dfs/data/current/VERSION is an empty file. Consequently, I have to delete all the blocks on the datanode and start over. Since the OOM killing sprees happened simultaneously on several datanodes in my DFS cluster, this could have crippled my dfs cluster. I checked the hadoop code, and in org.apache.hadoop.dfs.Storage, I see this:
{{{
/**
 * Write version file.
 *
 * @throws IOException
 */
void write() throws IOException {
  corruptPreUpgradeStorage(root);
  write(getVersionFile());
}

void write(File to) throws IOException {
  Properties props = new Properties();
  setFields(props, this);
  RandomAccessFile file = new RandomAccessFile(to, "rws");
  FileOutputStream out = null;
  try {
    file.setLength(0);
    file.seek(0);
    out = new FileOutputStream(file.getFD());
    props.store(out, null);
  } finally {
    if (out != null) {
      out.close();
    }
    file.close();
  }
}
}}}
So if the datanode dies after file.setLength(0), but before props.store(out, null), the VERSION file will get trashed in the corrupted state I see. Maybe it would be better if this method created a temporary file VERSION.tmp, and then copied it to VERSION, then deleted VERSION.tmp? That way, if VERSION was detected to be corrupt, the datanode could look at VERSION.tmp to recover the data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file
Datanode corruption if machine dies while writing VERSION file -- Key: HADOOP-2073 URL: https://issues.apache.org/jira/browse/HADOOP-2073 Project: Hadoop Issue Type: Bug Affects Versions: 0.14.0 Reporter: Michael Bieniosek Yesterday, due to a bad mapreduce job, some of my machines went on OOM killing sprees and killed a bunch of datanodes, among other processes. Since my monitoring software kept trying to bring up the datanodes, only to have the kernel kill them off again, each machine's datanode was probably killed many times. A large percentage of these datanodes will not come up now, and write this message to the logs: 2007-10-18 00:23:28,076 ERROR org.apache.hadoop.dfs.DataNode: org.apache.hadoop.dfs.InconsistentFSStateException: Directory /hadoop/dfs/data is in an inconsistent state: file VERSION is invalid. When I check, /hadoop/dfs/data/current/VERSION is an empty file. Consequently, I have to delete all the blocks on the datanode and start over. Since the OOM killing sprees happened simultaneously on several datanodes in my DFS cluster, this could have crippled my dfs cluster. I checked the hadoop code, and in org.apache.hadoop.dfs.Storage, I see this:
{{{
/**
 * Write version file.
 *
 * @throws IOException
 */
void write() throws IOException {
  corruptPreUpgradeStorage(root);
  write(getVersionFile());
}

void write(File to) throws IOException {
  Properties props = new Properties();
  setFields(props, this);
  RandomAccessFile file = new RandomAccessFile(to, "rws");
  FileOutputStream out = null;
  try {
    file.setLength(0);
    file.seek(0);
    out = new FileOutputStream(file.getFD());
    props.store(out, null);
  } finally {
    if (out != null) {
      out.close();
    }
    file.close();
  }
}
}}}
So if the datanode dies after file.setLength(0), but before props.store(out, null), the VERSION file will get trashed in the corrupted state I see. Maybe it would be better if this method created a temporary file VERSION.tmp, and then copied it to VERSION, then deleted VERSION.tmp? That way, if VERSION was detected to be corrupt, the datanode could look at VERSION.tmp to recover the data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file
[ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2073: -- Component/s: dfs Datanode corruption if machine dies while writing VERSION file -- Key: HADOOP-2073 URL: https://issues.apache.org/jira/browse/HADOOP-2073 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.14.0 Reporter: Michael Bieniosek Yesterday, due to a bad mapreduce job, some of my machines went on OOM killing sprees and killed a bunch of datanodes, among other processes. Since my monitoring software kept trying to bring up the datanodes, only to have the kernel kill them off again, each machine's datanode was probably killed many times. A large percentage of these datanodes will not come up now, and write this message to the logs: 2007-10-18 00:23:28,076 ERROR org.apache.hadoop.dfs.DataNode: org.apache.hadoop.dfs.InconsistentFSStateException: Directory /hadoop/dfs/data is in an inconsistent state: file VERSION is invalid. When I check, /hadoop/dfs/data/current/VERSION is an empty file. Consequently, I have to delete all the blocks on the datanode and start over. Since the OOM killing sprees happened simultaneously on several datanodes in my DFS cluster, this could have crippled my dfs cluster. I checked the hadoop code, and in org.apache.hadoop.dfs.Storage, I see this:
{{{
/**
 * Write version file.
 *
 * @throws IOException
 */
void write() throws IOException {
  corruptPreUpgradeStorage(root);
  write(getVersionFile());
}

void write(File to) throws IOException {
  Properties props = new Properties();
  setFields(props, this);
  RandomAccessFile file = new RandomAccessFile(to, "rws");
  FileOutputStream out = null;
  try {
    file.setLength(0);
    file.seek(0);
    out = new FileOutputStream(file.getFD());
    props.store(out, null);
  } finally {
    if (out != null) {
      out.close();
    }
    file.close();
  }
}
}}}
So if the datanode dies after file.setLength(0), but before props.store(out, null), the VERSION file will get trashed in the corrupted state I see. Maybe it would be better if this method created a temporary file VERSION.tmp, and then copied it to VERSION, then deleted VERSION.tmp? That way, if VERSION was detected to be corrupt, the datanode could look at VERSION.tmp to recover the data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1245) value for mapred.tasktracker.tasks.maximum taken from jobtracker, not tasktracker
[ https://issues.apache.org/jira/browse/HADOOP-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533868 ] Michael Bieniosek commented on HADOOP-1245: --- It might be enough to use the value at the jobtracker as a default, and override with the tasktracker value if it's present (along with the compatibility note). If a tasktracker is configured differently than the jobtracker, it's more likely the configurer intended the tasktracker's value to be used, as opposed to the configurer expecting the tasktracker value to be ignored. So I doubt using the jobtracker as an overridable default will break anybody. value for mapred.tasktracker.tasks.maximum taken from jobtracker, not tasktracker - Key: HADOOP-1245 URL: https://issues.apache.org/jira/browse/HADOOP-1245 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.12.3 Reporter: Michael Bieniosek Assignee: Michael Bieniosek Attachments: tasktracker-max-tasks-1245.patch I want to create a cluster with machines with different numbers of CPUs. Consequently, each machine should have a different value for mapred.tasktracker.tasks.maximum, since my map tasks are CPU bound. When a new job starts up, the jobtracker uses its (single) value for mapred.tasktracker.tasks.maximum to assign tasks. This means that each tasktracker gets the same number of tasks, regardless of how I configured that particular machine. The jobtracker should not consult its config for the value of mapred.tasktracker.tasks.maximum. It should assign tasks (or allow tasktrackers to request tasks) according to each tasktracker's value of mapred.tasktracker.tasks.maximum. Originally, I thought the behavior was slightly different, so this issue contained this text: After the first task finishes on each tasktracker, the tasktracker will request new tasks from the jobtracker according to the tasktracker's value for mapred.tasktracker.tasks.maximum. So after the first round of map tasks is done, the cluster reverts to a mode that works well for heterogeneous clusters. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
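A hypothetical helper for the overridable-default idea might look like the following. The config key is the real one under discussion, but the helper itself and the fallback default of 2 are assumptions, not JobTracker/TaskTracker code:
{{{
import org.apache.hadoop.conf.Configuration;

// Illustration of "jobtracker value as default, tasktracker value wins if set".
public class MaxTasksResolver {
  static int resolveMaxTasks(Configuration jobTrackerConf, Configuration taskTrackerConf) {
    int jobTrackerValue = jobTrackerConf.getInt("mapred.tasktracker.tasks.maximum", 2);
    // If the tasktracker config sets the key, its value is used; otherwise the
    // jobtracker-wide value serves as the default.
    return taskTrackerConf.getInt("mapred.tasktracker.tasks.maximum", jobTrackerValue);
  }
}
}}}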
[jira] Commented: (HADOOP-1245) value for mapred.tasktracker.tasks.maximum taken from jobtracker, not tasktracker
[ https://issues.apache.org/jira/browse/HADOOP-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533881 ] Michael Bieniosek commented on HADOOP-1245: --- Rick, I forgot about that. So my suggestion would be rather useless. value for mapred.tasktracker.tasks.maximum taken from jobtracker, not tasktracker - Key: HADOOP-1245 URL: https://issues.apache.org/jira/browse/HADOOP-1245 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.12.3 Reporter: Michael Bieniosek Assignee: Michael Bieniosek Fix For: 0.16.0 Attachments: tasktracker-max-tasks-1245.patch I want to create a cluster with machines with different numbers of CPUs. Consequently, each machine should have a different value for mapred.tasktracker.tasks.maximum, since my map tasks are CPU bound. When a new job starts up, the jobtracker uses its (single) value for mapred.tasktracker.tasks.maximum to assign tasks. This means that each tasktracker gets the same number of tasks, regardless of how I configured that particular machine. The jobtracker should not consult its config for the value of mapred.tasktracker.tasks.maximum. It should assign tasks (or allow tasktrackers to request tasks) according to each tasktracker's value of mapred.tasktracker.tasks.maximum. Originally, I thought the behavior was slightly different, so this issue contained this text: After the first task finishes on each tasktracker, the tasktracker will request new tasks from the jobtracker according to the tasktracker's value for mapred.tasktracker.tasks.maximum. So after the first round of map tasks is done, the cluster reverts to a mode that works well for heterogeneous clusters. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1245) value for mapred.tasktracker.tasks.maximum taken from jobtracker, not tasktracker
[ https://issues.apache.org/jira/browse/HADOOP-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1245: -- Description: I want to create a cluster with machines with different numbers of CPUs. Consequently, each machine should have a different value for mapred.tasktracker.tasks.maximum, since my map tasks are CPU bound. When a new job starts up, the jobtracker uses its (single) value for mapred.tasktracker.tasks.maximum to assign tasks. This means that each tasktracker gets the same number of tasks, regardless of how I configured that particular machine. The jobtracker should not consult its config for the value of mapred.tasktracker.tasks.maximum. It should assign tasks (or allow tasktrackers to request tasks) according to each tasktracker's value of mapred.tasktracker.tasks.maximum. Originally, I thought the behavior was slightly different, so this issue contained this text: After the first task finishes on each tasktracker, the tasktracker will request new tasks from the jobtracker according to the tasktracker's value for mapred.tasktracker.tasks.maximum. So after the first round of map tasks is done, the cluster reverts to a mode that works well for heterogeneous clusters. was: I want to create a cluster with machines with different numbers of CPUs. Consequently, each machine should have a different value for mapred.tasktracker.tasks.maximum, since my map tasks are CPU bound. However, hadoop uses BOTH the values for mapred.tasktracker.tasks.maximum on the jobtracker and the tasktracker. When a new job starts up, the jobtracker uses its (single) value for mapred.tasktracker.tasks.maximum to assign tasks. This means that each tasktracker gets the same number of tasks, regardless of how I configured that particular machine. After the first task finishes on each tasktracker, the tasktracker will request new tasks from the jobtracker according to the tasktracker's value for mapred.tasktracker.tasks.maximum. So after the first round of map tasks is done, the cluster reverts to a mode that works well for heterogeneous clusters. The jobtracker should not consult its config for the value of mapred.tasktracker.tasks.maximum. It should assign tasks (or allow tasktrackers to request tasks) according to each tasktracker's value of mapred.tasktracker.tasks.maximum. Summary: value for mapred.tasktracker.tasks.maximum taken from jobtracker, not tasktracker (was: value for mapred.tasktracker.tasks.maximum taken from two different sources) Fixing issue description to reflect reality as reported by others value for mapred.tasktracker.tasks.maximum taken from jobtracker, not tasktracker - Key: HADOOP-1245 URL: https://issues.apache.org/jira/browse/HADOOP-1245 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.12.3 Reporter: Michael Bieniosek Attachments: tasktracker-max-tasks-1245.patch I want to create a cluster with machines with different numbers of CPUs. Consequently, each machine should have a different value for mapred.tasktracker.tasks.maximum, since my map tasks are CPU bound. When a new job starts up, the jobtracker uses its (single) value for mapred.tasktracker.tasks.maximum to assign tasks. This means that each tasktracker gets the same number of tasks, regardless of how I configured that particular machine. The jobtracker should not consult its config for the value of mapred.tasktracker.tasks.maximum. 
It should assign tasks (or allow tasktrackers to request tasks) according to each tasktracker's value of mapred.tasktracker.tasks.maximum. Originally, I thought the behavior was slightly different, so this issue contained this text: After the first task finishes on each tasktracker, the tasktracker will request new tasks from the jobtracker according to the tasktracker's value for mapred.tasktracker.tasks.maximum. So after the first round of map tasks is done, the cluster reverts to a mode that works well for heterogeneous clusters. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1245) value for mapred.tasktracker.tasks.maximum taken from two different sources
[ https://issues.apache.org/jira/browse/HADOOP-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533164 ] Michael Bieniosek commented on HADOOP-1245: --- Patch looks reasonable. +1 value for mapred.tasktracker.tasks.maximum taken from two different sources --- Key: HADOOP-1245 URL: https://issues.apache.org/jira/browse/HADOOP-1245 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.12.3 Reporter: Michael Bieniosek Attachments: tasktracker-max-tasks-1245.patch I want to create a cluster with machines with different numbers of CPUs. Consequently, each machine should have a different value for mapred.tasktracker.tasks.maximum, since my map tasks are CPU bound. However, hadoop uses BOTH the values for mapred.tasktracker.tasks.maximum on the jobtracker and the tasktracker. When a new job starts up, the jobtracker uses its (single) value for mapred.tasktracker.tasks.maximum to assign tasks. This means that each tasktracker gets the same number of tasks, regardless of how I configured that particular machine. After the first task finishes on each tasktracker, the tasktracker will request new tasks from the jobtracker according to the tasktracker's value for mapred.tasktracker.tasks.maximum. So after the first round of map tasks is done, the cluster reverts to a mode that works well for heterogeneous clusters. The jobtracker should not consult its config for the value of mapred.tasktracker.tasks.maximum. It should assign tasks (or allow tasktrackers to request tasks) according to each tasktracker's value of mapred.tasktracker.tasks.maximum. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1245) value for mapred.tasktracker.tasks.maximum taken from two different sources
[ https://issues.apache.org/jira/browse/HADOOP-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1245: -- Status: Patch Available (was: Open) See what hudson thinks... value for mapred.tasktracker.tasks.maximum taken from two different sources --- Key: HADOOP-1245 URL: https://issues.apache.org/jira/browse/HADOOP-1245 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.12.3 Reporter: Michael Bieniosek Attachments: tasktracker-max-tasks-1245.patch I want to create a cluster with machines with different numbers of CPUs. Consequently, each machine should have a different value for mapred.tasktracker.tasks.maximum, since my map tasks are CPU bound. However, hadoop uses BOTH the values for mapred.tasktracker.tasks.maximum on the jobtracker and the tasktracker. When a new job starts up, the jobtracker uses its (single) value for mapred.tasktracker.tasks.maximum to assign tasks. This means that each tasktracker gets the same number of tasks, regardless of how I configured that particular machine. After the first task finishes on each tasktracker, the tasktracker will request new tasks from the jobtracker according to the tasktracker's value for mapred.tasktracker.tasks.maximum. So after the first round of map tasks is done, the cluster reverts to a mode that works well for heterogeneous clusters. The jobtracker should not consult its config for the value of mapred.tasktracker.tasks.maximum. It should assign tasks (or allow tasktrackers to request tasks) according to each tasktracker's value of mapred.tasktracker.tasks.maximum. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2001) Deadlock in jobtracker
[ https://issues.apache.org/jira/browse/HADOOP-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532930 ] Michael Bieniosek commented on HADOOP-2001: --- That seems wrong (though, like I say, I don't really know this code). If you have to hold the jobtracker lock in order to hold the per-job lock, then why bother acquiring the per-job lock at all? Deadlock in jobtracker -- Key: HADOOP-2001 URL: https://issues.apache.org/jira/browse/HADOOP-2001 Project: Hadoop Issue Type: Bug Affects Versions: 0.14.0 Reporter: Michael Bieniosek Assignee: Devaraj Das Priority: Blocker Fix For: 0.15.0 Attachments: 2001.patch, 2001.patch My jobtracker deadlocked; the output from kill -QUIT is: Found one Java-level deadlock: = IPC Server handler 2 on 10001: waiting to lock monitor 0x0813724c (object 0xd5175488, a org.apache.hadoop.mapred.JobInProgress), which is held by SocketListener0-1 SocketListener0-1: waiting to lock monitor 0x081146d4 (object 0xd24d9c50, a org.apache.hadoop.mapred.JobTracker), which is held by IPC Server handler 2 on 10001 Java stack information for the threads listed above: === IPC Server handler 2 on 10001: at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:367) - waiting to lock 0xd5175488 (a org.apache.hadoop.mapred.JobInProgress) at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:1719) at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:1240) - locked 0xd24d9c50 (a org.apache.hadoop.mapred.JobTracker) at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1116) - locked 0xd24d9c50 (a org.apache.hadoop.mapred.JobTracker) at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566) SocketListener0-1: at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:907) - waiting to lock 0xd24d9c50 (a org.apache.hadoop.mapred.JobTracker) at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:1059) - locked 0xd5175488 (a org.apache.hadoop.mapred.JobInProgress) at org.apache.hadoop.mapred.JobInProgress.kill(JobInProgress.java:891) - locked 0xd5175488 (a org.apache.hadoop.mapred.JobInProgress) at org.apache.hadoop.mapred.jobdetails_jsp._jspService(jobdetails_jsp.java:158) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:94) at javax.servlet.http.HttpServlet.service(HttpServlet.java:802) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427) at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567) at org.mortbay.http.HttpContext.handle(HttpContext.java:1565) at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635) at org.mortbay.http.HttpContext.handle(HttpContext.java:1517) at org.mortbay.http.HttpServer.service(HttpServer.java:954) at org.mortbay.http.HttpConnection.service(HttpConnection.java:814) at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981) at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831) at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244) at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357) at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534) Found 
1 deadlock. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
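The two stacks above show a classic inverted acquisition order: the heartbeat path takes the JobTracker lock and then wants the JobInProgress lock, while the kill path takes them in the reverse order. A toy illustration of the fix-by-consistent-ordering idea follows; none of this is actual Hadoop code:
{{{
// Toy illustration of avoiding deadlock by fixing a single lock order.
public class LockOrderSketch {
  private final Object trackerLock = new Object(); // stands in for JobTracker
  private final Object jobLock = new Object();     // stands in for JobInProgress

  void heartbeat() {
    synchronized (trackerLock) {   // tracker lock first...
      synchronized (jobLock) {     // ...then the per-job lock
        // update task statuses
      }
    }
  }

  void kill() {
    synchronized (trackerLock) {   // same order as heartbeat(), instead of the
      synchronized (jobLock) {     // job-then-tracker order shown in the trace
        // garbageCollect / finalizeJob work
      }
    }
  }
}
}}}
Because both paths now acquire trackerLock before jobLock, neither thread can hold the lock the other is waiting for.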
[jira] Created: (HADOOP-2001) Deadlock in jobtracker
Deadlock in jobtracker -- Key: HADOOP-2001 URL: https://issues.apache.org/jira/browse/HADOOP-2001 Project: Hadoop Issue Type: Bug Affects Versions: 0.14.0 Reporter: Michael Bieniosek Priority: Critical My jobtracker deadlocked; the output from kill -QUIT is: Found one Java-level deadlock: = IPC Server handler 2 on 10001: waiting to lock monitor 0x0813724c (object 0xd5175488, a org.apache.hadoop.mapred.JobInProgress), which is held by SocketListener0-1 SocketListener0-1: waiting to lock monitor 0x081146d4 (object 0xd24d9c50, a org.apache.hadoop.mapred.JobTracker), which is held by IPC Server handler 2 on 10001 Java stack information for the threads listed above: === IPC Server handler 2 on 10001: at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:367) - waiting to lock 0xd5175488 (a org.apache.hadoop.mapred.JobInProgress) at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:1719) at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:1240) - locked 0xd24d9c50 (a org.apache.hadoop.mapred.JobTracker) at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1116) - locked 0xd24d9c50 (a org.apache.hadoop.mapred.JobTracker) at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566) SocketListener0-1: at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:907) - waiting to lock 0xd24d9c50 (a org.apache.hadoop.mapred.JobTracker) at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:1059) - locked 0xd5175488 (a org.apache.hadoop.mapred.JobInProgress) at org.apache.hadoop.mapred.JobInProgress.kill(JobInProgress.java:891) - locked 0xd5175488 (a org.apache.hadoop.mapred.JobInProgress) at org.apache.hadoop.mapred.jobdetails_jsp._jspService(jobdetails_jsp.java:158) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:94) at javax.servlet.http.HttpServlet.service(HttpServlet.java:802) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427) at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567) at org.mortbay.http.HttpContext.handle(HttpContext.java:1565) at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635) at org.mortbay.http.HttpContext.handle(HttpContext.java:1517) at org.mortbay.http.HttpServer.service(HttpServer.java:954) at org.mortbay.http.HttpConnection.service(HttpConnection.java:814) at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981) at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831) at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244) at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357) at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534) Found 1 deadlock. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2001) Deadlock in jobtracker
[ https://issues.apache.org/jira/browse/HADOOP-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532798 ] Michael Bieniosek commented on HADOOP-2001: --- I could submit a quick fix patch that unmarks JobTracker.finalizeJob synchronized, but I don't really know if that would break other things, or if it could miss other deadlock paths. Anybody else know more about this code? Deadlock in jobtracker -- Key: HADOOP-2001 URL: https://issues.apache.org/jira/browse/HADOOP-2001 Project: Hadoop Issue Type: Bug Affects Versions: 0.14.0 Reporter: Michael Bieniosek Priority: Critical My jobtracker deadlocked; the output from kill -QUIT is: Found one Java-level deadlock: = IPC Server handler 2 on 10001: waiting to lock monitor 0x0813724c (object 0xd5175488, a org.apache.hadoop.mapred.JobInProgress), which is held by SocketListener0-1 SocketListener0-1: waiting to lock monitor 0x081146d4 (object 0xd24d9c50, a org.apache.hadoop.mapred.JobTracker), which is held by IPC Server handler 2 on 10001 Java stack information for the threads listed above: === IPC Server handler 2 on 10001: at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:367) - waiting to lock 0xd5175488 (a org.apache.hadoop.mapred.JobInProgress) at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:1719) at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:1240) - locked 0xd24d9c50 (a org.apache.hadoop.mapred.JobTracker) at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1116) - locked 0xd24d9c50 (a org.apache.hadoop.mapred.JobTracker) at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566) SocketListener0-1: at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:907) - waiting to lock 0xd24d9c50 (a org.apache.hadoop.mapred.JobTracker) at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:1059) - locked 0xd5175488 (a org.apache.hadoop.mapred.JobInProgress) at org.apache.hadoop.mapred.JobInProgress.kill(JobInProgress.java:891) - locked 0xd5175488 (a org.apache.hadoop.mapred.JobInProgress) at org.apache.hadoop.mapred.jobdetails_jsp._jspService(jobdetails_jsp.java:158) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:94) at javax.servlet.http.HttpServlet.service(HttpServlet.java:802) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427) at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567) at org.mortbay.http.HttpContext.handle(HttpContext.java:1565) at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635) at org.mortbay.http.HttpContext.handle(HttpContext.java:1517) at org.mortbay.http.HttpServer.service(HttpServer.java:954) at org.mortbay.http.HttpConnection.service(HttpConnection.java:814) at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981) at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831) at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244) at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357) at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534) Found 1 deadlock. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
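The thread dump above shows a textbook lock-ordering inversion: the heartbeat path locks the JobTracker monitor and then tries to lock a JobInProgress, while the web-UI kill path locks the JobInProgress and then tries to lock the JobTracker. Below is a minimal, self-contained sketch of that pattern -- the two monitor objects and method bodies are stand-ins chosen to mirror the report, not the actual Hadoop source. Removing the synchronized modifier from finalizeJob, as the comment proposes, would break this particular cycle by dropping the second lock acquisition on the kill path, though (as the comment also notes) it might miss other deadlock paths.

// A minimal sketch of the deadlock pattern, not Hadoop source: two plain
// monitor objects stand in for the JobTracker and JobInProgress instances.
public class DeadlockSketch {
    static final Object JOB_TRACKER = new Object();
    static final Object JOB_IN_PROGRESS = new Object();

    // Heartbeat path: JobTracker.heartbeat (locks the tracker) eventually
    // calls JobInProgress.updateTaskStatus (locks the job).
    static void heartbeatPath() {
        synchronized (JOB_TRACKER) {
            synchronized (JOB_IN_PROGRESS) {
                // update task statuses
            }
        }
    }

    // Kill path: JobInProgress.kill (locks the job) eventually calls
    // JobTracker.finalizeJob (locks the tracker) -- the opposite order.
    static void killPath() {
        synchronized (JOB_IN_PROGRESS) {
            synchronized (JOB_TRACKER) {
                // finalize job
            }
        }
    }

    public static void main(String[] args) {
        // Run both paths concurrently; with unlucky timing each thread
        // acquires its first lock and then blocks forever on the second.
        new Thread(DeadlockSketch::heartbeatPath).start();
        new Thread(DeadlockSketch::killPath).start();
    }
}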
[jira] Updated: (HADOOP-2001) Deadlock in jobtracker
[ https://issues.apache.org/jira/browse/HADOOP-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2001: -- Fix Version/s: 0.15.0 It would be nice to get this fixed for 0.15.0, since it does deadlock the jobtracker. I can submit the patch I described above if someone thinks that's a good idea.

Deadlock in jobtracker -- Key: HADOOP-2001 URL: https://issues.apache.org/jira/browse/HADOOP-2001 Project: Hadoop Issue Type: Bug Affects Versions: 0.14.0 Reporter: Michael Bieniosek Priority: Critical Fix For: 0.15.0

My jobtracker deadlocked; the output from kill -QUIT is:

Found one Java-level deadlock:
==============================
IPC Server handler 2 on 10001:
  waiting to lock monitor 0x0813724c (object 0xd5175488, a org.apache.hadoop.mapred.JobInProgress),
  which is held by SocketListener0-1
SocketListener0-1:
  waiting to lock monitor 0x081146d4 (object 0xd24d9c50, a org.apache.hadoop.mapred.JobTracker),
  which is held by IPC Server handler 2 on 10001

Java stack information for the threads listed above:
=====================================================
IPC Server handler 2 on 10001:
  at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:367)
  - waiting to lock <0xd5175488> (a org.apache.hadoop.mapred.JobInProgress)
  at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:1719)
  at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:1240)
  - locked <0xd24d9c50> (a org.apache.hadoop.mapred.JobTracker)
  at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1116)
  - locked <0xd24d9c50> (a org.apache.hadoop.mapred.JobTracker)
  at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
  at java.lang.reflect.Method.invoke(Unknown Source)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566)
SocketListener0-1:
  at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:907)
  - waiting to lock <0xd24d9c50> (a org.apache.hadoop.mapred.JobTracker)
  at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:1059)
  - locked <0xd5175488> (a org.apache.hadoop.mapred.JobInProgress)
  at org.apache.hadoop.mapred.JobInProgress.kill(JobInProgress.java:891)
  - locked <0xd5175488> (a org.apache.hadoop.mapred.JobInProgress)
  at org.apache.hadoop.mapred.jobdetails_jsp._jspService(jobdetails_jsp.java:158)
  at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:94)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
  at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
  at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
  at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
  at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
  at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
  at org.mortbay.http.HttpServer.service(HttpServer.java:954)
  at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
  at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
  at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
  at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
  at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
  at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

Found 1 deadlock.

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HADOOP-1319) NPE in TaskLog.getTaskLogDir, as called from tasklog.jsp
[ https://issues.apache.org/jira/browse/HADOOP-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek resolved HADOOP-1319. --- Resolution: Invalid I believe this code has been rewritten, so this bug is no longer valid.

NPE in TaskLog.getTaskLogDir, as called from tasklog.jsp Key: HADOOP-1319 URL: https://issues.apache.org/jira/browse/HADOOP-1319 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.12.3 Reporter: Michael Bieniosek Priority: Minor

Calling TaskCompletionEvent.getTaskTrackerHttp() gives me a URL that looks like http://tasktracker.host:50060/tasklog.jsp?plaintext=true&taskid=task_0264_m_83_0&all=true. If I try to access that URL, I get a Jetty 500 error. In my tasktracker logs, I then see:

2007-05-02 21:32:16,107 WARN /: /tasklog.jsp?taskid=task_0261_m_00_0&all=true&plaintext=true: java.lang.NullPointerException
  at org.apache.hadoop.mapred.TaskLog.getTaskLogDir(TaskLog.java:49)
  at org.apache.hadoop.mapred.TaskLog.access$000(TaskLog.java:33)
  at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:313)
  at org.apache.hadoop.mapred.tasklog_jsp.printTaskLog(tasklog_jsp.java:26)
  at org.apache.hadoop.mapred.tasklog_jsp._jspService(tasklog_jsp.java:232)
  at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:94)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
  at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
  at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
  at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
  at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
  at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
  at org.mortbay.http.HttpServer.service(HttpServer.java:954)
  at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
  at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
  at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
  at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
  at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
  at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
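The resolution stands on its own, but the failure mode, a request parameter going missing and surfacing later as a NullPointerException deep in TaskLog, suggests a simple guard at the servlet boundary. A hedged sketch follows; the servlet class and error message here are hypothetical, not the rewritten Hadoop code.

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical guard: validate the taskid parameter before touching the
// log machinery, so a mangled query string yields a clear 400 instead of
// an NPE that Jetty reports as a 500.
public class TaskLogGuardServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        String taskId = request.getParameter("taskid");
        if (taskId == null || taskId.length() == 0) {
            response.sendError(HttpServletResponse.SC_BAD_REQUEST,
                               "Missing required parameter: taskid");
            return;
        }
        // ... look up and stream the task log for taskId ...
    }
}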
[jira] Created: (HADOOP-1825) hadoop-daemon.sh script fails if HADOOP_PID_DIR doesn't exist
hadoop-daemon.sh script fails if HADOOP_PID_DIR doesn't exist - Key: HADOOP-1825 URL: https://issues.apache.org/jira/browse/HADOOP-1825 Project: Hadoop Issue Type: Bug Components: scripts Affects Versions: 0.14.0 Reporter: Michael Bieniosek Priority: Minor If I try to bring up a datanode on a fresh machine, it will fail with this error message: starting datanode, logging to /b/hadoop/logs/hadoop-me-datanode-example.com.out /p/share/hadoop/bin/hadoop-daemon.sh: line 99: /b/hadoop/pid/hadoop-me-datanode.pid: No such file or directory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1825) hadoop-daemon.sh script fails if HADOOP_PID_DIR doesn't exist
[ https://issues.apache.org/jira/browse/HADOOP-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1825: -- Status: Patch Available (was: Open) Here's a patch that automatically creates the pid directory if it doesn't exist. hadoop-daemon.sh script fails if HADOOP_PID_DIR doesn't exist - Key: HADOOP-1825 URL: https://issues.apache.org/jira/browse/HADOOP-1825 Project: Hadoop Issue Type: Bug Components: scripts Affects Versions: 0.14.0 Reporter: Michael Bieniosek Priority: Minor Attachments: hadoop-1825.patch If I try to bring up a datanode on a fresh machine, it will fail with this error message: starting datanode, logging to /b/hadoop/logs/hadoop-me-datanode-example.com.out /p/share/hadoop/bin/hadoop-daemon.sh: line 99: /b/hadoop/pid/hadoop-me-datanode.pid: No such file or directory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1825) hadoop-daemon.sh script fails if HADOOP_PID_DIR doesn't exist
[ https://issues.apache.org/jira/browse/HADOOP-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1825: -- Attachment: hadoop-1825.patch hadoop-daemon.sh script fails if HADOOP_PID_DIR doesn't exist - Key: HADOOP-1825 URL: https://issues.apache.org/jira/browse/HADOOP-1825 Project: Hadoop Issue Type: Bug Components: scripts Affects Versions: 0.14.0 Reporter: Michael Bieniosek Priority: Minor Attachments: hadoop-1825.patch If I try to bring up a datanode on a fresh machine, it will fail with this error message: starting datanode, logging to /b/hadoop/logs/hadoop-me-datanode-example.com.out /p/share/hadoop/bin/hadoop-daemon.sh: line 99: /b/hadoop/pid/hadoop-me-datanode.pid: No such file or directory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1781) Need more complete API of JobClient class
[ https://issues.apache.org/jira/browse/HADOOP-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522657 ] Michael Bieniosek commented on HADOOP-1781: --- Most of these are available currently. You need to call client.getJob(jobid) to get a RunningJob, and the RunningJob object then provides your desired APIs 2, 3, 4, and 5. This works for completed jobs as well as running jobs.

Need more complete API of JobClient class - Key: HADOOP-1781 URL: https://issues.apache.org/jira/browse/HADOOP-1781 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Runping Qi

We need a programmatic way to find out information about a map/reduce cluster and the jobs on the cluster. The current API is not complete. In particular, the following API functions are needed:
1. jobs() -- currently, there is an API function JobsToComplete, which returns running/waiting jobs only; jobs() should return the complete list.
2. TaskReport[] getMap/ReduceTaskReports(String jobid)
3. getStartTime()
4. getJobStatus(String jobid)
5. getJobProfile(String jobid)

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
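Concretely, the usage the comment describes looks roughly like the following, assuming the 0.15-era org.apache.hadoop.mapred API (JobClient.getJob returning a RunningJob); the class name and the printed fields are illustrative choices, not part of the issue.

import java.io.IOException;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

// Probe a job by id through JobClient; per the comment above, getJob works
// for completed jobs as well as running ones.
public class JobStatusProbe {
    public static void main(String[] args) throws IOException {
        JobClient client = new JobClient(new JobConf());
        RunningJob job = client.getJob(args[0]);
        if (job != null) {
            System.out.println("map progress:    " + job.mapProgress());
            System.out.println("reduce progress: " + job.reduceProgress());
            System.out.println("complete:        " + job.isComplete());
        }
    }
}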
[jira] Created: (HADOOP-1770) Illegal state exception in printTaskLog - sendError
Illegal state exception in printTaskLog - sendError Key: HADOOP-1770 URL: https://issues.apache.org/jira/browse/HADOOP-1770 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.14.0 Reporter: Michael Bieniosek

This error shows up in my logs:

2007-08-23 16:40:08,028 WARN /: /tasklog?taskid=task_200708212126_0043_m_000100_0&all=true: java.lang.IllegalStateException: Committed
  at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212)
  at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375)
  at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:61)
  at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:125)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
  at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
  at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
  at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
  at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
  at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
  at org.mortbay.http.HttpServer.service(HttpServer.java:954)
  at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
  at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
  at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
  at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
  at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
  at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
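The IllegalStateException: Committed arises because sendError has to reset the response buffer, which is impossible once part of the response has already been flushed to the client. A common guard, sketched here as a hypothetical helper rather than the eventual HADOOP-1770 fix, is to check isCommitted() before calling sendError:

import java.io.IOException;
import javax.servlet.http.HttpServletResponse;

public final class ResponseErrors {
    private ResponseErrors() {}

    // sendError resets the response buffer, which throws
    // IllegalStateException once any output has reached the client.
    // Checking isCommitted() first avoids that secondary exception.
    public static void sendErrorIfPossible(HttpServletResponse response,
                                           int status, String message)
            throws IOException {
        if (!response.isCommitted()) {
            response.sendError(status, message);
        }
        // If the response is already committed, the best we can do is
        // log the failure; the partial page has already been sent.
    }
}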
[jira] Updated: (HADOOP-1745) userlogs not showing up for new jobs
[ https://issues.apache.org/jira/browse/HADOOP-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1745: -- Component/s: mapred userlogs not showing up for new jobs Key: HADOOP-1745 URL: https://issues.apache.org/jira/browse/HADOOP-1745 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.14.0 Reporter: Michael Bieniosek When I start a new hadoop job, the logs do not show up for a while. If I check on the filesystem, the file userlogs/$task/stdout is a regular file with size 0. This was supposed to be fixed in 0.14 by HADOOP-1524. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1245) value for mapred.tasktracker.tasks.maximum taken from two different sources
[ https://issues.apache.org/jira/browse/HADOOP-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1245: -- Description: I want to create a cluster with machines with different numbers of CPUs. Consequently, each machine should have a different value for mapred.tasktracker.tasks.maximum, since my map tasks are CPU bound. However, hadoop uses BOTH the values for mapred.tasktracker.tasks.maximum on the jobtracker and the tasktracker. When a new job starts up, the jobtracker uses its (single) value for mapred.tasktracker.tasks.maximum to assign tasks. This means that each tasktracker gets the same number of tasks, regardless of how I configured that particular machine. After the first task finishes on each tasktracker, the tasktracker will request new tasks from the jobtracker according to the tasktracker's value for mapred.tasktracker.tasks.maximum. So after the first round of map tasks is done, the cluster reverts to a mode that works well for heterogeneous clusters. The jobtracker should not consult its config for the value of mapred.tasktracker.tasks.maximum. It should assign tasks (or allow tasktrackers to request tasks) according to each tasktracker's value of mapred.tasktracker.tasks.maximum. was: When I start a job, hadoop uses mapred.tasktracker.tasks.maximum on the jobtracker. Once these tasks finish, it is the tasktracker's value of mapred.tasktracker.tasks.maximum that decides how many new tasks are created for each host. This would probably be fixed if HADOOP-785 were implemented. value for mapred.tasktracker.tasks.maximum taken from two different sources --- Key: HADOOP-1245 URL: https://issues.apache.org/jira/browse/HADOOP-1245 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.12.3 Reporter: Michael Bieniosek I want to create a cluster with machines with different numbers of CPUs. Consequently, each machine should have a different value for mapred.tasktracker.tasks.maximum, since my map tasks are CPU bound. However, hadoop uses BOTH the values for mapred.tasktracker.tasks.maximum on the jobtracker and the tasktracker. When a new job starts up, the jobtracker uses its (single) value for mapred.tasktracker.tasks.maximum to assign tasks. This means that each tasktracker gets the same number of tasks, regardless of how I configured that particular machine. After the first task finishes on each tasktracker, the tasktracker will request new tasks from the jobtracker according to the tasktracker's value for mapred.tasktracker.tasks.maximum. So after the first round of map tasks is done, the cluster reverts to a mode that works well for heterogeneous clusters. The jobtracker should not consult its config for the value of mapred.tasktracker.tasks.maximum. It should assign tasks (or allow tasktrackers to request tasks) according to each tasktracker's value of mapred.tasktracker.tasks.maximum. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
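In code terms, the problem is that both daemons resolve the same key against their own local configuration, so the jobtracker's copy silently wins for the first wave of tasks. A hypothetical snippet (not the actual JobTracker/TaskTracker source) showing the read that, per the report, should only ever happen on the tasktracker side:

import org.apache.hadoop.conf.Configuration;

// Illustration of the mismatch: the same lookup returns different values
// depending on whose local config files are on the classpath.
public class TaskSlotConfig {
    public static int maxTasks(Configuration conf) {
        // On the jobtracker this returns the jobtracker's local value;
        // on a tasktracker it returns that machine's value. Only the
        // tasktracker's value reflects the machine's actual CPU count.
        return conf.getInt("mapred.tasktracker.tasks.maximum", 2);
    }
}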
[jira] Commented: (HADOOP-416) Web UI JSP: need to HTML-Escape log file contents
[ https://issues.apache.org/jira/browse/HADOOP-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_1251 ] Michael Bieniosek commented on HADOOP-416: -- I've noticed that occasionally snippets of web pages make it to the log pages. This could potentially be a security problem, so we should fix this. I don't think <pre> is a great solution, since there could be a </pre> in the text. It's probably better to escape < and >, or to set the content-type to text/plain.

Web UI JSP: need to HTML-Escape log file contents - Key: HADOOP-416 URL: https://issues.apache.org/jira/browse/HADOOP-416 Project: Hadoop Issue Type: Bug Components: mapred Reporter: Michel Tourn Assignee: Owen O'Malley

Web UI JSP: need to HTML-escape log (file) contents. When displaying the task's error log or the mapred.Reporter status String, the content should have all < and > converted to &lt; and &gt;, or use a <pre> tag. Otherwise, any HTML/XML tags within will not be displayed correctly. This problem occurs, for example, when using hadoopStreaming and a MapRed record is a chunk of HTML/XML content (and a task fails). Example of a problematic view: http://jobtracker:50030/taskdetails.jsp?jobid=job_0009&taskid=tip_0009_m_00 Other jsp pages may also need a change.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
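A minimal escaping helper of the kind the comment suggests might look like the following; this is a sketch, not the patch that eventually addressed HADOOP-416. Setting the Content-Type to text/plain, the other option mentioned, sidesteps escaping entirely for raw log output.

// Hypothetical helper: escape the three characters that can change how a
// browser parses the page. Order matters: '&' must be replaced first, or
// the entities produced by the later replacements would be re-escaped.
public final class HtmlEscape {
    private HtmlEscape() {}

    public static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }
}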
[jira] Commented: (HADOOP-785) Divide the server and client configurations
[ https://issues.apache.org/jira/browse/HADOOP-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12517343 ] Michael Bieniosek commented on HADOOP-785: -- Arun, Your proposal sounds reasonable. Thanks for looking at this issue. Currently, hadoop-default.xml is not supposed to be changed by users. Would you relax this convention in your proposal? There might be a few variables that I'd like to set for client and server at the same time (e.g. the namenode address). Why don't you want to split up namenode vs. jobtracker and datanode vs. tasktracker? I understand that it's desirable to keep things simple, but dfs and mapreduce don't interact very much in terms of their configs, so there is a natural separation. Instead of dividing configs into beginner and advanced, we should think about dividing them into things you probably need to change (at the top of the file) and things you probably don't need to change (at the bottom of the file). This division could be done with XML comments -- I don't think it needs to be so formal as to need a new field.

Divide the server and client configurations --- Key: HADOOP-785 URL: https://issues.apache.org/jira/browse/HADOOP-785 Project: Hadoop Issue Type: Improvement Components: conf Affects Versions: 0.9.0 Reporter: Owen O'Malley Assignee: Arun C Murthy Fix For: 0.15.0

The configuration system is easy to misconfigure, and I think we need to strongly divide the server from client configs. An example of the problem was a configuration where the task tracker had a hadoop-site.xml that set mapred.reduce.tasks to 1. Therefore, the job tracker had the right number of reduces, but the map task thought there was a single reduce. This led to a hard-to-diagnose failure. Therefore, I propose separating out the configuration types as:

class Configuration;                        // reads site-default.xml, hadoop-default.xml
class ServerConf extends Configuration;     // reads hadoop-server.xml, $super
class DfsServerConf extends ServerConf;     // reads dfs-server.xml, $super
class MapRedServerConf extends ServerConf;  // reads mapred-server.xml, $super
class ClientConf extends Configuration;     // reads hadoop-client.xml, $super
class JobConf extends ClientConf;           // reads job.xml, $super

Note in particular that nothing corresponds to hadoop-site.xml, which overrides both client and server configs. Furthermore, the properties from the *-default.xml files should never be saved into the job.xml.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
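As a rough illustration of how the proposed hierarchy could layer its resources, here is a sketch built on Configuration's resource-loading mechanism. Treat the exact loading call as an assumption: addResource is the name in later Hadoop releases, while 0.15-era code spelled similar calls addDefaultResource; the class names follow the proposal, and the wiring is illustrative, not the implementation.

import org.apache.hadoop.conf.Configuration;

// Sketch of the proposed layering: each subclass adds its own resource on
// top of what the parent loaded, so later files override earlier ones,
// mirroring the "reads hadoop-server.xml, $super" notation above.
public class ServerConf extends Configuration {
    public ServerConf() {
        addResource("hadoop-server.xml");
    }
}

class DfsServerConf extends ServerConf {
    public DfsServerConf() {
        addResource("dfs-server.xml");
    }
}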
[jira] Commented: (HADOOP-1638) Master node unable to bind to DNS hostname
[ https://issues.apache.org/jira/browse/HADOOP-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514374 ] Michael Bieniosek commented on HADOOP-1638: --- I abandoned HADOOP-1202 because people didn't seem to see any value in it, and I changed the way I use hadoop on ec2 around the same time. You're welcome to pick it up and port the patch to trunk; it shouldn't be too much work. -Michael

Master node unable to bind to DNS hostname -- Key: HADOOP-1638 URL: https://issues.apache.org/jira/browse/HADOOP-1638 Project: Hadoop Issue Type: Bug Components: contrib/ec2 Affects Versions: 0.13.0, 0.13.1, 0.14.0, 0.15.0 Reporter: Stu Hood Priority: Minor Fix For: 0.13.1, 0.14.0, 0.15.0 Attachments: hadoop-1638.patch

With a release package of Hadoop 0.13.0 or with latest SVN, the Hadoop contrib/ec2 scripts fail to start Hadoop correctly. After working around issues HADOOP-1634 and HADOOP-1635, and setting up a DynDNS address pointing to the master's IP, the ec2/bin/start-hadoop script completes. But the cluster is unusable because the namenode and tasktracker have not started successfully. Looking at the namenode log on the master reveals the following error:

{quote}
2007-07-19 16:54:53,156 ERROR org.apache.hadoop.dfs.NameNode: java.net.BindException: Cannot assign requested address
  at sun.nio.ch.Net.bind(Native Method)
  at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
  at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
  at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:186)
  at org.apache.hadoop.ipc.Server.<init>(Server.java:631)
  at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:325)
  at org.apache.hadoop.ipc.RPC.getServer(RPC.java:295)
  at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
  at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:211)
  at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:803)
  at org.apache.hadoop.dfs.NameNode.main(NameNode.java:811)
{quote}

The master node refuses to bind to the DynDNS hostname in the generated hadoop-site.xml. Here is the relevant part of the generated file:

{quote}
<property>
  <name>fs.default.name</name>
  <value>blah-ec2.gotdns.org:50001</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>blah-ec2.gotdns.org:50002</value>
</property>
{quote}

I'll attach a patch against hadoop-trunk that fixes the issue for me, but I'm not sure if this issue is something that someone can fix more thoroughly.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-1636) constant should be user-configurable: MAX_COMPLETE_USER_JOBS_IN_MEMORY
constant should be user-configurable: MAX_COMPLETE_USER_JOBS_IN_MEMORY -- Key: HADOOP-1636 URL: https://issues.apache.org/jira/browse/HADOOP-1636 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.13.0 Reporter: Michael Bieniosek In JobTracker.java: static final int MAX_COMPLETE_USER_JOBS_IN_MEMORY = 100; This should be configurable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1636) constant should be user-configurable: MAX_COMPLETE_USER_JOBS_IN_MEMORY
[ https://issues.apache.org/jira/browse/HADOOP-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1636: -- Status: Patch Available (was: Open) constant should be user-configurable: MAX_COMPLETE_USER_JOBS_IN_MEMORY -- Key: HADOOP-1636 URL: https://issues.apache.org/jira/browse/HADOOP-1636 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.13.0 Reporter: Michael Bieniosek In JobTracker.java: static final int MAX_COMPLETE_USER_JOBS_IN_MEMORY = 100; This should be configurable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1636) constant should be user-configurable: MAX_COMPLETE_USER_JOBS_IN_MEMORY
[ https://issues.apache.org/jira/browse/HADOOP-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-1636: -- Attachment: configure-max-completed-jobs.patch This patch creates a new configurable variable mapred.jobtracker.completeuserjobs.maximum, which defaults to 100 (the current hard-coded value). When this many jobs are completed (failed or succeeded), hadoop deletes finished jobs from memory, making them accessible only through the information-poor jobhistory page. This limit is supposedly per user, but I submit all jobs as the same user. I have tested this patch, and it seems to work. constant should be user-configurable: MAX_COMPLETE_USER_JOBS_IN_MEMORY -- Key: HADOOP-1636 URL: https://issues.apache.org/jira/browse/HADOOP-1636 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.13.0 Reporter: Michael Bieniosek Attachments: configure-max-completed-jobs.patch In JobTracker.java: static final int MAX_COMPLETE_USER_JOBS_IN_MEMORY = 100; This should be configurable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
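In outline, the change amounts to replacing the hard-coded constant with a lookup like the following. The wrapper class here is hypothetical, but the key name and default come from the patch description above:

import org.apache.hadoop.conf.Configuration;

// Sketch of the configurable replacement for the hard-coded
// MAX_COMPLETE_USER_JOBS_IN_MEMORY constant in JobTracker.java.
public class CompletedJobsLimit {
    public static int maxCompleteUserJobsInMemory(Configuration conf) {
        // Defaults to 100, the previously hard-coded value.
        return conf.getInt("mapred.jobtracker.completeuserjobs.maximum", 100);
    }
}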
[jira] Issue Comment Edited: (HADOOP-1636) constant should be user-configurable: MAX_COMPLETE_USER_JOBS_IN_MEMORY
[ https://issues.apache.org/jira/browse/HADOOP-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514076 ] Michael Bieniosek edited comment on HADOOP-1636 at 7/19/07 7:02 PM: This patch creates a new configurable variable mapred.jobtracker.completeuserjobs.maximum, which defaults to 100 (the current hard-coded value). When this many jobs are completed (failed or succeeded), hadoop deletes finished jobs from memory, making them accessible only through the information-poor jobhistory page. This limit is supposedly per user, but I submit all jobs as the same user. (this is the current behavior, which is unchanged by my patch) I have tested this patch, and it seems to work. was: This patch creates a new configurable variable mapred.jobtracker.completeuserjobs.maximum, which defaults to 100 (the current hard-coded value). When this many jobs are completed (failed or succeeded), hadoop deletes finished jobs from memory, making them accessible only through the information-poor jobhistory page. This limit is supposedly per user, but I submit all jobs as the same user. I have tested this patch, and it seems to work. constant should be user-configurable: MAX_COMPLETE_USER_JOBS_IN_MEMORY -- Key: HADOOP-1636 URL: https://issues.apache.org/jira/browse/HADOOP-1636 Project: Hadoop Issue Type: Bug Components: mapred Affects Versions: 0.13.0 Reporter: Michael Bieniosek Attachments: configure-max-completed-jobs.patch In JobTracker.java: static final int MAX_COMPLETE_USER_JOBS_IN_MEMORY = 100; This should be configurable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.