[jira] Created: (HADOOP-2580) Ability to apply user specified filters on the task logs

2008-01-11 Thread Amar Kamat (JIRA)
Ability to apply user specified filters on the task logs


 Key: HADOOP-2580
 URL: https://issues.apache.org/jira/browse/HADOOP-2580
 Project: Hadoop
  Issue Type: Improvement
Reporter: Amar Kamat
Priority: Minor


It would be great if the user could specify filters on the task logs, for 
example _grep 'Thread'_, to view log messages and timings for the _Thread_ 
related messages in the task logs. It would be of great use for 
debugging/analysis.
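The filter idea above could be sketched as follows (a minimal illustration; 
TaskLogFilter and grep() are hypothetical names, not part of the Hadoop API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper sketching what a server-side "grep" filter on
// task logs could look like.
public class TaskLogFilter {
  // Return only the log lines that match the user-supplied regex.
  public static List<String> grep(String log, String regex) {
    Pattern p = Pattern.compile(regex);
    List<String> hits = new ArrayList<String>();
    for (String line : log.split("\n")) {
      if (p.matcher(line).find()) {
        hits.add(line);
      }
    }
    return hits;
  }
}
```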

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2178) Job history on HDFS

2008-01-11 Thread Amareshwari Sri Ramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558010#action_12558010
 ] 

Amareshwari Sri Ramadasu commented on HADOOP-2178:
--

To address use cases 1 and 2 suggested by Eric,  I propose the following 
approach.

If the job tracker is static, we will store history logs in the location 
specified by hadoop.job.history.location; by default this is on the local 
file system.
If the job tracker is not static (like a HOD JT), we will store log files in 
a user-specified location; by default this is the job output directory.

We will no longer have an index file, because appending is an issue in DFS, 
and we don't need one in the case of a non-static JT.
For a static JT, we can list the files in the log directory to show the 
first page.
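The proposed selection logic could be sketched as follows (illustrative only; 
HistoryLocation, choose(), and the default local path are assumptions, not 
the actual JobTracker code):

```java
// Sketch of the proposed behavior: a static JT logs to
// hadoop.job.history.location (defaulting to the local file system);
// a non-static JT (e.g. under HOD) logs to a user-specified directory,
// defaulting to the job's output directory.
public class HistoryLocation {
  public static String choose(boolean staticJT, String configured,
                              String userDir, String jobOutputDir) {
    if (staticJT) {
      // Assumed default local-FS path, for illustration only.
      return configured != null ? configured : "file:///var/log/hadoop/history";
    }
    return userDir != null ? userDir : jobOutputDir;
  }
}
```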

 Job history on HDFS
 ---

 Key: HADOOP-2178
 URL: https://issues.apache.org/jira/browse/HADOOP-2178
 Project: Hadoop
  Issue Type: Improvement
  Components: mapred
Reporter: Amareshwari Sri Ramadasu
Assignee: Amareshwari Sri Ramadasu
 Fix For: 0.16.0


 This issue addresses the following items :
 1.  Check for accuracy of job tracker history logs.
 2.  After completion of the job, copy the JobHistory.log (master index file) 
 and the job history files to the DFS.
 3. User can load the history with the commands
 bin/hadoop job -history directory 
 or
 bin/hadoop job -history jobid
 This will start a stand-alone Jetty and load the JSPs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2311) [hbase] Could not complete hdfs write out to flush file forcing regionserver restart

2008-01-11 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2311:
--

Priority: Trivial  (was: Critical)

Dropping the priority since this bug has not recurred.

 [hbase] Could not complete hdfs write out to flush file forcing regionserver 
 restart
 

 Key: HADOOP-2311
 URL: https://issues.apache.org/jira/browse/HADOOP-2311
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: stack
Priority: Trivial
 Attachments: delete-logging.patch


 I've spent some time looking into this issue but there are not enough clues 
 in the logs to tell where the problem is. Here's what I know.
 Two region servers went down last night, a minute apart, during Paul Saab's 
 6-hour run inserting 300 million rows into hbase. The regionservers went down to 
 force rerun of hlog and avoid possible data loss after a failure writing 
 memory flushes to hdfs.
 Here is the lead up to the failed flush:
 ...
 2007-11-28 22:40:02,231 INFO  hbase.HRegionServer - MSG_REGION_OPEN : 
 regionname: postlog,img149/4699/133lm0.jpg,1196318393738, startKey: 
 img149/4699/133lm0.jpg, tableDesc: {name: postlog, families: 
 {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, 
 max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, 
 compression: NONE, in memory: false, max length: 2147483647, bloom filter: 
 none}}}
 2007-11-28 22:40:02,242 DEBUG hbase.HStore - starting 1703405830/cookie (no 
 reconstruction log)
 2007-11-28 22:40:02,741 DEBUG hbase.HStore - maximum sequence id for hstore 
 1703405830/cookie is 29077708
 2007-11-28 22:40:03,094 DEBUG hbase.HStore - starting 1703405830/ip (no 
 reconstruction log)
 2007-11-28 22:40:03,852 DEBUG hbase.HStore - maximum sequence id for hstore 
 1703405830/ip is 29077708
 2007-11-28 22:40:04,138 DEBUG hbase.HRegion - Next sequence id for region 
 postlog,img149/4699/133lm0.jpg,1196318393738 is 29077709
 2007-11-28 22:40:04,141 INFO  hbase.HRegion - region 
 postlog,img149/4699/133lm0.jpg,1196318393738 available
 2007-11-28 22:40:04,141 DEBUG hbase.HLog - changing sequence number from 
 21357623 to 29077709
 2007-11-28 22:40:04,141 INFO  hbase.HRegionServer - MSG_REGION_OPEN : 
 regionname: postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739, 
 startKey: img149/7512/dscnlightenedfi3.jpg, tableDesc: {name: postlog, 
 families: {cookie:={name: cookie, max versions: 1, compression: NONE, in 
 memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, 
 max versions: 1, compression: NONE, in memory: false, max length: 2147483647, 
 bloom filter: none}}}
 2007-11-28 22:40:04,145 DEBUG hbase.HStore - starting 376748222/cookie (no 
 reconstruction log)
 2007-11-28 22:40:04,223 DEBUG hbase.HStore - maximum sequence id for hstore 
 376748222/cookie is 29077708
 2007-11-28 22:40:04,277 DEBUG hbase.HStore - starting 376748222/ip (no 
 reconstruction log)
 2007-11-28 22:40:04,353 DEBUG hbase.HStore - maximum sequence id for hstore 
 376748222/ip is 29077708
 2007-11-28 22:40:04,699 DEBUG hbase.HRegion - Next sequence id for region 
 postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 is 29077709
 2007-11-28 22:40:04,701 INFO  hbase.HRegion - region 
 postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 available
 2007-11-28 22:40:34,427 DEBUG hbase.HRegionServer - flushing region 
 postlog,img143/1310/yashrk3.jpg,1196317258704
 2007-11-28 22:40:34,428 DEBUG hbase.HRegion - Not flushing cache for region 
 postlog,img143/1310/yashrk3.jpg,1196317258704: snapshotMemcaches() determined 
 that there was nothing to do
 2007-11-28 22:40:55,745 DEBUG hbase.HRegionServer - flushing region 
 postlog,img142/8773/1001417zc4.jpg,1196317258703
 2007-11-28 22:40:55,745 DEBUG hbase.HRegion - Not flushing cache for region 
 postlog,img142/8773/1001417zc4.jpg,1196317258703: snapshotMemcaches() 
 determined that there was nothing to do
 2007-11-28 22:41:04,144 DEBUG hbase.HRegionServer - flushing region 
 postlog,img149/4699/133lm0.jpg,1196318393738
 2007-11-28 22:41:04,144 DEBUG hbase.HRegion - Started memcache flush for 
 region postlog,img149/4699/133lm0.jpg,1196318393738. Size 74.7k
 2007-11-28 22:41:04,764 DEBUG hbase.HStore - Added 
 1703405830/ip/610047924323344967 with sequence id 29081563 and size 53.8k
 2007-11-28 22:41:04,902 DEBUG hbase.HStore - Added 
 1703405830/cookie/3147798053949544972 with sequence id 29081563 and size 41.3k
 2007-11-28 22:41:04,902 DEBUG hbase.HRegion - Finished memcache flush for 
 region postlog,img149/4699/133lm0.jpg,1196318393738 in 758ms, 
 sequenceid=29081563
 2007-11-28 22:41:04,902 DEBUG hbase.HStore - compaction for HStore 
 postlog,img149/4699/133lm0.jpg,1196318393738/ip needed.
 

[jira] Commented: (HADOOP-2178) Job history on HDFS

2008-01-11 Thread Runping Qi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558029#action_12558029
 ] 

Runping Qi commented on HADOOP-2178:



Even with a HOD JT, we still need to address case 3. That is,
we need a central place to store the job history files so that they can be 
analyzed offline.
It can be the same place in the local file system as it is now, or some common 
directory in DFS.
This is in addition to the copy in the output directory.



 Job history on HDFS
 ---

 Key: HADOOP-2178
 URL: https://issues.apache.org/jira/browse/HADOOP-2178
 Project: Hadoop
  Issue Type: Improvement
  Components: mapred
Reporter: Amareshwari Sri Ramadasu
Assignee: Amareshwari Sri Ramadasu
 Fix For: 0.16.0


 This issue addresses the following items :
 1.  Check for accuracy of job tracker history logs.
 2.  After completion of the job, copy the JobHistory.log (master index file) 
 and the job history files to the DFS.
 3. User can load the history with the commands
 bin/hadoop job -history directory 
 or
 bin/hadoop job -history jobid
 This will start a stand-alone Jetty and load the JSPs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2311) [hbase] Could not complete hdfs write out to flush file forcing regionserver restart

2008-01-11 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558025#action_12558025
 ] 

Jim Kellerman commented on HADOOP-2311:
---

Have we seen any more occurrences of this problem?

If not, should we close this issue as not reproducible and open a new one if 
it happens again?

 [hbase] Could not complete hdfs write out to flush file forcing regionserver 
 restart
 

 Key: HADOOP-2311
 URL: https://issues.apache.org/jira/browse/HADOOP-2311
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: stack
Priority: Critical
 Attachments: delete-logging.patch


 I've spent some time looking into this issue but there are not enough clues 
 in the logs to tell where the problem is. Here's what I know.
 Two region servers went down last night, a minute apart, during Paul Saab's 
 6-hour run inserting 300 million rows into hbase. The regionservers went down to 
 force rerun of hlog and avoid possible data loss after a failure writing 
 memory flushes to hdfs.
 Here is the lead up to the failed flush:
 ...
 2007-11-28 22:40:02,231 INFO  hbase.HRegionServer - MSG_REGION_OPEN : 
 regionname: postlog,img149/4699/133lm0.jpg,1196318393738, startKey: 
 img149/4699/133lm0.jpg, tableDesc: {name: postlog, families: 
 {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, 
 max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, 
 compression: NONE, in memory: false, max length: 2147483647, bloom filter: 
 none}}}
 2007-11-28 22:40:02,242 DEBUG hbase.HStore - starting 1703405830/cookie (no 
 reconstruction log)
 2007-11-28 22:40:02,741 DEBUG hbase.HStore - maximum sequence id for hstore 
 1703405830/cookie is 29077708
 2007-11-28 22:40:03,094 DEBUG hbase.HStore - starting 1703405830/ip (no 
 reconstruction log)
 2007-11-28 22:40:03,852 DEBUG hbase.HStore - maximum sequence id for hstore 
 1703405830/ip is 29077708
 2007-11-28 22:40:04,138 DEBUG hbase.HRegion - Next sequence id for region 
 postlog,img149/4699/133lm0.jpg,1196318393738 is 29077709
 2007-11-28 22:40:04,141 INFO  hbase.HRegion - region 
 postlog,img149/4699/133lm0.jpg,1196318393738 available
 2007-11-28 22:40:04,141 DEBUG hbase.HLog - changing sequence number from 
 21357623 to 29077709
 2007-11-28 22:40:04,141 INFO  hbase.HRegionServer - MSG_REGION_OPEN : 
 regionname: postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739, 
 startKey: img149/7512/dscnlightenedfi3.jpg, tableDesc: {name: postlog, 
 families: {cookie:={name: cookie, max versions: 1, compression: NONE, in 
 memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, 
 max versions: 1, compression: NONE, in memory: false, max length: 2147483647, 
 bloom filter: none}}}
 2007-11-28 22:40:04,145 DEBUG hbase.HStore - starting 376748222/cookie (no 
 reconstruction log)
 2007-11-28 22:40:04,223 DEBUG hbase.HStore - maximum sequence id for hstore 
 376748222/cookie is 29077708
 2007-11-28 22:40:04,277 DEBUG hbase.HStore - starting 376748222/ip (no 
 reconstruction log)
 2007-11-28 22:40:04,353 DEBUG hbase.HStore - maximum sequence id for hstore 
 376748222/ip is 29077708
 2007-11-28 22:40:04,699 DEBUG hbase.HRegion - Next sequence id for region 
 postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 is 29077709
 2007-11-28 22:40:04,701 INFO  hbase.HRegion - region 
 postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 available
 2007-11-28 22:40:34,427 DEBUG hbase.HRegionServer - flushing region 
 postlog,img143/1310/yashrk3.jpg,1196317258704
 2007-11-28 22:40:34,428 DEBUG hbase.HRegion - Not flushing cache for region 
 postlog,img143/1310/yashrk3.jpg,1196317258704: snapshotMemcaches() determined 
 that there was nothing to do
 2007-11-28 22:40:55,745 DEBUG hbase.HRegionServer - flushing region 
 postlog,img142/8773/1001417zc4.jpg,1196317258703
 2007-11-28 22:40:55,745 DEBUG hbase.HRegion - Not flushing cache for region 
 postlog,img142/8773/1001417zc4.jpg,1196317258703: snapshotMemcaches() 
 determined that there was nothing to do
 2007-11-28 22:41:04,144 DEBUG hbase.HRegionServer - flushing region 
 postlog,img149/4699/133lm0.jpg,1196318393738
 2007-11-28 22:41:04,144 DEBUG hbase.HRegion - Started memcache flush for 
 region postlog,img149/4699/133lm0.jpg,1196318393738. Size 74.7k
 2007-11-28 22:41:04,764 DEBUG hbase.HStore - Added 
 1703405830/ip/610047924323344967 with sequence id 29081563 and size 53.8k
 2007-11-28 22:41:04,902 DEBUG hbase.HStore - Added 
 1703405830/cookie/3147798053949544972 with sequence id 29081563 and size 41.3k
 2007-11-28 22:41:04,902 DEBUG hbase.HRegion - Finished memcache flush for 
 region postlog,img149/4699/133lm0.jpg,1196318393738 in 758ms, 
 sequenceid=29081563
 2007-11-28 

[jira] Created: (HADOOP-2581) Counters and other useful stats should be logged into Job History log

2008-01-11 Thread Runping Qi (JIRA)
Counters and other useful stats should be logged into Job History log
-

 Key: HADOOP-2581
 URL: https://issues.apache.org/jira/browse/HADOOP-2581
 Project: Hadoop
  Issue Type: Improvement
  Components: mapred
Reporter: Runping Qi



The following stats are useful and available to the JT but are not logged in 
the job history log:

1. The counters of each job
2. The counters of each mapper/reducer attempt
3. The info about the input splits (filename, split size, on which nodes)
4. The input split for each mapper attempt

This data is useful and important for mining to find performance-related 
problems.
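One possible serialization of item 1 above (illustrative only; CounterLine 
and the line layout shown are assumptions, not the real JobHistory record 
format):

```java
import java.util.Map;

// Sketch: render a job's counters as a single key=value history-log line.
// The field names (JOBID, COUNTERS) are illustrative, not the actual format.
public class CounterLine {
  public static String format(String jobId, Map<String, Long> counters) {
    StringBuilder sb = new StringBuilder("Job JOBID=\"" + jobId + "\" COUNTERS=\"");
    boolean first = true;
    for (Map.Entry<String, Long> e : counters.entrySet()) {
      if (!first) sb.append(',');
      sb.append(e.getKey()).append(':').append(e.getValue());
      first = false;
    }
    return sb.append('"').toString();
  }
}
```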







-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2394) Add support for migrating between hbase versions

2008-01-11 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558037#action_12558037
 ] 

Jim Kellerman commented on HADOOP-2394:
---

stack wrote:
 I ain't too invested in our supporting reverse migrations, but it's worth 
 noting that any migration system worth its salt (systems I've worked on in 
 the past, and Ruby on Rails) goes both ways, if only to facilitate testing 
 of the forward migration (inevitably there's a bug when you try to migrate 
 real data).

That's what backups are for :)

More importantly though, HADOOP-2478 incorporates a migration tool. The 
specifics of what the tool does will have to be
rewritten for each upgrade, but I think the framework is good.

 Add support for migrating between hbase versions
 ---

 Key: HADOOP-2394
 URL: https://issues.apache.org/jira/browse/HADOOP-2394
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Reporter: Johan Oskarsson

 If Hbase is to be used to serve data to live systems we would need a way to 
 upgrade both the underlying hadoop installation and hbase to newer versions 
 with minimal downtime.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2570) streaming jobs fail after HADOOP-2227

2008-01-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558048#action_12558048
 ] 

Hadoop QA commented on HADOOP-2570:
---

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12372956/patch-2570.txt
against trunk revision r611056.

@author +1.  The patch does not contain any @author tags.

javadoc +1.  The javadoc tool did not generate any warning messages.

javac +1.  The applied patch does not generate any new compiler warnings.

findbugs +1.  The patch does not introduce any new Findbugs warnings.

core tests -1.  The patch failed core unit tests.

contrib tests +1.  The patch passed contrib unit tests.

Test results: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1543/testReport/
Findbugs warnings: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1543/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1543/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1543/console

This message is automatically generated.

 streaming jobs fail after HADOOP-2227
 -

 Key: HADOOP-2570
 URL: https://issues.apache.org/jira/browse/HADOOP-2570
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.15.2
Reporter: lohit vijayarenu
Assignee: Amareshwari Sri Ramadasu
Priority: Blocker
 Fix For: 0.15.3

 Attachments: patch-2570.txt


 HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed 
 like this:
 {code}
 File jobCacheDir = new File(currentDir.getParentFile().getParent(), "work");
 {code}
 We should change this to get it working. Referring to the changes made in 
 HADOOP-2227, I see that the APIs used there to construct the path are not 
 public, and hard-coding the path in streaming does not look good. Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2500) [HBase] Unreadable region kills region servers

2008-01-11 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558032#action_12558032
 ] 

Jim Kellerman commented on HADOOP-2500:
---

Bryan Duxbury wrote:
 At the very least, we should not assign a region to a region server if it is 
 detected as no good.

That is an unfortunate wording of a log message in the Master. It is saying 
that the current 
assignment of the region is no good because the information it read from the 
meta region
had a server or start code that did not match a known server. It does not mean 
that the
master thinks the region itself is no good.

 Also, if a RegionServer tries to access a region and it has difficulties, it 
 should report to the
 master that it can't read the region, and the master should stop trying to 
 serve it.
 From a more general standpoint, maybe when a bad region is detected, its 
 files should be 
 moved to a different location and generally excluded from the cluster. This 
 would allow you to 
 recover from problems better.

Yes, we absolutely need to do something, just not sure exactly what yet.

One thing is certain: zero-length files should be ignored/deleted.
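That safeguard could be as simple as the following sketch 
(ReconstructionLogCheck is a hypothetical helper, not HStore code):

```java
import java.io.File;

// Skip zero-length (or missing) reconstruction logs instead of handing
// them to SequenceFile.Reader, which fails with EOFException on an
// empty file, as seen in the stack trace above.
public class ReconstructionLogCheck {
  public static boolean shouldReplay(File log) {
    return log.exists() && log.length() > 0;
  }
}
```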


 [HBase] Unreadable region kills region servers
 --

 Key: HADOOP-2500
 URL: https://issues.apache.org/jira/browse/HADOOP-2500
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
 Environment: CentOS 5
Reporter: Chris Kline
Priority: Critical

 Background: The name node (also a DataNode and RegionServer) in our cluster 
 ran out of disk space.  I created some space, restarted HDFS, and fsck 
 reported corruption with an HBase file.  I cleared up that corruption and 
 restarted HBase.  I was still unable to read anything from HBase even though 
 HDFS was now healthy.
 The following was gathered from the log files.  When HMaster starts up, it 
 finds a region that is no good (Key: 17_125736271):
 2007-12-24 09:07:14,342 DEBUG org.apache.hadoop.hbase.HMaster: Current 
 assignment of spider_pages,17_125736271,1198286140018 is no good
 HMaster then assigns this region to RegionServer X.60:
 2007-12-24 09:07:17,126 INFO org.apache.hadoop.hbase.HMaster: assigning 
 region spider_pages,17_125736271,1198286140018 to server 10.100.11.60:60020
 2007-12-24 09:07:20,152 DEBUG org.apache.hadoop.hbase.HMaster: Received 
 MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 
 10.100.11.60:60020
 The RegionServer has trouble reading that region (from the RegionServer log 
 on X.60); Note that the worker thread exits
 2007-12-24 09:07:22,611 DEBUG org.apache.hadoop.hbase.HStore: starting 
 spider_pages,17_125736271,1198286140018/meta (2062710340/meta with 
 reconstruction log: (/data/hbase1/hregion_2062710340/oldlogfile.log
 2007-12-24 09:07:22,620 DEBUG org.apache.hadoop.hbase.HStore: maximum 
 sequence id for hstore spider_pages,17_125736271,1198286140018/meta 
 (2062710340/meta) is 4549496
 2007-12-24 09:07:22,622 ERROR org.apache.hadoop.hbase.HRegionServer: error 
 opening region spider_pages,17_125736271,1198286140018
 java.io.EOFException
 at java.io.DataInputStream.readFully(DataInputStream.java:180)
 at java.io.DataInputStream.readFully(DataInputStream.java:152)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1383)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1360)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1349)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1344)
 at org.apache.hadoop.hbase.HStore.doReconstructionLog(HStore.java:697)
 at org.apache.hadoop.hbase.HStore.init(HStore.java:632)
 at org.apache.hadoop.hbase.HRegion.init(HRegion.java:288)
 at 
 org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1211)
 at 
 org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
 at java.lang.Thread.run(Thread.java:619)
 2007-12-24 09:07:22,623 FATAL org.apache.hadoop.hbase.HRegionServer: 
 Unhandled exception
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.HRegionServer.reportClose(HRegionServer.java:1095)
 at 
 org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1217)
 at 
 org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
 at java.lang.Thread.run(Thread.java:619)
 2007-12-24 09:07:22,623 INFO org.apache.hadoop.hbase.HRegionServer: worker 
 thread exiting
 The HMaster then tries to assign the same region to X.60 again and fails.  
 The HMaster tries to assign the region to X.31 with the same result (X.31 
 worker thread exits).
 The file it is complaining about, 
 /data/hbase1/hregion_2062710340/oldlogfile.log, is a 

[jira] Commented: (HADOOP-2566) need FileSystem#globStatus method

2008-01-11 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558078#action_12558078
 ] 

Hairong Kuang commented on HADOOP-2566:
---

I do not see why we need globStatus. GlobPath is essentially pattern matching: 
if the provided path does not contain any pattern, the given path is returned 
without talking to the namenode.
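The short-circuit described above could be sketched like this (hasGlob is an 
illustrative helper, not the actual FileSystem API):

```java
// If the path string contains no glob metacharacters, it can be returned
// as-is without a namenode round trip; only patterned paths need expansion.
public class GlobCheck {
  private static final String GLOB_CHARS = "*?[]{}";

  public static boolean hasGlob(String path) {
    for (int i = 0; i < path.length(); i++) {
      if (GLOB_CHARS.indexOf(path.charAt(i)) >= 0) {
        return true;
      }
    }
    return false;
  }
}
```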

 need FileSystem#globStatus method
 -

 Key: HADOOP-2566
 URL: https://issues.apache.org/jira/browse/HADOOP-2566
 Project: Hadoop
  Issue Type: Improvement
  Components: fs
Reporter: Doug Cutting
Assignee: Hairong Kuang
 Fix For: 0.16.0


 To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting 
 performance, we must use file enumeration APIs that return FileStatus[] 
 rather than Path[].  Currently we have FileSystem#globPaths(), but that 
 method should be deprecated and replaced with a FileSystem#globStatus().
 We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the 
 cache in 0.17.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2500) [HBase] Unreadable region kills region servers

2008-01-11 Thread Bryan Duxbury (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558036#action_12558036
 ] 

Bryan Duxbury commented on HADOOP-2500:
---

So, we should:

 * Change the "no good" message to something a tad more descriptive, like 
"assignment of region is invalid"
 * Enumerate the known ways that a RegionServer can fail to serve a region, 
trap those problems, and figure out what responses we'd like to give to those 
events
 

 [HBase] Unreadable region kills region servers
 --

 Key: HADOOP-2500
 URL: https://issues.apache.org/jira/browse/HADOOP-2500
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
 Environment: CentOS 5
Reporter: Chris Kline
Priority: Critical

 Background: The name node (also a DataNode and RegionServer) in our cluster 
 ran out of disk space.  I created some space, restarted HDFS, and fsck 
 reported corruption with an HBase file.  I cleared up that corruption and 
 restarted HBase.  I was still unable to read anything from HBase even though 
 HDFS was now healthy.
 The following was gathered from the log files.  When HMaster starts up, it 
 finds a region that is no good (Key: 17_125736271):
 2007-12-24 09:07:14,342 DEBUG org.apache.hadoop.hbase.HMaster: Current 
 assignment of spider_pages,17_125736271,1198286140018 is no good
 HMaster then assigns this region to RegionServer X.60:
 2007-12-24 09:07:17,126 INFO org.apache.hadoop.hbase.HMaster: assigning 
 region spider_pages,17_125736271,1198286140018 to server 10.100.11.60:60020
 2007-12-24 09:07:20,152 DEBUG org.apache.hadoop.hbase.HMaster: Received 
 MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 
 10.100.11.60:60020
 The RegionServer has trouble reading that region (from the RegionServer log 
 on X.60); Note that the worker thread exits
 2007-12-24 09:07:22,611 DEBUG org.apache.hadoop.hbase.HStore: starting 
 spider_pages,17_125736271,1198286140018/meta (2062710340/meta with 
 reconstruction log: (/data/hbase1/hregion_2062710340/oldlogfile.log
 2007-12-24 09:07:22,620 DEBUG org.apache.hadoop.hbase.HStore: maximum 
 sequence id for hstore spider_pages,17_125736271,1198286140018/meta 
 (2062710340/meta) is 4549496
 2007-12-24 09:07:22,622 ERROR org.apache.hadoop.hbase.HRegionServer: error 
 opening region spider_pages,17_125736271,1198286140018
 java.io.EOFException
 at java.io.DataInputStream.readFully(DataInputStream.java:180)
 at java.io.DataInputStream.readFully(DataInputStream.java:152)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1383)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1360)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1349)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1344)
 at org.apache.hadoop.hbase.HStore.doReconstructionLog(HStore.java:697)
 at org.apache.hadoop.hbase.HStore.init(HStore.java:632)
 at org.apache.hadoop.hbase.HRegion.init(HRegion.java:288)
 at 
 org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1211)
 at 
 org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
 at java.lang.Thread.run(Thread.java:619)
 2007-12-24 09:07:22,623 FATAL org.apache.hadoop.hbase.HRegionServer: 
 Unhandled exception
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.HRegionServer.reportClose(HRegionServer.java:1095)
 at 
 org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1217)
 at 
 org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
 at java.lang.Thread.run(Thread.java:619)
 2007-12-24 09:07:22,623 INFO org.apache.hadoop.hbase.HRegionServer: worker 
 thread exiting
 The HMaster then tries to assign the same region to X.60 again and fails.  
 The HMaster tries to assign the region to X.31 with the same result (X.31 
 worker thread exits).
 The file it is complaining about, 
 /data/hbase1/hregion_2062710340/oldlogfile.log, is a zero-length file in 
 HDFS.  After deleting that file and restarting HBase, HBase appears to be 
 back to normal.
 One thing I can't figure out is that the HMaster log show several entries 
 after the worker thread on X.60 has exited suggesting that the RegionServer 
 is talking with HMaster:
 2007-12-24 09:08:23,349 DEBUG org.apache.hadoop.hbase.HMaster: Received 
 MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 
 10.100.11.60:60020
 2007-12-24 09:10:29,543 DEBUG org.apache.hadoop.hbase.HMaster: Received 
 MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 
 10.100.11.60:60020
 There is no corresponding entry in the RegionServer's log.

-- 
This message is 

[jira] Updated: (HADOOP-2562) globPaths does not support {ab,cd} as it claims to

2008-01-11 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HADOOP-2562:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed this. Thanks Hairong!

 globPaths does not support {ab,cd} as it claims to
 --

 Key: HADOOP-2562
 URL: https://issues.apache.org/jira/browse/HADOOP-2562
 Project: Hadoop
  Issue Type: Bug
  Components: fs
Affects Versions: 0.15.2
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Priority: Blocker
 Fix For: 0.15.3

 Attachments: globFix.patch


 Olga reports: 
 According to 0.15 documentation, FileSystem::globPaths supports {ab,cd} 
 matching. However, when I tried to use it with pattern 
 /data/mydata/{data1,data2} I got no results even though I could find the 
 individual files.
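The expected {ab,cd} behavior can be illustrated with a naive translation to 
a Java regex (BraceGlob is illustrative only, not the actual globPaths 
implementation):

```java
import java.util.regex.Pattern;

// Naive sketch: rewrite a single {a,b,...} glob group as the regex
// alternation (a|b|...). Note this simple replace would mishandle paths
// that themselves contain commas or regex metacharacters; it only
// illustrates the intended matching semantics.
public class BraceGlob {
  public static String toRegex(String glob) {
    return glob.replace("{", "(").replace("}", ")").replace(",", "|");
  }

  public static boolean matches(String glob, String path) {
    return Pattern.matches(toRegex(glob), path);
  }
}
```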

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2077) Logging version number (and compiled date) at STARTUP_MSG

2008-01-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558079#action_12558079
 ] 

Hadoop QA commented on HADOOP-2077:
---

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12372845/HADOOP-2077_0_20080110.patch
against trunk revision r611056.

@author +1.  The patch does not contain any @author tags.

javadoc +1.  The javadoc tool did not generate any warning messages.

javac +1.  The applied patch does not generate any new compiler warnings.

findbugs +1.  The patch does not introduce any new Findbugs warnings.

core tests +1.  The patch passed core unit tests.

contrib tests -1.  The patch failed contrib unit tests.

Test results: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1544/testReport/
Findbugs warnings: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1544/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1544/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1544/console

This message is automatically generated.

 Logging version number (and compiled date) at STARTUP_MSG  
 ---

 Key: HADOOP-2077
 URL: https://issues.apache.org/jira/browse/HADOOP-2077
 Project: Hadoop
  Issue Type: Improvement
  Components: dfs, mapred
Reporter: Koji Noguchi
Assignee: Arun C Murthy
Priority: Trivial
 Fix For: 0.16.0

 Attachments: HADOOP-2077_0_20080110.patch, 
 HADOOP-2077_0_20080110.patch


 This will help us figure out which version of hadoop we were running when 
 looking back the logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2566) need FileSystem#globStatus method

2008-01-11 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558067#action_12558067
 ] 

Hairong Kuang commented on HADOOP-2566:
---

Did you mean that we need FileStatus[] listStatus rather than Path[] listPaths?

 need FileSystem#globStatus method
 -

 Key: HADOOP-2566
 URL: https://issues.apache.org/jira/browse/HADOOP-2566
 Project: Hadoop
  Issue Type: Improvement
  Components: fs
Reporter: Doug Cutting
Assignee: Hairong Kuang
 Fix For: 0.16.0


 To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting 
 performance, we must use file enumeration APIs that return FileStatus[] 
 rather than Path[].  Currently we have FileSystem#globPaths(), but that 
 method should be deprecated and replaced with a FileSystem#globStatus().
 We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the 
 cache in 0.17.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2431) Test HDFS File Permissions

2008-01-11 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HADOOP-2431:
--

Attachment: (was: PermissionsTestPlan.pdf)

 Test HDFS File Permissions
 --

 Key: HADOOP-2431
 URL: https://issues.apache.org/jira/browse/HADOOP-2431
 Project: Hadoop
  Issue Type: Test
  Components: test
Affects Versions: 0.15.1
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.16.0

 Attachments: HDFSPermissionSpecification6.pdf, 
 PermissionsTestPlan1.pdf, testDFSPermission.patch, testDFSPermission1.patch


 This jira is intended to provide junit tests to HADOOP-1298.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-1298) adding user info to file

2008-01-11 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HADOOP-1298:
--

Attachment: HDFSPermissionSpecification6.pdf

The updated specification reflects a change: listing a directory now requires 
rx permissions on it.

 adding user info to file
 

 Key: HADOOP-1298
 URL: https://issues.apache.org/jira/browse/HADOOP-1298
 Project: Hadoop
  Issue Type: New Feature
  Components: dfs, fs
Affects Versions: 0.16.0
Reporter: Kurtis Heimerl
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.16.0

 Attachments: 1298_2007-09-22_1.patch, 1298_2007-10-04_1.patch, 
 1298_20071221b.patch, 1298_20071228s.patch, 1298_20080103.patch, 
 hadoop-user-munncha.patch17, HDFSPermissionSpecification5.pdf, 
 HDFSPermissionSpecification6.pdf


 I'm working on adding a permissions model to hadoop's DFS. The first step is 
 this change, which associates user info with files. Following this I'll 
 associate permissions info, then block methods based on that user info, then 
 authorization of the user info. 
 So, right now I've implemented adding user info to files. I'm looking for 
 feedback before I clean this up and make it official. 
 I wasn't sure what release; I'm working off trunk. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2431) Test HDFS File Permissions

2008-01-11 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HADOOP-2431:
--

Attachment: testDFSPermission1.patch

I found one permission-checking semantics error: because dfs list is equivalent 
to Unix ls -l, listing a directory needs both SEARCH and READ permissions on 
the directory. This patch fixes the problem and also adds javadoc to the unit 
tests.
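The semantics described above can be sketched in plain Java (a hypothetical helper for illustration; this is not HDFS's actual permission-checking code):

```java
public class ListPermissionSketch {
    // Unix-style permission bits, for illustration only.
    static final int READ = 4, WRITE = 2, EXECUTE = 1;

    // Listing a directory the way `ls -l` does needs READ (to enumerate the
    // entries) and EXECUTE/SEARCH (to stat each entry) on that directory.
    static boolean canList(int dirPerms) {
        return (dirPerms & READ) != 0 && (dirPerms & EXECUTE) != 0;
    }

    public static void main(String[] args) {
        System.out.println(canList(5)); // r-x: listing allowed
        System.out.println(canList(4)); // r--: enumeration possible, stat is not
    }
}
```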

 Test HDFS File Permissions
 --

 Key: HADOOP-2431
 URL: https://issues.apache.org/jira/browse/HADOOP-2431
 Project: Hadoop
  Issue Type: Test
  Components: test
Affects Versions: 0.15.1
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.16.0

 Attachments: PermissionsTestPlan.pdf, testDFSPermission.patch, 
 testDFSPermission1.patch


 This jira is intended to provide junit tests to HADOOP-1298.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2394) Add supprt for migrating between hbase versions

2008-01-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558040#action_12558040
 ] 

stack commented on HADOOP-2394:
---

There is no framework that I can see in HADOOP-2478.  There is just a single 
script that addresses a single migration incident.

 Add supprt for migrating between hbase versions
 ---

 Key: HADOOP-2394
 URL: https://issues.apache.org/jira/browse/HADOOP-2394
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Reporter: Johan Oskarsson

 If Hbase is to be used to serve data to live systems we would need a way to 
 upgrade both the underlying hadoop installation and hbase to newer versions 
 with minimal downtime.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks

2008-01-11 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558045#action_12558045
 ] 

Devaraj Das commented on HADOOP-2116:
-

The only problem with this is the incompatible changes (like ../work and 
../work/script); code, especially scripts, that assumes those paths will break. 
So, is everyone okay with this for 0.16? Should we do the symlink stuff to 
maintain backward compatibility? As an aside, in the directory organization Owen 
suggested, one thing that needs to be added is the common scratch space for all 
tasks (like the file cache). 

Another thought: we should probably just do the basic dir organization proposed 
by Amareshwari earlier, plus the streaming fix. The magnitude of the change 
required by the dir organization Owen proposed seems pretty significant and 
aggressive for 0.16. Maybe we can do the rest in 0.17. Thoughts?

 Job.local.dir to be exposed to tasks
 

 Key: HADOOP-2116
 URL: https://issues.apache.org/jira/browse/HADOOP-2116
 Project: Hadoop
  Issue Type: Improvement
  Components: mapred
Affects Versions: 0.14.3
 Environment: All
Reporter: Milind Bhandarkar
Assignee: Amareshwari Sri Ramadasu
 Fix For: 0.16.0

 Attachments: patch-2116.txt, patch-2116.txt


 Currently, since all task cwds are created under a jobcache directory, users 
 that need a job-specific shared directory for use as scratch space create 
 ../work. This is hacky, and will break when HADOOP-2115 is addressed. For 
 such jobs, hadoop mapred should expose job.local.dir via localized 
 configuration.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-1965) Handle map output buffers better

2008-01-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558105#action_12558105
 ] 

Hadoop QA commented on HADOOP-1965:
---

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12372632/HADOOP-2419.patch
against trunk revision r611264.

@author +1.  The patch does not contain any @author tags.

javadoc +1.  The javadoc tool did not generate any warning messages.

javac +1.  The applied patch does not generate any new compiler warnings.

findbugs +1.  The patch does not introduce any new Findbugs warnings.

core tests +1.  The patch passed core unit tests.

contrib tests +1.  The patch passed contrib unit tests.

Test results: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1545/testReport/
Findbugs warnings: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1545/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1545/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1545/console

This message is automatically generated.

 Handle map output buffers better
 

 Key: HADOOP-1965
 URL: https://issues.apache.org/jira/browse/HADOOP-1965
 Project: Hadoop
  Issue Type: Improvement
  Components: mapred
Affects Versions: 0.16.0
Reporter: Devaraj Das
Assignee: Amar Kamat
 Fix For: 0.16.0

 Attachments: 1965_single_proc_150mb_gziped.jpeg, 
 1965_single_proc_150mb_gziped.pdf, 1965_single_proc_150mb_gziped_breakup.png, 
 HADOOP-1965-1.patch, HADOOP-1965-Benchmark.patch, 
 HADOOP-1965-Benchmark.patch, HADOOP-1965-Benchmark.patch, 
 HADOOP-1965-Benchmark.patch, HADOOP-1965-Benchmark.patch, HADOOP-2419.patch, 
 HADOOP-2419.patch, HADOOP-2419.patch, HADOOP-2419.patch


 Today, the map task stops calling the map method while sort/spill is using 
 the (single instance of) map output buffer. One improvement that can be done 
 to improve performance of the map task is to have another buffer for writing 
 the map outputs to, while sort/spill is using the first buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2566) need FileSystem#globStatus method

2008-01-11 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558129#action_12558129
 ] 

Doug Cutting commented on HADOOP-2566:
--

Globbing is implemented on top of listPaths() which is implemented on top of 
listStatus().  The primitive globbing API should not throw away that status 
information.  It should keep it so that glob clients which need it do not have 
to call getStatus() for each file that matches.  Currently the cache of 
FileStatus hides the cost of these getStatus() calls, but that cache will break 
things once files and their status can change.  So we need globStatus() before 
we can remove the cache.

FileInputFormat, for example, uses globPaths() to list files matching the input 
specification, then it uses getStatus() on each matching path when building 
splits.  This must change to call globStatus() before the cache is removed.

Long-term, globPaths() and listPaths() may perhaps still be useful as utility 
methods implemented in terms of globStatus() and listStatus(), but since 
most current users of these will be broken performance-wise once the cache is 
removed, we should deprecate them now to strongly encourage folks to stop using 
them before that cache is removed, to give fair warning.
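The relationship between the two methods can be sketched in plain Java (a simplified stand-in type is used in place of Hadoop's FileStatus, and the canned glob expansion is purely illustrative; this is not the actual Hadoop API):

```java
import java.util.Arrays;

// Simplified stand-in for Hadoop's FileStatus (hypothetical).
class FileStatus {
    final String path;
    final long length;
    FileStatus(String path, long length) { this.path = path; this.length = length; }
}

public class GlobSketch {
    // The primitive returns full status, so glob clients that need it do not
    // have to call getStatus() once per matching file.
    static FileStatus[] globStatus(String pattern) {
        // A real implementation would expand the glob against the filesystem;
        // canned data is returned here for illustration.
        return new FileStatus[] {
            new FileStatus("/data/part-00000", 128L),
            new FileStatus("/data/part-00001", 256L)
        };
    }

    // The deprecated convenience method can be derived from the primitive,
    // discarding status information only when the caller asks for paths.
    static String[] globPaths(String pattern) {
        return Arrays.stream(globStatus(pattern))
                     .map(s -> s.path)
                     .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", globPaths("/data/part-*")));
    }
}
```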


 need FileSystem#globStatus method
 -

 Key: HADOOP-2566
 URL: https://issues.apache.org/jira/browse/HADOOP-2566
 Project: Hadoop
  Issue Type: Improvement
  Components: fs
Reporter: Doug Cutting
Assignee: Hairong Kuang
 Fix For: 0.16.0


 To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting 
 performance, we must use file enumeration APIs that return FileStatus[] 
 rather than Path[].  Currently we have FileSystem#globPaths(), but that 
 method should be deprecated and replaced with a FileSystem#globStatus().
 We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the 
 cache in 0.17.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2566) need FileSystem#globStatus method

2008-01-11 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558071#action_12558071
 ] 

Doug Cutting commented on HADOOP-2566:
--

No, we need 'FileStatus[] globStatus(Path pattern)' instead of 'Path[] 
globPaths(Path pattern)'.

 need FileSystem#globStatus method
 -

 Key: HADOOP-2566
 URL: https://issues.apache.org/jira/browse/HADOOP-2566
 Project: Hadoop
  Issue Type: Improvement
  Components: fs
Reporter: Doug Cutting
Assignee: Hairong Kuang
 Fix For: 0.16.0


 To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting 
 performance, we must use file enumeration APIs that return FileStatus[] 
 rather than Path[].  Currently we have FileSystem#globPaths(), but that 
 method should be deprecated and replaced with a FileSystem#globStatus().
 We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the 
 cache in 0.17.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2570) streaming jobs fail after HADOOP-2227

2008-01-11 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558080#action_12558080
 ] 

Arun C Murthy commented on HADOOP-2570:
---

All tests fail with:

{noformat}
2008-01-11 17:35:53,433 INFO  mapred.TaskTracker 
(TaskTracker.java:launchTaskForJob(703)) - 
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
taskTracker/jobcache/job_20080735_0001/work in any of the configured local 
directories
at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:359)
at 
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
at 
org.apache.hadoop.mapred.TaskTracker$TaskInProgress.localizeTask(TaskTracker.java:1395)
at 
org.apache.hadoop.mapred.TaskTracker$TaskInProgress.launchTask(TaskTracker.java:1469)
at 
org.apache.hadoop.mapred.TaskTracker.launchTaskForJob(TaskTracker.java:693)
at 
org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:686)
at 
org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1279)
at 
org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:920)
at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1315)
at 
org.apache.hadoop.mapred.MiniMRCluster$TaskTrackerRunner.run(MiniMRCluster.java:144)
at java.lang.Thread.run(Thread.java:595)
{noformat}

The problem is that LocalDirAllocator.getLocalPathToRead throws an 
exception when the path is not found - this patch should handle that exception 
and go ahead and create the symlink...
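The suggested handling can be sketched in plain Java (a stand-in exception type and a simple fallback to directory creation; the real patch would use Hadoop's LocalDirAllocator and create a symlink rather than a plain directory):

```java
import java.io.File;
import java.io.IOException;

// Stand-in for Hadoop's DiskChecker$DiskErrorException (hypothetical).
class DiskErrorException extends IOException {
    DiskErrorException(String msg) { super(msg); }
}

public class WorkDirSketch {
    // Mimics LocalDirAllocator.getLocalPathToRead: throws when the path
    // does not exist under any configured local directory.
    static File getLocalPathToRead(String relPath, File root) throws DiskErrorException {
        File f = new File(root, relPath);
        if (!f.exists()) {
            throw new DiskErrorException("Could not find " + relPath);
        }
        return f;
    }

    // The fix being suggested: catch the exception and fall back to creating
    // the work directory instead of failing the task.
    static File ensureWorkDir(String relPath, File root) throws IOException {
        try {
            return getLocalPathToRead(relPath, root);
        } catch (DiskErrorException e) {
            File f = new File(root, relPath);
            if (!f.mkdirs()) {
                throw new IOException("Could not create " + f);
            }
            return f;
        }
    }

    public static void main(String[] args) throws IOException {
        File root = new File(System.getProperty("java.io.tmpdir"), "jobcache-sketch");
        File work = ensureWorkDir("job_0001/work", root);
        System.out.println(work.isDirectory());
    }
}
```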


 streaming jobs fail after HADOOP-2227
 -

 Key: HADOOP-2570
 URL: https://issues.apache.org/jira/browse/HADOOP-2570
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.15.2
Reporter: lohit vijayarenu
Assignee: Amareshwari Sri Ramadasu
Priority: Blocker
 Fix For: 0.15.3

 Attachments: patch-2570.txt


 HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed 
 like this
 {code}
 File jobCacheDir = new File(currentDir.getParentFile().getParent(), "work");
 {code}
 We should change this to get it working. Referring to the changes made in 
 HADOOP-2227, I see that the APIs used there to construct the path are not 
 public, and hard-coding the path in streaming does not look good. Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2431) Test HDFS File Permissions

2008-01-11 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HADOOP-2431:
--

Attachment: PermissionsTestPlan1.pdf

Attach an updated test plan

 Test HDFS File Permissions
 --

 Key: HADOOP-2431
 URL: https://issues.apache.org/jira/browse/HADOOP-2431
 Project: Hadoop
  Issue Type: Test
  Components: test
Affects Versions: 0.15.1
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.16.0

 Attachments: HDFSPermissionSpecification6.pdf, 
 PermissionsTestPlan1.pdf, testDFSPermission.patch, testDFSPermission1.patch


 This jira is intended to provide junit tests to HADOOP-1298.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2431) Test HDFS File Permissions

2008-01-11 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HADOOP-2431:
--

Attachment: HDFSPermissionSpecification6.pdf

Attach the permission checking specification.

 Test HDFS File Permissions
 --

 Key: HADOOP-2431
 URL: https://issues.apache.org/jira/browse/HADOOP-2431
 Project: Hadoop
  Issue Type: Test
  Components: test
Affects Versions: 0.15.1
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.16.0

 Attachments: HDFSPermissionSpecification6.pdf, 
 PermissionsTestPlan.pdf, testDFSPermission.patch, testDFSPermission1.patch


 This jira is intended to provide junit tests to HADOOP-1298.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2389) [hbase] provide multiple language bindings for HBase

2008-01-11 Thread David Simpson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Simpson updated HADOOP-2389:
--

Attachment: (was: hbase-thrift.patch)

 [hbase] provide multiple language bindings for HBase
 

 Key: HADOOP-2389
 URL: https://issues.apache.org/jira/browse/HADOOP-2389
 Project: Hadoop
  Issue Type: New Feature
  Components: contrib/hbase
Reporter: Jim Kellerman
Priority: Minor
 Attachments: hbase-thrift.patch, hbase-thrift.patch, 
 Hbase.thrift.txt, libthrift-r746.jar


 There have been a number of requests for multiple language bindings for 
 HBase.  While there is now a REST interface, this may not be suited for 
 high-volume applications. A couple of approaches have been suggested:
 - Provide a Thrift based API (very fast socket based but some of the 
 languages are not well supported)
 - Provide a JSON based API over sockets. (faster than REST, but probably 
 slower than Thrift)
 Others?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2574) bugs in mapred tutorial

2008-01-11 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HADOOP-2574:
--

Attachment: HADOOP-2574_0_20080110.patch

Here is a patch which addresses most of Phu's concerns...

 bugs in mapred tutorial
 ---

 Key: HADOOP-2574
 URL: https://issues.apache.org/jira/browse/HADOOP-2574
 Project: Hadoop
  Issue Type: Bug
  Components: documentation
Reporter: Doug Cutting
Assignee: Arun C Murthy
 Fix For: 0.15.3, 0.16.0

 Attachments: HADOOP-2574_0_20080110.patch


 Sam Pullara sends me:
 {noformat}
 Phu was going through the WordCount example... lines 52 and 53 should have 
 args[0] and args[1]:
 http://lucene.apache.org/hadoop/docs/current/mapred_tutorial.html
 The javac and jar command are also wrong, they don't include the directories 
 for the packages, should be:
 $ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d 
 classes WordCount.java 
 $ jar -cvf /usr/joe/wordcount.jar WordCount.class -C classes .
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HADOOP-2574) bugs in mapred tutorial

2008-01-11 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned HADOOP-2574:
-

Assignee: Arun C Murthy

 bugs in mapred tutorial
 ---

 Key: HADOOP-2574
 URL: https://issues.apache.org/jira/browse/HADOOP-2574
 Project: Hadoop
  Issue Type: Bug
  Components: documentation
Reporter: Doug Cutting
Assignee: Arun C Murthy
 Fix For: 0.15.3, 0.16.0

 Attachments: HADOOP-2574_0_20080110.patch


 Sam Pullara sends me:
 {noformat}
 Phu was going through the WordCount example... lines 52 and 53 should have 
 args[0] and args[1]:
 http://lucene.apache.org/hadoop/docs/current/mapred_tutorial.html
 The javac and jar command are also wrong, they don't include the directories 
 for the packages, should be:
 $ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d 
 classes WordCount.java 
 $ jar -cvf /usr/joe/wordcount.jar WordCount.class -C classes .
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2574) bugs in mapred tutorial

2008-01-11 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HADOOP-2574:
--

Attachment: mapred_tutorial.html

Here is how the tutorial looks with this patch...

 bugs in mapred tutorial
 ---

 Key: HADOOP-2574
 URL: https://issues.apache.org/jira/browse/HADOOP-2574
 Project: Hadoop
  Issue Type: Bug
  Components: documentation
Reporter: Doug Cutting
Assignee: Arun C Murthy
 Fix For: 0.15.3, 0.16.0

 Attachments: HADOOP-2574_0_20080110.patch, mapred_tutorial.html


 Sam Pullara sends me:
 {noformat}
 Phu was going through the WordCount example... lines 52 and 53 should have 
 args[0] and args[1]:
 http://lucene.apache.org/hadoop/docs/current/mapred_tutorial.html
 The javac and jar command are also wrong, they don't include the directories 
 for the packages, should be:
 $ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d 
 classes WordCount.java 
 $ jar -cvf /usr/joe/wordcount.jar WordCount.class -C classes .
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2570) streaming jobs fail after HADOOP-2227

2008-01-11 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558087#action_12558087
 ] 

Arun C Murthy commented on HADOOP-2570:
---

Sigh, this exception seems to stem from the fact that the LocalDirAllocator is 
not used to create the *taskTracker/jobcache/jobid/work* directory at all. It 
is always created in the same partition as the *taskTracker/jobcache/jobid/* 
directory.

This means LocalDirAllocator doesn't know about the 
*taskTracker/jobcache/jobid/work* directory at all and hence the 
DiskErrorException.

 streaming jobs fail after HADOOP-2227
 -

 Key: HADOOP-2570
 URL: https://issues.apache.org/jira/browse/HADOOP-2570
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.15.2
Reporter: lohit vijayarenu
Assignee: Amareshwari Sri Ramadasu
Priority: Blocker
 Fix For: 0.15.3

 Attachments: patch-2570.txt


 HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed 
 like this
 {code}
 File jobCacheDir = new File(currentDir.getParentFile().getParent(), "work");
 {code}
 We should change this to get it working. Referring to the changes made in 
 HADOOP-2227, I see that the APIs used there to construct the path are not 
 public, and hard-coding the path in streaming does not look good. Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-1707) Remove the DFS Client disk-based cache

2008-01-11 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HADOOP-1707:
-

Status: Patch Available  (was: Open)

 Remove the DFS Client disk-based cache
 --

 Key: HADOOP-1707
 URL: https://issues.apache.org/jira/browse/HADOOP-1707
 Project: Hadoop
  Issue Type: Improvement
  Components: dfs
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.16.0

 Attachments: clientDiskBuffer.patch, clientDiskBuffer10.patch, 
 clientDiskBuffer11.patch, clientDiskBuffer12.patch, clientDiskBuffer14.patch, 
 clientDiskBuffer15.patch, clientDiskBuffer16.patch, clientDiskBuffer17.patch, 
 clientDiskBuffer18.patch, clientDiskBuffer19.patch, clientDiskBuffer2.patch, 
 clientDiskBuffer6.patch, clientDiskBuffer7.patch, clientDiskBuffer8.patch, 
 clientDiskBuffer9.patch, DataTransferProtocol.doc, DataTransferProtocol.html


 The DFS client currently uses a staging file on local disk to cache all 
 user-writes to a file. When the staging file accumulates 1 block worth of 
 data, its contents are flushed to a HDFS datanode. These operations occur 
 sequentially.
 A simple optimization of allowing the user to write to another staging file 
 while simultaneously uploading the contents of the first staging file to HDFS 
 will improve file-upload performance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2560) Combining multiple input blocks into one mapper

2008-01-11 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558042#action_12558042
 ] 

Owen O'Malley commented on HADOOP-2560:
---

I like that approach, Doug. We should also have an entry for splits that are 
local to no nodes in the map/reduce cluster, and prefer to steal from those 
rather than from other nodes. This would solve HADOOP-2014...

 Combining multiple input blocks into one mapper
 ---

 Key: HADOOP-2560
 URL: https://issues.apache.org/jira/browse/HADOOP-2560
 Project: Hadoop
  Issue Type: Bug
Reporter: Runping Qi

 Currently, an input split contains a consecutive chunk of an input file, 
 which, by default, corresponds to a DFS block.
 This may lead to a large number of mapper tasks if the input data is large. 
 This leads to the following problems:
 1. Shuffling cost: since the framework has to move M * R map output segments 
 to the nodes running reducers, 
 larger M means larger shuffling cost.
 2. High JVM initialization overhead
 3. Disk fragmentation: a larger number of map output files means lower read 
 throughput for accessing them.
 Ideally, you want to keep the number of mappers to no more than 16 times the 
 number of nodes in the cluster.
 To achieve that, we can increase the input split size. However, if a split 
 spans more than one DFS block,
 you lose the data-locality scheduling benefits.
 One way to address this problem is to combine multiple input blocks on the 
 same rack into one split.
 If on average we combine B blocks into one split, then we will reduce the 
 number of mappers by a factor of B.
 Since all the blocks for one mapper share a rack, we can benefit from 
 rack-aware scheduling.
 Thoughts?
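The rack-grouping step described above can be sketched as follows (hypothetical names and a single rack per block for simplicity; this is not the eventual input-format API, and real blocks have replicas on several racks):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RackCombineSketch {
    // A block knows the rack of (one of) its replicas.
    record Block(String name, String rack) {}

    // Group blocks by rack, then cut each rack's list into splits of up to
    // blocksPerSplit blocks, so every split stays rack-local.
    static List<List<Block>> combineByRack(List<Block> blocks, int blocksPerSplit) {
        Map<String, List<Block>> byRack = new LinkedHashMap<>();
        for (Block b : blocks) {
            byRack.computeIfAbsent(b.rack(), r -> new ArrayList<>()).add(b);
        }
        List<List<Block>> splits = new ArrayList<>();
        for (List<Block> rackBlocks : byRack.values()) {
            for (int i = 0; i < rackBlocks.size(); i += blocksPerSplit) {
                splits.add(rackBlocks.subList(i,
                        Math.min(i + blocksPerSplit, rackBlocks.size())));
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
            new Block("b1", "rack1"), new Block("b2", "rack2"),
            new Block("b3", "rack1"), new Block("b4", "rack1"));
        // rack1 has 3 blocks -> splits of sizes 2 and 1; rack2 has 1 block.
        System.out.println(combineByRack(blocks, 2).size());
    }
}
```

With B blocks per split this reduces the mapper count by roughly a factor of B while keeping each mapper's input on one rack.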

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-1707) Remove the DFS Client disk-based cache

2008-01-11 Thread Mukund Madhugiri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558041#action_12558041
 ] 

Mukund Madhugiri commented on HADOOP-1707:
--

Running on a 100 node cluster, with the patch clientDiskBuffer19.patch, the 
sort benchmark showed these results:

|*100 nodes*|*trunk*|*trunk + patch*|
|randomWriter (hrs)|0.44|0.45|
|sort (hrs)|1.03|1|
|sortValidation (hrs)|0.39|0.3|

 Remove the DFS Client disk-based cache
 --

 Key: HADOOP-1707
 URL: https://issues.apache.org/jira/browse/HADOOP-1707
 Project: Hadoop
  Issue Type: Improvement
  Components: dfs
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.16.0

 Attachments: clientDiskBuffer.patch, clientDiskBuffer10.patch, 
 clientDiskBuffer11.patch, clientDiskBuffer12.patch, clientDiskBuffer14.patch, 
 clientDiskBuffer15.patch, clientDiskBuffer16.patch, clientDiskBuffer17.patch, 
 clientDiskBuffer18.patch, clientDiskBuffer19.patch, clientDiskBuffer2.patch, 
 clientDiskBuffer6.patch, clientDiskBuffer7.patch, clientDiskBuffer8.patch, 
 clientDiskBuffer9.patch, DataTransferProtocol.doc, DataTransferProtocol.html


 The DFS client currently uses a staging file on local disk to cache all 
 user-writes to a file. When the staging file accumulates 1 block worth of 
 data, its contents are flushed to a HDFS datanode. These operations occur 
 sequentially.
 A simple optimization of allowing the user to write to another staging file 
 while simultaneously uploading the contents of the first staging file to HDFS 
 will improve file-upload performance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2566) need FileSystem#globStatus method

2008-01-11 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558082#action_12558082
 ] 

Raghu Angadi commented on HADOOP-2566:
--

Also, this would not duplicate code. {{globPaths()}} would just be implemented 
with {{globStatus()}} (when there is a glob in the path).


 need FileSystem#globStatus method
 -

 Key: HADOOP-2566
 URL: https://issues.apache.org/jira/browse/HADOOP-2566
 Project: Hadoop
  Issue Type: Improvement
  Components: fs
Reporter: Doug Cutting
Assignee: Hairong Kuang
 Fix For: 0.16.0


 To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting 
 performance, we must use file enumeration APIs that return FileStatus[] 
 rather than Path[].  Currently we have FileSystem#globPaths(), but that 
 method should be deprecated and replaced with a FileSystem#globStatus().
 We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the 
 cache in 0.17.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (HADOOP-2566) need FileSystem#globStatus method

2008-01-11 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558081#action_12558081
 ] 

rangadi edited comment on HADOOP-2566 at 1/11/08 11:36 AM:


globStatus would certainly be useful since globPaths() is used in many places 
where we really want to do globStatus(). globStatus is much more efficient in 
those cases since we often do '{{for(path : globPaths(pattern)) { stat = 
listStatus(path) ... }}}'.

I am not sure if globPaths() can go away. One difference I see is that 
globPath(/non/existent/path/withoutglob) returns the simple path without any 
filesystem interaction (as expected). But 
globStatus(/non/existent/path/withoutglob) will ask the filesystem and will 
return NULL (or an array with zero entries).


  was (Author: rangadi):
globStatus would certainly be useful since globPaths() is used in many 
places where we really want to do globStatus(). globStatus is much more 
efficient in those cases since we aften do {{for(path : globPaths(pattern)) { 
stat = listStatus(path) ... }.

I am not sure if globPaths() can go away. One difference I see is that 
globPath(/non/existent/path/withoutglob) returns simple path without any 
filesystem interaction (as expected). But 
globStatus(/non/existent/path/withoutglob)  will ask filesystem and will 
return NULL (or array with zero entries).

  
 need FileSystem#globStatus method
 -

 Key: HADOOP-2566
 URL: https://issues.apache.org/jira/browse/HADOOP-2566
 Project: Hadoop
  Issue Type: Improvement
  Components: fs
Reporter: Doug Cutting
Assignee: Hairong Kuang
 Fix For: 0.16.0


 To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting 
 performance, we must use file enumeration APIs that return FileStatus[] 
 rather than Path[].  Currently we have FileSystem#globPaths(), but that 
 method should be deprecated and replaced with a FileSystem#globStatus().
 We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the 
 cache in 0.17.




[jira] Commented: (HADOOP-2481) NNBench should periodically report its progress

2008-01-11 Thread Mukund Madhugiri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558084#action_12558084
 ] 

Mukund Madhugiri commented on HADOOP-2481:
--

+1

I tested it on a 500-node cluster and it works fine. Thanks, Hairong.

 NNBench should periodically report its progress
 ---

 Key: HADOOP-2481
 URL: https://issues.apache.org/jira/browse/HADOOP-2481
 Project: Hadoop
  Issue Type: Bug
  Components: test
Affects Versions: 0.15.1
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.16.0

 Attachments: NNBench.patch


 When I run NNBench on a 100-node cluster, some map tasks fail with the error 
 message "Task xx failed to report status for yy seconds. Killing!". Map tasks 
 should periodically report their progress to prevent themselves from being killed.
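The fix pattern this implies — long-running benchmark work heartbeating back to the framework so the liveness timeout does not fire — can be sketched as follows. This is illustrative Java only: the Reporter interface here is a stand-in for Hadoop's, and the method names are assumed, not the NNBench code.

```java
// Minimal sketch of periodic progress reporting from inside a map task.
// Reporter is a stand-in interface, not the Hadoop class.
class ProgressSketch {

    interface Reporter {
        void progress();   // heartbeat back to the TaskTracker
    }

    // Runs 'ops' units of benchmark work, pinging the reporter every
    // 'reportEvery' operations; returns how many heartbeats were sent.
    static int doWork(int ops, int reportEvery, Reporter reporter) {
        int reports = 0;
        for (int i = 1; i <= ops; i++) {
            // ... one NameNode operation (create/open/delete) would go here ...
            if (i % reportEvery == 0) {
                reporter.progress();
                reports++;
            }
        }
        return reports;
    }
}
```

The key point is simply that the heartbeat happens inside the work loop, not once per task.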




[jira] Updated: (HADOOP-2389) [hbase] provide multiple language bindings for HBase

2008-01-11 Thread David Simpson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Simpson updated HADOOP-2389:
--

Attachment: hbase-thrift.patch

re-uploading to apply patch

 [hbase] provide multiple language bindings for HBase
 

 Key: HADOOP-2389
 URL: https://issues.apache.org/jira/browse/HADOOP-2389
 Project: Hadoop
  Issue Type: New Feature
  Components: contrib/hbase
Reporter: Jim Kellerman
Priority: Minor
 Attachments: hbase-thrift.patch, hbase-thrift.patch, 
 Hbase.thrift.txt, libthrift-r746.jar


 There have been a number of requests for multiple language bindings for 
 HBase.  While there is now a REST interface, this may not be suited for 
 high-volume applications. A couple of suggested approaches have been proposed:
 - Provide a Thrift based API (very fast socket based but some of the 
 languages are not well supported)
 - Provide a JSON based API over sockets. (faster than REST, but probably 
 slower than Thrift)
 Others?




[jira] Updated: (HADOOP-2389) [hbase] provide multiple language bindings for HBase

2008-01-11 Thread David Simpson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Simpson updated HADOOP-2389:
--

Attachment: hbase-thrift.patch

re-uploading

 [hbase] provide multiple language bindings for HBase
 

 Key: HADOOP-2389
 URL: https://issues.apache.org/jira/browse/HADOOP-2389
 Project: Hadoop
  Issue Type: New Feature
  Components: contrib/hbase
Reporter: Jim Kellerman
Priority: Minor
 Attachments: hbase-thrift.patch, hbase-thrift.patch, 
 Hbase.thrift.txt, libthrift-r746.jar


 There have been a number of requests for multiple language bindings for 
 HBase.  While there is now a REST interface, this may not be suited for 
 high-volume applications. A couple of suggested approaches have been proposed:
 - Provide a Thrift based API (very fast socket based but some of the 
 languages are not well supported)
 - Provide a JSON based API over sockets. (faster than REST, but probably 
 slower than Thrift)
 Others?




[jira] Commented: (HADOOP-2566) need FileSystem#globStatus method

2008-01-11 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558088#action_12558088
 ] 

Hairong Kuang commented on HADOOP-2566:
---

globPaths() is intended to return all paths that match the given glob. It is 
not intended to do 'for(path : globPaths(pattern)) { stat = listStatus(path) 
... }'. The feature that you want is listing the status of all the paths that 
match the glob.


 need FileSystem#globStatus method
 -

 Key: HADOOP-2566
 URL: https://issues.apache.org/jira/browse/HADOOP-2566
 Project: Hadoop
  Issue Type: Improvement
  Components: fs
Reporter: Doug Cutting
Assignee: Hairong Kuang
 Fix For: 0.16.0


 To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting 
 performance, we must use file enumeration APIs that return FileStatus[] 
 rather than Path[].  Currently we have FileSystem#globPaths(), but that 
 method should be deprecated and replaced with a FileSystem#globStatus().
 We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the 
 cache in 0.17.




[jira] Commented: (HADOOP-2566) need FileSystem#globStatus method

2008-01-11 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558081#action_12558081
 ] 

Raghu Angadi commented on HADOOP-2566:
--

globStatus would certainly be useful since globPaths() is used in many places 
where we really want to do globStatus(). globStatus is much more efficient in 
those cases since we often do {{for(path : globPaths(pattern)) { stat = 
listStatus(path) ... }}}.

I am not sure if globPaths() can go away. One difference I see is that 
globPaths(/non/existent/path/withoutglob) returns the simple path without any 
filesystem interaction (as expected). But 
globStatus(/non/existent/path/withoutglob) will ask the filesystem and will 
return null (or an array with zero entries).
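The behavioral difference described here can be sketched with a toy in-memory model. This is illustrative Java only — hasGlob/expand and the Set-based "filesystem" are assumptions for the sketch, not the Hadoop FileSystem implementation.

```java
import java.util.*;

// Toy model: globPaths() can echo a non-glob path back without touching the
// filesystem, while globStatus() must stat it and so returns nothing for a
// non-existent path.
class GlobSketch {

    static boolean hasGlob(String path) {
        // Treat the usual glob metacharacters as making a path a pattern.
        return path.matches(".*[*?\\[\\]{}].*");
    }

    // Models globPaths(): a non-glob path is returned as-is, even if it
    // does not exist in the (simulated) filesystem.
    static List<String> globPaths(String pattern, Set<String> fs) {
        if (!hasGlob(pattern)) {
            return Collections.singletonList(pattern);
        }
        return expand(pattern, fs);
    }

    // Models globStatus(): every result must correspond to a real file, so a
    // non-existent non-glob path yields an empty result.
    static List<String> globStatus(String pattern, Set<String> fs) {
        if (!hasGlob(pattern)) {
            return fs.contains(pattern)
                    ? Collections.singletonList(pattern)
                    : Collections.<String>emptyList();
        }
        return expand(pattern, fs);
    }

    // Crude glob expansion: translate '*' and '?' into a regex and filter.
    private static List<String> expand(String pattern, Set<String> fs) {
        String regex = pattern.replace(".", "\\.")
                              .replace("*", "[^/]*")
                              .replace("?", ".");
        List<String> out = new ArrayList<>();
        for (String p : fs) {
            if (p.matches(regex)) {
                out.add(p);
            }
        }
        Collections.sort(out);
        return out;
    }
}
```

Under this model, callers that only want matching names can use either, but callers that go on to stat each result are better served by the status-returning form.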


 need FileSystem#globStatus method
 -

 Key: HADOOP-2566
 URL: https://issues.apache.org/jira/browse/HADOOP-2566
 Project: Hadoop
  Issue Type: Improvement
  Components: fs
Reporter: Doug Cutting
Assignee: Hairong Kuang
 Fix For: 0.16.0


 To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting 
 performance, we must use file enumeration APIs that return FileStatus[] 
 rather than Path[].  Currently we have FileSystem#globPaths(), but that 
 method should be deprecated and replaced with a FileSystem#globStatus().
 We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the 
 cache in 0.17.




[jira] Updated: (HADOOP-2481) NNBench should periodically report its progress

2008-01-11 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HADOOP-2481:
--

Status: Patch Available  (was: Open)

 NNBench should periodically report its progress
 ---

 Key: HADOOP-2481
 URL: https://issues.apache.org/jira/browse/HADOOP-2481
 Project: Hadoop
  Issue Type: Bug
  Components: test
Affects Versions: 0.15.1
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.16.0

 Attachments: NNBench.patch


 When I run NNBench on a 100-node cluster, some map tasks fail with the error 
 message "Task xx failed to report status for yy seconds. Killing!". Map tasks 
 should periodically report their progress to prevent themselves from being killed.




[jira] Commented: (HADOOP-2449) Restore the old NN Bench that was replaced by a MR NN Bench

2008-01-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558144#action_12558144
 ] 

Hadoop QA commented on HADOOP-2449:
---

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12371829/fixNNBenchPatch.txt
against trunk revision r611264.

@author +1.  The patch does not contain any @author tags.

javadoc +1.  The javadoc tool did not generate any warning messages.

javac +1.  The applied patch does not generate any new compiler warnings.

findbugs +1.  The patch does not introduce any new Findbugs warnings.

core tests -1.  The patch failed core unit tests.

contrib tests +1.  The patch passed contrib unit tests.

Test results: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1546/testReport/
Findbugs warnings: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1546/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1546/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1546/console

This message is automatically generated.

 Restore the  old NN Bench that was replaced by a MR NN Bench
 

 Key: HADOOP-2449
 URL: https://issues.apache.org/jira/browse/HADOOP-2449
 Project: Hadoop
  Issue Type: Test
Reporter: Sanjay Radia
Assignee: Sanjay Radia
 Attachments: fixNNBenchPatch.txt


 The old NN Bench did not use Map Reduce.
 It was replaced by a new NN Bench that uses Map reduce.
 The old NN Bench is useful and should be restored.
   - useful for simulated data nodes, which do not work for Map reduce since 
 the job configs need to be persistent.
   - a NN test that is independent of map reduce can be useful as it is one 
 less variable in figuring out bottlenecks.




[jira] Updated: (HADOOP-2574) bugs in mapred tutorial

2008-01-11 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HADOOP-2574:
--

Status: Patch Available  (was: Open)

 bugs in mapred tutorial
 ---

 Key: HADOOP-2574
 URL: https://issues.apache.org/jira/browse/HADOOP-2574
 Project: Hadoop
  Issue Type: Bug
  Components: documentation
Reporter: Doug Cutting
Assignee: Arun C Murthy
 Fix For: 0.15.3, 0.16.0

 Attachments: HADOOP-2574_0_20080110.patch, mapred_tutorial.html


 Sam Pullara sends me:
 {noformat}
 Phu was going through the WordCount example... lines 52 and 53 should have 
 args[0] and args[1]:
 http://lucene.apache.org/hadoop/docs/current/mapred_tutorial.html
 The javac and jar command are also wrong, they don't include the directories 
 for the packages, should be:
 $ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d 
 classes WordCount.java 
 $ jar -cvf /usr/joe/wordcount.jar WordCount.class -C classes .
 {noformat}




[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks

2008-01-11 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558150#action_12558150
 ] 

Milind Bhandarkar commented on HADOOP-2116:
---

Since this bug is scheduled for 0.16, having incompatible changes in that 
release is fine (of course, as long as it is flagged as such in the release notes).

 Job.local.dir to be exposed to tasks
 

 Key: HADOOP-2116
 URL: https://issues.apache.org/jira/browse/HADOOP-2116
 Project: Hadoop
  Issue Type: Improvement
  Components: mapred
Affects Versions: 0.14.3
 Environment: All
Reporter: Milind Bhandarkar
Assignee: Amareshwari Sri Ramadasu
 Fix For: 0.16.0

 Attachments: patch-2116.txt, patch-2116.txt


 Currently, since all task cwds are created under a jobcache directory, users 
 that need a job-specific shared directory for use as scratch space create 
 ../work. This is hacky and will break when HADOOP-2115 is addressed. For 
 such jobs, hadoop mapred should expose job.local.dir via the localized 
 configuration.
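From the task's side, the requested change can be sketched as a before/after. This is illustrative Java only: Properties stands in for the localized JobConf, and the fallback branch is the "../work" hack described above; none of these names are the actual mapred code.

```java
import java.util.Properties;

// Hypothetical sketch: a task reads the exposed job.local.dir property
// instead of hard-coding a path relative to its cwd.
class ScratchDirSketch {

    static String scratchDir(Properties localizedConf) {
        String dir = localizedConf.getProperty("job.local.dir");
        // Fall back to the old "../work" hack only if the property is absent.
        return dir != null ? dir : "../work";
    }
}
```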




[jira] Commented: (HADOOP-2566) need FileSystem#globStatus method

2008-01-11 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558093#action_12558093
 ] 

Raghu Angadi commented on HADOOP-2566:
--

 'for(path : globPaths(pattern)) { stat = listStatus(path) ... }'.
FsShell.setReplication() is an example of this pattern of use (essentially).

I agree that globStatus() may not replace all uses of globPaths().

 need FileSystem#globStatus method
 -

 Key: HADOOP-2566
 URL: https://issues.apache.org/jira/browse/HADOOP-2566
 Project: Hadoop
  Issue Type: Improvement
  Components: fs
Reporter: Doug Cutting
Assignee: Hairong Kuang
 Fix For: 0.16.0


 To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting 
 performance, we must use file enumeration APIs that return FileStatus[] 
 rather than Path[].  Currently we have FileSystem#globPaths(), but that 
 method should be deprecated and replaced with a FileSystem#globStatus().
 We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the 
 cache in 0.17.




[jira] Commented: (HADOOP-2581) Counters and other useful stats should be logged into Job History log

2008-01-11 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558101#action_12558101
 ] 

Owen O'Malley commented on HADOOP-2581:
---

The counters *are* logged in job history, as of HADOOP-1210.

 Counters and other useful stats should be logged into Job History log
 -

 Key: HADOOP-2581
 URL: https://issues.apache.org/jira/browse/HADOOP-2581
 Project: Hadoop
  Issue Type: Improvement
  Components: mapred
Reporter: Runping Qi

 The following stats are useful and available to the JT but not logged in the 
 job history log:
 1. The counters of each job
 2. The counters of each mapper/reducer attempt
 3. The info about the input splits (filename, split size, on which nodes)
 4. The input split for each mapper attempt
 This data is useful and important for mining to find performance-related 
 problems.




[jira] Commented: (HADOOP-2570) streaming jobs fail after HADOOP-2227

2008-01-11 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558133#action_12558133
 ] 

Arun C Murthy commented on HADOOP-2570:
---

Please ignore my previous comments... it's been a long day (maybe the following 
ones too! *smile*)

It seems the test cases don't have a jar, so an 'if' check in 
TaskTracker.localizeJob fails and the work directory isn't created. This 
explains the exception seen in the TaskTracker.launchTaskForJob function. 

I didn't make any headway after that...

 streaming jobs fail after HADOOP-2227
 -

 Key: HADOOP-2570
 URL: https://issues.apache.org/jira/browse/HADOOP-2570
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.15.2
Reporter: lohit vijayarenu
Assignee: Amareshwari Sri Ramadasu
Priority: Blocker
 Fix For: 0.15.3

 Attachments: patch-2570.txt


 HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed 
 like this:
 {code}
 File jobCacheDir = new File(currentDir.getParentFile().getParent(), "work");
 {code}
 We should change this to get it working. Referring to the changes made in 
 HADOOP-2227, I see that the APIs used there to construct the path are not 
 public. And hard-coding the path in streaming does not look good. Thoughts?




[jira] Updated: (HADOOP-1965) Handle map output buffers better

2008-01-11 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HADOOP-1965:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Amar - this was a long-drawn affair!

 Handle map output buffers better
 

 Key: HADOOP-1965
 URL: https://issues.apache.org/jira/browse/HADOOP-1965
 Project: Hadoop
  Issue Type: Improvement
  Components: mapred
Affects Versions: 0.16.0
Reporter: Devaraj Das
Assignee: Amar Kamat
 Fix For: 0.16.0

 Attachments: 1965_single_proc_150mb_gziped.jpeg, 
 1965_single_proc_150mb_gziped.pdf, 1965_single_proc_150mb_gziped_breakup.png, 
 HADOOP-1965-1.patch, HADOOP-1965-Benchmark.patch, 
 HADOOP-1965-Benchmark.patch, HADOOP-1965-Benchmark.patch, 
 HADOOP-1965-Benchmark.patch, HADOOP-1965-Benchmark.patch, HADOOP-2419.patch, 
 HADOOP-2419.patch, HADOOP-2419.patch, HADOOP-2419.patch


 Today, the map task stops calling the map method while sort/spill is using 
 the (single instance of) map output buffer. One improvement that can be done 
 to improve performance of the map task is to have another buffer for writing 
 the map outputs to, while sort/spill is using the first buffer.
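The two-buffer idea in the description can be sketched as a toy model. This is illustrative Java only: in the proposed change the sort/spill runs concurrently with collection, but it is inlined here for brevity, and all names are assumptions rather than the actual MapTask code.

```java
import java.util.*;

// Toy double-buffering sketch: the collector fills one buffer while the
// other is being sorted and spilled, so map() never blocks on a spill.
class DoubleBuffer {
    private List<String> active = new ArrayList<>();
    private List<String> spilling = new ArrayList<>();
    private final int threshold;
    final List<String> spilled = new ArrayList<>();   // stands in for disk

    DoubleBuffer(int threshold) { this.threshold = threshold; }

    void collect(String record) {
        active.add(record);
        if (active.size() >= threshold) {
            // Swap buffers: new map output lands in the (empty) other buffer
            // while the full one is sorted and written out.
            List<String> tmp = spilling;
            spilling = active;
            active = tmp;
            spill();   // in the real design this would run on another thread
        }
    }

    private void spill() {
        Collections.sort(spilling);      // sort/spill works on the full buffer
        spilled.addAll(spilling);
        spilling.clear();
    }
}
```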




[jira] Commented: (HADOOP-2581) Counters and other useful stats should be logged into Job History log

2008-01-11 Thread Runping Qi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558147#action_12558147
 ] 

Runping Qi commented on HADOOP-2581:



Cool. 

That will address the first 2 items.

Is it intended to be in 0.16?
Is it possible to include it in 0.15.3?


We still need to log the split info.



 Counters and other useful stats should be logged into Job History log
 -

 Key: HADOOP-2581
 URL: https://issues.apache.org/jira/browse/HADOOP-2581
 Project: Hadoop
  Issue Type: Improvement
  Components: mapred
Reporter: Runping Qi

 The following stats are useful and available to the JT but not logged in the 
 job history log:
 1. The counters of each job
 2. The counters of each mapper/reducer attempt
 3. The info about the input splits (filename, split size, on which nodes)
 4. The input split for each mapper attempt
 This data is useful and important for mining to find performance-related 
 problems.




[jira] Created: (HADOOP-2582) hadoop dfs -copyToLocal creates zero byte files, when source file does not exist

2008-01-11 Thread lohit vijayarenu (JIRA)
hadoop dfs -copyToLocal creates zero byte files, when source file does not 
exist 
--

 Key: HADOOP-2582
 URL: https://issues.apache.org/jira/browse/HADOOP-2582
 Project: Hadoop
  Issue Type: Bug
  Components: dfs
Affects Versions: 0.15.2
Reporter: lohit vijayarenu


hadoop dfs -copyToLocal with a non-existent source file creates a zero-byte 
destination file. It should report an error indicating that the source file 
does not exist.

{noformat}
[lohit@ hadoop-trunk]$ hadoop dfs -get nosuchfile nosuchfile
[lohit@ hadoop-trunk]$ ls -l nosuchfile 
-rw-r--r--  1 lohit users 0 Jan 11 21:58 nosuchfile
[lohit@ hadoop-trunk]$
{noformat}
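A sketch of the guard the fix would need — check the source before touching the destination. This is illustrative Java using java.nio, not the actual FsShell code; the class and method names are assumptions.

```java
import java.io.IOException;
import java.nio.file.*;

// Hypothetical fix sketch: fail early on a missing source instead of
// silently producing a zero-byte destination file.
class CopyToLocalSketch {

    static void copyToLocal(Path src, Path dst) throws IOException {
        if (!Files.exists(src)) {
            // No destination file is created in the error path.
            throw new IOException("copyToLocal: source " + src + " does not exist");
        }
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
    }
}
```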




[jira] Updated: (HADOOP-2570) streaming jobs fail after HADOOP-2227

2008-01-11 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HADOOP-2570:
--

Attachment: HADOOP-2570_1_20080112.patch

bq. It seems the test cases don't have a jar, so an 'if' check in 
TaskTracker.localizeJob fails and the work directory isn't created. This 
explains the exception seen in the TaskTracker.launchTaskForJob function.

Here is patch which fixes TaskTracker.localizeJob to fix the problem described 
above, along with Amareshwari's original fix.

 streaming jobs fail after HADOOP-2227
 -

 Key: HADOOP-2570
 URL: https://issues.apache.org/jira/browse/HADOOP-2570
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.15.2
Reporter: lohit vijayarenu
Assignee: Amareshwari Sri Ramadasu
Priority: Blocker
 Fix For: 0.15.3

 Attachments: HADOOP-2570_1_20080112.patch, patch-2570.txt


 HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed 
 like this:
 {code}
 File jobCacheDir = new File(currentDir.getParentFile().getParent(), "work");
 {code}
 We should change this to get it working. Referring to the changes made in 
 HADOOP-2227, I see that the APIs used there to construct the path are not 
 public. And hard-coding the path in streaming does not look good. Thoughts?




[jira] Updated: (HADOOP-2570) streaming jobs fail after HADOOP-2227

2008-01-11 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HADOOP-2570:
--

Status: Patch Available  (was: Open)

 streaming jobs fail after HADOOP-2227
 -

 Key: HADOOP-2570
 URL: https://issues.apache.org/jira/browse/HADOOP-2570
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.15.2
Reporter: lohit vijayarenu
Assignee: Amareshwari Sri Ramadasu
Priority: Blocker
 Fix For: 0.15.3

 Attachments: HADOOP-2570_1_20080112.patch, patch-2570.txt


 HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed 
 like this
 {code}
 File jobCacheDir = new File(currentDir.getParentFile().getParent(), work);
 {code}
 We should change this to get it working. Referring to the changes made in 
 HADOOP-2227, I see that the APIs used in there to construct the path are not 
 public. And hard coding the path in streaming does not look good. thought?




[jira] Commented: (HADOOP-2014) Job Tracker should not clobber the data locality of tasks

2008-01-11 Thread eric baldeschwieler (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558155#action_12558155
 ] 

eric baldeschwieler commented on HADOOP-2014:
-

An ideal solution would maintain some sort of prioritized list of maps / node / 
rack so that we execute work first that is unlikely to find another efficient 
location to execute.

It would also make sense to place some non-local work early, since these tasks 
run slowly, on nodes that are likely to run out of local work relatively early.

One could also pay attention to IO load on each source node...

At a minimum we should track maps that have no local option and schedule them 
first when a node has no local option.  (As Doug Cutting suggested in 
HADOOP-2560.)
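The selection rule from the issue description — prefer a task local to the requesting tracker whose split is local to the fewest trackers, else fall back to the globally hardest-to-place task — can be sketched as follows. Illustrative Java only: the task-to-trackers map and the method names are assumptions, not JobTracker code.

```java
import java.util.*;

// Sketch of locality-aware task selection. localTrackers maps a task id to
// the set of trackers that hold its split locally.
class LocalityPicker {

    static String pickTask(Map<String, Set<String>> localTrackers, String tracker) {
        String best = null;
        // First pass: among tasks local to this tracker, prefer the one whose
        // split is local to the fewest trackers (rarest split first).
        for (Map.Entry<String, Set<String>> e : localTrackers.entrySet()) {
            if (e.getValue().contains(tracker)
                    && (best == null
                        || e.getValue().size() < localTrackers.get(best).size())) {
                best = e.getKey();
            }
        }
        if (best != null) {
            return best;
        }
        // Fallback: no local task for this tracker; take the task that is
        // hardest to place locally anywhere, so easy tasks stay available.
        for (Map.Entry<String, Set<String>> e : localTrackers.entrySet()) {
            if (best == null
                    || e.getValue().size() < localTrackers.get(best).size()) {
                best = e.getKey();
            }
        }
        return best;
    }
}
```

A prioritized per-node/per-rack list, as suggested in the comment, would amortize these scans instead of recomputing them per heartbeat.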

 Job Tracker should not clobber the data locality of tasks
 -

 Key: HADOOP-2014
 URL: https://issues.apache.org/jira/browse/HADOOP-2014
 Project: Hadoop
  Issue Type: Bug
  Components: mapred
Reporter: Runping Qi
Assignee: Devaraj Das

 Currently, when the Job Tracker assigns a mapper task to a task tracker and 
 there is no local split to the task tracker, the
 job tracker will find the first runnable task in the master task list and 
 assign the task to the task tracker.
 The split for the task is not local to the task tracker, of course. However, 
 the split may be local to other task trackers.
 Assigning that task to that task tracker may decrease the potential 
 number of mapper attempts with data locality.
 The desired behavior in this situation is to choose a task whose split is not 
 local to any  task tracker. 
 Resort to the current behavior only if no such task is found.
 In general, it will be useful to know the number of task trackers to which 
 each split is local.
 To assign a task to a task tracker, the job tracker should first  try to pick 
 a task that is local to the task tracker  and that has minimal number of task 
 trackers to which it is local. If no task is local to the task tracker, the 
 job tracker should  try to pick a task that has minimal number of task 
 trackers to which it is local. 
 It is worthwhile to instrument the job tracker code to report the number of 
 splits that are local to some task trackers.
 That should be the maximum number of tasks with data locality. By comparing 
 that number with the actual number of 
 data local mappers launched, we can know the effectiveness of the job tracker 
 scheduling.
 When we introduce rack locality, we should apply the same principle.




[jira] Issue Comment Edited: (HADOOP-2014) Job Tracker should not clobber the data locality of tasks

2008-01-11 Thread eric baldeschwieler (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558155#action_12558155
 ] 

eric14 edited comment on HADOOP-2014 at 1/11/08 2:41 PM:
--

An ideal solution would maintain some sort of prioritized list of maps / node / 
rack so that we execute work first that is unlikely to find another efficient 
location to execute.

It would also make sense to place some non-local work early, since these tasks 
run slowly, on nodes that are likely to run out of local work relatively early.

One could also pay attention to IO load on each source node...

At a minimum we should track maps that have no local option and schedule them 
first when a node has no local option.  (As Doug Cutting suggested in 
HADOOP-2560)

  was (Author: eric14):
An ideal solution would maintain some sort of prioritized list of maps / 
node / rack so that we execute work first that is unlikely to find another 
efficient location to execute.

It would also make sense to place some no local work early, since these tasks 
run slowly, on nodes that are likely to run out of local work relatively early.

One could also pay attention to IO load on each source node...

At a minimum we should track maps that have no local option and schedule them 
first when a node has no local option.  (As doug cutting suggested in 
hadoop-2560
  
 Job Tracker should not clobber the data locality of tasks
 -

 Key: HADOOP-2014
 URL: https://issues.apache.org/jira/browse/HADOOP-2014
 Project: Hadoop
  Issue Type: Bug
  Components: mapred
Reporter: Runping Qi
Assignee: Devaraj Das

 Currently, when the Job Tracker assigns a mapper task to a task tracker and 
 there is no local split to the task tracker, the
 job tracker will find the first runnable task in the master task list and 
 assign the task to the task tracker.
 The split for the task is not local to the task tracker, of course. However, 
 the split may be local to other task trackers.
 Assigning that task to that task tracker may decrease the potential 
 number of mapper attempts with data locality.
 The desired behavior in this situation is to choose a task whose split is not 
 local to any  task tracker. 
 Resort to the current behavior only if no such task is found.
 In general, it will be useful to know the number of task trackers to which 
 each split is local.
 To assign a task to a task tracker, the job tracker should first  try to pick 
 a task that is local to the task tracker  and that has minimal number of task 
 trackers to which it is local. If no task is local to the task tracker, the 
 job tracker should  try to pick a task that has minimal number of task 
 trackers to which it is local. 
 It is worthwhile to instrument the job tracker code to report the number of 
 splits that are local to some task trackers.
 That should be the maximum number of tasks with data locality. By comparing 
 that number with the actual number of 
 data local mappers launched, we can know the effectiveness of the job tracker 
 scheduling.
 When we introduce rack locality, we should apply the same principle.




[jira] Commented: (HADOOP-2570) streaming jobs fail after HADOOP-2227

2008-01-11 Thread lohit vijayarenu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558159#action_12558159
 ] 

lohit vijayarenu commented on HADOOP-2570:
--

Tested the streaming job again; this patch solves the problem seen earlier. 
Thanks!

 streaming jobs fail after HADOOP-2227
 -

 Key: HADOOP-2570
 URL: https://issues.apache.org/jira/browse/HADOOP-2570
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.15.2
Reporter: lohit vijayarenu
Assignee: Amareshwari Sri Ramadasu
Priority: Blocker
 Fix For: 0.15.3

 Attachments: HADOOP-2570_1_20080112.patch, patch-2570.txt


 HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed 
 like this:
 {code}
 File jobCacheDir = new File(currentDir.getParentFile().getParent(), "work");
 {code}
 We should change this to get it working. Referring to the changes made in 
 HADOOP-2227, I see that the APIs used there to construct the path are not 
 public. And hard-coding the path in streaming does not look good. Thoughts?




[jira] Commented: (HADOOP-2398) Additional Instrumentation for NameNode, RPC Layer and JMX support

2008-01-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558158#action_12558158
 ] 

Hadoop QA commented on HADOOP-2398:
---

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12372639/metricsPatch6_4.patch
against trunk revision r611264.

@author +1.  The patch does not contain any @author tags.

javadoc +1.  The javadoc tool did not generate any warning messages.

javac +1.  The applied patch does not generate any new compiler warnings.

findbugs -1.  The patch appears to introduce 1 new Findbugs warnings.

core tests +1.  The patch passed core unit tests.

contrib tests +1.  The patch passed contrib unit tests.

Test results: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1547/testReport/
Findbugs warnings: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1547/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1547/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1547/console

This message is automatically generated.

 Additional Instrumentation for NameNode, RPC Layer and JMX support 
 ---

 Key: HADOOP-2398
 URL: https://issues.apache.org/jira/browse/HADOOP-2398
 Project: Hadoop
  Issue Type: New Feature
  Components: dfs
Reporter: Sanjay Radia
Assignee: Sanjay Radia
 Fix For: 0.16.0

 Attachments: metricsPatch6.txt, metricsPatch6_1.txt, 
 metricsPatch6_2.txt, metricsPatch6_3.txt, metricsPatch6_4.patch, 
 ScreenShotNameNodeStats.png, ScreenShotRPCStats.png


 Additional Instrumentation is needed for name node and its rpc layer. 
 Furthermore the instrumentation should be
 visible via JMX, Java's standard monitoring tool.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2581) Counters and other useful stats should be logged into Job History log

2008-01-11 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558154#action_12558154
 ] 

Owen O'Malley commented on HADOOP-2581:
---

It is already in trunk and therefore will be in 0.16. It cannot be included in 
0.15.3, because that branch is closed except for bug fixes.

 Counters and other useful stats should be logged into Job History log
 -

 Key: HADOOP-2581
 URL: https://issues.apache.org/jira/browse/HADOOP-2581
 Project: Hadoop
  Issue Type: Improvement
  Components: mapred
Reporter: Runping Qi

 The following stats are useful and available to the JT but not logged to the 
 job history log:
 1. The counters of each job
 2. The counters of each mapper/reducer attempt
 3. The info about the input splits (filename, split size, on which nodes)
 4. The input split for each mapper attempt
 This data is useful and important for mining to find out performance-related 
 problems.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HADOOP-2583) Potential Eclipse plug-in UI loop when editing location parameters

2008-01-11 Thread Christophe Taton (JIRA)
Potential Eclipse plug-in UI loop when editing location parameters
--

 Key: HADOOP-2583
 URL: https://issues.apache.org/jira/browse/HADOOP-2583
 Project: Hadoop
  Issue Type: Bug
Reporter: Christophe Taton
Assignee: Christophe Taton
Priority: Minor
 Fix For: 0.16.0


The UI might enter an infinite loop when propagating parameters asynchronously.
Some functions are not yet implemented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2576) Namenode performance degradation over time

2008-01-11 Thread Christian Kunz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Kunz updated HADOOP-2576:
---

Priority: Blocker  (was: Major)

 Namenode performance degradation over time
 --

 Key: HADOOP-2576
 URL: https://issues.apache.org/jira/browse/HADOOP-2576
 Project: Hadoop
  Issue Type: Bug
  Components: dfs
Affects Versions: 0.16.0
Reporter: Christian Kunz
Priority: Blocker

 We have a cluster running the same applications again and again with a high 
 turnover of files.
 The performance of these applications seems to be correlated with the lifetime 
 of the namenode:
 after starting the namenode, the applications need increasingly more time to 
 complete, with about 50% more time after 1 week. 
 During that time the namenode's average cpu usage increases from typically 10% 
 to 30%, memory usage nearly doubles (although the average amount of data on 
 dfs stays the same), and the average load factor increases by a factor of 2-3 
 (although not significantly high, 2).
 When looking at the namenode and datanode logs, I see a lot of requests from 
 the namenode asking datanodes to delete blocks that are not in their blockmap, 
 repeatedly for the same blocks.
 When I counted the number of blocks the namenode asked to be deleted, I 
 noticed a marked increase with the lifetime of the namenode (a factor of 
 2-3 after 1 week).
 This makes me wonder whether the namenode fails to purge non-existent blocks 
 from its list of invalid blocks.
 But independently, the namenode has a degradation issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2583) Potential Eclipse plug-in UI loop when editing location parameters

2008-01-11 Thread Christophe Taton (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christophe Taton updated HADOOP-2583:
-

Component/s: contrib/eclipse-plugin

 Potential Eclipse plug-in UI loop when editing location parameters
 --

 Key: HADOOP-2583
 URL: https://issues.apache.org/jira/browse/HADOOP-2583
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/eclipse-plugin
Reporter: Christophe Taton
Assignee: Christophe Taton
Priority: Minor
 Fix For: 0.16.0


 The UI might enter an infinite loop when propagating parameters 
 asynchronously.
 Some functions are not yet implemented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-1650) Upgrade Jetty to 6.x

2008-01-11 Thread Mukund Madhugiri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558162#action_12558162
 ] 

Mukund Madhugiri commented on HADOOP-1650:
--

I did a re-run on the 500 node cluster and don't see the 
NotReplicatedYetException and SocketTimeoutException. I do see the 
OutOfMemoryError when the sort job is running:

2008-01-11 20:54:10,375 INFO org.apache.hadoop.mapred.TaskInProgress: Error 
from task_200801112025_0002_m_005337_0: java.lang.OutOfMemoryError: Java heap 
space
at java.util.Arrays.copyOf(Arrays.java:2786)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at org.apache.hadoop.io.BytesWritable.write(BytesWritable.java:137)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:373)
at 
org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:40)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
at 
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2043)

Here is the sort data:
|*500 nodes*|*trunk*|*trunk + patch*|
|randomWriter (mins)|27|25|
|sort (mins)|86|107|
|sortValidation (mins)|22|21|

 Upgrade Jetty to 6.x
 

 Key: HADOOP-1650
 URL: https://issues.apache.org/jira/browse/HADOOP-1650
 Project: Hadoop
  Issue Type: Improvement
  Components: mapred
Reporter: Devaraj Das
Assignee: Devaraj Das
 Attachments: hadoop-1650-jetty6.1.5.patch, 
 hadoop-jetty6.1.4-lib.tar.gz, hadoop-jetty6.1.6-lib.tar.gz, 
 jetty-hadoop-6.1.6.patch, jetty-hbase.patch, jetty6.1.4.patch, 
 jetty6.1.6.patch


 This is the third attempt at moving to jetty6. Apparently, the jetty-6.1.4 
 has fixed some of the issues we discovered in jetty during HADOOP-736 and 
 HADOOP-1273. I'd like to keep this issue open for sometime so that we have 
 enough time to test out things.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HADOOP-2558) [hbase] fixes for build up on hudson

2008-01-11 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HADOOP-2558.
---

   Resolution: Fixed
Fix Version/s: 0.16.0

Closing.  hbase contrib tests just ran successfully three times in a row 
(#1545-#1547 -- latter two failed in core tests).  My guess is that they are no 
more broken than they used to be: i.e. they'll fail once in a while but 
generally they succeed.  Will open specific issues to address future failures.

 [hbase] fixes for build up on hudson
 

 Key: HADOOP-2558
 URL: https://issues.apache.org/jira/browse/HADOOP-2558
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: stack
 Fix For: 0.16.0

 Attachments: 2558-v2.patch, 2558-v3.patch, 2558-v4.patch, 
 2558-v5.patch, 2558.patch


 Fixes for hbase breakage up on hudson.  There seem to be many reasons for the 
 failings.
 One is that the .META. region all of a sudden decides it's 'no good' and it 
 gets deployed elsewhere.  Tests don't have the tolerance for this kind of 
 churn.  A previous commit added logging of why .META. is 'no good'.  Hopefully 
 that will help.
 Also found a case where TestTableMapReduce would fail because there was no 
 sleep between retries when getting new scanners.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2567) add FileSystem#getHomeDirectory() method

2008-01-11 Thread Doug Cutting (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Cutting updated HADOOP-2567:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed this.

 add FileSystem#getHomeDirectory() method
 

 Key: HADOOP-2567
 URL: https://issues.apache.org/jira/browse/HADOOP-2567
 Project: Hadoop
  Issue Type: New Feature
  Components: fs
Reporter: Doug Cutting
Assignee: Doug Cutting
 Fix For: 0.16.0

 Attachments: 2567-3.patch, HADOOP-2567-1.patch, HADOOP-2567-2.patch, 
 HADOOP-2567.patch


 The FileSystem API would benefit from a getHomeDirectory() method.
 The default implementation would return /user/$USER/.
 RawLocalFileSystem would return System.getProperty("user.home").
 HADOOP-2514 can use this to implement per-user trash.
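The proposed contract can be sketched in a few lines. This is an illustrative sketch of the behavior described above, not the committed patch; the class and method names here are hypothetical stand-ins for the FileSystem hierarchy.

```java
// Hypothetical sketch of the getHomeDirectory() contract described above.
// HomeDirSketch stands in for FileSystem; names are illustrative only.
public class HomeDirSketch {
    // Default implementation: /user/$USER, as a DFS would return.
    public static String defaultHomeDirectory(String userName) {
        return "/user/" + userName;
    }

    // RawLocalFileSystem analogue: the JVM's notion of the home directory.
    public static String localHomeDirectory() {
        return System.getProperty("user.home");
    }
}
```

The design point is that per-user features such as trash (HADOOP-2514) can ask the filesystem for a home path instead of hard-coding one layout for both DFS and local filesystems.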

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2578) HBaseConfiguration(Configuration c) constructor shouldn't overwrite passed conf with defaults

2008-01-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558167#action_12558167
 ] 

stack commented on HADOOP-2578:
---

Would adding a test that checked that c.getResource("hbase-default.xml") and 
c.getResource("hbase-site.xml") were non-null in 
HBaseConfiguration(Configuration c) work?

 HBaseConfiguration(Configuration c) constructor shouldn't overwrite passed 
 conf with defaults
 -

 Key: HADOOP-2578
 URL: https://issues.apache.org/jira/browse/HADOOP-2578
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: Thomas Garner

 While testing out mapreduce with hbase, in the map portion of a task, the map 
 would try to connect to an hbase master at localhost/127.0.0.1.  The config 
 passed to the HBaseConfiguration contained the necessary hbase configuration 
 information, but I assume was being overwritten by the defaults in the config 
 files during addResource, as commenting out addHbaseResources in the 
 constructor fixed the symptom.  I would expect the configs to be layered on 
 top of each other, e.g. default, then site, then the config passed as a 
 parameter.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-1876) Persisting completed jobs status

2008-01-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558168#action_12558168
 ] 

Hadoop QA commented on HADOOP-1876:
---

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12372904/patch1876.txt
against trunk revision r611315.

@author +1.  The patch does not contain any @author tags.

javadoc +1.  The javadoc tool did not generate any warning messages.

javac +1.  The applied patch does not generate any new compiler warnings.

findbugs +1.  The patch does not introduce any new Findbugs warnings.

core tests +1.  The patch passed core unit tests.

contrib tests +1.  The patch passed contrib unit tests.

Test results: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1548/testReport/
Findbugs warnings: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1548/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1548/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1548/console

This message is automatically generated.

 Persisting completed jobs status
 

 Key: HADOOP-1876
 URL: https://issues.apache.org/jira/browse/HADOOP-1876
 Project: Hadoop
  Issue Type: Improvement
  Components: mapred
 Environment: all
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 0.16.0

 Attachments: patch1876.txt, patch1876.txt, patch1876.txt


 Currently the JobTracker keeps information about completed jobs in memory. 
 This information is flushed from the cache when it has outlived 
 #RETIRE_JOB_INTERVAL or when the limit of completed jobs in memory has 
 been reached (#MAX_COMPLETE_USER_JOBS_IN_MEMORY). 
 Also, if the JobTracker is restarted (due to being recycled or due to a 
 crash) information about completed jobs is lost.
 If any of the above scenarios happens before the job information is queried 
 by a hadoop client (normally the job submitter or a monitoring component) 
 there is no way to obtain such information.
 A way to avoid this is the JobTracker to persist in DFS the completed jobs 
 information upon job completion. This would be done at the time the job is 
 moved to the completed jobs queue. Then when querying the JobTracker for 
 information about a completed job, if it is not found in the memory queue, a 
 lookup  in DFS would be done to retrieve the completed job information. 
 A directory in DFS (under mapred/system) would be used to persist completed 
 job information, for each completed job there would be a directory with the 
 job ID, within that directory all the information about the job: status, 
 jobprofile, counters and completion events.
 A configuration property will indicate for how long persisted job information 
 should be kept in DFS. After that period it will be cleaned up automatically.
 This improvement would not introduce API changes.
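The memory-then-DFS lookup order described above can be sketched as follows. This is a minimal illustration of the fallback idea only: the `dfs` map stands in for files under mapred/system, and all class and method names are hypothetical, not the patch's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed lookup order: consult the in-memory cache first,
// then fall back to the persisted copy. "dfs" is a stand-in for the
// per-job directories under mapred/system; names are illustrative only.
public class CompletedJobStore {
    private final Map<String, String> memoryCache = new HashMap<>();
    private final Map<String, String> dfs = new HashMap<>();

    public void completeJob(String jobId, String status) {
        memoryCache.put(jobId, status);
        dfs.put(jobId, status);        // persist at job-completion time
    }

    // Simulates retirement, e.g. RETIRE_JOB_INTERVAL elapsing.
    public void retireFromMemory(String jobId) {
        memoryCache.remove(jobId);
    }

    public String getJobStatus(String jobId) {
        String s = memoryCache.get(jobId);
        return (s != null) ? s : dfs.get(jobId);  // DFS fallback
    }
}
```

The key property is that retiring a job from memory no longer makes its status unqueryable, which is exactly the failure mode the issue describes.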

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2533) [hbase] Performance: Scanning, just creating MapWritable in next consumes 20% CPU

2008-01-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558170#action_12558170
 ] 

Hadoop QA commented on HADOOP-2533:
---

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12372851/2533-v2.patch
against trunk revision r611333.

@author +1.  The patch does not contain any @author tags.

patch -1.  The patch command could not apply the patch.

Console output: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1549/console

This message is automatically generated.

 [hbase] Performance: Scanning, just creating MapWritable in next consumes 
 20% CPU
 --

 Key: HADOOP-2533
 URL: https://issues.apache.org/jira/browse/HADOOP-2533
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Reporter: stack
Assignee: stack
Priority: Minor
 Fix For: 0.16.0

 Attachments: 2533-v2.patch, 2533.patch, dirty.patch


 Every call to HScanner.next creates an instance of MapWritable.  MapWritables 
 are expensive.  Watching a scan run in the profiler, the setup of the 
 MapWritable -- filling out the idToClassMap and classToIdMap -- consumes 20% 
 of all CPU.
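The allocation pattern at issue, and the obvious fix, can be illustrated without Hadoop classes. A plain HashMap stands in for MapWritable here; the point is only the object-reuse shape (clear and refill one instance per next() call instead of constructing a new one), and the names are hypothetical, not the HScanner code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of reusing one map instance across next() calls, the pattern that
// would avoid the per-row MapWritable construction cost described above.
// HashMap stands in for MapWritable; names are illustrative only.
public class ScannerSketch {
    private final Map<String, byte[]> reusable = new HashMap<>();
    private int rowsReturned = 0;

    // Hypothetical next(): clear and refill the same map, no allocation.
    public Map<String, byte[]> next(String column, byte[] value) {
        reusable.clear();
        reusable.put(column, value);
        rowsReturned++;
        return reusable;
    }

    public int getRowsReturned() { return rowsReturned; }
}
```

The trade-off of this pattern is that callers must not hold a reference to the returned map across calls, since each next() overwrites its contents.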

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2443) [hbase] Keep lazy cache of regions in client rather than an 'authoritative' list

2008-01-11 Thread Bryan Duxbury (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Duxbury updated HADOOP-2443:
--

Attachment: 2443-v6.patch

All the tests pass locally at this point, big improvement over the last version.

 [hbase] Keep lazy cache of regions in client rather than an 'authoritative' 
 list
 

 Key: HADOOP-2443
 URL: https://issues.apache.org/jira/browse/HADOOP-2443
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Reporter: stack
Assignee: Bryan Duxbury
 Fix For: 0.16.0

 Attachments: 2443-v3.patch, 2443-v4.patch, 2443-v5.patch, 
 2443-v6.patch


 Currently, when the client gets a NotServingRegionException -- usually 
 because the region is in the middle of being split, or there has been a 
 regionserver crash and the region is being moved elsewhere -- the client does 
 a complete refresh of its cache of region locations for a table.
 Chatting with Jim about a Paul Saab upload issue from Saturday night: when 
 tables are big, comprised of regions that are splitting fast (because of bulk 
 upload), it's unlikely a client will ever be able to obtain a stable list of 
 all region locations.  Given that any update or scan requires that the list 
 of all regions be in place before it proceeds, this can get in the way of the 
 client succeeding when the cluster is under load.
 Chatting, we figure it's better that the client hold a lazy region cache: on 
 NSRE, figure out where that region has gone and update the client-side 
 cache for that entry only, rather than throw out all we know of a table every 
 time.
 Hopefully this will fix the issue PS was experiencing where, during intense 
 upload, he was unable to get/scan/hql the same table.
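The single-entry invalidation idea can be sketched with a plain map. This is only an illustration of the caching policy described above; the class and method names are hypothetical, not the HConnectionManager implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the lazy region cache: on a NotServingRegionException for one
// region, refresh only that entry; every other cached location survives.
// Names are illustrative only.
public class LazyRegionCache {
    private final Map<String, String> regionToServer = new HashMap<>();

    public void cache(String regionName, String server) {
        regionToServer.put(regionName, server);
    }

    // Called after an NSRE: the caller has re-looked-up this one region
    // (e.g. via .META.) and updates just its entry.
    public void relocate(String regionName, String newServer) {
        regionToServer.put(regionName, newServer);
    }

    public String lookup(String regionName) {
        return regionToServer.get(regionName);
    }
}
```

Contrast with the behavior the issue criticizes, where one NSRE would discard the whole table's location map.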

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HADOOP-2584) Web UI displays an IOException instead of the Tables

2008-01-11 Thread Lars George (JIRA)
Web UI displays an IOException instead of the Tables


 Key: HADOOP-2584
 URL: https://issues.apache.org/jira/browse/HADOOP-2584
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.15.2
Reporter: Lars George


For me, after every second restart I get an error when loading the Hbase UI. 
Here is the page:

   Master: 192.168.105.11:6 

   HQL, Local logs, Thread Dump, Log Level

Master Attributes
   Attribute Name         Value                               Description
   Filesystem             lv1-xen-pdc-2.worldlingo.com:9000   Filesystem hbase is running on
   Hbase Root Directory   /hbase                              Location of hbase home directory

Online META Regions
   Name      Server
   -ROOT-    192.168.105.31:60020
   .META.,,1 192.168.105.39:60020

Tables
  error msg : java.io.IOException: java.io.IOException: HStoreScanner failed construction
  at org.apache.hadoop.hbase.HStore$StoreFileScanner.(HStore.java:1879)
  at org.apache.hadoop.hbase.HStore$HStoreScanner.(HStore.java:2000)
  at org.apache.hadoop.hbase.HStore.getScanner(HStore.java:1822)
  at org.apache.hadoop.hbase.HRegion$HScanner.(HRegion.java:1543)
  at org.apache.hadoop.hbase.HRegion.getScanner(HRegion.java:1118)
  at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1465)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:585)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:401)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
  Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File does not exist: 
  /hbase/hregion_1028785192/info/mapfiles/6628785818889695133/data
  at org.apache.hadoop.dfs.FSDirectory.getFileInfo(FSDirectory.java:489)
  at org.apache.hadoop.dfs.FSNamesystem.getFileInfo(FSNamesystem.java:1380)
  at org.apache.hadoop.dfs.NameNode.getFileInfo(NameNode.java:425)
  at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:585)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:401)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
  at org.apache.hadoop.ipc.Client.call(Client.java:509)
  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
  at org.apache.hadoop.dfs.$Proxy1.getFileInfo(Unknown Source) at

[jira] Commented: (HADOOP-2582) hadoop dfs -copyToLocal creates zero byte files, when source file does not exists

2008-01-11 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558172#action_12558172
 ] 

Raghu Angadi commented on HADOOP-2582:
--

Seems like a bug in FileUtil.copy(); shouldn't it throw an exception or at 
least return false if the source file does not exist?


 hadoop dfs -copyToLocal creates zero byte files, when source file does not 
 exists 
 --

 Key: HADOOP-2582
 URL: https://issues.apache.org/jira/browse/HADOOP-2582
 Project: Hadoop
  Issue Type: Bug
  Components: dfs
Affects Versions: 0.15.2
Reporter: lohit vijayarenu
 Attachments: HADOOP_2582_1.patch


 hadoop dfs -copyToLocal with a non-existent source file creates a zero byte 
 destination file. It should throw an error message indicating the source file 
 does not exist.
 {noformat}
 [lohit@ hadoop-trunk]$ hadoop dfs -get nosuchfile nosuchfile
 [lohit@ hadoop-trunk]$ ls -l nosuchfile 
 -rw-r--r--  1 lohit users 0 Jan 11 21:58 nosuchfile
 [lohit@ hadoop-trunk]$
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HADOOP-2585) Automatic namespace recovery from the secondary image.

2008-01-11 Thread Konstantin Shvachko (JIRA)
Automatic namespace recovery from the secondary image.
--

 Key: HADOOP-2585
 URL: https://issues.apache.org/jira/browse/HADOOP-2585
 Project: Hadoop
  Issue Type: New Feature
  Components: dfs
Affects Versions: 0.15.0
Reporter: Konstantin Shvachko


Hadoop has a three-way (configuration-controlled) protection from losing the 
namespace image.
# the image can be replicated on different hard-drives of the same node;
# the image can be replicated on an nfs-mounted drive on an independent node;
# a stale replica of the image is created during periodic checkpointing and 
stored on the secondary name-node.

Currently during startup the name-node examines all configured storage 
directories, selects the
most up-to-date image, reads it, merges it with the corresponding edits, and 
writes the new image back 
into all storage directories. Everything is done automatically.

If, due to multiple hardware failures, none of the images on mounted hard 
drives (local or remote) 
is available, the secondary image, although stale (up to one hour old by 
default), can still be 
used to recover the majority of the file system data.
Currently one can reconstruct a valid name-node image from the secondary one 
manually.
It would be nice to support an automatic recovery.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2582) hadoop dfs -copyToLocal creates zero byte files, when source file does not exists

2008-01-11 Thread lohit vijayarenu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lohit vijayarenu updated HADOOP-2582:
-

Attachment: HADOOP_2582_1.patch

Attached is a simple patch which checks for existence of source before 
initiating copy. Have updated TestDFSShell test case to check for this 
condition as well.
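The shape of the fix described above (check the source before creating the destination) can be sketched as follows. This is an illustrative reconstruction, not the attached patch; the class and method signatures here are hypothetical.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;

// Sketch of the pre-copy existence check the patch description implies:
// fail fast instead of creating an empty destination file.
// Names and signatures are illustrative, not FileUtil.copy itself.
public class CopySketch {
    public static void copyToLocal(File src, File dst) throws IOException {
        if (!src.exists()) {
            throw new FileNotFoundException("Source does not exist: " + src);
        }
        // ... actual byte copy elided ...
    }
}
```

With this ordering, the `-get nosuchfile` case in the {noformat} transcript would raise an error before any destination file is opened.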

 hadoop dfs -copyToLocal creates zero byte files, when source file does not 
 exists 
 --

 Key: HADOOP-2582
 URL: https://issues.apache.org/jira/browse/HADOOP-2582
 Project: Hadoop
  Issue Type: Bug
  Components: dfs
Affects Versions: 0.15.2
Reporter: lohit vijayarenu
 Attachments: HADOOP_2582_1.patch


 hadoop dfs -copyToLocal with a non-existent source file creates a zero byte 
 destination file. It should throw an error message indicating the source file 
 does not exist.
 {noformat}
 [lohit@ hadoop-trunk]$ hadoop dfs -get nosuchfile nosuchfile
 [lohit@ hadoop-trunk]$ ls -l nosuchfile 
 -rw-r--r--  1 lohit users 0 Jan 11 21:58 nosuchfile
 [lohit@ hadoop-trunk]$
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2570) streaming jobs fail after HADOOP-2227

2008-01-11 Thread Mukund Madhugiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Madhugiri updated HADOOP-2570:
-

Status: Open  (was: Patch Available)

 streaming jobs fail after HADOOP-2227
 -

 Key: HADOOP-2570
 URL: https://issues.apache.org/jira/browse/HADOOP-2570
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.15.2
Reporter: lohit vijayarenu
Assignee: Amareshwari Sri Ramadasu
Priority: Blocker
 Fix For: 0.15.3

 Attachments: HADOOP-2570_1_20080112.patch, patch-2570.txt


 HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed 
 like this
 {code}
 File jobCacheDir = new File(currentDir.getParentFile().getParent(), work);
 {code}
 We should change this to get it working. Referring to the changes made in 
 HADOOP-2227, I see that the APIs used in there to construct the path are not 
 public. And hard coding the path in streaming does not look good. thought?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2570) streaming jobs fail after HADOOP-2227

2008-01-11 Thread Mukund Madhugiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Madhugiri updated HADOOP-2570:
-

Status: Patch Available  (was: Open)

Trying to trigger the patch process to pick it up, as I don't see it in the queue.

 streaming jobs fail after HADOOP-2227
 -

 Key: HADOOP-2570
 URL: https://issues.apache.org/jira/browse/HADOOP-2570
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.15.2
Reporter: lohit vijayarenu
Assignee: Amareshwari Sri Ramadasu
Priority: Blocker
 Fix For: 0.15.3

 Attachments: HADOOP-2570_1_20080112.patch, patch-2570.txt


 HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed 
 like this
 {code}
 File jobCacheDir = new File(currentDir.getParentFile().getParent(), work);
 {code}
 We should change this to get it working. Referring to the changes made in 
 HADOOP-2227, I see that the APIs used in there to construct the path are not 
 public. And hard coding the path in streaming does not look good. thought?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2583) Potential Eclipse plug-in UI loop when editing location parameters

2008-01-11 Thread Christophe Taton (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christophe Taton updated HADOOP-2583:
-

Attachment: 2583-20080112-1.patch

The patch:
- prevents UI loops: UI updates are now synchronous
- lets the UI reflect not-yet-implemented functions (disabled buttons)
- removes some unused files (old todo, old unused icons)


 Potential Eclipse plug-in UI loop when editing location parameters
 --

 Key: HADOOP-2583
 URL: https://issues.apache.org/jira/browse/HADOOP-2583
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/eclipse-plugin
Reporter: Christophe Taton
Assignee: Christophe Taton
Priority: Minor
 Fix For: 0.16.0


 The UI might enter an infinite loop when propagating parameters 
 asynchronously.
 Some functions are not yet implemented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins

2008-01-11 Thread Billy Pearson (JIRA)
Splits getting blocked by compactions causing region to be offline for the 
length of the compaction 10-15 mins
---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
 Fix For: 0.16.0


The below is cut out of one of my region servers' logs; the full log is attached.

What is happening is that there is one region on this region server and it is 
under heavy insert load, so compactions are back to back: once one finishes a 
new one starts. The problem begins when it's time to split the region. 

A compaction starts just milliseconds before the split starts, blocking the 
split, but the split closes the region before the compaction is finished, 
causing the region to be offline until the compaction is done. Once the 
compaction is done the split finishes and all returns to normal, but this is a 
big problem for production if the region is offline for 10-15 mins.

The solution would be not to let the split thread issue the below line while 
a compaction on that region is happening:
2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
webdata,,1200085987488 closing (Adding to retiringRegions)

The only time I have seen this bug is when there is only one region on a region 
server, because if there is more than one, the compaction happens on the other 
region(s) after the first one is done compacting, and the split can do what it 
needs on the first region without getting blocked.
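The proposed fix (don't take the region offline for a split while a compaction is running) can be sketched as a per-region guard. This is only an illustration of the suggested ordering, with hypothetical names; it is not the HRegion code.

```java
// Sketch of the proposed ordering fix: the split thread checks a per-region
// compaction flag and defers closing the region while compaction is running.
// Flag and method names are hypothetical, not the HRegion implementation.
public class RegionSplitGuard {
    private boolean compacting = false;
    private int deferredSplits = 0;

    public synchronized void startCompaction()  { compacting = true; }
    public synchronized void finishCompaction() { compacting = false; }

    // Returns true only when it is safe to take the region offline to split.
    public synchronized boolean trySplit() {
        if (compacting) {
            deferredSplits++;   // retry later instead of closing the region
            return false;
        }
        return true;
    }

    public synchronized int getDeferredSplits() { return deferredSplits; }
}
```

With this guard, the 10-15 minute offline window in the log above would instead show the split retrying after the compaction's "completed" message.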

{code}
2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction 
completed on region webdata,,1200085987488. Took 16mins, 10sec
2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for 
HStore webdata,,1200085987488/size needed.
2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 1773667150/size 
needs compaction
2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting 
compaction on region webdata,,1200085987488
2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started 
compaction of 14 files using 
/gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size
 for webdata,,1200085987488/size
2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started memcache 
flush for region webdata,,1200085987488. Size 31.2m
2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting 
webdata,,1200085987488 because largest aggregate size is 100.7m and desired 
size is 64.0m
2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
webdata,,1200085987488 closing (Adding to retiringRegions)
...
lots of NotServingRegionException's
...
2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction 
completed on region webdata,,1200085987488. Took 10mins, 58sec
...

2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up 
/gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true
2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of 
webdata,,1200085987488 complete; new regions: webdata,,1200090121237, 
webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split 
took 11mins, 0sec
2008-01-11 16:33:02,227 DEBUG 
org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for .META.. 
Doing a find...
2008-01-11 16:33:02,283 DEBUG 
org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) for 
.META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, startKey: 
, encodedName(70236052) tableDesc: {name: -ROOT-, families: {info:={name: 
info, max versions: 1, compression: NONE, in memory: false, max length: 
2147483647, bloom filter: none}}}
2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating 
.META. with region split info
2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: Reporting 
region split to master
2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region 
split, META update, and report to master all successful. Old 
region=webdata,,1200085987488, new regions: webdata,,1200090121237, 
webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239
{code}
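One way to implement the suggested fix is to guard both operations with a 
per-region lock, so the split thread waits for the running compaction instead 
of closing the region out from under it. The sketch below is plain Java with 
made-up class and member names; it is not actual HBase code:

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch: one lock per region serializes compaction and split,
// so a split can never close the region while a compaction is in progress.
public class HRegionSketch {
  private final ReentrantLock compactSplitLock = new ReentrantLock();
  int opsCompleted = 0; // visible for the usage example below

  void compact() {
    compactSplitLock.lock();
    try {
      // ... merge the region's store files ...
      opsCompleted++;
    } finally {
      compactSplitLock.unlock();
    }
  }

  void split() {
    // blocks until any in-progress compaction finishes, instead of
    // issuing the region close immediately
    compactSplitLock.lock();
    try {
      // ... close the region and write the daughter regions ...
      opsCompleted++;
    } finally {
      compactSplitLock.unlock();
    }
  }
}
```

With this arrangement the split would begin as soon as the in-progress 
compaction finishes, rather than leaving the region offline for its duration.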

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HADOOP-2586) Add version to servers' startup message.

2008-01-11 Thread Konstantin Shvachko (JIRA)
Add version to servers' startup message.


 Key: HADOOP-2586
 URL: https://issues.apache.org/jira/browse/HADOOP-2586
 Project: Hadoop
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Konstantin Shvachko


It would be useful if Hadoop servers printed the Hadoop version as part of the 
startup message:
{code}
/
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = my-hadoop-host
STARTUP_MSG:   args = [-upgrade]
STARTUP_MSG: Version = 0.15.1, r599161
/
{code}

This would simplify understanding the logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2585) Automatic namespace recovery from the secondary image.

2008-01-11 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558173#action_12558173
 ] 

Konstantin Shvachko commented on HADOOP-2585:
-

We had a real example of such a failure on one of our clusters, and we were 
able to reconstruct the namespace image from the secondary node using the 
following manual procedure, which might be useful for those who find 
themselves in the same kind of trouble.

h4. Manual recovery procedure from the secondary image.
# Stop the cluster to make sure all data-nodes and *-trackers are down.
# Select a node where you will run the new name-node, and set it up as usual 
for a name-node.
# Format the new name-node.
# cd dfs.name.dir/current
# You will see the file VERSION in there. You will need to provide the 
namespaceID of the old cluster in it. 
The old namespaceID can be obtained from one of the data-nodes: 
just copy the namespaceID field from dfs.data.dir/current/VERSION.
# rm dfs.name.dir/current/fsimage
# scp secondary-node:fs.checkpoint.dir/destimage.tmp ./fsimage
# Start the cluster. Upgrade is recommended, so that you can rollback if 
something goes wrong.
# Run fsck, and remove files with missing blocks if any.
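The numbered steps above can be sketched as a shell session. Every hostname 
and path here (SECONDARY, DATANODE, /path/to/...) is a placeholder for your 
cluster's actual configuration values; adjust before running.

```shell
# 1-3. On the chosen name-node host: set it up as usual, then format
bin/hadoop namenode -format

# 4-5. Restore the old cluster's namespaceID in the new VERSION file.
#      Read it off any data-node first:
ssh DATANODE grep namespaceID /path/to/dfs.data.dir/current/VERSION
# ...then edit dfs.name.dir/current/VERSION so its namespaceID matches.

# 6-7. Replace the freshly formatted image with the secondary's checkpoint
cd /path/to/dfs.name.dir/current
rm fsimage
scp SECONDARY:/path/to/fs.checkpoint.dir/destimage.tmp ./fsimage

# 8. Start with -upgrade so a rollback remains possible
bin/start-dfs.sh -upgrade

# 9. Audit the recovered namespace; remove files with missing blocks
bin/hadoop fsck /
```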

h4. Automatic recovery proposal.
The proposal consists of 2 parts.
# The secondary node should store the latest checkpointed image file in 
compliance with the name-node storage directory structure. It is best if the 
secondary node uses the Storage class (or FSImage, if code re-use makes sense 
here) in order to maintain the checkpoint directory.
This should ensure that the checkpointed image is always ready to be read by a 
name-node if the directory is listed in its dfs.name.dir list.
# The name-node should consider the configuration variable fs.checkpoint.dir 
as a possible location of an image available for read-only access during 
startup.
This means that if the name-node finds all directories listed in dfs.name.dir 
unavailable, or finds their images corrupted, then it should turn to the 
fs.checkpoint.dir directory and try to fetch the image from there. I think 
this should not be the default behavior but rather be triggered by a name-node 
startup option, something like:
{code}
hadoop namenode -fromCheckpoint
{code}
So the name-node can start with the secondary image as long as the secondary 
node's drive is mounted, and the name-node will never attempt to write 
anything to that drive.

h4. Added bonuses provided by this approach
- One can choose to restart a failed name-node directly on the node where the 
secondary node ran.
This brings us a step closer to a hot standby.
- Replication of the image to NFS can be delegated to the secondary name-node 
if we support multiple entries in fs.checkpoint.dir. This is, of course, only 
if the administrator chooses to accept outdated images in order to boost 
name-node performance.


 Automatic namespace recovery from the secondary image.
 --

 Key: HADOOP-2585
 URL: https://issues.apache.org/jira/browse/HADOOP-2585
 Project: Hadoop
  Issue Type: New Feature
  Components: dfs
Affects Versions: 0.15.0
Reporter: Konstantin Shvachko

 Hadoop has a three-way (configuration-controlled) protection from losing the 
 namespace image.
 # image can be replicated on different hard-drives of the same node;
 # image can be replicated on a nfs mounted drive on an independent node;
 # a stale replica of the image is created during periodic checkpointing and 
 stored on the secondary name-node.
 Currently during startup the name-node examines all configured storage 
 directories, selects the
 most up to date image, reads it, merges with the corresponding edits, and 
 writes to the new image back 
 into all storage directories. Everything is done automatically.
 If, due to multiple hardware failures, none of the images on mounted hard 
 drives (local or remote) are available, the secondary image, although stale 
 (up to one hour old by default), can still be used to recover the majority 
 of the file system data.
 Currently one can reconstruct a valid name-node image from the secondary one 
 manually.
 It would be nice to support an automatic recovery.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2583) Potential Eclipse plug-in UI loop when editing location parameters

2008-01-11 Thread Christophe Taton (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christophe Taton updated HADOOP-2583:
-

Attachment: (was: 2583-20080112-1.patch)

 Potential Eclipse plug-in UI loop when editing location parameters
 --

 Key: HADOOP-2583
 URL: https://issues.apache.org/jira/browse/HADOOP-2583
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/eclipse-plugin
Reporter: Christophe Taton
Assignee: Christophe Taton
Priority: Minor
 Fix For: 0.16.0


 The UI might enter an infinite loop when propagating parameters 
 asynchronously.
 Some functions are not yet implemented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2583) Potential Eclipse plug-in UI loop when editing location parameters

2008-01-11 Thread Christophe Taton (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christophe Taton updated HADOOP-2583:
-

Attachment: 2583-20080112-1.patch

 Potential Eclipse plug-in UI loop when editing location parameters
 --

 Key: HADOOP-2583
 URL: https://issues.apache.org/jira/browse/HADOOP-2583
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/eclipse-plugin
Reporter: Christophe Taton
Assignee: Christophe Taton
Priority: Minor
 Fix For: 0.16.0

 Attachments: 2583-20080112-1.patch


 The UI might enter an infinite loop when propagating parameters 
 asynchronously.
 Some functions are not yet implemented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins

2008-01-11 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman reassigned HADOOP-2587:
-

Assignee: Jim Kellerman

 Splits getting blocked by compactions causing region to be offline for the 
 length of the compaction 10-15 mins
 ---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: hbase-root-regionserver-PE1750-3.log


 The below is cut out of one of my region server's logs; the full log is 
 attached.
 What is happening: there is one region on this region server and it is under 
 heavy insert load, so compactions run back to back; as soon as one finishes 
 a new one starts. The problem starts when it is time to split the region.
 A compaction starts just milliseconds before the split, blocking the split, 
 but the split closes the region before the compaction is finished, causing 
 the region to be offline until the compaction is done. Once the compaction 
 is done the split finishes and all returns to normal, but this is a big 
 problem for production if the region is offline for 10-15 mins.
 The solution would be to not let the split thread issue the line below while 
 a compaction on that region is in progress.
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 The only time I have seen this bug is when there is only one region on a 
 region server, because with more than one region the compaction moves on to 
 the other region(s) after the first one is done, and the split can do what 
 it needs on the first region without being blocked.
 {code}
 2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 16mins, 10sec
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for 
 HStore webdata,,1200085987488/size needed.
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 
 1773667150/size needs compaction
 2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting 
 compaction on region webdata,,1200085987488
 2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started 
 compaction of 14 files using 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size
  for webdata,,1200085987488/size
 2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started 
 memcache flush for region webdata,,1200085987488. Size 31.2m
 2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting 
 webdata,,1200085987488 because largest aggregate size is 100.7m and desired 
 size is 64.0m
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 ...
 lots of NotServingRegionException's
 ...
 2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 10mins, 58sec
 ...
 2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true
 2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of 
 webdata,,1200085987488 complete; new regions: webdata,,1200090121237, 
 webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split 
 took 11mins, 0sec
 2008-01-11 16:33:02,227 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for 
 .META.. Doing a find...
 2008-01-11 16:33:02,283 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) 
 for .META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, 
 startKey: , encodedName(70236052) tableDesc: {name: -ROOT-, families: 
 {info:={name: info, max versions: 1, compression: NONE, in memory: false, max 
 length: 2147483647, bloom filter: none}}}
 2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating 
 .META. with region split info
 2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 Reporting region split to master
 2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region 
 split, META update, and report to master all successful. Old 
 region=webdata,,1200085987488, new regions: webdata,,1200090121237, 
 webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (HADOOP-2566) need FileSystem#globStatus method

2008-01-11 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558179#action_12558179
 ] 

hairong edited comment on HADOOP-2566 at 1/11/08 5:01 PM:


I am still not comfortable with this change:

1. Some shell commands, like delete, copy, and rename, use globPath but don't 
need FileStatus.
2. GlobPath does not always call listPath for every directory. For example, 
globPath("/user/*/data") needs only listPath("/user"). Returning 
FileStatus[] requires additional listPath calls on each user xx's home 
directory /user/xx and the root /. This is a lot of overhead. 

  was (Author: hairong):
I am still not comfortable with this change:

1. Some of shell commands like delete, copy, and rename use globPath but don't 
need FileStatus.
2. GlobPath does not always call listPath for every directory. For example, 
globPath(/user/*/data) needs only to listPath(/user). Returning 
FileStatus[] requires listPath on each user xx's home directory /user/xx and 
/user/xx/data. This is a lot of overhead. 
  
 need FileSystem#globStatus method
 -

 Key: HADOOP-2566
 URL: https://issues.apache.org/jira/browse/HADOOP-2566
 Project: Hadoop
  Issue Type: Improvement
  Components: fs
Reporter: Doug Cutting
Assignee: Hairong Kuang
 Fix For: 0.16.0


 To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting 
 performance, we must use file enumeration APIs that return FileStatus[] 
 rather than Path[].  Currently we have FileSystem#globPaths(), but that 
 method should be deprecated and replaced with a FileSystem#globStatus().
 We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the 
 cache in 0.17.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins

2008-01-11 Thread Billy Pearson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billy Pearson updated HADOOP-2587:
--

Attachment: hbase-root-regionserver-PE1750-3.log

attached Full log from region server

 Splits getting blocked by compactions causing region to be offline for the 
 length of the compaction 10-15 mins
 ---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
 Fix For: 0.16.0

 Attachments: hbase-root-regionserver-PE1750-3.log


 The below is cut out of one of my region server's logs; the full log is 
 attached.
 What is happening: there is one region on this region server and it is under 
 heavy insert load, so compactions run back to back; as soon as one finishes 
 a new one starts. The problem starts when it is time to split the region.
 A compaction starts just milliseconds before the split, blocking the split, 
 but the split closes the region before the compaction is finished, causing 
 the region to be offline until the compaction is done. Once the compaction 
 is done the split finishes and all returns to normal, but this is a big 
 problem for production if the region is offline for 10-15 mins.
 The solution would be to not let the split thread issue the line below while 
 a compaction on that region is in progress.
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 The only time I have seen this bug is when there is only one region on a 
 region server, because with more than one region the compaction moves on to 
 the other region(s) after the first one is done, and the split can do what 
 it needs on the first region without being blocked.
 {code}
 2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 16mins, 10sec
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for 
 HStore webdata,,1200085987488/size needed.
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 
 1773667150/size needs compaction
 2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting 
 compaction on region webdata,,1200085987488
 2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started 
 compaction of 14 files using 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size
  for webdata,,1200085987488/size
 2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started 
 memcache flush for region webdata,,1200085987488. Size 31.2m
 2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting 
 webdata,,1200085987488 because largest aggregate size is 100.7m and desired 
 size is 64.0m
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 ...
 lots of NotServingRegionException's
 ...
 2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 10mins, 58sec
 ...
 2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true
 2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of 
 webdata,,1200085987488 complete; new regions: webdata,,1200090121237, 
 webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split 
 took 11mins, 0sec
 2008-01-11 16:33:02,227 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for 
 .META.. Doing a find...
 2008-01-11 16:33:02,283 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) 
 for .META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, 
 startKey: , encodedName(70236052) tableDesc: {name: -ROOT-, families: 
 {info:={name: info, max versions: 1, compression: NONE, in memory: false, max 
 length: 2147483647, bloom filter: none}}}
 2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating 
 .META. with region split info
 2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 Reporting region split to master
 2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region 
 split, META update, and report to master all successful. Old 
 region=webdata,,1200085987488, new regions: webdata,,1200090121237, 
 webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2566) need FileSystem#globStatus method

2008-01-11 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558179#action_12558179
 ] 

Hairong Kuang commented on HADOOP-2566:
---

I am still not comfortable with this change:

1. Some shell commands, like delete, copy, and rename, use globPath but don't 
need FileStatus.
2. GlobPath does not always call listPath for every directory. For example, 
globPath("/user/*/data") needs only listPath("/user"). Returning 
FileStatus[] requires listPath on each user xx's home directory /user/xx and 
/user/xx/data. This is a lot of overhead. 
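A toy model makes the counting argument concrete. This is not the real Hadoop 
FileSystem API; `children` stands in for a listPath RPC and `exists` for a 
cheap existence probe. Expanding /user/*/data as plain paths costs a single 
listing of /user, while filling in a status object for every match adds one 
more listing per matched path:

```java
import java.util.*;

// Toy model of glob expansion cost: each children() call counts as one
// listPath RPC; exists() is treated as a cheap probe and not counted.
public class GlobCostSketch {
  final Map<String, Set<String>> dirs = new HashMap<>(); // dir -> child names
  int listings = 0;

  List<String> children(String dir) {
    listings++; // a listPath RPC in the real system
    return new ArrayList<>(dirs.getOrDefault(dir, Collections.emptySet()));
  }

  boolean exists(String dir, String name) {
    return dirs.getOrDefault(dir, Collections.emptySet()).contains(name);
  }

  // globPath style: return matching paths only -- one listing of /user
  List<String> globUserData() {
    List<String> out = new ArrayList<>();
    for (String u : children("/user")) {
      if (exists("/user/" + u, "data")) {
        out.add("/user/" + u + "/data");
      }
    }
    return out;
  }

  // globStatus style: additionally list each match's parent to build
  // its status object -- one extra listing per matched path
  List<String> globUserDataWithStatus() {
    List<String> out = globUserData();
    for (String p : out) {
      children(p.substring(0, p.lastIndexOf('/')));
    }
    return out;
  }
}
```

With two users and one match, the path-only expansion costs one listing and 
the status-returning version costs two: this is the overhead being debated.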

 need FileSystem#globStatus method
 -

 Key: HADOOP-2566
 URL: https://issues.apache.org/jira/browse/HADOOP-2566
 Project: Hadoop
  Issue Type: Improvement
  Components: fs
Reporter: Doug Cutting
Assignee: Hairong Kuang
 Fix For: 0.16.0


 To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting 
 performance, we must use file enumeration APIs that return FileStatus[] 
 rather than Path[].  Currently we have FileSystem#globPaths(), but that 
 method should be deprecated and replaced with a FileSystem#globStatus().
 We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the 
 cache in 0.17.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2582) hadoop dfs -copyToLocal creates zero byte files when source file does not exist

2008-01-11 Thread lohit vijayarenu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lohit vijayarenu updated HADOOP-2582:
-

Attachment: HADOOP_2582_2.patch

Thanks Raghu. I have attached another patch, which fixes FileUtil. Now we 
catch both -get and -put errors. 

 hadoop dfs -copyToLocal creates zero byte files when source file does not 
 exist 
 --

 Key: HADOOP-2582
 URL: https://issues.apache.org/jira/browse/HADOOP-2582
 Project: Hadoop
  Issue Type: Bug
  Components: dfs
Affects Versions: 0.15.2
Reporter: lohit vijayarenu
 Attachments: HADOOP_2582_1.patch, HADOOP_2582_2.patch


 hadoop dfs -copyToLocal with a non-existent source file creates a zero-byte 
 destination file. It should print an error indicating that the source file 
 does not exist.
 {noformat}
 [lohit@ hadoop-trunk]$ hadoop dfs -get nosuchfile nosuchfile
 [lohit@ hadoop-trunk]$ ls -l nosuchfile 
 -rw-r--r--  1 lohit users 0 Jan 11 21:58 nosuchfile
 [lohit@ hadoop-trunk]$
 {noformat}
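The shape of the fix, sketched here with java.nio.file rather than the actual 
Hadoop FileUtil code, is to check that the source exists before ever opening 
the destination for write, so no empty file is left behind:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.*;

// Illustrative sketch of the intended behavior, not the attached patch:
// fail before touching the destination when the source is missing.
public class SafeCopySketch {
  static void copyToLocal(Path src, Path dst) throws IOException {
    if (!Files.exists(src)) {
      // report the error *before* dst is created, so no zero-byte file
      throw new FileNotFoundException("No such file: " + src);
    }
    Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
  }
}
```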

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2464) Test permissions related shell commands with DFS

2008-01-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558183#action_12558183
 ] 

Hadoop QA commented on HADOOP-2464:
---

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12372850/HADOOP-2464.patch
against trunk revision r611333.

@author +1.  The patch does not contain any @author tags.

javadoc +1.  The javadoc tool did not generate any warning messages.

javac +1.  The applied patch does not generate any new compiler warnings.

findbugs +1.  The patch does not introduce any new Findbugs warnings.

core tests +1.  The patch passed core unit tests.

contrib tests +1.  The patch passed contrib unit tests.

Test results: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1550/testReport/
Findbugs warnings: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1550/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1550/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1550/console

This message is automatically generated.

 Test permissions related shell commands with DFS
 

 Key: HADOOP-2464
 URL: https://issues.apache.org/jira/browse/HADOOP-2464
 Project: Hadoop
  Issue Type: Improvement
  Components: dfs
Affects Versions: 0.16.0
Reporter: Raghu Angadi
Assignee: Raghu Angadi
 Fix For: 0.16.0

 Attachments: HADOOP-2464.patch, HADOOP-2464.patch, HADOOP-2464.patch


 HADOOP-2336 adds FsShell commands for changing permissions for files. But it 
 is not tested on DFS since that requires HADOOP-1298. Once HADOOP-1298 is 
 committed, we should add unit tests for DFS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HADOOP-1015) slaves are not recognized by name

2008-01-11 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HADOOP-1015.
-

   Resolution: Cannot Reproduce
Fix Version/s: 0.16.0

This looks like a stale issue. It should not matter whether you specify slaves 
by names or IP addresses, as long as your shell recognizes where to ssh. I 
don't have Ubuntu to try this, but it seems to work in my environment with 
current trunk.
I am closing it, but please feel free to reopen and describe the problem in 
more detail if it persists.

 slaves are not recognized by name
 -

 Key: HADOOP-1015
 URL: https://issues.apache.org/jira/browse/HADOOP-1015
 Project: Hadoop
  Issue Type: Bug
  Components: dfs
Affects Versions: 0.10.1
 Environment: Ubuntu 6.06 
Reporter: moz devil
Priority: Minor
 Fix For: 0.16.0


 After upgrading from Nutch 0.8.1 (which has Hadoop 0.4.0) to Nutch 0.9.0 
 (with Hadoop 0.10.1), the datanodes were starting with bin/start-all.sh but 
 did not appear in the Hadoop Map/Reduce Administration screen. Only the 
 datanode on the same host as the namenode appeared. I was using local DNS 
 names, which worked fine with Hadoop 0.4.0. Now I use IP addresses, which 
 cause no problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2540) Empty blocks make fsck report corrupt, even when it isn't

2008-01-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558188#action_12558188
 ] 

Hadoop QA commented on HADOOP-2540:
---

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12372952/recoverLastBlock2.patch
against trunk revision r611333.

@author +1.  The patch does not contain any @author tags.

javadoc +1.  The javadoc tool did not generate any warning messages.

javac +1.  The applied patch does not generate any new compiler warnings.

findbugs +1.  The patch does not introduce any new Findbugs warnings.

core tests +1.  The patch passed core unit tests.

contrib tests +1.  The patch passed contrib unit tests.

Test results: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1551/testReport/
Findbugs warnings: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1551/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1551/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1551/console

This message is automatically generated.

 Empty blocks make fsck report corrupt, even when it isn't
 -

 Key: HADOOP-2540
 URL: https://issues.apache.org/jira/browse/HADOOP-2540
 Project: Hadoop
  Issue Type: Bug
  Components: dfs
Affects Versions: 0.15.1
Reporter: Allen Wittenauer
Assignee: dhruba borthakur
Priority: Blocker
 Fix For: 0.15.3

 Attachments: recoverLastBlock.patch, recoverLastBlock2.patch


 If the name node crashes after blocks have been allocated and before the 
 content has been uploaded, fsck will report the zero sized files as corrupt 
 upon restart:
 /user/rajive/rand0/_task_200712121358_0001_m_000808_0/part-00808: MISSING 1 
 blocks of total size 0 B
 ... even though all blocks are accounted for:
 Status: CORRUPT
  Total size:2932802658847 B
  Total blocks:  26603 (avg. block size 110243305 B)
  Total dirs:419
  Total files:   5031
  Over-replicated blocks:197 (0.740518 %)
  Under-replicated blocks:   0 (0.0 %)
  Target replication factor: 3
  Real replication factor:   3.0074053
 The filesystem under path '/' is CORRUPT
 In UFS and related filesystems, such files would get put into lost+found 
 after an fsck and the filesystem would return back to normal.  It would be 
 super if HDFS could do a similar thing.  Perhaps if all of the nodes stored 
 in the name node's 'includes' file have reported in, HDFS could automatically 
 run a fsck and store these not-necessarily-broken files in something like 
 lost+found.  
 Files that are actually missing blocks, however, should not be touched.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2346) DataNode should have timeout on socket writes.

2008-01-11 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated HADOOP-2346:
-

Attachment: HADOOP-2346.patch


This patch implements a write timeout on datanodes for block reads. Currently 
only client reads have a write timeout. Once the fix looks good, we can add 
write timeouts in other places (while writing to the mirror, for example).

This adds two classes, SocketInputStream and SocketOutputStream, in IOUtils. 
Please suggest better names.
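The write-with-timeout idea can be sketched with a non-blocking channel and a 
selector: whenever a write makes no progress, wait for writability for at most 
the remaining time, and fail with an IOException once the deadline passes. 
This is an illustrative sketch, not the patch's actual SocketOutputStream 
code:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.*;

// Sketch of a write(data, timeout) helper over a non-blocking channel.
public class TimedWriteSketch {
  static <C extends SelectableChannel & WritableByteChannel>
  void write(C ch, ByteBuffer buf, long timeoutMs) throws IOException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    ch.configureBlocking(false);
    try (Selector sel = Selector.open()) {
      ch.register(sel, SelectionKey.OP_WRITE);
      while (buf.hasRemaining()) {
        if (ch.write(buf) > 0) {
          continue; // made progress; keep writing
        }
        long left = deadline - System.currentTimeMillis();
        // no progress: wait for writability, at most until the deadline
        if (left <= 0 || sel.select(left) == 0) {
          throw new IOException("write timed out after " + timeoutMs + " ms");
        }
        sel.selectedKeys().clear(); // consume the readiness notification
      }
    }
  }
}
```

A stalled reader (here, a full pipe with nobody draining it) then surfaces as 
an IOException after the timeout instead of a thread stuck forever.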


 DataNode should have timeout on socket writes.
 --

 Key: HADOOP-2346
 URL: https://issues.apache.org/jira/browse/HADOOP-2346
 Project: Hadoop
  Issue Type: Bug
  Components: dfs
Affects Versions: 0.15.1
Reporter: Raghu Angadi
Assignee: Raghu Angadi
 Attachments: HADOOP-2346.patch


 If a client opens a file and stops reading in the middle, the DataNode 
 thread writing the data could be stuck forever. For DataNode sockets we set 
 a read timeout but not a write timeout. I think we should add a write(data, 
 timeout) method in IOUtils that assumes the underlying FileChannel is 
 non-blocking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2555) Refactor the HTable#get and HTable#getRow methods to avoid repetition of retry-on-failure logic

2008-01-11 Thread Bryan Duxbury (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558191#action_12558191
 ] 

Bryan Duxbury commented on HADOOP-2555:
---

Generally, I love this patch. It'll nicely reduce the amount of copy-paste we 
have.

I'd like to maybe take it one step further, though. Instead of the existing 
callServerWithRetries, how about something like:

{code}

  protected abstract class ServerCallable<T> implements Callable<T> {
    HRegionLocation location;
    HRegionInterface server;
    Text row;

    protected ServerCallable(Text row) {
      this.row = row;
    }

    void instantiateServer(boolean reload) throws IOException {
      if (reload) {
        tableServers = connection.reloadTableServers(tableName);
      }
      location = getRegionLocation(row);
      server = connection.getHRegionConnection(location.getServerAddress());
    }
  }

  protected <T> T getRegionServerWithRetries(ServerCallable<T> callable)
      throws IOException, UnexpectedCallableException {
    for (int tries = 0; tries < numRetries; tries++) {
      try {
        callable.instantiateServer(tries == 0);
        return callable.call();
      } catch (IOException e) {
        if (e instanceof RemoteException) {
          e = RemoteExceptionHandler.decodeRemoteException((RemoteException) e);
        }
        if (tries == numRetries - 1) {
          throw e;
        }
        if (LOG.isDebugEnabled()) {
          LOG.debug("reloading table servers because: " + e.getMessage());
        }
      } catch (Exception e) {
        throw new UnexpectedCallableException(e);
      }
      try {
        Thread.sleep(pause);
      } catch (InterruptedException e) {
        // continue
      }
    }
    return null;
  }
{code}

which takes us from 

{code}
value = this.callServerWithRetries(new Callable<MapWritable>() {
  public MapWritable call() throws IOException {
HRegionLocation r = getRegionLocation(row);
HRegionInterface server =
  connection.getHRegionConnection(r.getServerAddress());
return server.getRow(r.getRegionInfo().getRegionName(), row, ts);
  }
});
{code}

to 

{code}
value = this.callServerWithRetries(new ServerCallable<MapWritable>(row) {
  public MapWritable call() throws IOException {
    return server.getRow(location.getRegionInfo().getRegionName(), row, ts);
  }
});
{code}

This would save a few lines of code inside each internal block, move a little 
more logic into the helper method, and generally jive better with the way my 
HADOOP-2443 patch is going to need to work in the near future. 

Comments?

 Refactor the HTable#get and HTable#getRow methods to avoid repetition of 
 retry-on-failure logic
 ---

 Key: HADOOP-2555
 URL: https://issues.apache.org/jira/browse/HADOOP-2555
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Reporter: Peter Dolan
Priority: Minor
 Attachments: hadoop-2555.patch


 The following code is repeated in every one of HTable#get and HTable#getRow 
 methods:
 {code:title=HTable.java|borderStyle=solid}
 MapWritable value = null;
 for (int tries = 0; tries < numRetries; tries++) {
   HRegionLocation r = getRegionLocation(row);
   HRegionInterface server =
 connection.getHRegionConnection(r.getServerAddress());
   
   try {
 value = server.getRow(r.getRegionInfo().getRegionName(), row, ts);  
 // This is the only line of code that changes significantly between methods
 break;
 
   } catch (IOException e) {
 if (e instanceof RemoteException) {
   e = RemoteExceptionHandler.decodeRemoteException((RemoteException) 
 e);
 }
 if (tries == numRetries - 1) {
   // No more tries
   throw e;
 }
 if (LOG.isDebugEnabled()) {
   LOG.debug("reloading table servers because: " + e.getMessage());
 }
 tableServers = connection.reloadTableServers(tableName);
   }
   try {
 Thread.sleep(this.pause);
 
   } catch (InterruptedException x) {
 // continue
   }
 }
 {code}
 This should be factored out into a protected method that handles 
 retry-on-failure logic to facilitate more robust testing and the development 
 of new API methods.
 Proposed modification:
 // Execute the provided Callable against the server
 protected <T> T callServerWithRetries(Callable<T> callable) throws 
 RemoteException;
 The above code could then be reduced to:
 {code:title=HTable.java|borderStyle=solid}
 MapWritable value = null;
 final connection;
 try {
   value = callServerWithRetries(new Callable<MapWritable>() {
 HRegionLocation r = getRegionLocation(row);
 HRegionInterface server =

[jira] Commented: (HADOOP-2582) hadoop dfs -copyToLocal creates zero byte files, when source file does not exists

2008-01-11 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558192#action_12558192
 ] 

Raghu Angadi commented on HADOOP-2582:
--

+1. looks good. 

 hadoop dfs -copyToLocal creates zero byte files, when source file does not 
 exists 
 --

 Key: HADOOP-2582
 URL: https://issues.apache.org/jira/browse/HADOOP-2582
 Project: Hadoop
  Issue Type: Bug
  Components: dfs
Affects Versions: 0.15.2
Reporter: lohit vijayarenu
 Attachments: HADOOP_2582_1.patch, HADOOP_2582_2.patch


 hadoop dfs -copyToLocal with a non-existent source file creates a zero-byte 
 destination file. It should print an error message indicating that the source 
 file does not exist.
 {noformat}
 [lohit@ hadoop-trunk]$ hadoop dfs -get nosuchfile nosuchfile
 [lohit@ hadoop-trunk]$ ls -l nosuchfile 
 -rw-r--r--  1 lohit users 0 Jan 11 21:58 nosuchfile
 [lohit@ hadoop-trunk]$
 {noformat}
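The fix presumably comes down to ordering: verify the source exists before the destination file is ever created. A minimal stand-alone sketch of that ordering in plain java.io (this is not the actual FsShell code; the class and method names are invented for illustration):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyToLocal {
  // Copy src to dst, checking that src exists *before* opening dst,
  // so a missing source never leaves a zero-byte destination behind.
  static void copy(File src, File dst) throws IOException {
    if (!src.exists()) {
      throw new FileNotFoundException("File does not exist: " + src);
    }
    try (InputStream in = new FileInputStream(src);
         OutputStream out = new FileOutputStream(dst)) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
    }
  }
}
```

If the existence check instead came after `new FileOutputStream(dst)`, the destination would already have been created (empty) by the time the error surfaced, which is exactly the symptom reported here.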

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2464) Test permissions related shell commands with DFS

2008-01-11 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated HADOOP-2464:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed this.

 Test permissions related shell commands with DFS
 

 Key: HADOOP-2464
 URL: https://issues.apache.org/jira/browse/HADOOP-2464
 Project: Hadoop
  Issue Type: Improvement
  Components: dfs
Affects Versions: 0.16.0
Reporter: Raghu Angadi
Assignee: Raghu Angadi
 Fix For: 0.16.0

 Attachments: HADOOP-2464.patch, HADOOP-2464.patch, HADOOP-2464.patch


 HADOOP-2336 adds FsShell commands for changing permissions for files. But it 
 is not tested on DFS since that requires HADOOP-1298. Once HADOOP-1298 is 
 committed, we should add unit tests for DFS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2582) hadoop dfs -copyToLocal creates zero byte files, when source file does not exists

2008-01-11 Thread lohit vijayarenu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lohit vijayarenu updated HADOOP-2582:
-

Status: Patch Available  (was: Open)

Thanks Raghu. Making it Patch Available.

 hadoop dfs -copyToLocal creates zero byte files, when source file does not 
 exists 
 --

 Key: HADOOP-2582
 URL: https://issues.apache.org/jira/browse/HADOOP-2582
 Project: Hadoop
  Issue Type: Bug
  Components: dfs
Affects Versions: 0.15.2
Reporter: lohit vijayarenu
 Attachments: HADOOP_2582_1.patch, HADOOP_2582_2.patch


 hadoop dfs -copyToLocal with a non-existent source file creates a zero-byte 
 destination file. It should print an error message indicating that the source 
 file does not exist.
 {noformat}
 [lohit@ hadoop-trunk]$ hadoop dfs -get nosuchfile nosuchfile
 [lohit@ hadoop-trunk]$ ls -l nosuchfile 
 -rw-r--r--  1 lohit users 0 Jan 11 21:58 nosuchfile
 [lohit@ hadoop-trunk]$
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2555) Refactor the HTable#get and HTable#getRow methods to avoid repetition of retry-on-failure logic

2008-01-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558196#action_12558196
 ] 

stack commented on HADOOP-2555:
---

Patch looks great.

Don't include the patch for CHANGES.txt (though I think the notes for new 
contribs recommend it). Too often it's the reason a patch fails to apply up on 
hudson.

On UnexpectedCallableException, you log it with the method args -- that's an 
improvement -- but you don't rethrow the cause; rather you return null. 
Wouldn't letting the original exception out be better?
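The alternative stack is suggesting can be sketched in isolation: keep the last failure and rethrow it once retries are exhausted, rather than falling off the loop and returning null. This is only an illustrative sketch, not the actual HTable code; the class name and retry constant are invented:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetryHelper {
  static final int NUM_RETRIES = 3;  // illustrative; HBase reads this from config

  // Retries the callable; if every attempt fails, rethrows the *last*
  // IOException instead of returning null, so the caller sees the real cause.
  static <T> T callWithRetries(Callable<T> callable) throws IOException {
    IOException last = null;
    for (int tries = 0; tries < NUM_RETRIES; tries++) {
      try {
        return callable.call();
      } catch (IOException e) {
        last = e;  // remember the cause for the final rethrow
      } catch (Exception e) {
        // non-IO failures are unexpected; surface them immediately
        throw new RuntimeException(e);
      }
    }
    throw last;  // retries exhausted: propagate instead of returning null
  }
}
```

The caller then gets the original exception (message, stack trace and all) after the last attempt, instead of a null result that has to be special-cased.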

P.S. I like Bryan's suggestion.

 Refactor the HTable#get and HTable#getRow methods to avoid repetition of 
 retry-on-failure logic
 ---

 Key: HADOOP-2555
 URL: https://issues.apache.org/jira/browse/HADOOP-2555
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Reporter: Peter Dolan
Priority: Minor
 Attachments: hadoop-2555.patch


 The following code is repeated in every one of HTable#get and HTable#getRow 
 methods:
 {code:title=HTable.java|borderStyle=solid}
 MapWritable value = null;
 for (int tries = 0; tries < numRetries; tries++) {
   HRegionLocation r = getRegionLocation(row);
   HRegionInterface server =
 connection.getHRegionConnection(r.getServerAddress());
   
   try {
 value = server.getRow(r.getRegionInfo().getRegionName(), row, ts);  
 // This is the only line of code that changes significantly between methods
 break;
 
   } catch (IOException e) {
 if (e instanceof RemoteException) {
   e = RemoteExceptionHandler.decodeRemoteException((RemoteException) 
 e);
 }
 if (tries == numRetries - 1) {
   // No more tries
   throw e;
 }
 if (LOG.isDebugEnabled()) {
   LOG.debug("reloading table servers because: " + e.getMessage());
 }
 tableServers = connection.reloadTableServers(tableName);
   }
   try {
 Thread.sleep(this.pause);
 
   } catch (InterruptedException x) {
 // continue
   }
 }
 {code}
 This should be factored out into a protected method that handles 
 retry-on-failure logic to facilitate more robust testing and the development 
 of new API methods.
 Proposed modification:
 // Execute the provided Callable against the server
 protected <T> T callServerWithRetries(Callable<T> callable) throws 
 RemoteException;
 The above code could then be reduced to:
 {code:title=HTable.java|borderStyle=solid}
 MapWritable value = null;
 final connection;
 try {
   value = callServerWithRetries(new Callable<MapWritable>() {
 HRegionLocation r = getRegionLocation(row);
 HRegionInterface server =
 connection.getHRegionConnection(r.getServerAddress());
 server.getRow(r.getRegionInfo().getRegionName(), row, ts);
   });
 } catch (RemoteException e) {
   // handle unrecoverable remote exceptions
 }
 {code}
 This would greatly ease the development of new API methods by reducing the 
 amount of code needed to implement a new method and reducing the amount of 
 logic that needs to be tested per method.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2584) Web UI displays an IOException instead of the Tables

2008-01-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558197#action_12558197
 ] 

stack commented on HADOOP-2584:
---

Any other context in the master logs that might be of use, Lars? (Are you 
running with DEBUG enabled? If not, see the hbase FAQ for how.)

 Web UI displays an IOException instead of the Tables
 

 Key: HADOOP-2584
 URL: https://issues.apache.org/jira/browse/HADOOP-2584
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.15.2
Reporter: Lars George

 For me, after every second restart, I get an error when loading the Hbase UI. 
 Here is the page:
   
  Master: 192.168.105.11:6

  HQL, Local logs, Thread Dump, Log Level

  Master Attributes
Attribute Name   Value   Description
Filesystem   lv1-xen-pdc-2.worldlingo.com:9000   Filesystem hbase is running on
Hbase Root Directory /hbase  Location of hbase home directory

  Online META Regions
Name   Server
-ROOT-192.168.105.31:60020
.META.,,1 192.168.105.39:60020

  Tables
   error msg : java.io.IOException: java.io.IOException: HStoreScanner failed construction
 at org.apache.hadoop.hbase.HStore$StoreFileScanner.<init>(HStore.java:1879)
 at org.apache.hadoop.hbase.HStore$HStoreScanner.<init>(HStore.java:2000)
 at org.apache.hadoop.hbase.HStore.getScanner(HStore.java:1822)
 at org.apache.hadoop.hbase.HRegion$HScanner.<init>(HRegion.java:1543)
 at org.apache.hadoop.hbase.HRegion.getScanner(HRegion.java:1118)
 at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1465)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:401)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
   Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File does not exist: /hbase/hregion_1028785192/info/mapfiles/6628785818889695133/data
 at org.apache.hadoop.dfs.FSDirectory.getFileInfo(FSDirectory.java:489)
 at org.apache.hadoop.dfs.FSNamesystem.getFileInfo(FSNamesystem.java:1380)
 at org.apache.hadoop.dfs.NameNode.getFileInfo(NameNode.java:425)
 at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at