[jira] Created: (HADOOP-2580) Ability to apply user specified filters on the task logs
Ability to apply user specified filters on the task logs Key: HADOOP-2580 URL: https://issues.apache.org/jira/browse/HADOOP-2580 Project: Hadoop Issue Type: Improvement Reporter: Amar Kamat Priority: Minor It would be great if the user could specify filters on the task logs, for example _grep 'Thread'_, to view only the _Thread_-related messages and their timings in the task logs. It would be of great use for debugging/analysis. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
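The filtering proposed above amounts to a grep-style pass over the task-log lines. A minimal sketch of that idea follows; the class and method names are hypothetical illustrations, not part of Hadoop:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: apply a user-supplied regex filter to task-log
// lines, in the spirit of piping the log through `grep 'Thread'`.
public class TaskLogFilter {
    // Returns only the lines matching the user-specified pattern.
    public static List<String> filter(List<String> logLines, String userPattern) {
        Pattern p = Pattern.compile(userPattern);
        List<String> matched = new ArrayList<>();
        for (String line : logLines) {
            if (p.matcher(line).find()) {
                matched.add(line);
            }
        }
        return matched;
    }
}
```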
[jira] Commented: (HADOOP-2178) Job history on HDFS
[ https://issues.apache.org/jira/browse/HADOOP-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558010#action_12558010 ] Amareshwari Sri Ramadasu commented on HADOOP-2178: -- To address use cases 1 and 2 suggested by Eric, I propose the following approach. If the job tracker is static, we will store history logs in a location specified by hadoop.job.history.location; by default it is the local file system. If the job tracker is not static (like a HOD JT), we will store log files in a user-specified location; by default it is the job output directory. We will not have an index file any more, because appending is an issue in DFS, and we don't need one in the case of a non-static JT. For a static JT, we can list the files in the log directory to show the first page. Job history on HDFS --- Key: HADOOP-2178 URL: https://issues.apache.org/jira/browse/HADOOP-2178 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Amareshwari Sri Ramadasu Assignee: Amareshwari Sri Ramadasu Fix For: 0.16.0 This issue addresses the following items: 1. Check for accuracy of job tracker history logs. 2. After completion of the job, copy the JobHistory.log (master index file) and the job history files to the DFS. 3. User can load the history with commands bin/hadoop job -history directory or bin/hadoop job -history jobid This will start a stand-alone jetty and load jsps -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
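The selection logic proposed in the comment above can be sketched as follows. This is illustrative only: the configuration is modeled as a plain map, and the `user.job.history.location` key and the local default path are hypothetical stand-ins (only `hadoop.job.history.location` comes from the comment):

```java
import java.util.Map;

// Sketch of the proposed history-location selection (names partly
// hypothetical, not the actual Hadoop API):
//   static JT     -> hadoop.job.history.location (default: local FS)
//   non-static JT -> user-specified location (default: job output dir)
public class HistoryLocation {
    public static String resolve(boolean staticJobTracker,
                                 Map<String, String> conf,
                                 String jobOutputDir) {
        if (staticJobTracker) {
            // "file:///var/log/hadoop/history" is an illustrative default.
            return conf.getOrDefault("hadoop.job.history.location",
                                     "file:///var/log/hadoop/history");
        }
        // "user.job.history.location" is a hypothetical key for the
        // user-specified location; fall back to the job output directory.
        return conf.getOrDefault("user.job.history.location", jobOutputDir);
    }
}
```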
[jira] Updated: (HADOOP-2311) [hbase] Could not complete hdfs write out to flush file forcing regionserver restart
[ https://issues.apache.org/jira/browse/HADOOP-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2311: -- Priority: Trivial (was: Critical) Dropping priority since this bug has not recurred. [hbase] Could not complete hdfs write out to flush file forcing regionserver restart Key: HADOOP-2311 URL: https://issues.apache.org/jira/browse/HADOOP-2311 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Priority: Trivial Attachments: delete-logging.patch I've spent some time looking into this issue but there are not enough clues in the logs to tell where the problem is. Here's what I know. Two region servers went down last night, a minute apart, during Paul Saab's 6hr run inserting 300 million rows into hbase. The regionservers went down to force rerun of hlog and avoid possible data loss after a failure writing memory flushes to hdfs. Here is the lead up to the failed flush:
2007-11-28 22:40:02,231 INFO hbase.HRegionServer - MSG_REGION_OPEN : regionname: postlog,img149/4699/133lm0.jpg,1196318393738, startKey: img149/4699/133lm0.jpg, tableDesc: {name: postlog, families: {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}
2007-11-28 22:40:02,242 DEBUG hbase.HStore - starting 1703405830/cookie (no reconstruction log)
2007-11-28 22:40:02,741 DEBUG hbase.HStore - maximum sequence id for hstore 1703405830/cookie is 29077708
2007-11-28 22:40:03,094 DEBUG hbase.HStore - starting 1703405830/ip (no reconstruction log)
2007-11-28 22:40:03,852 DEBUG hbase.HStore - maximum sequence id for hstore 1703405830/ip is 29077708
2007-11-28 22:40:04,138 DEBUG hbase.HRegion - Next sequence id for region postlog,img149/4699/133lm0.jpg,1196318393738 is 29077709
2007-11-28 22:40:04,141 INFO hbase.HRegion - region postlog,img149/4699/133lm0.jpg,1196318393738 available
2007-11-28 22:40:04,141 DEBUG hbase.HLog - changing sequence number from 21357623 to 29077709
2007-11-28 22:40:04,141 INFO hbase.HRegionServer - MSG_REGION_OPEN : regionname: postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739, startKey: img149/7512/dscnlightenedfi3.jpg, tableDesc: {name: postlog, families: {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}
2007-11-28 22:40:04,145 DEBUG hbase.HStore - starting 376748222/cookie (no reconstruction log)
2007-11-28 22:40:04,223 DEBUG hbase.HStore - maximum sequence id for hstore 376748222/cookie is 29077708
2007-11-28 22:40:04,277 DEBUG hbase.HStore - starting 376748222/ip (no reconstruction log)
2007-11-28 22:40:04,353 DEBUG hbase.HStore - maximum sequence id for hstore 376748222/ip is 29077708
2007-11-28 22:40:04,699 DEBUG hbase.HRegion - Next sequence id for region postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 is 29077709
2007-11-28 22:40:04,701 INFO hbase.HRegion - region postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 available
2007-11-28 22:40:34,427 DEBUG hbase.HRegionServer - flushing region postlog,img143/1310/yashrk3.jpg,1196317258704
2007-11-28 22:40:34,428 DEBUG hbase.HRegion - Not flushing cache for region postlog,img143/1310/yashrk3.jpg,1196317258704: snapshotMemcaches() determined that there was nothing to do
2007-11-28 22:40:55,745 DEBUG hbase.HRegionServer - flushing region postlog,img142/8773/1001417zc4.jpg,1196317258703
2007-11-28 22:40:55,745 DEBUG hbase.HRegion - Not flushing cache for region postlog,img142/8773/1001417zc4.jpg,1196317258703: snapshotMemcaches() determined that there was nothing to do
2007-11-28 22:41:04,144 DEBUG hbase.HRegionServer - flushing region postlog,img149/4699/133lm0.jpg,1196318393738
2007-11-28 22:41:04,144 DEBUG hbase.HRegion - Started memcache flush for region postlog,img149/4699/133lm0.jpg,1196318393738. Size 74.7k
2007-11-28 22:41:04,764 DEBUG hbase.HStore - Added 1703405830/ip/610047924323344967 with sequence id 29081563 and size 53.8k
2007-11-28 22:41:04,902 DEBUG hbase.HStore - Added 1703405830/cookie/3147798053949544972 with sequence id 29081563 and size 41.3k
2007-11-28 22:41:04,902 DEBUG hbase.HRegion - Finished memcache flush for region postlog,img149/4699/133lm0.jpg,1196318393738 in 758ms, sequenceid=29081563
2007-11-28 22:41:04,902 DEBUG hbase.HStore - compaction for HStore postlog,img149/4699/133lm0.jpg,1196318393738/ip needed.
[jira] Commented: (HADOOP-2178) Job history on HDFS
[ https://issues.apache.org/jira/browse/HADOOP-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558029#action_12558029 ] Runping Qi commented on HADOOP-2178: Even with hod JT, we still need to address case 3. That is, we need a central place to store the job history files so that they can be analyzed offline. It can be the same place in the local file system as it is now, or some common directory in DFS. This is in addition to the one in the output directory. Job history on HDFS --- Key: HADOOP-2178 URL: https://issues.apache.org/jira/browse/HADOOP-2178 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Amareshwari Sri Ramadasu Assignee: Amareshwari Sri Ramadasu Fix For: 0.16.0 This issue addresses the following items : 1. Check for accuracy of job tracker history logs. 2. After completion of the job, copy the JobHistory.log(Master index file) and the job history files to the DFS. 3. User can load the history with commands bin/hadoop job -history directory or bin/hadoop job -history jobid This will start a stand-alone jetty and load jsps -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2311) [hbase] Could not complete hdfs write out to flush file forcing regionserver restart
[ https://issues.apache.org/jira/browse/HADOOP-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558025#action_12558025 ] Jim Kellerman commented on HADOOP-2311: --- Have we seen any more occurrences of this problem? If not, should we close this issue as not reproducible and open a new one if it should happen again? [hbase] Could not complete hdfs write out to flush file forcing regionserver restart Key: HADOOP-2311 URL: https://issues.apache.org/jira/browse/HADOOP-2311 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Priority: Critical Attachments: delete-logging.patch I've spent some time looking into this issue but there are not enough clues in the logs to tell where the problem is. Here's what I know. Two region servers went down last night, a minute apart, during Paul Saab's 6hr run inserting 300 million rows into hbase. The regionservers went down to force rerun of hlog and avoid possible data loss after a failure writing memory flushes to hdfs. Here is the lead up to the failed flush: ...
[jira] Created: (HADOOP-2581) Counters and other useful stats should be logged into Job History log
Counters and other useful stats should be logged into Job History log - Key: HADOOP-2581 URL: https://issues.apache.org/jira/browse/HADOOP-2581 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Runping Qi The following stats are useful and available to the JT but not logged to the job history log: 1. The counters of each job 2. The counters of each mapper/reducer attempt 3. The info about the input splits (filename, split size, on which nodes) 4. The input split for each mapper attempt This data is useful and important for mining to find out performance-related problems. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2394) Add support for migrating between hbase versions
[ https://issues.apache.org/jira/browse/HADOOP-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558037#action_12558037 ] Jim Kellerman commented on HADOOP-2394: --- stack wrote: I ain't too invested in our supporting reverse migrations but it's worth noting that any migration system worth its salt - systems I've worked on in the past and ruby on rails - goes both ways, if only to facilitate testing of the forward migration (inevitably there's a bug when you try to migrate real data). That's what backups are for :) More importantly though, HADOOP-2478 incorporates a migration tool. The specifics of what the tool does will have to be rewritten for each upgrade, but I think the framework is good. Add support for migrating between hbase versions --- Key: HADOOP-2394 URL: https://issues.apache.org/jira/browse/HADOOP-2394 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: Johan Oskarsson If HBase is to be used to serve data to live systems we would need a way to upgrade both the underlying hadoop installation and hbase to newer versions with minimal downtime. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2570) streaming jobs fail after HADOOP-2227
[ https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558048#action_12558048 ] Hadoop QA commented on HADOOP-2570: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12372956/patch-2570.txt against trunk revision r611056. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests -1. The patch failed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1543/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1543/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1543/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1543/console This message is automatically generated. streaming jobs fail after HADOOP-2227 - Key: HADOOP-2570 URL: https://issues.apache.org/jira/browse/HADOOP-2570 Project: Hadoop Issue Type: Bug Components: contrib/streaming Affects Versions: 0.15.2 Reporter: lohit vijayarenu Assignee: Amareshwari Sri Ramadasu Priority: Blocker Fix For: 0.15.3 Attachments: patch-2570.txt HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed like this {code} File jobCacheDir = new File(currentDir.getParentFile().getParent(), work); {code} We should change this to get it working. Referring to the changes made in HADOOP-2227, I see that the APIs used there to construct the path are not public, and hard-coding the path in streaming does not look good. Thoughts?
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
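For context on the issue above, the fragility comes from deriving the job cache directory by walking two levels up from the task's working directory, so any change to the directory layout (as in HADOOP-2227) silently breaks it. A minimal sketch mirroring the construction quoted in the issue:

```java
import java.io.File;

// Sketch of the brittle path construction from the streaming code:
// <currentDir>/../../<work>. It encodes an assumption about the task
// directory layout rather than asking the framework for the path.
public class JobCacheDirSketch {
    public static File jobCacheDir(File currentDir, String work) {
        return new File(currentDir.getParentFile().getParent(), work);
    }
}
```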
[jira] Commented: (HADOOP-2500) [HBase] Unreadable region kills region servers
[ https://issues.apache.org/jira/browse/HADOOP-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558032#action_12558032 ] Jim Kellerman commented on HADOOP-2500: --- Bryan Duxbury wrote: At the very least, we should not assign a region to a region server if it is detected as no good. That is an unfortunate wording of a log message in the Master. It is saying that the current assignment of the region is no good because the information it read from the meta region had a server or start code that did not match a known server. It does not mean that the master thinks the region itself is no good. Also, if a RegionServer tries to access a region and it has difficulties, it should report to the master that it can't read the region, and the master should stop trying to serve it. From a more general standpoint, maybe when a bad region is detected, its files should be moved to a different location and generally excluded from the cluster. This would allow you to recover from problems better. Yes, we absolutely need to do something, just not sure exactly what yet. One thing is for certain: zero-length files should be ignored/deleted. [HBase] Unreadable region kills region servers -- Key: HADOOP-2500 URL: https://issues.apache.org/jira/browse/HADOOP-2500 Project: Hadoop Issue Type: Bug Components: contrib/hbase Environment: CentOS 5 Reporter: Chris Kline Priority: Critical Background: The name node (also a DataNode and RegionServer) in our cluster ran out of disk space. I created some space, restarted HDFS and fsck reported corruption with an HBase file. I cleared up that corruption and restarted HBase. I was still unable to read anything from HBase even though HDFS was now healthy. The following was gathered from the log files.
When HMaster starts up, it finds a region that is no good (Key: 17_125736271):
2007-12-24 09:07:14,342 DEBUG org.apache.hadoop.hbase.HMaster: Current assignment of spider_pages,17_125736271,1198286140018 is no good
HMaster then assigns this region to RegionServer X.60:
2007-12-24 09:07:17,126 INFO org.apache.hadoop.hbase.HMaster: assigning region spider_pages,17_125736271,1198286140018 to server 10.100.11.60:60020
2007-12-24 09:07:20,152 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 10.100.11.60:60020
The RegionServer has trouble reading that region (from the RegionServer log on X.60); note that the worker thread exits:
2007-12-24 09:07:22,611 DEBUG org.apache.hadoop.hbase.HStore: starting spider_pages,17_125736271,1198286140018/meta (2062710340/meta with reconstruction log: (/data/hbase1/hregion_2062710340/oldlogfile.log
2007-12-24 09:07:22,620 DEBUG org.apache.hadoop.hbase.HStore: maximum sequence id for hstore spider_pages,17_125736271,1198286140018/meta (2062710340/meta) is 4549496
2007-12-24 09:07:22,622 ERROR org.apache.hadoop.hbase.HRegionServer: error opening region spider_pages,17_125736271,1198286140018
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readFully(DataInputStream.java:152)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1383)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1360)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1349)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1344)
at org.apache.hadoop.hbase.HStore.doReconstructionLog(HStore.java:697)
at org.apache.hadoop.hbase.HStore.init(HStore.java:632)
at org.apache.hadoop.hbase.HRegion.init(HRegion.java:288)
at org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1211)
at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
at java.lang.Thread.run(Thread.java:619)
2007-12-24 09:07:22,623 FATAL org.apache.hadoop.hbase.HRegionServer: Unhandled exception
java.lang.NullPointerException
at org.apache.hadoop.hbase.HRegionServer.reportClose(HRegionServer.java:1095)
at org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1217)
at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
at java.lang.Thread.run(Thread.java:619)
2007-12-24 09:07:22,623 INFO org.apache.hadoop.hbase.HRegionServer: worker thread exiting
The HMaster then tries to assign the same region to X.60 again and fails. The HMaster tries to assign the region to X.31 with the same result (X.31 worker thread exits). The file it is complaining about, /data/hbase1/hregion_2062710340/oldlogfile.log, is a zero-length file in HDFS.
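A guard of the kind Jim suggests (ignoring zero-length reconstruction logs rather than handing them to the SequenceFile reader, which fails with EOFException on an empty file) could look like this sketch; the class and method names are hypothetical, not the actual HBase code:

```java
import java.io.File;

// Hypothetical guard: a missing or empty oldlogfile.log has nothing
// to replay, so skip it instead of opening it as a SequenceFile.
public class ReconstructionLogGuard {
    public static boolean shouldReplay(File oldLogFile) {
        return oldLogFile.exists() && oldLogFile.length() > 0;
    }
}
```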
[jira] Commented: (HADOOP-2566) need FileSystem#globStatus method
[ https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558078#action_12558078 ] Hairong Kuang commented on HADOOP-2566: --- I do not see why we need globStatus. GlobPath is essentially pattern matching. If the provided path does not contain any pattern, the given path is returned without talking to the namenode. need FileSystem#globStatus method - Key: HADOOP-2566 URL: https://issues.apache.org/jira/browse/HADOOP-2566 Project: Hadoop Issue Type: Improvement Components: fs Reporter: Doug Cutting Assignee: Hairong Kuang Fix For: 0.16.0 To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting performance, we must use file enumeration APIs that return FileStatus[] rather than Path[]. Currently we have FileSystem#globPaths(), but that method should be deprecated and replaced with a FileSystem#globStatus(). We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the cache in 0.17. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
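The short-circuit Hairong describes (return the given path as-is when it contains no pattern, without talking to the namenode) hinges on detecting glob metacharacters. A simple character-scan sketch, illustrative only and not Hadoop's actual implementation:

```java
// Sketch: decide whether a path string contains glob metacharacters.
// If not, glob resolution can return the path without a namenode call.
public class GlobCheck {
    private static final String GLOB_CHARS = "*?[]{}";

    public static boolean hasPattern(String path) {
        for (char c : path.toCharArray()) {
            if (GLOB_CHARS.indexOf(c) >= 0) {
                return true;
            }
        }
        return false;
    }
}
```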
[jira] Commented: (HADOOP-2500) [HBase] Unreadable region kills region servers
[ https://issues.apache.org/jira/browse/HADOOP-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558036#action_12558036 ] Bryan Duxbury commented on HADOOP-2500: --- So, we should:
* Change the no good message to something a tad more descriptive, like assignment of region is invalid
* Enumerate the known ways that a RegionServer can fail to serve a region, trap those problems, and figure out what responses we'd like to give to those events
[HBase] Unreadable region kills region servers -- Key: HADOOP-2500 URL: https://issues.apache.org/jira/browse/HADOOP-2500 Project: Hadoop Issue Type: Bug Components: contrib/hbase Environment: CentOS 5 Reporter: Chris Kline Priority: Critical Background: The name node (also a DataNode and RegionServer) in our cluster ran out of disk space. I created some space, restarted HDFS and fsck reported corruption with an HBase file. I cleared up that corruption and restarted HBase. I was still unable to read anything from HBase even though HDFS was now healthy. The following was gathered from the log files.
The file it is complaining about, /data/hbase1/hregion_2062710340/oldlogfile.log, is a zero-length file in HDFS. After deleting that file and restarting HBase, HBase appears to be back to normal. One thing I can't figure out is that the HMaster log shows several entries after the worker thread on X.60 has exited suggesting that the RegionServer is talking with HMaster:
2007-12-24 09:08:23,349 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 10.100.11.60:60020
2007-12-24 09:10:29,543 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 10.100.11.60:60020
There is no corresponding entry in the RegionServer's log.
[jira] Updated: (HADOOP-2562) globPaths does not support {ab,cd} as it claims to
[ https://issues.apache.org/jira/browse/HADOOP-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HADOOP-2562: - Resolution: Fixed Status: Resolved (was: Patch Available) I just committed this. Thanks Hairong! globPaths does not support {ab,cd} as it claims to -- Key: HADOOP-2562 URL: https://issues.apache.org/jira/browse/HADOOP-2562 Project: Hadoop Issue Type: Bug Components: fs Affects Versions: 0.15.2 Reporter: Hairong Kuang Assignee: Hairong Kuang Priority: Blocker Fix For: 0.15.3 Attachments: globFix.patch Olga reports: According to 0.15 documentation, FileSystem::globPaths supports {ab,cd} matching. However, when I tried to use it with pattern /data/mydata/{data1,data2} I got no results even though I could find the individual files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
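For illustration, {ab,cd} matching of the kind Olga expected can be modeled by rewriting the braces into a regex alternation. This toy sketch is not the committed fix; it ignores escaping and the other glob metacharacters:

```java
import java.util.regex.Pattern;

// Toy model of {a,b} glob matching: rewrite {x,y} as the regex (x|y)
// and test the candidate path against it. Real glob code must also
// handle *, ?, [], escaping, and nested braces, which this skips.
public class BraceGlob {
    public static boolean matches(String glob, String path) {
        String regex = glob.replace("{", "(")
                           .replace("}", ")")
                           .replace(",", "|");
        return Pattern.matches(regex, path);
    }
}
```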
[jira] Commented: (HADOOP-2077) Logging version number (and compiled date) at STARTUP_MSG
[ https://issues.apache.org/jira/browse/HADOOP-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558079#action_12558079 ] Hadoop QA commented on HADOOP-2077: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12372845/HADOOP-2077_0_20080110.patch against trunk revision r611056. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests -1. The patch failed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1544/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1544/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1544/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1544/console This message is automatically generated. Logging version number (and compiled date) at STARTUP_MSG --- Key: HADOOP-2077 URL: https://issues.apache.org/jira/browse/HADOOP-2077 Project: Hadoop Issue Type: Improvement Components: dfs, mapred Reporter: Koji Noguchi Assignee: Arun C Murthy Priority: Trivial Fix For: 0.16.0 Attachments: HADOOP-2077_0_20080110.patch, HADOOP-2077_0_20080110.patch This will help us figure out which version of hadoop we were running when looking back the logs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2566) need FileSystem#globStatus method
[ https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558067#action_12558067 ] Hairong Kuang commented on HADOOP-2566: --- Did you mean that we need FileStatus[] listStatus rather than Path[] listPaths? need FileSystem#globStatus method - Key: HADOOP-2566 URL: https://issues.apache.org/jira/browse/HADOOP-2566 Project: Hadoop Issue Type: Improvement Components: fs Reporter: Doug Cutting Assignee: Hairong Kuang Fix For: 0.16.0 To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting performance, we must use file enumeration APIs that return FileStatus[] rather than Path[]. Currently we have FileSystem#globPaths(), but that method should be deprecated and replaced with a FileSystem#globStatus(). We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the cache in 0.17. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2431) Test HDFS File Permissions
[ https://issues.apache.org/jira/browse/HADOOP-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HADOOP-2431: -- Attachment: (was: PermissionsTestPlan.pdf) Test HDFS File Permissions -- Key: HADOOP-2431 URL: https://issues.apache.org/jira/browse/HADOOP-2431 Project: Hadoop Issue Type: Test Components: test Affects Versions: 0.15.1 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.16.0 Attachments: HDFSPermissionSpecification6.pdf, PermissionsTestPlan1.pdf, testDFSPermission.patch, testDFSPermission1.patch This jira is intended to provide junit tests to HADOOP-1298. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1298) adding user info to file
[ https://issues.apache.org/jira/browse/HADOOP-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HADOOP-1298: -- Attachment: HDFSPermissionSpecification6.pdf The updated specification reflects a change: listing a directory now requires rx permissions. adding user info to file Key: HADOOP-1298 URL: https://issues.apache.org/jira/browse/HADOOP-1298 Project: Hadoop Issue Type: New Feature Components: dfs, fs Affects Versions: 0.16.0 Reporter: Kurtis Heimerl Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.16.0 Attachments: 1298_2007-09-22_1.patch, 1298_2007-10-04_1.patch, 1298_20071221b.patch, 1298_20071228s.patch, 1298_20080103.patch, hadoop-user-munncha.patch17, HDFSPermissionSpecification5.pdf, HDFSPermissionSpecification6.pdf I'm working on adding a permissions model to Hadoop's DFS. The first step is this change, which associates user info with files. Following this I'll associate permissions info, then block methods based on that user info, then authorization of the user info. So, right now I've implemented adding user info to files. I'm looking for feedback before I clean this up and make it official. I wasn't sure which release to target; I'm working off trunk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2431) Test HDFS File Permissions
[ https://issues.apache.org/jira/browse/HADOOP-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HADOOP-2431: -- Attachment: testDFSPermission1.patch I found one permission checking semantics error. Because dfs list is equivalent to unix ls -l, listing a directory needs both SEARCH and READ permissions on the directory. This patch fixed the problem. It also added javadoc to the unit tests. Test HDFS File Permissions -- Key: HADOOP-2431 URL: https://issues.apache.org/jira/browse/HADOOP-2431 Project: Hadoop Issue Type: Test Components: test Affects Versions: 0.15.1 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.16.0 Attachments: PermissionsTestPlan.pdf, testDFSPermission.patch, testDFSPermission1.patch This jira is intended to provide junit tests to HADOOP-1298. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2394) Add support for migrating between hbase versions
[ https://issues.apache.org/jira/browse/HADOOP-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558040#action_12558040 ] stack commented on HADOOP-2394: --- There is no framework that I can see in HADOOP-2478. There is just a single script that addresses a single migration incident. Add support for migrating between hbase versions --- Key: HADOOP-2394 URL: https://issues.apache.org/jira/browse/HADOOP-2394 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: Johan Oskarsson If HBase is to be used to serve data to live systems, we would need a way to upgrade both the underlying Hadoop installation and HBase to newer versions with minimal downtime. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks
[ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558045#action_12558045 ] Devaraj Das commented on HADOOP-2116: - The only problem with this is the incompatible changes (like ../work and ../work/script); code, especially scripts that assume those paths, will break. So, is everyone okay with this for 0.16? Should we do the symlink stuff to maintain backward compatibility? As an aside, in the directory organization Owen suggested, one thing that needs to be added is the common scratch space for all tasks (like the file cache). Another thing, IMO, is that we should probably just do the basic dir organization proposed earlier by Amareshwari, plus the streaming fix. The magnitude of the change required by the dir organization Owen proposed seems pretty significant and aggressive for 0.16. Maybe we can do the rest in 0.17. Thoughts? Job.local.dir to be exposed to tasks Key: HADOOP-2116 URL: https://issues.apache.org/jira/browse/HADOOP-2116 Project: Hadoop Issue Type: Improvement Components: mapred Affects Versions: 0.14.3 Environment: All Reporter: Milind Bhandarkar Assignee: Amareshwari Sri Ramadasu Fix For: 0.16.0 Attachments: patch-2116.txt, patch-2116.txt Currently, since all task cwds are created under a jobcache directory, users that need a job-specific shared directory for use as scratch space create ../work. This is hacky, and will break when HADOOP-2115 is addressed. For such jobs, hadoop mapred should expose job.local.dir via the localized configuration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1965) Handle map output buffers better
[ https://issues.apache.org/jira/browse/HADOOP-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558105#action_12558105 ] Hadoop QA commented on HADOOP-1965: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12372632/HADOOP-2419.patch against trunk revision r611264. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1545/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1545/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1545/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1545/console This message is automatically generated. Handle map output buffers better Key: HADOOP-1965 URL: https://issues.apache.org/jira/browse/HADOOP-1965 Project: Hadoop Issue Type: Improvement Components: mapred Affects Versions: 0.16.0 Reporter: Devaraj Das Assignee: Amar Kamat Fix For: 0.16.0 Attachments: 1965_single_proc_150mb_gziped.jpeg, 1965_single_proc_150mb_gziped.pdf, 1965_single_proc_150mb_gziped_breakup.png, HADOOP-1965-1.patch, HADOOP-1965-Benchmark.patch, HADOOP-1965-Benchmark.patch, HADOOP-1965-Benchmark.patch, HADOOP-1965-Benchmark.patch, HADOOP-1965-Benchmark.patch, HADOOP-2419.patch, HADOOP-2419.patch, HADOOP-2419.patch, HADOOP-2419.patch Today, the map task stops calling the map method while sort/spill is using the (single instance of) map output buffer. 
One way to improve map task performance is to have a second buffer for writing map outputs while sort/spill is using the first buffer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
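The two-buffer idea above can be sketched as follows. The class and method names are illustrative, not Hadoop's actual MapTask collector, and a real implementation would run the spill on a separate thread so collect() and sort/spill truly overlap; this synchronous sketch only shows the buffer swap.

```java
// Hedged sketch of double-buffered map output collection: records keep going
// into one buffer while the other is being sorted/spilled. All names here are
// hypothetical stand-ins for Hadoop's internal collector.
import java.util.ArrayList;
import java.util.List;

public class DoubleBuffer {
    private List<String> active = new ArrayList<>();   // map output goes here
    private List<String> spilling = new ArrayList<>(); // being sorted/spilled
    private final List<String> spilled = new ArrayList<>();
    private final int threshold;

    public DoubleBuffer(int threshold) { this.threshold = threshold; }

    // Called by the map task. Swapping buffers means collect() need not block
    // for the whole duration of a spill of the other buffer.
    public void collect(String record) {
        active.add(record);
        if (active.size() >= threshold) {
            List<String> full = active;
            active = spilling;   // map output continues into the other buffer
            spilling = full;
            spill();
        }
    }

    // Stands in for the sort/spill-to-disk step.
    private void spill() {
        spilled.addAll(spilling);
        spilling.clear();
    }

    // Flush whatever is left at the end of the task.
    public List<String> close() {
        spilled.addAll(active);
        active.clear();
        return spilled;
    }
}
```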
[jira] Commented: (HADOOP-2566) need FileSystem#globStatus method
[ https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558129#action_12558129 ] Doug Cutting commented on HADOOP-2566: -- Globbing is implemented on top of listPaths(), which is implemented on top of listStatus(). The primitive globbing API should not throw away that status information. It should keep it so that glob clients which need it do not have to call getStatus() for each file that matches. Currently the cache of FileStatus hides the cost of these getStatus() calls, but that cache will break things once files and their status can change. So we need globStatus() before we can remove the cache. FileInputFormat, for example, uses globPaths() to list files matching the input specification, then uses getStatus() on each matching path when building splits. This must change to call globStatus() before the cache is removed. Long-term, globPaths() and listPaths() may still be useful as utility methods implemented in terms of globStatus() and listStatus(), but since most current users of these will be broken performance-wise once the cache is removed, we should deprecate them now to strongly encourage folks to stop using them before that cache is removed, to give fair warning. need FileSystem#globStatus method - Key: HADOOP-2566 URL: https://issues.apache.org/jira/browse/HADOOP-2566 Project: Hadoop Issue Type: Improvement Components: fs Reporter: Doug Cutting Assignee: Hairong Kuang Fix For: 0.16.0 To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting performance, we must use file enumeration APIs that return FileStatus[] rather than Path[]. Currently we have FileSystem#globPaths(), but that method should be deprecated and replaced with a FileSystem#globStatus(). We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the cache in 0.17. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2566) need FileSystem#globStatus method
[ https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558071#action_12558071 ] Doug Cutting commented on HADOOP-2566: -- No, we need 'FileStatus[] globStatus(Path pattern)' instead of 'Path[] globPaths(Path pattern)'. need FileSystem#globStatus method - Key: HADOOP-2566 URL: https://issues.apache.org/jira/browse/HADOOP-2566 Project: Hadoop Issue Type: Improvement Components: fs Reporter: Doug Cutting Assignee: Hairong Kuang Fix For: 0.16.0 To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting performance, we must use file enumeration APIs that return FileStatus[] rather than Path[]. Currently we have FileSystem#globPaths(), but that method should be deprecated and replaced with a FileSystem#globStatus(). We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the cache in 0.17. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2570) streaming jobs fail after HADOOP-2227
[ https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558080#action_12558080 ] Arun C Murthy commented on HADOOP-2570: --- All tests fail with: {noformat} 2008-01-11 17:35:53,433 INFO mapred.TaskTracker (TaskTracker.java:launchTaskForJob(703)) - org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_20080735_0001/work in any of the configured local directories at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:359) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138) at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.localizeTask(TaskTracker.java:1395) at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.launchTask(TaskTracker.java:1469) at org.apache.hadoop.mapred.TaskTracker.launchTaskForJob(TaskTracker.java:693) at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:686) at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1279) at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:920) at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1315) at org.apache.hadoop.mapred.MiniMRCluster$TaskTrackerRunner.run(MiniMRCluster.java:144) at java.lang.Thread.run(Thread.java:595) {noformat} The problem is that LocalDirAllocator.getLocalPathToRead throws an exception when the path is not found - this patch should handle that exception and go ahead and create the symlink... streaming jobs fail after HADOOP-2227 - Key: HADOOP-2570 URL: https://issues.apache.org/jira/browse/HADOOP-2570 Project: Hadoop Issue Type: Bug Components: contrib/streaming Affects Versions: 0.15.2 Reporter: lohit vijayarenu Assignee: Amareshwari Sri Ramadasu Priority: Blocker Fix For: 0.15.3 Attachments: patch-2570.txt HADOOP-2227 changes jobCacheDir. 
In streaming, jobCacheDir was constructed like this: {code} File jobCacheDir = new File(currentDir.getParentFile().getParent(), "work"); {code} We should change this to get it working. Referring to the changes made in HADOOP-2227, I see that the APIs used there to construct the path are not public. And hard-coding the path in streaming does not look good. Thoughts? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
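The fix Arun describes (treating the allocator's "not found" as a cue to create the directory and proceed) might look roughly like this. DiskErrorException and getLocalPathToRead() here are simplified, hypothetical stand-ins, not Hadoop's actual LocalDirAllocator classes.

```java
// Hedged sketch of the proposed fix: if the allocator cannot find the job's
// work directory, create it and go ahead instead of failing the task. The
// allocator and exception below are stand-ins for Hadoop's real machinery.
import java.io.File;

public class LocalizeSketch {
    static class DiskErrorException extends RuntimeException {
        DiskErrorException(String m) { super(m); }
    }

    // Stand-in for LocalDirAllocator#getLocalPathToRead: only returns paths
    // that already exist in one of the configured local directories.
    static File getLocalPathToRead(String rel, File[] localDirs) {
        for (File d : localDirs) {
            File f = new File(d, rel);
            if (f.exists()) return f;
        }
        throw new DiskErrorException("Could not find " + rel + " in any of the configured local directories");
    }

    // Proposed behavior: handle the exception and create the directory
    // (in the real patch, this is where the work-dir symlink is set up).
    static File getOrCreate(String rel, File[] localDirs) {
        try {
            return getLocalPathToRead(rel, localDirs);
        } catch (DiskErrorException e) {
            File f = new File(localDirs[0], rel);
            f.mkdirs();
            return f;
        }
    }
}
```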
[jira] Updated: (HADOOP-2431) Test HDFS File Permissions
[ https://issues.apache.org/jira/browse/HADOOP-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HADOOP-2431: -- Attachment: PermissionsTestPlan1.pdf Attach an updated test plan Test HDFS File Permissions -- Key: HADOOP-2431 URL: https://issues.apache.org/jira/browse/HADOOP-2431 Project: Hadoop Issue Type: Test Components: test Affects Versions: 0.15.1 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.16.0 Attachments: HDFSPermissionSpecification6.pdf, PermissionsTestPlan1.pdf, testDFSPermission.patch, testDFSPermission1.patch This jira is intended to provide junit tests to HADOOP-1298. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2431) Test HDFS File Permissions
[ https://issues.apache.org/jira/browse/HADOOP-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HADOOP-2431: -- Attachment: HDFSPermissionSpecification6.pdf Attach the permission checking specification. Test HDFS File Permissions -- Key: HADOOP-2431 URL: https://issues.apache.org/jira/browse/HADOOP-2431 Project: Hadoop Issue Type: Test Components: test Affects Versions: 0.15.1 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.16.0 Attachments: HDFSPermissionSpecification6.pdf, PermissionsTestPlan.pdf, testDFSPermission.patch, testDFSPermission1.patch This jira is intended to provide junit tests to HADOOP-1298. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2389) [hbase] provide multiple language bindings for HBase
[ https://issues.apache.org/jira/browse/HADOOP-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Simpson updated HADOOP-2389: -- Attachment: (was: hbase-thrift.patch) [hbase] provide multiple language bindings for HBase Key: HADOOP-2389 URL: https://issues.apache.org/jira/browse/HADOOP-2389 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: Jim Kellerman Priority: Minor Attachments: hbase-thrift.patch, hbase-thrift.patch, Hbase.thrift.txt, libthrift-r746.jar There have been a number of requests for multiple language bindings for HBase. While there is now a REST interface, this may not be suited for high-volume applications. A couple of suggested approaches have been proposed: - Provide a Thrift based API (very fast socket based but some of the languages are not well supported) - Provide a JSON based API over sockets. (faster than REST, but probably slower than Thrift) Others? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2574) bugs in mapred tutorial
[ https://issues.apache.org/jira/browse/HADOOP-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated HADOOP-2574: -- Attachment: HADOOP-2574_0_20080110.patch Here is a patch which addresses most of Phu's concerns... bugs in mapred tutorial --- Key: HADOOP-2574 URL: https://issues.apache.org/jira/browse/HADOOP-2574 Project: Hadoop Issue Type: Bug Components: documentation Reporter: Doug Cutting Assignee: Arun C Murthy Fix For: 0.15.3, 0.16.0 Attachments: HADOOP-2574_0_20080110.patch Sam Pullara sends me: {noformat} Phu was going through the WordCount example... lines 52 and 53 should have args[0] and args[1]: http://lucene.apache.org/hadoop/docs/current/mapred_tutorial.html The javac and jar command are also wrong, they don't include the directories for the packages, should be: $ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d classes WordCount.java $ jar -cvf /usr/joe/wordcount.jar WordCount.class -C classes . {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HADOOP-2574) bugs in mapred tutorial
[ https://issues.apache.org/jira/browse/HADOOP-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy reassigned HADOOP-2574: - Assignee: Arun C Murthy bugs in mapred tutorial --- Key: HADOOP-2574 URL: https://issues.apache.org/jira/browse/HADOOP-2574 Project: Hadoop Issue Type: Bug Components: documentation Reporter: Doug Cutting Assignee: Arun C Murthy Fix For: 0.15.3, 0.16.0 Attachments: HADOOP-2574_0_20080110.patch Sam Pullara sends me: {noformat} Phu was going through the WordCount example... lines 52 and 53 should have args[0] and args[1]: http://lucene.apache.org/hadoop/docs/current/mapred_tutorial.html The javac and jar command are also wrong, they don't include the directories for the packages, should be: $ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d classes WordCount.java $ jar -cvf /usr/joe/wordcount.jar WordCount.class -C classes . {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2574) bugs in mapred tutorial
[ https://issues.apache.org/jira/browse/HADOOP-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated HADOOP-2574: -- Attachment: mapred_tutorial.html Here is how the tutorial looks with this patch... bugs in mapred tutorial --- Key: HADOOP-2574 URL: https://issues.apache.org/jira/browse/HADOOP-2574 Project: Hadoop Issue Type: Bug Components: documentation Reporter: Doug Cutting Assignee: Arun C Murthy Fix For: 0.15.3, 0.16.0 Attachments: HADOOP-2574_0_20080110.patch, mapred_tutorial.html Sam Pullara sends me: {noformat} Phu was going through the WordCount example... lines 52 and 53 should have args[0] and args[1]: http://lucene.apache.org/hadoop/docs/current/mapred_tutorial.html The javac and jar command are also wrong, they don't include the directories for the packages, should be: $ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d classes WordCount.java $ jar -cvf /usr/joe/wordcount.jar WordCount.class -C classes . {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2570) streaming jobs fail after HADOOP-2227
[ https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558087#action_12558087 ] Arun C Murthy commented on HADOOP-2570: --- Sigh, this exception seems to stem from the fact that the LocalDirAllocator is not used to create the *taskTracker/jobcache/jobid/work* directory at all. It is always created in the same partition as the *taskTracker/jobcache/jobid/* directory. This means LocalDirAllocator doesn't know about the *taskTracker/jobcache/jobid/work* directory at all and hence the DiskErrorException. streaming jobs fail after HADOOP-2227 - Key: HADOOP-2570 URL: https://issues.apache.org/jira/browse/HADOOP-2570 Project: Hadoop Issue Type: Bug Components: contrib/streaming Affects Versions: 0.15.2 Reporter: lohit vijayarenu Assignee: Amareshwari Sri Ramadasu Priority: Blocker Fix For: 0.15.3 Attachments: patch-2570.txt HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed like this {code} File jobCacheDir = new File(currentDir.getParentFile().getParent(), work); {code} We should change this to get it working. Referring to the changes made in HADOOP-2227, I see that the APIs used in there to construct the path are not public. And hard coding the path in streaming does not look good. thought? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1707) Remove the DFS Client disk-based cache
[ https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HADOOP-1707: - Status: Patch Available (was: Open) Remove the DFS Client disk-based cache -- Key: HADOOP-1707 URL: https://issues.apache.org/jira/browse/HADOOP-1707 Project: Hadoop Issue Type: Improvement Components: dfs Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.16.0 Attachments: clientDiskBuffer.patch, clientDiskBuffer10.patch, clientDiskBuffer11.patch, clientDiskBuffer12.patch, clientDiskBuffer14.patch, clientDiskBuffer15.patch, clientDiskBuffer16.patch, clientDiskBuffer17.patch, clientDiskBuffer18.patch, clientDiskBuffer19.patch, clientDiskBuffer2.patch, clientDiskBuffer6.patch, clientDiskBuffer7.patch, clientDiskBuffer8.patch, clientDiskBuffer9.patch, DataTransferProtocol.doc, DataTransferProtocol.html The DFS client currently uses a staging file on local disk to cache all user-writes to a file. When the staging file accumulates 1 block worth of data, its contents are flushed to a HDFS datanode. These operations occur sequentially. A simple optimization of allowing the user to write to another staging file while simultaneously uploading the contents of the first staging file to HDFS will improve file-upload performance. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2560) Combining multiple input blocks into one mapper
[ https://issues.apache.org/jira/browse/HADOOP-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558042#action_12558042 ] Owen O'Malley commented on HADOOP-2560: --- I like that approach, Doug. We should also have an entry for splits that are local to no nodes in the map/reduce cluster and prefer to steal from them rather than other nodes. This would solve HADOOP-2014... Combining multiple input blocks into one mapper --- Key: HADOOP-2560 URL: https://issues.apache.org/jira/browse/HADOOP-2560 Project: Hadoop Issue Type: Bug Reporter: Runping Qi Currently, an input split contains a consecutive chunk of an input file, which by default corresponds to a DFS block. This may lead to a large number of mapper tasks if the input data is large. That leads to the following problems: 1. Shuffling cost: since the framework has to move M * R map output segments to the nodes running reducers, a larger M means a larger shuffling cost. 2. High JVM initialization overhead. 3. Disk fragmentation: a larger number of map output files means lower read throughput when accessing them. Ideally, you want to keep the number of mappers to no more than 16 times the number of nodes in the cluster. To achieve that, we can increase the input split size. However, if a split spans more than one DFS block, you lose the data-locality scheduling benefits. One way to address this problem is to combine multiple input blocks on the same rack into one split. If on average we combine B blocks into one split, then we reduce the number of mappers by a factor of B. Since all the blocks for one mapper share a rack, we can still benefit from rack-aware scheduling. Thoughts? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
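Runping's proposal can be sketched as follows. The combine() helper and the block/rack names are hypothetical, purely to illustrate the idea of grouping blocks by rack before cutting splits of at most B blocks each.

```java
// Illustrative sketch of rack-aware split combining: group blocks by the rack
// holding them, then cut each rack's list into splits of at most B blocks, so
// no split mixes racks and rack-aware scheduling still applies.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RackCombiner {
    // blockToRack maps each block to the rack of (one of) its replicas.
    public static List<List<String>> combine(Map<String, String> blockToRack,
                                             int maxBlocksPerSplit) {
        Map<String, List<String>> byRack = new TreeMap<>();
        for (Map.Entry<String, String> e : blockToRack.entrySet()) {
            byRack.computeIfAbsent(e.getValue(), r -> new ArrayList<>()).add(e.getKey());
        }
        List<List<String>> splits = new ArrayList<>();
        for (List<String> blocks : byRack.values()) {
            // Cut this rack's blocks into chunks of at most maxBlocksPerSplit.
            for (int i = 0; i < blocks.size(); i += maxBlocksPerSplit) {
                splits.add(new ArrayList<>(
                    blocks.subList(i, Math.min(i + maxBlocksPerSplit, blocks.size()))));
            }
        }
        return splits;
    }
}
```

With B = maxBlocksPerSplit, a rack holding N blocks yields ceil(N / B) splits, each fully rack-local, which is how the mapper count drops by roughly a factor of B.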
[jira] Commented: (HADOOP-1707) Remove the DFS Client disk-based cache
[ https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558041#action_12558041 ] Mukund Madhugiri commented on HADOOP-1707: -- Running on a 100 node cluster, with the patch clientDiskBuffer19.patch, the sort benchmark showed these results: |*100 nodes*|*trunk*|*trunk + patch*| |randomWriter (hrs)|0.44|0.45| |sort (hrs)|1.03|1| |sortValidation (hrs)|0.39|0.3| Remove the DFS Client disk-based cache -- Key: HADOOP-1707 URL: https://issues.apache.org/jira/browse/HADOOP-1707 Project: Hadoop Issue Type: Improvement Components: dfs Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.16.0 Attachments: clientDiskBuffer.patch, clientDiskBuffer10.patch, clientDiskBuffer11.patch, clientDiskBuffer12.patch, clientDiskBuffer14.patch, clientDiskBuffer15.patch, clientDiskBuffer16.patch, clientDiskBuffer17.patch, clientDiskBuffer18.patch, clientDiskBuffer19.patch, clientDiskBuffer2.patch, clientDiskBuffer6.patch, clientDiskBuffer7.patch, clientDiskBuffer8.patch, clientDiskBuffer9.patch, DataTransferProtocol.doc, DataTransferProtocol.html The DFS client currently uses a staging file on local disk to cache all user-writes to a file. When the staging file accumulates 1 block worth of data, its contents are flushed to a HDFS datanode. These operations occur sequentially. A simple optimization of allowing the user to write to another staging file while simultaneously uploading the contents of the first staging file to HDFS will improve file-upload performance. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2566) need FileSystem#globStatus method
[ https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558082#action_12558082 ] Raghu Angadi commented on HADOOP-2566: -- Also, this would not duplicate code. {{globPaths()}} would just be implemented with {{globStatus()}} (when there is a glob in the path). need FileSystem#globStatus method - Key: HADOOP-2566 URL: https://issues.apache.org/jira/browse/HADOOP-2566 Project: Hadoop Issue Type: Improvement Components: fs Reporter: Doug Cutting Assignee: Hairong Kuang Fix For: 0.16.0 To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting performance, we must use file enumeration APIs that return FileStatus[] rather than Path[]. Currently we have FileSystem#globPaths(), but that method should be deprecated and replaced with a FileSystem#globStatus(). We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the cache in 0.17. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
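Raghu's point that this would not duplicate code can be sketched like this. FileStatus here is a local stand-in record, and the hard-coded globStatus() result is purely illustrative so the sketch is self-contained; it is not Hadoop's actual FileSystem API.

```java
// Sketch of globPaths() implemented in terms of globStatus(): the deprecated
// method just projects paths out of the statuses, with no extra per-file
// getStatus() calls. FileStatus and globStatus() below are stand-ins.
import java.util.Arrays;

public class GlobSketch {
    record FileStatus(String path, long length) {}

    // Assumed primitive: returns the status of everything matching the
    // pattern. Hard-coded here purely to keep the sketch self-contained.
    static FileStatus[] globStatus(String pattern) {
        return new FileStatus[] { new FileStatus("/a/1", 10), new FileStatus("/a/2", 20) };
    }

    // The old method becomes a thin utility over the new one.
    static String[] globPaths(String pattern) {
        return Arrays.stream(globStatus(pattern))
                     .map(FileStatus::path)
                     .toArray(String[]::new);
    }
}
```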
[jira] Issue Comment Edited: (HADOOP-2566) need FileSystem#globStatus method
[ https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558081#action_12558081 ] rangadi edited comment on HADOOP-2566 at 1/11/08 11:36 AM: globStatus would certainly be useful since globPaths() is used in many places where we really want to do globStatus(). globStatus is much more efficient in those cases since we often do '{{for(path : globPaths(pattern)) { stat = listStatus(path) ... }}}'. I am not sure if globPaths() can go away. One difference I see is that globPath(/non/existent/path/withoutglob) returns the simple path without any filesystem interaction (as expected). But globStatus(/non/existent/path/withoutglob) will ask the filesystem and will return NULL (or an array with zero entries). was (Author: rangadi): globStatus would certainly be useful since globPaths() is used in many places where we really want to do globStatus(). globStatus is much more efficient in those cases since we often do {{for(path : globPaths(pattern)) { stat = listStatus(path) ... }. I am not sure if globPaths() can go away. One difference I see is that globPath(/non/existent/path/withoutglob) returns the simple path without any filesystem interaction (as expected). But globStatus(/non/existent/path/withoutglob) will ask the filesystem and will return NULL (or an array with zero entries). need FileSystem#globStatus method - Key: HADOOP-2566 URL: https://issues.apache.org/jira/browse/HADOOP-2566 Project: Hadoop Issue Type: Improvement Components: fs Reporter: Doug Cutting Assignee: Hairong Kuang Fix For: 0.16.0 To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting performance, we must use file enumeration APIs that return FileStatus[] rather than Path[]. Currently we have FileSystem#globPaths(), but that method should be deprecated and replaced with a FileSystem#globStatus(). We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the cache in 0.17. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2481) NNBench should periodically report its progress
[ https://issues.apache.org/jira/browse/HADOOP-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558084#action_12558084 ] Mukund Madhugiri commented on HADOOP-2481: -- +1 I tested it on a 500 node cluster and it works fine. Thanks Hairong NNBench should periodically report its progress --- Key: HADOOP-2481 URL: https://issues.apache.org/jira/browse/HADOOP-2481 Project: Hadoop Issue Type: Bug Components: test Affects Versions: 0.15.1 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.16.0 Attachments: NNBench.patch When I run NNBench on a 100-node cluster, some map tasks fail with the error message Task xx failed to report status for yy seconds. Killing!. Map tasks should periodically report their progress to prevent themselves from being killed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2389) [hbase] provide multiple language bindings for HBase
[ https://issues.apache.org/jira/browse/HADOOP-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Simpson updated HADOOP-2389: -- Attachment: hbase-thrift.patch re-uploading to apply patch [hbase] provide multiple language bindings for HBase Key: HADOOP-2389 URL: https://issues.apache.org/jira/browse/HADOOP-2389 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: Jim Kellerman Priority: Minor Attachments: hbase-thrift.patch, hbase-thrift.patch, Hbase.thrift.txt, libthrift-r746.jar There have been a number of requests for multiple language bindings for HBase. While there is now a REST interface, this may not be suited for high-volume applications. A couple of suggested approaches have been proposed: - Provide a Thrift based API (very fast socket based but some of the languages are not well supported) - Provide a JSON based API over sockets. (faster than REST, but probably slower than Thrift) Others? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2389) [hbase] provide multiple language bindings for HBase
[ https://issues.apache.org/jira/browse/HADOOP-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Simpson updated HADOOP-2389: -- Attachment: hbase-thrift.patch re-uploading [hbase] provide multiple language bindings for HBase Key: HADOOP-2389 URL: https://issues.apache.org/jira/browse/HADOOP-2389 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: Jim Kellerman Priority: Minor Attachments: hbase-thrift.patch, hbase-thrift.patch, Hbase.thrift.txt, libthrift-r746.jar There have been a number of requests for multiple language bindings for HBase. While there is now a REST interface, this may not be suited for high-volume applications. A couple of suggested approaches have been proposed: - Provide a Thrift based API (very fast socket based but some of the languages are not well supported) - Provide a JSON based API over sockets. (faster than REST, but probably slower than Thrift) Others? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2566) need FileSystem#globStatus method
[ https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558088#action_12558088 ] Hairong Kuang commented on HADOOP-2566: --- GlobPath is intended to return all paths that match the given glob. It is not intended to do 'for(path : globPaths(pattern)) { stat = listStatus(path) ... }'. The feature that you want is listing all the paths that match the glob. need FileSystem#globStatus method - Key: HADOOP-2566 URL: https://issues.apache.org/jira/browse/HADOOP-2566 Project: Hadoop Issue Type: Improvement Components: fs Reporter: Doug Cutting Assignee: Hairong Kuang Fix For: 0.16.0 To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting performance, we must use file enumeration APIs that return FileStatus[] rather than Path[]. Currently we have FileSystem#globPaths(), but that method should be deprecated and replaced with a FileSystem#globStatus(). We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the cache in 0.17. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2566) need FileSystem#globStatus method
[ https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558081#action_12558081 ] Raghu Angadi commented on HADOOP-2566: -- globStatus would certainly be useful since globPaths() is used in many places where we really want to do globStatus(). globStatus is much more efficient in those cases since we often do {{for(path : globPaths(pattern)) { stat = listStatus(path) ... }}}. I am not sure if globPaths() can go away. One difference I see is that globPath(/non/existent/path/withoutglob) returns the simple path without any filesystem interaction (as expected). But globStatus(/non/existent/path/withoutglob) will ask the filesystem and will return null (or an array with zero entries). need FileSystem#globStatus method - Key: HADOOP-2566 URL: https://issues.apache.org/jira/browse/HADOOP-2566 Project: Hadoop Issue Type: Improvement Components: fs Reporter: Doug Cutting Assignee: Hairong Kuang Fix For: 0.16.0 To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting performance, we must use file enumeration APIs that return FileStatus[] rather than Path[]. Currently we have FileSystem#globPaths(), but that method should be deprecated and replaced with a FileSystem#globStatus(). We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the cache in 0.17. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
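The inefficiency Raghu describes — glob once, then stat every match — can be illustrated outside Hadoop with plain java.nio (a standalone sketch, not Hadoop's FileSystem API; the class and method names here are illustrative):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class GlobStatusSketch {
    // Analogue of the deprecated pattern: the glob yields paths only, so the
    // caller pays one extra metadata round trip per match. A globStatus()-style
    // call would return the attributes together with each matched path.
    static List<Long> sizesViaTwoCalls(Path dir, String glob) throws IOException {
        List<Long> sizes = new ArrayList<>();
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, glob)) {
            for (Path p : matches) {
                // second call per matched path -- the cost globStatus() is meant to remove
                sizes.add(Files.readAttributes(p, BasicFileAttributes.class).size());
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("globdemo");
        Files.write(dir.resolve("a.txt"), new byte[3]);
        Files.write(dir.resolve("b.txt"), new byte[5]);
        List<Long> sizes = sizesViaTwoCalls(dir, "*.txt");
        Collections.sort(sizes);
        System.out.println(sizes); // [3, 5]
    }
}
```

The design point of the issue is the same: an enumeration API that returns FileStatus[] lets DFSPath drop its FileStatus cache without making callers such as FsShell pay a per-path lookup.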
[jira] Updated: (HADOOP-2481) NNBench should periodically report its progress
[ https://issues.apache.org/jira/browse/HADOOP-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HADOOP-2481: -- Status: Patch Available (was: Open) NNBench should periodically report its progress --- Key: HADOOP-2481 URL: https://issues.apache.org/jira/browse/HADOOP-2481 Project: Hadoop Issue Type: Bug Components: test Affects Versions: 0.15.1 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.16.0 Attachments: NNBench.patch When I run NNBench on a 100-node cluster, some map tasks fail with the error message "Task xx failed to report status for yy seconds. Killing!". Map tasks should periodically report their progress to prevent being killed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2449) Restore the old NN Bench that was replaced by a MR NN Bench
[ https://issues.apache.org/jira/browse/HADOOP-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558144#action_12558144 ] Hadoop QA commented on HADOOP-2449: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12371829/fixNNBenchPatch.txt against trunk revision r611264. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests -1. The patch failed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1546/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1546/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1546/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1546/console This message is automatically generated. Restore the old NN Bench that was replaced by a MR NN Bench Key: HADOOP-2449 URL: https://issues.apache.org/jira/browse/HADOOP-2449 Project: Hadoop Issue Type: Test Reporter: Sanjay Radia Assignee: Sanjay Radia Attachments: fixNNBenchPatch.txt The old NN Bench did not use Map-Reduce. It was replaced by a new NN Bench that uses Map-Reduce. The old NN Bench is useful and should be restored. - useful for simulated data nodes, which do not work for Map-Reduce since the job configs need to be persistent. - a NN test that is independent of Map-Reduce can be useful as it is one less variable in figuring out bottlenecks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2574) bugs in mapred tutorial
[ https://issues.apache.org/jira/browse/HADOOP-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated HADOOP-2574: -- Status: Patch Available (was: Open) bugs in mapred tutorial --- Key: HADOOP-2574 URL: https://issues.apache.org/jira/browse/HADOOP-2574 Project: Hadoop Issue Type: Bug Components: documentation Reporter: Doug Cutting Assignee: Arun C Murthy Fix For: 0.15.3, 0.16.0 Attachments: HADOOP-2574_0_20080110.patch, mapred_tutorial.html Sam Pullara sends me: {noformat} Phu was going through the WordCount example... lines 52 and 53 should have args[0] and args[1]: http://lucene.apache.org/hadoop/docs/current/mapred_tutorial.html The javac and jar command are also wrong, they don't include the directories for the packages, should be: $ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d classes WordCount.java $ jar -cvf /usr/joe/wordcount.jar WordCount.class -C classes . {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2116) Job.local.dir to be exposed to tasks
[ https://issues.apache.org/jira/browse/HADOOP-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558150#action_12558150 ] Milind Bhandarkar commented on HADOOP-2116: --- Since this bug is scheduled for 0.16, having incompatible changes in that release is fine (of course, as long as it is flagged as such in the release notes.) Job.local.dir to be exposed to tasks Key: HADOOP-2116 URL: https://issues.apache.org/jira/browse/HADOOP-2116 Project: Hadoop Issue Type: Improvement Components: mapred Affects Versions: 0.14.3 Environment: All Reporter: Milind Bhandarkar Assignee: Amareshwari Sri Ramadasu Fix For: 0.16.0 Attachments: patch-2116.txt, patch-2116.txt Currently, since all task cwds are created under a jobcache directory, users that need a job-specific shared directory for use as scratch space create ../work. This is hacky, and will break when HADOOP-2115 is addressed. For such jobs, hadoop mapred should expose job.local.dir via localized configuration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2566) need FileSystem#globStatus method
[ https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558093#action_12558093 ] Raghu Angadi commented on HADOOP-2566: -- 'for(path : globPaths(pattern)) { stat = listStatus(path) ... }'. FsShell.setReplication() is an example of this pattern of use (essentially). I agree that globStatus() may not replace all uses of globPaths(). need FileSystem#globStatus method - Key: HADOOP-2566 URL: https://issues.apache.org/jira/browse/HADOOP-2566 Project: Hadoop Issue Type: Improvement Components: fs Reporter: Doug Cutting Assignee: Hairong Kuang Fix For: 0.16.0 To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting performance, we must use file enumeration APIs that return FileStatus[] rather than Path[]. Currently we have FileSystem#globPaths(), but that method should be deprecated and replaced with a FileSystem#globStatus(). We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the cache in 0.17. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2581) Counters and other useful stats should be logged into Job History log
[ https://issues.apache.org/jira/browse/HADOOP-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558101#action_12558101 ] Owen O'Malley commented on HADOOP-2581: --- The counters *are* logged in job history, as of HADOOP-1210. Counters and other useful stats should be logged into Job History log - Key: HADOOP-2581 URL: https://issues.apache.org/jira/browse/HADOOP-2581 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Runping Qi The following stats are useful and available to the JT but not logged in the job history log: 1. The counters of each job 2. The counters of each mapper/reducer attempt 3. The info about the input splits (filename, split size, on which nodes) 4. The input split for each mapper attempt This data is useful and important for mining to find performance-related problems. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2570) streaming jobs fail after HADOOP-2227
[ https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558133#action_12558133 ] Arun C Murthy commented on HADOOP-2570: --- Please ignore my previous comments... it's been a long day (maybe the following ones too! *smile*) It seems like the test cases don't have a jar and hence there is an 'if' check in TaskTracker.localizeJob which fails and hence the work directory isn't created. This explains the exception seen in the TaskTracker.launchTaskForJob function. I didn't make any headway after that... streaming jobs fail after HADOOP-2227 - Key: HADOOP-2570 URL: https://issues.apache.org/jira/browse/HADOOP-2570 Project: Hadoop Issue Type: Bug Components: contrib/streaming Affects Versions: 0.15.2 Reporter: lohit vijayarenu Assignee: Amareshwari Sri Ramadasu Priority: Blocker Fix For: 0.15.3 Attachments: patch-2570.txt HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed like this {code} File jobCacheDir = new File(currentDir.getParentFile().getParent(), work); {code} We should change this to get it working. Referring to the changes made in HADOOP-2227, I see that the APIs used in there to construct the path are not public. And hard coding the path in streaming does not look good. thought? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
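Why the quoted `getParentFile().getParent()` construction breaks is easy to demonstrate with plain java.io (the paths below are hypothetical, not the task tracker's real directory layout): climbing a fixed number of parent directories silently resolves somewhere else once a layout change such as HADOOP-2227 inserts an extra level.

```java
import java.io.File;

public class JobCacheDirDemo {
    public static void main(String[] args) {
        // Assumed old layout: <local>/jobcache/<jobid>/<taskid>
        // Two parent hops from the task cwd reach the jobcache directory.
        File taskCwd = new File("/local/jobcache/job_1/task_1");
        File jobCacheDir = new File(taskCwd.getParentFile().getParent(), "work");
        System.out.println(jobCacheDir.getPath()); // /local/jobcache/work

        // If the layout gains one more level, the same two hops now land
        // one directory too deep -- the path arithmetic is silently wrong.
        File deeperCwd = new File("/local/jobcache/job_1/task_1/attempt_0");
        File broken = new File(deeperCwd.getParentFile().getParent(), "work");
        System.out.println(broken.getPath()); // /local/jobcache/job_1/work
    }
}
```

This is why the reporter argues for exposing the location through a public API rather than hard-coding relative navigation in streaming.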
[jira] Updated: (HADOOP-1965) Handle map output buffers better
[ https://issues.apache.org/jira/browse/HADOOP-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated HADOOP-1965: -- Resolution: Fixed Status: Resolved (was: Patch Available) I just committed this. Thanks, Amar - this was a long-drawn affair! Handle map output buffers better Key: HADOOP-1965 URL: https://issues.apache.org/jira/browse/HADOOP-1965 Project: Hadoop Issue Type: Improvement Components: mapred Affects Versions: 0.16.0 Reporter: Devaraj Das Assignee: Amar Kamat Fix For: 0.16.0 Attachments: 1965_single_proc_150mb_gziped.jpeg, 1965_single_proc_150mb_gziped.pdf, 1965_single_proc_150mb_gziped_breakup.png, HADOOP-1965-1.patch, HADOOP-1965-Benchmark.patch, HADOOP-1965-Benchmark.patch, HADOOP-1965-Benchmark.patch, HADOOP-1965-Benchmark.patch, HADOOP-1965-Benchmark.patch, HADOOP-2419.patch, HADOOP-2419.patch, HADOOP-2419.patch, HADOOP-2419.patch Today, the map task stops calling the map method while sort/spill is using the (single instance of) map output buffer. One improvement that can be done to improve performance of the map task is to have another buffer for writing the map outputs to, while sort/spill is using the first buffer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2581) Counters and other useful stats should be logged into Job History log
[ https://issues.apache.org/jira/browse/HADOOP-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558147#action_12558147 ] Runping Qi commented on HADOOP-2581: Cool. That will address the first 2 items. Is it intended to be in 0.16? Is it possible to include it in 0.15.3? We still need to log the split info. Counters and other useful stats should be logged into Job History log - Key: HADOOP-2581 URL: https://issues.apache.org/jira/browse/HADOOP-2581 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Runping Qi The following stats are useful and available to the JT but not logged in the job history log: 1. The counters of each job 2. The counters of each mapper/reducer attempt 3. The info about the input splits (filename, split size, on which nodes) 4. The input split for each mapper attempt This data is useful and important for mining to find performance-related problems. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2582) hadoop dfs -copyToLocal creates zero-byte files when the source file does not exist
hadoop dfs -copyToLocal creates zero-byte files when the source file does not exist -- Key: HADOOP-2582 URL: https://issues.apache.org/jira/browse/HADOOP-2582 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.15.2 Reporter: lohit vijayarenu hadoop dfs -copyToLocal with a non-existent source file creates a zero-byte destination file. It should throw an error message indicating that the source file does not exist. {noformat} [lohit@ hadoop-trunk]$ hadoop dfs -get nosuchfile nosuchfile [lohit@ hadoop-trunk]$ ls -l nosuchfile -rw-r--r-- 1 lohit users 0 Jan 11 21:58 nosuchfile [lohit@ hadoop-trunk]$ {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
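The fix the reporter asks for amounts to checking the source before the destination is ever created. A minimal java.nio sketch of that behavior (an analogy under assumed semantics, not Hadoop's actual FsShell code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SafeCopyToLocal {
    // Check the source before touching the destination, so a missing source
    // fails cleanly instead of leaving a zero-byte destination file behind.
    static void copyToLocal(Path src, Path dst) throws IOException {
        if (!Files.exists(src)) {
            throw new IOException("get: " + src + ": No such file or directory");
        }
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) {
        try {
            copyToLocal(Path.of("nosuchfile"), Path.of("nosuchfile.local"));
        } catch (IOException e) {
            System.out.println(e.getMessage()); // get: nosuchfile: No such file or directory
        }
        // the destination was never created
        System.out.println(Files.exists(Path.of("nosuchfile.local"))); // false
    }
}
```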
[jira] Updated: (HADOOP-2570) streaming jobs fail after HADOOP-2227
[ https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated HADOOP-2570: -- Attachment: HADOOP-2570_1_20080112.patch bq. It seems like the test cases don't have a jar and hence there is an 'if' check in TaskTracker.localizeJob which fails and hence the work directory isn't created. This explains the exception seen in the TaskTracker.launchTaskForJob function. Here is patch which fixes TaskTracker.localizeJob to fix the problem described above, along with Amareshwari's original fix. streaming jobs fail after HADOOP-2227 - Key: HADOOP-2570 URL: https://issues.apache.org/jira/browse/HADOOP-2570 Project: Hadoop Issue Type: Bug Components: contrib/streaming Affects Versions: 0.15.2 Reporter: lohit vijayarenu Assignee: Amareshwari Sri Ramadasu Priority: Blocker Fix For: 0.15.3 Attachments: HADOOP-2570_1_20080112.patch, patch-2570.txt HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed like this {code} File jobCacheDir = new File(currentDir.getParentFile().getParent(), work); {code} We should change this to get it working. Referring to the changes made in HADOOP-2227, I see that the APIs used in there to construct the path are not public. And hard coding the path in streaming does not look good. thought? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2570) streaming jobs fail after HADOOP-2227
[ https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated HADOOP-2570: -- Status: Patch Available (was: Open) streaming jobs fail after HADOOP-2227 - Key: HADOOP-2570 URL: https://issues.apache.org/jira/browse/HADOOP-2570 Project: Hadoop Issue Type: Bug Components: contrib/streaming Affects Versions: 0.15.2 Reporter: lohit vijayarenu Assignee: Amareshwari Sri Ramadasu Priority: Blocker Fix For: 0.15.3 Attachments: HADOOP-2570_1_20080112.patch, patch-2570.txt HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed like this {code} File jobCacheDir = new File(currentDir.getParentFile().getParent(), work); {code} We should change this to get it working. Referring to the changes made in HADOOP-2227, I see that the APIs used in there to construct the path are not public. And hard coding the path in streaming does not look good. thought? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2014) Job Tracker should not clobber the data locality of tasks
[ https://issues.apache.org/jira/browse/HADOOP-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558155#action_12558155 ] eric baldeschwieler commented on HADOOP-2014: - An ideal solution would maintain some sort of prioritized list of maps / node / rack so that we execute work first that is unlikely to find another efficient location to execute. It would also make sense to place some non-local work early, since these tasks run slowly, on nodes that are likely to run out of local work relatively early. One could also pay attention to IO load on each source node... At a minimum we should track maps that have no local option and schedule them first when a node has no local option. (As Doug Cutting suggested in HADOOP-2560) Job Tracker should not clobber the data locality of tasks - Key: HADOOP-2014 URL: https://issues.apache.org/jira/browse/HADOOP-2014 Project: Hadoop Issue Type: Bug Components: mapred Reporter: Runping Qi Assignee: Devaraj Das Currently, when the Job Tracker assigns a mapper task to a task tracker and there is no local split to the task tracker, the job tracker will find the first runnable task in the master task list and assign the task to the task tracker. The split for the task is not local to the task tracker, of course. However, the split may be local to other task trackers. Assigning that task to that task tracker may decrease the potential number of mapper attempts with data locality. The desired behavior in this situation is to choose a task whose split is not local to any task tracker. Resort to the current behavior only if no such task is found. In general, it will be useful to know the number of task trackers to which each split is local. To assign a task to a task tracker, the job tracker should first try to pick a task that is local to the task tracker and that has the minimal number of task trackers to which it is local. 
If no task is local to the task tracker, the job tracker should try to pick a task that has the minimal number of task trackers to which it is local. It is worthwhile to instrument the job tracker code to report the number of splits that are local to some task trackers. That should be the maximum number of tasks with data locality. By comparing that number with the actual number of data-local mappers launched, we can know the effectiveness of the job tracker scheduling. When we introduce rack locality, we should apply the same principle. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
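The "minimal number of task trackers to which it is local" rule above can be sketched as a toy selection function (hypothetical names and data structures, not the JobTracker's actual internals): among the tasks whose data is local to the requesting tracker, pick the one with the smallest locality fan-out, since it has the fewest alternative data-local homes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class LocalityAwarePick {
    // For each task id, the set of trackers on which its split is local.
    // Returns the data-local task with the fewest alternative local trackers,
    // or null when the requesting tracker has no data-local task at all.
    static Integer pickTask(String tracker, Map<Integer, Set<String>> localTrackersByTask) {
        Integer best = null;
        int bestFanout = Integer.MAX_VALUE;
        for (Map.Entry<Integer, Set<String>> e : localTrackersByTask.entrySet()) {
            if (e.getValue().contains(tracker) && e.getValue().size() < bestFanout) {
                best = e.getKey();
                bestFanout = e.getValue().size();
            }
        }
        return best; // null -> fall back, e.g. to a task local to no tracker
    }

    public static void main(String[] args) {
        Map<Integer, Set<String>> locality = new HashMap<>();
        locality.put(1, Set.of("t1", "t2", "t3")); // task 1 has three local homes
        locality.put(2, Set.of("t1"));             // only t1 can run task 2 locally
        System.out.println(pickTask("t1", locality)); // 2
    }
}
```

A real scheduler would also weigh rack locality and load, as the comments above note; this only shows the core tie-breaking idea.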
[jira] Issue Comment Edited: (HADOOP-2014) Job Tracker should not clobber the data locality of tasks
[ https://issues.apache.org/jira/browse/HADOOP-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558155#action_12558155 ] eric14 edited comment on HADOOP-2014 at 1/11/08 2:41 PM: -- An ideal solution would maintain some sort of prioritized list of maps / node / rack so that we execute work first that is unlikely to find another efficient location to execute. It would also make sense to place some non-local work early, since these tasks run slowly, on nodes that are likely to run out of local work relatively early. One could also pay attention to IO load on each source node... At a minimum we should track maps that have no local option and schedule them first when a node has no local option. (As Doug Cutting suggested in HADOOP-2560) was (Author: eric14): An ideal solution would maintain some sort of prioritized list of maps / node / rack so that we execute work first that is unlikely to find another efficient location to execute. It would also make sense to place some no local work early, since these tasks run slowly, on nodes that are likely to run out of local work relatively early. One could also pay attention to IO load on each source node... At a minimum we should track maps that have no local option and schedule them first when a node has no local option. (As doug cutting suggested in hadoop-2560 Job Tracker should not clobber the data locality of tasks - Key: HADOOP-2014 URL: https://issues.apache.org/jira/browse/HADOOP-2014 Project: Hadoop Issue Type: Bug Components: mapred Reporter: Runping Qi Assignee: Devaraj Das Currently, when the Job Tracker assigns a mapper task to a task tracker and there is no local split to the task tracker, the job tracker will find the first runnable task in the master task list and assign the task to the task tracker. The split for the task is not local to the task tracker, of course. However, the split may be local to other task trackers. 
Assigning that task to that task tracker may decrease the potential number of mapper attempts with data locality. The desired behavior in this situation is to choose a task whose split is not local to any task tracker. Resort to the current behavior only if no such task is found. In general, it will be useful to know the number of task trackers to which each split is local. To assign a task to a task tracker, the job tracker should first try to pick a task that is local to the task tracker and that has the minimal number of task trackers to which it is local. If no task is local to the task tracker, the job tracker should try to pick a task that has the minimal number of task trackers to which it is local. It is worthwhile to instrument the job tracker code to report the number of splits that are local to some task trackers. That should be the maximum number of tasks with data locality. By comparing that number with the actual number of data-local mappers launched, we can know the effectiveness of the job tracker scheduling. When we introduce rack locality, we should apply the same principle. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2570) streaming jobs fail after HADOOP-2227
[ https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558159#action_12558159 ] lohit vijayarenu commented on HADOOP-2570: -- testing the streaming job again. This patch solves the problem seen earlier. Thanks! streaming jobs fail after HADOOP-2227 - Key: HADOOP-2570 URL: https://issues.apache.org/jira/browse/HADOOP-2570 Project: Hadoop Issue Type: Bug Components: contrib/streaming Affects Versions: 0.15.2 Reporter: lohit vijayarenu Assignee: Amareshwari Sri Ramadasu Priority: Blocker Fix For: 0.15.3 Attachments: HADOOP-2570_1_20080112.patch, patch-2570.txt HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed like this {code} File jobCacheDir = new File(currentDir.getParentFile().getParent(), work); {code} We should change this to get it working. Referring to the changes made in HADOOP-2227, I see that the APIs used in there to construct the path are not public. And hard coding the path in streaming does not look good. thought? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2398) Additional Instrumentation for NameNode, RPC Layer and JMX support
[ https://issues.apache.org/jira/browse/HADOOP-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558158#action_12558158 ] Hadoop QA commented on HADOOP-2398: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12372639/metricsPatch6_4.patch against trunk revision r611264. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs -1. The patch appears to introduce 1 new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1547/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1547/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1547/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1547/console This message is automatically generated. Additional Instrumentation for NameNode, RPC Layer and JMX support --- Key: HADOOP-2398 URL: https://issues.apache.org/jira/browse/HADOOP-2398 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Sanjay Radia Assignee: Sanjay Radia Fix For: 0.16.0 Attachments: metricsPatch6.txt, metricsPatch6_1.txt, metricsPatch6_2.txt, metricsPatch6_3.txt, metricsPatch6_4.patch, ScreenShotNameNodeStats.png, ScreenShotRPCStats.png Additional Instrumentation is needed for name node and its rpc layer. Furthermore the instrumentation should be visible via JMX, Java's standard monitoring tool. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2581) Counters and other useful stats should be logged into Job History log
[ https://issues.apache.org/jira/browse/HADOOP-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558154#action_12558154 ] Owen O'Malley commented on HADOOP-2581: --- It is already in trunk and therefore will be in 0.16. It cannot be included in 0.15.3, because that branch is closed except for bug fixes. Counters and other useful stats should be logged into Job History log - Key: HADOOP-2581 URL: https://issues.apache.org/jira/browse/HADOOP-2581 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Runping Qi The following stats are useful and available to the JT but not logged in the job history log: 1. The counters of each job 2. The counters of each mapper/reducer attempt 3. The info about the input splits (filename, split size, on which nodes) 4. The input split for each mapper attempt This data is useful and important for mining to find performance-related problems. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2583) Potential Eclipse plug-in UI loop when editing location parameters
Potential Eclipse plug-in UI loop when editing location parameters -- Key: HADOOP-2583 URL: https://issues.apache.org/jira/browse/HADOOP-2583 Project: Hadoop Issue Type: Bug Reporter: Christophe Taton Assignee: Christophe Taton Priority: Minor Fix For: 0.16.0 The UI might enter an infinite loop when propagating parameters asynchronously. Some functions are not yet implemented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2576) Namenode performance degradation over time
[ https://issues.apache.org/jira/browse/HADOOP-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Kunz updated HADOOP-2576: --- Priority: Blocker (was: Major) Namenode performance degradation over time -- Key: HADOOP-2576 URL: https://issues.apache.org/jira/browse/HADOOP-2576 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.16.0 Reporter: Christian Kunz Priority: Blocker We have a cluster running the same applications again and again with a high turnover of files. The performance of these applications seems to be correlated with the lifetime of the namenode: After starting the namenode, the applications need increasingly more time to complete, with about 50% more time after 1 week. During that time the namenode average cpu usage increases from typically 10% to 30%, memory usage nearly doubles (although the average amount of data on dfs stays the same), and the average load factor increases by a factor of 2-3 (although not significantly high, 2). When looking at the namenode and datanode logs, I see a lot of requests from the namenode to delete blocks that are not in the block map of the datanodes, repeatedly for the same blocks. When I counted the number of blocks the namenode asked to be deleted, I noticed a marked increase with the lifetime of the namenode (a factor of 2-3 after 1 week). This makes me wonder whether the namenode fails to purge non-existent blocks from its list of invalid blocks. But independently, the namenode has a degradation issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2583) Potential Eclipse plug-in UI loop when editing location parameters
[ https://issues.apache.org/jira/browse/HADOOP-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christophe Taton updated HADOOP-2583: - Component/s: contrib/eclipse-plugin Potential Eclipse plug-in UI loop when editing location parameters -- Key: HADOOP-2583 URL: https://issues.apache.org/jira/browse/HADOOP-2583 Project: Hadoop Issue Type: Bug Components: contrib/eclipse-plugin Reporter: Christophe Taton Assignee: Christophe Taton Priority: Minor Fix For: 0.16.0 The UI might enter an infinite loop when propagating parameters asynchronously. Some functions are not yet implemented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1650) Upgrade Jetty to 6.x
[ https://issues.apache.org/jira/browse/HADOOP-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558162#action_12558162 ] Mukund Madhugiri commented on HADOOP-1650: -- I did a re-run on the 500 node cluster and don't see the NotReplicatedYetException and SocketTimeoutException. I do see the OutOfMemoryError when the sort job is running: 2008-01-11 20:54:10,375 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200801112025_0002_m_005337_0: java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2786) at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94) at java.io.DataOutputStream.write(DataOutputStream.java:90) at org.apache.hadoop.io.BytesWritable.write(BytesWritable.java:137) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:373) at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:40) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2043) Here is the sort data: |*500 nodes*|*trunk*|*trunk + patch*| |randomWriter (mins)|27|25| |sort (mins)|86|107| |sortValidation (mins)|22|21| Upgrade Jetty to 6.x Key: HADOOP-1650 URL: https://issues.apache.org/jira/browse/HADOOP-1650 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Devaraj Das Assignee: Devaraj Das Attachments: hadoop-1650-jetty6.1.5.patch, hadoop-jetty6.1.4-lib.tar.gz, hadoop-jetty6.1.6-lib.tar.gz, jetty-hadoop-6.1.6.patch, jetty-hbase.patch, jetty6.1.4.patch, jetty6.1.6.patch This is the third attempt at moving to jetty6. Apparently, the jetty-6.1.4 has fixed some of the issues we discovered in jetty during HADOOP-736 and HADOOP-1273. I'd like to keep this issue open for sometime so that we have enough time to test out things. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HADOOP-2558) [hbase] fixes for build up on hudson
[ https://issues.apache.org/jira/browse/HADOOP-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HADOOP-2558. --- Resolution: Fixed Fix Version/s: 0.16.0 Closing. hbase contrib tests just ran successfully three times in a row (#1545-#1547 -- latter two failed in core tests). My guess is that they are back to being about as flaky as they used to be: i.e. they'll fail once in a while but generally they succeed. Will open specific issues to address future failures. [hbase] fixes for build up on hudson Key: HADOOP-2558 URL: https://issues.apache.org/jira/browse/HADOOP-2558 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Fix For: 0.16.0 Attachments: 2558-v2.patch, 2558-v3.patch, 2558-v4.patch, 2558-v5.patch, 2558.patch Fixes for hbase breakage up on hudson. There seem to be many reasons for the failures. One is that the .META. region all of a sudden decides it is 'no good' and gets deployed elsewhere. Tests don't have the tolerance for this kind of churn. A previous commit added logging of why .META. is 'no good'; hopefully that will help. Also found a case where TestTableMapReduce would fail because there was no sleep between retries when getting new scanners. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2567) add FileSystem#getHomeDirectory() method
[ https://issues.apache.org/jira/browse/HADOOP-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Cutting updated HADOOP-2567: - Resolution: Fixed Status: Resolved (was: Patch Available) I just committed this. add FileSystem#getHomeDirectory() method Key: HADOOP-2567 URL: https://issues.apache.org/jira/browse/HADOOP-2567 Project: Hadoop Issue Type: New Feature Components: fs Reporter: Doug Cutting Assignee: Doug Cutting Fix For: 0.16.0 Attachments: 2567-3.patch, HADOOP-2567-1.patch, HADOOP-2567-2.patch, HADOOP-2567.patch The FileSystem API would benefit from a getHomeDirectory() method. The default implementation would return /user/$USER/. RawLocalFileSystem would return System.getProperty(user.home). HADOOP-2514 can use this to implement per-user trash. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2578) HBaseConfiguration(Configuration c) constructor shouldn't overwrite passed conf with defaults
[ https://issues.apache.org/jira/browse/HADOOP-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558167#action_12558167 ] stack commented on HADOOP-2578: --- Would adding a test that checked that c.getResource(hbase-default.xml) and c.getResource(hbase-site.xml) were non-null in HBaseConfiguration(Configuration c) work? HBaseConfiguration(Configuration c) constructor shouldn't overwrite passed conf with defaults - Key: HADOOP-2578 URL: https://issues.apache.org/jira/browse/HADOOP-2578 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Thomas Garner While testing out mapreduce with hbase, in the map portion of a task, the map would try to connect to an hbase master at localhost/127.0.0.1. The config passed to the hbaseconfiguration contained the necessary hbase configuration information, but I assume was being overwritten by the defaults in the config files during addResource, as commenting out addHbaseResources in the constructor fixed the symptom. I would expect the configs to be layered on top of each other, e.g. default, then site, then the config passed as a parameter. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
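The layering the reporter expects (defaults first, then site config, then the caller's configuration, with later layers winning) can be sketched with a simplified stand-in. This is a minimal illustration, not the HBaseConfiguration API: `LayeredConf`, `addResource` over plain maps, and the `hbase.master` default value are all assumptions for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the desired layering (illustrative class, not HBase's API):
// resources are applied in order, so later layers override earlier ones.
// The bug report describes the opposite order -- defaults applied after
// the caller's conf, wiping out the caller's settings.
public class LayeredConf {
    private final Map<String, String> props = new HashMap<>();

    public void addResource(Map<String, String> resource) {
        props.putAll(resource); // later resources win
    }

    public String get(String key) {
        return props.get(key);
    }

    public static LayeredConf create(Map<String, String> userConf) {
        LayeredConf conf = new LayeredConf();
        Map<String, String> defaults = new HashMap<>();
        defaults.put("hbase.master", "localhost:60000"); // hypothetical hbase-default.xml value
        conf.addResource(defaults);  // layer 1: defaults
        conf.addResource(userConf);  // last layer: caller's settings, so they win
        return conf;
    }
}
```

With this ordering a map task's passed-in master address survives, instead of being replaced by the localhost default.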
[jira] Commented: (HADOOP-1876) Persisting completed jobs status
[ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558168#action_12558168 ] Hadoop QA commented on HADOOP-1876: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12372904/patch1876.txt against trunk revision r611315. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1548/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1548/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1548/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1548/console This message is automatically generated. Persisting completed jobs status Key: HADOOP-1876 URL: https://issues.apache.org/jira/browse/HADOOP-1876 Project: Hadoop Issue Type: Improvement Components: mapred Environment: all Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 0.16.0 Attachments: patch1876.txt, patch1876.txt, patch1876.txt Currently the JobTracker keeps information about completed jobs in memory. This information is flushed from the cache when it has outlived the retention interval (#RETIRE_JOB_INTERVAL) or when the limit of completed jobs in memory has been reached (#MAX_COMPLETE_USER_JOBS_IN_MEMORY). Also, if the JobTracker is restarted (due to being recycled or due to a crash) information about completed jobs is lost. 
If any of the above scenarios happens before the job information is queried by a hadoop client (normally the job submitter or a monitoring component), there is no way to obtain such information. A way to avoid this is for the JobTracker to persist the completed job information in DFS upon job completion. This would be done at the time the job is moved to the completed jobs queue. Then, when querying the JobTracker for information about a completed job, if it is not found in the memory queue, a lookup in DFS would be done to retrieve the completed job information. A directory in DFS (under mapred/system) would be used to persist completed job information; for each completed job there would be a directory named with the job ID, and within that directory all the information about the job: status, job profile, counters and completion events. A configuration property will indicate for how long persisted job information should be kept in DFS. After that period it will be cleaned up automatically. This improvement would not introduce API changes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
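The persist-on-completion flow described above can be sketched as follows. This is a minimal stand-in, not the patch's implementation: `JobStatusStore`, its per-job `status` file, and the use of `java.nio` in place of the DFS API are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposal (illustrative names, local filesystem instead of
// DFS): on completion a job's status is written under a per-job-ID
// directory; queries fall back to that directory once the in-memory
// queue has retired the job.
public class JobStatusStore {
    private final Map<String, String> memory = new HashMap<>();
    private final Path root; // stand-in for the mapred/system history directory

    public JobStatusStore(Path root) throws IOException {
        this.root = Files.createDirectories(root);
    }

    public void complete(String jobId, String status) throws IOException {
        memory.put(jobId, status);
        Path dir = Files.createDirectories(root.resolve(jobId));
        Files.write(dir.resolve("status"), status.getBytes());
    }

    // Models RETIRE_JOB_INTERVAL / MAX_COMPLETE_USER_JOBS_IN_MEMORY eviction.
    public void retire(String jobId) {
        memory.remove(jobId);
    }

    public String lookup(String jobId) throws IOException {
        String s = memory.get(jobId);
        if (s != null) return s;                        // fast path: memory queue
        Path f = root.resolve(jobId).resolve("status"); // fallback: persisted copy
        return Files.exists(f) ? new String(Files.readAllBytes(f)) : null;
    }
}
```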
[jira] Commented: (HADOOP-2533) [hbase] Performance: Scanning, just creating MapWritable in next consumes 20% CPU
[ https://issues.apache.org/jira/browse/HADOOP-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558170#action_12558170 ] Hadoop QA commented on HADOOP-2533: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12372851/2533-v2.patch against trunk revision r611333. @author +1. The patch does not contain any @author tags. patch -1. The patch command could not apply the patch. Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1549/console This message is automatically generated. [hbase] Performance: Scanning, just creating MapWritable in next consumes 20% CPU -- Key: HADOOP-2533 URL: https://issues.apache.org/jira/browse/HADOOP-2533 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack Assignee: stack Priority: Minor Fix For: 0.16.0 Attachments: 2533-v2.patch, 2533.patch, dirty.patch Every call to HScanner.next creates an instance of MapWritable. MapWritables are expensive. Watching a scan run in the profiler, the setup of the MapWritable -- filling out the idToClassMap and classToIdMap -- consumes 20% of all CPU. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2443) [hbase] Keep lazy cache of regions in client rather than an 'authoritative' list
[ https://issues.apache.org/jira/browse/HADOOP-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2443: -- Attachment: 2443-v6.patch All the tests pass locally at this point, a big improvement over the last version. [hbase] Keep lazy cache of regions in client rather than an 'authoritative' list Key: HADOOP-2443 URL: https://issues.apache.org/jira/browse/HADOOP-2443 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack Assignee: Bryan Duxbury Fix For: 0.16.0 Attachments: 2443-v3.patch, 2443-v4.patch, 2443-v5.patch, 2443-v6.patch Currently, when the client gets a NotServingRegionException -- usually because the region is in the middle of being split, or there has been a regionserver crash and the region is being moved elsewhere -- the client does a complete refresh of its cache of region locations for a table. Chatting with Jim about a Paul Saab upload issue from Saturday night: when tables are big, comprised of regions that are splitting fast (because of bulk upload), it is unlikely a client will ever be able to obtain a stable list of all region locations. Given that any update or scan requires that the list of all regions be in place before it proceeds, this can get in the way of the client succeeding when the cluster is under load. Chatting, we figure it is better for the client to hold a lazy region cache: on NSRE, figure out where that region alone has gone and update the client-side cache for that entry only, rather than throw out all we know of a table every time. Hopefully this will fix the issue PS was experiencing where, during an intense upload, he was unable to get/scan/hql the same table. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
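The per-entry invalidation described in the issue can be sketched as below. This is an illustration under assumed names (`LazyRegionCache`, `locate`, `relocate`), not the HBase client code: on a NotServingRegionException only the stale entry is dropped and re-resolved, and the rest of the table's cached locations survive.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of a lazy region-location cache (illustrative, not HBase's API).
public class LazyRegionCache {
    private final Map<String, String> locations = new HashMap<>();
    private final Function<String, String> resolver; // stand-in for a META lookup

    public LazyRegionCache(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    // Resolve lazily: only rows actually touched cost a META lookup.
    public String locate(String regionKey) {
        return locations.computeIfAbsent(regionKey, resolver);
    }

    // Called on NotServingRegionException: evict just this entry and
    // re-resolve it, instead of flushing the whole table's cache.
    public String relocate(String regionKey) {
        locations.remove(regionKey);
        return locate(regionKey);
    }
}
```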
[jira] Created: (HADOOP-2584) Web UI displays an IOException instead of the Tables
Web UI displays an IOException instead of the Tables Key: HADOOP-2584 URL: https://issues.apache.org/jira/browse/HADOOP-2584 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.15.2 Reporter: Lars George After every second restart I get an error when loading the Hbase UI. Here is the page:
Master: 192.168.105.11:6
Master Attributes: Filesystem = lv1-xen-pdc-2.worldlingo.com:9000 (Filesystem hbase is running on); Hbase Root Directory = /hbase (Location of hbase home directory)
Online META Regions: -ROOT- at 192.168.105.31:60020; .META.,,1 at 192.168.105.39:60020
Tables error msg:
java.io.IOException: java.io.IOException: HStoreScanner failed construction
 at org.apache.hadoop.hbase.HStore$StoreFileScanner.(HStore.java:1879)
 at org.apache.hadoop.hbase.HStore$HStoreScanner.(HStore.java:2000)
 at org.apache.hadoop.hbase.HStore.getScanner(HStore.java:1822)
 at org.apache.hadoop.hbase.HRegion$HScanner.(HRegion.java:1543)
 at org.apache.hadoop.hbase.HRegion.getScanner(HRegion.java:1118)
 at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1465)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:401)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File does not exist: /hbase/hregion_1028785192/info/mapfiles/6628785818889695133/data
 at org.apache.hadoop.dfs.FSDirectory.getFileInfo(FSDirectory.java:489)
 at org.apache.hadoop.dfs.FSNamesystem.getFileInfo(FSNamesystem.java:1380)
 at org.apache.hadoop.dfs.NameNode.getFileInfo(NameNode.java:425)
 at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:401)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
 at org.apache.hadoop.ipc.Client.call(Client.java:509)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
 at org.apache.hadoop.dfs.$Proxy1.getFileInfo(Unknown Source) at
[jira] Commented: (HADOOP-2582) hadoop dfs -copyToLocal creates zero-byte files when source file does not exist
[ https://issues.apache.org/jira/browse/HADOOP-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558172#action_12558172 ] Raghu Angadi commented on HADOOP-2582: -- Seems like a bug in FileUtil.copy(); shouldn't it throw an exception, or at least return false, if the source file does not exist? hadoop dfs -copyToLocal creates zero-byte files when source file does not exist -- Key: HADOOP-2582 URL: https://issues.apache.org/jira/browse/HADOOP-2582 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.15.2 Reporter: lohit vijayarenu Attachments: HADOOP_2582_1.patch hadoop dfs -copyToLocal with a non-existent source file creates a zero-byte destination file. It should throw an error message indicating that the source file does not exist. {noformat}
[lohit@ hadoop-trunk]$ hadoop dfs -get nosuchfile nosuchfile
[lohit@ hadoop-trunk]$ ls -l nosuchfile
-rw-r--r-- 1 lohit users 0 Jan 11 21:58 nosuchfile
[lohit@ hadoop-trunk]$
{noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2585) Automatic namespace recovery from the secondary image.
Automatic namespace recovery from the secondary image. -- Key: HADOOP-2585 URL: https://issues.apache.org/jira/browse/HADOOP-2585 Project: Hadoop Issue Type: New Feature Components: dfs Affects Versions: 0.15.0 Reporter: Konstantin Shvachko Hadoop has a three-way (configuration-controlled) protection from losing the namespace image:
# the image can be replicated on different hard drives of the same node;
# the image can be replicated on an NFS-mounted drive on an independent node;
# a stale replica of the image is created during periodic checkpointing and stored on the secondary name-node.
Currently during startup the name-node examines all configured storage directories, selects the most up-to-date image, reads it, merges it with the corresponding edits, and writes the new image back into all storage directories. Everything is done automatically. If, due to multiple hardware failures, none of the images on mounted hard drives (local or remote) are available, the secondary image, although stale (up to one hour old by default), can still be used to recover the majority of the file system data. Currently one can reconstruct a valid name-node image from the secondary one manually. It would be nice to support automatic recovery. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2582) hadoop dfs -copyToLocal creates zero-byte files when source file does not exist
[ https://issues.apache.org/jira/browse/HADOOP-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2582: - Attachment: HADOOP_2582_1.patch Attached is a simple patch which checks for the existence of the source before initiating the copy. I have updated the TestDFSShell test case to check for this condition as well. hadoop dfs -copyToLocal creates zero-byte files when source file does not exist -- Key: HADOOP-2582 URL: https://issues.apache.org/jira/browse/HADOOP-2582 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.15.2 Reporter: lohit vijayarenu Attachments: HADOOP_2582_1.patch hadoop dfs -copyToLocal with a non-existent source file creates a zero-byte destination file. It should throw an error message indicating that the source file does not exist. {noformat}
[lohit@ hadoop-trunk]$ hadoop dfs -get nosuchfile nosuchfile
[lohit@ hadoop-trunk]$ ls -l nosuchfile
-rw-r--r-- 1 lohit users 0 Jan 11 21:58 nosuchfile
[lohit@ hadoop-trunk]$
{noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
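The idea of the patch can be sketched as follows. This uses `java.nio` on the local filesystem rather than Hadoop's FileUtil, and the `SafeCopy` class is illustrative: verify that the source exists before creating the destination, so a bad source name fails loudly instead of leaving a zero-byte file behind.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of the existence check the patch adds (illustrative, not
// FileUtil.copy itself).
public class SafeCopy {
    public static void copyToLocal(Path src, Path dst) throws IOException {
        // Fail before the destination is ever created.
        if (!Files.exists(src)) {
            throw new FileNotFoundException("Source does not exist: " + src);
        }
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
    }
}
```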
[jira] Updated: (HADOOP-2570) streaming jobs fail after HADOOP-2227
[ https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Madhugiri updated HADOOP-2570: - Status: Open (was: Patch Available) streaming jobs fail after HADOOP-2227 - Key: HADOOP-2570 URL: https://issues.apache.org/jira/browse/HADOOP-2570 Project: Hadoop Issue Type: Bug Components: contrib/streaming Affects Versions: 0.15.2 Reporter: lohit vijayarenu Assignee: Amareshwari Sri Ramadasu Priority: Blocker Fix For: 0.15.3 Attachments: HADOOP-2570_1_20080112.patch, patch-2570.txt HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed like this: {code} File jobCacheDir = new File(currentDir.getParentFile().getParent(), work); {code} We should change this to get it working. Referring to the changes made in HADOOP-2227, I see that the APIs used there to construct the path are not public, and hard-coding the path in streaming does not look good. Thoughts? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2570) streaming jobs fail after HADOOP-2227
[ https://issues.apache.org/jira/browse/HADOOP-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Madhugiri updated HADOOP-2570: - Status: Patch Available (was: Open) Trying to trigger the patch process to pick it up, as I don't see it in the queue. streaming jobs fail after HADOOP-2227 - Key: HADOOP-2570 URL: https://issues.apache.org/jira/browse/HADOOP-2570 Project: Hadoop Issue Type: Bug Components: contrib/streaming Affects Versions: 0.15.2 Reporter: lohit vijayarenu Assignee: Amareshwari Sri Ramadasu Priority: Blocker Fix For: 0.15.3 Attachments: HADOOP-2570_1_20080112.patch, patch-2570.txt HADOOP-2227 changes jobCacheDir. In streaming, jobCacheDir was constructed like this: {code} File jobCacheDir = new File(currentDir.getParentFile().getParent(), work); {code} We should change this to get it working. Referring to the changes made in HADOOP-2227, I see that the APIs used there to construct the path are not public, and hard-coding the path in streaming does not look good. Thoughts? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2583) Potential Eclipse plug-in UI loop when editing location parameters
[ https://issues.apache.org/jira/browse/HADOOP-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christophe Taton updated HADOOP-2583: - Attachment: 2583-20080112-1.patch The patch:
- prevents UI loops: UI updates are now synchronous
- lets the UI reflect not-yet-implemented functions (disabled buttons)
- removes some unused files (old todo, old unused icons)
Potential Eclipse plug-in UI loop when editing location parameters -- Key: HADOOP-2583 URL: https://issues.apache.org/jira/browse/HADOOP-2583 Project: Hadoop Issue Type: Bug Components: contrib/eclipse-plugin Reporter: Christophe Taton Assignee: Christophe Taton Priority: Minor Fix For: 0.16.0 The UI might enter an infinite loop when propagating parameters asynchronously. Some functions are not yet implemented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
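The kind of loop the patch guards against can be sketched as below. This is not the Eclipse plug-in code; `LocationForm` and its listeners are a made-up minimal model of two views that notify each other: without a re-entrancy flag, each update re-fires the other's listener forever.

```java
// Illustrative sketch of a synchronous update with a re-entrancy guard.
public class LocationForm {
    private boolean updating = false;
    String widgetText = "";
    String modelValue = "";
    int widgetEvents = 0;

    // Fired when the text widget changes (by the user or by our own update).
    public void onWidgetChanged(String text) {
        widgetEvents++;
        if (updating) return;      // absorb the echo instead of looping
        updating = true;
        try {
            modelValue = text;
            setWidgetText(text);   // re-fires onWidgetChanged, as a real widget would
        } finally {
            updating = false;
        }
    }

    private void setWidgetText(String text) {
        widgetText = text;
        onWidgetChanged(text);     // simulate the widget notifying its listeners
    }
}
```

The guard makes the whole propagation synchronous: by the time `onWidgetChanged` returns, model and widget agree, and the echoed notification was ignored rather than queued.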
[jira] Created: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins --- Key: HADOOP-2587 URL: https://issues.apache.org/jira/browse/HADOOP-2587 Project: Hadoop Issue Type: Bug Components: contrib/hbase Environment: hadoop subversion 611087 Reporter: Billy Pearson Fix For: 0.16.0 The below is cut out of one of my region server logs; the full log is attached. What is happening is that there is one region on this region server and it is under heavy insert load, so compactions are back to back: as soon as one finishes, a new one starts. The problem starts when it is time to split the region. A compaction starts just milliseconds before the split starts, blocking the split, but the split closes the region before the compaction is finished, causing the region to be offline until the compaction is done. Once the compaction is done the split finishes and all returns to normal, but this is a big problem for production if the region is offline for 10-15 mins. The solution would be to not let the split thread issue the line below while a compaction on that region is happening.
2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: webdata,,1200085987488 closing (Adding to retiringRegions)
The only time I have seen this bug is when there is only one region on a region server, because if there is more than one, the compaction happens on the other region(s) after the first one is done compacting, and the split can do what it needs on the first region without getting blocked.
{code}
2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region webdata,,1200085987488. Took 16mins, 10sec
2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for HStore webdata,,1200085987488/size needed.
2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 1773667150/size needs compaction
2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting compaction on region webdata,,1200085987488
2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started compaction of 14 files using /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size for webdata,,1200085987488/size
2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started memcache flush for region webdata,,1200085987488. Size 31.2m
2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting webdata,,1200085987488 because largest aggregate size is 100.7m and desired size is 64.0m
2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: webdata,,1200085987488 closing (Adding to retiringRegions)
... lots of NotServingRegionException's ...
2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region webdata,,1200085987488. Took 10mins, 58sec
...
2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true
2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of webdata,,1200085987488 complete; new regions: webdata,,1200090121237, webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split took 11mins, 0sec
2008-01-11 16:33:02,227 DEBUG org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for .META.. Doing a find...
2008-01-11 16:33:02,283 DEBUG org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) for .META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, startKey: , encodedName(70236052) tableDesc: {name: -ROOT-, families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}
2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating .META. with region split info
2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: Reporting region split to master
2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region split, META update, and report to master all successful. Old region=webdata,,1200085987488, new regions: webdata,,1200090121237, webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239
{code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
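The fix the reporter suggests — don't let the split close the region while a compaction is running — can be sketched with a per-region lock. This is an illustration with made-up names (`RegionGuard`, `trySplit`), not HBase's internals: the split uses `tryLock` so it defers rather than closing the region mid-compaction.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of a per-region guard shared by compaction and split.
public class RegionGuard {
    private final ReentrantLock working = new ReentrantLock();

    public void compact(Runnable compaction) {
        working.lock();
        try {
            compaction.run();
        } finally {
            working.unlock();
        }
    }

    /** Returns true if the split ran; false means a compaction was in
     *  progress and the split should be retried later instead of taking
     *  the region offline for the compaction's duration. */
    public boolean trySplit(Runnable closeAndSplit) {
        if (!working.tryLock()) return false;
        try {
            closeAndSplit.run();
            return true;
        } finally {
            working.unlock();
        }
    }
}
```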
[jira] Created: (HADOOP-2586) Add version to servers' startup message.
Add version to servers' startup message. Key: HADOOP-2586 URL: https://issues.apache.org/jira/browse/HADOOP-2586 Project: Hadoop Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Konstantin Shvachko It would be useful if hadoop servers printed the hadoop version as a part of the startup message:
{code}
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = my-hadoop-host
STARTUP_MSG:   args = [-upgrade]
STARTUP_MSG:   Version = 0.15.1, r599161
************************************************************/
{code}
This would simplify understanding the logs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
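The requested banner can be sketched as a small helper. The class name and the hard-coded sample values are placeholders; in Hadoop the version and revision would come from the build, not be passed in by hand.

```java
// Illustrative sketch of the proposed startup banner (not Hadoop's code).
public class StartupBanner {
    public static String banner(String daemon, String host, String args,
                                String version, String revision) {
        String rule = "************************************************************";
        return "/" + rule + "\n"
             + "STARTUP_MSG: Starting " + daemon + "\n"
             + "STARTUP_MSG:   host = " + host + "\n"
             + "STARTUP_MSG:   args = " + args + "\n"
             + "STARTUP_MSG:   Version = " + version + ", r" + revision + "\n"
             + rule + "/";
    }
}
```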
[jira] Commented: (HADOOP-2585) Automatic namespace recovery from the secondary image.
[ https://issues.apache.org/jira/browse/HADOOP-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558173#action_12558173 ] Konstantin Shvachko commented on HADOOP-2585: - We had a real example of such a failure on one of our clusters, and we were able to reconstruct the namespace image from the secondary node using the following manual procedure, which might be useful for those who find themselves in the same type of trouble.
h4. Manual recovery procedure from the secondary image.
# Stop the cluster to make sure all data-nodes and *-trackers are down.
# Select a node where you will run a new name-node, and set it up as usual for the name-node.
# Format the new name-node.
# cd dfs.name.dir/current
# You will see the file VERSION in there. You will need to provide the namespaceID of the old cluster in it. The old namespaceID can be obtained from one of the data-nodes: just copy it from dfs.data.dir/current/VERSION.namespaceID
# rm dfs.name.dir/current/fsimage
# scp secondary-node:fs.checkpoint.dir/destimage.tmp ./fsimage
# Start the cluster. Upgrade is recommended, so that you can roll back if something goes wrong.
# Run fsck, and remove files with missing blocks, if any.
h4. Automatic recovery proposal.
The proposal consists of 2 parts.
# The secondary node should store the latest checkpointed image file in compliance with the name-node storage directory structure. It is best if the secondary node uses the Storage class (or FSImage, if code re-use makes sense here) in order to maintain the checkpoint directory. This should ensure that the checkpointed image is always ready to be read by a name-node if the directory is listed in its dfs.name.dir list.
# The name-node should consider the configuration variable fs.checkpoint.dir as a possible location of the image available for read-only access during startup. This means that if the name-node finds all directories listed in dfs.name.dir unavailable, or finds their images corrupted, then it should turn to the fs.checkpoint.dir directory and try to fetch the image from there. I think this should not be the default behavior but rather be triggered by a name-node startup option, something like: {code} hadoop namenode -fromCheckpoint {code} So the name-node can start with the secondary image as long as the secondary node's drive is mounted. And the name-node will never attempt to write anything to this drive.
h4. Added bonuses provided by this approach
- One can choose to restart a failed name-node directly on the node where the secondary node ran. This brings us a step closer to hot standby.
- Replication of the image to NFS can be delegated to the secondary name-node if we support multiple entries in fs.checkpoint.dir. This is, of course, if the administrator chooses to accept outdated images in order to boost name-node performance.
Automatic namespace recovery from the secondary image. -- Key: HADOOP-2585 URL: https://issues.apache.org/jira/browse/HADOOP-2585 Project: Hadoop Issue Type: New Feature Components: dfs Affects Versions: 0.15.0 Reporter: Konstantin Shvachko Hadoop has a three-way (configuration-controlled) protection from losing the namespace image:
# the image can be replicated on different hard drives of the same node;
# the image can be replicated on an NFS-mounted drive on an independent node;
# a stale replica of the image is created during periodic checkpointing and stored on the secondary name-node.
Currently during startup the name-node examines all configured storage directories, selects the most up-to-date image, reads it, merges it with the corresponding edits, and writes the new image back into all storage directories. Everything is done automatically. If, due to multiple hardware failures, none of the images on mounted hard drives (local or remote) are available, the secondary image, although stale (up to one hour old by default), can still be used to recover the majority of the file system data. Currently one can reconstruct a valid name-node image from the secondary one manually. It would be nice to support automatic recovery. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
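The image-selection fallback in the proposal can be sketched as below. The types and method are illustrative, not FSImage's code: scan the configured dfs.name.dir entries for the most recent valid image, and only if all are unusable and the operator passed the proposed -fromCheckpoint option, fall back to the read-only fs.checkpoint.dir copy.

```java
import java.util.List;

// Sketch of the proposed startup image selection (illustrative names).
public class ImageSelector {
    public static class Image {
        final String dir;
        final boolean valid;          // readable and uncorrupted
        final long checkpointTime;    // newer is better
        public Image(String dir, boolean valid, long checkpointTime) {
            this.dir = dir;
            this.valid = valid;
            this.checkpointTime = checkpointTime;
        }
    }

    public static Image select(List<Image> nameDirs, Image checkpoint,
                               boolean fromCheckpoint) {
        Image best = null;
        for (Image i : nameDirs) {   // normal path: newest valid dfs.name.dir image
            if (i.valid && (best == null || i.checkpointTime > best.checkpointTime)) {
                best = i;
            }
        }
        if (best != null) return best;
        // Fallback is opt-in (-fromCheckpoint), never the default, and the
        // checkpoint directory is only ever read, never written.
        if (fromCheckpoint && checkpoint.valid) return checkpoint;
        return null;                 // refuse to start
    }
}
```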
[jira] Updated: (HADOOP-2583) Potential Eclipse plug-in UI loop when editing location parameters
[ https://issues.apache.org/jira/browse/HADOOP-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christophe Taton updated HADOOP-2583: - Attachment: (was: 2583-20080112-1.patch) Potential Eclipse plug-in UI loop when editing location parameters -- Key: HADOOP-2583 URL: https://issues.apache.org/jira/browse/HADOOP-2583 Project: Hadoop Issue Type: Bug Components: contrib/eclipse-plugin Reporter: Christophe Taton Assignee: Christophe Taton Priority: Minor Fix For: 0.16.0 The UI might enter an infinite loop, when propagating parameters asynchronously. Some functions are not yet implemented -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2583) Potential Eclipse plug-in UI loop when editing location parameters
[ https://issues.apache.org/jira/browse/HADOOP-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christophe Taton updated HADOOP-2583: - Attachment: 2583-20080112-1.patch Potential Eclipse plug-in UI loop when editing location parameters -- Key: HADOOP-2583 URL: https://issues.apache.org/jira/browse/HADOOP-2583 Project: Hadoop Issue Type: Bug Components: contrib/eclipse-plugin Reporter: Christophe Taton Assignee: Christophe Taton Priority: Minor Fix For: 0.16.0 Attachments: 2583-20080112-1.patch The UI might enter an infinite loop, when propagating parameters asynchronously. Some functions are not yet implemented -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman reassigned HADOOP-2587: - Assignee: Jim Kellerman Splits getting blocked by compactions causeing region to be offline for the length of the compaction 10-15 mins --- Key: HADOOP-2587 URL: https://issues.apache.org/jira/browse/HADOOP-2587 Project: Hadoop Issue Type: Bug Components: contrib/hbase Environment: hadoop subversion 611087 Reporter: Billy Pearson Assignee: Jim Kellerman Fix For: 0.16.0 Attachments: hbase-root-regionserver-PE1750-3.log The below is cut out of one of my region servers logs full log attached What is happening is there is one region on a this region server and its is under heave insert load so compaction are back to back one one finishes a new one starts the problem starts when its time to split the region. A compaction starts just millsecs before the split starts blocking the split but the split closes the region before the compaction is finished. Causing the region to be offline until the compaction is done. Once the compaction is done the split finishes and all is returned to normal but this is a big problem for production if the region is offline for 10-15 mins. The solution would be not to let the split thread to issue the below line while a compaction on that region is happening. 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: webdata,,1200085987488 closing (Adding to retiringRegions) The only time I have seen this bug is when there is only one region on a region server because if more then one then the compaction happens to the other region(s) after the first one is done compaction and the split can do what it needs on the first region with out getting blocked. {code} 2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region webdata,,1200085987488. 
Took 16mins, 10sec 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for HStore webdata,,1200085987488/size needed. 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 1773667150/size needs compaction 2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting compaction on region webdata,,1200085987488 2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started compaction of 14 files using /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size for webdata,,1200085987488/size 2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started memcache flush for region webdata,,1200085987488. Size 31.2m 2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting webdata,,1200085987488 because largest aggregate size is 100.7m and desired size is 64.0m 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: webdata,,1200085987488 closing (Adding to retiringRegions) ... lots of NotServingRegionException's ... 2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region webdata,,1200085987488. Took 10mins, 58sec ... 2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true 2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of webdata,,1200085987488 complete; new regions: webdata,,1200090121237, webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split took 11mins, 0sec 2008-01-11 16:33:02,227 DEBUG org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for .META.. Doing a find... 2008-01-11 16:33:02,283 DEBUG org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) for .META. 
at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, startKey: , encodedName(70236052) tableDesc: {name: -ROOT-, families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}} 2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating .META. with region split info 2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: Reporting region split to master 2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region split, META update, and report to master all successful. Old region=webdata,,1200085987488, new regions: webdata,,1200090121237, webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239 {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
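Billy's proposed fix (don't let the split thread close the region while a compaction is in flight) amounts to serializing the two maintenance operations on a shared lock. The sketch below is a hypothetical illustration of that idea in plain Java, not HRegion's actual code; the class and method names are invented:

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch (not HBase code): a region serializes compactions and
// splits on one lock, so a split can never close the region mid-compaction.
class RegionMaintenance {
    private final ReentrantLock maintenanceLock = new ReentrantLock();

    void compact() {
        maintenanceLock.lock();
        try {
            // ... rewrite the store files ...
        } finally {
            maintenanceLock.unlock();
        }
    }

    /** Backs off (returns false) instead of closing the region while a compaction runs. */
    boolean trySplit() {
        if (!maintenanceLock.tryLock()) {
            return false; // compaction in progress; the caller retries the split later
        }
        try {
            // ... close the region briefly and write the two daughter regions ...
            return true;
        } finally {
            maintenanceLock.unlock();
        }
    }
}
```

With this shape the split thread loses only the seconds a retry costs, instead of leaving the region offline for the 10-15 minutes the compaction takes.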
[jira] Issue Comment Edited: (HADOOP-2566) need FileSystem#globStatus method
[ https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558179#action_12558179 ] hairong edited comment on HADOOP-2566 at 1/11/08 5:01 PM: I am still not comfortable with this change: 1. Some of the shell commands, like delete, copy, and rename, use globPath but don't need FileStatus. 2. GlobPath does not always call listPath for every directory. For example, globPath(/user/*/data) only needs to call listPath(/user). Returning FileStatus[] requires additional listPath calls on each user xx's home directory /user/xx and the root /. This is a lot of overhead. was (Author: hairong): I am still not comfortable with this change: 1. Some of the shell commands, like delete, copy, and rename, use globPath but don't need FileStatus. 2. GlobPath does not always call listPath for every directory. For example, globPath(/user/*/data) only needs to call listPath(/user). Returning FileStatus[] requires listPath on each user xx's home directory /user/xx and /user/xx/data. This is a lot of overhead. need FileSystem#globStatus method - Key: HADOOP-2566 URL: https://issues.apache.org/jira/browse/HADOOP-2566 Project: Hadoop Issue Type: Improvement Components: fs Reporter: Doug Cutting Assignee: Hairong Kuang Fix For: 0.16.0 To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting performance, we must use file enumeration APIs that return FileStatus[] rather than Path[]. Currently we have FileSystem#globPaths(), but that method should be deprecated and replaced with a FileSystem#globStatus(). We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the cache in 0.17. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
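Hairong's cost argument can be made concrete with a toy model (not Hadoop's FileSystem API; the tree, names, and counter below are invented): expanding /user/*/data touches exactly one directory listing, whereas returning a FileStatus[] would add a metadata lookup per matched path.

```java
import java.util.*;

// Toy model illustrating the comment above: glob expansion of a single-wildcard
// pattern like "/user/*/data" needs only one listing of the wildcard's parent.
class GlobModel {
    final Map<String, List<String>> tree = new HashMap<>();
    int listCalls = 0; // counts simulated listPath RPCs

    List<String> list(String dir) {
        listCalls++;
        return tree.getOrDefault(dir, Collections.emptyList());
    }

    /** Expand a pattern of the form "/parent/*/suffix" (one wildcard component). */
    List<String> glob(String pattern) {
        int star = pattern.indexOf('*');
        String parent = pattern.substring(0, star - 1); // e.g. "/user"
        String suffix = pattern.substring(star + 1);    // e.g. "/data"
        List<String> out = new ArrayList<>();
        for (String child : list(parent)) {             // the single listPath call
            out.add(parent + "/" + child + suffix);
        }
        return out;
    }
}
```

Running glob("/user/*/data") over a tree with users alice and bob yields two matched paths after a single list() call; a globStatus-style variant would need further per-path lookups to fill in each match's FileStatus, which is the overhead being objected to.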
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions, causing the region to be offline for the length of the compaction (10-15 mins)
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billy Pearson updated HADOOP-2587: -- Attachment: hbase-root-regionserver-PE1750-3.log Attached the full log from the region server. Splits getting blocked by compactions, causing the region to be offline for the length of the compaction (10-15 mins) --- Key: HADOOP-2587 URL: https://issues.apache.org/jira/browse/HADOOP-2587 Project: Hadoop Issue Type: Bug Components: contrib/hbase Environment: hadoop subversion 611087 Reporter: Billy Pearson Fix For: 0.16.0 Attachments: hbase-root-regionserver-PE1750-3.log -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2566) need FileSystem#globStatus method
[ https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558179#action_12558179 ] Hairong Kuang commented on HADOOP-2566: --- I am still not comfortable with this change: 1. Some of the shell commands, like delete, copy, and rename, use globPath but don't need FileStatus. 2. GlobPath does not always call listPath for every directory. For example, globPath(/user/*/data) only needs to call listPath(/user). Returning FileStatus[] requires listPath on each user xx's home directory /user/xx and /user/xx/data. This is a lot of overhead. need FileSystem#globStatus method - Key: HADOOP-2566 URL: https://issues.apache.org/jira/browse/HADOOP-2566 Project: Hadoop Issue Type: Improvement Components: fs Reporter: Doug Cutting Assignee: Hairong Kuang Fix For: 0.16.0 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2582) hadoop dfs -copyToLocal creates zero byte files when the source file does not exist
[ https://issues.apache.org/jira/browse/HADOOP-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2582: - Attachment: HADOOP_2582_2.patch Thanks Raghu, I have attached another patch which fixes FileUtil. Now we catch both -get and -put errors. hadoop dfs -copyToLocal creates zero byte files when the source file does not exist -- Key: HADOOP-2582 URL: https://issues.apache.org/jira/browse/HADOOP-2582 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.15.2 Reporter: lohit vijayarenu Attachments: HADOOP_2582_1.patch, HADOOP_2582_2.patch hadoop dfs -copyToLocal with a non-existent source file creates a zero byte destination file. It should instead report an error indicating that the source file does not exist. {noformat} [lohit@ hadoop-trunk]$ hadoop dfs -get nosuchfile nosuchfile [lohit@ hadoop-trunk]$ ls -l nosuchfile -rw-r--r-- 1 lohit users 0 Jan 11 21:58 nosuchfile [lohit@ hadoop-trunk]$ {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
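The shape of the fix (an assumption about the patch, since only its filename appears here) is to verify the source before ever opening the destination, so a failed -get cannot leave a zero byte local file behind:

```java
import java.io.*;

// Sketch, not the actual FileUtil patch: check the source first, so the
// destination file is never created on the error path.
class SafeCopy {
    static void copyToLocal(File src, File dst) throws IOException {
        if (!src.exists()) {
            throw new FileNotFoundException("Source does not exist: " + src);
        }
        try (InputStream in = new FileInputStream(src);
             OutputStream out = new FileOutputStream(dst)) {
            in.transferTo(out); // stream the bytes once the source is known good
        }
    }
}
```

Because the FileOutputStream is only opened after the existence check, the zero byte `nosuchfile` from the {noformat} transcript is never created.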
[jira] Commented: (HADOOP-2464) Test permissions related shell commands with DFS
[ https://issues.apache.org/jira/browse/HADOOP-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558183#action_12558183 ] Hadoop QA commented on HADOOP-2464: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12372850/HADOOP-2464.patch against trunk revision r611333. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1550/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1550/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1550/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1550/console This message is automatically generated. Test permissions related shell commands with DFS Key: HADOOP-2464 URL: https://issues.apache.org/jira/browse/HADOOP-2464 Project: Hadoop Issue Type: Improvement Components: dfs Affects Versions: 0.16.0 Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.16.0 Attachments: HADOOP-2464.patch, HADOOP-2464.patch, HADOOP-2464.patch HADOOP-2336 adds FsShell commands for changing permissions for files. But it is not tested on DFS since that requires HADOOP-1298. Once HADOOP-1298 is committed, we should add unit tests for DFS. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HADOOP-1015) slaves are not recognized by name
[ https://issues.apache.org/jira/browse/HADOOP-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HADOOP-1015. - Resolution: Cannot Reproduce Fix Version/s: 0.16.0 This looks like a stale issue. It should not matter whether you specify slaves by names or IP addresses, as long as your shell recognizes where to ssh. I don't have Ubuntu to try this, but it seems to work in my environment with the current trunk. I am closing it, but please feel free to reopen and describe the problem in more detail if it persists. slaves are not recognized by name - Key: HADOOP-1015 URL: https://issues.apache.org/jira/browse/HADOOP-1015 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.10.1 Environment: Ubuntu 6.06 Reporter: moz devil Priority: Minor Fix For: 0.16.0 After upgrading from nutch 0.8.1 (has Hadoop 0.4.0) to nutch 0.9.0 (with hadoop 0.10.1), the datanodes were started with bin/start-all.sh but did not appear in the Hadoop Map/Reduce Administration screen. Only the datanode where the namenode is also running appeared. I was using local DNS names, which worked fine with hadoop 0.4.0. Now I use IP addresses, which cause no problems. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2540) Empty blocks make fsck report corrupt, even when it isn't
[ https://issues.apache.org/jira/browse/HADOOP-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558188#action_12558188 ] Hadoop QA commented on HADOOP-2540: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12372952/recoverLastBlock2.patch against trunk revision r611333. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1551/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1551/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1551/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1551/console This message is automatically generated. Empty blocks make fsck report corrupt, even when it isn't - Key: HADOOP-2540 URL: https://issues.apache.org/jira/browse/HADOOP-2540 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.15.1 Reporter: Allen Wittenauer Assignee: dhruba borthakur Priority: Blocker Fix For: 0.15.3 Attachments: recoverLastBlock.patch, recoverLastBlock2.patch If the name node crashes after blocks have been allocated and before the content has been uploaded, fsck will report the zero sized files as corrupt upon restart: /user/rajive/rand0/_task_200712121358_0001_m_000808_0/part-00808: MISSING 1 blocks of total size 0 B ... even though all blocks are accounted for: Status: CORRUPT Total size:2932802658847 B Total blocks: 26603 (avg. 
block size 110243305 B) Total dirs:419 Total files: 5031 Over-replicated blocks:197 (0.740518 %) Under-replicated blocks: 0 (0.0 %) Target replication factor: 3 Real replication factor: 3.0074053 The filesystem under path '/' is CORRUPT In UFS and related filesystems, such files would get put into lost+found after an fsck and the filesystem would return back to normal. It would be super if HDFS could do a similar thing. Perhaps if all of the nodes stored in the name node's 'includes' file have reported in, HDFS could automatically run a fsck and store these not-necessarily-broken files in something like lost+found. Files that are actually missing blocks, however, should not be touched. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2346) DataNode should have timeout on socket writes.
[ https://issues.apache.org/jira/browse/HADOOP-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated HADOOP-2346: - Attachment: HADOOP-2346.patch This patch implements a write timeout on datanodes for block reads. Currently only client reads have a write timeout. Once the fix looks good, we can add write timeouts in other places (while writing to the mirror, for example). This adds two classes, SocketInputStream and SocketOutputStream, in IOUtils. Please suggest better names. DataNode should have timeout on socket writes. -- Key: HADOOP-2346 URL: https://issues.apache.org/jira/browse/HADOOP-2346 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.15.1 Reporter: Raghu Angadi Assignee: Raghu Angadi Attachments: HADOOP-2346.patch If a client opens a file and stops reading in the middle, the DataNode thread writing the data could be stuck forever. For DataNode sockets we set a read timeout but not a write timeout. I think we should add a write(data, timeout) method in IOUtils that assumes the underlying FileChannel is non-blocking. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
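A write(data, timeout) built on a non-blocking channel can be sketched as below. This is a guess at the general approach, not the attached patch: park in select() for at most the timeout waiting for the peer to drain its receive buffer, and raise SocketTimeoutException if the write stalls.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.WritableByteChannel;

// Sketch of a write-with-timeout over a non-blocking channel; the design
// (select for OP_WRITE, then write what the send buffer accepts) is an
// assumption, not the contents of HADOOP-2346.patch.
final class TimedWriter {
    static <C extends SelectableChannel & WritableByteChannel>
    void write(C ch, ByteBuffer buf, long timeoutMs) throws IOException {
        ch.configureBlocking(false);
        try (Selector sel = Selector.open()) {
            ch.register(sel, SelectionKey.OP_WRITE);
            while (buf.hasRemaining()) {
                if (sel.select(timeoutMs) == 0) { // peer stopped draining data
                    throw new SocketTimeoutException(
                        "write stalled for " + timeoutMs + " ms");
                }
                sel.selectedKeys().clear();
                ch.write(buf); // partial writes are fine; the loop continues
            }
        }
    }
}
```

A DataNode-style sender would call this in place of a plain blocking write, so a client that opens a file and stops reading eventually fails the transfer instead of wedging the serving thread forever.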
[jira] Commented: (HADOOP-2555) Refactor the HTable#get and HTable#getRow methods to avoid repetition of retry-on-failure logic
[ https://issues.apache.org/jira/browse/HADOOP-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558191#action_12558191 ] Bryan Duxbury commented on HADOOP-2555: --- Generally, I love this patch. It'll nicely reduce the amount of copy-paste we have. I'd like to maybe take it one step further, though. Instead of the existing callServerWithRetries, how about something like: {code} protected abstract class ServerCallable<T> implements Callable<T> { HRegionLocation location; HRegionInterface server; Text row; protected ServerCallable(Text row){ this.row = row; } void instantiateServer(boolean reload) throws IOException { if (reload) { tableServers = connection.reloadTableServers(tableName); } location = getRegionLocation(row); server = connection.getHRegionConnection(location.getServerAddress()); } } protected <T> T getRegionServerWithRetries(ServerCallable<T> callable) throws IOException, UnexpectedCallableException { for (int tries = 0; tries < numRetries; tries++) { try { callable.instantiateServer(tries == 0); return callable.call(); } catch (IOException e) { if (e instanceof RemoteException) { e = RemoteExceptionHandler.decodeRemoteException((RemoteException) e); } if (tries == numRetries - 1) { throw e; } if (LOG.isDebugEnabled()) { LOG.debug("reloading table servers because: " + e.getMessage()); } } catch (Exception e) { throw new UnexpectedCallableException(e); } try { Thread.sleep(pause); } catch (InterruptedException e) { // continue } } return null; } {code} which takes us from {code} value = this.callServerWithRetries(new Callable<MapWritable>() { public MapWritable call() throws IOException { HRegionLocation r = getRegionLocation(row); HRegionInterface server = connection.getHRegionConnection(r.getServerAddress()); return server.getRow(r.getRegionInfo().getRegionName(), row, ts); } }); {code} to {code} value = this.callServerWithRetries(new ServerCallable<MapWritable>(row) { public MapWritable call() throws IOException { return server.getRow(location.getRegionInfo().getRegionName(), row, ts); } }); {code} This would save a few lines of code inside each internal block, move a little more logic into the helper method, and generally jive better with the way my HADOOP-2443 patch is going to need to work in the near future. Comments? Refactor the HTable#get and HTable#getRow methods to avoid repetition of retry-on-failure logic --- Key: HADOOP-2555 URL: https://issues.apache.org/jira/browse/HADOOP-2555 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: Peter Dolan Priority: Minor Attachments: hadoop-2555.patch The following code is repeated in every one of the HTable#get and HTable#getRow methods: {code:title=HTable.java|borderStyle=solid} MapWritable value = null; for (int tries = 0; tries < numRetries; tries++) { HRegionLocation r = getRegionLocation(row); HRegionInterface server = connection.getHRegionConnection(r.getServerAddress()); try { value = server.getRow(r.getRegionInfo().getRegionName(), row, ts); // This is the only line of code that changes significantly between methods break; } catch (IOException e) { if (e instanceof RemoteException) { e = RemoteExceptionHandler.decodeRemoteException((RemoteException) e); } if (tries == numRetries - 1) { // No more tries throw e; } if (LOG.isDebugEnabled()) { LOG.debug("reloading table servers because: " + e.getMessage()); } tableServers = connection.reloadTableServers(tableName); } try { Thread.sleep(this.pause); } catch (InterruptedException x) { // continue } } {code} This should be factored out into a protected method that handles retry-on-failure logic to facilitate more robust testing and the development of new API methods. 
Proposed modification: // Execute the provided Callable against the server protected <T> T callServerWithRetries(Callable<T> callable) throws RemoteException; The above code could then be reduced to: {code:title=HTable.java|borderStyle=solid} MapWritable value = null; final connection; try { value = callServerWithRetries(new Callable<MapWritable>() { HRegionLocation r = getRegionLocation(row); HRegionInterface server =
[jira] Commented: (HADOOP-2582) hadoop dfs -copyToLocal creates zero byte files when the source file does not exist
[ https://issues.apache.org/jira/browse/HADOOP-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558192#action_12558192 ] Raghu Angadi commented on HADOOP-2582: -- +1. looks good. hadoop dfs -copyToLocal creates zero byte files when the source file does not exist -- Key: HADOOP-2582 URL: https://issues.apache.org/jira/browse/HADOOP-2582 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.15.2 Reporter: lohit vijayarenu Attachments: HADOOP_2582_1.patch, HADOOP_2582_2.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2464) Test permissions related shell commands with DFS
[ https://issues.apache.org/jira/browse/HADOOP-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated HADOOP-2464: - Resolution: Fixed Status: Resolved (was: Patch Available) I just committed this. Test permissions related shell commands with DFS Key: HADOOP-2464 URL: https://issues.apache.org/jira/browse/HADOOP-2464 Project: Hadoop Issue Type: Improvement Components: dfs Affects Versions: 0.16.0 Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.16.0 Attachments: HADOOP-2464.patch, HADOOP-2464.patch, HADOOP-2464.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2582) hadoop dfs -copyToLocal creates zero byte files when the source file does not exist
[ https://issues.apache.org/jira/browse/HADOOP-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu updated HADOOP-2582: - Status: Patch Available (was: Open) thanks raghu. making it PA hadoop dfs -copyToLocal creates zero byte files when the source file does not exist -- Key: HADOOP-2582 URL: https://issues.apache.org/jira/browse/HADOOP-2582 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.15.2 Reporter: lohit vijayarenu Attachments: HADOOP_2582_1.patch, HADOOP_2582_2.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2555) Refactor the HTable#get and HTable#getRow methods to avoid repetition of retry-on-failure logic
[ https://issues.apache.org/jira/browse/HADOOP-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558196#action_12558196 ] stack commented on HADOOP-2555: --- Patch looks great. Don't include the patch for CHANGES.txt (though I think the notes for new contribs recommend it). Too often, it's the reason a patch fails to apply up on hudson. If UnexpectedCallableException, you log it with method args -- that's an improvement -- but you don't rethrow the cause; rather you return null. Wouldn't letting the original exception out be better? P.S. I like Bryan's suggestion. Refactor the HTable#get and HTable#getRow methods to avoid repetition of retry-on-failure logic --- Key: HADOOP-2555 URL: https://issues.apache.org/jira/browse/HADOOP-2555 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: Peter Dolan Priority: Minor Attachments: hadoop-2555.patch The following code is repeated in every one of the HTable#get and HTable#getRow methods: {code:title=HTable.java|borderStyle=solid} MapWritable value = null; for (int tries = 0; tries < numRetries; tries++) { HRegionLocation r = getRegionLocation(row); HRegionInterface server = connection.getHRegionConnection(r.getServerAddress()); try { value = server.getRow(r.getRegionInfo().getRegionName(), row, ts); // This is the only line of code that changes significantly between methods break; } catch (IOException e) { if (e instanceof RemoteException) { e = RemoteExceptionHandler.decodeRemoteException((RemoteException) e); } if (tries == numRetries - 1) { // No more tries throw e; } if (LOG.isDebugEnabled()) { LOG.debug("reloading table servers because: " + e.getMessage()); } tableServers = connection.reloadTableServers(tableName); } try { Thread.sleep(this.pause); } catch (InterruptedException x) { // continue } } {code} This should be factored out into a protected method that handles retry-on-failure logic to facilitate more robust testing and the development of new API methods. Proposed modification: // Execute the provided Callable against the server protected <T> T callServerWithRetries(Callable<T> callable) throws RemoteException; The above code could then be reduced to: {code:title=HTable.java|borderStyle=solid} MapWritable value = null; final connection; try { value = callServerWithRetries(new Callable<MapWritable>() { HRegionLocation r = getRegionLocation(row); HRegionInterface server = connection.getHRegionConnection(r.getServerAddress()); server.getRow(r.getRegionInfo().getRegionName(), row, ts); }); } catch (RemoteException e) { // handle unrecoverable remote exceptions } {code} This would greatly ease the development of new API methods by reducing the amount of code needed to implement a new method and reducing the amount of logic that needs to be tested per method. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
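Stripped of the HBase specifics, the retry wrapper under discussion can be written as a small self-contained class. The method and field names follow the proposal above, but the behavior is simplified for illustration: the RemoteException decoding and table-server reload are omitted.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Simplified sketch of the proposed callServerWithRetries helper: run the
// callable, retrying on IOException up to numRetries times (numRetries >= 1),
// and rethrow the last failure once the retry budget is exhausted.
class RetryingCaller {
    private final int numRetries;
    private final long pauseMs;

    RetryingCaller(int numRetries, long pauseMs) {
        this.numRetries = numRetries;
        this.pauseMs = pauseMs;
    }

    <T> T callServerWithRetries(Callable<T> callable) throws Exception {
        IOException last = null;
        for (int tries = 0; tries < numRetries; tries++) {
            try {
                return callable.call();
            } catch (IOException e) {
                last = e; // e.g. region moved; the real code reloads table servers here
                if (tries < numRetries - 1) {
                    Thread.sleep(pauseMs);
                }
            }
        }
        throw last;
    }
}
```

A callable that fails twice and then succeeds returns its value on the third attempt; only after all attempts fail does the caller see the last IOException.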
[jira] Commented: (HADOOP-2584) Web UI displays an IOException instead of the Tables
[ https://issues.apache.org/jira/browse/HADOOP-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558197#action_12558197 ] stack commented on HADOOP-2584: --- Any other context in the master logs that you might think of use, Lars? (Are you running w/ DEBUG enabled? If not, see the hbase FAQ for how.) Web UI displays an IOException instead of the Tables Key: HADOOP-2584 URL: https://issues.apache.org/jira/browse/HADOOP-2584 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.15.2 Reporter: Lars George For me, after every second restart I get an error when loading the Hbase UI. Here is the page: Master: 192.168.105.11:6 HQL, Local logs, Thread Dump, Log Level __ Master Attributes Attribute Name Value Description Filesystem lv1-xen-pdc-2.worldlingo.com:9000 Filesystem hbase is running on Hbase Root Directory /hbase Location of hbase home directory Online META Regions Name Server -ROOT- 192.168.105.31:60020 .META.,,1 192.168.105.39:60020 Tables error msg : java.io.IOException: java.io.IOException: HStoreScanner failed construction at org.apache.hadoop.hbase.HStore$StoreFileScanner.<init>(HStore.java:1879) at org.apache.hadoop.hbase.HStore$HStoreScanner.<init>(HStore.java:2000) at org.apache.hadoop.hbase.HStore.getScanner(HStore.java:1822) at org.apache.hadoop.hbase.HRegion$HScanner.<init>(HRegion.java:1543) at org.apache.hadoop.hbase.HRegion.getScanner(HRegion.java:1118) at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1465) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:401) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892) Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File does not exist: 
/hbase/hregion_1028785192/info/mapfiles/6628785818889695133/data at org.apache.hadoop.dfs.FSDirectory.getFileInfo(FSDirectory.java:489) at org.apache.hadoop.dfs.FSNamesystem.getFileInfo(FSNamesystem.java:1380) at org.apache.hadoop.dfs.NameNode.getFileInfo(NameNode.java:425) at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at