JvmMetrics
Hi, I ran into a use case where I need to keep two contexts for metrics: one being Ganglia and the other a file context (to do offline metrics analysis). I altered JvmMetrics to allow the user to supply a context instead of it fetching one by name, and altered the file context so it can timestamp metrics collections (like log4j does). Would be glad to submit a patch if anyone is interested. Regards
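For context, the stock file context is wired up in conf/hadoop-metrics.properties; a minimal sketch of pointing a metrics record group at FileContext (the file path and period below are placeholder values, not from the original mail):

```properties
# Send JVM metrics to a flat file for offline analysis.
# fileName and period here are example values.
jvm.class=org.apache.hadoop.metrics.file.FileContext
jvm.fileName=/var/log/hadoop/jvm-metrics.log
jvm.period=10
```

The limitation the patch addresses is that JvmMetrics itself only looks up a single context by name from this file, so sending the same records to Ganglia and a file simultaneously needs a code change.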
Java RMI and Hadoop RecordIO
Hi, I've been testing some different serialization techniques to go along with a research project. I know the motivation behind Hadoop's serialization mechanism (e.g. Writable), and that the enhancement of this feature through Record I/O is not only performance but also control of the input/output. Still, I've been running some simple tests and I've found that plain RMI beats Hadoop RecordIO almost every time (14-16% faster). In my test I have a simple Java class that has 14 int fields and 1 long field, and I'm serializing around 35000 instances. Am I doing anything wrong? Are there ways to improve performance in RecordIO? Have I got the use case wrong? Regards, David Alves
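This is not David's actual benchmark, but a self-contained, JDK-only sketch of the two wire formats in play: a hand-rolled fixed binary layout (roughly what Writable/RecordIO binary serialization emits) versus default Java serialization (the mechanism RMI is built on). The class and field names are illustrative; the 14 int fields are modelled as an int[14] to keep the sketch short.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationSketch {

    // Stand-in for the record described in the mail: 14 int fields and 1 long.
    static class Rec implements Serializable {
        int[] ints = new int[14];
        long stamp = 42L;
    }

    // Hand-rolled binary layout, roughly what Writable/RecordIO binary
    // serialization emits: no class metadata, just the field bytes.
    static byte[] writeBinary(Rec r) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            for (int v : r.ints) out.writeInt(v);
            out.writeLong(r.stamp);
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Default Java serialization: pays for a class descriptor up front,
    // then streams the fields.
    static byte[] writeJdk(Rec r) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(r);
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("binary bytes: " + writeBinary(new Rec()).length); // 14*4 + 8 = 64
        System.out.println("jdk bytes:    " + writeJdk(new Rec()).length);
    }
}
```

Note that when 35000 instances go over one ObjectOutputStream, the class-descriptor cost is amortized across the stream, so any timing gap is more likely down to per-object overheads (handles, reflection) than the one-off descriptor, which is worth accounting for in the benchmark.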
Re: Map merge part makes the task timeout
Hi again. Browsing the source code (Merger.class) I see that the merger actually calls reporter.progress(), so shouldn't this make the task be reported as still working? Regards, David Alves

On Nov 20, 2008, at 6:29 PM, David Alves wrote: Hi all, I have a big map task that takes a long time to complete and produces a lot of information. The merge part makes the task time out (you can see this at the end of this email; the merge part is aborted after ten minutes, the default time). I've increased the mapred.tasks.timeout property to 30 min instead of 10, in hadoop-site.xml as follows:

<property>
  <name>mapred.task.timeout</name>
  <value>180</value>
</property>

But still the task fails with: Task attempt_200811201704_0001_m_00_0 failed to report status for 603 seconds. Killing! Is there any other property I should change? Regards, David Alves

17:35:40,697 INFO [MapTask] Starting flush of map output
17:35:40,697 INFO [MapTask] bufstart = 17472260; bufend = 51354916; bufvoid = 99614720
17:35:40,697 INFO [MapTask] kvstart = 39119; kvend = 39468; length = 327680
17:35:40,950 INFO [MapTask] Index: (0, 33884416, 33884416)
17:35:40,950 INFO [MapTask] Finished spill 152
17:35:45,333 INFO [Merger] Merging 153 sorted segments
17:35:46,337 INFO [Merger] Merging 9 intermediate segments out of a total of 153
17:36:16,849 INFO [Merger] Merging 10 intermediate segments out of a total of 145
17:36:47,615 INFO [Merger] Merging 10 intermediate segments out of a total of 136
17:37:21,529 INFO [Merger] Merging 10 intermediate segments out of a total of 127
17:37:59,883 INFO [Merger] Merging 10 intermediate segments out of a total of 118
17:38:35,370 INFO [Merger] Merging 10 intermediate segments out of a total of 109
17:39:14,795 INFO [Merger] Merging 10 intermediate segments out of a total of 100
17:39:51,787 INFO [Merger] Merging 10 intermediate segments out of a total of 91
17:40:28,721 INFO [Merger] Merging 10 intermediate segments out of a total of 82
17:41:05,650 INFO [Merger] Merging 10 intermediate segments out of a total of 73
17:41:43,285 INFO [Merger] Merging 10 intermediate segments out of a total of 64
17:42:23,531 INFO [Merger] Merging 10 intermediate segments out of a total of 55
17:43:01,709 INFO [Merger] Merging 10 intermediate segments out of a total of 46
17:43:40,209 INFO [Merger] Merging 10 intermediate segments out of a total of 37
17:44:20,707 INFO [Merger] Merging 10 intermediate segments out of a total of 28
17:44:57,700 INFO [Merger] Merging 10 intermediate segments out of a total of 19
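Two details worth flagging about the snippet in the quoted mail: the property name is mapred.task.timeout (singular "task", matching the XML, not the "mapred.tasks.timeout" mentioned in the prose, and a misspelled name is silently ignored), and its value is in milliseconds, so 30 minutes is 1800000, not 180. A corrected fragment for hadoop-site.xml would look like:

```xml
<!-- Task timeout is specified in milliseconds. -->
<property>
  <name>mapred.task.timeout</name>
  <value>1800000</value> <!-- 30 minutes -->
</property>
```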
Again UnknownScannerException
Hi, I've seen this issue a lot on the mailing list, but I still have a doubt. My map tasks keep failing with UnknownScannerException (2 map tasks on the same node over a 3-node cluster with 4 GB mem, scanning almost 40 GB of data, running Hadoop 0.18.0 and HBase 0.18.0). This happened in the past too, but since it occurred less than 50% of the time an M/R task rarely failed completely; as the data increased, the USE now completely prevents the maps from running to completion. I'm only scanning the table; there are no inserts at the same time. I've previously seen a lease period mentioned that I could increase. Is this the hbase.regionserver.lease.period property? Should I upgrade to HBase 0.18.1, and if so must I also update Hadoop? Regards, David Alves
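For reference, hbase.regionserver.lease.period is indeed the scanner-lease knob, set in hbase-site.xml and expressed in milliseconds; a sketch of raising it (the 120000 value below is an example, not a recommendation from the thread):

```xml
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>120000</value> <!-- 2 minutes, in milliseconds -->
</property>
```

The lease expires when a client takes longer than this between next() calls on an open scanner, which is exactly what a slow map task does.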
Map merge part makes the task timeout
Hi all, I have a big map task that takes a long time to complete and produces a lot of information. The merge part makes the task time out (you can see this at the end of this email; the merge part is aborted after ten minutes, the default time). I've increased the mapred.tasks.timeout property to 30 min instead of 10, in hadoop-site.xml as follows:

<property>
  <name>mapred.task.timeout</name>
  <value>180</value>
</property>

But still the task fails with: Task attempt_200811201704_0001_m_00_0 failed to report status for 603 seconds. Killing! Is there any other property I should change? Regards, David Alves

17:35:40,697 INFO [MapTask] Starting flush of map output
17:35:40,697 INFO [MapTask] bufstart = 17472260; bufend = 51354916; bufvoid = 99614720
17:35:40,697 INFO [MapTask] kvstart = 39119; kvend = 39468; length = 327680
17:35:40,950 INFO [MapTask] Index: (0, 33884416, 33884416)
17:35:40,950 INFO [MapTask] Finished spill 152
17:35:45,333 INFO [Merger] Merging 153 sorted segments
17:35:46,337 INFO [Merger] Merging 9 intermediate segments out of a total of 153
17:36:16,849 INFO [Merger] Merging 10 intermediate segments out of a total of 145
17:36:47,615 INFO [Merger] Merging 10 intermediate segments out of a total of 136
17:37:21,529 INFO [Merger] Merging 10 intermediate segments out of a total of 127
17:37:59,883 INFO [Merger] Merging 10 intermediate segments out of a total of 118
17:38:35,370 INFO [Merger] Merging 10 intermediate segments out of a total of 109
17:39:14,795 INFO [Merger] Merging 10 intermediate segments out of a total of 100
17:39:51,787 INFO [Merger] Merging 10 intermediate segments out of a total of 91
17:40:28,721 INFO [Merger] Merging 10 intermediate segments out of a total of 82
17:41:05,650 INFO [Merger] Merging 10 intermediate segments out of a total of 73
17:41:43,285 INFO [Merger] Merging 10 intermediate segments out of a total of 64
17:42:23,531 INFO [Merger] Merging 10 intermediate segments out of a total of 55
17:43:01,709 INFO [Merger] Merging 10 intermediate segments out of a total of 46
17:43:40,209 INFO [Merger] Merging 10 intermediate segments out of a total of 37
17:44:20,707 INFO [Merger] Merging 10 intermediate segments out of a total of 28
17:44:57,700 INFO [Merger] Merging 10 intermediate segments out of a total of 19
Full table scan fails during map
Hi guys, we've got HBase (0.18.0, r695089) and Hadoop (0.18.0, r686010) running for a while, and apart from the occasional regionserver stopping without notice (and without explanation from what we can see in the logs), a problem that we solve easily just by restarting it, we have now come to face a more serious problem of what I think is data loss. We use HBase as a links and documents database (similar to Nutch) in a 3-node cluster (4 GB mem on each node); the links database has 4 regions and the documents database now has 200 regions, for a total of 216 (with meta and root). After the crawl task, which went ok (we now have 60 GB/300 GB full in HDFS), we proceeded to do a full table scan to create the indexes and that's where things started to fail. We are seeing a problem in the logs (at the end of this email). This repeats until there's a RetriesExhaustedException and the task fails in the map phase. Hadoop's fsck tool tells us that HDFS is ok. I'm still to explore the rest of the logs searching for some kind of error; I will post a new mail if I find anything. Any help would be greatly appreciated.
Regards, David Alves

2008-11-19 19:47:52,664 DEBUG org.apache.hadoop.dfs.DFSClient: DataStreamer block blk_-4521866854383825816_55401 wrote packet seqno:0 size:38 offsetInBlock:0 lastPacketInBlock:true
2008-11-19 19:47:52,676 DEBUG org.apache.hadoop.dfs.DFSClient: DFSClient received ack for seqno 0
2008-11-19 19:47:52,676 DEBUG org.apache.hadoop.dfs.DFSClient: Closing old block blk_-4521866854383825816_55401
2008-11-19 19:47:52,769 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Added /hbase/links/1617869663/docDatum/mapfiles/7718188406431341070 with 20622 entries, sequence id 5289673, data size 5.6m, file size 6.0m
2008-11-19 19:47:52,770 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region links,ext://myrepo/mypath/MYDOC.pdf,1227122254743 in 3015ms, sequence id=5289673, compaction requested=false
2008-11-19 19:53:17,524 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening scanner (fsOk: true)
java.io.IOException: HStoreScanner failed construction
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.<init>(StoreFileScanner.java:70)
    at org.apache.hadoop.hbase.regionserver.HStoreScanner.<init>(HStoreScanner.java:68)
    at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1916)
    at org.apache.hadoop.hbase.regionserver.HRegion$HScanner.<init>(HRegion.java:1954)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1345)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1170)
    at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:554)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://cyclops-prod-1:9000/hbase/document/153945136/docDatum/mapfiles/5163556575658593611/data
    at org.apache.hadoop.dfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:394)
    at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:695)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1414)
    at org.apache.hadoop.io.MapFile$Reader.createDataFileReader(MapFile.java:301)
    at org.apache.hadoop.hbase.regionserver.HStoreFile$HbaseMapFile$HbaseReader.createDataFileReader(HStoreFile.java:650)
    at org.apache.hadoop.io.MapFile$Reader.open(MapFile.java:283)
    at org.apache.hadoop.hbase.regionserver.HStoreFile$HbaseMapFile$HbaseReader.<init>(HStoreFile.java:632)
    at org.apache.hadoop.hbase.regionserver.HStoreFile$BloomFilterMapFile$Reader.<init>(HStoreFile.java:714)
    at org.apache.hadoop.hbase.regionserver.HStoreFile$HalfMapFileReader.<init>(HStoreFile.java:908)
    at org.apache.hadoop.hbase.regionserver.HStoreFile.getReader(HStoreFile.java:408)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.openReaders(StoreFileScanner.java:96)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.<init>(StoreFileScanner.java:67)
    ... 10 more
Compound filters
Hi guys, I currently need to build some compound filters for column values (I need OR, but it could easily be extended to AND and NOT, grouped in any way) comparing byte[] values. The objective is to filter the dataset that is fed into M/R jobs. Would this be an interesting feature, or is it already planned/implemented in some way that I don't know about? I'm thinking that with this functionality RegExpRowFilter could just focus on matching row keys and leave column matching to these filters. Regards, David
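The composition being proposed can be sketched with a minimal, self-contained example; the ValueFilter interface and method names below are hypothetical (HBase's actual filter API is different), this only illustrates how OR/AND/NOT combinators over byte[] predicates would nest:

```java
import java.util.Arrays;
import java.util.List;

public class CompoundFilterSketch {

    // Hypothetical filter interface over raw column values.
    interface ValueFilter {
        boolean matches(byte[] value);
    }

    // Leaf filter: passes when the value equals a fixed byte pattern.
    static ValueFilter equalsFilter(byte[] expected) {
        return value -> Arrays.equals(value, expected);
    }

    // OR-composition; AND and NOT would compose the same way, and the
    // combinators can be nested to build arbitrary boolean trees.
    static ValueFilter or(List<ValueFilter> parts) {
        return value -> {
            for (ValueFilter f : parts) {
                if (f.matches(value)) return true;
            }
            return false;
        };
    }

    public static void main(String[] args) {
        ValueFilter f = or(Arrays.asList(
                equalsFilter("foo".getBytes()),
                equalsFilter("bar".getBytes())));
        System.out.println(f.matches("bar".getBytes())); // true
        System.out.println(f.matches("baz".getBytes())); // false
    }
}
```

The appeal of this design is exactly what the mail suggests: row-key matching and column-value matching become orthogonal, so RegExpRowFilter no longer has to do both jobs.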
NotServingRegionException revisited
Hi guys, I have found what I think is a strange case. Last Friday an M/R task failed constantly (if a task fails for some reason it is later rerun a number of times to make sure service outages won't stop the process) with NotServingRegionException. The thing here is that that particular region is ONLINE (at least that's what I can tell from a select * from .META.), it is not a split, and it is not retiring (no retiring info in the logs). It is not an occasional thing, because the task keeps failing (even after a cluster restart). So how can an ONLINE region (as reported by a .META. scanner) not be in the onlineRegions map in HRegionServer? Any ideas? Regards, David Alves

Partial logs/info (this keeps appearing so only one result is shown):

Master:
2008-04-28 18:44:59,235 DEBUG org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner regioninfo: {regionname: cyclops-documents-database,,1209061263654, startKey: , endKey: smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/DEVELOPING INTRANET APPLICATIONS WITH JAVA/ch5.htm, encodedName: 485063880, tableDesc: {name: cyclops-documents-database, families: {documentDbContent:={name: documentDbContent, max versions: 3, compression: NONE, in memory: false, block cache enabled: false, max length: 2147483647, bloom filter: none}, documentDbCrawlDatum:={name: documentDbCrawlDatum, max versions: 3, compression: NONE, in memory: false, block cache enabled: false, max length: 2147483647, bloom filter: none}, documentDbMetadata:={name: documentDbMetadata, max versions: 3, compression: NONE, in memory: false, block cache enabled: false, max length: 2147483647, bloom filter: none}, documentDbRepoDatum:={name: documentDbRepoDatum, max versions: 3, compression: NONE, in memory: false, block cache enabled: false, max length: 2147483647, bloom filter: none, server: 10.0.0.1:60020, startCode: 1209390438896

Region Server:
2008-04-28 18:45:29,028 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 60020, call batchUpdate(cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/DEVELOPING INTRANET APPLICATIONS WITH JAVA/ch5.htm,1209061263655, [EMAIL PROTECTED]) from 10.0.0.2:47636: error: org.apache.hadoop.hbase.NotServingRegionException: cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/DEVELOPING INTRANET APPLICATIONS WITH JAVA/ch5.htm,1209061263655
org.apache.hadoop.hbase.NotServingRegionException: cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/DEVELOPING INTRANET APPLICATIONS WITH JAVA/ch5.htm,1209061263655
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:1318)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:1280)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdate(HRegionServer.java:1098)
    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:413)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
RE: NotServingRegionException revisited
Hi again. After going through the logs a bit more carefully I found an FNFE while trying to do a compaction on that particular region; the relevant log follows attached. After the compaction failed because of the FNFE, the region is still online in .META. but no longer among the online regions in the region server, which I suspect causes my problem, right? Regards, David Alves

-----Original Message-----
From: David Alves [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 28, 2008 6:31 PM
To: hbase-user@hadoop.apache.org
Subject: NotServingRegionException revisited

Hi guys, I have found what I think is a strange case. Last Friday an M/R task failed constantly (if a task fails for some reason it is later rerun a number of times to make sure service outages won't stop the process) with NotServingRegionException. The thing here is that that particular region is ONLINE (at least that's what I can tell from a select * from .META.), it is not a split, and it is not retiring (no retiring info in the logs). It is not an occasional thing, because the task keeps failing (even after a cluster restart). So how can an ONLINE region (as reported by a .META. scanner) not be in the onlineRegions map in HRegionServer? Any ideas?

Regards, David Alves

Partial logs/info (this keeps appearing so only one result is shown):

Master:
2008-04-28 18:44:59,235 DEBUG org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner regioninfo: {regionname: cyclops-documents-database,,1209061263654, startKey: , endKey: smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/DEVELOPING INTRANET APPLICATIONS WITH JAVA/ch5.htm, encodedName: 485063880, tableDesc: {name: cyclops-documents-database, families: {documentDbContent:={name: documentDbContent, max versions: 3, compression: NONE, in memory: false, block cache enabled: false, max length: 2147483647, bloom filter: none}, documentDbCrawlDatum:={name: documentDbCrawlDatum, max versions: 3, compression: NONE, in memory: false, block cache enabled: false, max length: 2147483647, bloom filter: none}, documentDbMetadata:={name: documentDbMetadata, max versions: 3, compression: NONE, in memory: false, block cache enabled: false, max length: 2147483647, bloom filter: none}, documentDbRepoDatum:={name: documentDbRepoDatum, max versions: 3, compression: NONE, in memory: false, block cache enabled: false, max length: 2147483647, bloom filter: none, server: 10.0.0.1:60020, startCode: 1209390438896

Region Server:
2008-04-28 18:45:29,028 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 60020, call batchUpdate(cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/DEVELOPING INTRANET APPLICATIONS WITH JAVA/ch5.htm,1209061263655, [EMAIL PROTECTED]) from 10.0.0.2:47636: error: org.apache.hadoop.hbase.NotServingRegionException: cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/DEVELOPING INTRANET APPLICATIONS WITH JAVA/ch5.htm,1209061263655
org.apache.hadoop.hbase.NotServingRegionException: cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/DEVELOPING INTRANET APPLICATIONS WITH JAVA/ch5.htm,1209061263655
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:1318)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:1280)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdate(HRegionServer.java:1098)
    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:413)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
Lost Rows
Hi guys, regarding my previous problems I'm glad to say that I can now crawl an entire repository with only a small percentage of failed tasks; the latest HBase version plus the correction of the replication property seemed to solve it for me. Still, I have two issues I'd appreciate your input on. The first one regards splits. I've made a small tool (built upon stack's one) that checks DB state and can online/offline tables, merge regions, etc. This tool gives me the report at the end of this email. The question here is that I seem to have lost 144 rows (comparing the output format's output records and the actual rows in the table from a select count(*)). I suspect these rows are in the offline splits. Can I use my tool to merge the splits against their online parents using HRegion.merge()? Or is that a big no-no? The second issue is more problematic: I misconfigured my last job and it ran 10 maps instead of the 1 it should, and under that kind of load HBase completely failed, regionservers went down; one time I had to completely erase the database because it wouldn't start again (I suspect .META. was offline), the other time I was able to recover all the data by simply restarting it. Is there any kind of procedure I should use in this situation?
Best Regards, David Alves

Log Trace:
Found region: cyclops-documents-database,,1208892792201
  Id: 1208892792201
  Start Key:
  End Key: smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/HOW TO USE HTML 3.2/ch6.htm
  Online/Offline Status: ONLINE  Split?: FALSE
Found region: cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/HOW TO USE HTML 3.2/ch6.htm,1208892792202
  Id: 1208892792202
  Start Key: smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/HOW TO USE HTML 3.2/ch6.htm
  End Key: smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/LINUX SYSTEM ADMINISTRATOR'S SURVIVAL GUIDE TABLE OF CONTENTS/lsg14.htm
  Online/Offline Status: ONLINE  Split?: FALSE
DEBUG 23-04 14:54:50,744 (DFSClient.java:readChunk:934) - DFSClient readChunk got seqno 2 offsetInBlock 8192 lastPacketInBlock false packetLen 4132
Found region: cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/LINUX SYSTEM ADMINISTRATOR'S SURVIVAL GUIDE TABLE OF CONTENTS/lsg14.htm,1208891918491
  Id: 1208891918491
  Start Key: smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/LINUX SYSTEM ADMINISTRATOR'S SURVIVAL GUIDE TABLE OF CONTENTS/lsg14.htm
  End Key: smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/SPECIAL EDITION USING MICROSOFT BACKOFFICE, VOLUME 1/ch05/06.htm
  Online/Offline Status: OFFLINE  Split?: TRUE
Found region: cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/LINUX SYSTEM ADMINISTRATOR'S SURVIVAL GUIDE TABLE OF CONTENTS/lsg14.htm,1208893494772
  Id: 1208893494772
  Start Key: smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/LINUX SYSTEM ADMINISTRATOR'S SURVIVAL GUIDE TABLE OF CONTENTS/lsg14.htm
  End Key: smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/Platinium Edition Using VB 5/Books/Platinium Edition Using VB 5/ch14/09.htm
  Online/Offline Status: ONLINE  Split?: FALSE
DEBUG 23-04 14:54:50,754 (DFSClient.java:readChunk:934) - DFSClient readChunk got seqno 3 offsetInBlock 12288 lastPacketInBlock false packetLen 4132
Found region: cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/Platinium Edition Using VB 5/Books/Platinium Edition Using VB 5/ch14/09.htm,1208893494773
  Id: 1208893494773
  Start Key: smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/Platinium Edition Using VB 5/Books/Platinium Edition Using VB 5/ch14/09.htm
  End Key: smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/SPECIAL EDITION USING MICROSOFT BACKOFFICE, VOLUME 1/ch05/06.htm
  Online/Offline Status: OFFLINE  Split?: TRUE
Found region: cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/Platinium Edition Using VB 5/Books/Platinium Edition Using VB 5/ch14/09.htm,1208894034845
  Id: 1208894034845
  Start Key: smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/Platinium Edition Using VB 5/Books/Platinium Edition Using VB 5/ch14/09.htm
  End Key: smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/Platinium
Re: Lost Rows
On Wed, 2008-04-23 at 11:22 -0700, stack wrote:
> Here's a few things David. Regards your tool, you could have just done 'select info:regioninfo from .META.;' and it would output the same data (if you did something like echo 'select info:regioninfo from .META.;' | ./bin/hbase shell --html > /tmp/meta.html, the output would be html'ized and easier to read than an ascii table).

Regarding the tool: well, I knew that I could just do a select * from .META. to get the info; the question was that I needed to do stuff on the regions based on their state. Besides, now I use a socks proxy with my tool that allows me to check on the cluster from my laptop :) The output you saw was from logs; I actually pretty-print the info in my application (both by web and console). As HQL will be deprecated anyhow, it seemed a good idea.

> If you want to do merging of regions, check out the main on org.apache.hadoop.hbase.util.Merge.

Will check it out.

> Regards offline regions, looking at your report below, all offlined regions look legit. Their online status is offline but they also have the split attribute set (on split, the parent is offlined. The daughter regions take its place. The parent hangs around until such time as the daughters no longer hold reference to the parent. Then the parent is deleted).

Ok.

> Regards the 144 missing rows, is it possible you fed your map task duplicates? The duplicates would increment the map count of inputs processed but reduce would squash the duplicates together and output a single row. If you don't have that many rows, perhaps output inputs and outputs and try to figure where the 144 are going missing?

The missing rows were counted from TableOutputFormat reduce output records (from the M/R job) and matched against a select count(*), so even if the maps were fed duplicates there are still missing rows.

> Regards hbase buckling under load, please send us logs. If you are using TRUNK, it should be able to easily carry ten concurrent clients and where it can't, it puts up a gate to block updates. It shouldn't be falling over.

Well, one of the times (the one I could recover from) I saw a lot of NotServingRegionException in the logs, which I think falls into the graceful-failure category you mentioned; the other time all hell broke loose (like EOFExceptions reading from .META.), but I still saw a thread dump in the logs so maybe it just OOMEd out. I will send the relevant part of the logs separately because they are quite huge. On another matter, must hbase really log (even in debug) all filter calls? That stands for about 70% of my logs.

Best Regards, David

> Thanks D, St.Ack

David Alves wrote: Hi guys, regarding my previous problems I'm glad to say that I can now crawl an entire repository with only a small percentage of failed tasks; the latest HBase version plus the correction of the replication property seemed to solve it for me. Still, I have two issues I'd appreciate your input on. The first one regards splits. I've made a small tool (built upon stack's one) that checks DB state and can online/offline tables, merge regions, etc. This tool gives me the report at the end of this email. The question here is that I seem to have lost 144 rows (comparing the output format's output records and the actual rows in the table from a select count(*)). I suspect these rows are in the offline splits. Can I use my tool to merge the splits against their online parents using HRegion.merge()? Or is that a big no-no? The second issue is more problematic: I misconfigured my last job and it ran 10 maps instead of the 1 it should, and under that kind of load HBase completely failed, regionservers went down; one time I had to completely erase the database because it wouldn't start again (I suspect .META. was offline), the other time I was able to recover all the data by simply restarting it. Is there any kind of procedure I should use in this situation?

Best Regards, David Alves

Log Trace: Found region: cyclops-documents-database,,1208892792201 Id: 1208892792201 Start Key: End Key: smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/HOW TO USE HTML 3.2/ch6.htm Online/Offline Status: ONLINE Split?: FALSE Found region: cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/HOW TO USE HTML 3.2/ch6.htm,1208892792202 Id: 1208892792202 Start Key: smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/HOW TO USE HTML 3.2/ch6.htm End Key: smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/LINUX SYSTEM ADMINISTRATOR'S SURVIVAL GUIDE TABLE OF CONTENTS/lsg14.htm Online/Offline Status: ONLINE Split?: FALSE DEBUG 23-04 14:54:50,744 (DFSClient.java:readChunk:934
RE: Lost Rows
Agreed. Thanks, Jim. On Wed, 2008-04-23 at 12:13 -0700, Jim Kellerman wrote: While log4j supports TRACE, apache commons logging does not, so those trace messages will come out when DEBUG is set. To disable the filter messages, just add the following to your log4j.properties file: log4j.logger.org.apache.hadoop.hbase.filter=INFO --- Jim Kellerman, Senior Engineer; Powerset -----Original Message----- From: Clint Morgan [mailto:[EMAIL PROTECTED]] Sent: Wednesday, April 23, 2008 12:01 PM To: hbase-user@hadoop.apache.org Subject: Re: Lost Rows On Wed, Apr 23, 2008 at 11:58 AM, David Alves [EMAIL PROTECTED] wrote: On another matter, must hbase really log (even in debug) all filter calls? That stands for about 70% of my logs. Agreed, I'll drop those messages to trace.
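Jim's fix, as a log4j.properties fragment (the first line is illustrative context showing the surrounding DEBUG level; only the last line is the suggested addition):

```properties
# Keep HBase at DEBUG overall...
log4j.logger.org.apache.hadoop.hbase=DEBUG
# ...but silence the per-call filter logging.
log4j.logger.org.apache.hadoop.hbase.filter=INFO
```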
Concurrent Modification Exceptions in logs
Hi guys, my NPE problem on online table lookup seems to have gone away (at least for now); I think the cause was different dfs.replication values for Hadoop and HBase (thanks St.Ack for pointing that out). Now I'm just struggling with region offline exceptions :). I'm also seeing some CMEs in the logs. They occurred while I still had mismatched dfs.replication settings between Hadoop and HBase, but I thought you should know. Regards David Alves

Trace:
2008-04-21 13:20:46,443 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Processing message (Retry: 0)
java.io.IOException: java.io.IOException: java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
	at java.util.HashMap$ValueIterator.next(HashMap.java:822)
	at org.apache.hadoop.hbase.master.ServerManager.processMsgs(ServerManager.java:350)
	at org.apache.hadoop.hbase.master.ServerManager.processRegionServerAllsWell(ServerManager.java:299)
	at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:217)
	at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:560)
	at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:413)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
	at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:388)
	at java.lang.Thread.run(Thread.java:619)
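The trace shows the master iterating a HashMap's values while the map is structurally modified. A self-contained sketch of that Java failure mode (the names below are illustrative, not actual HBase code):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Minimal reproduction of the failure mode in the trace above: a HashMap's
// fail-fast iterator throws ConcurrentModificationException when the map is
// structurally modified mid-iteration (e.g. a region being added while the
// master walks the server map).
public class CmeDemo {
    public static boolean triggersCme() {
        Map<String, String> regions = new HashMap<>();
        regions.put("region1", "host1");
        regions.put("region2", "host2");
        try {
            for (Iterator<String> it = regions.values().iterator(); it.hasNext();) {
                it.next();
                regions.put("region3", "host3"); // structural change during iteration
            }
        } catch (ConcurrentModificationException e) {
            return true; // the fail-fast iterator detected the modification
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("caught CME: " + triggersCme());
    }
}
```

The usual fixes are to iterate over a copy of the collection, or to synchronize mutation and iteration on the same lock.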
Make NameNode listen on multiple interfaces
Hi, in my setup I have a cluster in which each server has two network interfaces: one for Hadoop network traffic (let's call it A) and one for traffic to the rest of the network (let's call it B). Until now I only needed the nodes to communicate with the master and vice versa (through the A interface), so no problem there. But now I need to submit jobs and access the filesystem itself from outside machines (through the B interface), so my question is: can I make the NameNode listen on both interfaces? Regards David Alves
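One commonly suggested workaround in Hadoop of this era was to bind the NameNode to the wildcard address, since the bind address is taken from the same properties clients use. This is a sketch only: the property names and value forms below are assumptions that should be checked against your Hadoop version, and with 0.0.0.0 in fs.default.name clients must still reach the NameNode by a routable hostname.

```xml
<!-- hadoop-site.xml sketch: bind NameNode services to all interfaces.
     Verify these property names against your Hadoop version's
     hadoop-default.xml before using. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://0.0.0.0:9000</value>
</property>
<property>
  <name>dfs.http.address</name>
  <value>0.0.0.0:50070</value>
</property>
```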
Strange logging behaviour
Hi again. On another note from the previous email: even though I followed the FAQ, my ..master...log is showing only one error entry and the .out file is showing only INFO entries; is this normal? (I'm logging to a different dir than the default by altering the relevant property in hbase-env.sh.) Regards David Alves
Re: Strange logging behaviour
The NPE problem I'm currently having didn't happen with the released version, but I still got timeouts and offline region problems. So I tried migrating, and I have already migrated most of my code to comply with the new APIs (which are a lot better, by the way; congrats). Because of this I would like to keep using trunk, but I will go back to the released version if you think that is better. Best Regards David On Fri, 2008-04-18 at 10:49 -0700, stack wrote: It didn't? Wasn't it the same issue of table regions being offlined or was it something else? Thanks, St.Ack On Tue, Apr 29, 2008 at 10:28 AM, David Alves [EMAIL PROTECTED] wrote: Hi again. On another note from the previous email: even though I followed the FAQ, my ..master...log is showing only one error entry and the .out file is showing only INFO entries; is this normal? (I'm logging to a different dir than the default by altering the relevant property in hbase-env.sh.) Regards David Alves
Re: Strange logging behaviour
I fully understand your point; I know that trunk is not guaranteed to be stable, and by no means was I expecting it to be. As you can imagine this is not a mission-critical application and it is still in the inception phase. In fact, when I refer to production I mean future production, which for the time being is only available to a limited set of beta users. I rolled HBase trunk onto the production cluster to check whether the timeout and region offline issues would go away, as the servers are better there, but ran into the NPE problem happening every time. Still, I think this must be a relevant problem, so I thought I could get your help debugging/solving it so that both my application and HBase would move forward, I wouldn't need to revert my app to the old APIs, and I would learn a bit more about HBase in the process. Regards David On Fri, 2008-04-18 at 11:21 -0700, stack wrote: TRUNK comes with the usual disclaimer: no guarantees that it's stable. Whereas with releases, if they are not stable, we'll stop work on TRUNK to fix release problems and try and roll a new one quickly. If you're trying to run hbase in a production context, I would suggest you use a release unless there is an explicit feature you need that is only in TRUNK. If logging is not working correctly in TRUNK then it's going to be hard for us to help you out, since you can't pass us detail of sufficient detail (it's broken for you, right?). I was going to look at trying to figure it in a bit St.Ack On Tue, Apr 29, 2008 at 11:01 AM, David Alves [EMAIL PROTECTED] wrote: The NPE problem I'm currently having didn't happen with the released version, but I still got timeouts and offline region problems. So I tried migrating, and I have already migrated most of my code to comply with the new APIs (which are a lot better, by the way; congrats). Because of this I would like to keep using trunk, but I will go back to the released version if you think that is better.
Best Regards David On Fri, 2008-04-18 at 10:49 -0700, stack wrote: It didn't? Wasn't it the same issue of table regions being offlined or was it something else? Thanks, St.Ack On Tue, Apr 29, 2008 at 10:28 AM, David Alves [EMAIL PROTECTED] wrote: Hi again. On another note from the previous email: even though I followed the FAQ, my ..master...log is showing only one error entry and the .out file is showing only INFO entries; is this normal? (I'm logging to a different dir than the default by altering the relevant property in hbase-env.sh.) Regards David Alves
Regions Offline
Hi, my system is quite simple: - two servers (one quad core, one dual core) with 2 GB of memory and 150 GB allocated to DFS. - I use it to crawl multiple sources, mainly filesystems, and save the results into HBase (not too many files, around 100,000, but rows can easily get to 30 MB each). I'm constantly getting NullPointerExceptions (on the client, caused by NotServingRegionExceptions on the regionserver) when creating tables, or RegionOfflineExceptions when doing puts, or sometimes just timeouts. When I started with HBase I developed in 'local' mode; I then migrated to a small 2-server dev cluster (weaker than production is now) where I tested the functionality, and it worked fine, but (my bad, due to pressing scheduling) I didn't do any real load tests, so the system is now continuously going down in production. I've only been able to do a full crawl by resetting the cluster to one node and putting it in 'local' mode. My question is: what can cause regions to be offline in regionservers? I ask so that I can investigate the matter further from a starting point. I'm willing to help any way I can, but I would really appreciate any help and/or starting points and tools for my investigation. Best Regards David Alves
Batch update gain
Hi all, I'm currently rewriting my own TableOutputFormat classes to comply with the new APIs introduced in the latest version, and I was wondering if it would be valuable to rewrite them as buffered writers, i.e., keeping a predetermined set of records (bounded by size, to avoid OOMEs) before committing them to HBase. What are your thoughts about this? On another note, I think it would be valuable to rewrite the TableInputFormat class to be extendable. For example, in my case I needed a filtered (RegExpRowFilter) TableInputFormat and could not extend the original because its instance of HTable is package protected. Best regards David Alves
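The size-bounded buffering idea can be sketched independently of HBase. This is a minimal illustration under assumed names (BufferedCommitter is hypothetical); a real version would issue the HBase batch commit where flush() clears the buffer:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a size-bounded record buffer: records accumulate until adding
// one more would exceed maxBytes, then the whole batch is flushed at once.
public class BufferedCommitter {
    private final long maxBytes;
    private final List<byte[]> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    private int flushes = 0; // stands in for real HBase batch commits

    public BufferedCommitter(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    public void write(byte[] record) {
        if (bufferedBytes + record.length > maxBytes && !buffer.isEmpty()) {
            flush();
        }
        buffer.add(record);
        bufferedBytes += record.length;
    }

    public void flush() {
        // A real implementation would commit every buffered row to HBase here.
        buffer.clear();
        bufferedBytes = 0;
        flushes++;
    }

    public int getFlushes() {
        return flushes;
    }

    public static void main(String[] args) {
        BufferedCommitter c = new BufferedCommitter(100);
        for (int i = 0; i < 10; i++) {
            c.write(new byte[40]); // each record is 40 bytes
        }
        c.flush(); // final flush when the writer is closed
        System.out.println("flushes: " + c.getFlushes());
    }
}
```

Bounding by bytes rather than record count is what avoids the OOME: row sizes here vary from small to 30 MB, so a fixed record count would have unbounded memory use.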
Re: Batch update gain
Hi, yes I was thinking of batch (multiple-row) updates, but only then did I realize that the old commit-with-lock methods were deprecated, so forget I mentioned it. About HBASE-581, I'll drop my comments in JIRA. On another note, in my application I have a region that went offline. Is there a way of making it online again (I restarted the application several times and it didn't help)? Regards David Alves On Tue, 2008-04-15 at 09:09 -0700, stack wrote: David Alves wrote: Hi all, I'm currently rewriting my own TableOutputFormat classes to comply with the new APIs introduced in the latest version, and I was wondering if it would be valuable to rewrite them as buffered writers, i.e., keeping a predetermined set of records (bounded by size, to avoid OOMEs) before committing them to HBase. Commits are by row. Are you talking of batching up rows before forwarding them to hbase? What are your thoughts about this? On another note, I think it would be valuable to rewrite the TableInputFormat class to be extendable. For example, in my case I needed a filtered (RegExpRowFilter) TableInputFormat and could not extend the original because its instance of HTable is package protected. This needs to be done before 0.2.0 release. It's been on my mind. I just made a JIRA for it. Dump any thoughts you have on how it might work into hbase-581. At a minimum, a note on what currently prevents your being able to subclass. If you are currently working on this, I could do the hbase end for you. Just say. St.Ack
RE: StackOverFlow Error in HBase
Hi Jim and all, I'll commit to testing the patch under the same conditions as it failed before (with around 36000 records), but at this precise moment I'm preparing my next development iteration, which means a lot of meetings. By the end of the day tomorrow (Friday) I should have a confirmation of whether the patch worked (or not). Regards David Alves On Thu, 2008-04-03 at 09:12 -0700, Jim Kellerman wrote: David, Have you had a chance to try this patch? We are about to release hbase-0.1.1 and until we receive a confirmation in HBASE-554 from another person who has tried it and verifies that it works, we cannot include it in this release. If it is not in this release, there will be a significant wait for it to appear in an hbase release. hbase-0.1.2 will not happen anytime soon unless there are critical issues that arise that have not been fixed in 0.1.1. hbase-0.2.0 is also some time in the future. There are a significant number of issues to address before that release is ready. Frankly, I'd like to see this patch in 0.1.1, because it is an issue for people that use filters. The alternative would be for Clint to supply a test case that fails without the patch but passes with the patch. We will hold up the release, but need a commitment either from David to test the patch or for Clint to supply a test. We need that commitment by the end of the day today 2008/04/03 along with an ETA as to when it will be completed. --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: David Alves [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 01, 2008 2:36 PM To: hbase-user@hadoop.apache.org Subject: RE: StackOverFlow Error in HBase Hi, I just deployed the unpatched version. Tomorrow I'll rebuild the system with the patch and try it out. Thanks again.
Regards David Alves -Original Message- From: Jim Kellerman [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 01, 2008 10:04 PM To: hbase-user@hadoop.apache.org Subject: RE: StackOverFlow Error in HBase David, Have you tried this patch and does it work for you? If so we'll include it in hbase-0.1.1 --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: David Alves [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 01, 2008 10:44 AM To: hbase-user@hadoop.apache.org Subject: RE: StackOverFlow Error in HBase Hi, thanks for the prompt patch, Clint, St.Ack and all you guys. Regards David Alves -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Clint Morgan Sent: Tuesday, April 01, 2008 2:04 AM To: hbase-user@hadoop.apache.org Subject: Re: StackOverFlow Error in HBase Try the patch at https://issues.apache.org/jira/browse/HBASE-554. cheers, -clint On Mon, Mar 31, 2008 at 5:39 AM, David Alves [EMAIL PROTECTED] wrote: Hi ... again In my previous mail I stated that increasing the stack size solved the problem; well, I jumped a little bit to the conclusion. In fact it didn't: the StackOverflowError always occurs at the end of the cycle, when no more records match the filter. Anyway, I've rewritten my application to use a normal scanner and do the filtering afterwards, which is not optimal but works. I'm just saying this because it might be a clue; in previous versions (!= 0.1.0), even though a more serious problem happened (regionservers became unresponsive after so many records), this didn't happen. Btw, in the current version I notice no, or a very small, decrease of throughput with time. Great work! Regards David Alves On Mon, 2008-03-31 at 05:18 +0100, David Alves wrote: Hi again As I was almost at the end (80%) of indexable docs, for the time being I simply increased the stack size, which seemed to work. Thanks for your input St.Ack, it really helped me solve the problem, at least for the moment.
On another note, in the same method I changed the way the scanner was obtained when htable.getStartKeys() would be more than 1, so that I could limit the records read each time to a single region, with scanning starting at the last region. Strangely, the number of keys obtained by htable.getStartKeys() was always 1, even though by the end there were already 21 regions. Any thoughts? Regards David Alves -Original Message- From: stack [mailto:[EMAIL PROTECTED] Sent: Sunday, March
RE: StackOverFlow Error in HBase
Hi, thanks for the prompt patch, Clint, St.Ack and all you guys. Regards David Alves -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Clint Morgan Sent: Tuesday, April 01, 2008 2:04 AM To: hbase-user@hadoop.apache.org Subject: Re: StackOverFlow Error in HBase Try the patch at https://issues.apache.org/jira/browse/HBASE-554. cheers, -clint On Mon, Mar 31, 2008 at 5:39 AM, David Alves [EMAIL PROTECTED] wrote: Hi ... again In my previous mail I stated that increasing the stack size solved the problem; well, I jumped a little bit to the conclusion. In fact it didn't: the StackOverflowError always occurs at the end of the cycle, when no more records match the filter. Anyway, I've rewritten my application to use a normal scanner and do the filtering afterwards, which is not optimal but works. I'm just saying this because it might be a clue; in previous versions (!= 0.1.0), even though a more serious problem happened (regionservers became unresponsive after so many records), this didn't happen. Btw, in the current version I notice no, or a very small, decrease of throughput with time. Great work! Regards David Alves On Mon, 2008-03-31 at 05:18 +0100, David Alves wrote: Hi again As I was almost at the end (80%) of indexable docs, for the time being I simply increased the stack size, which seemed to work. Thanks for your input St.Ack, it really helped me solve the problem, at least for the moment. On another note, in the same method I changed the way the scanner was obtained when htable.getStartKeys() would be more than 1, so that I could limit the records read each time to a single region, with scanning starting at the last region. Strangely, the number of keys obtained by htable.getStartKeys() was always 1, even though by the end there were already 21 regions. Any thoughts?
Regards David Alves -Original Message- From: stack [mailto:[EMAIL PROTECTED] Sent: Sunday, March 30, 2008 9:36 PM To: hbase-user@hadoop.apache.org Subject: Re: StackOverFlow Error in HBase You're doing nothing wrong. The filters as written recurse until they find a match. If there are long stretches between matching rows, then you will get a StackOverflowError. Filters need to be changed. Thanks for pointing this out. Can you do without them for the moment until we get a chance to fix it? (HBASE-554) Thanks, St.Ack David Alves wrote: Hi St.Ack and all The error always occurs when trying to see if there are more rows to process. Yes, I'm using a filter (RegExpRowFilter) to select only the rows (any row key) that match a specific value in one of the columns. Then I obtain the scanner, just test the hasNext method, close the scanner and return. Am I doing something wrong? Still, a StackOverflowError is not supposed to happen, right? Regards David Alves On Thu, 2008-03-27 at 12:36 -0700, stack wrote: You are using a filter? If so, tell us more about it. St.Ack David Alves wrote: Hi guys I'm using HBase to keep data that is later indexed. The data is indexed in chunks, so the cycle is: get records, index them, check for more records, etc... When I tried candidate-2 instead of the old 0.16.0 (which I switched to due to the regionservers becoming unresponsive) I got the error at the end of this email, well into an indexing job. Do you have any idea why? Am I doing something wrong?
David Alves
java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.StackOverflowError
	at java.io.DataInputStream.readFully(DataInputStream.java:178)
	at java.io.DataInputStream.readLong(DataInputStream.java:399)
	at org.apache.hadoop.dfs.DFSClient$BlockReader.readChunk(DFSClient.java:735)
	at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:234)
	at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:176)
	at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:193)
	at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:157)
	at org.apache.hadoop.dfs.DFSClient$BlockReader.read(DFSClient.java:658)
	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1130)
	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1166)
	at java.io.DataInputStream.readFully
RE: StackOverFlow Error in HBase
Hi ... again In my previous mail I stated that increasing the stack size solved the problem; well, I jumped a little bit to the conclusion. In fact it didn't: the StackOverflowError always occurs at the end of the cycle, when no more records match the filter. Anyway, I've rewritten my application to use a normal scanner and do the filtering afterwards, which is not optimal but works. I'm just saying this because it might be a clue; in previous versions (!= 0.1.0), even though a more serious problem happened (regionservers became unresponsive after so many records), this didn't happen. Btw, in the current version I notice no, or a very small, decrease of throughput with time. Great work! Regards David Alves On Mon, 2008-03-31 at 05:18 +0100, David Alves wrote: Hi again As I was almost at the end (80%) of indexable docs, for the time being I simply increased the stack size, which seemed to work. Thanks for your input St.Ack, it really helped me solve the problem, at least for the moment. On another note, in the same method I changed the way the scanner was obtained when htable.getStartKeys() would be more than 1, so that I could limit the records read each time to a single region, with scanning starting at the last region. Strangely, the number of keys obtained by htable.getStartKeys() was always 1, even though by the end there were already 21 regions. Any thoughts? Regards David Alves -Original Message- From: stack [mailto:[EMAIL PROTECTED] Sent: Sunday, March 30, 2008 9:36 PM To: hbase-user@hadoop.apache.org Subject: Re: StackOverFlow Error in HBase You're doing nothing wrong. The filters as written recurse until they find a match. If there are long stretches between matching rows, then you will get a StackOverflowError. Filters need to be changed. Thanks for pointing this out. Can you do without them for the moment until we get a chance to fix it? (HBASE-554) Thanks, St.Ack David Alves wrote: Hi St.Ack and all The error always occurs when trying to see if there are more rows to process.
Yes, I'm using a filter (RegExpRowFilter) to select only the rows (any row key) that match a specific value in one of the columns. Then I obtain the scanner, just test the hasNext method, close the scanner and return. Am I doing something wrong? Still, a StackOverflowError is not supposed to happen, right? Regards David Alves On Thu, 2008-03-27 at 12:36 -0700, stack wrote: You are using a filter? If so, tell us more about it. St.Ack David Alves wrote: Hi guys I'm using HBase to keep data that is later indexed. The data is indexed in chunks, so the cycle is: get records, index them, check for more records, etc... When I tried candidate-2 instead of the old 0.16.0 (which I switched to due to the regionservers becoming unresponsive) I got the error at the end of this email, well into an indexing job. Do you have any idea why? Am I doing something wrong?
David Alves
java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.StackOverflowError
	at java.io.DataInputStream.readFully(DataInputStream.java:178)
	at java.io.DataInputStream.readLong(DataInputStream.java:399)
	at org.apache.hadoop.dfs.DFSClient$BlockReader.readChunk(DFSClient.java:735)
	at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:234)
	at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:176)
	at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:193)
	at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:157)
	at org.apache.hadoop.dfs.DFSClient$BlockReader.read(DFSClient.java:658)
	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1130)
	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1166)
	at java.io.DataInputStream.readFully(DataInputStream.java:178)
	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:56)
	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1829)
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1729)
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1775)
	at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:461)
	at org.apache.hadoop.hbase.HStore$StoreFileScanner.getNext(HStore.java:2350)
	at org.apache.hadoop.hbase.HAbstractScanner.next(HAbstractScanner.java:256)
	at org.apache.hadoop.hbase.HStore$HStoreScanner.next(HStore.java:2561
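St.Ack's explanation (the filters recurse once per non-matching row, so a long stretch of misses overflows the stack) can be illustrated with a generic sketch; this is not the actual HBase filter code, just the recursive-vs-iterative pattern:

```java
// Sketch of why per-row recursion overflows on long non-matching stretches,
// and the iterative rewrite that avoids it. Not the actual HBase filter code.
public class ScanDemo {
    // Recursive "skip to next match": one stack frame per skipped row.
    static int nextMatchRecursive(boolean[] matches, int row) {
        if (row >= matches.length) return -1;
        if (matches[row]) return row;
        return nextMatchRecursive(matches, row + 1); // grows the stack each miss
    }

    // Iterative version: constant stack depth regardless of stretch length.
    static int nextMatchIterative(boolean[] matches, int row) {
        for (int r = row; r < matches.length; r++) {
            if (matches[r]) return r;
        }
        return -1;
    }

    public static void main(String[] args) {
        boolean[] rows = new boolean[5_000_000]; // long stretch with no matches
        System.out.println(nextMatchIterative(rows, 0)); // completes fine
        try {
            nextMatchRecursive(rows, 0);
        } catch (StackOverflowError e) {
            System.out.println("recursive scan overflowed the stack");
        }
    }
}
```

This also matches the observation in the thread that the error appears at the end of the cycle: the longest run of non-matching rows is the tail of the table, after the last match.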
Doubt in RegExpRowFilter and RowFilters in general
Hi guys, in my previous email I might have misunderstood the roles of the RowFilterInterfaces, so I'll pose my question more clearly (since the last one wasn't in question form :)). I have a setup where a table has two columns belonging to different column families (table A: cf1:a, cf2:b); I'm trying to build a filter so that a scanner only returns the rows where cf1:a = myvalue1 and cf2:b = myvalue2. I've built a RegExpRowFilter like this:

Map<Text, byte[]> conditionalsMap = new HashMap<Text, byte[]>();
conditionalsMap.put(new Text("cf1:a"), new myvalue1.getBytes());
conditionalsMap.put(new Text("cf2:b"), myvalue2.getBytes());
return new RegExpRowFilter(".*", conditionalsMap);

My problem is that this filter always fails, when I know for sure that there are rows whose columns match my values. I'm building the scanner like this (the purpose in this case is to find whether there are more values that match my filter):

final Text startKey = this.htable.getStartKeys()[0];
HScannerInterface scanner = htable.obtainScanner(new Text[] {new Text("cf1:a"), new Text("cf2:b")}, startKey, rowFilterInterface);
return scanner.iterator().hasNext();

Can anyone give me a hand please? Thanks in advance David Alves
Re: Doubt in RegExpRowFilter and RowFilters in general
Hi again, in my previous example I seem to have misplaced a new keyword (new myvalue1.getBytes() where it should have been myvalue1.getBytes()). On another note, my program hangs when I supply my own filter to the scanner (I suppose it's clear that the nodes don't know my class, so there should be a ClassNotFoundException, right?). Regards David Alves On Mon, 2008-02-11 at 16:51 +0000, David Alves wrote: Hi guys, in my previous email I might have misunderstood the roles of the RowFilterInterfaces, so I'll pose my question more clearly (since the last one wasn't in question form :)). I have a setup where a table has two columns belonging to different column families (table A: cf1:a, cf2:b); I'm trying to build a filter so that a scanner only returns the rows where cf1:a = myvalue1 and cf2:b = myvalue2. I've built a RegExpRowFilter like this:

Map<Text, byte[]> conditionalsMap = new HashMap<Text, byte[]>();
conditionalsMap.put(new Text("cf1:a"), new myvalue1.getBytes());
conditionalsMap.put(new Text("cf2:b"), myvalue2.getBytes());
return new RegExpRowFilter(".*", conditionalsMap);

My problem is that this filter always fails, when I know for sure that there are rows whose columns match my values. I'm building the scanner like this (the purpose in this case is to find whether there are more values that match my filter):

final Text startKey = this.htable.getStartKeys()[0];
HScannerInterface scanner = htable.obtainScanner(new Text[] {new Text("cf1:a"), new Text("cf2:b")}, startKey, rowFilterInterface);
return scanner.iterator().hasNext();

Can anyone give me a hand please? Thanks in advance David Alves
Re: RegExpRowFilter with multiple conditions on rows matching both
I now realize the text is a bit confusing, sorry for that. Also, that last paragraph should end with: ... at the same time. Regards David On Sun, 2008-02-10 at 01:03 +0000, David Alves wrote: Hi all! First of all, congrats for the great piece of software. I have a table with two column families (A, B), each with one column. When I build a RegExpRowFilter to select only rows whose columns A AND B match the criteria (let's say A:a = 1 and B:b = 2), all the rows are filtered out. This is strange because if I build the map required by the constructor with only one or the other of the conditionals, the rows that match won't be filtered; so if they pass one conditional and the other in different runs, they should pass them both in the same run, right? More concisely, when running with both conditionals they are able to pass the filter() method for both columns but fail to pass the filterNotNull() method. The debug log tells me that the TreeMap<Text, byte[]> passed to filterNotNull() by the HStore scanner doesn't contain both columns at the same time (the method is called two times, first with one column and then with the other). Finally, when running with only one of the conditionals, the filterNotNull() method still returns true once but returns false the second time (therefore returning the record), meaning that not all columns of the same row are passing through the cycle. Regards David Alves
Re: Skip Reduce Phase
Great! Thanks Owen, Ted and Jason On Thu, 2008-02-07 at 10:07 -0800, Owen O'Malley wrote: On Feb 7, 2008, at 9:59 AM, Ted Dunning wrote: I think that setting the parameter to 0 skips most of the overhead of the later stages. Setting it to 0 skips all of the buffering, sorting, merging, and shuffling. It passes the objects straight from the mapper to the output format, which writes it straight to hdfs. -- Owen
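Putting Owen's answer into code, a map-only job in the old mapred API of this era looks roughly like the sketch below. MyJob and MyFilteredTableInputFormat are placeholders for the poster's own classes; check the calls against your Hadoop version:

```java
// Map-only job configuration sketch (old org.apache.hadoop.mapred API).
// With zero reducers, map output bypasses buffering, sorting, merging and
// shuffling, and is written straight to the output format on DFS.
JobConf conf = new JobConf(MyJob.class);                // placeholder driver class
conf.setInputFormat(MyFilteredTableInputFormat.class);  // hypothetical filtered input
conf.setNumReduceTasks(0);                              // skips the reduce phase entirely
conf.setOutputFormat(SequenceFileOutputFormat.class);   // map output lands on DFS
JobClient.runJob(conf);
```

The SequenceFile output of this job can then be fed directly as the input of the second Map/Reduce job, which is exactly the pipeline asked about in the original question.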
Re: Skip Reduce Phase
Hi Ted, but wouldn't that still go through the intermediate phases and do the merge sort and copy to the local filesystem (which is the reduce input)? Is there a way to provide the direct map output (saved onto DFS) to another map task, or does your suggestion already do this and is this a moot point? David On Thu, 2008-02-07 at 09:39 -0800, Ted Dunning wrote: Set numReducers to 0. On 2/7/08 9:35 AM, David Alves [EMAIL PROTECTED] wrote: Hi all, first of all, since this is my first post, I must say congrats for the great piece of software (both Hadoop and HBase). I've been using Hadoop/HBase for a while and I have a question; let me just explain my setup a little: I have an HBase database that holds information that I want to process in a Map/Reduce job, but that first needs a little processing. So I built another Map/Reduce job that uses a specific (filtered) TableInputFormat and then pre-processes the information in a map phase. As I don't need any of the intermediate phases (like merge sort) and I don't need to do anything in the reduce phase, I was wondering if I could just save the map phase output and start the second Map/Reduce job using that as input (while still saving the splits to DFS for backtracking/reliability reasons). Is this possible? Thanks in advance, and again great piece of software. David Alves
Skip Reduce Phase
Hi all, first of all, since this is my first post, I must say congrats for the great piece of software (both Hadoop and HBase). I've been using Hadoop/HBase for a while and I have a question; let me just explain my setup a little: I have an HBase database that holds information that I want to process in a Map/Reduce job, but that first needs a little processing. So I built another Map/Reduce job that uses a specific (filtered) TableInputFormat and then pre-processes the information in a map phase. As I don't need any of the intermediate phases (like merge sort) and I don't need to do anything in the reduce phase, I was wondering if I could just save the map phase output and start the second Map/Reduce job using that as input (while still saving the splits to DFS for backtracking/reliability reasons). Is this possible? Thanks in advance, and again, great piece of software. David Alves