Puts failing with WrongRegionException
Hi there,

I wonder if anyone has seen an error like this:

2014-10-03 16:03:45,203 WARN [RpcServer.handler=7,port=60020] regionserver.HRegion: Failed getting lock in batch put, row=65317d52abfedc8b94a19f6fbffe187c
org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row out of range for row lock on HRegion m_test,64d7e88463b88e7325b623fbd6629cda,1408803862959.cb513be341b94588469efa9d26d29857., startKey='64d7e88463b88e7325b623fbd6629cda', getEndKey()='6516687f5dae26f529c53f309cb36fca', row='65317d52abfedc8b94a19f6fbffe187c'

We recently added 10 more region servers to our cluster, and after that I started seeing errors like the one above when doing puts via TableOutputFormat in an MR job. Could wherever HBase stores the region info be corrupted?

Thanks in advance for your help,
thomas
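The exception means the row being put lies outside the key range of the region it was sent to. A region holds rows from its start key (inclusive) up to its end key (exclusive), an empty key meaning unbounded, and HBase orders row keys as unsigned bytes. The minimal sketch below applies that rule to the keys from the log above; the class and method names are hypothetical and this is not HBase's own code.

```java
import java.nio.charset.StandardCharsets;

public class RegionRangeCheck {
    // Lexicographic comparison of byte arrays, treating each byte as unsigned
    // (the ordering HBase uses for row keys).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // A region serves rows in [startKey, endKey); an empty key means unbounded.
    static boolean rowInRegion(byte[] row, byte[] startKey, byte[] endKey) {
        boolean afterStart = startKey.length == 0 || compareUnsigned(row, startKey) >= 0;
        boolean beforeEnd = endKey.length == 0 || compareUnsigned(row, endKey) < 0;
        return afterStart && beforeEnd;
    }

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] start = b("64d7e88463b88e7325b623fbd6629cda");
        byte[] end = b("6516687f5dae26f529c53f309cb36fca");
        byte[] row = b("65317d52abfedc8b94a19f6fbffe187c");
        // At the third character the row has '3' and the end key has '1',
        // so the row sorts after the region's end key.
        System.out.println(rowInRegion(row, start, end)); // prints false
    }
}
```

With these keys the row sorts after the region's end key, which is the condition the region server reported when it refused the row lock.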
Re: Region Server on all data node?
Thanks Ted for the pointer. A follow-up question: will the region server always write data to its local HDFS? I am seeing logs saying that the region server is trying to get data from other data nodes. Under what scenarios does a region server need to get data from a non-local data node?

thanks

On Fri, Sep 26, 2014 at 7:21 AM, Ted Yu wrote:
> Region server should be installed on server which is a data node.
>
> Region server count may be lower than data node count. So some data nodes
> would not have region server running.
>
> See http://hbase.apache.org/book.html#regions.arch.locality
>
> Cheers
>
> On Fri, Sep 26, 2014 at 7:02 AM, Thomas Kwan wrote:
>
>> Hi there,
>>
>> To get good read performance, should the region server be installed on
>> every data node? (Thinking about data locality here)
>>
>> thanks
>> thomas
>>
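One way to picture non-local reads is HDFS block locality: the fraction of a region's store-file bytes that have a replica on the region server's own host. By default HDFS puts the first replica of a block on the writing node when the writer runs on a data node, so files a region server wrote itself start out local. After a region moves to another server (for example through balancing, a server failure, or new servers joining), its existing files have no replica on the new host, and reads go to remote data nodes until a major compaction rewrites the files there. The sketch below computes that fraction over a hypothetical list of blocks; the class, host names and sizes are made up for illustration and this is not HBase's own code.

```java
import java.util.Arrays;
import java.util.List;

public class BlockLocality {
    // A hypothetical HDFS block: its size and the hosts holding a replica.
    static final class Block {
        final long sizeBytes;
        final List<String> replicaHosts;

        Block(long sizeBytes, String... replicaHosts) {
            this.sizeBytes = sizeBytes;
            this.replicaHosts = Arrays.asList(replicaHosts);
        }
    }

    // Fraction of bytes that have a replica on the given host (1.0 = fully local).
    static double localityIndex(List<Block> blocks, String host) {
        long total = 0;
        long local = 0;
        for (Block blk : blocks) {
            total += blk.sizeBytes;
            if (blk.replicaHosts.contains(host)) {
                local += blk.sizeBytes;
            }
        }
        return total == 0 ? 0.0 : (double) local / total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<Block> storeFileBlocks = Arrays.asList(
                new Block(128 * mb, "dn01", "dn02", "dn03"),
                new Block(128 * mb, "dn01", "dn04", "dn05"),
                new Block(64 * mb, "dn02", "dn04", "dn06"));
        // The server that wrote most of the data holds replicas of 256 of 320 MB.
        System.out.println(localityIndex(storeFileBlocks, "dn01")); // 0.8
        // A newly added server with no replicas must read everything remotely.
        System.out.println(localityIndex(storeFileBlocks, "dn40")); // 0.0
    }
}
```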
Region Server on all data node?
Hi there,

To get good read performance, should a region server be installed on every data node? (I'm thinking about data locality here.)

thanks
thomas
Re: random reads
Hi Esteban,

Thanks for sharing your ideas. We are on HBase 0.96 and Java 1.6. I have enabled short-circuit reads, and the heap size is around 16 GB for each region server; we have about 20 of them. The list of rowkeys that I need to process has about 10M entries. I am already using batched gets, with a batch size of ~2000 gets.

thomas

On Thu, Aug 14, 2014 at 11:01 AM, Esteban Gutierrez wrote:
> Hello Thomas,
>
> What version of HBase are you using? sorting and grouping based on the
> regions the rows is going to help for sure. I don't think you should focus
> too much in the locality side of the problem unless your HDFS input set is
> too large (100s or 1000s of MBs per task), otherwise it might be faster to
> load in-memory the input dataset and do the batched calls. As discussed in
> this mailing list recently there are too many factors that might be
> involved in the performance: number of threads or tasks, size of the row,
> RS resources, configurations, etc. so any additional info would be very
> helpful.
>
> cheers,
> esteban.
>
> --
> Cloudera, Inc.
>
> On Thu, Aug 14, 2014 at 10:32 AM, Thomas Kwan
> wrote:
>
>> Hi there
>>
>> I have a use-case where I need to do a read to check if a hbase entry
>> is present, then I do a put to create the entry when it is not there.
>>
>> I have a script to get a list of rowkeys from hive and put them on a
>> HDFS directory. Then I have a MR job that reads the rowkeys and do
>> batch reads. I am getting around 1.5K requests per second.
>>
>> To attempt to make this faster, I am wondering if I can
>>
>> - sort and group the rowkeys based on regions
>> - make the MR jobs run on regions that have the data locally
>>
>> Scan or TableInputFormat must have some codes to do something similar
>> right?
>>
>> thanks
>> thomas
>>
random reads
Hi there,

I have a use case where I need to do a read to check whether an HBase entry is present, and then do a put to create the entry when it is not there.

I have a script that gets a list of rowkeys from Hive and puts them in an HDFS directory. Then I have an MR job that reads the rowkeys and does batch reads. I am getting around 1.5K requests per second.

To make this faster, I am wondering if I can:

- sort and group the rowkeys based on regions
- make the MR jobs run on regions that have the data locally

Scan or TableInputFormat must have some code to do something similar, right?

thanks
thomas
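The first idea, sorting and grouping rowkeys by region, can be sketched in plain Java. This assumes the table's sorted region start keys have already been fetched (in the 0.96 client, HTable.getStartKeys() returns them as byte[][]); the class and method names are hypothetical. Each rowkey goes to the last region whose start key is <= the rowkey, comparing keys as unsigned bytes the way HBase does.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupByRegion {
    // Unsigned lexicographic byte comparison (the order HBase uses for row keys).
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    static final Comparator<byte[]> ROW_ORDER = new Comparator<byte[]>() {
        @Override
        public int compare(byte[] a, byte[] b) {
            return compareRows(a, b);
        }
    };

    // Index of the region holding row: the last start key that is <= row.
    // Assumes startKeys is sorted and startKeys[0] is the empty first key.
    static int regionIndex(byte[][] startKeys, byte[] row) {
        int lo = 0;
        int hi = startKeys.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (compareRows(startKeys[mid], row) <= 0) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // Sorts the rows, then buckets them by region index (ordered by region).
    static Map<Integer, List<byte[]>> group(byte[][] startKeys, List<byte[]> rows) {
        List<byte[]> sorted = new ArrayList<byte[]>(rows);
        Collections.sort(sorted, ROW_ORDER);
        Map<Integer, List<byte[]>> groups = new TreeMap<Integer, List<byte[]>>();
        for (byte[] row : sorted) {
            int idx = regionIndex(startKeys, row);
            List<byte[]> bucket = groups.get(idx);
            if (bucket == null) {
                bucket = new ArrayList<byte[]>();
                groups.put(idx, bucket);
            }
            bucket.add(row);
        }
        return groups;
    }

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical table with three regions: ['', '4'), ['4', '8'), ['8', '').
        byte[][] startKeys = {new byte[0], b("4"), b("8")};
        List<byte[]> rows = new ArrayList<byte[]>();
        for (String r : new String[] {"9a", "1f", "45", "3c", "80"}) {
            rows.add(b(r));
        }
        for (Map.Entry<Integer, List<byte[]>> e : group(startKeys, rows).entrySet()) {
            StringBuilder line = new StringBuilder("region " + e.getKey() + ":");
            for (byte[] row : e.getValue()) {
                line.append(' ').append(new String(row, StandardCharsets.UTF_8));
            }
            System.out.println(line);
        }
        // region 0: 1f 3c
        // region 1: 45
        // region 2: 80 9a
    }
}
```

Each bucket could then be turned into one list of Gets; whether that beats plain batched gets depends on the factors discussed in this thread (threads, row size, region server resources).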
Cannot drop table
I just created a new test table today, and when I tried to drop it, I got:

ERROR: No serverName in hbase:meta for p_hashes,,1407880525015.6fe0bbb6e5a6f614891e7c1c5901c70f. containing

In the HBase master log, it said:

client.HConnectionManager$HConnectionImplementation: locateRegionInMeta failed; parentTable=hbase:meta, metaLocation={region=hbase:meta,,1.1588230740, hostname=dn31.manage.com,60020,1407810472702, seqNum=0}, attempt=7/350; retrying after=10077ms; because: No serverName in hbase:meta for p_hashes,,1407880525015.6fe0bbb6e5a6f614891e7c1c5901c70f. containing

From scan 'hbase:meta' in the HBase shell, I noticed that p_hashes does not have an "info:server" column:

p_hashes,,1407880525015.6fe0bbb6e5a6f614891e7c1c5901c70f. column=info:regioninfo, timestamp=1407880525176, value={ENCODED => 6fe0bbb6e5a6f614891e7c1c5901c70f, NAME => 'p_hashes,,1407880525015.6fe0bbb6e5a6f614891e7c1c5901c70f.', STARTKEY => '', ENDKEY => ''}

Another sample table, which does have an "info:server" column:

test_2,,1407864389951.91bee65616bba337ac6032fa53ed01a7. column=info:regioninfo, timestamp=1407864390169, value={ENCODED => 91bee65616bba337ac6032fa53ed01a7, NAME => 'test_2,,1407864389951.91bee65616bba337ac6032fa53ed01a7.', STARTKEY => '', ENDKEY => ''}
test_2,,1407864389951.91bee65616bba337ac6032fa53ed01a7. column=info:seqnumDuringOpen, timestamp=1407864390217, value=\x00\x00\x00\x00\x00\x00\x00\x01
test_2,,1407864389951.91bee65616bba337ac6032fa53ed01a7. column=info:server, timestamp=1407864390217, value=dn24.manage.com:60020
test_2,,1407864389951.91bee65616bba337ac6032fa53ed01a7. column=info:serverstartcode, timestamp=1407864390217, value=1407807046393

Should "info:server" always be present in hbase:meta? How can I delete a table that does not have an info:server entry?
Re: Read operations hanged
Ted,

From the master log, there was a compaction around that time:

2014-08-09 22:50:51,176 DEBUG [827019302@qtp-63557232-287] client.HBaseAdmin: Trying to compact {ENCODED => 12c9a609765ad0bbd6468d93368f860a, NAME => 'm_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a.', STARTKEY => '2fd811c2b1d7476efb16499ccb2b823d', ENDKEY => '3328d07989225a29067b7b7981150052'}: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a. is not online
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2585)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3952)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.compactRegion(HRegionServer.java:3750)
    at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19803)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
    at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)

Also, hbase hbck shows a lot of errors. In particular, I see:

ERROR: Region { meta => m_hashes,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a., hdfs => hdfs://cluster01/apps/hbase/data/data/default/m_data/12c9a609765ad0bbd6468d93368f860a, deployed => } not deployed on any region server.
...
ERROR: There is a hole in the region chain between 2fd811c2b1d7476efb16499ccb2b823d and 3328d07989225a29067b7b7981150052. You need to create a new .regioninfo and region dir in hdfs to plug the hole.
It looks like the data is there:

[hbase@db03 ~]$ hadoop fs -du /apps/hbase/data/data/default/m_data/12c9a609765ad0bbd6468d93368f860a
105 /apps/hbase/data/data/default/m_data/12c9a609765ad0bbd6468d93368f860a/.regioninfo
4023827732 /apps/hbase/data/data/default/m_data/12c9a609765ad0bbd6468d93368f860a/cf1
1773806 /apps/hbase/data/data/default/m_data/12c9a609765ad0bbd6468d93368f860a/recovered.edits

I wonder if hbase hbck -repairHoles can fix this kind of thing?

thomas

On Sun, Aug 10, 2014 at 5:17 PM, Ted Yu wrote:
> bq. it's host dn29.manage.com,60020,1407600154728 is dead but not processed
> yet
>
> Can you look back (from 22:50:51) in master log to see what happened to
> dn29 ?
>
> Thanks
>
>
> On Sun, Aug 10, 2014 at 2:51 PM, Thomas Kwan wrote:
>
>> Thanks for your help Ted.
>>
>> From the master's log, I see
>>
>> 2014-08-09 22:50:51,176 DEBUG [827019302@qtp-63557232-287] client.HBaseAdmin: Trying to compact {ENCODED => 12c9a609765ad0bbd6468d93368f860a, NAME => 'm_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a.', STARTKEY => '2fd811c2b1d7476efb16499ccb2b823d', ENDKEY => '3328d07989225a29067b7b7981150052'}: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region m_hashes,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a. is not online
>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2585)
>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3952)
>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.compactRegion(HRegionServer.java:3750)
>>     at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19803)
>>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
>>     at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
>>
>>     at sun.reflect.GeneratedConstructorAccessor27.newInstance(Unknown Source)
>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>     at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>>     at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
>>     at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:277)
>>     at org.apache.hadoop.hbase.client.HBaseAdmin.compact(HBaseAdmin.java:1647)
>>     at org.apache.hadoop.hbase.client.HBaseAdmin.compact(HBaseAdmin.java:1623)
>>     at org.apach
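hbck's "hole in the region chain" means that, with a table's regions sorted by start key, some region's end key is lower than the next region's start key, so a range of rows belongs to no region. A minimal sketch of that adjacency check in plain Java, over a hypothetical list of {startKey, endKey} pairs; it is a simplified model of the idea, not hbck's implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionChainCheck {
    // Checks adjacent regions, which must be sorted by start key. Each region is
    // {startKey, endKey}; an empty end key means "to the end of the table".
    // Keys are ASCII strings here, so String order matches HBase's byte order.
    static List<String> checkChain(String[][] regions) {
        List<String> problems = new ArrayList<String>();
        for (int i = 0; i + 1 < regions.length; i++) {
            String end = regions[i][1];
            String nextStart = regions[i + 1][0];
            // Treat an empty end key as +infinity: any following region overlaps.
            int cmp = end.isEmpty() ? 1 : end.compareTo(nextStart);
            if (cmp < 0) {
                // No region covers the rows in [end, nextStart).
                problems.add("hole between " + end + " and " + nextStart);
            } else if (cmp > 0) {
                // Some rows from nextStart onward fall in both regions.
                problems.add("overlap at " + nextStart);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical chain in which the region [2fd8..., 3328...) is missing.
        String[][] regions = {
            {"", "2fd811c2b1d7476efb16499ccb2b823d"},
            {"3328d07989225a29067b7b7981150052", ""},
        };
        for (String problem : checkChain(regions)) {
            System.out.println(problem);
        }
        // hole between 2fd811c2b1d7476efb16499ccb2b823d and 3328d07989225a29067b7b7981150052
    }
}
```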
Re: Read operations hanged
=237,port=6] master.AssignmentManager: Skip assigning m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a., it's host dn29.manage.com,60020,1407600154728 is dead but not processed yet

And I checked dn29 via the HBase UI running at http://dn29.manage.com:60030/rs-status; it looks like there are no regions on dn29.

thanks
thomas

On Sun, Aug 10, 2014 at 12:28 PM, Ted Yu wrote:
> Can you check master log to see why 'm_data,2fd811c2b1d7476efb16499ccb2b823d'
> went offline ?
>
> Thanks
>
>
> On Sun, Aug 10, 2014 at 12:13 PM, Thomas Kwan
> wrote:
>
>> Hi Ted,
>>
>> Hbase version is 0.96.0.2.0
>>
>> Nothing interesting in the hbase log on dn29 and confirmed that region
>> server is running on dn29
>>
>> When I do 'get', i see
>>
>> hbase(main):001:0> get 'm_data','2fd811c2b1d7476efb16499ccb2b823d'
>>
>> COLUMN CELL
>>
>> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a. is not online
>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2585)
>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3952)
>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2733)
>>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26925)
>>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
>>     at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
>>
>> On Sun, Aug 10, 2014 at 10:32 AM, Ted Yu wrote:
>> > bq. if I can just rmr stuff under /hbase-unsecure/splitWAL/...
>> >
>> > Please don't.
>> >
>> > Have you checked region server log on dn29.manage.com ?
>> >
>> > What hbase version are you using ?
>> >
>> > Cheers
>> >
>> >
>> > On Sun, Aug 10, 2014 at 10:27 AM, Thomas Kwan
>> > wrote:
>> >
>> >> And I have a program that do some read operations and it hangs. And I am
>> >> seeing
>> >>
>> >> 2014-08-10 12:22:05,359 DEBUG [main] client.HConnectionManager$HConnectionImplementation: Removed all cached region locations that map to dn29.manage.com,60020,1407600154728
>> >> 2014-08-10 12:22:06,173 DEBUG [main] client.HConnectionManager$HConnectionImplementation: Removed dn29.manage.com:60020 as a location of m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a. for tableName=m_data from cache
>> >> 2014-08-10 12:22:07,180 DEBUG [main] client.HConnectionManager$HConnectionImplementation: Removed dn29.manage.com:60020 as a location of m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a. for tableName=m_data from cache
>> >> 2014-08-10 12:22:09,193 DEBUG [main] client.HConnectionManager$HConnectionImplementation: Removed dn29.manage.com:60020 as a location of m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a. for tableName=m_data from cache
>> >> 2014-08-10 12:22:09,196 DEBUG [main] client.HConnectionManager$HConnectionImplementation: Removed all cached region locations that map to dn29.manage.com,60020,1407600154728
>> >> 2014-08-10 12:22:13,208 DEBUG [main] client.HConnectionManager$HConnectionImplementation: Removed all cached region locations that map to dn29.manage.com,60020,1407600154728
>> >>
>> >> I am seeing the following in the hbase master also
>> >>
>> >> 2014-08-10 10:22:25,016 INFO [master02.manage.com,6,1407690402682.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 0 tasks={/hbase-unsecure/splitWAL/WALs%2Fdn29.manage.com%2C60020%2C1407600154728-splitting%2Fdn29.manage.com%252C60020%252C1407600154728.1407621759364=last_update = 1407690428226 last_version = 53 cur_worker_name = dn21.manage.com,60020,1407650188526 status = in_progress incarnation = 3 resubmits = 3 batch = installed = 1 done = 0 error = 0}
>> >>
>> >> I wonder if I can just rmr stuff under /hbase-unsecure/splitWAL/...
>> >>
>> >> thanks
>> >> thomas
>> >>
>>
Re: Read operations hanged
Hi Ted,

The HBase version is 0.96.0.2.0.

There is nothing interesting in the HBase log on dn29, and I confirmed that the region server is running on dn29.

When I do a 'get', I see:

hbase(main):001:0> get 'm_data','2fd811c2b1d7476efb16499ccb2b823d'

COLUMN CELL

ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a. is not online
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2585)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3952)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2733)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26925)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
    at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)

On Sun, Aug 10, 2014 at 10:32 AM, Ted Yu wrote:
> bq. if I can just rmr stuff under /hbase-unsecure/splitWAL/...
>
> Please don't.
>
> Have you checked region server log on dn29.manage.com ?
>
> What hbase version are you using ?
>
> Cheers
>
>
> On Sun, Aug 10, 2014 at 10:27 AM, Thomas Kwan
> wrote:
>
>> And I have a program that do some read operations and it hangs. And I am
>> seeing
>>
>> 2014-08-10 12:22:05,359 DEBUG [main] client.HConnectionManager$HConnectionImplementation: Removed all cached region locations that map to dn29.manage.com,60020,1407600154728
>> 2014-08-10 12:22:06,173 DEBUG [main] client.HConnectionManager$HConnectionImplementation: Removed dn29.manage.com:60020 as a location of m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a. for tableName=m_data from cache
>> 2014-08-10 12:22:07,180 DEBUG [main] client.HConnectionManager$HConnectionImplementation: Removed dn29.manage.com:60020 as a location of m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a. for tableName=m_data from cache
>> 2014-08-10 12:22:09,193 DEBUG [main] client.HConnectionManager$HConnectionImplementation: Removed dn29.manage.com:60020 as a location of m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a. for tableName=m_data from cache
>> 2014-08-10 12:22:09,196 DEBUG [main] client.HConnectionManager$HConnectionImplementation: Removed all cached region locations that map to dn29.manage.com,60020,1407600154728
>> 2014-08-10 12:22:13,208 DEBUG [main] client.HConnectionManager$HConnectionImplementation: Removed all cached region locations that map to dn29.manage.com,60020,1407600154728
>>
>> I am seeing the following in the hbase master also
>>
>> 2014-08-10 10:22:25,016 INFO [master02.manage.com,6,1407690402682.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 0 tasks={/hbase-unsecure/splitWAL/WALs%2Fdn29.manage.com%2C60020%2C1407600154728-splitting%2Fdn29.manage.com%252C60020%252C1407600154728.1407621759364=last_update = 1407690428226 last_version = 53 cur_worker_name = dn21.manage.com,60020,1407650188526 status = in_progress incarnation = 3 resubmits = 3 batch = installed = 1 done = 0 error = 0}
>>
>> I wonder if I can just rmr stuff under /hbase-unsecure/splitWAL/...
>>
>> thanks
>> thomas
>>
Read operations hanged
And I have a program that does some read operations, and it hangs. I am seeing:

2014-08-10 12:22:05,359 DEBUG [main] client.HConnectionManager$HConnectionImplementation: Removed all cached region locations that map to dn29.manage.com,60020,1407600154728
2014-08-10 12:22:06,173 DEBUG [main] client.HConnectionManager$HConnectionImplementation: Removed dn29.manage.com:60020 as a location of m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a. for tableName=m_data from cache
2014-08-10 12:22:07,180 DEBUG [main] client.HConnectionManager$HConnectionImplementation: Removed dn29.manage.com:60020 as a location of m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a. for tableName=m_data from cache
2014-08-10 12:22:09,193 DEBUG [main] client.HConnectionManager$HConnectionImplementation: Removed dn29.manage.com:60020 as a location of m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a. for tableName=m_data from cache
2014-08-10 12:22:09,196 DEBUG [main] client.HConnectionManager$HConnectionImplementation: Removed all cached region locations that map to dn29.manage.com,60020,1407600154728
2014-08-10 12:22:13,208 DEBUG [main] client.HConnectionManager$HConnectionImplementation: Removed all cached region locations that map to dn29.manage.com,60020,1407600154728

I am also seeing the following in the HBase master log:

2014-08-10 10:22:25,016 INFO [master02.manage.com,6,1407690402682.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 0 tasks={/hbase-unsecure/splitWAL/WALs%2Fdn29.manage.com%2C60020%2C1407600154728-splitting%2Fdn29.manage.com%252C60020%252C1407600154728.1407621759364=last_update = 1407690428226 last_version = 53 cur_worker_name = dn21.manage.com,60020,1407650188526 status = in_progress incarnation = 3 resubmits = 3 batch = installed = 1 done = 0 error = 0}

I wonder if I can just rmr the stuff under /hbase-unsecure/splitWAL/...

thanks
thomas
HBase MR job with 2 OutputFormat classes possible?
Hi there,

I have an HBase MR job that reads data from HDFS, does an HBase Get, and then does some data transformation. Then I need to put the data back into HBase as well as write data to an HDFS directory (so I can import it back into Hive).

The current job creation logic is similar to the following:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// NAME and MyMap are defined elsewhere in the enclosing class.
public static Job createHBaseJob(Configuration conf, String[] args) throws IOException {
    Path inputDir = new Path(args[0]);
    String tableName = args[1];
    String params = args[2];
    Job job = new Job(conf, NAME + "_" + tableName + " " + params);
    job.setJarByClass(MyMap.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setMapperClass(MyMap.class);
    FileInputFormat.setInputPaths(job, inputDir);
    // No reducers. Just write straight to table. Call initTableReducerJob
    // to set up the TableOutputFormat.
    TableMapReduceUtil.initTableReducerJob(tableName, null, job);
    job.setNumReduceTasks(0);
    TableMapReduceUtil.addDependencyJars(job);
    return job;
}

TableMapReduceUtil.initTableReducerJob already sets up the OutputFormat class. I wonder if there is some magic I can do to pipe the data to an HDFS file as well. Currently I just have 2 jobs: one writes to HBase and one writes to HDFS. But with this setup, I need to do the HBase Get twice.

Any input is highly welcome!!

thomas
Get operation slows in MR job
Hi there,

I have an MR job that does a Get and then a Put operation against an HBase table. For the write operation, I am using TableOutputFormat (similar to the map function in https://github.com/larsgeorge/hbase-book/blob/master/ch07/src/main/java/mapreduce/ImportFromFile.java). The writes are pretty fast, 200K requests/sec, but the reads are slow, around 2K requests/sec.

Does anyone have recommendations on how to improve read performance? I am already using batched Gets.

thanks in advance
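Batched Gets are usually issued as fixed-size chunks of a larger key list. A minimal sketch of the chunking step in plain Java, assuming the keys are already sorted so that each chunk covers a contiguous key range; the class name and batch size are hypothetical. In the 0.96 client each chunk would then become a List<Get> passed to HTable.get(List<Get>), which is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchChunker {
    // Splits keys into consecutive chunks of at most batchSize elements,
    // preserving order so sorted input yields contiguous key ranges per chunk.
    static <T> List<List<T>> chunk(List<T> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<List<T>>();
        for (int from = 0; from < keys.size(); from += batchSize) {
            int to = Math.min(from + batchSize, keys.size());
            // Copy so each batch is independent of the source list.
            batches.add(new ArrayList<T>(keys.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(chunk(keys, 2)); // [[a, b], [c, d], [e]]
    }
}
```

Because the chunks are independent, several of them can be issued from separate threads or tasks; whether that raises throughput depends on how loaded the region servers already are.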