Re: Hadoop AWS module (Spark) is inventing a secret-key each time
I have sent the message there as well; I thought I would send it here too because I'm actually setting up the hadoopConf.

On Wed, Mar 8, 2017 at 6:49 PM, Ravi Prakash <ravihad...@gmail.com> wrote:
> Sorry to hear about your travails.
>
> I think you might be better off asking the spark community:
> http://spark.apache.org/community.html
>
> On Wed, Mar 8, 2017 at 3:22 AM, Jonhy Stack <so.jo...@gmail.com> wrote:
>> Hi,
>>
>> I'm trying to read an s3 bucket from Spark, and up until today Spark always
>> complained that the request returned 403:
>>
>> hadoopConf = spark_context._jsc.hadoopConfiguration()
>> hadoopConf.set("fs.s3a.access.key", "ACCESSKEY")
>> hadoopConf.set("fs.s3a.secret.key", "SECRETKEY")
>> hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
>> logs = spark_context.textFile("s3a://mybucket/logs/*")
>>
>> Spark was saying: Invalid Access key [ACCESSKEY]
>>
>> However, with the same ACCESSKEY and SECRETKEY this was working with
>> aws-cli:
>>
>> aws s3 ls mybucket/logs/
>>
>> and in Python boto3 this was working:
>>
>> resource = boto3.resource("s3", region_name="us-east-1")
>> resource.Object("mybucket", "logs/text.py") \
>>     .put(Body=open("text.py", "rb"), ContentType="text/x-py")
>>
>> So my credentials ARE valid, and the problem is definitely something
>> with Spark.
>>
>> Today I decided to turn on DEBUG logging for all of Spark and, to my
>> surprise, Spark is NOT using the [SECRETKEY] I have provided but
>> instead... adds a random one???
>>
>> 17/03/08 10:40:04 DEBUG request: Sending Request: HEAD
>> https://mybucket.s3.amazonaws.com / Headers: (Authorization: AWS
>> ACCESSKEY:**[RANDOM-SECRET-KEY]**, User-Agent: aws-sdk-java/1.7.4
>> Mac_OS_X/10.11.6 Java_HotSpot(TM)_64-Bit_Server_VM/25.65-b01/1.8.0_65,
>> Date: Wed, 08 Mar 2017 10:40:04 GMT, Content-Type:
>> application/x-www-form-urlencoded; charset=utf-8, )
>>
>> This is why it still returns 403! Spark is not using the key I provide
>> with fs.s3a.secret.key but instead invents a random one EACH time (every
>> time I submit the job the random secret key is different).
>>
>> For the record, I'm running this locally on my machine (OSX) with this
>> command:
>>
>> spark-submit --packages com.amazonaws:aws-java-sdk-pom:1.11.98,org.apache.hadoop:hadoop-aws:2.7.3 test.py
>>
>> Could someone enlighten me on this?
Hadoop AWS module (Spark) is inventing a secret-key each time
Hi,

I'm trying to read an s3 bucket from Spark, and up until today Spark always complained that the request returned 403:

hadoopConf = spark_context._jsc.hadoopConfiguration()
hadoopConf.set("fs.s3a.access.key", "ACCESSKEY")
hadoopConf.set("fs.s3a.secret.key", "SECRETKEY")
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
logs = spark_context.textFile("s3a://mybucket/logs/*")

Spark was saying: Invalid Access key [ACCESSKEY]

However, with the same ACCESSKEY and SECRETKEY this was working with aws-cli:

aws s3 ls mybucket/logs/

and in Python boto3 this was working:

resource = boto3.resource("s3", region_name="us-east-1")
resource.Object("mybucket", "logs/text.py") \
    .put(Body=open("text.py", "rb"), ContentType="text/x-py")

So my credentials ARE valid, and the problem is definitely something with Spark.

Today I decided to turn on DEBUG logging for all of Spark and, to my surprise, Spark is NOT using the [SECRETKEY] I have provided but instead... adds a random one???

17/03/08 10:40:04 DEBUG request: Sending Request: HEAD https://mybucket.s3.amazonaws.com / Headers: (Authorization: AWS ACCESSKEY:**[RANDOM-SECRET-KEY]**, User-Agent: aws-sdk-java/1.7.4 Mac_OS_X/10.11.6 Java_HotSpot(TM)_64-Bit_Server_VM/25.65-b01/1.8.0_65, Date: Wed, 08 Mar 2017 10:40:04 GMT, Content-Type: application/x-www-form-urlencoded; charset=utf-8, )

This is why it still returns 403! Spark is not using the key I provide with fs.s3a.secret.key but instead invents a random one EACH time (every time I submit the job the random secret key is different).

For the record, I'm running this locally on my machine (OSX) with this command:

spark-submit --packages com.amazonaws:aws-java-sdk-pom:1.11.98,org.apache.hadoop:hadoop-aws:2.7.3 test.py

Could someone enlighten me on this?
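[Editor's note, hedged rather than a confirmed diagnosis: in the AWS v2 auth scheme, the Authorization header carries "AWS AccessKeyId:Signature", where the signature is an HMAC derived from the secret key and the request, so the value after the colon changing on every run is expected and is not the secret key itself. Separately, the same S3A settings can be passed through spark-submit instead of mutating the JSC configuration by hand; a minimal sketch using the same placeholder keys as above:

spark-submit \
  --packages com.amazonaws:aws-java-sdk-pom:1.11.98,org.apache.hadoop:hadoop-aws:2.7.3 \
  --conf spark.hadoop.fs.s3a.access.key=ACCESSKEY \
  --conf spark.hadoop.fs.s3a.secret.key=SECRETKEY \
  test.py

Spark copies spark.hadoop.* properties into the Hadoop configuration it hands to the filesystem, which sidesteps ordering issues with setting them after the context is created.]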
Re: Announcement of Project Panthera: Better Analytics with SQL, MapReduce and HBase
On Mon, Sep 17, 2012 at 6:55 AM, Dai, Jason jason@intel.com wrote: Hi, I'd like to announce Project Panthera, our open source efforts that showcase better data analytics capabilities on Hadoop/HBase (through both SW and HW improvements), available at https://github.com/intel-hadoop/project-panthera. ... 2) A document store (built on top of HBase) for better query processing. Under Project Panthera, we will gradually make our implementation of the document store available as an extension to HBase (https://github.com/intel-hadoop/hbase-0.94-panthera). Specifically, today's release provides document store support in HBase by utilizing co-processors, which brings up to a 3x reduction in storage usage and up to a 1.8x speedup in query processing. Going forward, we will also use HBASE-6800 (https://issues.apache.org/jira/browse/HBASE-6800) as the umbrella JIRA to track our efforts to get the document store idea reviewed and hopefully incorporated into Apache HBase.

Thanks for open sourcing this stuff Jason. It looks great. I took a quick look. Like Andy, I see that Panthera -- great name by the way; J-D has been playing Pantera (too!) loud here in our space since this note showed up on the list -- includes a full HBase. Do you have to deliver Panthera that way? Can we help make it so you do not need to include HBase core? Do you have a list of things we need to change so you can go downstream of core?

Good on you Jason,
St.Ack
Re: why hbase doesn't provide Encryption
On Tue, Sep 4, 2012 at 9:52 PM, Farrokh Shahriari mohandes.zebeleh...@gmail.com wrote: Hello, I just want to know why hbase doesn't provide encryption?

Please be more specific. Do you want us to encrypt each cell for you automatically? How would you suggest it work in a generic way?
Thanks, St.Ack
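[Editor's note: this thread predates it, but the generic fallback is to encrypt at the application layer before the Put ever leaves the client. A minimal sketch, assuming standard javax.crypto and a key the application already manages; the family and qualifier names are made up for illustration, and AES without an explicit mode/IV is used only for brevity:

import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class EncryptingClient {
    // keyBytes must be a valid AES key length (16, 24, or 32 bytes)
    public static void putEncrypted(HTable table, byte[] row, byte[] value,
            byte[] keyBytes) throws Exception {
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(keyBytes, "AES"));
        byte[] ciphertext = cipher.doFinal(value);
        Put put = new Put(row);
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), ciphertext);
        table.put(put);
    }
}

The cost is exactly the "generic way" problem the reply points at: once values are opaque ciphertext, server-side scans and filters can no longer interpret them.]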
Re: HBase MasterNotRunningException
On Thu, Aug 30, 2012 at 12:17 PM, Jilani Shaik jilani2...@gmail.com wrote: telnet is working for 60010, 60030 and 9000 from both the local and remote boxes.

Then the hbase daemons are not running or, as Anil is suggesting, the connectivity between machines needs fixing (it looks like everything binds to localhost... can you fix that?). Once your connectivity is fixed, then try running HBase.
St.Ack
Re: Pseudo Distributed: ERROR org.apache.hadoop.hbase.HServerAddress: Could not resolve the DNS name of localhost.localdomain
On Thu, Jun 7, 2012 at 2:18 AM, Manu S manupk...@gmail.com wrote: 2012-06-07 12:13:12,103 ERROR org.apache.hadoop.hbase.HServerAddress: Could not resolve the DNS name of localhost.localdomain

This is pretty basic. Fix this first and then your hbase will work.

Please stop spraying your queries across multiple lists. Doing so makes us think you arrogant, which I am sure is not the case. Pick the list that seems most appropriate. For example, in this case, it seems like the hbase-user list would have been the right place to write; not common-user and cdh-user. If it turns out you've chosen wrong, usually the chosen list will help you figure out the proper target.
Thanks, St.Ack
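[Editor's note: resolution of localhost.localdomain is normally settled in /etc/hosts. A hedged sketch of a sane layout for a pseudo-distributed box -- the 192.168 address and hostname are placeholders:

127.0.0.1      localhost localhost.localdomain
192.168.1.10   node1.example.com node1

What matters is that the name the daemons bind and advertise resolves consistently, and that it is not mapped only to a loopback alias the rest of the cluster cannot reach.]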
Re: Need Help with HBase
On Tue, Aug 16, 2011 at 3:53 PM, Taylor, Ronald C ronald.tay...@pnnl.gov wrote: Re file systems: while HBase can theoretically run on other scalable file systems, I remember somebody on the HBase list saying, in effect, that unless you are a file system guru and willing to put in a heck of a lot of work, the only practical choice as an underlying file system is Hadoop's HDFS. I think that was something like half a year ago or more, so maybe things have changed. Any of the HBase developers on the HBase list have an update (or a correction to my recollection)? See our book on 'Which Hadoop': http://hbase.apache.org/book.html#hadoop. It tells our 'which version of hadoop' story. It talks of how you need to use the unreleased branch-0.20-append branch or run Cloudera's CDH3u1. It also mentions the newcomer MapR as an alternative. They did the work to make HBase run on their filesystem. St.Ack
Re: HBase Mapreduce cannot find Map class
See http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description for some help.
St.Ack

On Thu, Jul 28, 2011 at 4:04 AM, air cnwe...@gmail.com wrote:

---------- Forwarded message ----------
From: air cnwe...@gmail.com
Date: 2011/7/28
Subject: HBase Mapreduce cannot find Map class
To: CDH Users cdh-u...@cloudera.org

import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapred.TableMapReduceUtil;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.lib.NullOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class LoadToHBase extends Configured implements Tool {

    public static class XMap<K, V> extends MapReduceBase
            implements Mapper<LongWritable, Text, K, V> {

        private JobConf conf;
        private HTable table;

        @Override
        public void configure(JobConf conf) {
            this.conf = conf;
            try {
                this.table = new HTable(new HBaseConfiguration(conf), "observations");
            } catch (IOException e) {
                throw new RuntimeException("Failed HTable construction", e);
            }
        }

        @Override
        public void close() throws IOException {
            super.close();
            table.close();
        }

        public void map(LongWritable key, Text value,
                OutputCollector<K, V> output, Reporter reporter) throws IOException {
            String[] valuelist = value.toString().split("\t");
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            Date addtime = null; // user registration time
            Date ds = null;
            Long delta_days = null;
            String uid = valuelist[0];
            try {
                addtime = sdf.parse(valuelist[1]);
            } catch (ParseException e) {
                e.printStackTrace();
            }
            String ds_str = conf.get("load.hbase.ds", null);
            if (ds_str != null) {
                try {
                    ds = sdf.parse(ds_str);
                } catch (ParseException e) {
                    e.printStackTrace();
                }
            } else {
                ds_str = "2011-07-28";
            }
            if (addtime != null && ds != null) {
                delta_days = (ds.getTime() - addtime.getTime()) / (24 * 60 * 60 * 1000);
            }
            if (delta_days != null) {
                byte[] rowKey = uid.getBytes();
                Put p = new Put(rowKey);
                p.add("content".getBytes(), "attr1".getBytes(), delta_days.toString().getBytes());
                table.put(p);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new HBaseConfiguration(), new LoadToHBase(), args);
        System.exit(exitCode);
    }

    @Override
    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getClass());
        TableMapReduceUtil.addDependencyJars(conf);
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        conf.setJobName("LoadToHBase");
        conf.setJarByClass(getClass());
        conf.setMapperClass(XMap.class);
        conf.setNumReduceTasks(0);
        conf.setOutputFormat(NullOutputFormat.class);
        JobClient.runJob(conf);
        return 0;
    }
}

I execute it using

    hbase LoadToHBase /user/hive/warehouse/datamining.db/xxx/

and it says:
11/07/28 17:20:29 INFO mapred.JobClient: Task Id : attempt_201107261532_2625_m_04_1, Status : FAILED
java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
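[Editor's note, a guess alongside the package doc above: "Error in configuring object" is ReflectionUtils wrapping whatever configure() threw, and the task JVM can only run XMap if both the job classes and the HBase jars actually ship with the job. A hedged sketch of packaging and submitting -- the jar name is hypothetical, and the `hbase classpath` command may not exist in every version, so check it first:

jar cf loadtohbase.jar LoadToHBase*.class
export HADOOP_CLASSPATH=$(hbase classpath)
hadoop jar loadtohbase.jar LoadToHBase /user/hive/warehouse/datamining.db/xxx/]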
Re: Custom TableInputFormat not working correctly
Do you have 100k rows?
St.Ack

On Sun, Jun 19, 2011 at 8:49 AM, edward choi mp2...@gmail.com wrote:

Hi, I have implemented a custom TableInputFormat. I call it TableInputFormatMapPerRow, and that is exactly what it does. The getSplits() of my custom TableInputFormat creates a TableSplit for each row in HBase. But when I actually run an application with my custom TableInputFormat, there are fewer map tasks than there should be. I really don't know what I am doing wrong. Any suggestions please? Below is my TableInputFormatMapPerRow.java

Ed
--

/**
 * Copyright 2007 The Apache Software Foundation
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.hbase.mapreduce;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.util.StringUtils;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormatBase;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Convert HBase tabular data into a format that is consumable by Map/Reduce.
 */
public class TableInputFormatMapPerRow extends InputFormat<ImmutableBytesWritable, Result>
implements Configurable {

    private final Log LOG = LogFactory.getLog(TableInputFormatMapPerRow.class);

    /** Job parameter that specifies the input table. */
    public static final String INPUT_TABLE = "hbase.mapreduce.inputtable";
    /** Base-64 encoded scanner. All other SCAN_ confs are ignored if this is specified.
     * See {@link TableMapReduceUtil#convertScanToString(Scan)} for more details. */
    public static final String SCAN = "hbase.mapreduce.scan";
    /** Column Family to Scan */
    public static final String SCAN_COLUMN_FAMILY = "hbase.mapreduce.scan.column.family";
    /** Space delimited list of columns to scan. */
    public static final String SCAN_COLUMNS = "hbase.mapreduce.scan.columns";
    /** The timestamp used to filter columns with a specific timestamp. */
    public static final String SCAN_TIMESTAMP = "hbase.mapreduce.scan.timestamp";
    /** The starting timestamp used to filter columns with a specific range of versions. */
    public static final String SCAN_TIMERANGE_START = "hbase.mapreduce.scan.timerange.start";
    /** The ending timestamp used to filter columns with a specific range of versions. */
    public static final String SCAN_TIMERANGE_END = "hbase.mapreduce.scan.timerange.end";
    /** The maximum number of version to return. */
    public static final String SCAN_MAXVERSIONS = "hbase.mapreduce.scan.maxversions";
    /** Set to false to disable server-side caching of blocks for this scan. */
    public static final String SCAN_CACHEBLOCKS = "hbase.mapreduce.scan.cacheblocks";
    /** The number of rows for caching that will be passed to scanners. */
    public static final String SCAN_CACHEDROWS = "hbase.mapreduce.scan.cachedrows";

    /** The configuration. */
    private Configuration conf = null;
    private HTable table = null;
    private Scan scan = null;
    private TableRecordReader tableRecordReader = null;

    /**
     * Returns the current configuration.
     *
     * @return The current configuration.
     * @see org.apache.hadoop.conf.Configurable#getConf()
     */
    @Override
    public Configuration getConf() {
        return conf;
    }

    /**
     * Sets the configuration. This is used to set the details for the table to
     * be scanned.
     *
     * @param
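[Editor's note: the posted class is cut off above. For reference, a minimal sketch of the one-split-per-row idea -- assuming the 0.90-era client API the imports suggest, using the table and scan fields declared above, and eliding error handling:

// A sketch, not the poster's truncated code: scan row keys only and emit
// one TableSplit per row key found.
@Override
public List<InputSplit> getSplits(JobContext context) throws IOException {
    List<InputSplit> splits = new ArrayList<InputSplit>();
    Scan keysOnly = new Scan(scan);
    keysOnly.setFilter(new KeyOnlyFilter()); // row keys only, skip values
    ResultScanner scanner = table.getScanner(keysOnly);
    try {
        for (Result r : scanner) {
            byte[] row = r.getRow();
            String location = table.getRegionLocation(row)
                .getServerAddress().getHostname();
            // start and end row are the same: the split covers one row
            splits.add(new TableSplit(table.getTableName(), row, row, location));
        }
    } finally {
        scanner.close();
    }
    return splits;
}

Note that this produces one split per row, so split counts at scale can be surprising -- presumably why the reply asks about the row count.]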
Re: Hbase startup error: NoNode for /hbase/master after running out of space
On Mon, Jun 6, 2011 at 8:29 PM, Zhong, Sheng sheng.zh...@searshc.com wrote: I would appreciate any help and suggestions. P.S.: we're using apache hadoop 0.20.2 and hbase 0.20.3, and zookeeper is running via zookeeper-3.2.2 (not managed by HBase).

Can you upgrade your hbase and hadoop?
St.Ack
Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar
On Mon, Jun 6, 2011 at 6:23 AM, praveenesh kumar praveen...@gmail.com wrote: Changing the name of the hadoop-append-core.jar file to hadoop-0.20.2-core.jar did the trick. It's working now. But is this the right solution to this problem?

It would seem to be. Did you have two hadoop*.jar versions in your lib directory by any chance? You did not remove the first?
St.Ack
Client seeing wrong data on nodeDataChanged
I'm trying to debug an issue that maybe you fellas have some ideas on figuring out. In short: Client 1 updates a znode, setting its content to X, then X again, then Y, and then finally it deletes the znode. Client 1 is watching the znode, and I can see that it's getting three nodeDataChanged events and a nodeDeleted. Client 2 is also watching the znode. It gets notified three times: two nodeDataChanged events (only) and a nodeDeleted event. I'd expect three nodeDataChanged events but understand a client might skip states. The problem is that when Client 2 looks at the data in the znode on nodeDataChanged, in both cases the data is Y. Not X and then Y, but Y both times. This is unexpected. This is 3.3.1 on a 5-node ensemble. I have full zk logging enabled. Would it help posting these?
St.Ack
Re: Client seeing wrong data on nodeDataChanged
On Thu, Oct 28, 2010 at 7:32 PM, Ted Dunning ted.dunn...@gmail.com wrote: Client 2 is not guaranteed to see X if it doesn't get to asking before the value has been updated to Y.

Right, but I wouldn't expect the watch to be triggered twice with value Y. Anyway, I think we have a handle on what's going on: at the time of the above incident, the master process was experiencing a flood of zk changes, and our thought is that we're not paying sufficient attention to the order of receipt. Will be back if this is not the issue.
Thanks, St.Ack
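[Editor's note: Ted's point follows from watches being one-shot notifications -- the read that re-arms the watch fetches whatever the znode holds at that moment, not the value that triggered the event. A minimal sketch of the usual cycle, assuming the plain ZooKeeper Java client and an already-connected handle:

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeWatcher implements Watcher {
    private final ZooKeeper zk;

    public ZnodeWatcher(ZooKeeper zk) { this.zk = zk; }

    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged) {
            try {
                // Re-read and re-arm: this returns the *current* content, so
                // two back-to-back events can both observe Y if X was
                // overwritten quickly.
                byte[] current = zk.getData(event.getPath(), this, null);
            } catch (Exception e) {
                // handle KeeperException / InterruptedException
            }
        }
    }
}]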
Re: HBase MR: run more map tasks than regions
On Tue, Sep 14, 2010 at 10:10 AM, Alex Baranau alex.barano...@gmail.com wrote: Is the only way for me to enhance TableInputFormat?

Currently, yes, you must enhance TIF or use an alternate TIF.
St.Ack
Re: Meaning of storefileIndexSize
On Tue, May 18, 2010 at 2:15 AM, Renaud Delbru renaud.del...@deri.org wrote: Hi, after some tuning, like increasing the hfile block size to 128KB, I have noticed that the storefileIndexSize is now half of what it was before (~250). Is storefileIndexSize the size of the in-memory hfile block index?

Yes. So, yes, doubling the block size should halve the index size. How come your index is so big? Do you have big keys? Lots of data? Lots of storefiles?

Looking in HRegionServer I see that it's calculated so:

storefileIndexSizeMB = (int)(store.getStorefilesIndexSize()/1024/1024);

In the Store, we do this:

/**
 * @return The size of the store file indexes, in bytes.
 */
long getStorefilesIndexSize() {
    long size = 0;
    for (StoreFile s: storefiles.values()) {
        Reader r = s.getReader();
        if (r == null) {
            LOG.warn("StoreFile " + s + " has a null Reader");
            continue;
        }
        size += r.indexSize();
    }
    return size;
}

The indexSize is out of the HFile metadata.
St.Ack

Thanks
--
Renaud Delbru

On 17/05/10 15:27, Renaud Delbru wrote: Hi, I would like to understand the meaning of the storefileIndexSize metric; could someone point me to a definition or explain what it means? Also, we are performing a large table import (90M rows, the size of a row varying between hundreds of KB and 8 MB), and we are encountering memory problems (OOME). My observation is that it always happens after a while, when the storefileIndexSize starts to be large (> 500). Is there a way to reduce it? Thanks,
Re: Meaning of storefileIndexSize
On Tue, May 18, 2010 at 9:04 AM, Renaud Delbru renaud.del...@deri.org wrote: How come your index is so big? Do you have big keys? Lots of data? Lots of storefiles? We have 90M rows; each row varies from a few hundred kilobytes to 8MB.

The index keeps the 'key' that starts each block in an hfile and its offset, where the 'key' is a combination of row+column+timestamp (not the value). Your 'keys' are large?

I have also changed at the same time another parameter, hbase.hregion.max.filesize. It was set to 1GB (from a previous test), and I switched it back to the default value (256MB). So, in the previous tests, there was a small number of region files (like 250), but a very large index file size (500). In my last test (hregion.max.filesize=256MB, block size=128K), the number of region files increased (I now have more than 1000 region files), but the index file size is now less than 200. Do you think hregion.max.filesize could have had an impact on the index file size?

Hmm. You have the same amount of data, just more files, because you lowered the max filesize (by a factor of 4, so 4x the number of files), so I'd expect the index to be of the same size. If inclined to do more digging, you can use the hfile tool:

./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile

Do the above and you'll get usage. Print out the metadata on hfiles. Might help you figure out what's going on.

Looking in HRegionServer I see that it's calculated so: storefileIndexSizeMB = (int)(store.getStorefilesIndexSize()/1024/1024); So, storefileIndexSize indicates the number of MB of heap used by the index. And, in our case, 500 was too excessive given the fact that our region server is limited to 1GB of heap.

If 1GB only, then yeah, big indices will cause a prob. How many regions per regionserver? Sounds like you have a few? If so, can you add more servers? Or up the RAM in your machines?
Yours, St.Ack
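[Editor's note: to make the tool suggestion concrete -- run it bare for the usage text, then point it at a storefile. The flags below are from memory and the path is purely hypothetical, so trust the printed usage over this sketch:

./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile
./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m -f /hbase/TABLE/REGION/FAMILY/STOREFILE

Here -m prints the hfile's metadata (including index size) and -f names the file to inspect.]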
Re: HBase MR with Filter
On Tue, May 18, 2010 at 9:25 AM, Patrick Datko patrick.da...@ymc.ch wrote: Hey, I'm building a MapReduce job which should get data from an HBase table, filter it, and store the reduced data in another HBase table. I used the SingleColumnValueFilter to limit the data committed to the map process. The problem is, the filter doesn't reduce the data but commits all data in the table to the map process. The code looks like this for the filter:

Scan scan = new Scan();
String columns = "details";
String qualifier = "details:page";
String value = "5";
scan.setFilter(new SingleColumnValueFilter(Bytes.toBytes(columns),
    Bytes.toBytes(qualifier), CompareOp.EQUAL, Bytes.toBytes(5)));
TableMapReduceUtil.initTableMapperJob("books", scan, mapper.class,
    ImmutableBytesWritable.class, IntWritable.class, job);

And this was my code for filling the table:

Put put = new Put(rowkey);
put.add(Bytes.toBytes("details"), Bytes.toBytes("page"),
    Bytes.toBytes(rand.nextInt(20)));

and I don't understand why the filter doesn't work! I hope somebody can help me.
Best regards, Patrick
Re: HBase MR with Filter
The below looks 'right'. Maybe try uncommenting this:

this.comparator.compareTo(Arrays.copyOfRange(data, offset, offset + length));
//if (LOG.isDebugEnabled()) {
//    LOG.debug("compareResult=" + compareResult + " " + Bytes.toString(data, offset, length));
//}

...in SingleColumnValueFilter. It might give you a clue as to where things are going awry.
St.Ack

On Tue, May 18, 2010 at 9:25 AM, Patrick Datko patrick.da...@ymc.ch wrote: [quoted message elided; see above]
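[Editor's note, a hedged observation on the snippet rather than a confirmed diagnosis: the filter's second argument should be the bare qualifier rather than family:qualifier, the compared bytes must match the stored encoding, and by default SingleColumnValueFilter lets rows through when the tested column is missing. A corrected sketch:

SingleColumnValueFilter filter = new SingleColumnValueFilter(
    Bytes.toBytes("details"),    // family alone...
    Bytes.toBytes("page"),       // ...and qualifier alone, not "details:page"
    CompareOp.EQUAL,
    Bytes.toBytes(5));           // int encoding, matching rand.nextInt(20)
filter.setFilterIfMissing(true); // otherwise rows lacking the column pass
scan.setFilter(filter);]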
Re: ClassNotFoundException: org.apache.hadoop.hbase.client.idx.IdxQualifierType
You have the IHBase jar in your CLASSPATH?
St.Ack

On Mon, May 17, 2010 at 3:56 AM, Nitin Goel nitin.g...@in.fujitsu.com wrote: Hi, I am new to HBase and I am trying to use hbql on HBase. There I am getting the following exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/idx/IdxQualifierType
        at org.apache.hadoop.hbase.hbql.mapping.FieldType.<clinit>(FieldType.java:50)
        at org.apache.hadoop.hbase.hbql.mapping.ColumnDefinition.getFieldType(ColumnDefinition.java:159)
        at org.apache.hadoop.hbase.hbql.mapping.ColumnDefinition.newMappedColumn(ColumnDefinition.java:99)
        at org.apache.hadoop.hbase.hbql.antlr.HBqlParser.columnDefinition(HBqlParser.java:4565)
        at org.apache.hadoop.hbase.hbql.antlr.HBqlParser.columnDefinitionnList(HBqlParser.java:4358)
        at org.apache.hadoop.hbase.hbql.antlr.HBqlParser.familyMapping(HBqlParser.java:4317)
        at org.apache.hadoop.hbase.hbql.antlr.HBqlParser.familyMappingList(HBqlParser.java:4197)
        at org.apache.hadoop.hbase.hbql.antlr.HBqlParser.attribMapping(HBqlParser.java:2577)
        at org.apache.hadoop.hbase.hbql.antlr.HBqlParser.hbqlStmt(HBqlParser.java:986)
        at org.apache.hadoop.hbase.hbql.antlr.HBqlParser.hbqlStatement(HBqlParser.java:463)
        at org.apache.hadoop.hbase.hbql.parser.ParserUtil.parseHBqlStatement(ParserUtil.java:163)
        at org.apache.hadoop.hbase.hbql.impl.Utils.parseHBqlStatement(Utils.java:49)
        at org.apache.hadoop.hbase.hbql.impl.HStatementImpl.execute(HStatementImpl.java:159)
        at org.apache.hadoop.hbase.hbql.impl.HConnectionImpl.execute(HConnectionImpl.java:314)
        at org.apache.hadoop.hbase.hbql.impl.MappingManager.validatePersistentMetadata(MappingManager.java:59)
        at org.apache.hadoop.hbase.hbql.impl.HConnectionImpl.<init>(HConnectionImpl.java:92)
        at org.apache.hadoop.hbase.jdbc.impl.ConnectionImpl.<init>(ConnectionImpl.java:64)
        at org.apache.hadoop.hbase.jdbc.Driver.getConnection(Driver.java:85)
        at org.apache.hadoop.hbase.jdbc.Driver.connect(Driver.java:74)
        at java.sql.DriverManager.getConnection(DriverManager.java:582)
        at java.sql.DriverManager.getConnection(DriverManager.java:207)
        at com.fujitsu.fla.tsig.gdb.hbase.HBaseHelper.main(HBaseHelper.java:42)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.client.idx.IdxQualifierType
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
        ... 22 more

Could you please let me know in which jar I can find the org.apache.hadoop.hbase.client.idx.IdxQualifierType class? I have checked the source code of HBase 0.20.4; however, I didn't find the required class.
Thanks & Regards, Nitin Goel
Re: Inverted word index...
... and you've seen http://github.com/akkumar/hbasene and http://github.com/thkoch2001/lucehbase?
St.Ack

On Mon, May 17, 2010 at 1:07 AM, Kevin Apte technicalarchitect2...@gmail.com wrote: Consider a search system with an inverted word index -- in other words, an index which points to document locations -- with these columns: word, document ID, and possibly timestamp. Given a word, how will I know which tablet to scan to find all document IDs with the given word? If you are indexing a large database -- say 50 TB -- then each word may be split across multiple tablets. There may be hundreds of such tablets, each with a large number of SSTables storing the index. How will I know which tablet to search? Is there a master index that specifies which tablet has words in the range, say, 'ro' to 'ru'? Or do I have to look up Bloom filters for every tablet? Kevin
Re: HBase client hangs after upgrade to 0.20.4 when used from reducer
The below looks the same as the blockage seen in logs posted to hbase-2545 by Kris Jirapinyo. Todd has made a fix and added a test. We'll roll a 0.20.5 hbase with this fix and a fix for hbase-2541 (missing licenses from the head of some source files) as soon as we get confirmation that Todd's fix works for Kris Jirapinyo's seemingly similar issue.
Thanks, St.Ack

On Fri, May 14, 2010 at 9:07 AM, Todd Lipcon t...@cloudera.com wrote: It appears like we might be stuck in an infinite loop here:

"IPC Server handler 9 on 60020" daemon prio=10 tid=0x2aaeb42f7800 nid=0x6508 runnable [0x445bb000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.regionserver.ExplicitColumnTracker.checkColumn(ExplicitColumnTracker.java:128)
        at org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:165)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:176)

It's holding a lock that some other threads are blocked on. In both of your pastes, there are some threads stuck here. JD, any thoughts? Looks like you made some changes to this code for 0.20.4.
-Todd

On Fri, May 14, 2010 at 6:56 AM, Friso van Vollenhoven fvanvollenho...@xebia.com wrote: Hi Todd,

The row counting works fine. It is quite slow, I have to say. But I have never used row counting from the shell before, so I don't know what performance to expect from it or how it is implemented. It's just that a regular scan from software is way faster. Also, from our application we do a full table scan to populate an in-memory index of row keys, because we need to be able to quickly determine whether a certain key exists or not. I triggered this scan from our UI while there were hanging reducers. This also works fine. There are close to 5 million records in the table, and I checked in the web interface that the table is divided across all 4 region servers, so this process should hit them all.

The earlier jstacks of the region servers were taken when the reducers (clients) were hanging, before the shutdown was requested. Some 3 to 5 reduce tasks hang; not all of them, but surely more than just one.

Because of your question about what is locked up (client or region server), I SSH'ed into each of the worker machines after giving HBase the shutdown signal (so the shutdown sequence started) and tried to see if the region server was still running and, if so, to shut down each individual region server manually (doing 'hbase-daemon.sh stop regionserver' on each; I'm glad there are only 4 nodes). I found that:
- one of the region servers actually shut down normally (worker3)
- two region servers shut down normally after the hbase-daemon.sh command (worker4 and worker1)
- one region server does not shut down (worker2)

I put some additional info on pastebin. Here is the jstack of worker2 (the hanging one): http://pastebin.com/5V0UZi7N
There are two jstack outputs, one from before the shutdown command was given and one (starting at line 946) from after the shutdown command was given. Here are the logs of that region server: http://pastebin.com/qCXSKR2A
I set the log level for org.apache.hadoop.hbase to DEBUG before doing all this, so it's more verbose (I don't know if this helps).

So, it appears that it is one of the region servers that is locked up, but only for some connections, while it can still serve other connections normally.
From the locked-up region server's logs, it looks like the shutdown sequence runs completely, but the server just won't die afterwards (because of running non-daemon threads; maybe it should just do a System.exit() if all cleanup is successful). At least nothing gets corrupted, which is nice. Of course I am still trying to find out why things get locked up in the first place. I did this test twice today. During the first run it was a different region server that was hanging, so I think it has nothing to do with a problem related to that specific machine.

My next step is to go through code (including HBase's, so it will take me some time...) and see what exactly happens in our scenario, because from my current knowledge the jstack outputs don't mean enough to me.

Friso

On May 13, 2010, at 7:09 PM, Todd Lipcon wrote: Hi Friso, When did you take the jstack dumps of the region servers? Was it when the reduce tasks were still hanging? Do all of the reduce tasks hang, or is it just one that gets stuck? If, once the reduce tasks are hung, you open the hbase shell and run count 'mytable', 10 -- does it successfully count the rows? (I'm trying to determine if the client is locked up, or one of the RSes is locked up.) Enjoy your holiday! -Todd

On Thu, May 13, 2010 at 12:38 AM, Friso van Vollenhoven fvanvollenho...@xebia.com wrote: Hi, I was kind of hoping that this was a known thing and I was just overlooking something. Apparently it requires more investigation.
Re: GSoC 2010: ZooKeeper Monitoring Recipes and Web-based Administrative Interface
On Thu, May 13, 2010 at 2:30 AM, Andrei Savu savu.and...@gmail.com wrote: Hi all, My name is Andrei Savu and I am one of the GSoC 2010 accepted students. My mentor is Patrick Hunt.

Good to meet you Andrei.

Are there any HBase / Hadoop specific ZooKeeper monitoring requirements?

In the hbase shell, you can poke at your zk ensemble currently. Here is what it looks like:

hbase(main):001:0> zk
ZooKeeper -server host:port cmd args
        connect host:port
        get path [watch]
        ls path [watch]
        set path data [version]
        delquota [-n|-b] path
        quit
        printwatches on|off
        create [-s] [-e] path data acl
        stat path [watch]
        close
        ls2 path [watch]
        history
        listquota path
        setAcl path acl
        getAcl path
        sync path
        redo cmdno
        addauth scheme auth
        delete path [version]
        setquota -n|-b val path

That's pretty great. What'd be sweeter would be the addition of a zktop command. I know it's a Python script at the mo. Maybe there is a pure Java implementation? Also in our UI, you can browse to a page of basic ensemble stats. Would be excellent if instead that were the fancy-pants zktop output. Or, if you are doing a zk UI anyway, just make sure it is packaged in a way that makes it easy for us to launch as part of our UI? I'd imagine if it is packaged as a WAR file that should be fine, but we'd need some way of passing in where the zk ensemble is, perhaps as arguments on the URL?

Thanks for writing the list Andrei,
St.Ack
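[Editor's note: outside the shell, ZooKeeper's four-letter-word commands are the usual monitoring hook, and they are what zktop-style tools poll; for example (host and port are placeholders):

echo ruok | nc zkhost 2181    # a healthy server answers imok
echo stat | nc zkhost 2181    # per-server latency, connections, mode]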
Re: HBase has entered Debian (unstable)
You are a good man Thomas. Thanks for pushing this through.
St.Ack

On Thu, May 13, 2010 at 1:59 AM, Thomas Koch tho...@koch.ro wrote: Hi, HBase 0.20.4 has entered Debian unstable, should slide into testing after the usual 14-day period, and will therefore most likely be included in the upcoming Debian Squeeze.
http://packages.debian.org/source/sid/hbase
Please note that this packaging effort is still very much work-in-progress and not yet suitable for production use. However, the aim is to have a rock-solid stable HBase in squeeze+1, respectively in Debian testing, in the next months. Meanwhile the HBase package in Debian can raise HBase's visibility and lower the entrance barrier. So if somebody wants to try out HBase (on Debian), it is as easy as:

aptitude install zookeeperd hbase-masterd

In other news: zookeeper is in Debian testing as of today.
Best regards, Thomas Koch, http://www.koch.ro
Re: Regionservers crash with an OutOfMemoryException after a data-intensive map reduce job..
Hello Vidhyashankar:

How many regionservers? What version of hbase and hadoop? How much RAM on these machines in total? Can you give HBase more RAM?

Also check that you don't have an exceptional cell in your input -- one that is very much larger than the 14KB you note below.

12 column families is at the extreme of what we've played with, just FYI. You might try a schema that has fewer: e.g. one CF for the big cell value and all the others in a second CF.

There may also be corruption in one of the storefiles, given that the OOME below seems to happen when we try to open a region (though the fact of opening may have no relation to the OOME).
St.Ack

On Thu, May 13, 2010 at 10:35 AM, Vidhyashankar Venkataraman vidhy...@yahoo-inc.com wrote: This is similar to a mail sent by another user to the group a couple of months back. I am quite new to Hbase and I've been trying to conduct a basic experiment with Hbase. I am trying to load 200 million records, each record around 15 KB: with one column value around 14 KB and the rest of the 100 column values 8 bytes each. The 120 columns are grouped as 10 qualifiers x 12 families (hope I got my jargon right). Note that only one value is quite large for each doc (when compared to the other values). The data is uncompressed, and each value is uniformly randomly selected. I used a map-reduce job to load a data file on hdfs into the database. Soon after the job finished, the region servers crash with an OOM Exception. Below is part of the trace from the logs in one of the RSs. I have attached the conf along with the email. Can you guys point out any anomaly in my settings? I have set a heap size of 3 gigs; anything significantly more, 32-bit Java doesn't run.

2010-05-12 19:22:45,068 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=8.43782MB (8847696), Free=1791.2247MB (1878235312), Max=1799.6626MB (1887083008), Counts: Blocks=1, Access=16947, Hit=52, Miss=16895, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.3068389603868127%, Miss Ratio=99.69316124916077%, Evicted/Run=NaN
2010-05-12 19:22:45,069 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col5/7617863559659933969, isReference=false, sequence id=2470632548, length=8456716, majorCompaction=false
2010-05-12 19:22:45,075 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col6/1328113038200437659, isReference=false, sequence id=2960732840, length=19861, majorCompaction=false
2010-05-12 19:22:45,078 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col6/6484804359703635950, isReference=false, sequence id=2470632548, length=8456716, majorCompaction=false
2010-05-12 19:22:45,082 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col7/1673569837212457160, isReference=false, sequence id=2960732840, length=19861, majorCompaction=false
2010-05-12 19:22:45,085 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col7/4737399093829085995, isReference=false, sequence id=2470632548, length=8456716, majorCompaction=false
2010-05-12 19:22:47,238 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col8/8446828932792437464, isReference=false, sequence id=2960732840, length=19861, majorCompaction=false
2010-05-12 19:22:47,241 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col8/974386128174268353, isReference=false, sequence id=2470632548, length=8456716, majorCompaction=false
2010-05-12 19:22:48,804 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col9/2096232603557969237, isReference=false, sequence id=2470632548, length=8456716, majorCompaction=false
2010-05-12 19:22:48,807 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col9/7088206045660348092, isReference=false, sequence id=2960732840, length=19861, majorCompaction=false
2010-05-12 19:22:48,808 INFO org.apache.hadoop.hbase.regionserver.HRegion: region DocData,4824176,1273625075099/1651418343 available; sequence id is 2960732841
2010-05-12 19:22:48,808 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: DocData,40682172,1273607630618
2010-05-12 19:22:48,809 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Opening region DocData,40682172,1273607630618, encoded=271889952
2010-05-12 19:22:50,924 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/271889952/CONTENT/4859380626868896307, isReference=false, sequence id=2959849236, length=337563, majorCompaction=false
2010-05-12 19:22:53,037 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/271889952/CONTENT/952776139755887312, isReference=false, sequence id=2082553088, length=110460013, majorCompaction=false
2010-05-12 19:22:57,404 DEBUG
Re: tables disappearing after upgrading 0.20.3 = 0.20.4
What's the shell say? Does it see the tables consistently? Can you count your content consistently?
St.Ack

On Thu, May 13, 2010 at 4:53 PM, Viktors Rotanovs viktors.rotan...@gmail.com wrote: Hi, after upgrading from 0.20.3 to 0.20.4 the list of tables almost immediately becomes inconsistent -- master.jsp shows no tables even after creating a test table in the hbase shell, tables which were available before start randomly appearing and disappearing, etc. Upgrading was done by stopping, upgrading the code, and then starting (no dump/restore was done). I didn't investigate yet; just checking if somebody had the same problem or if I did the upgrade right (I had exactly the same issue in the past when trying to apply HBASE-2174 manually).

Environment:
Small tables, 100k rows
Amazon EC2, c1.xlarge instance type with Ubuntu 9.10 and EBS root, HBase installed manually
1 master (namenode + jobtracker + master), 3 slaves (tasktracker + datanode + regionserver + zookeeper)
Hadoop 0.20.1+169.68~1.karmic-cdh2 from Cloudera distribution
Flaky DNS issue present, happens about once per day even with dnsmasq installed (heartbeat every 1s, dnsmasq forwards requests once per minute), DDNS set for internal hostnames.

This is a testing cluster, nothing important on it.
Cheers, -- Viktors
Re: Enabling IHbase
You saw this package doc over in ihbase's new home on github?
http://github.com/ykulbak/ihbase/blob/master/src/main/java/org/apache/hadoop/hbase/client/idx/package.html
It'll read better if you build the javadoc. There is also this:
http://github.com/ykulbak/ihbase/blob/master/README
St.Ack

On Wed, May 12, 2010 at 8:27 AM, Renato Marroquín Mogrovejo renatoj.marroq...@gmail.com wrote: Hi Alex, Thanks for your help, but I meant something more like a how-to-set-it-up thing, or like a tutorial of it (= I also read these ones if anyone else is interested.
http://blog.sematext.com/2010/03/31/hbase-digest-march-2010/
http://search-hadoop.com/m/5MBst1uL87b1
Renato M.

2010/5/12 alex kamil alex.ka...@gmail.com: regarding usage this may be helpful https://issues.apache.org/jira/browse/HBASE-2167

On Wed, May 12, 2010 at 10:48 AM, alex kamil alex.ka...@gmail.com wrote: Renato, just noticed you are looking for *Indexed* HBase. I found this http://blog.reactive.org/2010/03/indexed-hbase-it-might-not-be-what-you.html Alex

On Wed, May 12, 2010 at 10:42 AM, alex kamil alex.ka...@gmail.com wrote: http://www.google.com/search?hl=en&source=hp&q=hbase+tutorial

On Wed, May 12, 2010 at 10:25 AM, Renato Marroquín Mogrovejo renatoj.marroq...@gmail.com wrote: Hi everyone, I just read about IHbase and it seems like something I could give a try, but I haven't been able to find information (besides descriptions and advantages) regarding how to install or use it. Thanks in advance. Renato M.
Re: Problem with performance with many columns in column familie
You could try thread-dumping the regionserver to try to figure out where it's hung up. Counters are usually fast, so maybe it's something to do with 8k of them in the one row. What kind of numbers are you seeing? How much RAM are you throwing at the problem?
Yours, St.Ack

On Tue, May 11, 2010 at 8:51 AM, Sebastian Bauer ad...@ugame.net.pl wrote: Hi, maybe I'll get help here :) I have 2 tables, UserToAdv and AdvToUsers. UserToAdv is simple:

{ row_id = [ {adv:id:counter}, {adv:id:counter}, ... about 100 columns ] }

Only one kind of operation is performed -- increasing a counter:

client.atomicIncrement("UsersToAdv", ID, column, 1)

AdvToUsers has one column family, user:. Inside this I have about 8000 columns with the format user:cookie. What I'm doing on the DB is increasing the counter inside user:cookie:

client.atomicIncrement("AdvToUsers", ID, column, 1)

I have 2 regions. First one:

UsersToAdv,6FEC716B3960D1E8208DE6B06993A68D,1273580007602 stores=1, storefiles=1, storefileSizeMB=8, memstoreSizeMB=9, storefileIndexSizeMB=0
UsersToAdv,0FDD84B9124B98B05A5E40F47C12DC45,1273580531847 stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
AdvToUsers,5735,1273580575873 stores=1, storefiles=1, storefileSizeMB=15, memstoreSizeMB=10, storefileIndexSizeMB=0
UsersToAdv,67CB411B48A7B83F0B863AC615285060,1273580533380 stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
UsersToAdv,4012667F3E78C6431E3DD84641002FCE,1273580532995 stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
UsersToAdv,5FE4A7506737CE0F38E254E62E23FE45,1273580533380 stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
UsersToAdv,47E95EE30A11EBE45F055AC57EB2676E,1273580532995 stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
UsersToAdv,37F9573415D9069B7E5810012AAD9CB7,1273580532258 stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
UsersToAdv,1FFFDF082566D93153B34BFE0C44A9BF,1273580532173 stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
UsersToAdv,17C93FB0047BC4D660C6570B734CBE17,1273580531847 stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
UsersToAdv,27DFD8F02CD98FF57E8334837C73C57A,1273580532173 stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0

Second one:

UsersToAdv,57C568066D35D09B4AF6CD7D68681144,1273580533427 stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
UsersToAdv,4FA6A1A2681E2D252CCF765B140369EF,1273580533427 stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
AdvToUsers,,1273580575966 stores=1, storefiles=1, storefileSizeMB=1, memstoreSizeMB=1, storefileIndexSizeMB=0
UsersToAdv,07B296AC590061025B382B163E3C149E,1273580533023 stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
UsersToAdv,3015D5DB07E2F4D30A19DEB354A85B52,1273580532258 stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
AdvToUsers,5859,1273580580940 stores=1, storefiles=1, storefileSizeMB=9, memstoreSizeMB=9, storefileIndexSizeMB=0
AdvToUsers,5315,1273580575966 stores=1, storefiles=1, storefileSizeMB=14, memstoreSizeMB=12, storefileIndexSizeMB=0
AdvToUsers,5825,1273580580940 stores=1, storefiles=1, storefileSizeMB=8, memstoreSizeMB=8, storefileIndexSizeMB=0
AdvToUsers,5671,1273580578114 stores=1, storefiles=1, storefileSizeMB=8, memstoreSizeMB=7, storefileIndexSizeMB=0
UsersToAdv,,1273580533023 stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
AdvToUsers,5457,1273580578114 stores=1, storefiles=1, storefileSizeMB=8, memstoreSizeMB=8, storefileIndexSizeMB=0

The number of queries on both tables is equal, but the load is greater on the second region because of AdvToUsers. Is there any solution to increase the performance of the atomicIncrement operation on column families with so many (8000) columns?
Thank you, Sebastian Bauer
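[Editor's note: to act on the thread-dump suggestion, the usual sequence is (the pid is a placeholder; jps shows the real one):

jps                           # find the HRegionServer pid
jstack 12345 > rs-threads.txt # dump all thread stacks to a file
kill -QUIT 12345              # alternative: dump goes to the server's .out file]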
Re: WrongRegionException -- add_table.rb screwed up my hbase.
Sorry for the trouble caused. I thought that 0.20.4 added updating of .regioninfo on re-enable of a table, but I don't see it. Nonetheless, I'd suggest you update to 0.20.4. It should have fixes at least to save you from WRE going forward.
Thanks for writing the list, St.Ack

On Tue, May 11, 2010 at 9:20 PM, maxjar10 jcuz...@gmail.com wrote: Answered my own question. The .regioninfo files are there specifically for performing fsck functionalities like using add_table.rb. The problem is that the .regioninfo files are NOT updated after an alter. This issue is described in: https://issues.apache.org/jira/browse/HBASE-2366 The purpose of the .regioninfo files is described here: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

maxjar10 wrote: Ok, here's my story in case anyone else encounters the same issue... My question is this: why does the table descriptor/meta table information not match the .regioninfo in each region subdir? Is this a bad thing? Read below...

HBase 0.23-1, Hadoop 0.20.1

So I wanted to add compression to my HBase tables that I already had set up. So, I went to the hbase shell and ran an alter table to set compression to GZ and decreased the versions from 3 to 2. I then ran a major_compact on my table to put the change into effect. Even though this appears to happen instantly, I know you need to wait for fragmentation to drop to 0%. Now, I wanted to run a job and saw an exception:

org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server Some server, retryOnlyOne=true, index=0, islastrow=true, tries=9, numtries=10, i=0, listsize=1, region=Businesses,30012005:14317197,1269286328987 for region Businesses,30012005:14317197,1269286328987, row '30012013:692', but failed after 10 attempts.

How very strange... Well, this was odd, so I went to the hbase shell and ran a count 'Businesses', and it hung (looped, whatever) when it got close to the above start row. So, I cancelled the count and saw a brief message in the datanode log about the fact that there was a WrongRegionException. Hmmm... When I looked at the .META. tables I saw that the endRow didn't match up to the next startRow like it should, so it looked as though a region was missing. Double hmmm... Because I clearly had something wrong, I decided to try to run add_table.rb, as suggested in another thread about seeing WrongRegionExceptions. So, I proceeded to run add_table.rb, only to have it fail. I modified the script and noticed that it was outputting the .regioninfo information that is in each region's subdir. The problem is that this .regioninfo DOES NOT match the actual table description. There was no COMPRESSION and the versions were back at 3. This was confusing because I could see in the actual data, when I would open the data files, that it was clearly gzipped. Sigh. So, I went ahead and modified the script to output what does match the table description, and the only thing it uses the .regioninfo files for is the startKey and the endKey. The rest is the data that actually matches my alter. As of right now the count is proceeding past the row I had a problem with, BUT I'm not sure if the data is actually good. I'll need to scan a few rows.

My question is this: why does the table descriptor/meta table information not match the .regioninfo in each region subdir? Is this a bad thing? Thanks!
Re: java.lang.IllegalAccessError while opening region
This looks like https://issues.apache.org/jira/browse/HBASE-1925. A storefile in the problematic region is missing its sequenceid. Try and figure out which one (see the issue for clues). There is also the hfile tool for examining metainfo in storefiles:

./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile

Do the above for usage. Move aside the bad storefile in hdfs (./bin/hadoop fs -mv SRC TGT). The region should deploy then. Thereafter, please update your hbase so you get the fix that makes sure this doesn't happen in future.
Yours, St.Ack

On Mon, May 10, 2010 at 1:35 PM, g00dn3ss g00dn...@gmail.com wrote: Hi All, I have reams of errors in my master log for a single region. It keeps trying and failing to open the region. An example of the exception is below. Are there conceivable ways of fixing this, or am I most likely going to have to delete the region? If deleting the region is the likely option, are there any tools to help with this? I saw the recent message from Stack describing how to delete a region. Does that mean I have to manually edit .META. from the hbase shell? Do I also have to edit the .regioninfo files in the modified regions' directories with the new start/end keys? Thanks!

(Exception follows)
2010-05-10 13:15:48,114 INFO master.ProcessRegionClose$1 (ProcessRegionClose.java:call(85)) - region set as unassigned: regionid
2010-05-10 13:15:48,123 INFO master.RegionManager (RegionManager.java:doRegionAssignment(337)) - Assigning region regionid to regionserver whatnot
2010-05-10 13:15:48,144 INFO master.ServerManager (ServerManager.java:processMsgs(441)) - Processing MSG_REPORT_CLOSE: regionid:
java.lang.IllegalAccessError: Has not been initialized
        at org.apache.hadoop.hbase.regionserver.StoreFile.getMaxSequenceId(StoreFile.java:216)
        at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:417)
        at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:221)
        at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:1641)
        at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:320)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1575)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1542)
        at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1462)
        at java.lang.Thread.run(Unknown Source)
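[Editor's note: spelled out with hypothetical paths -- the region hash, family, and storefile name are placeholders, and the tool flags should be confirmed against its printed usage -- the inspect-then-quarantine sequence might look like this:

./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m -f /hbase/mytable/1234567890/cf/3858290434218346241
./bin/hadoop fs -mkdir /quarantine
./bin/hadoop fs -mv /hbase/mytable/1234567890/cf/3858290434218346241 /quarantine/]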
Re: locality and tasktracker vs split hostname
On Mon, May 10, 2010 at 6:26 PM, John Sichi jsi...@facebook.com wrote: ... (a) making the Hadoop locality code use a hostname comparison which is insensitive to the presence of the trailing dot, or (b) making the HBase split's hostname consistent with the task tracker.

Let's do (b). Users will see the fix sooner. Want to file an issue, John?
Good on you, St.Ack
Re: Multiple get
Is this the new patch up in hbase-1845 that you are messing with, Slava? If so, please add the stacktrace to the issue so we can take a look.
St.Ack

On Tue, May 4, 2010 at 5:01 AM, Slava Gorelik slava.gore...@gmail.com wrote: Hi. After I applied the patch and started to use multiple get, the test application started to hang on exit (the DestroyJavaVM thread hangs) after I called HTable.batch(). The regular get operation continues to work as expected. Best Regards.

On Thu, Mar 4, 2010 at 9:24 PM, Slava Gorelik slava.gore...@gmail.com wrote: Thank you. At least I can apply the patch on 0.20.3.

On Thu, Mar 4, 2010 at 7:18 PM, Jean-Daniel Cryans jdcry...@apache.org wrote:

On Thu, Mar 4, 2010 at 9:07 AM, Slava Gorelik slava.gore...@gmail.com wrote: Hi Erik, Thank you for the quick reply. Will this patch be integrated into HBase 0.21.0?

If it's ready before we release, then yes.

If yes, is there any estimated release date for 0.21.0?

It's complicated. Currently it would be at least after the Hadoop 0.21.0 release (which also has no estimate since Y! isn't focusing on it). J-D

Thank you.

On Thu, Mar 4, 2010 at 5:14 PM, Erik Holstad erikhols...@gmail.com wrote: Hey Slava! There is work being done to be able to do this: https://issues.apache.org/jira/browse/HBASE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832114#action_12832114 I think that there is also another Jira that's related to this topic, but I don't know its number. -- Regards Erik
Re: Improving HBase scanner
Are you waiting too long between invocations of next? (i.e. longer than the scanner lease period?) Or, perhaps you are fetching too much in the one go. If you fetch 1000 at a time -- scanner caching -- and you don't get the next batch within the scanner lease period, again you will timeout. St.Ack On Tue, May 4, 2010 at 1:46 AM, Michelan Arendse miche...@addynamo.com wrote: Hi I would like to know how to configure HBase to improve the scanner fetching data from the table, or another method of using the scanner, as my database is very large and the scanner times out. Kind Regards, Michelan Arendse Junior Developer | AD:DYNAMO // happy business ;-) Office 0861 Dynamo (0861 396266) | Fax +27 (0) 21 465 2587 Advertise Online Instantly - www.addynamo.com
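A minimal sketch of the trade-off St.Ack describes, against the 0.20.x client API (the caching value is illustrative, not a recommendation):

Scan s = new Scan();
// Fetch a modest batch per RPC so each batch is consumed well inside
// the scanner lease period.
s.setCaching(100);
ResultScanner scanner = table.getScanner(s);
try {
  for (Result r : scanner) {
    // Keep per-row work short, or the lease may expire between next() calls.
  }
} finally {
  scanner.close(); // Release the server-side scanner promptly.
}

The smaller the caching value, the more RPCs you pay for; the larger it is, the longer each next() batch takes to fetch and consume.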
Re: HBase / Pig integration
On Tue, May 4, 2010 at 12:02 AM, Dmitriy Ryaboy dmit...@twitter.com wrote: We have an apache license file in the root of the project; I am not sure if we need to put it in every file. Will check with the lawyers. Generally you put notice at head of each src file. I was particularly referring to this that we have in all hbase files: * Copyright 2010 The Apache Software Foundation We got this practise from hadoop but looking there, they no longer seem to do it (I need to talk to lawyers too -- smile). Regarding the first and last slice, the problem is that I have no way of knowing what the first and last, respectively, key values are. With the first slice I can maybe cache the first key I see, and use that in conjunction with the end of the region to calculate the size of the keyspace; but with that last region, the max is infinity, so I can't really estimate how much more I have left until I have none.. do regions store any metadata that contains a rough count of the number of records they hold? Regions no. StoreFiles yes. They have a count of entries but this is not really available via API. We should expose it or something like it. It could only be an estimate since deletes and puts are both recorded, and a delete can remove a cell, column or family. I guess they only keep track of the byte size of the data, not the number of records per se. Maybe I can get the total byte size of the region, and calculate offsets based on the size of the returned data? This would be likely wrong due to pushed down projections and filters, of course. Any other ideas? How do people normally handle this when writing regular MR jobs that scan HBase tables? I think most tasks that go against hbase should report 0% progress and then 100% when done. We could expose a getLastRowInRegion or what if we added an estimated row count to the Split (Maybe that's not the right place to expose this info? What is the canonical way?). I suspect this is actually a bit of a problem, btw -- since I don't report the amount of remaining work for these slices accurately, and I (hopefully) do a reasonable job for the ones where I can calculate the size of the keyspace, speculative execution may get overeager with these two slices. Good point. We should fix this. Keeping a counter of how many rows in a region wouldn't be hard. It could be updated on compaction, etc. A row count would be good enough. PigCounterHelper just deals with some oddities of Hadoop counters (they may not be available when you first try to increment a counter -- the helper buffers increment requests until the reporter becomes available). Are HBase counters special things or also just Hadoop counters under the covers? Check them out. They are not hadoop counters. Keep up a count on anything. Might be of use given what you are doing. Update it thousands of times a second, etc. The lzo files are probably unrelated.. there shouldn't be anything LZO-specific in the HBase code. We are, in fact, lzo'ing hbase content in the sense that that's the compression we have for HDFS, and I think HBase is supposed to inherit that. No. You need to enable it on the column family. See how COMPRESSION can be NONE, GZ, or LZO. LZO needs to be installed as per hadoop. Search the hbase wiki home page for lzo. The hbase+lzo page has had some experience baked into it so may be of some use to you. St.Ack
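Since compression is per column family rather than inherited from HDFS, enabling LZO happens at the schema level. A sketch against the 0.20.x admin API (table and family names are made up; setCompressionType is the method I believe applies here -- check your version's HColumnDescriptor):

HBaseConfiguration conf = new HBaseConfiguration();
HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor desc = new HTableDescriptor("mytable"); // assumed name
HColumnDescriptor cf = new HColumnDescriptor("cf");      // assumed name
cf.setCompressionType(Compression.Algorithm.LZO);        // NONE, GZ, or LZO
desc.addFamily(cf);
admin.createTable(desc);

LZO itself still has to be installed on the cluster as per hadoop, as noted above.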
Re: Improving HBase scanner
If long periods between next invocations, up the scanner lease. See:

<property>
  <name>hbase.regionserver.lease.period</name>
  <value>60000</value>
  <description>HRegion server lease period in milliseconds. Default is
  60 seconds. Clients must report in within this period else they are
  considered dead.</description>
</property>

St.Ack On Tue, May 4, 2010 at 7:04 AM, Michelan Arendse miche...@addynamo.com wrote: Yes I am waiting long periods between invocations of next. I didn't know that I am fetching too much data at once. I am using HBase 0.20.3. This is my code: scan.setTimeRange(fromDate.getTime(), toDate.getTime()); ResultScanner scanner = table.getScanner(scan); while ((result = scanner.next()) != null) { channelRow = getChannelDeliveryRow(Bytes.toString(result.getRow())); channelRowList.add(channelRow); } This is some of the output from the log file: 2010-05-04 15:27:44,546 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started. Attempting to free 62791520 bytes 2010-05-04 15:27:44,552 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed. Freed 62797944 bytes. Priority Sizes: Single=279.4997MB (293076672), Multi=224.35243MB (235250576), Memory=0.0MB (0) -Original Message- From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack Sent: 04 May 2010 03:55 PM To: hbase-user@hadoop.apache.org Subject: Re: Improving HBase scanner Are you waiting too long between invocations of next? (i.e. longer than the scanner lease period?) Or, perhaps you are fetching too much in the one go. If you fetch 1000 at a time -- scanner caching -- and you don't get the next batch within the scanner lease period, again you will timeout. St.Ack On Tue, May 4, 2010 at 1:46 AM, Michelan Arendse miche...@addynamo.com wrote: Hi I would like to know how to configure HBase to improve the scanner fetching data from the table, or another method of using the scanner, as my database is very large and the scanner times out. Kind Regards, Michelan Arendse Junior Developer | AD:DYNAMO // happy business ;-) Office 0861 Dynamo (0861 396266) | Fax +27 (0) 21 465 2587 Advertise Online Instantly - www.addynamo.com
Re: HBase / Pig integration
On Tue, May 4, 2010 at 7:34 AM, Stack st...@duboce.net wrote: On Tue, May 4, 2010 at 12:02 AM, Dmitriy Ryaboy dmit...@twitter.com wrote: We should fix this. Keeping a counter of how many rows in a region wouldn't be hard. It could be updated on compaction, etc. A row count would be good enough. Post-coffee, on second thoughts: keeping a counter wouldn't be that easy, particularly if there are multiple column families. I wonder what happens if you do a getClosestRowBefore on the last region, passing in the key. Will it return the last row in the region? (I'll try it later if you don't get to it first). St.Ack
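For anyone trying this from the client side, the HTable-level cousin of getClosestRowBefore is getRowOrBefore; a sketch assuming the 0.20.x client API (the probe key and family name are made up):

// Returns the row matching the key exactly, or the closest row before it.
// Whether probing with a key past all data surfaces the region's last row
// is exactly the open question St.Ack raises above.
Result last = table.getRowOrBefore(Bytes.toBytes("~~~probe"), Bytes.toBytes("cf"));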
Re: Multiple get
I'd suggest sticking the full stack trace into the issue so Marc can take a look (it looks like pool-1-thread-1 is not a daemon thread). Thanks, St.Ack On Tue, May 4, 2010 at 7:42 AM, Slava Gorelik slava.gore...@gmail.com wrote: Yes, I'm using the patch from 1845. When the Java program hangs there is no more stack trace. It hangs after Java's built-in System.exit() method. I see only this: EmployeeSample [Java Application]: Sample.EmployeeSample at localhost:2916 : Daemon Thread [main-EventThread] (Running) Daemon Thread [main-SendThread] (Running) Thread [pool-1-thread-1] (Running) Thread [DestroyJavaVM] (Running) Best Regards. On Tue, May 4, 2010 at 4:52 PM, Stack st...@duboce.net wrote: Is this the new patch up in hbase-1845 that you are messing with Slava? If so, please add the stacktrace to the issue so we can take a look. St.Ack On Tue, May 4, 2010 at 5:01 AM, Slava Gorelik slava.gore...@gmail.com wrote: Hi. After I applied the patch and started to use multiple get, the test application started to hang on exit (the DestroyJavaVM thread hangs) after I called HTable.batch(). Regular get operations continue to work as expected. Best Regards. On Thu, Mar 4, 2010 at 9:24 PM, Slava Gorelik slava.gore...@gmail.com wrote: Thank You. At least I can apply the patch on 0.20.3. On Thu, Mar 4, 2010 at 7:18 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Thu, Mar 4, 2010 at 9:07 AM, Slava Gorelik slava.gore...@gmail.com wrote: Hi Erik, Thank You for the quick reply. Will this patch be integrated into HBase 0.21.0? If it's ready before we release, then yes. If yes, is there any estimated release date for 0.21.0? It's complicated. Currently it would be at least after the Hadoop 0.21.0 release (which also has no estimate since Y! isn't focusing on it). J-D Thank You. On Thu, Mar 4, 2010 at 5:14 PM, Erik Holstad erikhols...@gmail.com wrote: Hey Slava! There is work being done to be able to do this https://issues.apache.org/jira/browse/HBASE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832114#action_12832114 I think that there is also another Jira that is related to this topic, but I don't know what that number is. -- Regards Erik
Re: HBase / Pig integration
Hey Dmitry: I took a quick look. Your files are missing a copyright? I like your use of BinaryComparator and the lte, gte options in skipRegion setting up filters. Regards: // No way to know max.. just return 0. Sorry, reporting on the last slice is janky. // So is reporting on the first slice, by the way -- it will start out too high, possibly at 100%. if (endRow_.length==0) return 0; ...if your keys are kinda regular, you might be able to do better in a slice. See in Bytes where there are methods that do BigDecimal math. You can ask them to divide the slice. Might work. Then you could do progress (Looks like you are doing some later in the file -- does it work?). Try to use the same version of HBaseConfiguration conf = new HBaseConfiguration(); throughout rather than create a new one each time. Creating one each time can be costly. What's this? if (counterHelper_ == null) counterHelper_ = new PigCounterHelper(); A pig counter? You don't want to use hbase counters? What's the lzo stuff about? It seems to be for loading files. Are you lzo'ing your hbase content? Oh man ... base64'ing. There are two files w/ mention of hbase, is that right? St.Ack On Mon, May 3, 2010 at 12:23 PM, Dmitriy Ryaboy dmit...@twitter.com wrote: Hi folks, I recently rewrote the Pig HBase loader to work with binary data, push down filters, and do other things that make it more versatile. If you use, or plan to use, both Pig and HBase, please try it out, take a look at the code, let me know what you think. I am just starting to learn about HBase, so I am especially interested to learn if there are HBase capabilities I am not using and should be. The code is part of our ElephantBird project, here: http://github.com/kevinweil/elephant-bird/ and more specifically: http://github.com/kevinweil/elephant-bird/tree/master/src/java/com/twitter/elephantbird/pig/load/ Thanks, -Dmitriy
Re: EC2 + Thrift inserts
? What is the bottleneck there? CPU utilization and network packets went up when I disabled the indexes; I don't think those are the bottlenecks for the indexes. I was even able to add another 15 insert processes (total of 40) and only lost about 10% on per-process throughput. I probably could go even higher; none of the nodes are above 60% CPU utilization and IO wait was at most 3.5%. Each rowkey is unique, so there should not be any blocking on the row locks. I'll do more indexed tests tomorrow. thanks, -chris On Apr 29, 2010, at 12:18 AM, Todd Lipcon wrote: Definitely smells like JDK 1.6.0_18. Downgrade that back to 16 or 17 and you should be good to go. _18 is a botched release if I ever saw one. -Todd On Wed, Apr 28, 2010 at 10:54 PM, Chris Tarnas c...@email.com wrote: Hi Stack, Thanks for looking. I checked the ganglia charts, no server was at more than ~20% CPU utilization at any time during the load test and swap was never used. Network traffic was light - just running a count through hbase shell generates a much higher use. On the server hosting meta specifically, it was at about 15-20% CPU, and IO wait never went above 3%, and was usually near 0. The load also died with a thrift timeout on every single node (each node connecting to localhost for its thrift server); it looks like a datanode just died and caused every thrift connection to timeout - I'll have to up that limit to handle a node death. Checking logs, this appears in the logs of the region server hosting meta; it looks like the dead datanode caused this error: 2010-04-29 01:01:38,948 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_508630839844593817_11180 java.io.IOException: Bad response 1 for block blk_508630839844593817_11180 from datanode 10.195.150.255:50010 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2423) The regionserver log on the dead node, 10.195.150.255, has some more errors in it: http://pastebin.com/EFH9jz0w I found this in the .out file on the datanode: # Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode linux-amd64 ) # Problematic frame: # V [libjvm.so+0x62263c] # # An error report file with more information is saved as: # /usr/local/hadoop-0.20.1/hs_err_pid1364.log # # If you would like to submit a bug report, please visit: # http://java.sun.com/webapps/bugreport/crash.jsp # There is not a single error in the datanode's log though. Also of note - this happened well into the test, so the node dying caused the load to abort but not the prior poor performance. Looking through the mailing list it looks like java 1.6.0_18 has a bad rep so I'll update the AMI (although I'm using the same JVM on other servers in the office w/o issue and decent single node performance and never dying...). Thanks for any help! -chris On Apr 28, 2010, at 10:10 PM, Stack wrote: What is load on the server hosting meta like? Higher than others? On Apr 28, 2010, at 8:42 PM, Chris Tarnas c...@email.com wrote: Hi JG, Speed is now down to 18 rows/sec/table per process. Here is a regionserver log that is serving two of the regions: http://pastebin.com/Hx5se0hz Here is the GC Log from the same server: http://pastebin.com/ChrRvxCx Here is the master log: http://pastebin.com/L1Kn66qU The thrift server logs have nothing in them in the same time period. Thanks in advance! -chris On Apr 28, 2010, at 7:32 PM, Jonathan Gray wrote: Hey Chris, That's a really significant slowdown.
I can't think of anything obvious that would cause that in your setup. Any chance of some regionserver and master logs from the time it was going slow? Is there any activity in the logs of the regionservers hosting the regions of the table being written to? JG -Original Message- From: Christopher Tarnas [mailto:c...@tarnas.org] On Behalf Of Chris Tarnas Sent: Wednesday, April 28, 2010 6:27 PM To: hbase-user@hadoop.apache.org Subject: EC2 + Thrift inserts Hello all, First, thanks to all the HBase developers for producing this, it's a great project and I'm glad to be able to use it. I'm looking for some help and hints here with insert performance. I'm doing some benchmarking, testing how I can scale up using HBase, not really looking at raw speed. The testing is happening on EC2, using Andrew's scripts (thanks - those were very helpful) to set them up and with a slightly customized version of the default AMIs (added my application modules). I'm using HBase 20.3 and Hadoop 20.1. I've looked at the tips in the Wiki and it looks like Andrew's scripts are already setup that way. I'm inserting into HBase from a hadoop streaming job that runs perl and uses the thrift gateway. I'm also using the Transactional tables so that alone could
Re: hbase with multiple interfaces
On Thu, Apr 29, 2010 at 4:39 AM, Michael Segel michael_se...@hotmail.com wrote: The problem with Hadoop and of course HBase is that they determine their own IP network based on the machine's actual name, so that even if you have multiple interfaces, the nodes will choose the interface that matches the machine name. (IMHO this is a defect that should be fixed.) I made HBASE-2502 to cover this issue. Thanks Michael, St.Ack
Re: Unique row ID constraint
Would the incrementValue [1] work for this? St.Ack 1. http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue%28byte[],%20byte[],%20byte[],%20long%29 On Wed, Apr 28, 2010 at 7:40 AM, Tatsuya Kawano tatsuy...@snowcocoa.info wrote: Hi, I'd like to implement unique row ID constraint (like the primary key constraint in RDBMS) in my application framework. Here is a code fragment from my current implementation (HBase 0.20.4rc) written in Scala. It works as expected, but is there any better (shorter) way to do this like checkAndPut()? I'd like to pass a single Put object to my function (method) rather than passing rowId, family, qualifier and value separately. I can't do this now because I have to give the rowLock object when I instantiate the Put.

===
def insert(table: HTable, rowId: Array[Byte], family: Array[Byte],
    qualifier: Array[Byte], value: Array[Byte]): Unit = {
  val get = new Get(rowId)
  val lock = table.lockRow(rowId) // will expire in one minute
  try {
    if (table.exists(get)) {
      throw new DuplicateRowException("Tried to insert a duplicate row: " + Bytes.toString(rowId))
    } else {
      val put = new Put(rowId, lock)
      put.add(family, qualifier, value)
      table.put(put)
    }
  } finally {
    table.unlockRow(lock)
  }
}
===

Thanks, -- 河野 達也 Tatsuya Kawano (Mr.) Tokyo, Japan twitter: http://twitter.com/tatsuya6502
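A sketch of the suggestion, using the signature linked above (table, row, family, and qualifier names are all made up): the increment is atomic server-side, so each caller gets back a distinct value, which can then serve as a unique row ID.

HTable counters = new HTable(new HBaseConfiguration(), "counters"); // assumed table
// Atomically add 1 to the counter cell and return the post-increment value.
long id = counters.incrementColumnValue(
    Bytes.toBytes("row-ids"), // counter row (assumed)
    Bytes.toBytes("seq"),     // family (assumed)
    Bytes.toBytes("next"),    // qualifier (assumed)
    1L);
byte[] uniqueRowId = Bytes.toBytes(id);

Note this makes IDs unique by construction; it sidesteps the exists-then-put check rather than replacing checkAndPut semantics.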
Re: org.apache.hadoop.hbase.UnknownScannerException: Name: -1
Does it work if the transactional stuff is not in the mix? That might help narrow down what is going on here. St.Ack On Tue, Apr 27, 2010 at 6:51 AM, Slava Gorelik slava.gore...@gmail.com wrote: Hi to All. Tried to investigate this problem a bit in the debugger. It looks like the failure is in connecting to the region server to open the scanner. There seems to be a problem connecting to the region server to open a scanner (in other cases the region server is available, for example for put/get operations); after 10 tries the scanner ID is still -1 and it is passed to the region server (somehow in this case the connection to the region server succeeds) and the server throws an exception about a wrong scanner name. I have only one region server that is on the same machine where the master is located (single node installation - not a pseudo, including zookeeper) and also I'm using a transactional table (from contrib). Any idea what the problem could be? Best Regards. On Thu, Mar 18, 2010 at 5:54 PM, Slava Gorelik slava.gore...@gmail.com wrote: Hi. I also don't have any solution yet. Best Regards. On Thu, Mar 18, 2010 at 8:29 AM, Alex Baranov alex.barano...@gmail.com wrote: I have a similar problem, but even with a standard filter, when I use it on the remote client ( http://old.nabble.com/Adding-filter-to-scan-at-remote-client-causes-UnknownScannerException-td27934345.html ). Haven't solved yet. Alex Baranau On Tue, Mar 16, 2010 at 8:12 PM, Slava Gorelik slava.gore...@gmail.com wrote: Hi Dave. Thank You for your reply, but all .out files (master and region server) are empty of any exception. Best Regards. On Tue, Mar 16, 2010 at 7:45 PM, Dave Latham lat...@davelink.net wrote: Is there anything informative in the .out file? I remember one time I had an error in a filter's static initializer that caused the class to fail to load, and it manifested as an uncaught NoClassDefFoundError ( https://issues.apache.org/jira/browse/HBASE-1913 ) showing up there instead of the .log file. Dave On Tue, Mar 16, 2010 at 9:52 AM, Slava Gorelik slava.gore...@gmail.com wrote: Hi. Sure I restarted both sides. The log has only the one exception that I specified - Name: -1. Scanners on .META. and -ROOT- work fine (I put breakpoints on the call() method that actually calls openScanner(), and up until my scanner it works fine). Best Regards. On Tue, Mar 16, 2010 at 5:39 PM, Stack st...@duboce.net wrote: On Tue, Mar 16, 2010 at 1:42 AM, Slava Gorelik slava.gore...@gmail.com wrote: Hi. I added my filters to the HbaseObjectWritable but the problem is not solved. And for sure you restarted both sides of the connection and both sides are the same compiled code? If so, next up will be seeing what's in the log over on the server. Mismatched interfaces are ugly to debug. The messages that come out don't tell much about what's actually wrong. If you remove your code, all works fine? So it's just the addition of your filters that is the prob? St.Ack
Re: multiple scanners on same table/region
Are you closing Scanners? If not, they are occupying slots until they time out. St.Ack On Thu, Apr 22, 2010 at 8:10 PM, steven zhuang steven.zhuang.1...@gmail.com wrote: hi, sorry I start another thread here. This mail is actually an answer to a previous thread, "multiple scanners on same table will cause problem? Scan results change among different tries..", the mail system kept saying that I am spamming, now it seems that it's right! :) here is my reply to people in that thread: I don't know if there is a limit on reads to a single row/region in HBase, but if there is, I might have exceeded that limit. :( in my case, there are hundreds of rows, with dozens of kilos of cells in a row (a 256 MB region may contain 10- rows). for each row, I started a thread on each CF, there are 8 of them, so there might be dozens of scanners on the same region. and, to Tim, I could not see your attached mail, my test code is pasted below, it just iterates over the rows and column families, outputting all the cells.

private void doScan() throws Exception {
  if (null == CopyOfTestTTT234.table) {
    return;
  }
  Scan s = new Scan();
  s.setStartRow("aaa".getBytes());
  s.setStopRow("ccc".getBytes());
  s.setCaching(CopyOfTestTTT234.ROWCACHING); // it's 1 here.
  ResultScanner scanner = CopyOfTestTTT234.table.getScanner(s);
  while (true) {
    Result row = scanner.next();
    if (null == row) break;
    String rowKey = new String(row.getRow());
    NavigableMap<byte[], NavigableMap<byte[], byte[]>> fm = row.getNoVersionMap();
    while (fm.size() > 0) {
      Entry<byte[], NavigableMap<byte[], byte[]>> ee = fm.pollFirstEntry();
      String fName = new String(ee.getKey());
      NavigableMap<byte[], byte[]> ff = ee.getValue();
      while (ff.size() > 0) {
        Entry<byte[], byte[]> cell = ff.pollFirstEntry();
        String key = new String(cell.getKey());
        String val = new String(cell.getValue());
        System.out.println(Thread.currentThread().hashCode() + "\t" + rowKey
            + "\t" + fName + "\t" + key + "\t" + val);
      }
    }
  }
}
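The discipline being suggested, sketched against the same API the test code uses: close in a finally block so the server-side slot is freed deterministically instead of by lease timeout.

ResultScanner scanner = table.getScanner(s);
try {
  Result row;
  while ((row = scanner.next()) != null) {
    // process the row
  }
} finally {
  scanner.close(); // Frees the region server's scanner slot immediately.
}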
Re: multiple scanners on same table/region
You may be running into HBASE-2481? St.Ack On Tue, Apr 27, 2010 at 1:30 AM, steven zhuang steven.zhuang.1...@gmail.com wrote: the first thread can be found at: http://permalink.gmane.org/gmane.comp.java.hadoop.hbase.user/10074 After some digging, it seems the problem is caused by a long pause between two scanner.next() calls. In my case the program has to spend a relatively long while processing one row; when it calls scanner.next() again, it seems the returned Result will be null even if there should be more rows in the table. The rowcaching is set to 1. I have checked some of the source code; it seems there is some mechanism which will call the close() method of the ClientScanner, but I am still checking. I don't know if there is a certain timeout on ClientScanner/ScannerCallable after a row has been successfully returned; it seems that timeout causes my problem here. Any reply is appreciated. On Fri, Apr 23, 2010 at 11:10 AM, steven zhuang steven.zhuang.1...@gmail.com wrote: hi, sorry I start another thread here. This mail is actually an answer to a previous thread, "multiple scanners on same table will cause problem? Scan results change among different tries..", the mail system kept saying that I am spamming, now it seems that it's right! :) here is my reply to people in that thread: I don't know if there is a limit on reads to a single row/region in HBase, but if there is, I might have exceeded that limit. :( in my case, there are hundreds of rows, with dozens of kilos of cells in a row (a 256 MB region may contain 10- rows). for each row, I started a thread on each CF, there are 8 of them, so there might be dozens of scanners on the same region. and, to Tim, I could not see your attached mail, my test code is pasted below, it just iterates over the rows and column families, outputting all the cells.

private void doScan() throws Exception {
  if (null == CopyOfTestTTT234.table) {
    return;
  }
  Scan s = new Scan();
  s.setStartRow("aaa".getBytes());
  s.setStopRow("ccc".getBytes());
  s.setCaching(CopyOfTestTTT234.ROWCACHING); // it's 1 here.
  ResultScanner scanner = CopyOfTestTTT234.table.getScanner(s);
  while (true) {
    Result row = scanner.next();
    if (null == row) break;
    String rowKey = new String(row.getRow());
    NavigableMap<byte[], NavigableMap<byte[], byte[]>> fm = row.getNoVersionMap();
    while (fm.size() > 0) {
      Entry<byte[], NavigableMap<byte[], byte[]>> ee = fm.pollFirstEntry();
      String fName = new String(ee.getKey());
      NavigableMap<byte[], byte[]> ff = ee.getValue();
      while (ff.size() > 0) {
        Entry<byte[], byte[]> cell = ff.pollFirstEntry();
        String key = new String(cell.getKey());
        String val = new String(cell.getValue());
        System.out.println(Thread.currentThread().hashCode() + "\t" + rowKey
            + "\t" + fName + "\t" + key + "\t" + val);
      }
    }
  }
}
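One workaround sketch for slow per-row processing, assuming the result set fits in client memory (as with the few hundred rows described here): drain the scanner quickly, close it, then do the expensive work afterwards. process() stands in for the caller's own logic.

List<Result> buffered = new ArrayList<Result>(); // java.util imports assumed
ResultScanner scanner = table.getScanner(s);
try {
  for (Result r : scanner) {
    buffered.add(r); // Consume fast so the lease never expires mid-scan.
  }
} finally {
  scanner.close();
}
for (Result r : buffered) {
  process(r); // Hypothetical; the slow per-row work goes here.
}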
Re: optimizing for random access
On Mon, Apr 26, 2010 at 3:36 PM, Geoff Hendrey ghend...@decarta.com wrote: My thought with memory mapping was, as you noted, *not* to try to map files that are inside of HDFS but rather to copy as many blocks as possible out of HDFS, onto region server filesystems, and memory map the file on the region server. TB drives are now common. The virtual memory system of the operating system manages paging in and out of real memory off disk when you use memory mapping. My experience with memory mapped ByteBuffer in Java is that it is very fast and scalable. By fast, I mean I have clocked reads in the microseconds using nanotime. So I was just wondering why you wouldn't at least make a 2nd level cache with memory mapping. Are memory-mapped files scalable in java? I'm curious. It's been a while since I played with them (circa Java 1.5) but then they did not scale. I was only able to open a few files concurrently before I started running into interesting issues. In hbase I'd need to be able to keep hundreds or even thousands open concurrently. I've thought about doing something like you propose Geoff -- keeping some subset of storefiles locally (we could even write two places when compacting, say local and out to hdfs) -- but it always devolved quickly into a complicated mess: keeping the local copy up with the remote set, making sure the local didn't overflow local storage, and that local files were aged out on compactions and splits. If you have a suggestion on how it'd work, I'm all ears. Thanks, St.Ack
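For context, the kind of memory-mapped random read being discussed, in plain Java NIO (the path and offset are made up):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

RandomAccessFile raf = new RandomAccessFile("/data/local-storefile", "r"); // assumed path
FileChannel ch = raf.getChannel();
// The OS virtual memory system pages the file in and out on demand.
MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
byte b = buf.get(1024); // Random read at an arbitrary offset; microseconds once paged in.

The scaling question above is about how many such mappings one JVM can hold open at once, not about the speed of any single mapped read.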
Slides for HUG10 are up at http://www.meetup.com/hbaseusergroup/calendar/12689490/
... for those interested in the talks at Monday's HUG10. Check out Andrew's talk on TrendMicro Architecture, Coprocessors in HBase as well as John Sichi's talk on Hive+HBase integration and Mahadev on Zookeeper+HBase. St.Ack
Re: extremely sluggish hbase
On Tue, Apr 20, 2010 at 10:29 AM, Geoff Hendrey ghend...@decarta.com wrote: Hbase shell is taking 63 seconds to scan a table with {LIMIT=>1}! Is an MR job running concurrently? What's happening on your servers? High load? I see this error occur frequently in the region server logs. Any ideas on what this might be: 2010-04-20 04:19:41,401 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60020, call next(-750587486574522252) from 10.241.6.80:51850: error: org.apache.hadoop.hbase.UnknownScannerException: Name: -750587486574522252 I also see this in the region server logs: 2010-04-20 04:21:44,559 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 5849633296569445699 lease expired 2010-04-20 04:21:44,560 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_1799401938583830364_69702 from any node: java.io.IOException: No live nodes contain current block So, this is usually because the client took too long between 'next' invocations on the scanner, or the server is under such load it's holding on to the 'next' call for so long that the next time 'next' is called, the scanner lease has expired. However hadoop dfsadmin -report doesn't show any HDFS issues. Looks totally healthy. When I do status from HBase shell I get hbase(main):008:0> status 2 servers, 0 dead, 484. average load which also seems healthy to me. Your servers are carrying 500 regions each. Any suggestions? Look at top. Look for loading. Are you swapping? Look in hbase logs. What's it say it's doing? Fat GC pauses? St.Ack
Re: extremely sluggish hbase
If you scan the '.META.' table is it slow also? You could have a case of hbase-2451? There is a script in the patch to that issue. Try it. See if that helps. St.Ack On Tue, Apr 20, 2010 at 12:02 PM, Geoff Hendrey ghend...@decarta.com wrote: Answers below, prefixed by geoff: -Original Message- From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack Sent: Tuesday, April 20, 2010 11:23 AM To: hbase-user@hadoop.apache.org Subject: Re: extremely sluggish hbase On Tue, Apr 20, 2010 at 10:29 AM, Geoff Hendrey ghend...@decarta.com wrote: Hbase shell is taking 63 seconds to scan a table with {LIMIT=>1}! Is an MR job running concurrently? Geoff: no What's happening on your servers? High load? Geoff: no, 99% idle on both servers I see this error occur frequently in the region server logs. Any ideas on what this might be: 2010-04-20 04:19:41,401 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60020, call next(-750587486574522252) from 10.241.6.80:51850: error: org.apache.hadoop.hbase.UnknownScannerException: Name: -750587486574522252 I also see this in the region server logs: 2010-04-20 04:21:44,559 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 5849633296569445699 lease expired 2010-04-20 04:21:44,560 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_1799401938583830364_69702 from any node: java.io.IOException: No live nodes contain current block So, this is usually because the client took too long between 'next' invocations on the scanner, or the server is under such load it's holding on to the 'next' call for so long that the next time 'next' is called, the scanner lease has expired. However hadoop dfsadmin -report doesn't show any HDFS issues. Looks totally healthy. When I do status from HBase shell I get hbase(main):008:0> status 2 servers, 0 dead, 484. average load which also seems healthy to me. Your servers are carrying 500 regions each. Geoff: Is this high, moderate, or low for a typical installation? Any suggestions? Look at top. Look for loading. Are you swapping? Geoff: I will look into the swapping and see if I can get some numbers. Look in hbase logs. What's it say it's doing? Fat GC pauses? Geoff: I monitor all the logs and I don't see any GC pauses. I am running 64 bit java with 8GB of heap. I'll look into GC further and see if I can get some concrete data. St.Ack
Re: extremely sluggish hbase
Below it says blockcache is true for the 'info' family on .META. What happens if you scan '-ROOT-', does it still say the info family has blockcache false? Best way to get the script is to either update your hbase to 0.20.4 or just apply said patch. The script will then be in your bin directory. St.Ack On Tue, Apr 20, 2010 at 1:24 PM, Geoff Hendrey ghend...@decarta.com wrote: Does look like the .META. BLOCKCACHE is false. What's the best way to get a patch for https://issues.apache.org/jira/browse/HBASE-2451

hbase(main):001:0> describe '.META.'
DESCRIPTION: {NAME => '.META.', IS_META => 'true', MEMSTORE_FLUSHSIZE => '16384', FAMILIES => [{NAME => 'historian', COMPRESSION => 'NONE', VERSIONS => '2147483647', TTL => '604800', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME => 'info', COMPRESSION => 'NONE', VERSIONS => '10', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]} ENABLED: true

-Original Message- From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack Sent: Tuesday, April 20, 2010 12:45 PM To: hbase-user@hadoop.apache.org Subject: Re: extremely sluggish hbase If you scan the '.META.' table is it slow also? You could have a case of hbase-2451? There is a script in the patch to that issue. Try it. See if that helps. St.Ack On Tue, Apr 20, 2010 at 12:02 PM, Geoff Hendrey ghend...@decarta.com wrote: Answers below, prefixed by geoff: -Original Message- From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack Sent: Tuesday, April 20, 2010 11:23 AM To: hbase-user@hadoop.apache.org Subject: Re: extremely sluggish hbase On Tue, Apr 20, 2010 at 10:29 AM, Geoff Hendrey ghend...@decarta.com wrote: Hbase shell is taking 63 seconds to scan a table with {LIMIT=>1}! Is an MR job running concurrently? Geoff: no What's happening on your servers? High load? Geoff: no, 99% idle on both servers I see this error occur frequently in the region server logs. Any ideas on what this might be: 2010-04-20 04:19:41,401 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60020, call next(-750587486574522252) from 10.241.6.80:51850: error: org.apache.hadoop.hbase.UnknownScannerException: Name: -750587486574522252 I also see this in the region server logs: 2010-04-20 04:21:44,559 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 5849633296569445699 lease expired 2010-04-20 04:21:44,560 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_1799401938583830364_69702 from any node: java.io.IOException: No live nodes contain current block So, this is usually because the client took too long between 'next' invocations on the scanner, or the server is under such load it's holding on to the 'next' call for so long that the next time 'next' is called, the scanner lease has expired. However hadoop dfsadmin -report doesn't show any HDFS issues. Looks totally healthy. When I do status from HBase shell I get hbase(main):008:0> status 2 servers, 0 dead, 484. average load which also seems healthy to me. Your servers are carrying 500 regions each. Geoff: Is this high, moderate, or low for a typical installation? Any suggestions? Look at top. Look for loading. Are you swapping? Geoff: I will look into the swapping and see if I can get some numbers. Look in hbase logs. What's it say it's doing? Fat GC pauses? Geoff: I monitor all the logs and I don't see any GC pauses. I am running 64 bit java with 8GB of heap. I'll look into GC further and see if I can get some concrete data. St.Ack
Re: Zookeeper watcher error: java.lang.NoClassDefFoundError: org/apache/zookeeper/Watcher
) org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:159) javax.servlet.http.HttpServlet.service(HttpServlet.java:717) root cause: java.lang.ClassNotFoundException: org.apache.zookeeper.Watcher org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1387) org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1233) java.lang.ClassLoader.loadClassInternal(ClassLoader.java:398) java.lang.ClassLoader.defineClass1(Native Method) java.lang.ClassLoader.defineClass(ClassLoader.java:698) java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124) org.apache.catalina.loader.WebappClassLoader.findClassInternal(WebappClassLoader.java:1847) org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java:890) org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1354) org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1233) java.lang.ClassLoader.loadClassInternal(ClassLoader.java:398) org.apache.hadoop.hbase.client.HConnectionManager.getClientZooKeeperWatcher(HConnectionManager.java:170) org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getZooKeeperWrapper(HConnectionManager.java:932) org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:948) org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:625) org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601) org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:675) org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:634) org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601) org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:675) org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:638) org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601) org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:128) org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:106) com.nokia.dataos.api.StoreImpl.get(StoreImpl.java:57) com.nokia.dataos.api.NodeResource.node(NodeResource.java:25) sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) java.lang.reflect.Method.invoke(Method.java:597) org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:124) org.jboss.resteasy.core.ResourceMethod.invokeOnTarget(ResourceMethod.java:247) org.jboss.resteasy.core.ResourceMethod.invoke(ResourceMethod.java:212) org.jboss.resteasy.core.ResourceMethod.invoke(ResourceMethod.java:202) org.jboss.resteasy.core.SynchronousDispatcher.getResponse(SynchronousDispatcher.java:441) org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:418) org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:111) org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:217) org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:159)
javax.servlet.http.HttpServlet.service(HttpServlet.java:717) Note: The full stack trace of the root cause is available in the Apache Tomcat/6.0.18 logs.
Re: Hackathon agenda
On Sat, Apr 17, 2010 at 11:54 AM, Jonathan Gray jg...@facebook.com wrote: Agreed that it's good to try to be agenda-less, but in the past we've always taken the first couple hours to do a group discussion around some of the key topics. Given there's a bunch of fairly major changes/testing going on these days, I think there is a good bit of stuff that would benefit from group discussion. After that, we can break up into smaller groups or individually to start hacking away. Or for those not interested in the topics, you can just hack from the start. The above sounds good, discussion of a few near future concerns. I think it important we not let it go on too long. All of the below look good. I've added a few comments. (Anyone else have suggestions on what should be discussed?) More potential topics of discussion I had in mind: - Compaction, split, and flush policies/heuristics (HBASE-2453, HBASE-2462, HBASE-2457, HBASE-2375, HBASE-1892, etc...) - Define our desired behaviors related to versioning, deletes, and removal of deletes in minor/major compactions. (HBASE-2453, HBASE-2457, HBASE-2243, etc) The new HBase ACID spec should get at least a passing airing (It's 'done'. All releases post 0.20.4 are required to adhere to it). - Brainstorm on doing better distributed scenario testing (HBASE-2414) We also badly need to work on breaking down current daemons so their functions can be made more standalone and thus testable; e.g. load balancer in master, compacting code in regionservers, etc. - Brainstorm on performance improvement ideas (top HDFS issues, better use of HFile seeking, blooms, block pre-fetch, etc...) Would be cool to have a wiki page w/ a list of these things. A wiki page of items to be discussed? - Brainstorm on new functionality / updated road map. What priorities do the various sponsoring companies have, what are nice-to-haves but not on anyone's schedule yet, etc. Again, this can seed a new (or updated) wiki page and/or update the currently outdated road map wiki page. I moved aside the current 0.21 roadmap, moving it to a page of its own rather than have it as a section on the roadmap page (http://wiki.apache.org/hadoop/HBase/Original021Roadmap#preview), and added comments on the state of roadmap items listed therein. A roadmap discussion could start with what is listed in the old roadmap, I'd suggest. I'd suggest that roadmap discussion should not go into deep depth because it has a tendency to consume time, mostly because it turns into a blue-skying session, and besides, 0.21 is fairly imminent and there isn't that much we could get into this release anyways. - HBase PR. We could use a new web site (maven and otherwise), a centralized blog, and also a refresh/cleanup of documentation. There's also agreement on shipping w/ a few different configurations, which should be part of a new set of getting started / new user docs. Would like to get everyone's thoughts and also come up with a schedule. On Monday our petition to become an apache top level project goes before the apache board. If it passes, rolling out a site revamp might be timed to match our move to TLP. - Ideas for future HUGs For anyone that will not be able to attend the hackathon we will post a wrap-up afterwards with notes about all the discussions we had. Whatever comes out of the hackathon should be posted into the proper jiras or mailing list for full community discussion. Also, if anyone was not able to sign up for the HUG or Hackathon (both are full now) and is a regular contributor, please contact me directly.
Very awesome. Gonna be a great day of HBase! Agreed. St.Ack JG From: Andrew Purtell [apurt...@apache.org] Sent: Saturday, April 17, 2010 10:28 AM To: hbase-...@hadoop.apache.org Cc: hbase-user@hadoop.apache.org Subject: Hackathon agenda The Hackathon is basically agenda-less, but I'd like to propose a general topic of discussion we should cover while we are all in the room together: - For HBASE-1964 (HBASE-2183, HBASE-2461, and related): injecting and/or mocking exceptions thrown up from DFSClient. I think we want a toolkit for that. Could be incorporated into the unit testing framework. Should be possible to swap out a jar or something and make it active running on a real cluster with real load. Should be possible to inject random exceptions with adjustable probability. So what does HDFS have already? What do we need? If we're adding something, does it make sense to put it into HBase or contribute to HDFS? I think the latter. Let's gather a list of other topics, if any, that hackathon participants want to see covered so we can make sure it will happen. - Andy
Re: hundreds of reads of a metafile
Raghu: Sounds like you have a case of https://issues.apache.org/jira/browse/HBASE-2451. There is a script in the patch that will turn on .META. caching. It's likely off. Yours, St.Ack On Tue, Apr 13, 2010 at 12:19 PM, Raghu Angadi rang...@apache.org wrote: I think this happens all through the job... I need to double check. This job was adding around 300k rows. Will get more info on the latest runs. Raghu. On Mon, Apr 12, 2010 at 8:33 PM, Stack st...@duboce.net wrote: On Mon, Apr 12, 2010 at 5:04 PM, Stack st...@duboce.net wrote: On Mon, Apr 12, 2010 at 4:00 PM, Raghu Angadi rang...@apache.org wrote: Sorry for the delay. This is 3.5K reads per second. This goes on for maybe minutes. Or is this on startup of a big job against a big table? St.Ack
Re: Additional Information about HBase internals?
You've seen the wiki at hbase.org? There is a dated architecture document there, but beside it is a link to recent docs by our Lars George that are detailed with lots of nice pictures. If the above is insufficient, please let us know what you need. St.Ack On Fri, Apr 16, 2010 at 6:36 PM, Renato Marroquín Mogrovejo renatoj.marroq...@gmail.com wrote: Hi everyone, I would like to know if there is any additional information about HBase internals besides the Google paper and the code. Thanks in advance. Renato M.
Re: roadmap update?
Hey Thomas: I think you should do the 0.20.x. There'll very likely be a 0.20.5 within the timeframe you are talking of. 0.21 hadoop is getting some loving these days. It could be out by then but I can't say for sure. On the roadmap, yes, it needs updating. Next Monday at the hackathon it'll get an update. St.Ack On Wed, Apr 14, 2010 at 6:14 AM, Thomas Koch tho...@koch.ro wrote: Hi, the current roadmap under http://wiki.apache.org/hadoop/HBase/RoadMaps still lists aug/sep 09 as the estimated release date for 0.21. Could you please be so kind as to update the site. I'm trying to figure out whether I should rather consider 0.20.4 or 0.21 for an HBase deployment in May/June. I also want to build Debian packages of HBase. Would it make sense to package 0.21 for Debian squeeze? Best regards, Thomas Koch, http://www.koch.ro
Re: Region server goes away
On Wed, Apr 14, 2010 at 8:27 PM, Geoff Hendrey ghend...@decarta.com wrote: Hi, I have posted previously about issues I was having with HDFS when I was running HBase and HDFS on the same box both pseudoclustered. Now I have two very capable servers. I've setup HDFS with a datanode on each box. I've setup the namenode on one box, and the zookeeper and HDFS master on the other box. Both boxes are region servers. I am using hadoop 20.2 and hbase 20.3. What do you have for replication? If two datanodes, you've set it to two rather than default 3? I have set dfs.datanode.socket.write.timeout to 0 in hbase-site.xml. This is probably not necessary. I am running a mapreduce job with about 200 concurrent reducers, each of which writes into HBase, with 32,000 row flush buffers. Why don't you try with just a few reducers first and then build it up? See if that works? About 40% through the completion of my job, HDFS started showing one of the datanodes was dead (the one *not* on the same machine as the namenode). Do you think it dead -- what did a threaddump say? -- or was it just that you couldn't get into it? Any errors in the datanode logs complaining about xceiver count or perhaps you need to up the number of handlers? I stopped HBase, and magically the datanode came back to life. Any suggestions on how to increase the robustness? I see errors like this in the datanode's log: 2010-04-14 12:54:58,692 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: D atanodeRegistration(10.241.6.80:50010, storageID=DS-642079670-10.241.6.80-50010- 1271178858027, infoPort=50075, ipcPort=50020):DataXceiver java.net.SocketTimeoutException: 48 millis timeout while waiting for channel I believe this harmless. Its just the DN timing out the socket -- you set the timeout to 0 in the hbase-site.xml rather than in hdfs-site.xml where it would have an effect. See HADOOP-3831 for detail. to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10 .241.6.80:50010 remote=/10.241.6.80:48320] at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeo ut.java:246) at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutput Stream.java:159) at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutput Stream.java:198) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSe nder.java:313) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSen der.java:400) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXcei ver.java:180) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.ja : Here I show the output of 'hadoop dfsadmin -report'. First time it is invoked, all is well. Second time, one datanode is dead. 
Third time, the dead datanode has come back to life.: [had...@dt1 ~]$ hadoop dfsadmin -report Configured Capacity: 1277248323584 (1.16 TB) Present Capacity: 1208326105528 (1.1 TB) DFS Remaining: 1056438108160 (983.88 GB) DFS Used: 151887997368 (141.46 GB) DFS Used%: 12.57% Under replicated blocks: 3479 Blocks with corrupt replicas: 0 Missing blocks: 0 - Datanodes available: 2 (2 total, 0 dead) Name: 10.241.6.79:50010 Decommission Status : Normal Configured Capacity: 643733970944 (599.52 GB) DFS Used: 75694104268 (70.5 GB) Non DFS Used: 35150238004 (32.74 GB) DFS Remaining: 532889628672(496.29 GB) DFS Used%: 11.76% DFS Remaining%: 82.78% Last contact: Wed Apr 14 11:20:59 PDT 2010 Yeah, my guess as per above is that the reporting client couldn't get on to the datanode because handlers were full or xceivers exceeded. Let us know how it goes. St.Ack Name: 10.241.6.80:50010 Decommission Status : Normal Configured Capacity: 633514352640 (590.01 GB) DFS Used: 76193893100 (70.96 GB) Non DFS Used: 33771980052 (31.45 GB) DFS Remaining: 523548479488(487.59 GB) DFS Used%: 12.03% DFS Remaining%: 82.64% Last contact: Wed Apr 14 11:14:37 PDT 2010 [had...@dt1 ~]$ hadoop dfsadmin -report Configured Capacity: 643733970944 (599.52 GB) Present Capacity: 609294929920 (567.45 GB) DFS Remaining: 532876144640 (496.28 GB) DFS Used: 76418785280 (71.17 GB) DFS Used%: 12.54% Under replicated blocks: 3247 Blocks with corrupt replicas: 0 Missing blocks: 0 - Datanodes available: 1 (2 total, 1 dead) Name: 10.241.6.79:50010 Decommission Status : Normal Configured Capacity: 643733970944 (599.52 GB) DFS Used: 76418785280 (71.17 GB) Non DFS Used: 34439041024 (32.07 GB) DFS Remaining: 532876144640(496.28 GB) DFS Used%: 11.87% DFS Remaining%: 82.78% Last contact: Wed Apr 14 11:28:38 PDT 2010 Name: 10.241.6.80:50010 Decommission Status : Normal Configured Capacity: 0 (0 KB) DFS Used: 0 (0 KB) Non DFS Used: 0 (0 KB) DFS Remaining: 0(0 KB) DFS Used%: 100% DFS
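A sketch of the two hdfs-site.xml entries implied by the advice above: replication lowered to match a two-datanode cluster, and the write timeout set where the datanodes will actually read it (the values shown are illustrative):

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>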
Re: can't run data-import and map-reduce-job on a Htable simultaneously
On Wed, Apr 14, 2010 at 4:05 PM, Sujee Maniyam su...@sujee.net wrote: scenario: - I am writing data into Hbase - I am also kicking off a MR job that READS from the same table When the MR job starts, data-inserts pretty much halt, as if the table is 'locked out'. Is this behavior to be expected? No. If you let the MR job run alone, it runs cleanly to the end? St.Ack my pseudo write code: HBaseConfiguration hbaseConfig = new HBaseConfiguration(); HTable table = new HTable(hbaseConfig, "logs"); table.setAutoFlush(false); table.setWriteBufferSize(1024 * 1024 * 12); My MR job: - reads from the logs table - writes to another table Can I do these two in parallel? thanks Sujee http://sujee.net
Re: Deadlock when mapping a table?
Joost: These are the same as what you pasted on IRC? The hold-up passed? Can you make it happen again? St.Ack On Mon, Apr 12, 2010 at 12:18 PM, Joost Ouwerkerk jo...@openplaces.org wrote: Thread dump of TaskTracker: http://gist.github.com/363898 Thread dump of RegionServer: http://gist.github.com/363899 Not clear what's going on. I'm going to have a look at HBASE-2180... joost. On Sat, Apr 10, 2010 at 10:41 PM, Stack st...@duboce.net wrote: On Sat, Apr 10, 2010 at 4:38 PM, Joost Ouwerkerk jo...@openplaces.org wrote: We're mapping a table with about 2 million rows in 100 regions on 40 nodes. In each map, we're doing a random read on the same table. We're encountering a situation that looks a lot like deadlock. When the job is launched, some of the tasktrackers appear to get blocked in doing the first random read. The only trace we get is an eventual Unknown Scanner Exception in the RegionServer log, at which point the task is actually reported as successfully completed by MapReduce (1 row processed). There is no error in the task's log. The job completes as SUCCESSFUL with an incomplete number of rows. In the worst case scenario, we've actually seen ALL the tasktrackers encounter this problem; the job completes successfully with 100 rows processed (1 per region). Any chance of a threaddump on the problematic RS at the time? Can you even figure the culprit? There is a known deadlock that can happen writing (HBASE-2322) but this seems like something else. If it's a deadlock, often the JVM can recognize it as such and it'll be detailed on the tail of the threaddump. Todd has been messing too w/ jcarder (sp)? That found HBASE-2322 but that's all it found I believe (I need to run it on the next release candidate before it becomes a release candidate). Maybe you're running into very slow reads because you don't have HBASE-2180? St.Ack When we remove the code that does the random read in the map, there are no problems. Anyone? This is driving me crazy because I can't reproduce it locally (it only seems to be a problem in a distributed environment with many nodes) and because there is no stacktrace besides the scanner exception (which is clearly a symptom, not a cause). j
Re: hitting xceiverCount limit (2047)
Looks like you'll have to up your xceivers or up the count of hdfs nodes. St.Ack On Tue, Apr 13, 2010 at 11:37 AM, Sujee Maniyam su...@sujee.net wrote: Hi all, I have been importing a bunch of data into my hbase cluster, and I see the following error: Hbase error: hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink A.B.C.D Hadoop data node error: DataXceiver: java.io.IOException: xceiverCount 2048 exceeds the limit of concurrent xcievers 2047 I have configured dfs.datanode.max.xcievers = 2047 in hadoop/conf/hdfs-site.xml Config: amazon ec2 c1.xlarge instances (8 CPU, 8G RAM) 1 master + 4 region servers hbase heap size = 3G Upping the xcievers count is an option. I want to make sure whether I need to tweak any other parameters to match this. thanks Sujee http://sujee.net
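For reference, upping the ceiling means editing hdfs-site.xml on each datanode and restarting it; 4096 here is just an illustrative value (and the property name keeps Hadoop's historical misspelling):

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>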
Recommendations (WAS - Re: DFSClient errors during massive HBase load)
Thanks for writing back to the list, Oded. I changed the subject so it's easier to find your suggestions amongst the mailing list weeds going forward. On swappyness, setting it to 0 is extreme but since you've supplied links, users can do as you suggest or do something not so radical. Good stuff, St.Ack On Mon, Apr 12, 2010 at 11:31 AM, Oded Rosen o...@legolas-media.com wrote: The tips you guys gave me made a huge difference. I also used other tips from the Troubleshooting section in the hbase wiki, and from all around the web. I would like to share my current cluster configuration, as only a few places around the web offer a guided tour of these important configuration changes. This might be helpful for other people with small clusters that have problems with loading large amounts of data to hbase on a regular basis. I am not a very experienced user (yet...) so if I got something wrong, or if I am missing anything, please say so. Thanks in advance *1. Prevent your regionserver machines from memory swapping* - this is a must-have, it seems, for small hbase clusters that handle large loads: *Edit this file (on each regionserver) and then activate the following commands.* *File:* /etc/sysctl.conf *Add Values:* vm.swappiness = 0 (this one - on datanodes only!) *Then run (in order to apply the changes immediately):* sysctl -p /etc/sysctl.conf service network restart note: this is a kernel property change. swappiness with a zero value means the machine will not use virtual memory at all (or at least that's what I understood). So handle with care. A low value (around 5 or 10, from the maximum value of 100) might also work. My configuration is zero. (Further explanations: http://cloudepr.blogspot.com/2009/09/cluster-facilities-hardware-and.html ) *2. Increase file descriptor limit* - this is also a must-have for almost any use of hbase. *Edit these two files (on each datanode/namenode) and then activate the following commands.* *File:* /etc/security/limits.conf *Add Values:* hadoop soft nofile 32768 hadoop hard nofile 32768 *File:* /etc/sysctl.conf *Add Values:* fs.file-max = 32768 *Then run:* sysctl -p /etc/sysctl.conf service network restart note: you can perform steps 1+2 together, they both edit sysctl.conf. Notice step 1 is only for regionservers (datanodes), while this one can also be made on the master (namenode) - although I'm not so sure it's necessary. (see http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A6) *3. Raise HDFS + HBASE connection limit upper bound:* Edit hadoop/hbase configuration files to include these entries (you might want to change the specific values according to your cluster properties and usage). *File:* hdfs-site.xml *Add Properties:* name: dfs.datanode.max.xcievers value: 2047 name: dfs.datanode.handler.count value: 10 (at least the number of nodes in the cluster, or more if needed). (see http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A6 ) *File:* hbase-site.xml *Add Properties:*

<property>
  <name>hbase.regionserver.handler.count</name>
  <value>100</value>
</property>
<property>
  <name>hbase.zookeeper.property.maxClientCnxns</name>
  <value>100</value>
</property>

(see http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#overview_description) If you can remember other changes you've made to increase hbase stability, you are welcome to reply. Cheers.
On Thu, Apr 1, 2010 at 11:43 PM, Andrew Purtell apurt...@apache.org wrote: First, ulimit: 1024 That's fatal. You need to up file descriptors to something like 32K. See http://wiki.apache.org/hadoop/Hbase/Troubleshooting, item #6 From there, let's see. - Andy From: Oded Rosen o...@legolas-media.com Subject: DFSClient errors during massive HBase load To: hbase-user@hadoop.apache.org Date: Thursday, April 1, 2010, 1:19 PM Hi all, I have a problem with a massive HBase loading job. It is from raw files to hbase, through some mapreduce processing + manipulating (so loading directly to files will not be easy). After some dozen million successful writes - a few hours of load - some of the regionservers start to die - one by one - until the whole cluster is kaput. The hbase master sees a znode expired error each time a regionserver falls. The regionserver errors are attached. Current configurations: Four nodes - one namenode+master, three datanodes+regionservers. dfs.datanode.max.xcievers: 2047 ulimit: 1024 servers: fedora hadoop-0.20, hbase-0.20, hdfs (private servers, not on ec2 or anything). *The specific errors from the regionserver log (from IP6, see comment):* 2010-04-01 11:36:00,224 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception
Re: hundreds of reads of a metafile
On Mon, Apr 12, 2010 at 4:00 PM, Raghu Angadi rang...@apache.org wrote: Sorry for the delay. This is 3.5K reads per second. This goes on for maybe minutes. Against META? I wonder why META blocks are not going up into cache. We have not been running load against this cluster yet. Will update soon on how the cluster behaves under load. Good on you Raghu. Overall, do you think writes are expected to be gracefully slowed down if the RAM is not enough? It should. Gates come down to prevent OOMEing and if we start to be overrun with files in the filesystem and compactions are not keeping up. St.Ack Raghu. Thanks for digging in Raghu. Looking in logs, lots of churn -- other regionservers shedding regions, restarting? -- and write load would probably do better if given more RAM. You keep hitting the ceiling where we put down a gate blocking writes until flushes finish. What time interval are we talking of regarding the 3.5k reads across 20 blocks? Were the 20 blocks under the ${HBASE_ROOTDIR}/.META. directory? This regionserver was carrying the .META. region it looks like, so it's going to be popular. I'd think the cache should be running interference on i/o but maybe it's not doing a good job of it. The write load/churn might be blowing the cache. Yeah, log at DEBUG and we'll get a better idea. You're doing a big upload? Cache is a config where you set how much of heap to allocate. Default is 0.2 IIRC. St.Ack We are increasing memory from 3GB to 6GB. Any pointers about how to set the size of the block cache will be helpful. Will enable DEBUG for LruBlockCache. Raghu. On Thu, Apr 8, 2010 at 12:46 AM, Stack st...@duboce.net wrote: On Thu, Apr 8, 2010 at 12:20 AM, Raghu Angadi rang...@apache.org wrote: Thanks Stack. Will move to 20.3 or 20 trunk very soon. More responses inline below. Do. A 0.20.4 should be around in the next week or so, which will be better still, FYI. On Wed, Apr 7, 2010 at 8:52 PM, Stack st...@duboce.net wrote: On Wed, Apr 7, 2010 at 7:49 PM, Raghu Angadi rang...@apache.org wrote: We are working with a small HBase cluster (5 nodes) with fairly beefy nodes. While looking at why all the regionservers died at one time, noticed that these servers read some files 100s of times a second. This may not be the cause of the error... but do you think this is odd? Check the end of the regionserver log. It should say why the RegionServer went away. The usual reason is a long GC pause, one that is longer than the zk session timeout. This seems to be the case... There were CMS GC failures (promotion failed, Full GC etc). There were 4-5 pauses of about 4-10 seconds over a minute or so. Is that enough to kill the ZK session? We are increasing the memory and will go through the tuning tips on the wiki. The ZK session in your 0.20.1 is probably 40 seconds IIRC, but yeah, this is common enough until a bit of tuning is done. If you update to 0.20.3 at least, the zk session is 60 seconds, but you should try and avoid the promotion failures if you can. There are various other errors in the log over a couple of hours of RS run. Will post a link to the full log.
--- failure on RS-72 --- 2010-04-06 22:11:07,668 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x127d58da4e70002 to sun.nio.ch.selectionkeyi...@426295eb java.io.IOException: TIMED OUT at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906) 2010-04-06 22:11:07,668 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x27d58da6de0088 to sun.nio.ch.selectionkeyi...@283f4633 java.io.IOException: TIMED OUT at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906) 2010-04-06 22:11:07,672 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 27 on 60020, call put([...@20a192c7, [Lorg.apache.hadoop.hbase.client.Put;@4fab578d) from 10.10.0.72:60211: error: java.io.IOException: Server not running, aborting java.io.IOException: Server not running, aborting at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2345) These look like the zk session time out sequence. --- failure on RS-73 after a few minutes --- 2010-04-06 22:21:41,867 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -4957903368956265878 lease expired 2010-04-06 22:21:47,806 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x127d58da4e7002a to sun.nio.ch.selectionkeyi...@15ef1241 java.io.IOException: TIMED OUT at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906) 2010-04-06 22:21:47,806 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.lang.OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39) at java.nio.ByteBuffer.allocate(ByteBuffer.java:312
Re: hundreds of reads of a metafile
On Mon, Apr 12, 2010 at 5:04 PM, Stack st...@duboce.net wrote: On Mon, Apr 12, 2010 at 4:00 PM, Raghu Angadi rang...@apache.org wrote: Sorry for the delay. This is 3.5K reads per second. This goes on for maybe minutes. Or is this on startup of a big job against a big table? St.Ack
Re: Couple of HBase related issues...
On Mon, Apr 12, 2010 at 2:20 PM, Something Something mailinglist...@gmail.com wrote: 2) After 30 minutes or so of inactivity, HBase (or maybe Zookeeper) stops working. I have to restart it. How do I ensure that it always stays up? What are the symptoms? Do you have some log on that? What versions are we talking here? Thanks, St.Ack
Re: org.apache.hadoop.hbase.mapreduce.Export fails with an NPE
is defined in my .bash_profile, so it's already there and I see it expanded in the debug statements with the correct path. I even tried hard-coding the $HBASE_HOME path above just in case and I had the same issue. In any case, I'm past it now. I'll have to check whether the same issue happens on our dev environment running on Ubuntu on EC2. If not, then at least it's localized to my OSX environment. -GS On Fri, Apr 9, 2010 at 7:32 PM, Stack st...@duboce.net wrote: Very odd. I don't have to do that running MR jobs. I wonder what's different? (I'm using the 0.20.4 near-candidate rather than 0.20.3, 1.6.0u14). I have a HADOOP_ENV like this. export HBASE_HOME=/home/hadoop/0.20 export HBASE_VERSION=20.4-dev #export HADOOP_CLASSPATH=$HBASE_HOME/conf:$HBASE_HOME/build/hbase-0.20.4-dev.jar:$HBASE_HOME/build/hbase-0.20.4-dev-test.jar:$HBASE_HOME/lib/zookeeper-3.2.2.jar export HADOOP_CLASSPATH=$HBASE_HOME/conf:$HBASE_HOME/build/hbase-0.${HBASE_VERSION}.jar:$HBASE_HOME/build/hbase-0.${HBASE_VERSION}-test.jar:$HBASE_HOME/lib/zookeeper-3.2.2.jar St.Ack On Fri, Apr 9, 2010 at 4:19 PM, George Stathis gstat...@gmail.com wrote: Solved: for those interested, I had to explicitly copy zookeeper-3.2.2.jar to $HADOOP_HOME/lib even though I had added its path to $HADOOP_CLASSPATH under $HADOOP_HOME/conf/hadoop-env.sh. It makes no sense to me why that particular JAR would not get picked up. It was even listed in the classpath debug output when I ran the job using the hadoop shell script. If anyone can enlighten, please do. -GS On Fri, Apr 9, 2010 at 5:56 PM, George Stathis gstat...@gmail.com wrote: No dice. Classpath is now set. Same error. Meanwhile, I'm running $ hadoop org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1 just fine, so MapRed is working at least. Still looking for suggestions then I guess. -GS On Fri, Apr 9, 2010 at 5:31 PM, George Stathis gstat...@gmail.com wrote: RTFMing http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html right now... Hadoop classpath not being set properly could be the issue... On Fri, Apr 9, 2010 at 5:26 PM, George Stathis gstat...@gmail.com wrote: Hi folks, I hope this is just a newbie problem. Context: - Running the 0.20.3 tag locally in pseudo cluster mode - $HBASE_HOME is in env and $PATH - Running org.apache.hadoop.hbase.mapreduce.Export in the shell such as: $ hbase org.apache.hadoop.hbase.mapreduce.Export channels /bkps/channels/01 Symptom: - Getting an NPE at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:110): [...] 110 this.scanner = this.htable.getScanner(newScan); [...] Full output is below. Not sure why htable is still null at that point. User error? Any help is appreciated.
-GS Full output: $ hbase org.apache.hadoop.hbase.mapreduce.Export channels /bkps/channels/01 2010-04-09 17:13:57.407::INFO: Logging to STDERR via org.mortbay.log.StdErrLog 2010-04-09 17:13:57.408::INFO: verisons=1, starttime=0, endtime=9223372036854775807 10/04/09 17:13:58 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 192.168.1.16:52159 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers: Found ROOT at 192.168.1.16:52159 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 192.168.1.16:52159 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers: Cached location for channels,,1270753106916 is 192.168.1.16:52159 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers: Cache hit for row in tableName channels: location server 192.168.1.16:52159, location region name channels,,1270753106916 10/04/09 17:13:58 DEBUG mapreduce.TableInputFormatBase: getSplits: split - 0 - 192.168.1.16:, 10/04/09 17:13:58 INFO mapred.JobClient: Running job: job_201004091642_0009 10/04/09 17:13:59 INFO mapred.JobClient: map 0% reduce 0% 10/04/09 17:14:09 INFO mapred.JobClient: Task Id : attempt_201004091642_0009_m_00_0, Status : FAILED java.lang.NullPointerException at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:110) at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.<init>(TableInputFormatBase.java:119
Re: hundreds of reads of a metafile
On Thu, Apr 8, 2010 at 2:26 PM, Raghu Angadi rang...@apache.org wrote: Regionserver log on node 72 is at: http://bit.ly/cd9acU (160K gzipped). To give a scale of reads, the local datanode had 3.5K reads spread across about 20 blocks. Pretty much all the reads were from the same DFSClient ID. Will watch if this happens again. Thanks for digging in Raghu. Looking in logs, lots of churn -- other regionservers shedding regions, restarting? -- and write load would probably do better if given more RAM. You keep hitting the ceiling where we put down a gate blocking writes until flushes finish. What time interval are we talking of regarding the 3.5k reads across 20 blocks? Were the 20 blocks under the ${HBASE_ROOTDIR}/.META. directory? This regionserver was carrying the .META. region it looks like, so it's going to be popular. I'd think the cache should be running interference on i/o but maybe it's not doing a good job of it. The write load/churn might be blowing the cache. Yeah, log at DEBUG and we'll get a better idea. You're doing a big upload? Cache is a config where you set how much of heap to allocate. Default is 0.2 IIRC. St.Ack We are increasing memory from 3GB to 6GB. Any pointers about how to set the size of the block cache will be helpful. Will enable DEBUG for LruBlockCache. Raghu. On Thu, Apr 8, 2010 at 12:46 AM, Stack st...@duboce.net wrote: On Thu, Apr 8, 2010 at 12:20 AM, Raghu Angadi rang...@apache.org wrote: Thanks Stack. Will move to 20.3 or 20 trunk very soon. More responses inline below. Do. A 0.20.4 should be around in the next week or so, which will be better still, FYI. On Wed, Apr 7, 2010 at 8:52 PM, Stack st...@duboce.net wrote: On Wed, Apr 7, 2010 at 7:49 PM, Raghu Angadi rang...@apache.org wrote: We are working with a small HBase cluster (5 nodes) with fairly beefy nodes. While looking at why all the regionservers died at one time, noticed that these servers read some files 100s of times a second. This may not be the cause of the error... but do you think this is odd? Check the end of the regionserver log. It should say why the RegionServer went away. The usual reason is a long GC pause, one that is longer than the zk session timeout. This seems to be the case... There were CMS GC failures (promotion failed, Full GC etc). There were 4-5 pauses of about 4-10 seconds over a minute or so. Is that enough to kill the ZK session? We are increasing the memory and will go through the tuning tips on the wiki. The ZK session in your 0.20.1 is probably 40 seconds IIRC, but yeah, this is common enough until a bit of tuning is done. If you update to 0.20.3 at least, the zk session is 60 seconds, but you should try and avoid the promotion failures if you can. There are various other errors in the log over a couple of hours of RS run. Will post a link to the full log.
--- failure on RS-72 --- 2010-04-06 22:11:07,668 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x127d58da4e70002 to sun.nio.ch.selectionkeyi...@426295eb java.io.IOException: TIMED OUT at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906) 2010-04-06 22:11:07,668 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x27d58da6de0088 to sun.nio.ch.selectionkeyi...@283f4633 java.io.IOException: TIMED OUT at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906) 2010-04-06 22:11:07,672 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 27 on 60020, call put([...@20a192c7, [Lorg.apache.hadoop.hbase.client.Put;@4fab578d) from 10.10.0.72:60211: error: java.io.IOException: Server not running, aborting java.io.IOException: Server not running, aborting at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2345) These look like the zk session time out sequence. --- failure on RS-73 after a few minutes --- 2010-04-06 22:21:41,867 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -4957903368956265878 lease expired 2010-04-06 22:21:47,806 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x127d58da4e7002a to sun.nio.ch.selectionkeyi...@15ef1241 java.io.IOException: TIMED OUT at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906) 2010-04-06 22:21:47,806 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.lang.OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39) at java.nio.ByteBuffer.allocate(ByteBuffer.java:312) --- This is zk session timeout and an OOME. The GC couldn't succeed. How much memory are you giving these puppies? CMS is kinda sloppy so you need to give it a bit more space to work in. [...] 2010-04-06 21:51:43,328 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-7610953303919156937_1089667 from any node: java.io.IOException: No live nodes contain current block [...] Are you
Re: org.apache.hadoop.hbase.mapreduce.Export fails with an NPE
Very odd. I don't have to do that running MR jobs. I wonder what's different? (I'm using the 0.20.4 near-candidate rather than 0.20.3, 1.6.0u14). I have a HADOOP_ENV like this. export HBASE_HOME=/home/hadoop/0.20 export HBASE_VERSION=20.4-dev #export HADOOP_CLASSPATH=$HBASE_HOME/conf:$HBASE_HOME/build/hbase-0.20.4-dev.jar:$HBASE_HOME/build/hbase-0.20.4-dev-test.jar:$HBASE_HOME/lib/zookeeper-3.2.2.jar export HADOOP_CLASSPATH=$HBASE_HOME/conf:$HBASE_HOME/build/hbase-0.${HBASE_VERSION}.jar:$HBASE_HOME/build/hbase-0.${HBASE_VERSION}-test.jar:$HBASE_HOME/lib/zookeeper-3.2.2.jar St.Ack On Fri, Apr 9, 2010 at 4:19 PM, George Stathis gstat...@gmail.com wrote: Solved: for those interested, I had to explicitly copy zookeeper-3.2.2.jar to $HADOOP_HOME/lib even though I had added its path to $HADOOP_CLASSPATH under $HADOOP_HOME/conf/hadoop-env.sh. It makes no sense to me why that particular JAR would not get picked up. It was even listed in the classpath debug output when I ran the job using the hadoop shell script. If anyone can enlighten, please do. -GS On Fri, Apr 9, 2010 at 5:56 PM, George Stathis gstat...@gmail.com wrote: No dice. Classpath is now set. Same error. Meanwhile, I'm running $ hadoop org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1 just fine, so MapRed is working at least. Still looking for suggestions then I guess. -GS On Fri, Apr 9, 2010 at 5:31 PM, George Stathis gstat...@gmail.com wrote: RTFMing http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html right now... Hadoop classpath not being set properly could be the issue... On Fri, Apr 9, 2010 at 5:26 PM, George Stathis gstat...@gmail.com wrote: Hi folks, I hope this is just a newbie problem. Context: - Running the 0.20.3 tag locally in pseudo cluster mode - $HBASE_HOME is in env and $PATH - Running org.apache.hadoop.hbase.mapreduce.Export in the shell such as: $ hbase org.apache.hadoop.hbase.mapreduce.Export channels /bkps/channels/01 Symptom: - Getting an NPE at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:110): [...] 110 this.scanner = this.htable.getScanner(newScan); [...] Full output is below. Not sure why htable is still null at that point. User error? Any help is appreciated.
-GS Full output: $ hbase org.apache.hadoop.hbase.mapreduce.Export channels /bkps/channels/01 2010-04-09 17:13:57.407::INFO: Logging to STDERR via org.mortbay.log.StdErrLog 2010-04-09 17:13:57.408::INFO: verisons=1, starttime=0, endtime=9223372036854775807 10/04/09 17:13:58 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 192.168.1.16:52159 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers: Found ROOT at 192.168.1.16:52159 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 192.168.1.16:52159 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers: Cached location for channels,,1270753106916 is 192.168.1.16:52159 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers: Cache hit for row in tableName channels: location server 192.168.1.16:52159, location region name channels,,1270753106916 10/04/09 17:13:58 DEBUG mapreduce.TableInputFormatBase: getSplits: split - 0 - 192.168.1.16:, 10/04/09 17:13:58 INFO mapred.JobClient: Running job: job_201004091642_0009 10/04/09 17:13:59 INFO mapred.JobClient: map 0% reduce 0% 10/04/09 17:14:09 INFO mapred.JobClient: Task Id : attempt_201004091642_0009_m_00_0, Status : FAILED java.lang.NullPointerException at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:110) at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.<init>(TableInputFormatBase.java:119) at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:262) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170) 10/04/09 17:14:15 INFO mapred.JobClient: Task Id : attempt_201004091642_0009_m_00_1, Status : FAILED java.lang.NullPointerException at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:110) at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.<init>(TableInputFormatBase.java:119) at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:262) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170) 10/04/09 17:14:21 INFO mapred.JobClient: Task Id : attempt_201004091642_0009_m_00_2, Status : FAILED java.lang.NullPointerException at
Re: exceptions i got in HDFS - append problem?
On Fri, Apr 9, 2010 at 3:07 AM, Gokulakannan M gok...@huawei.com wrote: Hi, I got the following exceptions when I am using HDFS to write the logs coming from Scribe. 1. java.io.IOException: Filesystem closed stack trace call to org.apache.hadoop.fs.FSDataOutputStream::write failed! The above seems to be saying that the filesystem is closed and, as a consequence, you are not able to write to it. 2. org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file xxx-2010-04-01-12-40_0 for DFSClient_1355960219 on client 10.18.22.55 because current leaseholder is trying to recreate file stack trace call to org.apache.hadoop.conf.FileSystem::append((Lorg/apache/hadoop/fs/Path;)Lorg/apache/hadoop/fs/FSDataOutputStream;) failed! Someone holds the lease on the file you are trying to open? You mention scribe. Do you have hdfs-200 and friends applied to your cluster? I didn't apply the HDFS-265 patch to my hadoop yet. What hadoop version are you running? hdfs-265 won't apply to hadoop 0.20.x if that is what you are running. Are these exceptions due to bugs in the existing append feature, or some other reason? Do I need to apply the complete append patch, or will a simple patch solve this? I haven't looked, but my guess is that the scribe documentation probably has a description of the patchset required to run on hadoop. St.Ack
Re: hundreds of reads of a metafile
On Thu, Apr 8, 2010 at 12:20 AM, Raghu Angadi rang...@apache.org wrote: Thanks Stack. Will move to 20.3 or 20 trunk very soon. More responses inline below. Do. A 0.20.4 should be around in the next week or so, which will be better still, FYI. On Wed, Apr 7, 2010 at 8:52 PM, Stack st...@duboce.net wrote: On Wed, Apr 7, 2010 at 7:49 PM, Raghu Angadi rang...@apache.org wrote: We are working with a small HBase cluster (5 nodes) with fairly beefy nodes. While looking at why all the regionservers died at one time, noticed that these servers read some files 100s of times a second. This may not be the cause of the error... but do you think this is odd? Check the end of the regionserver log. It should say why the RegionServer went away. The usual reason is a long GC pause, one that is longer than the zk session timeout. This seems to be the case... There were CMS GC failures (promotion failed, Full GC etc). There were 4-5 pauses of about 4-10 seconds over a minute or so. Is that enough to kill the ZK session? We are increasing the memory and will go through the tuning tips on the wiki. The ZK session in your 0.20.1 is probably 40 seconds IIRC, but yeah, this is common enough until a bit of tuning is done. If you update to 0.20.3 at least, the zk session is 60 seconds, but you should try and avoid the promotion failures if you can. There are various other errors in the log over a couple of hours of RS run. Will post a link to the full log. --- failure on RS-72 --- 2010-04-06 22:11:07,668 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x127d58da4e70002 to sun.nio.ch.selectionkeyi...@426295eb java.io.IOException: TIMED OUT at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906) 2010-04-06 22:11:07,668 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x27d58da6de0088 to sun.nio.ch.selectionkeyi...@283f4633 java.io.IOException: TIMED OUT at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906) 2010-04-06 22:11:07,672 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 27 on 60020, call put([...@20a192c7, [Lorg.apache.hadoop.hbase.client.Put;@4fab578d) from 10.10.0.72:60211: error: java.io.IOException: Server not running, aborting java.io.IOException: Server not running, aborting at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2345) These look like the zk session time out sequence. --- failure on RS-73 after a few minutes --- 2010-04-06 22:21:41,867 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -4957903368956265878 lease expired 2010-04-06 22:21:47,806 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x127d58da4e7002a to sun.nio.ch.selectionkeyi...@15ef1241 java.io.IOException: TIMED OUT at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906) 2010-04-06 22:21:47,806 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.lang.OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39) at java.nio.ByteBuffer.allocate(ByteBuffer.java:312) --- This is zk session timeout and an OOME. The GC couldn't succeed. How much memory are you giving these puppies? CMS is kinda sloppy so you need to give it a bit more space to work in. [...] 2010-04-06 21:51:43,328 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-7610953303919156937_1089667 from any node: java.io.IOException: No live nodes contain current block [...] Are you accessing from mapreduce? If so, does your hadoop have hdfs-127? Then there are the usual suspects.
Xceivers count -- up it to 2k or so -- and ulimit should be much greater than the default 1024. Yes. Most of the traffic now is puts from reducers. I think the HDFS is a recent Cloudera release. I will check. Most likely it won't have hdfs-127. My guess is that it does... but yeah, check (You should remember that one -- smile). Yup... we hit the Xceivers limit very early. The limit is 2k and the fd limit is also high. [...] There are thousands of repeated reads of many small files like this. --- From the NN log, this block was created for /hbase/.META./1028785192/info/1728561479703335912 2010-04-06 21:51:20,906 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/.META./1028785192/info/1728561479703335912. blk_8972126557191254374_1090962 Btw, we had single replication set for this file by mistake. So if there's any error getting a block, there is no recourse. Were there concurrent processes sucking i/o from HDFS running at the same time? Writing, clients need to figure out where to write. They'll do this by doing a lookup in .META. They'll then cache the info. If clients are short-lived, then lots of .META. hits. The client here is the HBase client (in our case reducers)? Your reducers run for a while? So clients have a chance to exploit the cache of metadata info? And as Ryan says, what's the caching stats
Re: Regions assigned multiple times after disabling table
On Thu, Apr 8, 2010 at 6:38 AM, Martin Fiala fial...@gmail.com wrote: Now, the table is disabled, but the region is online on fernet1-v49.ng.seznam.cz!! Is there some race condition? Enabling/disabling largish tables has always been problematic. We intend to fix it properly in the next major hbase, but meantime we're patching what is currently in place, which is effectively the sending of a message across the cluster and then hoping, rather than guaranteeing, that it's received by all regionservers and that there are no issues during the enable/disable. You could try being extra careful. On disable, ensure all regions are offline before proceeding. This probably requires manual inspection of the meta region (a region gets status=offline when disabled successfully) for now. 0.20.4, which has improvements that particularly address this issue, should be out soon also. Meantime, it's probably best to do as little enable/disable as possible. St.Ack
Re: get the impact hbase brings to HDFS, datanode log exploded after we started HBase.
It'll depend on your access patterns, but in general we'll be doing lots of small accesses... many more. A recently added clienttrace log (the client referred to here is the dfsclient) will log messages like the following: 2010-04-07 22:15:52,078 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.20.20.189:50010, dest: /10.20.20.189:56736, bytes: 2022080, op: HDFS_READ, cliID: DFSClient_-994492608, srvID: DS-1740361948-10.20.20.189-50010-1270703663528, blockid: blk_2797215769808904384_1015 Lots of them, one per access. You could turn them off explicitly in your log4j. That should help. Don't run DEBUG level in datanode logs. Other answers inlined below. On Thu, Apr 8, 2010 at 2:51 AM, steven zhuang steven.zhuang.1...@gmail.com wrote: ... At present, my idea is calculating the data IO quantity of both HDFS and HBase for a given day, and with the result we can have a rough estimate of the situation. Can you use the above noted clienttrace logs to do this? Are clients on different hosts -- i.e. the hdfs clients and hbase clients? If so that'd make it easy enough. Otherwise, it'd be a little difficult. There is probably an easier way, but one (awkward) means of calculating would be by writing a mapreduce job that took the clienttrace messages and all blocks in the filesystem and then had it sort out the clienttrace messages that belong to the ${HBASE_ROOTDIR} subdirectory. One problem I met now is deciding from the regionserver log the quantity of data read/written by Hbase; should I count the lengths in the following log records as lengths of data read/written?: org.apache.hadoop.hbase.regionserver.Store: loaded /user/ccenterq/hbase/hbt2table2/165204266/queries/1091785486701083780, isReference=false, sequence id=1526201715, length=*72426373*, majorCompaction=true 2010-03-04 01:11:54,262 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for region table_word_in_doc, resort all-2010/01/01,1267629092479. Current region memstore size *40.5m* Here I am not sure the *72426373/40.5m* is the length (in bytes) of data read by HBase. That's just the file size. Above we opened a storefile and we just logged its size. We don't log how much we've read/written anywhere in hbase logs. St.Ack
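If the clienttrace noise is the only problem, a hedged sketch of the log4j change (the logger name is taken verbatim from the log line above; WARN is an illustrative choice of level):

# datanode log4j.properties: raise just the clienttrace logger so the
# one-INFO-line-per-access messages are dropped.
log4j.logger.org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace=WARN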
Re: How to recover or clear the data of a bad region
Delete the region from the filesystem and from the .META. table. Then, to close the gap in .META. that you've just made, merge the two regions on either side of the hole (or just rewrite the upper or lower regioninfo in .META. so its start/end keys cover the hole just made). St.Ack On Thu, Apr 8, 2010 at 12:54 AM, 无名氏 sitong1...@gmail.com wrote: Hi, how can I recover or clear the data of a bad region, without affecting other regions, so that we can continue to write/read data to/from hbase? Thanks.
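For the .META. edit, a minimal sketch using the 0.20 client API (the class name and region name are hypothetical; this assumes the region's directory has already been removed from HDFS, and the merge/rewrite of the neighbouring regioninfo described above still has to follow):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class RemoveBadRegionRow {
  public static void main(String[] args) throws Exception {
    // args[0] is the region name, which is also its row key in .META.,
    // e.g. "mytable,startkey,1234567890123" (hypothetical).
    HTable meta = new HTable(new HBaseConfiguration(), ".META.");
    meta.delete(new Delete(Bytes.toBytes(args[0]))); // drops the bad region's row
  }
}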
Re: HTable Client RS caching
Please update your hbase from 0.20.1. It's not much fun for those helping debug issues to discover, after having expended some effort debugging, that the issue has already been fixed. UnknownScannerException usually means the client has taken too long to report back to the regionserver between next invocations, or the regionserver was stuck GC'ing longer than the scanner lease. St.Ack On Thu, Apr 8, 2010 at 11:03 AM, Ted Yu yuzhih...@gmail.com wrote: Here are snippets from the master log w.r.t. region domaincrawltable,,1270600690648: 2010-04-07 00:00:38,504 DEBUG [RegionManager.metaScanner] master.BaseScanner(385): GET on domaincrawltable,,1270600690648 got different startcode than SCAN: sc=1270602502182, serverAddress=1270597824201 2010-04-07 00:00:38,541 INFO [RegionManager.metaScanner] master.BaseScanner(224): RegionManager.metaScanner scan of 7 row(s) of meta region {server: 10.10.30.82:60020, regionname: .META.,,1, startKey: } complete 2010-04-07 18:19:37,384 DEBUG [HMaster] master.ProcessRegionOpen(98): Adding to onlineMetaRegions: {server: 10.10.30.82:60020, regionname: .META.,,1, startKey: } 2010-04-07 18:19:39,417 INFO [IPC Server handler 11 on 6] master.ServerManager(440): Processing MSG_REPORT_PROCESS_OPEN: domaincrawltable,,1270600690648 from snvgold.pr.com,60020,1270689385704; 1 of 2 2010-04-07 18:19:39,417 INFO [IPC Server handler 11 on 6] master.ServerManager(440): Processing MSG_REPORT_OPEN: domaincrawltable,,1270600690648 from snvgold.pr.com,60020,1270689385704; 2 of 2 2010-04-07 18:19:39,419 DEBUG [HMaster] master.HMaster(486): Processing todo: PendingOpenOperation from snvgold.pr.com,60020,1270689385704 2010-04-07 18:19:39,419 INFO [HMaster] master.ProcessRegionOpen(70): domaincrawltable,,1270600690648 open on 10.10.30.82:60020 2010-04-07 18:19:39,423 INFO [HMaster] master.ProcessRegionOpen(80): Updated row domaincrawltable,,1270600690648 in region .META.,,1 with startcode=1270689385704, server=10.10.30.82:60020 We use hbase 0.20.1 on server and client. The most peculiar log from one of the regionservers is: 2010-04-08 10:26:38,391 ERROR [IPC Server handler 61 on 60020] regionserver.HRegionServer(844): org.apache.hadoop.hbase.UnknownScannerException: Name: -1 at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1925) at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) On Thu, Apr 8, 2010 at 10:40 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: No it's there: domaincrawltable,,1270600690648 J-D On Thu, Apr 8, 2010 at 10:38 AM, Ted Yu yuzhih...@gmail.com wrote: What if there is no region information in the NSRE?
2010-04-08 10:26:38,385 ERROR [IPC Server handler 60 on 60020] regionserver.HRegionServer(846): Failed openScanner org.apache.hadoop.hbase.NotServingRegionException: domaincrawltable,,1270600690648 at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2307) at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1893) at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) On Thu, Apr 8, 2010 at 9:39 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Wed, Apr 7, 2010 at 11:38 PM, Al Lias al.l...@gmx.de wrote: Occasionally my HTable clients get a response that no server is serving a particular region... Normally, the region is back a few seconds later (perhaps a split?). Or the region moved. Anyway, the client (using HTablePool) seems to need a restart to forget this. Seems wrong, would love a stack trace. Is there a config value to manipulate the caching time of regionserver assignments in the client? Nope, when the client sees a NSRE, it queries .META. to find the new location. I set a small value for hbase.client.pause to get failures fast. I am using 0.20.3. Splits are still kinda slow, takes at least 2 seconds to happen, but finding the new location of a region is a core feature in HBase and it's rather well tested. Can you pin down your exact problem? Next time a NSRE happens, see which region it was looking for and grep the master log for it; you should see the history and how much time it took to move. Thx, Al
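On the scanner-lease point above: when the client does heavy per-row work, one hedged mitigation is a small scanner caching value, so next() calls reach the regionserver well inside the lease period (a sketch against the 0.20 client API; the class and process() step are hypothetical):

import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class LeaseFriendlyScan {
  static void scanSlowly(HTable table) throws IOException {
    Scan scan = new Scan();
    // Small batches: with a large caching value, one next() hands the client
    // a big batch, and chewing through it can outlast the scanner lease
    // before the following next() reaches the regionserver.
    scan.setCaching(10);
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result row : scanner) {
        process(row);
      }
    } finally {
      scanner.close(); // always release the server-side scanner
    }
  }

  static void process(Result row) { /* hypothetical slow client-side work */ }
}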
Re: get the impact hbase brings to HDFS, datanode log exploded after we started HBase.
On Thu, Apr 8, 2010 at 7:42 PM, steven zhuang steven.zhuang.1...@gmail.com wrote: In this case there are lots of access records but rather less data than usual Hadoop jobs. Can we say there are usually many more blocks involved in an HBase HDFS access than in a plain Hadoop HDFS access? This cannot be efficient. For a random access, usually only one block is involved, in hbase at least. If you first indexed your content in HDFS, it'd be about the same. I know sometimes there are small region store files, but if they are small, they would be merged into one by compaction, right? The aim is storefiles of about the size of a single block. Usually I'd say we spill over into the second block. On compaction files are compacted and will tend to grow in size (add more blocks). Is there any way we can lower the number of small data accesses? Maybe by setting a higher rowcaching number, but that should be App dependent. Any other options we can use to lower this number? What are your reads like? Lots of random reads? Or are they all scans? Do they adhere to any kind of pattern or are they random? Yes, you could up your cache size too. What is the problem you are trying to address? Are you saying all the i/o is killing your HDFS or something? Or is it just making big logs that you are trying to address? You could turn them off explicitly in your log4j. That should help. Don't run DEBUG level in datanode logs. We are running the cluster at INFO level. Do you see the clienttrace loggings? You could explicitly disable this class's loggings. That should make a big difference in log size. St.Ack Other answers inlined below. On Thu, Apr 8, 2010 at 2:51 AM, steven zhuang steven.zhuang.1...@gmail.com wrote: ... At present, my idea is calculating the data IO quantity of both HDFS and HBase for a given day, and with the result we can have a rough estimate of the situation. Can you use the above noted clienttrace logs to do this? Are clients on different hosts -- i.e. the hdfs clients and hbase clients? If so that'd make it easy enough. Otherwise, it'd be a little difficult. There is probably an easier way, but one (awkward) means of calculating would be by writing a mapreduce job that took the clienttrace messages and all blocks in the filesystem and then had it sort out the clienttrace messages that belong to the ${HBASE_ROOTDIR} subdirectory. Yeah, the hbase regionserver and datanode are on the same host, so I cannot get the data read/written by HBase just from the datanode log. The Map/Reduce way may have a problem: we cannot get the historical block info from the HDFS filesystem; I mean there are lots of blocks that have been garbage collected when we import or delete data. One problem I met now is deciding from the regionserver log the quantity of data read/written by Hbase; should I count the lengths in the following log records as lengths of data read/written?: org.apache.hadoop.hbase.regionserver.Store: loaded /user/ccenterq/hbase/hbt2table2/165204266/queries/1091785486701083780, isReference=false, sequence id=1526201715, length=*72426373*, majorCompaction=true 2010-03-04 01:11:54,262 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for region table_word_in_doc, resort all-2010/01/01,1267629092479. Current region memstore size *40.5m* Here I am not sure the *72426373/40.5m* is the length (in bytes) of data read by HBase. That's just the file size. Above we opened a storefile and we just logged its size. We don't log how much we've read/written anywhere in hbase logs. St.Ack
Re: hundreds of reads of a metafile
On Wed, Apr 7, 2010 at 7:49 PM, Raghu Angadi rang...@apache.org wrote: We are working with a small HBase cluster (5 nodes) with fairly beefy nodes. While looking at why all the regionservers died at one time, noticed that these servers read some files 100s of times a second. This may not be the cause of the error... but do you think this is odd? Check the end of the regionserver log. It should say why the RegionServer went away. The usual reason is a long GC pause, one that is longer than the zk session timeout. HBase version: 0.20.1. The cluster was handling mainly write traffic. Can you run a more recent hbase, Raghu? Lots of fixes since 0.20.1. Note that in the datanode log, there are a lot of reads of these files. One of the RS logs: --- 2010-04-06 21:51:33,923 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: campaign,4522\x234\x23201003,1268865840941 2010-04-06 21:51:34,211 INFO org.apache.hadoop.hbase.regionserver.HRegion: region campaign,4522\x234\x23201003,1268865840941/407724784 available; sequence id is 1607026498 2010-04-06 21:51:43,327 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_8972126557191254374_1090962 from any node: java.io.IOException: No live nodes contain current block 2010-04-06 21:51:43,328 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-5586169098563059270_1078171 from any node: java.io.IOException: No live nodes contain current block 2010-04-06 21:51:43,328 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-7610953303919156937_1089667 from any node: java.io.IOException: No live nodes contain current block [...] Are you accessing from mapreduce? If so, does your hadoop have hdfs-127? Then there are the usual suspects. Xceivers count -- up it to 2k or so -- and ulimit should be much greater than the default 1024.
portion of grep for one of the blocks mentioned above in the datanode log: 39725703, offset: 0, srvID: DS-977430382-10.10.0.72-50010-1266601998020, blockid: blk_8972126557191254374_1090962, duration: 97000 2010-04-06 21:51:43,307 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.10.0.72:50010, dest: /10.10.0.72:43699, bytes: 6976, op: HDFS_READ, cliID: DFSClient_-1439725703, offset: 0, srvID: DS-977430382-10.10.0.72-50010-1266601998020, blockid: blk_8972126557191254374_1090962, duration: 76000 2010-04-06 21:51:43,310 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.10.0.72:50010, dest: /10.10.0.72:45123, bytes: 6976, op: HDFS_READ, cliID: DFSClient_-1439725703, offset: 0, srvID: DS-977430382-10.10.0.72-50010-1266601998020, blockid: blk_8972126557191254374_1090962, duration: 93000 2010-04-06 21:51:43,314 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.10.0.72:50010, dest: /10.10.0.72:41891, bytes: 6976, op: HDFS_READ, cliID: DFSClient_-1439725703, offset: 0, srvID: DS-977430382-10.10.0.72-50010-1266601998020, blockid: blk_8972126557191254374_1090962, duration: 267000 2010-04-06 21:51:43,318 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.10.0.72:50010, dest: /10.10.0.72:46412, bytes: 6976, op: HDFS_READ, cliID: DFSClient_-1439725703, offset: 0, srvID: DS-977430382-10.10.0.72-50010-1266601998020, blockid: blk_8972126557191254374_1090962, duration: 91000 2010-04-06 21:51:46,330 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.10.0.72:50010, dest: /10.10.0.72:40657, bytes: 6976, op: HDFS_READ, cliID: DFSClient_-1439725703, offset: 0, srvID: DS-977430382-10.10.0.72-50010-1266601998020, blockid: blk_8972126557191254374_1090962, duration: 85000 -- There are thousands of repeated reads of many small files like this. --- From the NN log, this block was created for /hbase/.META./1028785192/info/1728561479703335912 2010-04-06 21:51:20,906 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/.META./1028785192/info/1728561479703335912. blk_8972126557191254374_1090962 Btw, we had single replication set for this file by mistake. So if there's any error getting a block, there is no recourse. Were there concurrent processes sucking i/o from HDFS running at the same time? Writing, clients need to figure out where to write. They'll do this by doing a lookup in .META. They'll then cache the info. If clients are short-lived, then lots of .META. hits. And as Ryan says, what do the caching stats look like for the .META. region? (See which server it was hosted on and check its logs -- we dump cache metrics every minute or so). St.Ack
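Since the clients here are the reducers, a hedged illustration of the long-lived-client point (one HTable per task, so the region locations looked up in .META. stay cached between puts). The class, key/value types and column names are hypothetical; the table name "campaign" is taken from the regionserver log above:

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CampaignUploadReducer extends Reducer<Text, Text, NullWritable, NullWritable> {
  private HTable table;

  @Override
  protected void setup(Context context) throws IOException {
    // One HTable for the whole task: the .META. lookups it does are cached
    // in the client, so later puts skip the lookup entirely.
    table = new HTable(new HBaseConfiguration(), "campaign");
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException {
    Put put = new Put(Bytes.toBytes(key.toString()));
    for (Text value : values) {
      // Hypothetical family/qualifier; substitute the real schema.
      put.add(Bytes.toBytes("info"), Bytes.toBytes("val"), Bytes.toBytes(value.toString()));
    }
    table.put(put);
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    table.flushCommits(); // push anything still buffered before the task exits
  }
}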
Re: Tuning question...
On Wed, Apr 7, 2010 at 3:36 PM, Michael Dalton mwdal...@gmail.com wrote: Correct me if I'm wrong on any of this, but is escape analysis safe at all to turn on in production given that it's totally disabled now in the most recent Java builds? That sounds right. Ryan, you are running u16? It's off in u19 too I believe. St.Ack
Re: enabling hbase metrics on a running instance
Also need to do configuration in hbase/conf/hadoop-metrics.properties (yes, that's hadoop-metrics, not hbase-metrics), which I believe is only read on restart. So double-no. St.Ack On Tue, Apr 6, 2010 at 4:18 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: This boils down to the question: can you enable JMX while the JVM is running? The answer is no (afaik). More doc here http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html J-D On Tue, Apr 6, 2010 at 4:12 PM, Igor Ranitovic irani...@gmail.com wrote: Is it possible to enable the hbase metrics without a restart? Thanks. i.
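For reference, a minimal sketch of what that file might contain (the context class is the stock hadoop one; the period value is illustrative):

# hbase/conf/hadoop-metrics.properties -- keep hbase metrics updated
# in-process so they are readable over JMX; only picked up on restart,
# per the above.
hbase.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread
hbase.period=60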
Re: Beginner question about querying records
2010/4/4 Onur AKTAS onur.ak...@live.com: Thank you very much for your answers. I'm checking the document that you gave. In short words, unless massive traffic and massive data size and massive scale are needed, stick with regular RDBMSs; then if we grow to terabytes of data to be queried, we can switch to NO-SQL databases. Thanks so much. Well, the above is basically the case for hbase. We should be better at smaller scales than we currently are, but that is another story. But generalizing your lesson to all NOSQL is another matter. The category is broad, covering myriad database types. St.Ack Date: Sat, 3 Apr 2010 20:56:21 -0700 Subject: Re: Beginner question about querying records From: st...@duboce.net To: hbase-user@hadoop.apache.org 2010/4/3 Onur AKTAS onur.ak...@live.com: Hi all, I'm thinking of switching from an RDBMS to a No-SQL database, but I have lots of unanswered questions in my mind. Please correct me if I'm wrong: is Hbase not suitable for small environments? Like if we have 1 million records with no cluster or maybe 2 machines, is it not required? It'll work but that's not what it's built for. You'll be better off sticking with your current RDBMS if your dataset is that size, going by the rest of your questions below. As far as I know, Hbase does not support querying, but has Pig to perform SQL-like queries. It is a multi-dimensional hashmap distributed across the network to be accessed fast by key. So if we need to query something then we need to index it by ourselves. Yes. 1) If we have a user list, and a potential "Give me all people above/below age 30" query, then do we need to create an index from the beginning of the first data as: above_30_list: value: [A, B, C] ; below_30_list: value: [X, Y, Z]? Yes. Or, if you can tolerate getting the answer offline, run a mapreduce against the table. Or, if this is the only query you'll be running, think about how you could design the primary key so you can answer this question: e.g. userid_age. 2) What if we need just people at age 45? Then do we need to get all of above_30 and scan them one by one? 3) If we need so many various queries, then should we create such keys as I wrote above for all potential queries? And enter the data into all those indexes when inserting. Effectively yes. 4) Is parallelizing across clusters to share scanning what HBase or the Map Reduce technique does to solve this issue? In short words, I'm willing to switch to Hbase for my applications, and wondering how I can do all these kinds of operations in HBase with better performance than I get in RDBMSs. HBase is about scaling. To achieve scale, the model is changed. Moving your RDBMS schema to hbase will take some thought and not all of it will make it across. For a considered thesis on nosql vs rdbms modeling, see http://j.mp/2PjPB. St.Ack Thanks so much.
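To make the "index it yourself" answer concrete, a hedged sketch with the 0.20 Java client (index table layout, family and qualifier are hypothetical): lead the index row key with a zero-padded age so rows sort by age, and "everyone above 30" becomes a plain range scan starting at "030".

import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class AgeIndexWriter {
  // Writes one row into a hypothetical age index table for each user insert.
  static void indexUser(HTable indexTable, String userId, int age) throws IOException {
    // Zero-padded age first, then userId: rows sort by age, so "age >= 30"
    // is a scan with start row "030".
    String indexKey = String.format("%03d_%s", age, userId);
    Put put = new Put(Bytes.toBytes(indexKey));
    // Hypothetical family/qualifier; any payload that lets you find the user works.
    put.add(Bytes.toBytes("info"), Bytes.toBytes("user"), Bytes.toBytes(userId));
    indexTable.put(put);
  }
}

Question 2 (exactly age 45) then needs no scan of the whole above-30 list: it is a scan from start row "045" up to, but not including, stop row "046".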
Re: More about LogFlusher
On Fri, Apr 2, 2010 at 10:59 AM, ChingShen chingshenc...@gmail.com wrote: Thanks, Jean-Daniel. I haven't patched yet, but what is the intention of just writing a marker into the file? IIRC, it's so that if the file is corrupted, the parse can be picked up on the other side of the corrupt section as soon as the parse trips over the next marker. St.Ack
Re: TableMapper and getSplits
Splitting a table on its Regions makes most sense when only one table is involved. For your case, just override the splitter and make different split objects. As to the 'underloaded' hbase when there's one task per region, I'd say try it first. If there are many regions on the one regionserver, it could make for a decent load on the hosting regionserver. Good luck, St.Ack On Fri, Apr 2, 2010 at 12:19 PM, Geoff Hendrey ghend...@decarta.com wrote: Hello, I have subclassed TableInputFormat and TableMapper. My job needs to read from two tables (one row from each) during its map method. The reduce method needs to write out to a table. For both the reads and the writes, I am using simple Get and Put respectively with autoflush true. One problem I see is that the number of map tasks that I get with HBase is limited to the number of regions in the table. This seems to make the job slower than it would be if I had many more mappers. Could I improve the situation by overriding getSplits so that I could have many more mappers? I saw the following doc'd in TableMapReduceUtil: Ensures that the given number of reduce tasks for the given job configuration does not exceed the number of regions for the given table. Is there some reason one would want to ensure that the number of tasks doesn't exceed the number of regions? It just seems to me that having one region serve only a single task would result in an underloaded HBase. Thoughts? -geoff
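A sketch of the override Stack suggests, against the 0.20 mapreduce API where TableInputFormatBase.getSplits returns one TableSplit per region (the class name and midKey() helper are hypothetical, and the midpoint math glosses over edge cases such as adjacent keys): each region's key range is cut in two so a region can feed two mappers.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableSplit;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

public class SubdividedTableInputFormat extends TableInputFormat {
  @Override
  public List<InputSplit> getSplits(JobContext context) throws IOException {
    List<InputSplit> perRegion = super.getSplits(context); // one split per region
    List<InputSplit> finer = new ArrayList<InputSplit>();
    for (InputSplit split : perRegion) {
      TableSplit ts = (TableSplit) split;
      byte[] mid = midKey(ts.getStartRow(), ts.getEndRow());
      // Two sub-splits per region; both still read from the same regionserver.
      finer.add(new TableSplit(ts.getTableName(), ts.getStartRow(), mid, ts.getRegionLocation()));
      finer.add(new TableSplit(ts.getTableName(), mid, ts.getEndRow(), ts.getRegionLocation()));
    }
    return finer;
  }

  // Hypothetical midpoint between two row keys, treating them as big-endian
  // base-256 fractions: add digit-by-digit, then halve. An empty or short
  // end key is padded with 0xff so the last region counts as unbounded.
  // Identical or adjacent keys can yield a midpoint equal to an endpoint.
  private static byte[] midKey(byte[] start, byte[] end) {
    int len = Math.max(start.length, end.length) + 1;
    int[] sum = new int[len];
    int carry = 0;
    for (int i = len - 1; i >= 0; i--) {
      int s = i < start.length ? start[i] & 0xff : 0x00;
      int e = i < end.length ? end[i] & 0xff : 0xff;
      int t = s + e + carry;
      sum[i] = t & 0xff;
      carry = t >> 8;
    }
    byte[] mid = new byte[len];
    int rem = carry;
    for (int i = 0; i < len; i++) {
      int cur = (rem << 8) + sum[i];
      mid[i] = (byte) (cur >> 1);
      rem = cur & 1;
    }
    return mid;
  }
}

The job would then name this class as its input format in place of TableInputFormat.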
Re: come to HUG10!
HBaseWorld. HBaseDisneyLand! St.Ack On Fri, Apr 2, 2010 at 2:31 PM, Jonathan Gray jg...@facebook.com wrote: Three cheers for Andrew and Trend Micro! This is very awesome. HBaseCon? HBase Summit? -Original Message- From: Andrew Purtell [mailto:apurt...@apache.org] Sent: Friday, April 02, 2010 11:39 AM To: hbase-user@hadoop.apache.org Subject: come to HUG10! We are holding an all day event (noon - 9pm(ish)) on Monday April 19th, for the 10th HBase Users' Group meetup: Hackathon: http://www.meetup.com/hackathon/ Meetup: http://www.meetup.com/hbaseusergroup/calendar/12689490/ There is space available for 30 at the Hackathon (8 RSVP so far) and 100 at the meetup (67 RSVP so far). Consider coming out to one or both functions to meet the HBase developers and user community, to network, or just to learn more about HBase and Hadoop. Come to the Hackathon if you are a HBase developer or power user or committer or have interest in becoming one; bring your laptop, coding skills, and creativity. We have arranged to host HUG10 at the Sheraton San Jose, in Milpitas. Sheraton San Jose 1801 Barber Lane Milpitas, CA 95035 (408)-943-0600 This is a little south for many HBasers or would-be HBasers, but in exchange for the drive we have arranged: - Private salon for hackathon and meetup - Classroom for 30 from noon-5PM for the hackathon with catered lunch buffet at 1PM - Meetup for 100 from 6PM-9PM(ish), catered dinner at 8PM (southwest buffet), cash bar, and use of an outdoor terrace - Free wireless Internet - Special room rate of $135/night Best regards, Andrew Purtell apurt...@apache.org andrew_purt...@trendmicro.com
Re: PerformanceEvaluation times
sequentialWrite 2 makes a job of two clients (two tasks only), each doing 1M rows. Your job only has 2 tasks total, right? My guess is you are paying MR overhead (though 10k seconds is excessive -- something else is going on). You could try sequentialWrite 20 (20 tasks each writing 1M rows). Also, set your cluster to have 1 map and 1 reduce per slave only. MR can impinge on the DN and RS, stealing i/o and RAM. You don't have that much RAM, so start with a small number of slots. St.Ack On Thu, Apr 1, 2010 at 3:07 AM, Michael Dalton mwdal...@gmail.com wrote: Hi, I have an issue I've been running into with the PerformanceEvaluation results on my cluster. We have a cluster with 5 slaves, quad-core machines with 8GB RAM/2x1TB disk. There are 4 map and 4 reduce slots per slave. The MapReduce-related tests seem to be running really slowly. For example, sequentialWrite 1 (no MapReduce) takes 335 seconds to insert 1 million rows. Running PerfEval with sequentialWrite 2 --nomapred takes approximately 750 seconds. However, sequentialWrite 2 with MapReduce enabled takes 9483 seconds to finish the 20 Map tasks, over 10x longer than the --nomapred version. I do have jvm reuse set to -1, but I don't see why this should drastically increase MR latency. Am I missing some tuning or configuration parameter, or has anyone seen a drastic drop in performance when executing PerformanceEvaluation in MapReduce mode? I understand this is a bit of a degenerate case for MapReduce, since PerformanceEvaluation generates its row values at runtime in memory, but these numbers seem a bit excessive at first glance. Thanks, Best regards, Mike
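Concretely, the suggested comparison runs look something like this (a sketch following the invocation form used earlier in this digest; flag placement per the PerformanceEvaluation usage string may vary by version):

# 2 clients in-process, no MapReduce:
hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred sequentialWrite 2
# 20 clients as a MapReduce job (20 map tasks, 1M rows each):
hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 20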
Re: why hbase using much more space than actual ?
Even after major compacting it all? hbase major_compact TABLENAME ... then wait a while, or just leave the cluster up 24 hours and do your measurement again. What did you du? du /hbase or du /hbase/TABLENAME? The former will include all WAL logs still outstanding. If size is a concern, run w/ lzo. See the wiki page for the how-to. St.Ack On Tue, Mar 30, 2010 at 10:45 PM, Chen Bangzhong bangzh...@gmail.com wrote: Hi, ALL I am benchmarking HBase. I found that HBase used much more space than the actual size. Here is my test environment. One NameNode server. One JobTracker server (Secondary NameNode also on this machine). One DataNode. dfs.replication set to 1:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
My HBase cluster includes one master, one regionserver and one zookeeper on 3 servers. I used the example code in the HBase documentation to fill the test table. From hadoop, I found that the space used is about 3 times the actual size. For example, I wrote 10k records to the table, each record about 20k; the actual size would be 2G, but from the hadoop du command, the size used is more than 6G. I don't know if this is by design, or if my configuration is wrong. Thanks
Re: could not be reached after 1 tries
2010/3/30 y_823...@tsmc.com: I am a veggie ^_^ That's a slogan; it urges people to eat more vegetables on Mondays to save our planet. I'm down w/ that (smile). St.Ack
Re: Is NotServingRegionException really an Exception?
I always thought that the throwing of an exception to signal a moved region was broken, if only for the reason that it is disturbing to new users. See https://issues.apache.org/jira/browse/HBASE-72 Would be nice to change it. I don't think it's easy though. We'd need to rig the RPC so calls were enveloped or some such, so we could pass status messages along with (or instead of) query results. St.Ack On Wed, Mar 31, 2010 at 8:06 AM, Edward Capriolo edlinuxg...@gmail.com wrote: On Wed, Mar 31, 2010 at 11:02 AM, Gary Helmling ghelml...@gmail.com wrote: Well I would still view it as an exceptional condition. The client asked for data back from a server that does not own that data. Sending back an exception seems like the appropriate response, to me at least. It's just an exceptional condition that's allowed to happen in favor of the optimization of caching region locations in memory on the client. I could see the reporting of the exception being misleading though if it's being logged at an error or warn level when it's a normal part of operations. What's the logging level of the messages? On Wed, Mar 31, 2010 at 10:51 AM, Al Lias al.l...@gmx.de wrote: On 31.03.2010 16:47, Gary Helmling wrote: NotServingRegionException is a normal part of operations when regions transition (ie due to splits). It's how the region server signals back to the client that it needs to re-lookup the region location in .META. (which is normally cached in memory by the client, so can become stale). I'm sure it can also show up as a symptom of other problems, but if you're not seeing any other issues, then it's nothing to be concerned about. Thx Gary, this is my point: I see this many times in the (production) logs when it is actually nothing to worry about. Shouldn't this rather be a normal response of a region server, instead of an Exception? Al On Wed, Mar 31, 2010 at 7:38 AM, Al Lias al.l...@gmx.de wrote: As I do see this Exception really often in our logs, I wonder if this indicates a regular thing (within splits etc.) or if this is something that should not normally happen. I see it often in Jira as a reason for something else that fails, but for a regular client request, where the client is not perfectly up-to-date with region information, it looks like something normal. Am I right here? Al The LDAP APIs throw a ReferralException when you try to update a read-only slave, so there is a precedent for that. But true that an exception may be strong for something that is technically a warning.
Re: Using SPARQL against HBase
Writes would update your in-memory graph and the backing hbase store? Would the in-memory graph hold all data or just metadata? You might look at IHBase in the 'indexed' contrib to see how it loads an index on region open (it subclasses hbase so it can catch key transitions). Why HBase and not a native graph database? Yours, St.Ack

On Wed, Mar 31, 2010 at 8:27 AM, Basmajian, Raffi rbasmaj...@oppenheimerfunds.com wrote: We are currently researching how to use SPARQL against data in Hbase. I understand the use of the Get and Scan classes in the Hbase API, but these search classes do not return data in the same way SPARQL against RDF data returns it. My colleagues and I were discussing that these types of search results will require creating an in-memory graph from Hbase first, then using SPARQL against that graph. We are not sure how this is accomplished. Any advice would help, thank you -RNY
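One way to prototype the in-memory-graph idea, sketched under loud assumptions: Jena supplies the RDF model and SPARQL engine, and the table layout (row key = subject URI, column qualifier = predicate URI, cell value = object) is entirely hypothetical:

  import com.hp.hpl.jena.query.QueryExecution;
  import com.hp.hpl.jena.query.QueryExecutionFactory;
  import com.hp.hpl.jena.query.ResultSet;
  import com.hp.hpl.jena.rdf.model.Model;
  import com.hp.hpl.jena.rdf.model.ModelFactory;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.KeyValue;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public class SparqlOverHBase {
    public static void main(String[] args) throws Exception {
      // Pull every triple out of a hypothetical "triples" table into memory.
      HTable table = new HTable(new HBaseConfiguration(), "triples");
      Model model = ModelFactory.createDefaultModel();
      ResultScanner scanner = table.getScanner(new Scan());
      for (Result r : scanner) {
        for (KeyValue kv : r.raw()) {
          model.add(model.createResource(Bytes.toString(kv.getRow())),
                    model.createProperty(Bytes.toString(kv.getQualifier())),
                    Bytes.toString(kv.getValue()));
        }
      }
      scanner.close();
      // SPARQL runs against the in-memory graph, not against HBase itself.
      QueryExecution qe = QueryExecutionFactory.create(
          "SELECT ?s ?o WHERE { ?s ?p ?o }", model);
      ResultSet rs = qe.execSelect();
      while (rs.hasNext()) System.out.println(rs.next());
      qe.close();
    }
  }

As St.Ack's question implies, the hard part is not the query but keeping this graph in sync with writes; loading on region open, as IHBase does for its indexes, is one pattern for that.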
Re: could not be reached after 1 tries
Was it busy at the time the client was trying to access it? Do subsequent accesses to this regionserver work? St.Ack

2010/3/29 y_823...@tsmc.com: Hi, one of my region servers is still listed on the Region Servers webpage, but it raised the following message while running my program: 10/03/30 13:11:18 INFO ipc.HbaseRPC: Server at /10.81.47.43:60020 could not be reached after 1 tries, giving up Any suggestion? Fleming Chiu(邱宏明) 707-6128 y_823...@tsmc.com 週一無肉日吃素救地球(Meat Free Monday Taiwan)

john smith js1987.sm...@gmail.com To: hbase-user@hadoop.apache.org cc: (bcc: Y_823910/TSMC) Subject: Re: Region assignment in Hbase 2010/03/30 10:49 AM Please respond to hbase-user

J-D, thanks for your reply. I have some doubts, which I posted inline. Kindly help me. On Tue, Mar 30, 2010 at 2:23 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: Inline. J-D On Mon, Mar 29, 2010 at 11:45 AM, john smith js1987.sm...@gmail.com wrote: Hi all, I read the issue HBASE-57 (https://issues.apache.org/jira/browse/HBASE-57). I don't really understand the use of assigning regions keeping DFS in mind. Can anyone give an example usecase showing its advantages? A region is composed of files; files are composed of blocks. To read data, you need to fetch those blocks. In HDFS you normally have access to 3 replicas and you fetch one of them over the network. If one of the replicas is on the local datanode, you don't need to go through the network. This means less network traffic and better response time. Is this the scenario that occurs for catering to read requests? In the thread "Data distribution in HBase", someone mentioned that the data hosted by a Region Server may not actually reside on the same machine, so when asked for data, it fetches it from the system containing the data. Am I right? Why doesn't the data hosted by a Region Server lie on the same machine? Doesn't the name Region Server imply that it holds all the regions it contains? Is it due to splits or to restarting HBase? Can map-reduce exploit this in any way (if data is distributed in the above manner), or is it just the read-write performance that gets improved? MapReduce works in the exact same way; it always tries to put the computation next to where the data is. I recommend reading the MapReduce tutorial http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Overview Also, the same case applies here, I guess. When a map is run on a Region Server, its data may not actually lie on the same machine, so it fetches it from the machine containing it. This reduces the data locality! Can someone please help me in understanding this? Regards, JS
Re: could not be reached after 1 tries
Well, what do the logs tell about that server's state? St.Ack

2010/3/30 y_823...@tsmc.com: Was it busy at the time the client was trying to access it? No. Do subsequent accesses to this regionserver work? No. Showing that message all the time. Thanks Fleming Chiu(邱宏明) 707-6128 y_823...@tsmc.com 週一無肉日吃素救地球(Meat Free Monday Taiwan)
Re: Contrib tableindexed package vs. custom indexes
Sorry, George, for the lack of response. I think it probably a bit of 3.) and then 4.), which is that you know the options cleanly, so there is nothing really to add. My sense is that when fellas say roll your own indexes, between the lines I think what they are saying is that they do not want to do the two updates transactionally -- that they do not want to pay the ITHBase tax -- and are ok w/ losing an index add or two. The preso at ApacheCon was a callout to THBase and ITHBase recognizing that it's been a contrib for a good while now and that perhaps it's graduated beyond our designation of it as 'experimental'. St.Ack

On Mon, Mar 29, 2010 at 12:51 PM, George Stathis gstat...@gmail.com wrote: Hi folks, I've seen some people around the list recommend rolling one's own indexes. Others say to just go with the org.apache.hadoop.hbase.client.tableindexed package. A quick scan of the wiki does not reveal any best practices. Presentations from the devs, such as the Oakland ApacheCon slides, point to the contrib package. Some of the comments on the list seem to note that IndexedTable is not very performant; then again, I would assume that a custom index would have to wrap any table+index operations in a transaction anyway. So unless folks forego transactions when rolling their own indexes, I don't see how a custom implementation could be that much faster. What do the majority of people here do for indexing? Is there a generally accepted middle-of-the-road approach offering an acceptable compromise between performance and maintainability? I must admit that rolling our own indexes does not seem like a viable long-term approach to me (from a maintenance POV). I'm interested in people's opinions. -GS
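For concreteness, a minimal sketch of the roll-your-own approach being contrasted with ITHBase, assuming the 0.20 client API; the table and column names are made up. The point is the gap between the two puts: a crash there is exactly the lost "index add or two" St.Ack mentions.

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class HandRolledIndex {
    public static void addUser(byte[] userId, byte[] email) throws Exception {
      HBaseConfiguration conf = new HBaseConfiguration();
      HTable data = new HTable(conf, "users");
      HTable index = new HTable(conf, "users-by-email");
      Put d = new Put(userId);
      d.add(Bytes.toBytes("info"), Bytes.toBytes("email"), email);
      data.put(d);
      // No transaction: if the client dies here, the data row exists but
      // the index row does not. Paying the ITHBase "tax" buys atomicity
      // across these two updates.
      Put i = new Put(email); // index row key is the indexed value itself
      i.add(Bytes.toBytes("ref"), Bytes.toBytes("user"), userId);
      index.put(i);
    }
  }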
Re: could not be reached after 1 tries
If you are still on HBase version 0.20.2, r834515, you should update. There are a few NPE fixes in 0.20.3 and, if I look at the code in the branch around where you are getting the NPE below, it's got lots of protections against what you are seeing. 週一無肉日吃素救地球(Meat Free Monday Taiwan) What is the above about? Thanks, St.Ack

2010/3/30 y_823...@tsmc.com: I found an error in the log:

  2010-03-30 10:10:32,135 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.lang.NullPointerException
  2010-03-30 10:10:32,136 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 45 on 60020, call delete([...@7a914631, row=N900W06LS1110_810593676N900W06LS111, ts=9223372036854775807, families={}) from 10.81.47.36:41022: error: java.io.IOException: java.lang.NullPointerException
  java.io.IOException: java.lang.NullPointerException
      at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:869)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:859)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.delete(HRegionServer.java:2028)
      at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
      at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
  Caused by: java.lang.NullPointerException

Fleming Chiu(邱宏明) 707-6128 y_823...@tsmc.com 週一無肉日吃素救地球(Meat Free Monday Taiwan)
Re: Multi ranges Scan
Can you use a filter to do this? If there's no pattern to the excludes, then it's tougher. How do you know what to exclude? It's in a repository somewhere? Add a filter that queries this repo? On Mar 25, 2010, at 4:07 PM, Andriy Kolyadenko cryp...@mail.saturnfans.com wrote: Ok, that would work for pruning regions. And what about pruning actual rows inside a single region? Do you have any ideas how to implement it? --- Stack wrote: --- I think you need to make a custom splitter for your mapreduce job, one that makes splits that align with the ranges you'd have your job run over. A permutation on HBASE-2302 might work for you. St.Ack
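As a purely client-side workaround (separate from the filter and custom-splitter ideas), one can simply run one Scan per range and process them in turn; a minimal sketch assuming the 0.20 API, with "mytable" as a placeholder and the ranges taken from the example in the question:

  import java.util.Arrays;
  import java.util.List;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public class MultiRangeScan {
    public static void main(String[] args) throws Exception {
      HTable table = new HTable(new HBaseConfiguration(), "mytable");
      // The [start, stop) ranges from the example; the stop row is exclusive.
      List<byte[][]> ranges = Arrays.asList(
          new byte[][] { Bytes.toBytes("1"), Bytes.toBytes("2") },
          new byte[][] { Bytes.toBytes("4"), Bytes.toBytes("5") });
      for (byte[][] range : ranges) {
        ResultScanner scanner = table.getScanner(new Scan(range[0], range[1]));
        try {
          for (Result r : scanner) {
            System.out.println(Bytes.toString(r.getRow())); // process each row
          }
        } finally {
          scanner.close(); // always release the scanner lease
        }
      }
    }
  }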
Re: Data Loss During Bulk Load
Is this MR? If so, do you have HDFS-127 applied to your cluster? On Mar 24, 2010, at 11:22 AM, Nathan Harkenrider nathan.harkenri...@gmail.com wrote: Thanks Ryan. We have this config setting in place and are currently running an insert of 40 million rows into an empty pair of tables. The job has inserted 25 million rows so far, and we are not seeing any failed compact/split errors in the log. I'll report back after the import is complete and we've verified the integrity of the data. Regards, Nathan

On Wed, Mar 24, 2010 at 11:12 AM, Ryan Rawson ryano...@gmail.com wrote: You'll want this one:

  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>0</value>
  </property>

A classic standby from just over a year ago. It should be in the recommended config -- it might not be anymore, but I am finding it necessary now.

On Wed, Mar 24, 2010 at 11:06 AM, Rod Cope rod.c...@openlogic.com wrote: This describes my situation, too. I never could get rid of the SocketTimeoutExceptions, even after dozens of hours of research and applying every tuning and configuration suggestion I could find. Rod

On Wednesday, March 24, 2010 11:45 AM, Tuan Nguyen tua...@gmail.com wrote: Hi Nathan, We recently ran a performance test against hbase 0.20.3 and hadoop 0.20.2. We have a quite similar problem to yours. In the first scan test, we noticed that we lose some data in certain columns of certain rows, and our log has errors such as Error Recovery for block, Could not get the block, IOException, SocketTimeoutException: 48 millis timeout... And the test failed completely in the middle. After various tunings of the GC, caching, xceivers... we can finish the test without any data loss. Our log has only the SocketTimeoutException: 48 millis timeout error left. Tuan Nguyen!

-- Rod Cope | CTO and Founder | rod.c...@openlogic.com | Follow me on Twitter @RodCope | 720 240 4501 phone | 720 240 4557 fax | 1 888 OpenLogic toll free | www.openlogic.com | Follow OpenLogic on Twitter @openlogic
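For reference, a sketch of how the two datanode settings traded back and forth in this thread would look in hdfs-site.xml (the 2048 xceiver value is the one Ryan recommends elsewhere in the thread; note the property's real spelling is "xcievers"):

  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>0</value> <!-- 0 disables the write timeout entirely -->
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2048</value> <!-- upper bound on concurrent datanode xceiver threads -->
  </property>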
Re: Cannot open filename Exceptions
So, for sure ugly stuff is going on. I filed https://issues.apache.org/jira/browse/HBASE-2365. It looks like we're doubly assigning a region. Can you confirm that 209 lags behind the master (207) by about 25 seconds? Are you running NTP on these machines so they sync their clocks? With DEBUG enabled, have you been able to reproduce? That said, there might be enough in these logs to go on if you can confirm the above. Thanks for your patience, Zheng. St.Ack

On Thu, Mar 18, 2010 at 11:43 PM, Zheng Lv lvzheng19800...@gmail.com wrote: Hello Stack, I must say thank you, for your patience too. I'm sorry that you tried many times but the logs you got were not that useful. Now I have turned the logging to debug level, so if we get these exceptions again I will send you debug logs. Anyway, I have still uploaded the logs you want to rapidshare, although they are not at debug level. The urls: http://rapidshare.com/files/365292889/hadoop-root-namenode-cactus207.log.2010-03-15.html http://rapidshare.com/files/365293127/hbase-root-master-cactus207.log.2010-03-15.html http://rapidshare.com/files/365293238/hbase-root-regionserver-cactus208.log.2010-03-15.html http://rapidshare.com/files/365293391/hbase-root-regionserver-cactus209.log.2010-03-15.html http://rapidshare.com/files/365293488/hbase-root-regionserver-cactus210.log.2010-03-15.html "For sure you've upped xceivers on your hdfs cluster and you've upped the file descriptors as per the 'Getting Started'? (Sorry, have to ask.)" Before I got your mail we hadn't set the properties you mentioned, because we never got the "too many open files" or similar errors that are mentioned in the Getting Started docs. But now I have upped these properties. We'll see what happens. If you need more information, just tell me. Thanks again, LvZheng
Re: The occasion to add region server
You saw no difference. Before starting the job, all regionservers had an equal number of regions? Were the regionservers loaded while the job ran? Try with 5 regionservers. If it's the same, for sure something is off. On Mar 23, 2010, at 6:02 PM, y_823...@tsmc.com wrote: Hi, Our cluster has 20 machines (4 cores, 12G RAM, 1U servers), HBase Version 0.20.2, r834515.

  ZK: 3  DataNode: 20  Total Regions: 2088  Region Servers: 10 -- Client Connections: 20  Job took: 1264 sec
  ZK: 3  DataNode: 20  Total Regions: 2088  Region Servers: 15 -- Client Connections: 20  Job took: 1257 sec
  ZK: 3  DataNode: 20  Total Regions: 2088  Region Servers: 20 -- Client Connections: 20  Job took: 1267 sec

According to the above results, we got no performance enhancement after adding region servers. Why? I wonder when the best occasion is to add an extra region server. With my machines' spec, maybe a region server will always get good performance with under 300 regions. Any ideas? Thanks. Fleming Chiu(邱宏明) 707-6128 y_823...@tsmc.com 週一無肉日吃素救地球(Meat Free Monday Taiwan)
Re: Cannot open filename Exceptions
On Tue, Mar 23, 2010 at 8:42 PM, Zheng Lv lvzheng19800...@gmail.com wrote: Hello Stack, "So, for sure ugly stuff is going on. I filed https://issues.apache.org/jira/browse/HBASE-2365. It looks like we're doubly assigning a region." Can you tell me how this happened in detail? Thanks a lot.

Yes. Splits are run by the regionserver. It figures a region needs to be split and goes ahead closing the parent and creating the daughter regions. It then adds edits to the meta table offlining the parent and inserting the two new daughter regions. Next it sends a message to the master telling it that a region has been split. The message contains the names of the daughter regions. On receipt of the message, the master adds the new daughter regions to the unassigned regions list so they'll be passed out the next time a regionserver checks in. Concurrently, the master is running a scan of the meta table every minute making sure all is in order. One thing it does is, if it finds unassigned regions, it'll add them to the unassigned regions list (this process is what gets all regions assigned after a startup). In your case, what's happening is that there is a long period between the add of the new split regions to the meta table and the report of the split to the master. During this time, the master meta scan ran, found one of the daughters, and went and assigned it. Then the split message came in and the daughter was assigned again! There was supposed to be protection against this happening, IIRC. Looking at the responsible code, we are trying to defend against this happening in ServerManager:

  /**
   * Assign new daughter-of-a-split UNLESS its already been assigned.
   * It could have been assigned already in rare case where there was a large
   * gap between insertion of the daughter region into .META. by the
   * splitting regionserver and receipt of the split message in master (See
   * HBASE-1784).
   * @param hri Region to assign.
   */
  private void assignSplitDaughter(final HRegionInfo hri) {
    MetaRegion mr = this.master.regionManager.getFirstMetaRegionForRegion(hri);
    Get g = new Get(hri.getRegionName());
    g.addFamily(HConstants.CATALOG_FAMILY);
    try {
      HRegionInterface server =
        master.connection.getHRegionConnection(mr.getServer());
      Result r = server.get(mr.getRegionName(), g);
      // If size >= 3 -- presume regioninfo, startcode and server -- then presume
      // that this daughter already assigned and return.
      if (r.size() >= 3) return;
    } catch (IOException e) {
      LOG.warn("Failed get on " + HConstants.CATALOG_FAMILY_STR +
        "; possible double-assignment?", e);
    }
    this.master.regionManager.setUnassigned(hri, false);
  }

So, the above is not working in your case for some reason. I'll take a look, but I'm not sure I can figure it w/o DEBUG (thanks for letting me know about the out-of-sync clocks... now I can have more faith in what the logs are telling me). "With DEBUG enabled have you been able to reproduce?" These days the exception has not appeared again; if it does, I'll show you the logs. For sure, if you come across it again, I'm interested. Thanks Zheng, St.Ack
Re: Data Loss During Bulk Load
For sure each record in the input data is being uploaded with a unique key? For example, if it's the same rowid and column and you are asking the regionserver to supply the timestamp: if you add two cells with the same row+column coordinates, they'll both end up with the same row/family/qualifier/timestamp key. When you do your count, we'll only see the last instance added. St.Ack

On Mon, Mar 22, 2010 at 8:15 AM, Nathan Harkenrider nathan.harkenri...@gmail.com wrote: Thanks Ryan. We currently have the xceiver count set to 16k (not sure if this is too high) and the fh max is 32k, and we are still seeing the data loss issue. I'll dig through the datanode logs for errors and report back. Regards, Nathan

On Sun, Mar 21, 2010 at 7:11 PM, Ryan Rawson ryano...@gmail.com wrote: Maybe you are having HDFS capacity issues? Check your datanode logs for any exceptions. While you are at it, double check that the xceiver count is set high (2048 is a good value) and that the ulimit -n (fh max) is also reasonably high -- 32k should do it. I recently ran an import of 36 hours and perfectly imported 24 billion rows into 2 tables, and the row counts between the tables lined up exactly. PS: one other thing, in the close() method of your map reduce, you call HTable#flushCommits(), right? right?

On Sun, Mar 21, 2010 at 3:50 PM, Nathan Harkenrider nathan.harkenri...@gmail.com wrote: Hi All, I'm currently running into data loss issues when bulk loading data into HBase. I'm loading data via a Map/Reduce job that is parsing XML and inserting rows into 2 HBase tables. The job is currently configured to run 30 mappers concurrently (3 per node) and is inserting at a rate of approximately 6000 rows/sec. The Map/Reduce job appears to run correctly; however, when I run the HBase rowcounter job on the tables afterwards, the row count is less than expected. The data loss is small percentage-wise (~200,000 rows out of 80,000,000) but concerning nevertheless. I managed to locate the following errors in the regionserver logs related to failed compactions and/or splits. http://pastebin.com/5WjDpS9F I'm running HBase 0.20.3 and Cloudera CDH2 on CentOS 5.4. The cluster is comprised of 11 machines, 1 master and 10 region servers. Each machine is 8 cores, 8GB RAM. Any advice is appreciated. Thanks, Nathan Harkenrider nathan.harkenri...@gmail.com
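A small sketch of the collision St.Ack describes, using an explicit timestamp to make it deterministic; table, row, and column names are placeholders, assuming the 0.20 client API:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class TimestampCollision {
    public static void main(String[] args) throws Exception {
      HTable table = new HTable(new HBaseConfiguration(), "mytable");
      byte[] row = Bytes.toBytes("row1");
      byte[] fam = Bytes.toBytes("f");
      byte[] qual = Bytes.toBytes("q");
      long ts = System.currentTimeMillis();
      // Two cells with identical row/family/qualifier/timestamp coordinates:
      // the second silently replaces the first, so a later rowcounter job
      // only ever sees one of them.
      Put first = new Put(row);
      first.add(fam, qual, ts, Bytes.toBytes("first"));
      Put second = new Put(row);
      second.add(fam, qual, ts, Bytes.toBytes("second"));
      table.put(first);
      table.put(second);
      table.flushCommits(); // the flush Ryan asks about in his PS
    }
  }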
Re: Data Loss During Bulk Load
On Mon, Mar 22, 2010 at 1:37 PM, Stack st...@duboce.net wrote: ...For example, if same rowid and column and you are asking the regionserver to supply the timestamp, if you add two cells with same row+column coordinates, they'll both end up with the same row/family/qualifier/timestamp key. Sorry, I left the 'in the same millisecond' clause out of the above sentence. St.Ack
Re: Data Loss During Bulk Load
On Sun, Mar 21, 2010 at 3:50 PM, Nathan Harkenrider nathan.harkenri...@gmail.com wrote: I managed to locate the following errors in the regionserver logs related to failed compactions and/or splits. http://pastebin.com/5WjDpS9F Is there anything else earlier in the logs about why the failure happened? You might try running one MR task per node rather than 3. You only have 8G of RAM, so three concurrent children are taking resources from the running datanodes and regionservers. St.Ack
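The knob for that, assuming the pre-YARN tasktracker setup, goes in mapred-site.xml on each node (tasktrackers need a restart to pick it up); this is only a sketch of the one-map-per-node suggestion:

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value> <!-- one concurrent map child per node -->
  </property>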
Re: Cannot open filename Exceptions
Yeah, I had to retry a couple of times ("Too busy; try back later" -- or sign up for the premium service!). It would have been nice to have wider log snippets. I'd like to have seen if the issue was double assignment. The master log snippet only shows the split. Regionserver 209's log is the one where the interesting stuff is going on around this time, 2010-03-15 16:06:51,150, but it's not in the provided set. Neither are you running at DEBUG level, so it'd be hard to see what is up even if you provided it. Looking in 208, I see a few exceptions beyond the one you pasted below. For sure you've upped xceivers on your hdfs cluster and you've upped the file descriptors as per the 'Getting Started'? (Sorry, have to ask.) Can I have more of the logs? Can I have all of the namenode log, all of the master log and 209's log? This rapidshare thing is fine with me. I don't mind retrying. Sorry it took me a while to get to this. St.Ack

On Wed, Mar 17, 2010 at 8:32 PM, Zheng Lv lvzheng19800...@gmail.com wrote: Hello Stack, "Sorry. It's taken me a while. Let me try and get to this this evening." Is it downloading the log files that takes you a while? I'm sorry; I used to upload files to skydrive, but now we can't access that website. Is there a netdisk or something you can download from quickly? I can upload to it. LvZheng
Re: Multi ranges Scan
I think you need to make a custom splitter for your mapreduce job, one that makes splits that align with the ranges you'd have your job run over. A permutation on HBASE-2302 might work for you. St.Ack On Wed, Mar 17, 2010 at 1:32 PM, Andrey Kolyadenko cryp...@mailx.ru wrote: Hi all, maybe somebody could give me advice in the following situation: currently the HBase Scan interface provides the ability to set only the first and last rows for MR scanning. Is there any way to get multiple ranges into the map input? For example, let's assume I have the following table:

  key  value
  1    v1
  2    v2
  3    v3
  4    v4
  5    v5

What I need is to get, for example, the [1,2) and [4,5) ranges as input for my Map task. I actually need this for performance optimization. Any advice? Thanks.
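A rough sketch of the custom-splitter permutation St.Ack suggests, assuming the 0.20 mapreduce API; MultiRangeTableInputFormat, the hard-coded ranges, and the empty locality hint are all illustrative, not real HBase classes or values:

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;
  import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
  import org.apache.hadoop.hbase.mapreduce.TableSplit;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.mapreduce.InputSplit;
  import org.apache.hadoop.mapreduce.JobContext;

  public class MultiRangeTableInputFormat extends TableInputFormat {
    // Hypothetical hard-coded [start, stop) ranges; in practice these
    // would come from the job configuration.
    private static final byte[][][] RANGES = {
      { Bytes.toBytes("1"), Bytes.toBytes("2") },
      { Bytes.toBytes("4"), Bytes.toBytes("5") },
    };

    @Override
    public List<InputSplit> getSplits(JobContext context) throws IOException {
      List<InputSplit> splits = new ArrayList<InputSplit>();
      for (byte[][] range : RANGES) {
        // One map task per range; the locality hint is omitted for brevity.
        splits.add(new TableSplit(Bytes.toBytes("mytable"), range[0], range[1], ""));
      }
      return splits;
    }
  }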
Re: Cannot open filename Exceptions
Sorry. It's taken me a while. Let me try and get to this this evening. Thank you for your patience. On Mar 17, 2010, at 2:29 AM, Zheng Lv lvzheng19800...@gmail.com wrote: Hello Stack, Did you receive my mail? It looks like you didn't. LvZheng

2010/3/16 Zheng Lv lvzheng19800...@gmail.com: Hello Stack, I have uploaded some parts of the logs on master, regionserver208 and regionserver210 to: http://rapidshare.com/files/363988384/master_207_log.txt.html http://rapidshare.com/files/363988673/regionserver_208_log.txt.html http://rapidshare.com/files/363988819/regionserver_210_log.txt.html I noticed that there are some LeaseExpiredExceptions and "2010-03-15 16:06:32,864 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split failed for region ..." before 17 o'clock. Did these lead to the error? Why did these happen? How to avoid them? Thanks. LvZheng

2010/3/16 Stack st...@duboce.net: Maybe just the master log would be sufficient from around this time to figure the story. St.Ack
Re: Cannot open filename Exceptions
Hey Zheng: On Mon, Mar 15, 2010 at 8:16 PM, Zheng Lv lvzheng19800...@gmail.com wrote: Hello Stack, After we got these exceptions, we restarted the cluster and restarted the job that had failed, and the job succeeded. Now when we access /hbase/summary/1491233486/metrics/5046821377427277894, we get "Cannot access /hbase/summary/1491233486/metrics/5046821377427277894: No such file or directory.". So, that would seem to indicate that the reference was in memory only.. that file was not in the filesystem. You could have tried closing that region. It would also have been interesting to find the history on that region, to try and figure out how it came to hold in memory a reference to a file since removed. The messages about this file in the namenode logs are in here: http://rapidshare.com/files/363938595/log.txt.html This is interesting. Do you have regionserver logs from 209, 208, and 210 for the corresponding times? Thanks, St.Ack The job that failed started at about 17 o'clock. By the way, the hadoop version we are using is 0.20.1; the hbase version we are using is 0.20.3. Regards, LvZheng

2010/3/16 Stack st...@duboce.net: Can you get that file from hdfs?

  ./bin/hadoop fs -get /hbase/summary/1491233486/metrics/5046821377427277894

Does it look wholesome? Is it empty? What if you trace the life of that file in regionserver logs or, probably better, over in the namenode log? If you move this file aside, does the region deploy? St.Ack

On Mon, Mar 15, 2010 at 3:40 AM, Zheng Lv lvzheng19800...@gmail.com wrote: Hello Everyone, Recently we often got these in our client logs:

  org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 172.16.1.208:60020 for region summary,SITE_32\x01pt\x012010031400\x01\x25E7\x258C\x25AE\x25E5\x258E\x25BF\x25E5\x2586\x2580\x25E9\x25B9\x25B0\x25E6\x2591\x25A9\x25E6\x2593\x25A6\x25E6\x259D\x2590\x25E6\x2596\x2599\x25E5\x258E\x2582\x2B\x25E6\x25B1\x25BD\x25E8\x25BD\x25A6\x25E9\x2585\x258D\x25E4\x25BB\x25B6\x25EF\x25BC\x258C\x25E5\x2598\x2580\x25E9\x2593\x2583\x25E9\x2593\x2583--\x25E7\x259C\x259F\x25E5\x25AE\x259E\x25E5\x25AE\x2589\x25E5\x2585\x25A8\x25E7\x259A\x2584\x25E7\x2594\x25B5\x25E8\x25AF\x259D\x25E3\x2580\x2581\x25E7\x25BD\x2591\x25E7\x25BB\x259C\x25E4\x25BA\x2592\x25E5\x258A\x25A8\x25E4\x25BA\x25A4\x25E5\x258F\x258B\x25E7\x25A4\x25BE\x25E5\x258C\x25BA\x25EF\x25BC\x2581,1268640385017, row 'SITE_32\x01pt\x012010031500\x01\x2521\x25EF\x25BC\x2581\x25E9\x2594\x2580\x25E5\x2594\x25AE\x252F\x25E6\x2594\x25B6\x25E8\x25B4\x25AD\x25EF\x25BC\x2581VM700T\x2BVM700T\x2B\x25E5\x259B\x25BE\x25E5\x2583\x258F\x25E4\x25BF\x25A1\x25E5\x258F\x25B7\x25E4\x25BA\x25A7\x25E7\x2594\x259F\x25E5\x2599\x25A8\x2B\x25E7\x2594\x25B5\x25E5\x25AD\x2590\x25E6\x25B5\x258B\x25E9\x2587\x258F\x25E4\x25BB\x25AA\x25E5\x2599\x25A8\x25EF\x25BC\x258C\x25E5\x2598\x2580\x25E9\x2593\x2583\x25E9\x2593\x2583--\x25E7\x259C\x259F\x25E5\x25AE\x259E\x25E5\x25AE\x2589\x25E5\x2585\x25A8\x25E7\x259A\x2584\x25E7\x2594\x25B5\x25E8\x25AF\x259D\x25E3\x2580\x2581\x25E7\x25BD\x2591\x25E7\x25BB\x259C\x25E4\x25BA\x2592\x25E5\x258A\x25A8\x25E4\x25BA\x25A4\x25E5\x258F\x258B\x25E7\x25A4\x25BE\x25E5\x258C\x25BA\x25EF\x25BC\x2581', but failed after 10 attempts.

Exceptions:

  java.io.IOException: java.io.IOException: Cannot open filename /hbase/summary/1491233486/metrics/5046821377427277894
      at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1474)
      at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1800)
      at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1616)
      at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1743)
      at java.io.DataInputStream.read(DataInputStream.java:132)
      at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:99)
      at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
      at org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1020)
      at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:971)
      at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.loadBlock(HFile.java:1304)
      at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.seekTo(HFile.java:1186)
      at org.apache.hadoop.hbase.io.HalfHFileReader$1.seekTo(HalfHFileReader.java:207)
      at org.apache.hadoop.hbase.regionserver.StoreFileGetScan.getStoreFile(StoreFileGetScan.java:80)
      at org.apache.hadoop.hbase.regionserver.StoreFileGetScan.get(StoreFileGetScan.java:65)
      at org.apache.hadoop.hbase.regionserver.Store.get(Store.java:1461)
      at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2396)
      at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2385)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1731)
      at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source