Re: Hadoop AWS module (Spark) is inventing a secret-key each time

2017-03-08 Thread Jonhy Stack
I have sent the message there as well; I thought I would send it here too
because I'm actually setting up the hadoopConf

On Wed, Mar 8, 2017 at 6:49 PM, Ravi Prakash <ravihad...@gmail.com> wrote:

> Sorry to hear about your travails.
>
> I think you might be better off asking the spark community:
> http://spark.apache.org/community.html
>
> On Wed, Mar 8, 2017 at 3:22 AM, Jonhy Stack <so.jo...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm trying to read an S3 bucket from Spark, and up until today Spark has always
>> complained that the request returns 403
>>
>> hadoopConf = spark_context._jsc.hadoopConfiguration()
>> hadoopConf.set("fs.s3a.access.key", "ACCESSKEY")
>> hadoopConf.set("fs.s3a.secret.key", "SECRETKEY")
>> hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AF
>> ileSystem")
>> logs = spark_context.textFile("s3a://mybucket/logs/*)
>>
>> Spark was saying "Invalid Access key [ACCESSKEY]"
>>
>> However with the same ACCESSKEY and SECRETKEY this was working with
>> aws-cli
>>
>> aws s3 ls mybucket/logs/
>>
>> and in python boto3 this was working
>>
>> resource = boto3.resource("s3", region_name="us-east-1")
>> resource.Object("mybucket", "logs/text.py") \
>> .put(Body=open("text.py", "rb"),ContentType="text/x-py")
>>
>> so my credentials ARE valid and the problem is definitely something
>> with Spark.
>>
>> Today I decided to turn on the "DEBUG" log for all of Spark, and to my
>> surprise... Spark is NOT using the [SECRETKEY] I have provided but
>> instead... adds a random one???
>>
>> 17/03/08 10:40:04 DEBUG request: Sending Request: HEAD
>> https://mybucket.s3.amazonaws.com / Headers: (Authorization: AWS
>> ACCESSKEY:**[RANDOM-SECRET-KEY]**, User-Agent: aws-sdk-java/1.7.4
>> Mac_OS_X/10.11.6 Java_HotSpot(TM)_64-Bit_Server_VM/25.65-b01/1.8.0_65,
>> Date: Wed, 08 Mar 2017 10:40:04 GMT, Content-Type:
>> application/x-www-form-urlencoded; charset=utf-8, )
>>
>> This is why it still returns 403! Spark is not using the key I provide
>> with fs.s3a.secret.key but instead invents a random one EACH time (every time
>> I submit the job the random secret key is different).
>>
>> For the record I'm running this locally on my machine (OSX) with this
>> command
>>
>> spark-submit --packages com.amazonaws:aws-java-sdk-pom:1.11.98,org.apache.hadoop:hadoop-aws:2.7.3 test.py
>>
>> Could someone enlighten me on this?
>>
>
>


Hadoop AWS module (Spark) is inventing a secret-key each time

2017-03-08 Thread Jonhy Stack
Hi,

I'm trying to read an S3 bucket from Spark, and up until today Spark has always
complained that the request returns 403

hadoopConf = spark_context._jsc.hadoopConfiguration()
hadoopConf.set("fs.s3a.access.key", "ACCESSKEY")
hadoopConf.set("fs.s3a.secret.key", "SECRETKEY")
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
logs = spark_context.textFile("s3a://mybucket/logs/*")

Spark was saying "Invalid Access key [ACCESSKEY]"

However with the same ACCESSKEY and SECRETKEY this was working with aws-cli

aws s3 ls mybucket/logs/

and in python boto3 this was working

resource = boto3.resource("s3", region_name="us-east-1")
resource.Object("mybucket", "logs/text.py") \
.put(Body=open("text.py", "rb"),ContentType="text/x-py")

so my credentials ARE valid and the problem is definitely something with
Spark.

Today I decided to turn on the "DEBUG" log for all of Spark, and to my
surprise... Spark is NOT using the [SECRETKEY] I have provided but
instead... adds a random one???

17/03/08 10:40:04 DEBUG request: Sending Request: HEAD
https://mybucket.s3.amazonaws.com / Headers: (Authorization: AWS
ACCESSKEY:**[RANDOM-SECRET-KEY]**, User-Agent: aws-sdk-java/1.7.4
Mac_OS_X/10.11.6 Java_HotSpot(TM)_64-Bit_Server_VM/25.65-b01/1.8.0_65,
Date: Wed, 08 Mar 2017 10:40:04 GMT, Content-Type:
application/x-www-form-urlencoded;
charset=utf-8, )

This is why it still returns 403! Spark is not using the key I provide with
fs.s3a.secret.key but instead invents a random one EACH time (every time I
submit the job the random secret key is different).

For the record I'm running this locally on my machine (OSX) with this
command

spark-submit --packages com.amazonaws:aws-java-sdk-pom:1.11.98,org.apache.hadoop:hadoop-aws:2.7.3 test.py

Could someone enlighten me on this?
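
For reference, a minimal sketch of the same setup through Spark's Java API, putting the
S3A credentials on the SparkConf before the context is created (this assumes the
spark.hadoop.* property prefix, which copies such settings into the Hadoop configuration;
the bucket name and keys are the placeholders from the post):

// Sketch only: equivalent configuration via the Java API. The spark.hadoop.* prefix
// and the placeholder bucket/keys are assumptions, not the poster's actual values.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class S3AReadSketch {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .setAppName("s3a-read-sketch")
        .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
        .set("spark.hadoop.fs.s3a.access.key", "ACCESSKEY")   // placeholder
        .set("spark.hadoop.fs.s3a.secret.key", "SECRETKEY");  // placeholder
    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaRDD<String> logs = sc.textFile("s3a://mybucket/logs/*");
    System.out.println("lines: " + logs.count());
    sc.stop();
  }
}

Setting the credentials this way, before the context exists, avoids any question of whether
a later change to the context's Hadoop configuration is seen by the executors.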


Re: Announcement of Project Panthera: Better Analytics with SQL, MapReduce and HBase

2012-09-17 Thread Stack
On Mon, Sep 17, 2012 at 6:55 AM, Dai, Jason jason@intel.com wrote:
 Hi,

 I'd like to announce Project Panthera, our open source efforts that showcase 
 better data analytics capabilities on Hadoop/HBase (through both SW and HW 
 improvements), available at https://github.com/intel-hadoop/project-panthera.


...

 2)  A document store (built on top of HBase) for better query processing
Under Project Panthera, we will gradually make our implementation of the 
 document store available as an extension to HBase 
 (https://github.com/intel-hadoop/hbase-0.94-panthera). Specifically, today's 
 release provides document store support in HBase by utilizing co-processors, 
 which brings up to 3x reduction in storage usage and up to 1.8x speedup in
 query processing. Going forward, we will also use
 HBASE-6800 (https://issues.apache.org/jira/browse/HBASE-6800) as the umbrella
 JIRA to track our efforts to get the document store idea reviewed and 
 hopefully incorporated into Apache HBase.


Thanks for open sourcing this stuff Jason.  It looks great.

I took a quick look.  Like Andy, I see that Panthera -- great name by
the way, J-D is playing Pantera (too!) loud here in our space since
this note showed up on the list -- includes a full HBase.  Do you have
to deliver Panthera that way?  Can we help make it so you do not need
to include HBase core?  Do you have a list of things we need to change
so you can go downstream of core?

Good on you Jason,
St.Ack


Re: why hbase doesn't provide Encryption

2012-09-05 Thread Stack
On Tue, Sep 4, 2012 at 9:52 PM, Farrokh Shahriari
mohandes.zebeleh...@gmail.com wrote:
 Hello
 I just want to know why HBase doesn't provide encryption?


Please be more specific.  You want us to encrypt each cell for you
automatically?  How would you suggest it work in a generic way?
Thanks,
St.Ack
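
For what it's worth, a hedged sketch of the application-side alternative: the client
encrypts the cell value itself before the Put, so HBase only ever stores ciphertext.
The table name, family, qualifier and cipher choice are placeholders, key management is
deliberately left out, and this is not an HBase feature.

// Sketch only: application-side encryption of a single cell value before writing.
// "AES" defaults to ECB/PKCS5Padding here -- shown for brevity, not as a recommendation.
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class EncryptedPutSketch {
  public static void main(String[] args) throws Exception {
    KeyGenerator kg = KeyGenerator.getInstance("AES");
    kg.init(128);
    SecretKey key = kg.generateKey();               // in practice the key comes from a key store

    Cipher cipher = Cipher.getInstance("AES");
    cipher.init(Cipher.ENCRYPT_MODE, key);
    byte[] ciphertext = cipher.doFinal(Bytes.toBytes("sensitive value"));

    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");     // placeholder table name
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), ciphertext);
    table.put(put);
    table.close();
  }
}

Reads would do the reverse: fetch the cell and decrypt it client-side with the same key.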


Re: HBase MasterNotRunningException

2012-08-30 Thread Stack
On Thu, Aug 30, 2012 at 12:17 PM, Jilani Shaik jilani2...@gmail.com wrote:
 telnet is working for 60010, 60030 and 9000 from both the local and remote
 boxes.


Then the hbase daemons are not running or, as Anil is suggesting, the
connectivity between machines needs fixing (it looks like everything binds to
localhost... can you fix that?).  Once your connectivity is fixed, then
try running HBase.

St.Ack


Re: Pseudo Distributed: ERROR org.apache.hadoop.hbase.HServerAddress: Could not resolve the DNS name of localhost.localdomain

2012-06-07 Thread Stack
On Thu, Jun 7, 2012 at 2:18 AM, Manu S manupk...@gmail.com wrote:
 *2012-06-07 12:13:12,103 ERROR org.apache.hadoop.hbase.HServerAddress:
 Could not resolve the DNS name of localhost.localdomain

This is pretty basic.  Fix this first and then your hbase will work.

Please stop spraying your queries across multiple lists.  Doing so
makes us think you arrogant, which I am sure is not the case.  Pick the
list that seems most appropriate.  For example, in this case, it seems
like the hbase-user list would have been the right place to write; not
common-user and cdh-user.  If it turns out you've chosen wrong,
usually the chosen list will help you figure the proper target.

Thanks,
St.Ack


Re: Need Help with HBase

2011-08-16 Thread Stack
On Tue, Aug 16, 2011 at 3:53 PM, Taylor, Ronald C
ronald.tay...@pnnl.gov wrote:
 Re file systems: while HBase can theoretically run on other scalable file 
 systems, I remember somebody on the HBase list saying, in effect, that unless 
 you are a file system guru and willing to put in a heck of a lot of work, the 
 only practical choice as an underlying file system is Hadoop's HDFS. I think 
 that was something like half a year ago or more, so maybe things have 
 changed.  Any of the HBase developers on the HBase list have an update (or a 
 correction to my recollection)?


See our book on 'Which Hadoop':
http://hbase.apache.org/book.html#hadoop.  It tells our 'which version
of hadoop' story.   It talks of how you need to use the unreleased
branch-0.20-append branch or run Cloudera's CDH3u1.  It also mentions
the newcomer MapR as an alternative.  They did the work to make HBase
run on their filesystem.

St.Ack


Re: HBase Mapreduce cannot find Map class

2011-07-28 Thread Stack
See 
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
for some help.
St.Ack
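
For reference, the package description linked above sets table mapper jobs up roughly
along these lines (a sketch using the org.apache.hadoop.hbase.mapreduce API; the table
name, the trivial mapper and the output types are placeholders, not the poster's code):

// Sketch only: job setup in the style of the linked package description.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class TableJobSketch {

  // Trivial example mapper: emits the number of cells in each row.
  static class CellCountMapper extends TableMapper<ImmutableBytesWritable, IntWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      context.write(row, new IntWritable(value.size()));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "table-job-sketch");
    job.setJarByClass(TableJobSketch.class);   // ships the jar that contains the mapper class
    Scan scan = new Scan();
    scan.setCaching(500);
    scan.setCacheBlocks(false);                // recommended for MR scans
    TableMapReduceUtil.initTableMapperJob("mytable", scan, CellCountMapper.class,
        ImmutableBytesWritable.class, IntWritable.class, job);
    TableMapReduceUtil.addDependencyJars(job); // ships the HBase jars with the job
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}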

On Thu, Jul 28, 2011 at 4:04 AM, air cnwe...@gmail.com wrote:
 -- Forwarded message --
 From: air cnwe...@gmail.com
 Date: 2011/7/28
 Subject: HBase Mapreduce cannot find Map class
 To: CDH Users cdh-u...@cloudera.org


 import java.io.IOException;
 import java.text.ParseException;
 import java.text.SimpleDateFormat;
 import java.util.Date;

 import org.apache.hadoop.conf.Configured;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.client.HTable;
 import org.apache.hadoop.hbase.client.Put;
 import org.apache.hadoop.hbase.mapred.TableMapReduceUtil;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapred.JobClient;
 import org.apache.hadoop.mapred.JobConf;
 import org.apache.hadoop.mapred.MapReduceBase;
 import org.apache.hadoop.mapred.Mapper;
 import org.apache.hadoop.mapred.OutputCollector;
 import org.apache.hadoop.mapred.Reporter;
 import org.apache.hadoop.mapred.FileInputFormat;
 import org.apache.hadoop.mapred.lib.NullOutputFormat;
 import org.apache.hadoop.util.Tool;
 import org.apache.hadoop.util.ToolRunner;


 public class LoadToHBase extends Configured implements Tool{
    public static class XMap<K, V> extends MapReduceBase implements
 Mapper<LongWritable, Text, K, V> {
        private JobConf conf;

        @Override
        public void configure(JobConf conf){
            this.conf = conf;
            try{
                this.table = new HTable(new HBaseConfiguration(conf),
 "observations");
            }catch(IOException e){
                throw new RuntimeException("Failed HTable construction", e);
            }
        }

        @Override
        public void close() throws IOException{
            super.close();
            table.close();
        }

        private HTable table;
        public void map(LongWritable key, Text value, OutputCollector<K, V> output,
 Reporter reporter) throws IOException{
            String[] valuelist = value.toString().split("\t");
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            Date addtime = null; // user registration time
            Date ds = null;
            Long delta_days = null;
            String uid = valuelist[0];
            try {
                addtime = sdf.parse(valuelist[1]);
            } catch (ParseException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }

            String ds_str = conf.get("load.hbase.ds", null);
            if (ds_str != null){
                try {
                    ds = sdf.parse(ds_str);
                } catch (ParseException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
            }else{
                ds_str = "2011-07-28";
            }

            if (addtime != null && ds != null){
                delta_days = (ds.getTime() - addtime.getTime()) / (24 * 60 *
 60 * 1000);
            }

            if (delta_days != null){
                byte[] rowKey = uid.getBytes();
                Put p = new Put(rowKey);
                p.add("content".getBytes(), "attr1".getBytes(),
 delta_days.toString().getBytes());
                table.put(p);
            }
        }
    }
    /**
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        // TODO Auto-generated method stub
        int exitCode = ToolRunner.run(new HBaseConfiguration(), new
 LoadToHBase(), args);
        System.exit(exitCode);
    }

    @Override
    public int run(String[] args) throws Exception {
        // TODO Auto-generated method stub
        JobConf conf = new JobConf(getClass());
        TableMapReduceUtil.addDependencyJars(conf);
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        conf.setJobName("LoadToHBase");
        conf.setJarByClass(getClass());
        conf.setMapperClass(XMap.class);
        conf.setNumReduceTasks(0);
        conf.setOutputFormat(NullOutputFormat.class);
        JobClient.runJob(conf);
        return 0;
    }

 }

 execute it using hbase LoadToHBase /user/hive/warehouse/datamining.db/xxx/
 and it says:

 ..
 11/07/28 17:20:29 INFO mapred.JobClient: Task Id :
 attempt_201107261532_2625_m_04_1, Status : FAILED
 java.lang.RuntimeException: Error in configuring object
        at
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
        at
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    

Re: Custom TableInputFormat not working correctly

2011-06-20 Thread Stack
Do you have > 100k rows?
St.Ack

On Sun, Jun 19, 2011 at 8:49 AM, edward choi mp2...@gmail.com wrote:
 Hi,

 I have implemented a custom TableInputFormat.
 I call it TableInputFormatMapPerRow and that is exactly what it does.
 The getSplits() of my custom TableInputFormat creates a TableSplit for each
 row in the HBase.
 But when I actually run an application with my custom TableInputFormat,
 there are fewer map tasks than there should be.
 I really don't know what I am doing wrong.
 Any suggestions please?
 Below is my TableInputFormatMapPerRow.java

 Ed

 --

 /**
  * Copyright 2007 The Apache Software Foundation
  *
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  *     http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing, software
  * distributed under the License is distributed on an "AS IS" BASIS,
  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
 package org.apache.hadoop.hbase.mapreduce;

 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.List;

 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;

 import org.apache.hadoop.conf.Configurable;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.mapreduce.JobContext;
 import org.apache.hadoop.mapreduce.InputFormat;
 import org.apache.hadoop.mapreduce.InputSplit;
 import org.apache.hadoop.mapreduce.RecordReader;
 import org.apache.hadoop.mapreduce.TaskAttemptContext;
 import org.apache.hadoop.util.StringUtils;

 import org.apache.hadoop.hbase.client.HTable;
 import org.apache.hadoop.hbase.client.Result;
 import org.apache.hadoop.hbase.client.ResultScanner;
 import org.apache.hadoop.hbase.client.Scan;
 import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
 import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
 import org.apache.hadoop.hbase.mapreduce.TableInputFormatBase;
 import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
 import org.apache.hadoop.hbase.util.Bytes;

 /**
  * Convert HBase tabular data into a format that is consumable by
 Map/Reduce.
  */
 public class TableInputFormatMapPerRow extends
 InputFormat<ImmutableBytesWritable, Result>
 implements Configurable {

  private final Log LOG =
 LogFactory.getLog(TableInputFormatMapPerRow.class);

  /** Job parameter that specifies the input table. */
  public static final String INPUT_TABLE = "hbase.mapreduce.inputtable";
  /** Base-64 encoded scanner. All other SCAN_ confs are ignored if this is specified.
   * See {@link TableMapReduceUtil#convertScanToString(Scan)} for more details.
   */
  public static final String SCAN = "hbase.mapreduce.scan";
  /** Column Family to Scan */
  public static final String SCAN_COLUMN_FAMILY = "hbase.mapreduce.scan.column.family";
  /** Space delimited list of columns to scan. */
  public static final String SCAN_COLUMNS = "hbase.mapreduce.scan.columns";
  /** The timestamp used to filter columns with a specific timestamp. */
  public static final String SCAN_TIMESTAMP = "hbase.mapreduce.scan.timestamp";
  /** The starting timestamp used to filter columns with a specific range of versions. */
  public static final String SCAN_TIMERANGE_START = "hbase.mapreduce.scan.timerange.start";
  /** The ending timestamp used to filter columns with a specific range of versions. */
  public static final String SCAN_TIMERANGE_END = "hbase.mapreduce.scan.timerange.end";
  /** The maximum number of version to return. */
  public static final String SCAN_MAXVERSIONS = "hbase.mapreduce.scan.maxversions";
  /** Set to false to disable server-side caching of blocks for this scan. */
  public static final String SCAN_CACHEBLOCKS = "hbase.mapreduce.scan.cacheblocks";
  /** The number of rows for caching that will be passed to scanners. */
  public static final String SCAN_CACHEDROWS = "hbase.mapreduce.scan.cachedrows";

  /** The configuration. */
  private Configuration conf = null;
  private HTable table = null;
  private Scan scan = null;
  private TableRecordReader tableRecordReader = null;

  /**
   * Returns the current configuration.
   *
   * @return The current configuration.
   * @see org.apache.hadoop.conf.Configurable#getConf()
   */
  @Override
  public Configuration getConf() {
    return conf;
  }

  /**
   * Sets the configuration. This is used to set the details for the table
 to
   * be scanned.
   *
   * @param 
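
 A hedged sketch of the getSplits() idea described above -- one split per row, using the
 KeyOnlyFilter already imported so only row keys are fetched. It relies on the class's
 table field being initialized in setConf, uses a placeholder location instead of a real
 region host lookup, and is an outline of the approach rather than the poster's actual code:

  // Sketch only (continuing the class above): one TableSplit per row.
  @Override
  public List<InputSplit> getSplits(JobContext context) throws IOException {
    List<InputSplit> splits = new ArrayList<InputSplit>();
    Scan keysOnly = new Scan();
    keysOnly.setFilter(new KeyOnlyFilter());           // row keys only, no cell values
    ResultScanner scanner = table.getScanner(keysOnly);
    try {
      for (Result r : scanner) {
        byte[] row = r.getRow();
        byte[] stop = new byte[row.length + 1];        // row key plus a trailing 0x00 byte,
        System.arraycopy(row, 0, stop, 0, row.length); // so the split covers exactly one row
        splits.add(new TableSplit(table.getTableName(), row, stop, "localhost"));
      }
    } finally {
      scanner.close();
    }
    return splits;
  }

 With hundreds of thousands of rows this creates one map task per row, which is usually far
 more tasks than the framework expects, so the split-per-row design itself may be worth
 reconsidering.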

Re: Hbase startup error: NoNode for /hbase/master after running out of space

2011-06-07 Thread Stack
On Mon, Jun 6, 2011 at 8:29 PM, Zhong, Sheng sheng.zh...@searshc.com wrote:
 I am appreciated by any help and suggestion. P.S: we're using apache
 hadoop 0.20.2 and hbase 0.20.3, and zookeeper is running via
 zookeeper-3.2.2 (not managed by Hbase).

Can you upgrade your hbase and hadoop?
St.Ack


Re: Hadoop not working after replacing hadoop-core.jar with hadoop-core-append.jar

2011-06-06 Thread Stack
On Mon, Jun 6, 2011 at 6:23 AM, praveenesh kumar praveen...@gmail.com wrote:
 Changing the name of the hadoop-append-core.jar file to
 hadoop-0.20.2-core.jar did the trick..
 Its working now..
 But is this the right solution to this problem ??


It would seem to be.  Did you have two hadoop*jar versions in your lib
directory by any chance?  You did not remove the first?
St.Ack


Client seeing wrong data on nodeDataChanged

2010-10-28 Thread Stack
I'm trying to debug an issue that maybe you fellas have some ideas for figuring.

In short:

Client 1 updates a znode setting its content to X, then X again, then
Y, and then finally it deletes the znode.  Client 1 is watching the
znode and I can see that it's getting three nodeDataChanged events and
a nodeDeleted.

Client 2 is also watching the znode.  It gets notified three times:
two nodeDataChanged events(only) and a nodeDeleted event.  I'd expect
3 nodeDataChanged events but understand a client might skip states.
The problem is that when client 2 looks at the data in the znode on
nodeDataChanged, for both cases the data is Y.  Not X and then Y, but
Y both times.  This is unexpected.

This is 3.3.1 on a 5 node ensemble.

I have full zk logging enabled.  Would it help posting these?

St.Ack


Re: Client seeing wrong data on nodeDataChanged

2010-10-28 Thread Stack
On Thu, Oct 28, 2010 at 7:32 PM, Ted Dunning ted.dunn...@gmail.com wrote:
 Client 2 is not guaranteed to see X if it doesn't get to asking before the
 value has been updated to Y.

Right, but I wouldn't expect the watch to be triggered twice with value Y.

Anyways, I think we have a handle on what's going on: at the time of
the above incident, the master process is experiencing a flood of zk
changes and our thought is that we're not paying sufficient attention
to the order of receipt.  Will be back if this is not the issue.

Thanks,
St.Ack
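
For context, a minimal watcher sketch (ZooKeeper Java API): the event carries no data, the
read happens only after the notification arrives, and the watch is re-registered by that
read, so intermediate values (X) can be missed and the latest value (Y) can be observed on
successive callbacks. The connection string and znode path are placeholders.

// Sketch only: re-reading a znode on nodeDataChanged.
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeWatchSketch implements Watcher {
  private final ZooKeeper zk;
  private final String path;

  ZnodeWatchSketch(ZooKeeper zk, String path) throws Exception {
    this.zk = zk;
    this.path = path;
    zk.getData(path, this, null);                    // set the initial watch
  }

  @Override
  public void process(WatchedEvent event) {
    try {
      if (event.getType() == Event.EventType.NodeDataChanged) {
        // This read happens some time after the change and re-registers the watch.
        byte[] data = zk.getData(path, this, null);
        System.out.println("saw: " + new String(data));
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

  public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, null);  // placeholder ensemble
    new ZnodeWatchSketch(zk, "/test-znode");                      // placeholder znode
    Thread.sleep(Long.MAX_VALUE);
  }
}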


Re: HBase MR: run more map tasks than regions

2010-09-14 Thread Stack
On Tue, Sep 14, 2010 at 10:10 AM, Alex Baranau alex.barano...@gmail.com wrote:
Is the only way for me to enhance TableInputFormat?


Currently, yes, you must enhance TIF or use an alternate TIF.
St.Ack


Re: Meaning of storefileIndexSize

2010-05-18 Thread Stack
On Tue, May 18, 2010 at 2:15 AM, Renaud Delbru renaud.del...@deri.org wrote:
 Hi,

 after some tuning, like increasing the hfile block size to 128KB, I have
 noticed that the storefileIndexSize is now half of what it was before
 (~250). Is storefileIndexSize the size of the in-memory hfile block
 index?

Yes.

So, yes, doubling the block size should halve the index size.

How come your index is so big?  Do you have big keys?  Lots of data?
Lots of storefiles?

Looking in HRegionServer I see that it's calculated so:

 storefileIndexSizeMB = (int)(store.getStorefilesIndexSize()/1024/1024);

In the Store, we do this:

  /**
   * @return The size of the store file indexes, in bytes.
   */
  long getStorefilesIndexSize() {
long size = 0;
for (StoreFile s: storefiles.values()) {
  Reader r = s.getReader();
  if (r == null) {
        LOG.warn("StoreFile " + s + " has a null Reader");
continue;
  }
  size += r.indexSize();
}
return size;
  }

The indexSize is out of the HFile metadata.

St.Ack


 Thanks
 --
 Renaud Delbru

 On 17/05/10 15:27, Renaud Delbru wrote:

 Hi,

 I would like to understand the meaning of the storefileIndexSize metric;
 could someone point me to a definition or explain to me what it means?

 Also, we are performing a large table import (90M rows, row sizes
 varying between hundreds of KB and 8 MB), and we are encountering memory
 problems (OOME). My observation is that it always happens after a while, when
 the storefileIndexSize starts to be large (> 500). Is there a way to reduce
 it?

 Thanks,




Re: Meaning of storefileIndexSize

2010-05-18 Thread Stack
On Tue, May 18, 2010 at 9:04 AM, Renaud Delbru renaud.del...@deri.org wrote:
 How come your index is so big?  Do you have big keys?  Lots of data?
 Lots of storefiles?


 We have 90M rows; each row varies from a few hundred kilobytes to
 8MB.


The index keeps the 'key' that starts each block in an hfile and its
offset, where the 'key' is a combination of row+column+timestamp (not
the value).  Are your 'keys' large?

 I have also changed at the same time another parameter, the
 hbase.hregion.max.filesize. It was set to 1GB (from previous test), and I
 switch it back to the default value (256MB).
 So, in the previous tests, there was a small number of region files (around
 250), but a very large index size (500).

 In my last test (hregion.max.filesize=256, block size=128K), the number of
 region files increased (I have now more than 1000 region file), but the
 index file size is now less than 200.

 Do you think the hregion.max.filesize could have had an impact on the index
 size?


Hmm.  You have the same amount of data, just more files, because you
lowered the max filesize (by a factor of 4, so 4x the number of files), so
I'd expect the index to be about the same total size.

If inclined to do more digging, you can use the hfile tool:

./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile

Do the above and you'll get usage.  Print out the metadata on hfiles.
Might help you figure out what's going on.

 Looking in HRegionServer I see that its calculated so:

  storefileIndexSizeMB = (int)(store.getStorefilesIndexSize()/1024/1024);


 So, storefileIndexSize indicates the number of MB of heap used by the index.
 And, in our case, 500 was too excessive given the fact that our region
 server is limited to 1GB of heap.


If 1GB only, then yeah, big indices will cause a prob.  How many
regions per regionserver?  Sounds like you have a few?  If so, can you
add more servers?  Or up the RAM in your machines?

Yours,
St.Ack
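
As a rough back-of-the-envelope check of those numbers (the index holds roughly one entry
per block; the ~100 bytes per entry is an assumed, purely illustrative figure):

// Back-of-the-envelope estimate of block index size (illustrative numbers only:
// ~1000 storefiles of 256MB, 128KB blocks, ~100 bytes per index entry).
public class IndexSizeEstimate {
  public static void main(String[] args) {
    long storefiles = 1000;
    long storefileBytes = 256L * 1024 * 1024;   // hbase.hregion.max.filesize
    long blockBytes = 128L * 1024;              // hfile block size
    long bytesPerEntry = 100;                   // key (row+column+timestamp) + offset, assumed
    long entries = storefiles * (storefileBytes / blockBytes);
    long indexBytes = entries * bytesPerEntry;
    System.out.println("index entries: " + entries);              // ~2 million
    System.out.println("index MB: " + indexBytes / 1024 / 1024);  // ~195 MB
  }
}

That lands in the same ballpark as the "less than 200" MB reported above, which is why a
1GB heap gets tight.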


Re: HBase MR with Filter

2010-05-18 Thread Stack
On Tue, May 18, 2010 at 9:25 AM, Patrick Datko patrick.da...@ymc.ch wrote:
 Hey,

 I'm building a MapReduce job which should get data from an HBase table,
 filter it, and store the reduced data in another HBase table. I used the
 SingleColumnValueFilter to limit the data that will be committed to the
 map process. The problem is, the filter doesn't reduce the data but
 commits all data in the table to the map process:

 The code looks like this for the Filter:

 Scan scan = new Scan();
 String columns = "details";
 String qualifier = "details:page";
 String value = "5";
 scan.setFilter(new SingleColumnValueFilter(Bytes.toBytes(columns),
 Bytes.toBytes(qualifier), CompareOp.EQUAL, Bytes.toBytes(5)));
 TableMapReduceUtil.initTableMapperJob("books", scan, mapper.class,
 ImmutableBytesWritable.class, IntWritable.class, job);


 And this was my Code for filling the Table:

 Put put = new Put(rowkey);
 put.add(Bytes.toBytes("details"), Bytes.toBytes("page"),
 Bytes.toBytes(rand.nextInt(20)));

 and I don't understand why the filter doesn't work! I hope somebody can
 help me.

 Best regards,
 Patrick






Re: HBase MR with Filter

2010-05-18 Thread Stack
The below looks 'right'.

Maybe try uncommenting this:

  this.comparator.compareTo(Arrays.copyOfRange(data, offset, offset + length));
//if (LOG.isDebugEnabled()) {
//  LOG.debug("compareResult=" + compareResult + " " +
//    Bytes.toString(data, offset, length));
//}

...in SingleColumnValueFilter.  It might give you a clue as to where
things are going awry.

St.Ack
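
For comparison, a hedged sketch of how the filter is usually wired up, with the family and
the qualifier passed as separate arguments (note the posted code passes "details:page" as
the qualifier) and the comparison value encoded the same way it was written; whether this
matches Patrick's intent is an assumption:

// Sketch only; the classes used here match the imports already in the posted code.
Scan scan = new Scan();
SingleColumnValueFilter filter = new SingleColumnValueFilter(
    Bytes.toBytes("details"),        // column family
    Bytes.toBytes("page"),           // qualifier alone, without the family prefix
    CompareOp.EQUAL,
    Bytes.toBytes(5));               // same encoding as the Put (an int)
filter.setFilterIfMissing(true);     // drop rows that do not have the column at all
scan.setFilter(filter);
TableMapReduceUtil.initTableMapperJob("books", scan, mapper.class,
    ImmutableBytesWritable.class, IntWritable.class, job);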

On Tue, May 18, 2010 at 9:25 AM, Patrick Datko patrick.da...@ymc.ch wrote:
 Hey,

 I'm building a MapReduce job which should get data from an HBase table,
 filter it, and store the reduced data in another HBase table. I used the
 SingleColumnValueFilter to limit the data that will be committed to the
 map process. The problem is, the filter doesn't reduce the data but
 commits all data in the table to the map process:

 The code looks like this for the Filter:

 Scan scan = new Scan();
 String columns = "details";
 String qualifier = "details:page";
 String value = "5";
 scan.setFilter(new SingleColumnValueFilter(Bytes.toBytes(columns),
 Bytes.toBytes(qualifier), CompareOp.EQUAL, Bytes.toBytes(5)));
 TableMapReduceUtil.initTableMapperJob("books", scan, mapper.class,
 ImmutableBytesWritable.class, IntWritable.class, job);


 And this was my Code for filling the Table:

 Put put = new Put(rowkey);
 put.add(Bytes.toBytes("details"), Bytes.toBytes("page"),
 Bytes.toBytes(rand.nextInt(20)));

 and I don't understand why the filter doesn't work! I hope somebody can
 help me.

 Best regards,
 Patrick






Re: ClassNotFoundException: org.apache.hadoop.hbase.client.idx.IdxQualifierType

2010-05-17 Thread Stack
You have IHBase jar in your CLASSPATH?
St.Ack

On Mon, May 17, 2010 at 3:56 AM, Nitin Goel nitin.g...@in.fujitsu.com wrote:
 Hi,



 I am new to HBase and I am trying to use hbql on HBase. There I am
 getting the following exception



 Exception in thread "main" java.lang.NoClassDefFoundError:
 org/apache/hadoop/hbase/client/idx/IdxQualifierType
            at org.apache.hadoop.hbase.hbql.mapping.FieldType.<clinit>(FieldType.java:50)
            at org.apache.hadoop.hbase.hbql.mapping.ColumnDefinition.getFieldType(ColumnDefinition.java:159)
            at org.apache.hadoop.hbase.hbql.mapping.ColumnDefinition.newMappedColumn(ColumnDefinition.java:99)
            at org.apache.hadoop.hbase.hbql.antlr.HBqlParser.columnDefinition(HBqlParser.java:4565)
            at org.apache.hadoop.hbase.hbql.antlr.HBqlParser.columnDefinitionnList(HBqlParser.java:4358)
            at org.apache.hadoop.hbase.hbql.antlr.HBqlParser.familyMapping(HBqlParser.java:4317)
            at org.apache.hadoop.hbase.hbql.antlr.HBqlParser.familyMappingList(HBqlParser.java:4197)
            at org.apache.hadoop.hbase.hbql.antlr.HBqlParser.attribMapping(HBqlParser.java:2577)
            at org.apache.hadoop.hbase.hbql.antlr.HBqlParser.hbqlStmt(HBqlParser.java:986)
            at org.apache.hadoop.hbase.hbql.antlr.HBqlParser.hbqlStatement(HBqlParser.java:463)
            at org.apache.hadoop.hbase.hbql.parser.ParserUtil.parseHBqlStatement(ParserUtil.java:163)
            at org.apache.hadoop.hbase.hbql.impl.Utils.parseHBqlStatement(Utils.java:49)
            at org.apache.hadoop.hbase.hbql.impl.HStatementImpl.execute(HStatementImpl.java:159)
            at org.apache.hadoop.hbase.hbql.impl.HConnectionImpl.execute(HConnectionImpl.java:314)
            at org.apache.hadoop.hbase.hbql.impl.MappingManager.validatePersistentMetadata(MappingManager.java:59)
            at org.apache.hadoop.hbase.hbql.impl.HConnectionImpl.<init>(HConnectionImpl.java:92)
            at org.apache.hadoop.hbase.jdbc.impl.ConnectionImpl.<init>(ConnectionImpl.java:64)
            at org.apache.hadoop.hbase.jdbc.Driver.getConnection(Driver.java:85)
            at org.apache.hadoop.hbase.jdbc.Driver.connect(Driver.java:74)
            at java.sql.DriverManager.getConnection(DriverManager.java:582)
            at java.sql.DriverManager.getConnection(DriverManager.java:207)
            at com.fujitsu.fla.tsig.gdb.hbase.HBaseHelper.main(HBaseHelper.java:42)
 Caused by: java.lang.ClassNotFoundException:
 org.apache.hadoop.hbase.client.idx.IdxQualifierType
            at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
            at java.security.AccessController.doPrivileged(Native Method)
            at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
            at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
            at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
            ... 22 more



 Could you please let me know in which jar I can find the
 org.apache.hadoop.hbase.client.idx.IdxQualifierType class? I have
 checked the source code of HBase 0.20.4; however, I didn't find the
 required class.



 Thanks & Regards,

 Nitin Goel





 DISCLAIMER:
 This e-mail and any attached files may contain confidential and/or privileged 
 material for the sole use of the intended recipient.  Any review, use, 
 distribution or disclosure by others is strictly prohibited. If you are not 
 the intended recipient (or authorized to receive this e-mail for the 
 recipient), you may not review, copy or distribute this message.  Please 
 contact the sender by reply e-mail and delete all copies of this 
 message:Fujitsu Consulting India Pvt Limited



Re: Inverted word index...

2010-05-17 Thread Stack
... and you've seen http://github.com/akkumar/hbasene and
http://github.com/thkoch2001/lucehbase?
St.Ack

On Mon, May 17, 2010 at 1:07 AM, Kevin Apte
technicalarchitect2...@gmail.com wrote:
    Consider a search system with an inverted word index- in other words, an
 index which points to document location- with these columns- word, document
 ID and possibly timestamp.

  Given a word, how will I know which tablet to scan to find all document IDs
  with the given word?

  If you are indexing a large database - say 50 TB - then each word may be
  split across multiple tablets. There may be hundreds of such tablets, each
  with a large number of SSTables to store the index. How will I know which
  tablet to search?  Is there a master index that specifies which tablet
  has words in a range, say "ro" to "ru"?  Or do I have to look up Bloom
  filters for every tablet?

 Kevin
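
In HBase terms, the client resolves which region (the "tablet" above) serves a given row
key through the META table rather than by probing Bloom filters on every tablet; a hedged
sketch of that lookup (assuming HTable.getRegionLocation is available in the version at
hand; the table name and word are placeholders):

// Sketch only: ask the client where the row for a given word lives.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionLookupSketch {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable index = new HTable(conf, "inverted-index");       // placeholder table name
    HRegionLocation loc = index.getRegionLocation("roses");  // META-backed lookup, cached by the client
    System.out.println("start key: " + Bytes.toString(loc.getRegionInfo().getStartKey()));
    System.out.println("server:    " + loc.getServerAddress().getHostname());
  }
}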



Re: HBase client hangs after upgrade to 0.20.4 when used from reducer

2010-05-14 Thread Stack
The below looks the same as the blockage seen in the logs posted to HBASE-2545 by
Kris Jirapinyo.

Todd has made a fix and added a test.

We'll roll a 0.20.5 hbase with this fix and a fix for hbase-2541
(missing licenses from the head of some source files) as soon as we get
confirmation that Todd's fix works for Kris Jirapinyo's seemingly
similar issue.

Thanks,
St.Ack

On Fri, May 14, 2010 at 9:07 AM, Todd Lipcon t...@cloudera.com wrote:
 It appears like we might be stuck in an infinite loop here:

 IPC Server handler 9 on 60020 daemon prio=10 tid=0x2aaeb42f7800
 nid=0x6508 runnable [0x445bb000]
   java.lang.Thread.State: RUNNABLE
        at
 org.apache.hadoop.hbase.regionserver.ExplicitColumnTracker.checkColumn(ExplicitColumnTracker.java:128)
        at
 org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:165)
        at
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:176)

 It's holding a lock that some other threads are blocked on. In both of your
 pastes, there are some threads stuck here.

 JD, any thoughts? Looks like you made some changes to this code for 0.20.4.

 -Todd

 On Fri, May 14, 2010 at 6:56 AM, Friso van Vollenhoven 
 fvanvollenho...@xebia.com wrote:

 Hi Todd,

 The row counting works fine. It is quite slow, I have to say. But I have
 never used row counting from the shell before, so I don't know what
 performance to expect from it or how it is implemented. It's just that a
 regular scan from software is way faster.

 Also, from our application we do a full table scan to populate an in-memory
 index of row keys, because we need to be able to quickly determine if a
 certain key exists or not. I triggered this scan from our UI while there
 were hanging reducers. This also works fine. There are close to 5 million
 records in the table and I checked in the web interface that the table is
 divided across all 4 region servers, so this process should hit them all.

 The earlier jstacks of the region servers were taken when the reducers
 (clients) were hanging, before the shutdown was requested. Some 3 to 5
 reduce tasks hang, not all of them, but surely more than just one.

 Because of your question about what is locked up (client or region server),
 I SSH'ed into each of the worker machines after giving HBase the shutdown
 signal (so the shutdown sequence started) and tried to see if the region
 server was still running and if so,  shutdown each individual region server
 manually (doing 'hbase-daemon.sh stop regionserver' on each, I'm glad there
 are only 4 nodes). I found that:
 - one of the region servers actually shut down normally (worker3)
 - two region servers shut down normally after the hbase-daemon.sh command
 (worker4 and worker1)
 - one region server does not shut down (worker2)

 I put some additional info on pastebin.

 Here is the jstack of worker2 (the hanging one):
 http://pastebin.com/5V0UZi7N
 There are two jstack outputs, one from before the shutdown command was
 given and one (starting at line 946) from after the shutdown command was
 given.

 Here are the logs of that region server: http://pastebin.com/qCXSKR2A
 I set the log level for org.apache.hadoop.hbase to DEBUG before doing all
 this, so it's more verbose (I don't know if this helps).

 So, it appears that it is one of the region servers that is locked up, but
 only for some connections while it can still serve other connections
 normally. From the locked up region server logs, it looks like the shutdown
 sequence runs completely, but the server just won't die afterwards (because
 of running non-daemon threads; maybe it should just do a System.exit() if
 all cleanup is successful). At least nothing gets corrupted, which is nice.
 Of course I am still trying to find out why things get locked up in the
 first place.

 I did this test twice today. During the first run it was a different region
 server that was hanging, so I think it has nothing to do with a problem
 related to that specific machine.

 My next step is to go through code (including HBase's, so It will take me
 some time...) and see what exactly happens in our scenario, because from my
 current knowledge the jstack outputs don't mean enough to me.



 Friso




 On May 13, 2010, at 7:09 PM, Todd Lipcon wrote:

  Hi Friso,
 
  When did you take the jstack dumps of the region servers? Was it when the
  reduce tasks were still hanging?
 
  Do all of the reduce tasks hang or is it just one that gets stuck?
 
  If, once the reduce tasks are hung, you open the hbase shell and run
  count 'mytable', 10 -- does it successfully count the rows?
 
  (I'm trying to determine if the client is locked up, or one of the RSes
 is
  locked up)
 
  Enjoy your holiday!
 
  -Todd
 
  On Thu, May 13, 2010 at 12:38 AM, Friso van Vollenhoven 
  fvanvollenho...@xebia.com wrote:
 
  Hi,
 
  I was kind of hoping that this was a known thing and I was just
 overlooking
  something. Apparently it requires more investigation.
 
  

Re: GSoC 2010: ZooKeeper Monitoring Recipes and Web-based Administrative Interface

2010-05-13 Thread Stack
On Thu, May 13, 2010 at 2:30 AM, Andrei Savu savu.and...@gmail.com wrote:
 Hi all,

 My name is Andrei Savu and I am one of the GSoC2010 accepted students.
 My mentor is Patrick Hunt.


Good to meet you Andrei.

 Are there any HBase  / Hadoop  specific ZooKeeper monitoring requirements?


In the hbase shell, you can poke at your zk ensemble currently.  Here
is what it looks like:

hbase(main):001:0> zk
ZooKeeper -server host:port cmd args
connect host:port
get path [watch]
ls path [watch]
set path data [version]
delquota [-n|-b] path
quit
printwatches on|off
create [-s] [-e] path data acl
stat path [watch]
close
ls2 path [watch]
history
listquota path
setAcl path acl
getAcl path
sync path
redo cmdno
addauth scheme auth
delete path [version]
setquota -n|-b val path


Thats pretty great.

What'd be sweeter would be the addition of a zktop command.  I know it's a
python script at the moment.  Maybe there is a pure java implementation?

Also in our UI, you can browse to a page of basic ensemble stats.
Would be excellent if instead that were the fancy-pants zktop output.
Or, if you are doing a zk UI anyway, just make sure it is packaged in a
way that makes it easy for us to launch as part of our UI.  I'd imagine
if it is packaged as a WAR file that should be fine, but we'd need some way
of passing in where the zk ensemble is, perhaps as arguments on the
url?

Thanks for writing the list Andrei,
St.Ack


Re: HBase has entered Debian (unstable)

2010-05-13 Thread Stack
You are a good man Thomas.  Thanks for pushing this through.
St.Ack

On Thu, May 13, 2010 at 1:59 AM, Thomas Koch tho...@koch.ro wrote:
 Hi,

 HBase 0.20.4 has entered Debian unstable, should slide into testing after the
 usual 14-day period, and will therefore most likely be included in the upcoming
 Debian Squeeze.

 http://packages.debian.org/source/sid/hbase

 Please note that this packaging effort is still very much work-in-progress
 and not yet suitable for production use. However, the aim is to have a rock
 solid, stable HBase in squeeze+1, respectively in Debian testing, in the next
 months. Meanwhile the HBase package in Debian can raise HBase's visibility and
 lower the entrance barrier.

 So if somebody wants to try out HBase (on Debian), it is as easy as:

 aptitude install zookeeperd hbase-masterd

 In other news: zookeeper is in Debian testing as of today.

 Best regards,

 Thomas Koch, http://www.koch.ro



Re: Regionservers crash with an OutOfMemoryException after a data-intensive map reduce job..

2010-05-13 Thread Stack
Hello Vidhyashankar:

How many regionservers?   What version of hbase and hadoop?  How much
RAM on these machines in total?  Can you give HBase more RAM?

Also check that you don't have an exceptional cell in your input --
one that is very much larger than the 14KB you note below.

12 column families is at the extreme end of what we've played with,
just FYI.  You might try a schema that has fewer: e.g. one CF for
the big cell value and a second CF for all the others.

There may also be corruption in one of the storefiles given that the
OOME below seems to happen when we try and open a region (but the fact
of opening may have no relation to why the OOME).

St.Ack
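
A schema along those lines -- one family for the single large cell and one for the many
small values -- could be created like this hedged sketch; the table and family names are
placeholders, not the poster's actual schema:

// Sketch only: two column families instead of twelve.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class TwoFamilySchemaSketch {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor("DocDataSlim");    // placeholder table name
    desc.addFamily(new HColumnDescriptor(Bytes.toBytes("big")));    // the single ~14KB value per row
    desc.addFamily(new HColumnDescriptor(Bytes.toBytes("small")));  // the remaining 8-byte values,
                                                                    // one qualifier per old column
    admin.createTable(desc);
  }
}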


On Thu, May 13, 2010 at 10:35 AM, Vidhyashankar Venkataraman
vidhy...@yahoo-inc.com wrote:
 This is similar to a mail sent by another user to the group a couple of
 months back.. I am quite new to Hbase and I’ve been trying to conduct a
 basic experiment with Hbase..

 I am trying to load 200 million records each record around 15 KB : with one
 column value around 14KB and the rest of the 100 column values 8 bytes
 each.. The 120 columns are grouped as 10 qualifiers X 12 families: hope I
 got my jargon right.. Note that only one value is quite large for each doc
 (when compared to other values)...
 The data is uncompressed.. And each value is uniformly randomly selected..
 I used a map-reduce job to load a data file on hdfs into the database.. Soon
 after the job finished, the region servers crash with OOM Exception.. Below
 is part of the trace from the logs in one of the RS’s:

 I have attached the conf along with the email: Can you guys point out any
 anamoly in my settings? I have set a heap size of 3 gigs.. Anything
 significantly more, java 32-bit doesn’t run..


 2010-05-12 19:22:45,068 DEBUG
 org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
 Total=8.43782MB (8847696), Free=1791.2247MB (1878235312), M
 ax=1799.6626MB (1887083008), Counts: Blocks=1, Access=16947, Hit=52,
 Miss=16895, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.3068389603868127%,
 Miss Ratio=99
 .69316124916077%, Evicted/Run=NaN
 2010-05-12 19:22:45,069 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 loaded /hbase/DocData/1651418343/col5/7617863559659933969,
 isReference=false, seque
 nce id=2470632548, length=8456716, majorCompaction=false
 2010-05-12 19:22:45,075 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 loaded /hbase/DocData/1651418343/col6/1328113038200437659,
 isReference=false, seque
 nce id=2960732840, length=19861, majorCompaction=false
 2010-05-12 19:22:45,078 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 loaded /hbase/DocData/1651418343/col6/6484804359703635950,
 isReference=false, seque
 nce id=2470632548, length=8456716, majorCompaction=false
 2010-05-12 19:22:45,082 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 loaded /hbase/DocData/1651418343/col7/1673569837212457160,
 isReference=false, seque
 nce id=2960732840, length=19861, majorCompaction=false
 2010-05-12 19:22:45,085 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 loaded /hbase/DocData/1651418343/col7/4737399093829085995,
 isReference=false, seque
 nce id=2470632548, length=8456716, majorCompaction=false
 2010-05-12 19:22:47,238 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 loaded /hbase/DocData/1651418343/col8/8446828932792437464,
 isReference=false, seque
 nce id=2960732840, length=19861, majorCompaction=false
 2010-05-12 19:22:47,241 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded
 /hbase/DocData/1651418343/col8/974386128174268353, isReference=false, sequen
 ce id=2470632548, length=8456716, majorCompaction=false
 2010-05-12 19:22:48,804 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 loaded /hbase/DocData/1651418343/col9/2096232603557969237,
 isReference=false, seque
 nce id=2470632548, length=8456716, majorCompaction=false
 2010-05-12 19:22:48,807 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 loaded /hbase/DocData/1651418343/col9/7088206045660348092,
 isReference=false, seque
 nce id=2960732840, length=19861, majorCompaction=false
 2010-05-12 19:22:48,808 INFO org.apache.hadoop.hbase.regionserver.HRegion:
 region DocData,4824176,1273625075099/1651418343 available; sequence id is
 29607328
 41
 2010-05-12 19:22:48,808 INFO
 org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:
 DocData,40682172,1273607630618
 2010-05-12 19:22:48,809 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
 Opening region DocData,40682172,1273607630618, encoded=271889952
 2010-05-12 19:22:50,924 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 loaded /hbase/DocData/271889952/CONTENT/4859380626868896307,
 isReference=false, sequence id=2959849236, length=337563,
 majorCompaction=false
 2010-05-12 19:22:53,037 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded
 /hbase/DocData/271889952/CONTENT/952776139755887312, isReference=false, sequ
 ence id=2082553088, length=110460013, majorCompaction=false
 2010-05-12 19:22:57,404 DEBUG 

Re: tables disappearing after upgrading 0.20.3 = 0.20.4

2010-05-13 Thread Stack
What's the shell say?  Does it see the tables consistently?  Can you
count your content consistently?
St.Ack

On Thu, May 13, 2010 at 4:53 PM, Viktors Rotanovs
viktors.rotan...@gmail.com wrote:
 Hi,

 after upgrading from 0.20.3 to 0.20.4 a list of tables almost
 immediately becomes inconsistent - master.jsp shows no tables even
 after creating test table in hbase shell, tables which were available
 before start randomly appearing and disappearing, etc. Upgrading was
 done by stopping, upgrading code, and then starting (no dump/restore
 was done).
 I didn't investigate yet, just checking if somebody had the same
 problem or if I did upgrade right (I had exactly the same issue in the
 past when trying to apply HBASE-2174 manually).

 Environment:
 Small tables, 100k rows
 Amazon EC2, c1.xlarge instance type with Ubuntu 9.10 and EBS root,
 HBase installed manually
 1 master (namenode + jobtracker + master), 3 slaves (tasktracker +
 datanode + regionserver + zookeeper)
 Hadoop 0.20.1+169.68~1.karmic-cdh2 from Cloudera distribution
 Flaky DNS issue present, happens about once per day even with dnsmasq
 installed (heartbeat every 1s, dnsmasq forwards requests once per
 minute), DDNS set for internal hostnames.

 This is a testing cluster, nothing important on it.

 Cheers,
 -- Viktors



Re: Enabling IHbase

2010-05-12 Thread Stack
You saw this package doc over in IHBase's new home on github?
http://github.com/ykulbak/ihbase/blob/master/src/main/java/org/apache/hadoop/hbase/client/idx/package.html
 It'll read better if you build the javadoc.  There is also this:
http://github.com/ykulbak/ihbase/blob/master/README

St.Ack

On Wed, May 12, 2010 at 8:27 AM, Renato Marroquín Mogrovejo
renatoj.marroq...@gmail.com wrote:
 Hi Alex,

 Thanks for your help, but I meant something more like a how-to set it up
 thing, or like a tutorial of it (=
 I also read these ones if anyone else is interested.

 http://blog.sematext.com/2010/03/31/hbase-digest-march-2010/
 http://search-hadoop.com/m/5MBst1uL87b1

 Renato M.



 2010/5/12 alex kamil alex.ka...@gmail.com

 regarding usage this may be helpful
 https://issues.apache.org/jira/browse/HBASE-2167


 On Wed, May 12, 2010 at 10:48 AM, alex kamil alex.ka...@gmail.com wrote:

 Renato,

 just noticed you are looking for *Indexed *Hbase

 i found this
 http://blog.reactive.org/2010/03/indexed-hbase-it-might-not-be-what-you.html

 Alex


 On Wed, May 12, 2010 at 10:42 AM, alex kamil alex.ka...@gmail.comwrote:


 http://www.google.com/search?hl=en&source=hp&q=hbase+tutorial&aq=f&aqi=g-p1g-sx3g1g-sx4g-msx1&aql=&oq=&gs_rfai=


 On Wed, May 12, 2010 at 10:25 AM, Renato Marroquín Mogrovejo 
 renatoj.marroq...@gmail.com wrote:

 Hi eveyone,

 I just read about IHbase and seems like something I could give it a try,
 but
 I haven't been able to find information (besides descriptions and
 advantages) regarding to how to install it or use it.
 Thanks in advance.

 Renato M.








Re: Problem with performance with many columns in column familie

2010-05-11 Thread Stack
You could try thread-dumping the regionserver to try and figure out where
it's hung up.  Counters are usually fast so maybe it's something to do
w/ 8k of them in the one row.  What kinda numbers are you seeing?  How
much RAM you throwing at the problem?

Yours,
St.Ack



On Tue, May 11, 2010 at 8:51 AM, Sebastian Bauer ad...@ugame.net.pl wrote:
 Hi,

 maybe i'll get help here :)

 I have 2 tables, UserToAdv and AdvToUsers.

 UserToAdv is simple:
 { row_id = [ {adv:id:counter },
                            {adv:id:counter },
                            .about 100 columns
                        ]
 only one kind of operation is performed - increasing a counter:
 client.atomicIncrement(UsersToAdv, ID, column, 1)


 AdvToUsers has one column family, user:; inside this I have about 8000
 columns with the format user:cookie
 what I'm doing on the DB is increasing the counter inside user:cookie:

 client.atomicIncrement(AdvToUsers, ID, column, 1)

 i have 2 regions:


 first one:
        UsersToAdv,6FEC716B3960D1E8208DE6B06993A68D,1273580007602
            stores=1, storefiles=1, storefileSizeMB=8, memstoreSizeMB=9,
 storefileIndexSizeMB=0
        UsersToAdv,0FDD84B9124B98B05A5E40F47C12DC45,1273580531847
            stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
 storefileIndexSizeMB=0
        AdvToUsers,5735,1273580575873
            stores=1, storefiles=1, storefileSizeMB=15, memstoreSizeMB=10,
 storefileIndexSizeMB=0
        UsersToAdv,67CB411B48A7B83F0B863AC615285060,1273580533380
            stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
 storefileIndexSizeMB=0
        UsersToAdv,4012667F3E78C6431E3DD84641002FCE,1273580532995
            stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
 storefileIndexSizeMB=0
        UsersToAdv,5FE4A7506737CE0F38E254E62E23FE45,1273580533380
            stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
 storefileIndexSizeMB=0
        UsersToAdv,47E95EE30A11EBE45F055AC57EB2676E,1273580532995
            stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
 storefileIndexSizeMB=0
        UsersToAdv,37F9573415D9069B7E5810012AAD9CB7,1273580532258
            stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
 storefileIndexSizeMB=0
        UsersToAdv,1FFFDF082566D93153B34BFE0C44A9BF,1273580532173
            stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
 storefileIndexSizeMB=0
        UsersToAdv,17C93FB0047BC4D660C6570B734CBE17,1273580531847
            stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
 storefileIndexSizeMB=0
        UsersToAdv,27DFD8F02CD98FF57E8334837C73C57A,1273580532173
            stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
 storefileIndexSizeMB=0

 second one:
        UsersToAdv,57C568066D35D09B4AF6CD7D68681144,1273580533427
            stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
 storefileIndexSizeMB=0
        UsersToAdv,4FA6A1A2681E2D252CCF765B140369EF,1273580533427
            stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
 storefileIndexSizeMB=0
        AdvToUsers,,1273580575966
            stores=1, storefiles=1, storefileSizeMB=1, memstoreSizeMB=1,
 storefileIndexSizeMB=0
        UsersToAdv,07B296AC590061025B382B163E3C149E,1273580533023
            stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
 storefileIndexSizeMB=0
        UsersToAdv,3015D5DB07E2F4D30A19DEB354A85B52,1273580532258
            stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
 storefileIndexSizeMB=0
        AdvToUsers,5859,1273580580940
            stores=1, storefiles=1, storefileSizeMB=9, memstoreSizeMB=9,
 storefileIndexSizeMB=0
        AdvToUsers,5315,1273580575966
            stores=1, storefiles=1, storefileSizeMB=14, memstoreSizeMB=12,
 storefileIndexSizeMB=0
        AdvToUsers,5825,1273580580940
            stores=1, storefiles=1, storefileSizeMB=8, memstoreSizeMB=8,
 storefileIndexSizeMB=0
        AdvToUsers,5671,1273580578114
            stores=1, storefiles=1, storefileSizeMB=8, memstoreSizeMB=7,
 storefileIndexSizeMB=0
        UsersToAdv,,1273580533023
            stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
 storefileIndexSizeMB=0
        AdvToUsers,5457,1273580578114
            stores=1, storefiles=1, storefileSizeMB=8, memstoreSizeMB=8,
 storefileIndexSizeMB=0

 the number of queries on both tables is equal, but the load is greater on the second
 region because of AdvToUsers

 is there any solution to increase the performance of the atomicIncrement operation on
 column families with so many (8000) columns?

 Thank You,

 Sebastian Bauer



Re: WrongRegionException -- add_table.rb screwed up my hbase.

2010-05-11 Thread Stack
Sorry for the trouble caused.  I thought that 0.20.4 added updating of
.regioninfo on re-enable of a table but I don't see it.  Nonetheless, I'd
suggest you update to 0.20.4.  It should have fixes at least to save
you from WRE going forward.

Thanks for writing the list,
St.Ack

On Tue, May 11, 2010 at 9:20 PM, maxjar10 jcuz...@gmail.com wrote:

 Answered my own question. The .regioninfo files are there specifically for
 performing fsck functionalities like using add_table.rb.

 The problem is that the .regioninfo files are NOT updated after an alter.
 This issue is described in:

 https://issues.apache.org/jira/browse/HBASE-2366

 The purpose of the .regioninfo files is described here:

 http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html


 maxjar10 wrote:

 Ok, here's my story in case anyone else encounters the same issue...

 My question is this... Why does the table descriptor/meta table
 information not match the .regioninfo in each region sub dir? Is this a
 bad thing? Read below...

 HBase 0.23-1
 Hadoop 0.20.1

 So I wanted to add compression to my HBase tables that I already had
 setup. So, I went to the hbase shell and ran an alter table set compression
 to GZ and decreased the versions from 3 to 2.

 I then ran a major_compact on my table to put the change into effect. Even
 though this appears to happen instantly I know you need to wait for
 fragmentation to drop to 0%.

 Now, I wanted to run a job and saw an exception:

 org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
 contact region server Some server, retryOnlyOne=true, index=0,
 islastrow=true, tries=9, numtries=10, i=0, listsize=1,
 region=Businesses,30012005:14317197,1269286328987 for region
 Businesses,30012005:14317197,1269286328987, row '30012013:692',
 but failed after 10 attempts.

 How very strange... Well, this was odd so I went to HBase shell and ran a
 count Businesses and it hung (looped, whatever) when it got ~ to the
 above start row. So, I cancelled the count and saw a brief message in the
 datanode log about the fact that there was a WrongRegionException. Hmmm...

 When I looked at the .META. tables I saw that the endRow didn't match up
 to the next startRow like it should so it looked as though a region was
 missing. Double h...

 Because I clearly had something wrong I decided to try to run the
 add_table.rb as suggested in another thread about seeing
 WrongRegionExceptions. So, I proceeded to run the add_table.rb only to
 have it fail. I modified the script and noticed that it was outputting the
 .regioninfo information that is in each regions sub dir. The problem is
 that this .regioninfo DOES NOT match the actual table description. There
 was no COMPRESSION and the version was back at 3. This was confusing
 because I could see in the actual data when I would open the data files
 that it was clearly gzipped.

 Sigh, so, I went ahead and modified the script to output what does match
 the table description and the only thing it uses the .regioninfo files for
 is the startKey and the endKey. The rest is the data that actually matches
 my alter.

 As of right now the count is proceeding past the row I had a problem
 with BUT I'm not sure if the data is actually good. I'll need to scan a
 few rows.

 My question is this... Why does the table descriptor/meta table
 information not match the .regioninfo in each region sub dir? Is this a
 bad thing?

 Thanks!


 --
 View this message in context: 
 http://old.nabble.com/WrongRegionExceptionadd_table.rb-screwed-up-my-hbase.-tp28531026p28531887.html
 Sent from the HBase User mailing list archive at Nabble.com.




Re: java.lang.IllegalAccessError while opening region

2010-05-10 Thread Stack
This looks like https://issues.apache.org/jira/browse/HBASE-1925.  A
storefile in the problematic region is missing its sequenceid.  Try
and figure it (see the issue for clues).  There is also the hfile tool
for examing metainfo in storefiles:

./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile

Do the above for usage.

Move aside the bad storefile in hdfs (./bin/hadoop fs -mv SRC TGT).
Region should deploy then.

Thereafter, please update your hbase so you get the fix that makes it
so this doesn't happen in future.

Yours,
St.Ack

On Mon, May 10, 2010 at 1:35 PM, g00dn3ss g00dn...@gmail.com wrote:
 Hi All,

 I have reams of errors in my master log for a single region.  It keeps
 trying and failing to open the region. An example of the exception is below.
  Are there conceivable ways of fixing this, or am I most likely going to have
 to delete the region?  If deleting the region is the likely option, are
 there any tools to help with this?  I saw the recent message from Stack
 describing how to delete a region. Does that mean I have to manually edit
 .META. from the hbase shell?  Do I also have to edit the .regioninfo files
 in the modified regions' directories with the new start/end keys?

 Thanks!
 (Exception follows)

 2010-05-10 13:15:48,114 INFO  master.ProcessRegionClose$1
 (ProcessRegionClose.java:call(85)) - region set as unassigned: regionid
 2010-05-10 13:15:48,123 INFO  master.RegionManager
 (RegionManager.java:doRegionAssignment(337)) - Assigning region regionid
 to regionserver whatnot
 2010-05-10 13:15:48,144 INFO  master.ServerManager
 (ServerManager.java:processMsgs(441)) - Processing MSG_REPORT_CLOSE:
 regionid: java.lang.IllegalAccessError: Has not been initialized
        at
 org.apache.hadoop.hbase.regionserver.StoreFile.getMaxSequenceId(StoreFile.java:216)
        at
 org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:417)
        at org.apache.hadoop.hbase.regionserver.Store.init(Store.java:221)
        at
 org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:1641)
        at
 org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:320)
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1575)
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1542)
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1462)
        at java.lang.Thread.run(Unknown Source)



Re: locality and tasktracker vs split hostname

2010-05-10 Thread Stack
On Mon, May 10, 2010 at 6:26 PM, John Sichi jsi...@facebook.com wrote:
...
 (a) making the Hadoop locality code use a hostname comparison which is 
 insensitive to the presence of the trailing dot

 or

 (b) making the HBase split's hostname consistent with the task tracker

 Any opinions?


Lets do (b).  Users will see the fix sooner.  Want to file an issue John?

Good on you,
St.Ack


Re: Multiple get

2010-05-04 Thread Stack
Is this the new patch up in hbase-1845 that you are messing with
Slava?  If so, please add stacktrace to the issue so we can take a
look.
St.Ack

On Tue, May 4, 2010 at 5:01 AM, Slava Gorelik slava.gore...@gmail.com wrote:
 Hi.

  After I applied the patch and started to use multiple get, the test
  application started to hang on exit (the DestroyJavaVM thread hangs) after I
  called HTable.batch().
  Regular get operations continue to work as expected.

 Best Regards.


 On Thu, Mar 4, 2010 at 9:24 PM, Slava Gorelik slava.gore...@gmail.comwrote:

 Thank You.
 At least I can apply the patch on 0.20.3.


 On Thu, Mar 4, 2010 at 7:18 PM, Jean-Daniel Cryans 
 jdcry...@apache.orgwrote:

 On Thu, Mar 4, 2010 at 9:07 AM, Slava Gorelik slava.gore...@gmail.com
 wrote:
  Hi Erik,
  Thank You for the quick reply.
   Will this patch be integrated into HBase 0.21.0?

 If it's ready before we release, then yes.

   If yes, is there any estimated release date for 0.21.0?

 It's complicated. Currently it would be at least after the Hadoop
 0.21.0 release (which also has no estimate since Y! isn't focusing on
 it).

 J-D

 
  Thank You.
 
 
  On Thu, Mar 4, 2010 at 5:14 PM, Erik Holstad erikhols...@gmail.com
 wrote:
 
  Hey Slava!
  There is work being done to be able to do this
 
 
 https://issues.apache.org/jira/browse/HBASE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832114#action_12832114
   I think that there is also another Jira that is related to this topic,
 but I
  don't know what that number is.
 
  --
  Regards Erik
 
 






Re: Improving HBase scanner

2010-05-04 Thread Stack
Are you waiting too long between invocations of next?  (i.e.  the
scanner lease period?)  Or, perhaps you are fetching too much in the
one go.  If you fetch 1000 at a time -- scanner caching -- and you
don't get the next batch within the scanner lease period, again you
will timeout.

St.Ack
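
For illustration, a minimal sketch of the trade-off described above, assuming
the standard Scan/ResultScanner client API; the table name and caching value
are only examples, not from this thread:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScannerCachingSketch {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "mytable");
    Scan scan = new Scan();
    // Rows fetched per next() round trip.  The whole cached batch must be
    // consumed before the scanner lease expires, so lower this value if
    // per-row processing is slow.
    scan.setCaching(100);
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // process r; keep per-row work short relative to the lease period
      }
    } finally {
      scanner.close();
    }
  }
}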

On Tue, May 4, 2010 at 1:46 AM, Michelan Arendse miche...@addynamo.com wrote:
 Hi

 I would like to know how configure HBase to improve the scanner fetching data 
 from the table or another method of using scanner, as my database is very 
 large and scanner times out.

 Kind Regards,

 Michelan Arendse
 Junior Developer | AD:DYNAMO // happy business ;-)
 Office 0861 Dynamo (0861 396266)  | Fax +27 (0) 21 465 2587

  Advertise Online Instantly - www.addynamo.com




Re: HBase / Pig integration

2010-05-04 Thread Stack
On Tue, May 4, 2010 at 12:02 AM, Dmitriy Ryaboy dmit...@twitter.com wrote:

 We have an apache license file in the root of the project; I am not sure if
 we need to put it in every file. Will check with the lawyers.

Generally you put notice at head of each src file.  I was particularly
referring to this that we have in all hbase files:

 * Copyright 2010 The Apache Software Foundation

We got this practise from hadoop but looking there, they no longer
seem to do it (I need to talk to lawyers too -- smile).

 Regarding the first and last slice, the problem is that I have no way of
 knowing what the first and last, respectively, key values are. With the
 first slice I can maybe cache the first key I see, and use that in
 conjunction with the end of the region to calculate the size of the
 keyspace; but with that last region, the max is infinity, so I can't really
 estimate how much more I have left until I have none.  Do regions store any
 metadata that contains a rough count of the number of records they hold?

Regions, no.  StoreFiles, yes.  They keep a count of entries but it is
not really available via the API.  We should expose it or something like
it.  It could only be an estimate, since deletes and puts are both recorded
as entries and a delete can remove a cell, a column or a family.


 I guess they only keep track of the byte size of the data, not the number of
 records per se.  Maybe I can get the total byte size of the region, and
 calculate offsets based on the size of the returned data? This would likely
 be wrong due to pushed-down projections and filters, of course. Any
 other ideas? How do people normally handle this when writing regular MR jobs
 that scan HBase tables?


I think most tasks that go against hbase should report 0% progress and then
100% when done.

We could expose a getLastRowInRegion, or what if we added an estimated
row count to the Split? (Maybe that's not the right place to expose this
info?  What is the canonical way?)

 I suspect this is actually a bit of a problem, btw -- since I don't report
 the amount of remaining work for these slices accurately, and I (hopefully)
 do a reasonable job for the ones where I can calculate the size of the
 keyspace, speculative execution may get overeager with these two slices.

Good point.

We should fix this.  Keeping a counter of how many rows in a region
wouldn't be hard.  It could be updated on compaction, etc.  A row
count would be good enough.


 PigCounterHelper just deals with some oddities of Hadoop counters (they may
 not be available when you first try to increment a counter -- the helper
 buffers increment requests until the reporter becomes available). Are HBase
 counters special things or also just Hadoop counters under the covers?

Check them out.  They are not Hadoop counters.  You can keep up a count on
anything and update it thousands of times a second, etc.  Might be of use
given what you are doing.


 The lzo files are probably unrelated.. there shouldn't be anything
 LZO-specific in the HBase code. We are, in fact, lzo'ing hbase content in
 the sense that that's the compression we have for HDFS, and I think HBase is
 supposed to inherit that.

No.  You need to enable it on the column family.  See how COMPRESSION
can be NONE, GZ, or LZO.  LZO needs to be installed as per hadoop.
Search the hbase wiki home page for lzo.  The hbase+lzo page has had
some experience baked in to it so may be of some use to you.

St.Ack


Re: Improving HBase scanner

2010-05-04 Thread Stack
If long periods between next invocations, up the scanner lease.   See:

  <property>
    <name>hbase.regionserver.lease.period</name>
    <value>60000</value>
    <description>HRegion server lease period in milliseconds. Default is
    60 seconds. Clients must report in within this period else they are
    considered dead.</description>
  </property>

St.Ack


On Tue, May 4, 2010 at 7:04 AM, Michelan Arendse miche...@addynamo.com wrote:
 Yes I am waiting long periods between invocation of next. I didn't know that 
 I am fetching too much data at once.

 I am using HBase 0.20.3. This is my code:

 scan.setTimeRange(fromDate.getTime(), toDate.getTime());
 ResultScanner scanner = table.getScanner(scan);

 while( (result = scanner.next()) != null) {
 channelRow = getChannelDeliveryRow(Bytes.toString(result.getRow()));
      channelRowList.add(channelRow);
 }

 This is some of the output from the log file:
 2010-05-04 15:27:44,546 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: 
 Block cache LRU eviction started.  Attempting to free 62791520 bytes
 2010-05-04 15:27:44,552 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: 
 Block cache LRU eviction completed. Freed 62797944 bytes.  Priority Sizes: 
 Single=279.4997MB (293076672), Multi=224.35243MB (235250576),Memory=0.0MB (0)


 -Original Message-
 From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
 Sent: 04 May 2010 03:55 PM
 To: hbase-user@hadoop.apache.org
 Subject: Re: Improving HBase scanner

 Are you waiting too long between invocations of next?  (i.e.  the
 scanner lease period?)  Or, perhaps you are fetching too much in the
 one go.  If you fetch 1000 at a time -- scanner caching -- and you
 don't get the next batch within the scanner lease period, again you
 will timeout.

 St.Ack

 On Tue, May 4, 2010 at 1:46 AM, Michelan Arendse miche...@addynamo.com 
 wrote:
 Hi

 I would like to know how configure HBase to improve the scanner fetching 
 data from the table or another method of using scanner, as my database is 
 very large and scanner times out.

 Kind Regards,

 Michelan Arendse
 Junior Developer | AD:DYNAMO // happy business ;-)
 Office 0861 Dynamo (0861 396266)  | Fax +27 (0) 21 465 2587

  Advertise Online Instantly - www.addynamo.com





Re: HBase / Pig integration

2010-05-04 Thread Stack
On Tue, May 4, 2010 at 7:34 AM, Stack st...@duboce.net wrote:
 On Tue, May 4, 2010 at 12:02 AM, Dmitriy Ryaboy dmit...@twitter.com wrote:

 We should fix this.  Keeping a counter of how many rows in a region
 wouldn't be hard.  It could be updated on compaction, etc.  A row
 count would be good enough.


Post-Coffee

On second thoughts, keeping a counter wouldn't be that easy
particularly if multiple column families.

I wonder what happens if you do a getClosestRowBefore on the last
region passing in the  key.  Will it return the last row in the
region? (I'll try it later if you don't get to it first).

St.Ack


Re: Multiple get

2010-05-04 Thread Stack
I'd suggest sticking the full stack trace into the issue so Marc can take
a look (it looks like pool-1-thread-1 is not a daemon thread).
Thanks,
St.Ack

On Tue, May 4, 2010 at 7:42 AM, Slava Gorelik slava.gore...@gmail.com wrote:
 Yes, I'm using the patch from 1845.
  At the point when the Java program hangs there is no more stack trace.
  It's hanging after the Java built-in System.exit() method.

 I see only this:

 EmployeeSample [Java Application]:
 Sample.EmployeeSample at localhost:2916 :
 Daemon Thread [main-EventThread] (Running)
 Daemon Thread [main-SendThread] (Running)
 Thread [pool-1-thread-1] (Running)
 Thread [DestroyJavaVM] (Running)

 Best Regards.

 On Tue, May 4, 2010 at 4:52 PM, Stack st...@duboce.net wrote:

 Is this the new patch up in hbase-1845 that you are messing with
 Slava?  If so, please add stacktrace to the issue so we can take a
 look.
 St.Ack

 On Tue, May 4, 2010 at 5:01 AM, Slava Gorelik slava.gore...@gmail.com
 wrote:
  Hi.
 
  After I applied the patch and started to use multiple get,  the test
  application started to hangs on exit (DestroyJavaVM thread hangs) after I
  called Htable.batch() .
  Regular get operation is continue to work as expected.
 
  Best Regards.
 
 
  On Thu, Mar 4, 2010 at 9:24 PM, Slava Gorelik slava.gore...@gmail.com
 wrote:
 
  Thank You.
  At least I can apply the patch on 0.20.3.
 
 
  On Thu, Mar 4, 2010 at 7:18 PM, Jean-Daniel Cryans jdcry...@apache.org
 wrote:
 
  On Thu, Mar 4, 2010 at 9:07 AM, Slava Gorelik slava.gore...@gmail.com
 
  wrote:
   Hi Erik,
   Thank You for the quick reply.
   Is this patch will be integrated into HBase 0.21.0 ?
 
  If it's ready before we release, then yes.
 
   If yes, there any estimated release date for the 0.21.0 ?
 
  It's complicated. Currently it would be at least after the Hadoop
  0.21.0 release (which also has no estimate since Y! isn't focusing on
  it).
 
  J-D
 
  
   Thank You.
  
  
   On Thu, Mar 4, 2010 at 5:14 PM, Erik Holstad erikhols...@gmail.com
  wrote:
  
   Hey Slava!
   There is work being done to be able to do this
  
  
 
 https://issues.apache.org/jira/browse/HBASE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832114#action_12832114
   I think that there is also another Jira that s related to this
 topic,
  but I
   don't know what that number is.
  
   --
   Regards Erik
  
  
 
 
 
 




Re: HBase / Pig integration

2010-05-03 Thread Stack
Hey Dmitry:

I took a quick look.

Your files are missing a copyright?

I like your use of BinaryComparator and the lte/gte options in
skipRegion when setting up filters.

Regards:

// No way to know max.. just return 0. Sorry, reporting on the
last slice is janky.
// So is reporting on the first slice, by the way -- it will start
out too high, possibly at 100%.
if (endRow_.length==0) return 0;



...if your keys are kinda regular, you might be able to do better in a
slice.  See in Bytes where there are methods that do BigDecimal math;
you can ask them to divide the slice.  Might work.  Then you could report
progress (looks like you are doing some of that later in the file -- does it
work?).
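
For illustration only, a rough sketch of that kind of keyspace math in plain
java (not using the Bytes helpers), assuming keys are reasonably uniformly
distributed; the names are made up:

import java.math.BigInteger;
import java.util.Arrays;

public class SliceProgressSketch {
  // Fraction of the slice [startRow, endRow) already covered by currentRow.
  static float progress(byte[] startRow, byte[] endRow, byte[] currentRow) {
    int width = Math.max(currentRow.length, Math.max(startRow.length, endRow.length));
    BigInteger start = new BigInteger(1, pad(startRow, width));
    BigInteger end = new BigInteger(1, pad(endRow, width));
    BigInteger cur = new BigInteger(1, pad(currentRow, width));
    BigInteger total = end.subtract(start);
    if (total.signum() <= 0) return 0f;               // empty or unbounded slice
    BigInteger done = cur.subtract(start).max(BigInteger.ZERO).min(total);
    // Scale before converting so huge keys don't overflow a float.
    return done.multiply(BigInteger.valueOf(10000)).divide(total).floatValue() / 10000f;
  }

  // Right-pad with zero bytes so keys of different lengths line up.
  private static byte[] pad(byte[] key, int width) {
    return Arrays.copyOf(key, width);
  }
}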

Try to use the same HBaseConfiguration instance (HBaseConfiguration conf =
new HBaseConfiguration();) throughout rather than creating a new one each
time.  Creating a new one each time can be costly.

Whats this?

if (counterHelper_ == null) counterHelper_ = new PigCounterHelper();

A pig counter? You don't want to use hbase counters?

Whats the lzo stuff about?  It seems to be for loading files.  Are you
lzo'ing your hbase content?

Oh man ... base64'ing

There are two files w/ mention of hbase, is that right?

St.Ack



On Mon, May 3, 2010 at 12:23 PM, Dmitriy Ryaboy dmit...@twitter.com wrote:
 Hi folks,
 I recently rewrote the Pig HBase loader to work with binary data, push down
 filters, and do other things that make it more versatile.
 If you use, or plan to use, both Pig and HBase, please try it out, take a
 look at the code, let me know what you think.  I am just starting to learn
 about HBase, so I am especially interested to learn if there are HBase
 capabilities I am not using and should be.

 The code is part of our ElephantBird project, here:

 http://github.com/kevinweil/elephant-bird/
 and more specifically:
 http://github.com/kevinweil/elephant-bird/tree/master/src/java/com/twitter/elephantbird/pig/load/

 Thanks,
 -Dmitriy



Re: EC2 + Thrift inserts

2010-05-01 Thread Stack
? What is the bottleneck 
 there?
  CPU utilization and network packets went up when I disabled the indexes;
  I don't think those are the bottlenecks for the indexes. I was even able to
  add another 15 insert processes (total of 40) and only lost about 10% of
  per-process throughput. I probably could go even higher; none of the nodes
  are above 60% CPU utilization and IO wait was at most 3.5%.

 Each rowkey is unique, so there should not be any blocking on the row
 locks. I'll do more indexed tests tomorrow.

 thanks,
 -chris







 On Apr 29, 2010, at 12:18 AM, Todd Lipcon wrote:

 Definitely smells like JDK 1.6.0_18. Downgrade that back to 16 or 
 17 and
 you
 should be good to go. _18 is a botched release if I ever saw one.

 -Todd

 On Wed, Apr 28, 2010 at 10:54 PM, Chris Tarnas c...@email.com 
 wrote:

 Hi Stack,

  Thanks for looking. I checked the ganglia charts, no server was at more
  than ~20% CPU utilization at any time during the load test and swap was
  never used. Network traffic was light - just running a count through the
  hbase shell generates a much higher use. On the server hosting meta
  specifically, it was at about 15-20% CPU, and IO wait never went above 3%,
  and was usually down at near 0.

  The load also died with a thrift timeout on every single node (each node
  connecting to localhost for its thrift server); it looks like a datanode
  just died and caused every thrift connection to time out - I'll have to up
  that limit to handle a node death.

  Checking logs, this appears in the log of the region server hosting meta;
  it looks like the dead datanode is causing this error:

 2010-04-29 01:01:38,948 WARN org.apache.hadoop.hdfs.DFSClient:
 DFSOutputStream ResponseProcessor exception  for block
 blk_508630839844593817_11180java.io.IOException: Bad response 1 for
 block
 blk_508630839844593817_11180 from datanode 10.195.150.255:50010
      at

 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2423)

  The regionserver log on the dead node, 10.195.150.255, has some more
  errors in it:

 http://pastebin.com/EFH9jz0w

 I found this in the .out file on the datanode:

 # Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode
 linux-amd64 )
 # Problematic frame:
 # V  [libjvm.so+0x62263c]
 #
 # An error report file with more information is saved as:
 # /usr/local/hadoop-0.20.1/hs_err_pid1364.log
 #
 # If you would like to submit a bug report, please visit:
 #   http://java.sun.com/webapps/bugreport/crash.jsp
 #


  There is not a single error in the datanode's log though. Also of note -
  this happened well into the test, so the node dying caused the load to
  abort but not the prior poor performance. Looking through the mailing list
  it looks like java 1.6.0_18 has a bad rep so I'll update the AMI (although
  I'm using the same JVM on other servers in the office w/o issue, with
  decent single node performance and never dying...).

 Thanks for any help!
 -chris




 On Apr 28, 2010, at 10:10 PM, Stack wrote:

 What is load on the server hosting meta like?  Higher than others?



 On Apr 28, 2010, at 8:42 PM, Chris Tarnas c...@email.com wrote:

 Hi JG,

 Speed is now down to 18 rows/sec/table per process.

 Here is a regionserver log that is serving two of the regions:

 http://pastebin.com/Hx5se0hz

 Here is the GC Log from the same server:

 http://pastebin.com/ChrRvxCx

 Here is the master log:

 http://pastebin.com/L1Kn66qU

 The thrift server logs have nothing in them in the same time 
 period.

 Thanks in advance!

 -chris

 On Apr 28, 2010, at 7:32 PM, Jonathan Gray wrote:

 Hey Chris,

  That's a really significant slowdown.  I can't think of anything
  obvious that would cause that in your setup.

  Any chance of some regionserver and master logs from the time it was
  going slow?  Is there any activity in the logs of the regionservers
  hosting the regions of the table being written to?

 JG

 -Original Message-
 From: Christopher Tarnas [mailto:c...@tarnas.org] On Behalf Of 
 Chris
 Tarnas
 Sent: Wednesday, April 28, 2010 6:27 PM
 To: hbase-user@hadoop.apache.org
 Subject: EC2 + Thrift inserts

  Hello all,

  First, thanks to all the HBase developers for producing this, it's a
  great project and I'm glad to be able to use it.

  I'm looking for some help and hints here with insert performance.
  I'm doing some benchmarking, testing how I can scale up using HBase,
  not really looking at raw speed. The testing is happening on EC2,
  using Andrew's scripts (thanks - those were very helpful) to set them up
  and with a slightly customized version of the default AMIs (added my
  application modules). I'm using HBase 20.3 and Hadoop 20.1. I've looked
  at the tips in the Wiki and it looks like Andrew's scripts are already
  set up that way.

  I'm inserting into HBase from a hadoop streaming job that runs perl and
  uses the thrift gateway. I'm also using the Transactional tables so
  that alone could

Re: hbase with multiple interfaces

2010-04-29 Thread Stack
On Thu, Apr 29, 2010 at 4:39 AM, Michael Segel
michael_se...@hotmail.com wrote:
 The problem with Hadoop and of course HBase is that they determine their own 
 IP network based on the machine's actual name, so that even if you have 
 multiple interfaces, the nodes will choose the interface that matches the 
 machine name. (IMHO this is a defect that should be fixed.)


I made HBASE-2502 to cover this issue.
Thanks Michael,
St.Ack


Re: Unique row ID constraint

2010-04-28 Thread Stack
Would the incrementValue [1] work for this?
St.Ack

1. 
http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue%28byte[],%20byte[],%20byte[],%20long%29
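
For illustration, a minimal sketch of that suggestion: hand out ids from a
counter cell instead of lockRow/exists/put.  The table and column names here
are made up, not from this thread:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class UniqueRowIdSketch {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable counters = new HTable(conf, "counters");
    HTable data = new HTable(conf, "mytable");

    // Atomic server-side increment; every caller gets a distinct value,
    // so the returned long can be used as (part of) a unique row key.
    long id = counters.incrementColumnValue(
        Bytes.toBytes("row-ids"), Bytes.toBytes("seq"), Bytes.toBytes("next"), 1L);

    Put put = new Put(Bytes.toBytes(id));
    put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value"));
    data.put(put);
  }
}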

On Wed, Apr 28, 2010 at 7:40 AM, Tatsuya Kawano
tatsuy...@snowcocoa.info wrote:
 Hi,

 I'd like to implement unique row ID constraint (like the primary key
 constraint in RDBMS) in my application framework.

 Here is a code fragment from my current implementation (HBase
 0.20.4rc) written in Scala. It works as expected, but is there any
 better (shorter) way to do this like checkAndPut()?  I'd like to pass
 a single Put object to my function (method) rather than passing rowId,
 family, qualifier and value separately. I can't do this now because I
 have to give the rowLock object when I instantiate the Put.

 ===
 def insert(table: HTable, rowId: Array[Byte], family: Array[Byte],
                               qualifier: Array[Byte], value:
 Array[Byte]): Unit = {

    val get = new Get(rowId)

    val lock = table.lockRow(rowId) // will expire in one minute
    try {
      if (table.exists(get)) {
        throw new DuplicateRowException("Tried to insert a duplicate row: "
                + Bytes.toString(rowId))

      } else {
        val put = new Put(rowId, lock)
        put.add(family, qualifier, value)

        table.put(put)
      }

    } finally {
      table.unlockRow(lock)
    }

 }
 ===

 Thanks,

 --
 河野 達也
 Tatsuya Kawano (Mr.)
 Tokyo, Japan

 twitter: http://twitter.com/tatsuya6502



Re: org.apache.hadoop.hbase.UnknownScannerException: Name: -1

2010-04-28 Thread Stack
Does it work if the transactional code is not in the mix?  That might help
narrow down what is going on here.
St.Ack

On Tue, Apr 27, 2010 at 6:51 AM, Slava Gorelik slava.gore...@gmail.com wrote:
 Hi to All.
 Tried to investigate a bit this problem in the debugger.
 It looks like the failure is in connecting to the region server to open the
 scanner. There seems to be a problem connecting to the region server
 specifically to open a scanner (in all other cases the region server is
 reachable, for example for put/get operations). After 10 tries the scanner ID
 is still -1, and that is what gets passed to the region server (the connection
 succeeds in this case), so the server throws an exception about an unknown
 scanner name.

 I have only one region server, on the same machine where the master is located
 (a single node installation, not a pseudo-distributed one, including
 zookeeper), and I'm also using a transactional table (from contrib).

 Any idea what could be a problem ?

 Best Regards.



 On Thu, Mar 18, 2010 at 5:54 PM, Slava Gorelik slava.gore...@gmail.comwrote:

 Hi.
 I also don't have any solution yet.

 Best Regards.



 On Thu, Mar 18, 2010 at 8:29 AM, Alex Baranov 
 alex.barano...@gmail.comwrote:

 I have a similar problem, but even with standard filter, when I use it on
 the remote client (

 http://old.nabble.com/Adding-filter-to-scan-at-remote-client-causes-UnknownScannerException-td27934345.html
 ).

 Haven't solved yet.

 Alex Baranau

 On Tue, Mar 16, 2010 at 8:12 PM, Slava Gorelik slava.gore...@gmail.com
 wrote:

  Hi Dave.
  Thank You for your reply, but all .out files (master and region server)
 are
  empty from any exception.
 
  Best Regards.
 
  On Tue, Mar 16, 2010 at 7:45 PM, Dave Latham lat...@davelink.net
 wrote:
 
   Is there anything informative in the .out file?  I remember one time I
  had
   an error in a filter's static initializer that caused the class to
 fail
  to
   load, and it manifested as an uncaught NoClassDefFoundError (
   https://issues.apache.org/jira/browse/HBASE-1913 ) showing up there
   instead
   of the .log file.
  
   Dave
  
   On Tue, Mar 16, 2010 at 9:52 AM, Slava Gorelik 
 slava.gore...@gmail.com
   wrote:
  
Hi.
Sure i restarted both sides.
The log has ony one exception that I specified - Name: -1.
Scanner on .META and .ROOT are works fine (I put break points on
 call()
method that
actually calls openScanner() and till my scanner it works fine).
   
Best Regards.
   
On Tue, Mar 16, 2010 at 5:39 PM, Stack st...@duboce.net wrote:
   
 On Tue, Mar 16, 2010 at 1:42 AM, Slava Gorelik 
   slava.gore...@gmail.com
 wrote:
  Hi.
  I added my filters to the HbaseObjectWritable but the problem is
  not
 solved.
 

 And for sure you restarted both sides of the connection and both
  sides
 are same compiled code?

 If so, next up will be seeing whats in the log over on the server.
 Mismatched interfaces are ugly to debug.  The messages that come
 out
 don't tell much about whats actually wrong.  If you remove your
 code,
 all works fine?  So its just the addition of your filters that is
 the
 prob?

 St.Ack

   
  
 






Re: multiple scanners on same table/region

2010-04-27 Thread Stack
Are you closing Scanners?  If not, they are occupying slots until they time out.
St.Ack
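
For illustration, roughly what is meant, with names made up for the example:

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class CloseScannerSketch {
  // One scanner per column family, as in the setup described below; each one
  // holds a server-side lease slot until close() is called (or it times out).
  static void scanFamily(HTable table, byte[] family) throws Exception {
    Scan scan = new Scan();
    scan.addFamily(family);
    ResultScanner scanner = table.getScanner(scan);
    try {
      while (scanner.next() != null) {
        // process the row's cells for this family ...
      }
    } finally {
      scanner.close();  // release the slot even if processing throws
    }
  }
}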

On Thu, Apr 22, 2010 at 8:10 PM, steven zhuang
steven.zhuang.1...@gmail.com wrote:
 hi,
           Sorry to start another thread here.  This mail is actually an answer
  to a previous thread, "multiple scanners on same table will cause problem?
  Scan results change among different tries."
           The mail system kept saying that I am spamming; now it seems that
  it's right! :)

  Here is my reply to people in that thread:

       I don't know if there is a limit on reads to a single row/region in
  HBase, but if there is, I might have exceeded that limit.   :(
       In my case, there are hundreds of rows, with dozens of kilos of cells
  in a row (a 256 MB region may contain 10- rows). For each row, I started a
  thread on each CF, and there are 8 of them, so there might be dozens of
  scanners on the same region.
       And, to Tim, I could not see your attached mail. My test code is
  pasted below; it just iterates over the rows and column families and outputs
  all the cells.

   private void doScan() throws Exception {
     if (null == CopyOfTestTTT234.table) {
       return;
     }
     Scan s = new Scan();
     s.setStartRow("aaa".getBytes());
     s.setStopRow("ccc".getBytes());
     s.setCaching(CopyOfTestTTT234.ROWCACHING);  // it's 1 here.
     ResultScanner scanner = CopyOfTestTTT234.table.getScanner(s);
     while (true) {
       Result row = scanner.next();
       if (null == row) break;
       String rowKey = new String(row.getRow());
       NavigableMap<byte[], NavigableMap<byte[], byte[]>> fm = row.getNoVersionMap();
       while (fm.size() > 0) {
         Entry<byte[], NavigableMap<byte[], byte[]>> ee = fm.pollFirstEntry();
         String fName = new String(ee.getKey());
         NavigableMap<byte[], byte[]> ff = ee.getValue();
         while (ff.size() > 0) {
           Entry<byte[], byte[]> cell = ff.pollFirstEntry();
           String key = new String(cell.getKey());
           String val = new String(cell.getValue());
           System.out.println(Thread.currentThread().hashCode() + "\t"
               + rowKey + "\t" + fName + "\t" + key + "\t" + val);
         }
       }
     }
   }



Re: multiple scanners on same table/region

2010-04-27 Thread Stack
You may be running into HBASE-2481?
St.Ack

On Tue, Apr 27, 2010 at 1:30 AM, steven zhuang
steven.zhuang.1...@gmail.com wrote:
 the first thread can be found at:
 http://permalink.gmane.org/gmane.comp.java.hadoop.hbase.user/10074

            After some digging, it seems that the problem is caused by a long
  pause between two scanner.next() calls.

            In my case the program has to spend a relatively long while to
  process one row; when it calls scanner.next() again, it seems that the
  returned Result will be null even if there should be more rows in the
  table.  The row caching is set to 1.
             I have checked some of the source code; it seems there is some
  mechanism which will call the close() method of the ClientScanner, but I am
  still checking.
            I don't know if there is a certain timeout on
  ClientScanner/ScannerCallable after a row has been successfully returned;
  it seems that such a timeout is causing my problem here.

           Any reply is appreciated.


 On Fri, Apr 23, 2010 at 11:10 AM, steven zhuang 
 steven.zhuang.1...@gmail.com wrote:

 hi,
           sorry I start another thread here.  This mail is actually answer
 to a previous thread multiple scanners on same table will cause problem?
 Scan results change among different tries..
           the mail system kept saying that I am spamming, now it seems that
 it's right! :)

 here is my reply to people in that thread:

       I don't know if there is a limit on reads to a single row/region in
 HBase, but if there is, I might have exceeded that limit.   :(
       in my case, there are hundreds of rows, with dozens of kilos of cells
 in a row(a 256 MB region may contain 10- rows). for each row, I started a
 thread on each CF, there are 8 of them, so there might be dozens of scanners
 on the same region.
       and, to Tim, I could not see your attached mail, my test code is
 pasted below, it just iterate on the rows and column families, output all
 the cells.

    private void doScan() throws Exception {
      if (null == CopyOfTestTTT234.table) {
        return;
      }
      Scan s = new Scan();
      s.setStartRow("aaa".getBytes());
      s.setStopRow("ccc".getBytes());
      s.setCaching(CopyOfTestTTT234.ROWCACHING);  // it's 1 here.
      ResultScanner scanner = CopyOfTestTTT234.table.getScanner(s);
      while (true) {
        Result row = scanner.next();
        if (null == row) break;
        String rowKey = new String(row.getRow());
        NavigableMap<byte[], NavigableMap<byte[], byte[]>> fm = row.getNoVersionMap();
        while (fm.size() > 0) {
          Entry<byte[], NavigableMap<byte[], byte[]>> ee = fm.pollFirstEntry();
          String fName = new String(ee.getKey());
          NavigableMap<byte[], byte[]> ff = ee.getValue();
          while (ff.size() > 0) {
            Entry<byte[], byte[]> cell = ff.pollFirstEntry();
            String key = new String(cell.getKey());
            String val = new String(cell.getValue());
            System.out.println(Thread.currentThread().hashCode() + "\t"
                + rowKey + "\t" + fName + "\t" + key + "\t" + val);
          }
        }
      }
    }




Re: optimizing for random access

2010-04-26 Thread Stack
On Mon, Apr 26, 2010 at 3:36 PM, Geoff Hendrey ghend...@decarta.com wrote:
 My thought with memory mapping was, as you noted, *not* to try to map files 
 that are inside of HDFS but rather to copy as many blocks as possible out of 
 HDFS, onto region server filesystems, and memory map the file on the region 
 server. TB drives are now common. The virtual memory system of the Operating 
 System manages paging in and out of real memory off disk when you use 
 memory mapping. My experience with memory mapped ByteBuffer in Java is that 
 it is very fast and scalable. By fast, I mean I have clocked reads in the 
 microseconds using nanotime. So I was just wondering why you wouldn't at 
 least make a 2nd level cache with memory mapping.
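
For illustration, a bare-bones sketch of the kind of read path being described
(a locally copied file, nothing HDFS-aware); the path and offsets are made up:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapReadSketch {
  public static void main(String[] args) throws Exception {
    RandomAccessFile raf = new RandomAccessFile("/data/local-copy-of-storefile", "r");
    FileChannel channel = raf.getChannel();
    // A single mapping is limited to 2GB; bigger files need several mappings.
    MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0,
        Math.min(channel.size(), Integer.MAX_VALUE));

    byte[] record = new byte[64];   // pretend the record of interest is 64 bytes
    map.position(4096);             // ... and starts at offset 4096
    map.get(record);                // served from the page cache when hot

    channel.close();
    raf.close();
  }
}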



Are memory-mapped files scalable in java?  I'm curious.  It's a while
since I played with them (circa java 1.5) but back then they did not scale.
I was only able to open a few files concurrently before I started
running into interesting issues.   In hbase I'd need to be able to
keep hundreds or even thousands open concurrently.

I've thought about doing something like you propose, Geoff -- keeping
some subset of storefiles locally (we could even write to two places when
compacting, say, local and out to hdfs) -- but it always devolved
quickly into a complicated mess: keeping the local copy up to date with the
remote set, making sure the local copies didn't overflow local storage, and
making sure local files were aged out on compactions and splits.  If you have
a suggestion on how it'd work, I'm all ears.

Thanks,
St.Ack


Slides for HUG10 are up at http://www.meetup.com/hbaseusergroup/calendar/12689490/

2010-04-21 Thread Stack
... for those interested in the talks at Mondays HUG10.  Checkout
Andrew's talk on TrendMicro Architecture, Coprocessors in HBase as
well as John Sichi's talk on Hive+HBase integration and Mahadev on
Zookeeper+HBase.

St.Ack


Re: extremely sluggish hbase

2010-04-20 Thread Stack
On Tue, Apr 20, 2010 at 10:29 AM, Geoff Hendrey ghend...@decarta.com wrote:
 Hbase shell is taking 63 seconds to scan a table with {LIMIT => 1}!

Is MR job running concurrently?

Whats happening on your servers?  High load?

I see
 this error occur frequently in the region server  logs. Any ideas on
 what this might be

 2010-04-20 04:19:41,401 INFO org.apache.hadoop.ipc.HBaseServer: IPC
 Server handler 2 on 60020, call next(-750587486574522252) from
 10.241.6.80:51850: error:
 org.apache.hadoop.hbase.UnknownScannerException: Name:
 -750587486574522252

 I also see this in the regions server logs:

 2010-04-20 04:21:44,559 INFO
 org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
 5849633296569445699 lease expired
 2010-04-20 04:21:44,560 INFO org.apache.hadoop.hdfs.DFSClient: Could not
 obtain block blk_1799401938583830364_69702 from any node:
 java.io.IOException: No live nodes contain current block



So, this is usually because the client took too long between 'next'
invocations on the scanner, or the server is under such load that it holds
on to the 'next' call for so long that by the next time 'next' is
called, the scanner lease has expired.


 However hadoop dfsadmin -report doesn't show any HDFS issues. Looks
 totally healthy. When I do status from HBase shell I get
 hbase(main):008:0 status
 2 servers, 0 dead, 484. average load which also seems healthy to
 me.


Your servers are carrying 500 regions each.

 Any suggestions?


Look at top.  Look for loading.  Are you swapping?  Look in hbase
logs.  Whats it say its doing?  Fat GC pauses?

St.Ack


Re: extremely sluggish hbase

2010-04-20 Thread Stack
If you scan '.META.' table is it slow also?  You could have a case of
hbase-2451?  There is a script in the patch to that issue.  Try it.
See if that helps.
St.Ack

On Tue, Apr 20, 2010 at 12:02 PM, Geoff Hendrey ghend...@decarta.com wrote:
 Answers below, prefixed by geoff:

 -Original Message-
 From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
 Sent: Tuesday, April 20, 2010 11:23 AM
 To: hbase-user@hadoop.apache.org
 Subject: Re: extremely sluggish hbase

 On Tue, Apr 20, 2010 at 10:29 AM, Geoff Hendrey ghend...@decarta.com wrote:
  Hbase shell is taking 63 seconds to scan a table with {LIMIT => 1}!

 Is MR job running concurrently?
 Geoff: no

 Whats happening on your servers?  High load?
 Geoff: no, 99% idle on both servers

 I see
 this error occur frequently in the region server  logs. Any ideas on
 what this might be

 2010-04-20 04:19:41,401 INFO org.apache.hadoop.ipc.HBaseServer: IPC
 Server handler 2 on 60020, call next(-750587486574522252) from
 10.241.6.80:51850: error:
 org.apache.hadoop.hbase.UnknownScannerException: Name:
 -750587486574522252

 I also see this in the regions server logs:

 2010-04-20 04:21:44,559 INFO
 org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
 5849633296569445699 lease expired
 2010-04-20 04:21:44,560 INFO org.apache.hadoop.hdfs.DFSClient: Could
 not obtain block blk_1799401938583830364_69702 from any node:
 java.io.IOException: No live nodes contain current block



 So, this is usually because the client took long between 'next'
 invocations on the scanner or the server is under such load its holding on to 
 the 'next' call for so long that the next time 'next' is called, the scanner 
 lease has expired.


 However hadoop dfsadmin -report doesn't show any HDFS issues. Looks
 totally healthy. When I do status from HBase shell I get
 hbase(main):008:0 status
 2 servers, 0 dead, 484. average load which also seems healthy to
 me.


 Your servers are carrying 500 regions each.
 Geoff: Is this high, moderate, or low for a typical installation?

 Any suggestions?


 Look at top.  Look for loading.  Are you swapping?
 Geoff: I will look into the swapping and see if I can get some numbers.

 Look in hbase logs.  Whats it say its doing?  Fat GC pauses?
 Geoff: I monitor all the logs and I don't see any GC pauses. I am running 64 
 bit java with 8GB of heap. I'll look into GC further and see if I can get 
 some concrete data.

 St.Ack



Re: extremely sluggish hbase

2010-04-20 Thread Stack
Below it says blockcache is true for the 'info' family on .META.

What happens if you scan '-ROOT-'?  Does it still say the info family has
blockcache false?

Best way to get the script is to either update your hbase to 0.20.4 or
just apply said patch.  The script will then be in your bin directory.

St.Ack

On Tue, Apr 20, 2010 at 1:24 PM, Geoff Hendrey ghend...@decarta.com wrote:
 Does look like the .META. BLOCKCACHE is false. What's the best way to get a 
 patch for https://issues.apache.org/jira/browse/HBASE-2451

  hbase(main):001:0> describe '.META.'
  DESCRIPTION: {NAME => '.META.', IS_META => 'true', MEMSTORE_FLUSHSIZE => '16384',
  FAMILIES => [{NAME => 'historian', COMPRESSION => 'NONE', VERSIONS => '2147483647',
  TTL => '604800', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'},
  {NAME => 'info', COMPRESSION => 'NONE', VERSIONS => '10', TTL => '2147483647',
  BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
  ENABLED: true

 -Original Message-
 From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
 Sent: Tuesday, April 20, 2010 12:45 PM
 To: hbase-user@hadoop.apache.org
 Subject: Re: extremely sluggish hbase

 If you scan '.META.' table is it slow also?  You could have a case of 
 hbase-2451?  There is a script in the patch to that issue.  Try it.
 See if that helps.
 St.Ack

 On Tue, Apr 20, 2010 at 12:02 PM, Geoff Hendrey ghend...@decarta.com wrote:
 Answers below, prefixed by geoff:

 -Original Message-
 From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of
 Stack
 Sent: Tuesday, April 20, 2010 11:23 AM
 To: hbase-user@hadoop.apache.org
 Subject: Re: extremely sluggish hbase

 On Tue, Apr 20, 2010 at 10:29 AM, Geoff Hendrey ghend...@decarta.com wrote:
  Hbase shell is taking 63 seconds to scan a table with {LIMIT => 1}!

 Is MR job running concurrently?
 Geoff: no

 Whats happening on your servers?  High load?
 Geoff: no, 99% idle on both servers

 I see
 this error occur frequently in the region server  logs. Any ideas on
 what this might be

 2010-04-20 04:19:41,401 INFO org.apache.hadoop.ipc.HBaseServer: IPC
 Server handler 2 on 60020, call next(-750587486574522252) from
 10.241.6.80:51850: error:
 org.apache.hadoop.hbase.UnknownScannerException: Name:
 -750587486574522252

 I also see this in the regions server logs:

 2010-04-20 04:21:44,559 INFO
 org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
 5849633296569445699 lease expired
 2010-04-20 04:21:44,560 INFO org.apache.hadoop.hdfs.DFSClient: Could
 not obtain block blk_1799401938583830364_69702 from any node:
 java.io.IOException: No live nodes contain current block



 So, this is usually because the client took long between 'next'
 invocations on the scanner or the server is under such load its holding on 
 to the 'next' call for so long that the next time 'next' is called, the 
 scanner lease has expired.


 However hadoop dfsadmin -report doesn't show any HDFS issues. Looks
 totally healthy. When I do status from HBase shell I get
 hbase(main):008:0 status
 2 servers, 0 dead, 484. average load which also seems healthy to
 me.


 Your servers are carrying 500 regions each.
 Geoff: Is this high, moderate, or low for a typical installation?

 Any suggestions?


 Look at top.  Look for loading.  Are you swapping?
 Geoff: I will look into the swapping and see if I can get some numbers.

 Look in hbase logs.  Whats it say its doing?  Fat GC pauses?
 Geoff: I monitor all the logs and I don't see any GC pauses. I am running 64 
 bit java with 8GB of heap. I'll look into GC further and see if I can get 
 some concrete data.

 St.Ack




Re: Zookeeper watcher error: java.lang.NoClassDefFoundError: org/apache/zookeeper/Watcher

2010-04-17 Thread Stack
)
    
 org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:159)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
 root cause

 java.lang.ClassNotFoundException: org.apache.zookeeper.Watcher
    
 org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1387)
    
 org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1233)
    java.lang.ClassLoader.loadClassInternal(ClassLoader.java:398)
    java.lang.ClassLoader.defineClass1(Native Method)
    java.lang.ClassLoader.defineClass(ClassLoader.java:698)
    java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
    
 org.apache.catalina.loader.WebappClassLoader.findClassInternal(WebappClassLoader.java:1847)
    
 org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java:890)
    
 org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1354)
    
 org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1233)
    java.lang.ClassLoader.loadClassInternal(ClassLoader.java:398)
    
 org.apache.hadoop.hbase.client.HConnectionManager.getClientZooKeeperWatcher(HConnectionManager.java:170)
    
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getZooKeeperWrapper(HConnectionManager.java:932)
    
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:948)
    
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:625)
    
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
    
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:675)
    
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:634)
    
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
    
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:675)
    
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:638)
    
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
    org.apache.hadoop.hbase.client.HTable.init(HTable.java:128)
    org.apache.hadoop.hbase.client.HTable.init(HTable.java:106)
    com.nokia.dataos.api.StoreImpl.get(StoreImpl.java:57)
    com.nokia.dataos.api.NodeResource.node(NodeResource.java:25)
    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    java.lang.reflect.Method.invoke(Method.java:597)
    
 org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:124)
    
 org.jboss.resteasy.core.ResourceMethod.invokeOnTarget(ResourceMethod.java:247)
    org.jboss.resteasy.core.ResourceMethod.invoke(ResourceMethod.java:212)
    org.jboss.resteasy.core.ResourceMethod.invoke(ResourceMethod.java:202)
    
 org.jboss.resteasy.core.SynchronousDispatcher.getResponse(SynchronousDispatcher.java:441)
    
 org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:418)
    
 org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:111)
    
 org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:217)
    
 org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:159)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
  Note: The full stack trace of the root cause is available in the Apache 
  Tomcat/6.0.18 logs.




Re: Hackathon agenda

2010-04-17 Thread Stack
On Sat, Apr 17, 2010 at 11:54 AM, Jonathan Gray jg...@facebook.com wrote:
 Agreed that it's good to try to be agenda-less, but in the past we've always 
 taken the first couple hours to do a group discussion around some of the key 
 topics.  Given there's a bunch of fairly major changes/testing going on these 
 days, I think there is a good bit of stuff that would benefit from group 
 discussion.  After that, we can break up into smaller groups or individually 
 to start hacking away.  Or for those not interested in the topics, you can 
 just hack from the start.


The above sounds good, discussion of a few near future concerns.  I
think it important we not let it go on too long.

All of the below look good.  I've added a few comments.  (Anyone else
have suggestions on what should be discussed?)

 More potential topics of discussion I had in mind:

 - Compaction, split, and flush policies/heuristics (HBASE-2453, HBASE-2462, 
 HBASE-2457, HBASE-2375, HBASE-1892, etc...)
 - Define our desired behaviors related to versioning, deletes, and removal of 
 deletes in minor/major compactions. (HBASE-2453, HBASE-2457, HBASE-2243, etc)

The new HBase ACID spec should get at least a passing airing (it's
'done'; all releases post 0.20.4 are required to adhere to it).

 - Brainstorm on doing better distributed scenario testing (HBASE-2414)

We also badly need to work on breaking down current daemons so their
functions can be made more standalone and thus testable; e.g. load
balancer in master, compacting code in regionservers, etc.

 - Brainstorm on performance improvement ideas (top HDFS issues, better use of 
 HFile seeking, blooms, block pre-fetch, etc...)  Would be cool to have a wiki 
 page w/ a list of these things.

A wiki page of items to be discussed?


 - Brainstorm on new functionality / updated road map.  What priorities do the 
 various sponsoring companies have, what are nice to haves but not on anyones 
 schedule yet, etc.  Again, this can seed a new (or updated) wiki page and/or 
 update the currently outdated road map wiki page.

I moved aside the current 0.21 roadmap, moving it to a page of its own
rather than having it as a section on the roadmap page
(http://wiki.apache.org/hadoop/HBase/Original021Roadmap#preview), and
added comments on the state of the roadmap items listed therein.  A
roadmap discussion could start with what is listed in the old roadmap,
I'd suggest.  I'd also suggest that the roadmap discussion not go into deep
depth, because it has a tendency to consume the time available, mostly
because it turns into a blue-skying session; besides, 0.21 is fairly
imminent and there isn't that much we could get into this release anyway.

 - HBase PR.  We could use a new web site (maven and otherwise), a centralized 
 blog, and also a refresh/cleanup of documentation.  There's also agreement on 
 shipping w/ a few different configurations, which should be part of a new set 
 of getting started / new user docs.  Would like to get everyones thoughts and 
 also come up with a schedule.


On monday our petition to become an apache top level project goes
before the apache board.  If it passes, rolling out a site revamp
might be timed to match our move to TLP.


 - Ideas for future HUGs

 For anyone that will not be able to attend the hackathon we will post a 
 wrap-up afterwards with notes about all the discussions we had.  Whatever 
 comes out of the hackathon should be posted into the proper jiras or mailing 
 list for full community discussion.

 Also, if anyone was not able to sign up for the HUG or Hackathon (both are 
 full now) and is a regular contributor, please contact me directly.

 Very awesome.  Gonna be a great day of HBase!

Agreed.

St.Ack


 JG

 
 From: Andrew Purtell [apurt...@apache.org]
 Sent: Saturday, April 17, 2010 10:28 AM
 To: hbase-...@hadoop.apache.org
 Cc: hbase-user@hadoop.apache.org
 Subject: Hackathon agenda

 The Hackathon is basically agenda-less, but I'd like to propose a general 
 topic of discussion we should cover while we are all in the room together:

 - For HBASE-1964 (HBASE-2183, HBASE-2461, and related): injecting and/or 
 mocking exceptions thrown up from DFSClient. I think we want a toolkit for 
 that. Could be incorporated into the unit testing framework. Should be 
 possible to swap out a jar or something and make it active running on a real 
 cluster with real load. Should be possible to inject random exceptions with 
 adjustable probability. So what does HDFS have already? What do we need? If 
 we're adding something, does it make sense to put it into HBase or contribute 
 to HDFS? I think the latter.

 Let's gather a list of other topics, if any, that hackathon participants want 
 to see covered so we can make sure it will happen.

   - Andy







Re: hundreds of reads of a metafile

2010-04-17 Thread Stack
Raghu:

Sounds like you have a case of
https://issues.apache.org/jira/browse/HBASE-2451.  There is a script
in the patch that will turn on .META. caching.  Its likely off.

Yours,
St.Ack

On Tue, Apr 13, 2010 at 12:19 PM, Raghu Angadi rang...@apache.org wrote:
 I think this happens all through the job... I need to double check. This job
 was adding around 300k rows.

 will get more info on latest runs.

 Raghu.

 On Mon, Apr 12, 2010 at 8:33 PM, Stack st...@duboce.net wrote:

 On Mon, Apr 12, 2010 at 5:04 PM, Stack st...@duboce.net wrote:
  On Mon, Apr 12, 2010 at 4:00 PM, Raghu Angadi rang...@apache.org
 wrote:
  Sorry for the delay. This is 3.5K reads per second. This goes on for may
 be
  minutes.
 
 Or is this on startup of a big job against a big table?
 St.Ack




Re: Additional Information about HBase internals?

2010-04-16 Thread Stack
You've seen the wiki at hbase.org?  There is a dated architecture
document there, but beside it is a link to recent docs by our Lars George
that are detailed, with lots of nice pictures.

If the above is insufficient, please let us know what you need.
St.Ack

On Fri, Apr 16, 2010 at 6:36 PM, Renato Marroquín Mogrovejo
renatoj.marroq...@gmail.com wrote:
 Hi everyone, I would like to know if there is any additional information
 about HBase internals besides the Google paper and the code.
 Thanks in advance.

 Renato M.



Re: roadmap update?

2010-04-14 Thread Stack
Hey Thomas:

I think you should do the 0.20.x.  There'll very likely be a 0.20.5
within the timeframe you are talking of.

0.21 hadoop is getting some loving these times.  It could be out by
then but I can't say for sure.

On the roadmap, yes, needs updating.  Next monday at the hackathon
it'll get an updating.

St.Ack

On Wed, Apr 14, 2010 at 6:14 AM, Thomas Koch tho...@koch.ro wrote:
 Hi,

 the current roadmap under http://wiki.apache.org/hadoop/HBase/RoadMaps still
 lists aug/sep 09 for an estimated release date of 0.21
  Could you please be so kind as to update the site? I'm trying to figure out
  whether I should consider 0.20.4 or 0.21 for an HBase deployment in
  May/June.
 I also want to build Debian packages of HBase. Would it make sense to package
 0.21 for Debian squeeze?

 Best regards,

 Thomas Koch, http://www.koch.ro



Re: Region server goes away

2010-04-14 Thread Stack
On Wed, Apr 14, 2010 at 8:27 PM, Geoff Hendrey ghend...@decarta.com wrote:
 Hi,

 I have posted previously about issues I was having with HDFS when I was
 running HBase and HDFS on the same box both pseudoclustered. Now I have
 two very capable servers. I've setup HDFS with a datanode on each box.
 I've setup the namenode on one box, and the zookeeper and HDFS master on
 the other box. Both boxes are region servers. I am using hadoop 20.2 and
 hbase 20.3.

What do you have for replication?  If two datanodes, have you set it to
two rather than the default of 3?
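
For illustration, with two datanodes the replication setting in hdfs-site.xml
would look something like this (a sketch, assuming the standard dfs.replication
property; the value shown is just the two-datanode case):

  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>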



 I have set dfs.datanode.socket.write.timeout to 0 in hbase-site.xml.

This is probably not necessary.


 I am running a mapreduce job with about 200 concurrent reducers, each of
 which writes into HBase, with 32,000 row flush buffers.


Why don't you try with just a few reducers first and then build it up?
 See if that works?


About 40%
 through the completion of my job, HDFS started showing one of the
 datanodes was dead (the one *not* on the same machine as the namenode).


Do you think it was dead -- what did a threaddump say? -- or was it just
that you couldn't get into it?  Any errors in the datanode logs
complaining about the xceiver count, or perhaps you need to up the number
of handlers?



 I stopped HBase, and magically the datanode came back to life.

 Any suggestions on how to increase the robustness?


 I see errors like this in the datanode's log:

 2010-04-14 12:54:58,692 ERROR
 org.apache.hadoop.hdfs.server.datanode.DataNode: D
 atanodeRegistration(10.241.6.80:50010,
 storageID=DS-642079670-10.241.6.80-50010-
 1271178858027, infoPort=50075, ipcPort=50020):DataXceiver
 java.net.SocketTimeoutException: 48 millis timeout while waiting for
 channel


I believe this is harmless.  It's just the DN timing out the socket -- you
set the timeout to 0 in hbase-site.xml rather than in
hdfs-site.xml where it would have an effect.  See HADOOP-3831 for
detail.
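
For illustration, the setting only takes effect when placed in hdfs-site.xml,
along these lines (same property name and value as quoted above, just in the
other file):

  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>0</value>
  </property>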


  to be ready for write. ch : java.nio.channels.SocketChannel[connected
 local=/10
 .241.6.80:50010 remote=/10.241.6.80:48320]
        at
 org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeo
 ut.java:246)
        at
 org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutput
 Stream.java:159)
        at
 org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutput
 Stream.java:198)
        at
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSe
 nder.java:313)
        at
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSen
 der.java:400)
        at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXcei
 ver.java:180)
        at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.ja
 :


 Here I show the output of 'hadoop dfsadmin -report'. First time it is
 invoked, all is well. Second time, one datanode is dead. Third time, the
 dead datanode has come back to life.:

 [had...@dt1 ~]$ hadoop dfsadmin -report
 Configured Capacity: 1277248323584 (1.16 TB)
 Present Capacity: 1208326105528 (1.1 TB)
 DFS Remaining: 1056438108160 (983.88 GB)
 DFS Used: 151887997368 (141.46 GB)
 DFS Used%: 12.57%
 Under replicated blocks: 3479
 Blocks with corrupt replicas: 0
 Missing blocks: 0

 -
 Datanodes available: 2 (2 total, 0 dead)

 Name: 10.241.6.79:50010
 Decommission Status : Normal
 Configured Capacity: 643733970944 (599.52 GB)
 DFS Used: 75694104268 (70.5 GB)
 Non DFS Used: 35150238004 (32.74 GB)
 DFS Remaining: 532889628672(496.29 GB)
 DFS Used%: 11.76%
 DFS Remaining%: 82.78%
 Last contact: Wed Apr 14 11:20:59 PDT 2010



Yeah, my guess as per above is that the reporting client couldn't get
on to the datanode because handlers were full or xceivers exceeded.

Let us know how it goes.
St.Ack


 Name: 10.241.6.80:50010
 Decommission Status : Normal
 Configured Capacity: 633514352640 (590.01 GB)
 DFS Used: 76193893100 (70.96 GB)
 Non DFS Used: 33771980052 (31.45 GB)
 DFS Remaining: 523548479488(487.59 GB)
 DFS Used%: 12.03%
 DFS Remaining%: 82.64%
 Last contact: Wed Apr 14 11:14:37 PDT 2010


 [had...@dt1 ~]$ hadoop dfsadmin -report
 Configured Capacity: 643733970944 (599.52 GB)
 Present Capacity: 609294929920 (567.45 GB)
 DFS Remaining: 532876144640 (496.28 GB)
 DFS Used: 76418785280 (71.17 GB)
 DFS Used%: 12.54%
 Under replicated blocks: 3247
 Blocks with corrupt replicas: 0
 Missing blocks: 0

 -
 Datanodes available: 1 (2 total, 1 dead)

 Name: 10.241.6.79:50010
 Decommission Status : Normal
 Configured Capacity: 643733970944 (599.52 GB)
 DFS Used: 76418785280 (71.17 GB)
 Non DFS Used: 34439041024 (32.07 GB)
 DFS Remaining: 532876144640(496.28 GB)
 DFS Used%: 11.87%
 DFS Remaining%: 82.78%
 Last contact: Wed Apr 14 11:28:38 PDT 2010


 Name: 10.241.6.80:50010
 Decommission Status : Normal
 Configured Capacity: 0 (0 KB)
 DFS Used: 0 (0 KB)
 Non DFS Used: 0 (0 KB)
 DFS Remaining: 0(0 KB)
 DFS Used%: 100%
 DFS 

Re: can't run data-import and map-reduce-job on a Htable simultaneously

2010-04-14 Thread Stack
On Wed, Apr 14, 2010 at 4:05 PM, Sujee Maniyam su...@sujee.net wrote:
 scenario:
 - I am writing data into Hbase
 - I am also kicking off a MR job that READS from the same table

 When the MR job starts, data-inserts pretty much halt, as if the table
 is 'locked out'.

 Is this behavior to be expected?


No.

If you let the MR job run alone, it runs cleanly to the end?

St.Ack


 my pseudo write code :
  HBaseConfiguration hbaseConfig = new HBaseConfiguration();
  HTable table = new HTable(hbaseConfig, "logs");
  table.setAutoFlush(false);
  table.setWriteBufferSize(1024 * 1024 * 12);

 My MR job:
 - reads from log table
 - writes to another table

 can I do these two in parallel?

 thanks
 Sujee

 http://sujee.net



Re: Deadlock when mapping a table?

2010-04-13 Thread Stack
Joost:

These are same as what you pasted on IRC?  The hold-up passed?  Can
you make it happen again?

St.Ack

On Mon, Apr 12, 2010 at 12:18 PM, Joost Ouwerkerk jo...@openplaces.org wrote:
 Thread dump of TaskTracker:
  http://gist.github.com/363898

 Thread dump of RegionServer:
  http://gist.github.com/363899

 Not clear what's going on.  I'm going to have a look at HBASE-2180...

 joost.

 On Sat, Apr 10, 2010 at 10:41 PM, Stack st...@duboce.net wrote:

 On Sat, Apr 10, 2010 at 4:38 PM, Joost Ouwerkerk jo...@openplaces.org
 wrote:
  We're mapping a table with about 2 million rows in 100 regions on 40
 nodes.
  In each map, we're doing a random read on the same table.  We're
  encountering a situation that looks a lot like deadlock.  When the job is
  launched, some of the tasktrackers appear to get blocked in doing the
 first
  random read.  The only trace we get is an eventual Unknown Scanner
 Exception
  in the RegionServer log, at which point the task is actually reported as
  successfully completed by MapReduce (1 row processed).  There is no error
 in
  the task's log.  The job completes as SUCCESSFUL with an incomplete
 number
  of rows.  In the worst case scenario, we've actually seen ALL the
  tasktrackers encounter this problem; the job completes successfully with
 100
  rows processed (1 per region).


 Any chance of a threaddump on the problematic RS at the time?  Can
 you even figure the culprit?  There is a known deadlock that can
 happen writing (HBASE-2322) but this seems like something else.  If
 it's a deadlock, often the JVM can recognize it as such and it'll be detailed
 at the tail of the threaddump.  Todd has been messing too w/ jcarder
 (sp?).  That found HBASE-2322 but that's all it found I believe (I need
 to run it on the next release candidate before it becomes a release
 candidate).  Maybe you're running into very slow reads because you
 don't have HBASE-2180?

 St.Ack



 
  When we remove the code that does the random read in the map, there are
 no
  problems.
 
  Anyone?  This is driving me crazy because I can't reproduce it locally
 (it
  only seems to be a problem in a distributed environment with many nodes)
 and
  because there is no stacktrace besides the scanner exception (which is
  clearly a symptom, not a cause).
 
  j
 




Re: hitting xceiverCount limit (2047)

2010-04-13 Thread Stack
Looks like you'll have to up your xceivers or up the count of hdfs nodes.
St.Ack
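For reference, the hdfs-site.xml entry looks roughly like the following
(4096 is only an example value -- size it to your cluster -- and yes, the
property name really is spelled 'xcievers'); the datanodes need a restart
to pick it up:

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>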

On Tue, Apr 13, 2010 at 11:37 AM, Sujee Maniyam su...@sujee.net wrote:
 Hi all,

 I have been importing a bunch of data into my hbase cluster, and I see
 the following error:

 Hbase error :
 hdfs.DFSClient: Exception in createBlockOutputStream
 java.io.IOException: Bad connect ack with firstBadLink A.B.C.D

 Hadoop data node error:
     DataXceiver : java.io.IOException: xceiverCount 2048 exceeds the
 limit of concurrent xcievers 2047


 I have configured dfs.datanode.max.xcievers = 2047 in
 hadoop/conf/hdfs-site.xml

 Config:
 amazon ec2   c1.xlarge instances (8 CPU, 8G RAM)
 1 master + 4 region servers
 hbase heap size = 3G


 Upping the xcievers count is an option.  I want to make sure whether I
 need to tweak any other parameters to match this.

 thanks
 Sujee
 http://sujee.net



Recommendations (WAS - Re: DFSClient errors during massive HBase load)

2010-04-12 Thread Stack
Thanks for writing back to the list, Oded.  I changed the subject so
it's easier to find your suggestions amongst the mailing list weeds
going forward.  On swappiness, setting it to 0 is extreme, but since
you've supplied links, users can do as you suggest or do something not
so radical.

Good stuff,

St.Ack


On Mon, Apr 12, 2010 at 11:31 AM, Oded Rosen o...@legolas-media.com wrote:
 The tips you guys gave me made a huge difference.
 I also used other tips from the Troubleshooting section in hbase wiki, and
 from all around the web.
 I would like to share my current cluster configuration, as only a few places
 around the web offer a guided tour of these important configuration changes.
 This might be helpful for other people with small clusters, that have
 problems with loading large amounts of data to hbase on a regular basis.
 I am not a very experienced user (yet...) so if I got something wrong, or if
 I am missing anything, please say so. Thanks in advance

 *1. Prevent your regionserver machines from memory swap* - this is a must
 have, it seems, for small hbase clusters that handle large loads:

 *Edit this file (on each regionserver) and then activate the following
 commands.*

 *File:* /etc/sysctl.conf
 *Add Values:*
 vm.swappiness = 0 (this one - on datanodes only!)

 *Then run (In order to apply the changes immediately):*
 sysctl -p /etc/sysctl.conf
 service network restart

 note: this is a kernel property change. Swappiness with a zero value means
 the machine will not use virtual memory at all (or at least that's what I
 understood), so handle with care. A low value (around 5 or 10, out of the
 maximum value of 100) might also work. My configuration is zero.

 (Further explanations:
 http://cloudepr.blogspot.com/2009/09/cluster-facilities-hardware-and.html )

 *2. Increase file descriptor limit* - this is also a must have for almost
 any use of hbase.

 *Edit these two files (on each datanode/namenode) and then activate the
 following commands.*

 *File:* /etc/security/limits.conf
 *Add Values:*

 hadoop soft nofile 32768
 hadoop hard nofile 32768

 *File:* /etc/sysctl.conf
 *Add Values:*

 fs.file-max = 32768

 *Then run:*
 sysctl -p /etc/sysctl.conf
 service network restart
 note: you can perform steps 1+2 together, since they both edit sysctl.conf. Notice
 that step 1 is only for regionservers (datanodes),
 while this one can also be applied to the master (namenode) - although I'm not
 so sure it's necessary.

 (see http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A6)

 *3. Raise HDFS + HBASE connection limit upper bound:*

 Edit hadoop/hbase configuration files to include these entries:
 (you might want to change the specific values, according to your cluster
 properties and usage).

 *File:* hdfs-site.xml
 *Add Properties:*

 name: dfs.datanode.max.xcievers
 value: 2047

 name: dfs.datanode.handler.count
 value: 10 (at least as the number of nodes in the cluster, or more if
 needed).

 (see http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A6)

 *File:* hbase-site.xml
 *Add Properties:*

  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>100</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.maxClientCnxns</name>
    <value>100</value>
  </property>

 (see http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#overview_description)

 If you can remember other changes you've made to increase hbase stability,
 you are welcome to reply.

 Cheers.

 On Thu, Apr 1, 2010 at 11:43 PM, Andrew Purtell apurt...@apache.org wrote:

 First,

  ulimit: 1024

 That's fatal. You need to up file descriptors to something like 32K.

 See http://wiki.apache.org/hadoop/Hbase/Troubleshooting, item #6

 From there, let's see.

    - Andy

  From: Oded Rosen o...@legolas-media.com
  Subject: DFSClient errors during massive HBase load
  To: hbase-user@hadoop.apache.org
  Date: Thursday, April 1, 2010, 1:19 PM
  **Hi all,
 
  I have a problem with a massive HBase loading job.
   It is from raw files to hbase, through some mapreduce processing +
   manipulating (so loading directly to files will not be easy).
 
  After some dozen million successful writes - a few hours of
  load - some of
  the regionservers start to die - one by one - until the
  whole cluster is
  kaput.
  The hbase master sees a znode expired error each time a
  regionserver
  falls. The regionserver errors are attached.
 
  Current configurations:
  Four nodes - one namenode+master, three
  datanodes+regionservers.
  dfs.datanode.max.xcievers: 2047
  ulimit: 1024
  servers: fedora
  hadoop-0.20, hbase-0.20, hdfs (private servers, not on ec2
  or anything).
 
 
  *The specific errors from the regionserver log (from
  IP6, see comment):*
 
  2010-04-01 11:36:00,224 WARN
  org.apache.hadoop.hdfs.DFSClient:
  DFSOutputStream ResponseProcessor exception  

Re: hundreds of reads of a metafile

2010-04-12 Thread Stack
On Mon, Apr 12, 2010 at 4:00 PM, Raghu Angadi rang...@apache.org wrote:
 Sorry for the delay. This is 3.5K reads per second. This goes on for maybe
 minutes.

Against META?  I wonder why META blocks are not going up into cache.

 We have not been running load against this cluster yet. will update soon
 with behavior with the cluster stability.


Good on you Raghu.

 Overall, do you think writes are expected to be gracefully slowed down if
 the RAM is not enough?


It should.  Gates come down to prevent OOMEing, and also if we start to be
overrun with files in the filesystem and compactions are not keeping up.

St.Ack


 Raghu.

 Thanks for digging in Raghu.

 Looking in logs, lots of churn -- other regionservers shedding
 regions, restarting?  -- and write load would probably do better if
 given more RAM.  You keep hitting the ceiling where we put down a gate
 blocking writes until flushes finish.

 What time interval are we talking of regards the 3.5k reads across 20
 blocks?  Were the 20 blocks under ${HBASE_ROOTDIR}/.META. directory?
 This regionserver was carrying the .META. region it looks like so its
 going to be popular.  I'd think the cache should be running
 interference on i/o but maybe its not doing a good job of it.  The
 write load/churn might be blowing the cache.  Yeah, log at DEBUG and
 we'll get a better idea.

 You re doing big upload?

 Cache is a config where you set how much of heap to allocate.  Default
 is 0.2 IIRC.
 St.Ack





  We are increasing memory from 3GB to 6GB. Any pointers about how to set
 size
  of block cache will be helpful.
  will enable DEBUG for LruBlockCache.
 
  Raghu.
 
  On Thu, Apr 8, 2010 at 12:46 AM, Stack st...@duboce.net wrote:
 
  On Thu, Apr 8, 2010 at 12:20 AM, Raghu Angadi rang...@apache.org
 wrote:
   Thanks Stack. will move to 20.3 or 20 trunk very soon. more responses
  inline
   below.
  
  Do.  A 0.20.4 should be around in next week or so which will be better
  still, FYI.
 
   On Wed, Apr 7, 2010 at 8:52 PM, Stack st...@duboce.net wrote:
  
   On Wed, Apr 7, 2010 at 7:49 PM, Raghu Angadi rang...@apache.org
  wrote:
We are working with a small HBase cluster (5 nodes) with fairly
 beefy
   nodes.
While looking at why all the regionservers died at one time,
 noticed
  that
these servers read some files 100s of times a second. This may not
 be
   cause
of the error... but do you think this is odd?
   
   Check end of regionserver log.  Should say why RegionServer went
 away.
    The usual reason is long GC pause, one that is longer than zk
 session
   timeout.
  
  
   This seems to be the case... There were CMS GC failures (promotion
  failed,
   Full GC etc). There were 4-5 pauses of about 4-10 seconds over a
 minute
  or
   so. Is that enough to kill ZK session? We are increasing the memory
 and
  will
   go through tuning tips on wiki.
  
 
  ZK session in your 0.20.1 is probably 40 seconds IIRC but yeah, this
  is common enough until a bit of tuning is done.  If you update to
  0.20.3 at least, the zk session is 60 seconds but you should try and
  avoid the promotion failures if you can.
 
   There are various other errors in the log over couple of hours of RS
 run.
   will post a link to the full log.
  
   --- failure on RS-72 ---
   2010-04-06 22:11:07,668 WARN org.apache.zookeeper.ClientCnxn:
 Exception
   closing session 0x127d58da4e70002 to
 sun.nio.ch.selectionkeyi...@426295eb
   java.io.IOException: TIMED OUT
   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
   2010-04-06 22:11:07,668 WARN org.apache.zookeeper.ClientCnxn:
 Exception
   closing session 0x27d58da6de0088 to
 sun.nio.ch.selectionkeyi...@283f4633
   java.io.IOException: TIMED OUT
   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
   2010-04-06 22:11:07,672 INFO org.apache.hadoop.ipc.HBaseServer: IPC
  Server
   handler 27 on 60020, call put([...@20a192c7,
   [Lorg.apache.hadoop.hbase.client.Put;@4fab578d) from 10.10.0.72:60211
 :
   error: java.io.IOException: Server not running, aborting
   java.io.IOException: Server not running, aborting
   at
  
 
 org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2345)
   
 
 
  These look like the zk session time out sequence.
 
 
  
   --- failure on RS-73 after a few minutes ---
  
   2010-04-06 22:21:41,867 INFO
   org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
   -4957903368956265878 lease expired
   2010-04-06 22:21:47,806 WARN org.apache.zookeeper.ClientCnxn:
 Exception
   closing session 0x127d58da4e7002a to
 sun.nio.ch.selectionkeyi...@15ef1241
   java.io.IOException: TIMED OUT
   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
   2010-04-06 22:21:47,806 ERROR
   org.apache.hadoop.hbase.regionserver.HRegionServer:
   java.lang.OutOfMemoryError: Java heap space
   at java.nio.HeapByteBuffer.init(HeapByteBuffer.java:39)
   at java.nio.ByteBuffer.allocate(ByteBuffer.java:312

Re: hundreds of reads of a metafile

2010-04-12 Thread Stack
On Mon, Apr 12, 2010 at 5:04 PM, Stack st...@duboce.net wrote:
 On Mon, Apr 12, 2010 at 4:00 PM, Raghu Angadi rang...@apache.org wrote:
 Sorry for the delay. This is 3.5K reads per second. This goes on for maybe
 minutes.

Or is this on startup of a big job against a big table?
St.Ack


Re: Couple of HBase related issues...

2010-04-12 Thread Stack
On Mon, Apr 12, 2010 at 2:20 PM, Something Something
mailinglist...@gmail.com wrote:

 2) After 30 minutes or so on in-activity, HBase (or may be Zookeeper) stops
 working.  I have to restart it.  How do I ensure that it always stays up?


What are the symptoms?  Do you have some log on that?  What versions are
we talking here?
Thanks,
St.Ack


Re: org.apache.hadoop.hbase.mapreduce.Export fails with an NPE

2010-04-10 Thread Stack
 is defined in my .bash_profile, so it's already there and
 I
   see
it expanded in the debug statements with the correct path. I even
 tried
hard-coding the $HBASE_HOME path above just in case and I had the
 same
issue.
   
I any case, I'm passed it now. I'll have to check whether the same
  issue
happens on our dev environment running on Ubuntu on EC2. If not, then
  at
least it's localized to my OSX environment.
   
-GS
   
   
On Fri, Apr 9, 2010 at 7:32 PM, Stack st...@duboce.net wrote:
   
Very odd.  I don't have to do that running MR jobs.  I wonder whats
different? (I'm using 0.20.4 near-candidate rather than 0.20.3,
1.6.0u14).  I have a HADOOP_ENV like this.
   
export HBASE_HOME=/home/hadoop/0.20
export HBASE_VERSION=20.4-dev
#export
   
  
 
 HADOOP_CLASSPATH=$HBASE_HOME/conf:$HBASE_HOME/build/hbase-0.20.4-dev.jar:$HBASE_HOME/build/hbase-0.20.4-dev-test.jar:$HBASE_HOME/lib/zookeeper-3.2.2.jar
export
   
  
 
 HADOOP_CLASSPATH=$HBASE_HOME/conf:$HBASE_HOME/build/hbase-0.${HBASE_VERSION}.jar:$HBASE_HOME/build/hbase-0.${HBASE_VERSION}-test.jar:$HBASE_HOME/lib/zookeeper-3.2.2.jar
   
St.Ack
   
On Fri, Apr 9, 2010 at 4:19 PM, George Stathis gstat...@gmail.com
wrote:
 Solved: for those interested, I had to explicitly copy
zookeeper-3.2.2.jar
 to $HADOOP_HOME/lib even though I had added its' path to
$HADOOP_CLASSPATH
 under $HADOOP_HOME/conf/hadoop-env.sh.

 It makes no sense to me why that particular JAR would not get
 picked
   up.
It
 was even listed in the classpath debug output when I ran the job
  using
the
 hadoop shell script. If anyone can enlighten, please do.

 -GS

 On Fri, Apr 9, 2010 at 5:56 PM, George Stathis 
 gstat...@gmail.com
wrote:

 No dice. Classpath is now set. Same error. Meanwhile, I'm running
  $
hadoop
 org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
  just
fine,
 so MapRed is working at least.

 Still looking for suggestions then I guess.

 -GS


 On Fri, Apr 9, 2010 at 5:31 PM, George Stathis 
 gstat...@gmail.com
  
wrote:

 RTFMing

   
  
 
 http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html
 right now...Hadoop classpath not being set properly could be the issue...


 On Fri, Apr 9, 2010 at 5:26 PM, George Stathis 
  gstat...@gmail.com
wrote:

 Hi folks,

 I hope this is just a newbie problem.

 Context:
 - Running 0.20.3 tag locally in pseudo cluster mode
 - $HBASE_HOME is in env and $PATH
 - Running org.apache.hadoop.hbase.mapreduce.Export in the shell
   such
 as: $ hbase org.apache.hadoop.hbase.mapreduce.Export channels
 /bkps/channels/01

 Symptom:
 - Getting an NPE at

   
  
 
 org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:110):

 [...]
 110      this.scanner = this.htable.getScanner(newScan);
 [...]

 Full output is below. Not sure why htable is still null at
 that
point.
 User error?

 Any help is appreciated.

 -GS

 Full output:

 $ hbase org.apache.hadoop.hbase.mapreduce.Export channels
 /bkps/channels/01
 2010-04-09 17:13:57.407::INFO:  Logging to STDERR via
 org.mortbay.log.StdErrLog
 2010-04-09 17:13:57.408::INFO:  verisons=1, starttime=0,
 endtime=9223372036854775807
 10/04/09 17:13:58 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode
 /hbase/root-region-server got 192.168.1.16:52159
 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers:
   Found
 ROOT at 192.168.1.16:52159
 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers:
Cached
 location for .META.,,1 is 192.168.1.16:52159
 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers:
Cached
 location for channels,,1270753106916 is 192.168.1.16:52159
 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers:
   Cache
hit
 for row  in tableName channels: location server
   192.168.1.16:52159
,
 location region name channels,,1270753106916
 10/04/09 17:13:58 DEBUG mapreduce.TableInputFormatBase:
  getSplits:
split
 - 0 - 192.168.1.16:,
 10/04/09 17:13:58 INFO mapred.JobClient: Running job:
 job_201004091642_0009
 10/04/09 17:13:59 INFO mapred.JobClient:  map 0% reduce 0%
 10/04/09 17:14:09 INFO mapred.JobClient: Task Id :
 attempt_201004091642_0009_m_00_0, Status : FAILED
 java.lang.NullPointerException
  at

   
  
 
 org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:110)
 at

   
  
 
 org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.init(TableInputFormatBase.java:119

Re: hundreds of reads of a metafile

2010-04-09 Thread Stack
On Thu, Apr 8, 2010 at 2:26 PM, Raghu Angadi rang...@apache.org wrote:
 Regionserver log on node 72 is at : http://bit.ly/cd9acU  (160K gzipped). To
 give a scale of reads, the local datanode had 3.5K reads spread across about
 20 blocks. Pretty much all the reads were from the same DFSClient ID. will
 watch if this happens again.


Thanks for digging in Raghu.

Looking in logs, lots of churn -- other regionservers shedding
regions, restarting?  -- and write load would probably do better if
given more RAM.  You keep hitting the ceiling where we put down a gate
blocking writes until flushes finish.

What time interval are we talking of regards the 3.5k reads across 20
blocks?  Were the 20 blocks under ${HBASE_ROOTDIR}/.META. directory?
This regionserver was carrying the .META. region it looks like so its
going to be popular.  I'd think the cache should be running
interference on i/o but maybe its not doing a good job of it.  The
write load/churn might be blowing the cache.  Yeah, log at DEBUG and
we'll get a better idea.

You re doing big upload?

Cache is a config where you set how much of heap to allocate.  Default
is 0.2 IIRC.
St.Ack
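For reference, that fraction is set in hbase-site.xml; a sketch, assuming the
0.20-era property name (the value is the fraction of heap, 0.4 here purely as
an example):

  <property>
    <name>hfile.block.cache.size</name>
    <value>0.4</value>
  </property>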





 We are increasing memory from 3GB to 6GB. Any pointers about how to set size
 of block cache will be helpful.
 will enable DEBUG for LruBlockCache.

 Raghu.

 On Thu, Apr 8, 2010 at 12:46 AM, Stack st...@duboce.net wrote:

 On Thu, Apr 8, 2010 at 12:20 AM, Raghu Angadi rang...@apache.org wrote:
  Thanks Stack. will move to 20.3 or 20 trunk very soon. more responses
 inline
  below.
 
 Do.  A 0.20.4 should be around in next week or so which will be better
 still, FYI.

  On Wed, Apr 7, 2010 at 8:52 PM, Stack st...@duboce.net wrote:
 
  On Wed, Apr 7, 2010 at 7:49 PM, Raghu Angadi rang...@apache.org
 wrote:
   We are working with a small HBase cluster (5 nodes) with fairly beefy
  nodes.
   While looking at why all the regionservers died at one time, noticed
 that
   these servers read some files 100s of times a second. This may not be
  cause
   of the error... but do you think this is odd?
  
  Check end of regionserver log.  Should say why RegionServer went away.
   The usual reason is long GC pause, one that is longer than zk session
  timeout.
 
 
  This seems to be the case... There were CMS GC failures (promotion
 failed,
  Full GC etc). There were 4-5 pauses of about 4-10 seconds over a minute
 or
  so. Is that enough to kill ZK session? We are increasing the memory and
 will
  go through tuning tips on wiki.
 

 ZK session in your 0.20.1 is probably 40 seconds IIRC but yeah, this
 is common enough until a bit of tuning is done.  If you update to
 0.20.3 at least, the zk session is 60 seconds but you should try and
 avoid the promotion failures if you can.

  There are various other errors in the log over couple of hours of RS run.
  will post a link to the full log.
 
  --- failure on RS-72 ---
  2010-04-06 22:11:07,668 WARN org.apache.zookeeper.ClientCnxn: Exception
  closing session 0x127d58da4e70002 to sun.nio.ch.selectionkeyi...@426295eb
  java.io.IOException: TIMED OUT
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
  2010-04-06 22:11:07,668 WARN org.apache.zookeeper.ClientCnxn: Exception
  closing session 0x27d58da6de0088 to sun.nio.ch.selectionkeyi...@283f4633
  java.io.IOException: TIMED OUT
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
  2010-04-06 22:11:07,672 INFO org.apache.hadoop.ipc.HBaseServer: IPC
 Server
  handler 27 on 60020, call put([...@20a192c7,
  [Lorg.apache.hadoop.hbase.client.Put;@4fab578d) from 10.10.0.72:60211:
  error: java.io.IOException: Server not running, aborting
  java.io.IOException: Server not running, aborting
  at
 
 org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2345)
  


 These look like the zk session time out sequence.


 
  --- failure on RS-73 after a few minutes ---
 
  2010-04-06 22:21:41,867 INFO
  org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
  -4957903368956265878 lease expired
  2010-04-06 22:21:47,806 WARN org.apache.zookeeper.ClientCnxn: Exception
  closing session 0x127d58da4e7002a to sun.nio.ch.selectionkeyi...@15ef1241
  java.io.IOException: TIMED OUT
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
  2010-04-06 22:21:47,806 ERROR
  org.apache.hadoop.hbase.regionserver.HRegionServer:
  java.lang.OutOfMemoryError: Java heap space
  at java.nio.HeapByteBuffer.init(HeapByteBuffer.java:39)
  at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
  ---
 

 This is zk session timeout and an OOME.  The GC couldn't succeed.

 How much memory you giving these puppies?  CMS is kinda sloppy so you
 need to give it a bit more space to work in.

   [...]
 
   2010-04-06 21:51:43,328 INFO org.apache.hadoop.hdfs.DFSClient: Could
 not
   obtain block blk_-7610953303919156937_1089667 from any node:
    java.io.IOException: No live nodes contain current block
   [...]
   
 
  Are you

Re: org.apache.hadoop.hbase.mapreduce.Export fails with an NPE

2010-04-09 Thread Stack
Very odd.  I don't have to do that running MR jobs.  I wonder whats
different? (I'm using 0.20.4 near-candidate rather than 0.20.3,
1.6.0u14).  I have a HADOOP_ENV like this.

export HBASE_HOME=/home/hadoop/0.20
export HBASE_VERSION=20.4-dev
#export 
HADOOP_CLASSPATH=$HBASE_HOME/conf:$HBASE_HOME/build/hbase-0.20.4-dev.jar:$HBASE_HOME/build/hbase-0.20.4-dev-test.jar:$HBASE_HOME/lib/zookeeper-3.2.2.jar
export 
HADOOP_CLASSPATH=$HBASE_HOME/conf:$HBASE_HOME/build/hbase-0.${HBASE_VERSION}.jar:$HBASE_HOME/build/hbase-0.${HBASE_VERSION}-test.jar:$HBASE_HOME/lib/zookeeper-3.2.2.jar

St.Ack
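For anyone hitting the same NPE, George's fix below boils down to something
like this (paths and version are illustrative; match them to your install):

  cp $HBASE_HOME/lib/zookeeper-3.2.2.jar $HADOOP_HOME/lib/
  # you may need to restart the MapReduce daemons so task JVMs pick up the jar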

On Fri, Apr 9, 2010 at 4:19 PM, George Stathis gstat...@gmail.com wrote:
 Solved: for those interested, I had to explicitly copy zookeeper-3.2.2.jar
 to $HADOOP_HOME/lib even though I had added its path to $HADOOP_CLASSPATH
 under $HADOOP_HOME/conf/hadoop-env.sh.

 It makes no sense to me why that particular JAR would not get picked up. It
 was even listed in the classpath debug output when I ran the job using the
 hadoop shell script. If anyone can enlighten, please do.

 -GS

 On Fri, Apr 9, 2010 at 5:56 PM, George Stathis gstat...@gmail.com wrote:

 No dice. Classpath is now set. Same error. Meanwhile, I'm running $ hadoop
 org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1 just fine,
 so MapRed is working at least.

 Still looking for suggestions then I guess.

 -GS


 On Fri, Apr 9, 2010 at 5:31 PM, George Stathis gstat...@gmail.com wrote:

 RTFMing
 http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html
  right
 now...Hadoop classpath not being set properly could be the issue...


 On Fri, Apr 9, 2010 at 5:26 PM, George Stathis gstat...@gmail.comwrote:

 Hi folks,

 I hope this is just a newbie problem.

 Context:
 - Running 0.20.3 tag locally in pseudo cluster mode
 - $HBASE_HOME is in env and $PATH
 - Running org.apache.hadoop.hbase.mapreduce.Export in the shell such
 as: $ hbase org.apache.hadoop.hbase.mapreduce.Export channels
 /bkps/channels/01

 Symptom:
 - Getting an NPE at
 org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:110):

 [...]
 110      this.scanner = this.htable.getScanner(newScan);
 [...]

 Full output is below. Not sure why htable is still null at that point.
 User error?

 Any help is appreciated.

 -GS

 Full output:

 $ hbase org.apache.hadoop.hbase.mapreduce.Export channels
 /bkps/channels/01
 2010-04-09 17:13:57.407::INFO:  Logging to STDERR via
 org.mortbay.log.StdErrLog
 2010-04-09 17:13:57.408::INFO:  verisons=1, starttime=0,
 endtime=9223372036854775807
 10/04/09 17:13:58 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode
 /hbase/root-region-server got 192.168.1.16:52159
 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers: Found
 ROOT at 192.168.1.16:52159
 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers: Cached
 location for .META.,,1 is 192.168.1.16:52159
 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers: Cached
 location for channels,,1270753106916 is 192.168.1.16:52159
 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers: Cache hit
 for row  in tableName channels: location server 192.168.1.16:52159,
 location region name channels,,1270753106916
 10/04/09 17:13:58 DEBUG mapreduce.TableInputFormatBase: getSplits: split
 - 0 - 192.168.1.16:,
 10/04/09 17:13:58 INFO mapred.JobClient: Running job:
 job_201004091642_0009
 10/04/09 17:13:59 INFO mapred.JobClient:  map 0% reduce 0%
 10/04/09 17:14:09 INFO mapred.JobClient: Task Id :
 attempt_201004091642_0009_m_00_0, Status : FAILED
 java.lang.NullPointerException
  at
 org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:110)
 at
 org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.init(TableInputFormatBase.java:119)
  at
 org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:262)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)

 10/04/09 17:14:15 INFO mapred.JobClient: Task Id :
 attempt_201004091642_0009_m_00_1, Status : FAILED
 java.lang.NullPointerException
 at
 org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:110)
  at
 org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.init(TableInputFormatBase.java:119)
 at
 org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:262)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
  at org.apache.hadoop.mapred.Child.main(Child.java:170)

 10/04/09 17:14:21 INFO mapred.JobClient: Task Id :
 attempt_201004091642_0009_m_00_2, Status : FAILED
 java.lang.NullPointerException
 at
 

Re: exceptions i got in HDFS - append problem?

2010-04-09 Thread Stack
On Fri, Apr 9, 2010 at 3:07 AM, Gokulakannan M gok...@huawei.com wrote:
 Hi,
  I got the following exceptions , when I am using HDFS to write the logs
 coming from Scribe
  1. java.io.IOException: Filesystem closed

  stack trace
  
      
  call to org.apache.hadoop.fs.FSDataOutputStream::write failed!


The above seems to be saying that the filesystem is closed and, as a
consequence, you are not able to write to it.

  2. org.apache.hadoop.ipc.RemoteException:
 org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to
 create
   file xxx-2010-04-01-12-40_0 for DFSClient_1355960219 on client
 10.18.22.55 because current leaseholder is trying to recreate file
   stack trace
  
      
  call to
 org.apache.hadoop.conf.FileSystem::append((Lorg/apache/hadoop/fs/Path;)Lorg/apache/hadoop/fs/FSDataOutputStream;)failed!


Someone holds the lease on the file you are trying to open?

You mention scribe.  Do you have hdfs-200 and friends applied to your cluster?

   I didn't apply the HDFS-265 to my hadoop patch yet.


What hadoop version are you running?  hdfs-265 won't apply to hadoop
0.20.x if that is what you are running.


   Are these exceptions due to the bugs in existing append-feature?? or some
 other reason?

  Should I need to apply the complete append patch or a simple patch will
 solve this.

I haven't looked, but my guess is that the scribe documentation probably
has a description of the patchset required to run on hadoop.

St.Ack


Re: hundreds of reads of a metafile

2010-04-08 Thread Stack
On Thu, Apr 8, 2010 at 12:20 AM, Raghu Angadi rang...@apache.org wrote:
 Thanks Stack. will move to 20.3 or 20 trunk very soon. more responses inline
 below.

Do.  A 0.20.4 should be around in next week or so which will be better
still, FYI.

 On Wed, Apr 7, 2010 at 8:52 PM, Stack st...@duboce.net wrote:

 On Wed, Apr 7, 2010 at 7:49 PM, Raghu Angadi rang...@apache.org wrote:
  We are working with a small HBase cluster (5 nodes) with fairly beefy
 nodes.
  While looking at why all the regionservers died at one time, noticed that
  these servers read some files 100s of times a second. This may not be
 cause
  of the error... but do you think this is odd?
 
 Check end of regionserver log.  Should say why RegionServer went away.
  The usual reason is long GC pause, one that is longer than zk session
 timeout.


 This seems to be the case... There were CMS GC failures (promotion failed,
 Full GC etc). There were 4-5 pauses of about 4-10 seconds over a minute or
 so. Is that enough to kill ZK session? We are increasing the memory and will
 go through tuning tips on wiki.


ZK session in your 0.20.1 is probably 40 seconds IIRC but yeah, this
is common enough until a bit of tuning is done.  If you update to
0.20.3 at least, the zk session is 60 seconds but you should try and
avoid the promotion failures if you can.
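If tuning alone doesn't keep pauses under the limit, the session timeout
itself can be raised in hbase-site.xml -- a sketch, assuming the 0.20-era
property name and a value in milliseconds:

  <property>
    <name>zookeeper.session.timeout</name>
    <value>120000</value>
  </property>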

 There are various other errors in the log over couple of hours of RS run.
 will post a link to the full log.

 --- failure on RS-72 ---
 2010-04-06 22:11:07,668 WARN org.apache.zookeeper.ClientCnxn: Exception
 closing session 0x127d58da4e70002 to sun.nio.ch.selectionkeyi...@426295eb
 java.io.IOException: TIMED OUT
 at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
 2010-04-06 22:11:07,668 WARN org.apache.zookeeper.ClientCnxn: Exception
 closing session 0x27d58da6de0088 to sun.nio.ch.selectionkeyi...@283f4633
 java.io.IOException: TIMED OUT
 at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
 2010-04-06 22:11:07,672 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
 handler 27 on 60020, call put([...@20a192c7,
 [Lorg.apache.hadoop.hbase.client.Put;@4fab578d) from 10.10.0.72:60211:
 error: java.io.IOException: Server not running, aborting
 java.io.IOException: Server not running, aborting
 at
 org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2345)
 


These look like the zk session time out sequence.



 --- failure on RS-73 after a few minutes ---

 2010-04-06 22:21:41,867 INFO
 org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
 -4957903368956265878 lease expired
 2010-04-06 22:21:47,806 WARN org.apache.zookeeper.ClientCnxn: Exception
 closing session 0x127d58da4e7002a to sun.nio.ch.selectionkeyi...@15ef1241
 java.io.IOException: TIMED OUT
 at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
 2010-04-06 22:21:47,806 ERROR
 org.apache.hadoop.hbase.regionserver.HRegionServer:
 java.lang.OutOfMemoryError: Java heap space
 at java.nio.HeapByteBuffer.init(HeapByteBuffer.java:39)
 at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
 ---


This is zk session timeout and an OOME.  The GC couldn't succeed.

How much memory you giving these puppies?  CMS is kinda sloppy so you
need to give it a bit more space to work in.
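As a rough sketch of what giving it more space looks like in practice, heap
and collector flags live in hbase-env.sh on the regionservers; the numbers
below are examples only, not a recommendation:

  # hbase-env.sh
  export HBASE_HEAPSIZE=6000   # MB
  export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"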

  [...]

  2010-04-06 21:51:43,328 INFO org.apache.hadoop.hdfs.DFSClient: Could not
  obtain block blk_-7610953303919156937_1089667 from any node:
   java.io.IOException: No live nodes contain current block
  [...]
  

 Are you accessing from mapreduce?  If so, does your hadoop have hdfs-127?

 Then there are the usual suspects.  Xceivers count -- up it to 2k or
 so -- and ulimit should be much greater than the default 1024.


 yes. Most of the traffic now is puts from reducers. I think HDFS is a recent
 Cloudera release. I will check. Most likely it wont have hdfs-127.


My guess is that it does.. but yeah, check (You should remember that
one -- smile)


 yup.. we hit Xceivers limit very early. The limit is 2k and fd limit is also
 high.

 [...]

  There are thousands of repeated reads of many small files like this.
 
  --- From NN log, this block was created
  for /hbase/.META./1028785192/info/1728561479703335912
  2010-04-06 21:51:20,906 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
  NameSystem.allocateBlock:
 /hbase/.META./1028785192/info/1728561479703335912.
  blk_8972126557191254374_1090962
  
 
  Btw, we had single replication set for this file by mistake.
 

 So, if any error getting a block, there is no recourse.  Was there
 concurrent processes sucking i/o from HDFS running at same time?



 Writing, clients need to figure where to write.  They'll do this by
 doing lookup in .META.  They'll then cache the info.  If clients are
 short-lived, then lots of .META. hits.


 Client here is HBase client (in our case reducers)?


Your reducers run for a while?  So clients have a chance to exploit
their cache of metadata info?


 And as Ryan says, whats the caching stats

Re: Regions assigned multiple times after disabling table

2010-04-08 Thread Stack
On Thu, Apr 8, 2010 at 6:38 AM, Martin Fiala fial...@gmail.com wrote:

 Now, the table is disabled, but the region is online on
 fernet1-v49.ng.seznam.cz!! Is there some race condition?


Enabling/disabling largish tables has always been problematic.  We
intend to fix it properly in the next major hbase, but meantime we're
patching what is currently in place: effectively the sending of a
message across the cluster and then hoping, rather than guaranteeing, that it's
received by all regionservers and that there are no issues during the
enable/disable.

You could try being extra careful.  On disable, ensure all regions are offline
before proceeding.  This probably requires manual inspection of the meta
region (the region gets a status=offline marker when disabled successfully)
for now.  0.20.4, which has improvements that particularly address this
issue, should be out soon also.

Meantime, its probably best to do as little enable/disable as possible.

St.Ack


Re: get the impact hbase brings to HDFS, datanode log exploded after we started HBase.

2010-04-08 Thread Stack
It'll depend on your access patterns, but in general we'll be doing
lots of small accesses... many more.  A recently added clienttrace
log (the client referred to here is the DFSClient) will log
messages like the following:

2010-04-07 22:15:52,078 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/10.20.20.189:50010, dest: /10.20.20.189:56736, bytes: 2022080, op:
HDFS_READ, cliID: DFSClient_-994492608, srvID:
DS-1740361948-10.20.20.189-50010-1270703663528, blockid:
blk_2797215769808904384_1015

Lots of them, one per access.

You could turn them off explicitly in your log4j.  That should help.

Don't run DEBUG level in datanode logs.
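For example, a single line in the datanode's log4j.properties quiets just
that logger (the logger name is taken from the message above; WARN is an
arbitrary choice):

  log4j.logger.org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace=WARN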

Other answers inlined below.

On Thu, Apr 8, 2010 at 2:51 AM, steven zhuang
steven.zhuang.1...@gmail.com wrote:
...
        At present, my idea is calculating the data IO quantity of both HDFS
 and HBase for a given day, and with the result we can have a rough estimate
 of the situation.

Can you use the above noted clienttrace logs to do this?  Are clients
on different hosts -- i.e. the hdfs clients and hbase clients?  If so
that'd make it easy enough.  Otherwise, it'd be a little difficult.
There is probably an easier way, but one (awkward) means of calculating
would be by writing a mapreduce job that took clienttrace messages and
all blocks in the filesystem and then had it sort out the clienttrace
messages that belong to the ${HBASE_ROOTDIR} subdirectory.

        One problem I met now is to decide from the regionserver log the
 quantity of data been read/written by Hbase, should I count the lengths in
 following log records as lengths of data been read/written?:

 org.apache.hadoop.hbase.regionserver.Store: loaded
 /user/ccenterq/hbase/hbt2table2/165204266/queries/1091785486701083780,
 isReference=false,
 sequence id=1526201715, length=*72426373*, majorCompaction=true
 2010-03-04 01:11:54,262 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
 Started memstore flush for region table_word_in_doc, resort
 all-2010/01/01,1267629092479. Current region memstore size *40.5m*

        here I am not sure the *72426373/40.5m is the length (in byte) of
 data read by HBase. *

That's just file size.  Above we opened a storefile and we just logged its size.

We don't log how much we've read/written anywhere in hbase logs.

St.Ack


Re: How to recover or clear the data of a bad region

2010-04-08 Thread Stack
Delete region from filesystem and from .META. table.  Then, to close
the gap in .META. that you've just made, merge the two regions on
either side of the hole (Or just rewrite the upper or lower regioninfo
in .META. so its start/end keys cover over the hole just made).

St.Ack

On Thu, Apr 8, 2010 at 12:54 AM, 无名氏 sitong1...@gmail.com wrote:
 hi

 How to recover or clear the data of a bad region,  but not to affect other
 region, so that continue write/read data to/from hbase.
 thks.



Re: HTable Client RS caching

2010-04-08 Thread Stack
Please update your hbase from 0.20.1.  It's not much fun for those
helping debug issues to discover, after having expended some effort
debugging, that the issue has already been fixed.

UnknownScannerException usually means the client has taken too long to
report back to the regionserver between next invocations or the
regionserver was stuck GC'ing longer than the scanner lease.

St.Ack
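If the client genuinely needs more time between next() calls, the scanner
lease can also be stretched in hbase-site.xml -- a sketch, assuming the
0.20-era property name (value in milliseconds):

  <property>
    <name>hbase.regionserver.lease.period</name>
    <value>120000</value>
  </property>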


On Thu, Apr 8, 2010 at 11:03 AM, Ted Yu yuzhih...@gmail.com wrote:
 Here are snippets from master log w.r.t. region
 domaincrawltable,,1270600690648:
 2010-04-07 00:00:38,504 DEBUG [RegionManager.metaScanner]
 master.BaseScanner(385): GET on domaincrawltable,,1270600690648 got
 different startcode than SCAN: sc=1270602502182, serverAddress=1270597824201
 2010-04-07 00:00:38,541 INFO  [RegionManager.metaScanner]
 master.BaseScanner(224): RegionManager.metaScanner scan of 7 row(s) of meta
 region {server: 10.10.30.82:60020, regionname: .META.,,1, startKey: }
 complete

 2010-04-07 18:19:37,384 DEBUG [HMaster] master.ProcessRegionOpen(98): Adding
 to onlineMetaRegions: {server: 10.10.30.82:60020, regionname: .META.,,1,
 startKey: }
 2010-04-07 18:19:39,417 INFO  [IPC Server handler 11 on 6]
 master.ServerManager(440): Processing MSG_REPORT_PROCESS_OPEN:
 domaincrawltable,,1270600690648 from snvgold.pr.com,60020,1270689385704; 1
 of 2
 2010-04-07 18:19:39,417 INFO  [IPC Server handler 11 on 6]
 master.ServerManager(440): Processing MSG_REPORT_OPEN:
 domaincrawltable,,1270600690648 from snvgold.pr.com,60020,1270689385704; 2
 of 2
 2010-04-07 18:19:39,419 DEBUG [HMaster] master.HMaster(486): Processing
 todo: PendingOpenOperation from snvgold.pr.com,60020,1270689385704
 2010-04-07 18:19:39,419 INFO  [HMaster] master.ProcessRegionOpen(70):
 domaincrawltable,,1270600690648 open on 10.10.30.82:60020
 2010-04-07 18:19:39,423 INFO  [HMaster] master.ProcessRegionOpen(80):
 Updated row domaincrawltable,,1270600690648 in region .META.,,1 with
 startcode=1270689385704, server=10.10.30.82:60020

 We use hbase 0.20.1 on server and client.
 The most peculiar log from one of regionservers is:

 2010-04-08 10:26:38,391 ERROR [IPC Server handler 61 on 60020]
 regionserver.HRegionServer(844):
 org.apache.hadoop.hbase.UnknownScannerException: Name: -1
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1925)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
 org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
        at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

 On Thu, Apr 8, 2010 at 10:40 AM, Jean-Daniel Cryans 
 jdcry...@apache.orgwrote:

 No it's there: domaincrawltable,,1270600690648

 J-D

 On Thu, Apr 8, 2010 at 10:38 AM, Ted Yu yuzhih...@gmail.com wrote:
  What if there is no region information in NSRE ?
 
  2010-04-08 10:26:38,385 ERROR [IPC Server handler 60 on 60020]
  regionserver.HRegionServer(846): Failed openScanner
  org.apache.hadoop.hbase.NotServingRegionException:
  domaincrawltable,,1270600690648
         at
 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2307)
         at
 
 org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1893)
         at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
         at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at
  org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
         at
  org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
 
 
  On Thu, Apr 8, 2010 at 9:39 AM, Jean-Daniel Cryans jdcry...@apache.org
 wrote:
 
  On Wed, Apr 7, 2010 at 11:38 PM, Al Lias al.l...@gmx.de wrote:
   Occasionally my HTable clients get a response that no server is
 serving
   a particular region...
   Normally, the region is back a few seconds later (perhaps a split?).
 
  Or the region moved.
 
  
   Anyway, the client (Using HTablePool) seems to need a restart to
 forget
   this.
 
  Seems wrong, would love a stack trace.
 
  
   Is there a config value to manipulate the caching time of regionserver
   assignments in the client?
 
  Nope, when the client sees a NSRE, it queries .META. to find the new
  location.
 
  
   I set a small value for hbase.client.pause to get failures fast. I am
   using 0.20.3 .
 
  Splits are still kinda slow, takes at least 2 seconds to happen, but
  finding the new location of a region is a core feature in HBase and
  it's rather well tested, Can you pin down your exact problem? Next
  time a NSRE happens, see which region it was looking for and grep the
  master log for it, you should see the history and how much time it
  took to move.
 
  
   Thx,
  
    Al
  
 
 




Re: get the impact hbase brings to HDFS, datanode log exploded after we started HBase.

2010-04-08 Thread Stack
On Thu, Apr 8, 2010 at 7:42 PM, steven zhuang
steven.zhuang.1...@gmail.com wrote:
         In this case, there are lots of access records but rather less data than
 usual Hadoop jobs. Can we say there are usually many more blocks involved in
 an HBase HDFS access than in a Hadoop HDFS access? This cannot be efficient.


For a random access, usually one block only is involved in hbase at
least.  If you first indexed your content in HDFS, it'd be about the
same.

  I know sometimes there are small region store files, but if they are small,
 they would be merged into one by compaction, right?

The aim is storefiles of about the size of a single block. Usually I'd
say we spill over into the second block.

On compaction files are compacted and will tend to grow in size (add
more blocks).


       Is there anyway we lower number of small data access? maybe by
 setting higher rowcaching number, but that should be App dependent. Any
 other options we can use to lower this number?


What are your reads like?  Lots of random reads?  Or are they all
scans?  Do they adhere to any kind of pattern or are they random?

Yes, you could up your cache size too.

What is the problem you are trying to address?  Are you saying all the
i/o is killing your HDFS or something?  Or is it just the big logs
that you are trying to address?

 You could turn them off explicitly in your log4j.  That should help.

 Don't run DEBUG level in datanode logs.


 we are running the cluster at INFO level.


Do you see the clienttrace loggings?  You could explicitly disable
this class's loggings.  That should make a big difference in log size.

St.Ack


 Other answers inlined below.

 On Thu, Apr 8, 2010 at 2:51 AM, steven zhuang
 steven.zhuang.1...@gmail.com wrote:
 ...
         At present, my idea is calculating the data IO quantity of both
 HDFS
  and HBase for a given day, and with the result we can have a rough
 estimate
  of the situation.

 Can you use the above noted clientrace logs to do this?  Are clients
 on different hosts -- i.e. the hdfs clients and hbase clients?  If so
 that'd make it easy enough.  Otherwise, it'd be a little difficult.
 There is probably an easier way but one (awkward) means of calculating
 would be by writing a mapreduce job that took clienttrace messages and
 al blocks in the filesystem and then had it sort the clienttrace
 messages that belong to the ${HBASE_ROOTDIR} subdirectory.

 yeah, the hbase regionserver and datanode are on same host. so I cannot get
 the data read/written by HBase just from the datanode log.
 the Map/Reduce way may have a problem, we can not get the historical block
 info from HDFS file system, I mean there are lots of blocks been garbage
 collected when we import or delete data.

         One problem I met now is to decide from the regionserver log the
  quantity of data been read/written by Hbase, should I count the lengths
 in
  following log records as lengths of data been read/written?:
 
  org.apache.hadoop.hbase.regionserver.Store: loaded
  /user/ccenterq/hbase/hbt2table2/165204266/queries/1091785486701083780,
  isReference=false,
  sequence id=1526201715, length=*72426373*, majorCompaction=true
  2010-03-04 01:11:54,262 DEBUG
 org.apache.hadoop.hbase.regionserver.HRegion:
  Started memstore flush for region table_word_in_doc, resort
  all-2010/01/01,1267629092479. Current region memstore size *40.5m*
 
         here I am not sure the *72426373/40.5m is the length (in byte) of
  data read by HBase. *

 Thats just file size.  Above we opened a storefile and we just logged its
 size.

 We don't log how much we've read/written any where in hbase logs.

 St.Ack




Re: hundreds of reads of a metafile

2010-04-07 Thread Stack
On Wed, Apr 7, 2010 at 7:49 PM, Raghu Angadi rang...@apache.org wrote:
 We are working with a small HBase cluster (5 nodes) with fairly beefy nodes.
 While looking at why all the regionservers died at one time, noticed that
 these servers read some files 100s of times a second. This may not be cause
 of the error... but do you think this is odd?

Check end of regionserver log.  Should say why RegionServer went away.
 The usual reason is long GC pause, one that is longer than zk session
timeout.

 HBase version : 0.20.1. The cluster was handling mainly write traffic.

Can you run a more recent hbase Raghu?  Lots of fixes since 0.20.1.

 Note that in datanode log, there are a lot of reads these files.

 One of RS logs:
  ---
 2010-04-06 21:51:33,923 INFO
 org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:
 campaign,4522\x234\x23201003,1268865840941
 2010-04-06 21:51:34,211 INFO org.apache.hadoop.hbase.regionserver.HRegion:
 region campaign,4522\x234\x23201003,1268865840941/407724784 available;
 sequence id is 1607026498
 2010-04-06 21:51:43,327 INFO org.apache.hadoop.hdfs.DFSClient: Could not
 obtain block blk_8972126557191254374_1090962 from any node:
  java.io.IOException: No live nodes contain current block
 2010-04-06 21:51:43,328 INFO org.apache.hadoop.hdfs.DFSClient: Could not
 obtain block blk_-5586169098563059270_1078171 from any node:
  java.io.IOException: No live nodes contain current block
 2010-04-06 21:51:43,328 INFO org.apache.hadoop.hdfs.DFSClient: Could not
 obtain block blk_-7610953303919156937_1089667 from any node:
  java.io.IOException: No live nodes contain current block
 [...]
 

Are you accessing from mapreduce?  If so, does your hadoop have hdfs-127?

Then there are the usual suspects.  Xceivers count -- up it to 2k or
so -- and ulimit should be much greater than the default 1024.


 portion of grep for one the blocks mentioned above in datanode log :
 
 39725703, offset: 0, srvID: DS-977430382-10.10.0.72-50010-1266601998020,
 blockid: blk_8972126557191254374_1090962, duration: 97000
 2010-04-06 21:51:43,307 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
 10.10.0.72:50010, dest: /10.10.0.72:43699, bytes: 6976, op: HDFS_READ,
 cliID: DFSClient_-1439725703, offset: 0, srvID:
 DS-977430382-10.10.0.72-50010-1266601998020, blockid:
 blk_8972126557191254374_1090962, duration: 76000
 2010-04-06 21:51:43,310 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
 10.10.0.72:50010, dest: /10.10.0.72:45123, bytes: 6976, op: HDFS_READ,
 cliID: DFSClient_-1439725703, offset: 0, srvID:
 DS-977430382-10.10.0.72-50010-1266601998020, blockid:
 blk_8972126557191254374_1090962, duration: 93000
 2010-04-06 21:51:43,314 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
 10.10.0.72:50010, dest: /10.10.0.72:41891, bytes: 6976, op: HDFS_READ,
 cliID: DFSClient_-1439725703, offset: 0, srvID:
 DS-977430382-10.10.0.72-50010-1266601998020, blockid:
 blk_8972126557191254374_1090962, duration: 267000
 2010-04-06 21:51:43,318 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
 10.10.0.72:50010, dest: /10.10.0.72:46412, bytes: 6976, op: HDFS_READ,
 cliID: DFSClient_-1439725703, offset: 0, srvID:
 DS-977430382-10.10.0.72-50010-1266601998020, blockid:
 blk_8972126557191254374_1090962, duration: 91000
 2010-04-06 21:51:46,330 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
 10.10.0.72:50010, dest: /10.10.0.72:40657, bytes: 6976, op: HDFS_READ,
 cliID: DFSClient_-1439725703, offset: 0, srvID:
 DS-977430382-10.10.0.72-50010-1266601998020, blockid:
 blk_8972126557191254374_1090962, duration: 85000
 --

 There are thousands of repeated reads of many small files like this.

 --- From NN log, this block was created
 for /hbase/.META./1028785192/info/1728561479703335912
 2010-04-06 21:51:20,906 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
 NameSystem.allocateBlock: /hbase/.META./1028785192/info/1728561479703335912.
 blk_8972126557191254374_1090962
 

 Btw, we had single replication set for this file by mistake.


So, if there is any error getting a block, there is no recourse.  Were there
concurrent processes sucking i/o from HDFS running at the same time?

Writing, clients need to figure out where to write.  They'll do this by
doing a lookup in .META.  They'll then cache the info.  If clients are
short-lived, then lots of .META. hits.

And as Ryan says, what do the caching stats look like for the .META.
region?  (See which server it was hosted on and check its logs -- we dump
cache metrics every minute or so).

St.Ack
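To make that concrete, here is a minimal sketch of a long-lived writer
(0.20-era client API; the table, family and qualifier names are made up)
that creates its HTable once per reduce task so region locations looked up
in .META. stay cached for the life of the task:

  import java.io.IOException;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Reducer;

  public class PutReducer extends Reducer<Text, Text, NullWritable, NullWritable> {
    private HTable table;

    @Override
    protected void setup(Context context) throws IOException {
      // One HTable per task: region locations fetched from .META. are cached here.
      table = new HTable(new HBaseConfiguration(), "campaign");
      table.setAutoFlush(false);  // buffer puts client-side
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException {
      for (Text value : values) {
        Put put = new Put(Bytes.toBytes(key.toString()));
        put.add(Bytes.toBytes("info"), Bytes.toBytes("v"), Bytes.toBytes(value.toString()));
        table.put(put);
      }
    }

    @Override
    protected void cleanup(Context context) throws IOException {
      table.flushCommits();  // push anything still buffered
    }
  }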


Re: Tuning question...

2010-04-07 Thread Stack
On Wed, Apr 7, 2010 at 3:36 PM, Michael Dalton mwdal...@gmail.com wrote:
 Correct me if I'm wrong on any of this, but is
 escape analysis safe at all to turn on in production given that it's totally
 disabled now in the most recent Java builds?

That sounds right.  Ryan, you are running u16?  It's off in u19 too I believe.
St.Ack


Re: enabling hbase metrics on a running instance

2010-04-06 Thread Stack
You also need to do configuration in hbase/conf/hadoop-metrics.xml (yes,
that's hadoop-metrics, not hbase-metrics), which I believe is only read
on restart.

So double-no.

St.Ack

On Tue, Apr 6, 2010 at 4:18 PM, Jean-Daniel Cryans jdcry...@apache.org wrote:
 This boils down to the question: can you enable JMX while the JVM is
 running? The answer is no (afaik).

 More doc here 
 http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html

 J-D

 On Tue, Apr 6, 2010 at 4:12 PM, Igor Ranitovic irani...@gmail.com wrote:
 Is it possible to enable the hbase metrics without a restart? Thanks.

 i.




Re: Beginner question about querying records

2010-04-04 Thread Stack
2010/4/4 Onur AKTAS onur.ak...@live.com:

 Thank you very much for your answers.. I'm checking the document that you 
 gave.
 In short words, unless massive traffic and massive data size and massive
 scale is needed, stick with regular RDBMSs; then, if we grow to terabytes
 of data to be queried, we can switch to NO-SQL databases.
 Thanks so much.

Well, the above is basically the case for hbase.  We should be better
at smaller scales than we currently are but that is another story.

But generalizing your lesson to all NOSQL is another matter.  The
category is broad, covering myriad database types.

St.Ack



 Date: Sat, 3 Apr 2010 20:56:21 -0700
 Subject: Re: Beginner question about querying records
 From: st...@duboce.net
 To: hbase-user@hadoop.apache.org

 2010/4/3 Onur AKTAS onur.ak...@live.com:
 
  Hi all,
  I'm thinking of to switch from RDBMS to No-SQL database, but having lots 
  of unanswered questions in my mind.
  Please correct me if I'm wrong, is Hbase not suitable for small 
  environments? Like if we have 1 million records with no cluster or maybe 2 
  machines, is it not required?

 It'll work but that's not what it's built for.  You'll be better off
 sticking with your current RDBMS if your dataset is that size, and
 going by the rest of your questions below.

  As far as I know, Hbase does not support querying, but having Pig to 
  perform SQL like queries. It is multi dimensional hashmap distributed 
  across the network to be accessed fast by key. So if we need to query 
  something then we need to index it by ourselves.

 Yes.


  1) If we have a user list, and a potential Give me all people 
  above/beyond age 30 query, then do we need to create an index from the 
  beginning of the first data as:
  above_30_list : value: [ A, B, C ]beyond_30_list :value: [ X, Y, Z ]   ?

 Yes. Or if you can tolerate getting answer offline, run a mapreduce
 against the table.

 Or, if this is the only query you'll be running , think about how you
 could design the primary key so you can answer this question: e.g.
 userid_age.


  2) What if we need just people at age 45. Then do we need to get all 
  above_30 and scan each of them one by one?
  3) If we need so many various queries, then should we create such keys as 
  I wrote above for all potential queries? And entering the data to all that 
  indexes when inserting.

 Effectively yes.

  4) Parallelizing across clusters to share scanning is what HBase or Map 
  Reduce technique does to solve this issue?
  In short words, I'm willing to switch Hbase for my applications, and 
  wondering how can I do all these kind of operations in HBase with better 
  performance than I do in RDBMSs.

 HBase is about scaling.  To achieve scale, the model is changed.
 Moving your RDBMS schema to hbase will take some thought and not all
 will make it across.  For a considered thesis on nosql vs rdbms
 modeling, see http://j.mp/2PjPB.

 St.Ack


  Thanks so much.
 
 
 


Re: Beginner question about querying records

2010-04-03 Thread Stack
2010/4/3 Onur AKTAS onur.ak...@live.com:

 Hi all,
 I'm thinking of to switch from RDBMS to No-SQL database, but having lots of 
 unanswered questions in my mind.
 Please correct me if I'm wrong, is Hbase not suitable for small environments? 
 Like if we have 1 million records with no cluster or maybe 2 machines, is it 
 not required?

It'll work, but that's not what it's built for.  You'll be better off
sticking with your current RDBMS if your dataset is that size, going by
the rest of your questions below.

 As far as I know, Hbase does not support querying, but having Pig to perform 
 SQL like queries. It is multi dimensional hashmap distributed across the 
 network to be accessed fast by key. So if we need to query something then we 
 need to index it by ourselves.

Yes.


 1) If we have a user list, and a potential Give me all people above/beyond 
 age 30 query, then do we need to create an index from the beginning of the 
 first data as:
 above_30_list : value: [ A, B, C ]beyond_30_list :value: [ X, Y, Z ]   ?

Yes.  Or, if you can tolerate getting the answer offline, run a mapreduce
against the table.

Or, if this is the only query you'll be running, think about how you
could design the primary key so you can answer this question: e.g.
userid_age.
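
For illustration, a rough sketch of that key design with the 0.20 client
API.  It is a variation on the userid_age idea above: the zero-padded age
goes at the front of the key so a plain key-range scan answers the age
question.  The table and family names are invented.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class AgeScan {
  public static void main(String[] args) throws Exception {
    HTable users = new HTable(new HBaseConfiguration(), "users");
    // Rows are keyed "AGE_USERID" with the age zero-padded to three digits,
    // e.g. "030_bob", so scanning from "030_" answers "age >= 30".
    // For exactly age 45, scan from "045_" to the stop row "046_" instead.
    Scan scan = new Scan(Bytes.toBytes("030_"));
    ResultScanner scanner = users.getScanner(scan);
    try {
      for (Result r : scanner) {
        System.out.println(Bytes.toString(r.getRow()));
      }
    } finally {
      scanner.close();
    }
  }
}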


 2) What if we need just people at age 45. Then do we need to get all above_30 
 and scan each of them one by one?
 3) If we need so many various queries, then should we create such keys as I 
 wrote above for all potential queries? And entering the data to all that 
 indexes when inserting.

Effectively yes.

 4) Parallelizing across clusters to share scanning is what HBase or Map 
 Reduce technique does to solve this issue?
 In short words, I'm willing to switch Hbase for my applications, and 
 wondering how can I do all these kind of operations in HBase with better 
 performance than I do in RDBMSs.

HBase is about scaling.  To achieve scale, the model is changed.
Moving your RDBMS schema to hbase will take some thought and not all
will make it across.  For a considered thesis on nosql vs rdbms
modeling, see http://j.mp/2PjPB.

St.Ack


 Thanks so much.





Re: More about LogFlusher

2010-04-02 Thread Stack
On Fri, Apr 2, 2010 at 10:59 AM, ChingShen chingshenc...@gmail.com wrote:
 Thanks, Jean-Daniel

   I haven't patched yet, but what intention does it just write a marker
 into the file?


IIRC, it's so that if the file is corrupted, the parse can be picked up
on the other side of the corrupt section as soon as the parser trips over
the next marker.
St.Ack


Re: TableMapper and getSplits

2010-04-02 Thread Stack
Splitting a table on its regions makes most sense when only one table is
involved.  For your case, just override the splitter and make
different split objects (see the sketch just below).
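
For illustration, a rough sketch of such an override against the 0.20
mapreduce API.  The class name is invented and Bytes.split is just one
way of picking a midpoint; it halves each region's range so the job gets
two map tasks per region instead of one.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableSplit;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

public class SubdividedTableInputFormat extends TableInputFormat {
  @Override
  public List<InputSplit> getSplits(JobContext context) throws IOException {
    List<InputSplit> perRegion = super.getSplits(context);  // one per region
    List<InputSplit> finer = new ArrayList<InputSplit>();
    for (InputSplit split : perRegion) {
      TableSplit ts = (TableSplit) split;
      byte[] start = ts.getStartRow();
      byte[] end = ts.getEndRow();
      byte[][] points = null;
      // Leave the open-ended first/last regions (empty start or end key)
      // whole, and fall back if Bytes.split cannot halve the range.
      if (start.length > 0 && end.length > 0) {
        try {
          points = Bytes.split(start, end, 1);  // start, one midpoint, end
        } catch (IllegalArgumentException e) {
          points = null;
        }
      }
      if (points == null) {
        finer.add(ts);
        continue;
      }
      for (int i = 0; i < points.length - 1; i++) {
        finer.add(new TableSplit(ts.getTableName(), points[i], points[i + 1],
            ts.getRegionLocation()));
      }
    }
    return finer;
  }
}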

As to the 'underloaded' HBase when there is one task per region, I'd say try it
first.  If there are many regions on the one regionserver, that could make for a
decent load on the hosting regionserver.

Good luck,
St.Ack

On Fri, Apr 2, 2010 at 12:19 PM, Geoff Hendrey ghend...@decarta.com wrote:
 Hello,

 I have subclassed TableInputFormat and TableMapper. My job needs to read
 from two tables (one row from each) during its map method. the reduce
 method needs to write out to a table. For both the reads and the writes,
 I am using simple Get and Put respectively with autoflush true.

 One problem I see is that the number of map tasks that I get with HBase
 is limited to the number of regions in the table. This seems to make the
 job slower than it would be if I had many more mappers. Could I improve
 the situation by overriding getSplits so that I could have many more
 mappers?

 I saw the following doc'd in TableMapReduceUtil: Ensures that the given
 number of reduce tasks for the given job configuration does not exceed
 the number of regions for the given table.  Is there some reason one
 would want to insure that the number of tasks doesn't exceed the number
 of regions? It just seems to me that having one region serv only a
 single task would result in an underloaded HBase. Thoughts?

 -geoff



Re: come to HUG10!

2010-04-02 Thread Stack
HBaseWorld.  HBaseDisneyLand!
St.Ack

On Fri, Apr 2, 2010 at 2:31 PM, Jonathan Gray jg...@facebook.com wrote:
 Three cheers for Andrew and Trend Micro!  This is very awesome.  HBaseCon?  
 HBase Summit?

 -Original Message-
 From: Andrew Purtell [mailto:apurt...@apache.org]
 Sent: Friday, April 02, 2010 11:39 AM
 To: hbase-user@hadoop.apache.org
 Subject: come to HUG10!

 We are holding an all day event (noon - 9pm(ish)) on Monday April 19th,
 for the 10th HBase Users' Group meetup:

   Hackathon: http://www.meetup.com/hackathon/
   Meetup:    http://www.meetup.com/hbaseusergroup/calendar/12689490/

 There is space available for 30 at the Hackathon (8 RSVP so far) and
 100 at the meetup (67 RSVP so far).

 Consider coming out to one or both functions to meet the HBase
 developers and user community, to network, or just to learn more about
 HBase and Hadoop. Come to the Hackathon if you are a HBase developer or
 power user or committer or have interest in becoming one; bring your
 laptop, coding skills, and creativity.

 We have arranged to host HUG10 at the Sheraton San Jose, in Milpitas.

   Sheraton San Jose
   1801 Barber Lane
   Milpitas, CA 95035
   (408)-943-0600

 This is a little south for many HBasers or would-be HBasers, but in
 exchange for the drive we have arranged:

   - Private salon for hackathon and meetup

   - Classroom for 30 from noon-5PM for the hackathon with catered lunch
 buffet at 1PM

   - Meetup for 100 from 6PM-9PM(ish), catered dinner at 8PM (southwest
 buffet), cash bar, and use of an outdoor terrace

   - Free wireless Internet

   - Special room rate of $135/night

 Best regards,

 Andrew Purtell
 apurt...@apache.org
 andrew_purt...@trendmicro.com








Re: PerformanceEvaluation times

2010-04-01 Thread Stack
sequentialWrite 2 makes a job of two clients (two tasks only) each
doing 1M rows.  Your job only has 2 tasks total, right?  My guess is
you are paying MR overhead (though 10k seconds is excessive --
something else is going on).  You could try sequentialWrite 20 (20
tasks each writing 1M rows).  Also, set your cluster to have 1 map and
1 reduce per slave only.  MR can impinge on the DataNode and RegionServer,
stealing i/o and RAM.  You don't have that much RAM, so start with a small number of
slots.

St.Ack

On Thu, Apr 1, 2010 at 3:07 AM, Michael Dalton mwdal...@gmail.com wrote:
 Hi, I have an issue I've been running into with the Performance Evaluation
 results on my cluster. We have a cluster with 5 slaves, quad-core machines
 with 8GB RAM/2x1TB disk. There are 4 map and 4 reduce slots per slave. The
 MapReduce-related tests seem to be running really slow. For example,
 sequentialWrite 1 (no MapReduce) takes 335 seconds to insert 1 million rows.
  Running PerfEval with sequentialWrite 2 --nomapred takes approximately 750
 seconds.

 However, sequentialWrite 2 with MapReduce enabled takes 9483 seconds to
 finish the 20 Map tasks, over 10x longer than the --nomapred version. I do
 have jvm reuse set to -1, but I don't see why this should drastically
 increase MR latency. Am I missing some tuning or configuration parameter, or
 has anyone seen a drastic drop in performance when executing
 PerformanceEvaluation in MapReduce mode? I understand this is a bit of
 degenerate case for MapReduce, since PerformanceEvaluation generates its row
 values at runtime in memory, but these numbers seems a bit excessive at
 first glance. Thanks,

 Best regards,

 Mike



Re: why hbase using much more space than actual ?

2010-03-31 Thread Stack
Even after major compacting it all?

hbase major_compact TABLENAME

.. then wait a while or just leave the cluster up 24 hours and do
your measurement again.

What did you du?

du /hbase

or

du /hbase/TABLENAME?

The former will include all WAL logs still outstanding.

If size is a concern, run with LZO compression on your column families.
See the wiki page for the how-to.
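
For illustration, a minimal sketch of declaring LZO on a family at
table-creation time with the 0.20 admin API (the table and family names
are made up, and the LZO native libraries still have to be installed per
the wiki how-to):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;

public class CreateLzoTable {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
    HTableDescriptor desc = new HTableDescriptor("TABLENAME");
    HColumnDescriptor family = new HColumnDescriptor("logs");
    // Store files for this family get LZO-compressed on flush/compaction.
    family.setCompressionType(Compression.Algorithm.LZO);
    desc.addFamily(family);
    admin.createTable(desc);
  }
}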

St.Ack


On Tue, Mar 30, 2010 at 10:45 PM, Chen Bangzhong bangzh...@gmail.com wrote:
 Hi, ALL

 I am benchmarking HBase. I found that HBase used much more space than actual
 size. Here is my test environment.

 One NameNode Server
 One JobTracker Server (Secondary NameNode also on this machine)
 One DataNode

 dfs.replication set to 1

 property
    namedfs.replication/name
    value1/value
  /property

 My HBase Cluster includes one Master, one region server and one zookeeper on
 3 servers.

 I used the example code in HBase documentation to fill the test table. From
 hadoop, I found that the space used is about 3 times the actual size.

 for example, I wrote 10k records to the table, each record is about 20k, the
 actual size would be 2G. But from hadoop du command, the size used is more
 than 6G.

 I don't know if this is by design? Or my configuration is wrong.

 thanks



Re: could not be reached after 1 tries

2010-03-31 Thread Stack
2010/3/30  y_823...@tsmc.com:
 I am a veggie ^_^
 That's a slogan; It urges people to eat more veggetables on Monday for
 saving our planet.


I'm down w/ that (smile).
St.Ack


Re: Is NotServingRegionException really an Exception?

2010-03-31 Thread Stack
I always thought that the throwing of an exception to signal a moved
region was broken, if only because it is disturbing to new
users.  See https://issues.apache.org/jira/browse/HBASE-72

Would be nice to change it.  I don't think it's easy though.  We'd need
to rig the RPC so calls were enveloped or some such, so we could pass
status messages along with (or instead of) query results.

St.Ack


On Wed, Mar 31, 2010 at 8:06 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
 On Wed, Mar 31, 2010 at 11:02 AM, Gary Helmling ghelml...@gmail.com wrote:

 Well I would still view it as an exceptional condition.  The client asked
 for data back from a server that does not own that data.  Sending back an
 exception seems like the appropriate response, to me at least.  It's just
 an
 exceptional condition that's allowed to happen in favor of the optimization
 of caching region locations in memory on the client.

 I could see the reporting of the exception being misleading though if it's
 being logged at an error or warn level when it's a normal part of
 operations.  What's the logging level of the messages?


 On Wed, Mar 31, 2010 at 10:51 AM, Al Lias al.l...@gmx.de wrote:

  Am 31.03.2010 16:47, schrieb Gary Helmling:
   NotServingRegionException is a normal part of operations when regions
   transition (ie due to splits).  It's how the region server signals back
  to
   the client that it needs to re-lookup the region location in .META.
  (which
   is normally cached in memory by the client, so can become stale).
  
   I'm sure it can also show up as a symptom of other problems, but if
  you're
   not seeing any other issues, then it's nothing to be concerned about.
  
 
  Thx Gary,
 
         this is my point: I see this many times in the (production) logs
  when
  it is actually nothing to worry about. Should'nt this rather be a normal
  response of a region server, instead an Exception?
 
  Al
 
  
   On Wed, Mar 31, 2010 at 7:38 AM, Al Lias al.l...@gmx.de wrote:
  
   As I do see this Exception really often in our logs. I wonder if this
   indicates a regular thing (within splits etc) or if this is something
   that should not normally happen.
  
   I see it often in Jira as a reason for something else that fails, but
   for a regular client request, where the client not perfectly
 up-to-date
   with region information it looks as something normal. Am I right here?
  
  
   Al
  
  
 
 

 The LDAP api's throw a ReferralException when you try to update a read only
 slave, so there is a precedent for that. But true that an exception may be
 strong for something that is technically a warning.



Re: Using SPARQL against HBase

2010-03-31 Thread Stack
Writes would update your in-memory graph and the backing hbase store?

The in-memory graph would hold all data or just metadata?  You might
look at IHBase in the 'indexed' contrib to see how it loads an index
on region open (it subclasses HBase so it can catch key transitions).

Why HBase and not a native graph database?

Yours,
St.Ack


On Wed, Mar 31, 2010 at 8:27 AM, Basmajian, Raffi
rbasmaj...@oppenheimerfunds.com wrote:
 We are currently researching how to use SPARQL against data in Hbase. I
 understand the use of Get and Scan classes in the Hbase API, but these
 search classes do not return data in the same way SPARQL against RDF
 data returns it. My colleagues and I were discussing that these types of
 search results will require creating an in-memory graph first from
 Hbase, then using SPARQL against that graph. We are not sure how this is
 accomplished. Any advice would help, thank you

 -RNY




Re: could not be reached after 1 tries

2010-03-30 Thread Stack
Was it busy at the time the client was trying to access it?  Do
subsequent accesses to this regionserver work?
St.Ack

2010/3/29  y_823...@tsmc.com:
 Hi,

 One of my region server is still listed on the webpage Region Server, but
 it raised folloing message while running my program.
 10/03/30 13:11:18 INFO ipc.HbaseRPC: Server at /10.81.47.43:60020 could not
 be reached after 1 tries, giving up
 Any suggestion?


 Fleming Chiu(邱宏明)
 707-6128
 y_823...@tsmc.com
 週一無肉日吃素救地球(Meat Free Monday Taiwan)





 From: john smith js1987.sm...@gmail.com
 To: hbase-user@hadoop.apache.org  (bcc: Y_823910/TSMC)
 Subject: Re: Region assignment in Hbase
 Date: 2010/03/30 10:49 AM






 J-D thanks for your reply. I have some doubts which I posted inline .
 Kindly
 help me

 On Tue, Mar 30, 2010 at 2:23 AM, Jean-Daniel Cryans
 jdcry...@apache.orgwrote:

 Inline.

 J-D

 On Mon, Mar 29, 2010 at 11:45 AM, john smith js1987.sm...@gmail.com
 wrote:
  Hi all,
 
  I read the issue HBase-57 (
 https://issues.apache.org/jira/browse/HBASE-57 )
  . I don't really understand the use of assigning regions keeping DFS in
  mind. Can anyone give an example usecase showing its advantages

 A region is composed of files, files are composed of blocks. To read
 data, you need to fetch those blocks. In HDFS you normally have access
 to 3 replicas and you fetch one of them over the network. If one of
 the replica is on the local datanode, you don't need to go through the
 network. This means less network traffic and better response time.


 Is this the scenario that occurs for catering the read requests?  In the
 thread Data distribution in HBase , one of the people mentioned that the
 data hosted by the Region Server may not actually reside on the same
 machine
 . So when asked for data , it fetches from the system containing the data.
 Am I right?  Why is the data hosted by a Region Server doesn't lie on the
 same machine . Doesn't the name name Region Server imply that it holds
 all
 the regions it contains? Is it due to splits or restarting the HBase ?



  Can
  map-reduce exploit it's advantage in any way (if data is distributed in
 the
  above manner)  or is it just the read-write performance that gets
 improved .

 MapReduce works in the exact same way, it always tries to put the
 computation next to where the data is. I recommend reading the
 MapReduce tutorial
 http://hadoop.apache.org/common/docs/r0.20.0
 /mapred_tutorial.html#Overview


 Also the same case Applies here I guess . When a map is run on a Region
 Server, It's data may not actually lie on the same machine . So it fetches
 from the machine containing it. This reduces the data locality !



  Can some one please help me in understanding this.
 
  Regards
  JS
 











Re: could not be reached after 1 tries

2010-03-30 Thread Stack
Well, what do the logs tell about that servers state?
StAck

2010/3/30  y_823...@tsmc.com:
 Was it busy at the time the client was trying to access it? No

 Do subsequent accesses to this regionserver work? No
 Showing that message all the time.

 Thanks


 Fleming Chiu(邱宏明)
 707-6128
 y_823...@tsmc.com
 週一無肉日吃素救地球(Meat Free Monday Taiwan)





 From: saint@gmail.com
 To: hbase-user@hadoop.apache.org  (bcc: Y_823910/TSMC)
 Subject: Re: could not be reached after 1 tries
 Date: 2010/03/30 03:44 PM






 Was it busy at the time the client was trying to access it?  Do
 subsequent accesses to this regionserver work?
 St.Ack

 2010/3/29  y_823...@tsmc.com:
 Hi,

 One of my region server is still listed on the webpage Region Server, but
 it raised folloing message while running my program.
 10/03/30 13:11:18 INFO ipc.HbaseRPC: Server at /10.81.47.43:60020 could
 not
 be reached after 1 tries, giving up
 Any suggestion?


 Fleming Chiu(邱宏明)
 707-6128
 y_823...@tsmc.com
 週一無肉日吃素救地球(Meat Free Monday Taiwan)





 From: john smith js1987.sm...@gmail.com
 To: hbase-user@hadoop.apache.org  (bcc: Y_823910/TSMC)
 Subject: Re: Region assignment in Hbase
 Date: 2010/03/30 10:49 AM






 J-D thanks for your reply. I have some doubts which I posted inline .
 Kindly
 help me

 On Tue, Mar 30, 2010 at 2:23 AM, Jean-Daniel Cryans
 jdcry...@apache.orgwrote:

 Inline.

 J-D

 On Mon, Mar 29, 2010 at 11:45 AM, john smith js1987.sm...@gmail.com
 wrote:
  Hi all,
 
  I read the issue HBase-57 (
 https://issues.apache.org/jira/browse/HBASE-57 )
  . I don't really understand the use of assigning regions keeping DFS
 in
  mind. Can anyone give an example usecase showing its advantages

 A region is composed of files, files are composed of blocks. To read
 data, you need to fetch those blocks. In HDFS you normally have access
 to 3 replicas and you fetch one of them over the network. If one of
 the replica is on the local datanode, you don't need to go through the
 network. This means less network traffic and better response time.


 Is this the scenario that occurs for catering the read requests?  In the
 thread Data distribution in HBase , one of the people mentioned that
 the
 data hosted by the Region Server may not actually reside on the same
 machine
 . So when asked for data , it fetches from the system containing the
 data.
 Am I right?  Why is the data hosted by a Region Server doesn't lie on
 the
 same machine . Doesn't the name name Region Server imply that it holds
 all
 the regions it contains? Is it due to splits or restarting the HBase ?



  Can
  map-reduce exploit it's advantage in any way (if data is distributed
 in
 the
  above manner)  or is it just the read-write performance that gets
 improved .

 MapReduce works in the exact same way, it always tries to put the
 computation next to where the data is. I recommend reading the
 MapReduce tutorial
 http://hadoop.apache.org/common/docs/r0.20.0
 /mapred_tutorial.html#Overview


 Also the same case Applies here I guess . When a map is run on a Region
 Server, It's data may not actually lie on the same machine . So it
 fetches
 from the machine containing it. This reduces the data locality !



  Can some one please help me in understanding this.
 
  Regards
  JS
 















Re: Contrib tableindexed package vs. custom indexes

2010-03-30 Thread Stack
Sorry George for the lack of response.  I think it's probably a bit of 3)
and then 4), which is that you already know the options cleanly, so there
is nothing really to add.

My sense is that when fellas say roll your own indexes, between the
lines what they are saying is that they do not want to do the two
updates transactionally -- that they do not want to pay the ITHBase
tax -- and are OK with losing an index add or two.
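
For illustration, a minimal sketch of that roll-your-own style with the
0.20 client API -- two plain puts, no transaction.  The table, family and
key layout here are all invented.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class HandRolledIndex {
  private final HTable data;
  private final HTable index;

  public HandRolledIndex(HBaseConfiguration conf) throws Exception {
    this.data = new HTable(conf, "items");
    this.index = new HTable(conf, "items_by_owner");
  }

  /** Writes the row, then the index entry; nothing ties the two together. */
  public void put(String itemId, String owner, byte[] payload) throws Exception {
    Put d = new Put(Bytes.toBytes(itemId));
    d.add(Bytes.toBytes("info"), Bytes.toBytes("owner"), Bytes.toBytes(owner));
    d.add(Bytes.toBytes("info"), Bytes.toBytes("payload"), payload);
    data.put(d);

    // Index row key is owner + item id so a prefix scan finds an owner's
    // items.  If the client dies between the two puts this entry is simply
    // missing -- the "lost index add" trade-off described above.
    Put i = new Put(Bytes.toBytes(owner + "_" + itemId));
    i.add(Bytes.toBytes("ref"), Bytes.toBytes("item"), Bytes.toBytes(itemId));
    index.put(i);
  }
}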

The preso at ApacheCon was a callout to THBase and ITHBase recognizing
that it's been a contrib for a good while now and that perhaps it's
graduated beyond our designation of it as 'experimental'.

St.Ack


On Mon, Mar 29, 2010 at 12:51 PM, George Stathis gstat...@gmail.com wrote:
 Hi folks,

 I've seen some people around the list that recommend rolling one's own
 indexes. Others say to just go with
 the org.apache.hadoop.hbase.client.tableindexed package. A quick scan of the
 wiki does not reveal any best practices. Presentations from the devs such as
 the Oakland ApacheCon slides point to the contrib package.

 Some of the comments in the list seem to note that IndexedTable is not very
 performant; then again, I would assume that a custom index would have to
 wrap any table+index operations in a transaction anyway. So unless folks
 forego transactions when rolling their own indexes, I don't see how a custom
 implementation could be that much faster.

 What do the majority of people here do for indexing? Is there a generally
 accepted good middle-of-the-road approach offering an acceptable compromise
 between performance and maintainability? I must admit that rolling our own
 indexes does not seem like a viable long term approach to me (from a
 maintenance POV).

 I'm interested in people's opinion.

 -GS



Re: could not be reached after 1 tries

2010-03-30 Thread Stack
If you are still on HBase version 0.20.2, r834515, you should
update.  There are a few NPE fixes in 0.20.3, and if I look at the code
in the branch around where you are getting the NPE below, it's got lots
of protections against what you are seeing.

 週一無肉日吃素救地球(Meat Free Monday Taiwan)

What is the above about?

Thanks,
St.Ack


2010/3/30  y_823...@tsmc.com:
 I found an error in the log.

 2010-03-30 10:10:32,135 ERROR
 org.apache.hadoop.hbase.regionserver.HRegionServer:
 java.lang.NullPointerException
 2010-03-30 10:10:32,136 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
 handler 45 on 60020, call delete([...@7a914631,
 row=N900W06LS1110_810593676N900W06LS111, ts=9223372036854775807,
 families={}) from 10.81.47.36:41022: error: java.io.IOException:
 java.lang.NullPointerException
 java.io.IOException: java.lang.NullPointerException
  at
 org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:869)
  at
 org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:859)
  at
 org.apache.hadoop.hbase.regionserver.HRegionServer.delete(HRegionServer.java:2028)
  at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
  at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at
 org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
  at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
 Caused by: java.lang.NullPointerException


 Fleming Chiu(邱宏明)
 707-6128
 y_823...@tsmc.com
 週一無肉日吃素救地球(Meat Free Monday Taiwan)





 From: saint@gmail.com
 To: hbase-user@hadoop.apache.org  (bcc: Y_823910/TSMC)
 Subject: Re: could not be reached after 1 tries
 Date: 2010/03/31 12:15 AM






 Well, what do the logs tell about that servers state?
 StAck

 2010/3/30  y_823...@tsmc.com:
 Was it busy at the time the client was trying to access it? No

 Do subsequent accesses to this regionserver work? No
 Showing that message all the time.

 Thanks


 Fleming Chiu(邱宏明)
 707-6128
 y_823...@tsmc.com
 週一無肉日吃素救地球(Meat Free Monday Taiwan)





 From: saint@gmail.com
 To: hbase-user@hadoop.apache.org  (bcc: Y_823910/TSMC)
 Subject: Re: could not be reached after 1 tries
 Date: 2010/03/30 03:44 PM






 Was it busy at the time the client was trying to access it?  Do
 subsequent accesses to this regionserver work?
 St.Ack

 2010/3/29  y_823...@tsmc.com:
 Hi,

 One of my region server is still listed on the webpage Region Server,
 but
 it raised folloing message while running my program.
 10/03/30 13:11:18 INFO ipc.HbaseRPC: Server at /10.81.47.43:60020 could
 not
 be reached after 1 tries, giving up
 Any suggestion?


 Fleming Chiu(邱宏明)
 707-6128
 y_823...@tsmc.com
 週一無肉日吃素救地球(Meat Free Monday Taiwan)





 From: john smith js1987.sm...@gmail.com
 To: hbase-user@hadoop.apache.org  (bcc: Y_823910/TSMC)
 Subject: Re: Region assignment in Hbase
 Date: 2010/03/30 10:49 AM






 J-D thanks for your reply. I have some doubts which I posted inline .
 Kindly
 help me

 On Tue, Mar 30, 2010 at 2:23 AM, Jean-Daniel Cryans
 jdcry...@apache.orgwrote:

 Inline.

 J-D

 On Mon, Mar 29, 2010 at 11:45 AM, john smith js1987.sm...@gmail.com
 wrote:
  Hi all,
 
  I read the issue HBase-57 (
 https://issues.apache.org/jira/browse/HBASE-57 )
  . I don't really understand the use of assigning regions keeping DFS
 in
  mind. Can anyone give an example usecase showing its advantages

 A region is composed of files, files are composed of blocks. To read
 data, you need to fetch those blocks. In HDFS you normally have access
 to 3 replicas and you fetch one of them over the network. If one of
 the replica is on the local datanode, you don't need to go through the
 network. This means less network traffic and better response time.


 Is this the scenario that occurs for catering the read requests?  In the
 thread Data distribution in HBase , one of the people mentioned that
 the
 data hosted by the Region Server may not actually reside on the same
 machine
 . So

Re: Multi ranges Scan

2010-03-25 Thread Stack
Can you use a filter to do this?  If no pattern to the excludes then  
it's tougher. How do you know what to exclude?   It's in a repository  
somewhere?  Add a filter to query this repo?
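
If the wanted rows do come as key ranges, one way to do the in-region
pruning with the stock 0.20 filters (rather than writing a custom filter)
is to OR together a per-range pair of RowFilters.  A rough sketch using
the toy [1,2) and [4,5) ranges from this thread; rows outside the ranges
are dropped on the regionserver, though the scan still walks the whole
[1,5) span.

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiRangeScan {
  public static Scan build() {
    String[][] ranges = { { "1", "2" }, { "4", "5" } };
    List<Filter> perRange = new ArrayList<Filter>();
    for (String[] range : ranges) {
      List<Filter> bounds = new ArrayList<Filter>();
      bounds.add(new RowFilter(CompareOp.GREATER_OR_EQUAL,
          new BinaryComparator(Bytes.toBytes(range[0]))));
      bounds.add(new RowFilter(CompareOp.LESS,
          new BinaryComparator(Bytes.toBytes(range[1]))));
      // Both bounds must pass for a row to be inside this range.
      perRange.add(new FilterList(FilterList.Operator.MUST_PASS_ALL, bounds));
    }
    // A row passes if it falls inside any one of the ranges.
    Scan scan = new Scan(Bytes.toBytes("1"), Bytes.toBytes("5"));
    scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ONE, perRange));
    return scan;
  }
}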




On Mar 25, 2010, at 4:07 PM, Andriy Kolyadenko cryp...@mail.saturnfans.com 
 wrote:


Ok, it would work for regions pruning. And what about actual rows  
pruning inside single region? Do you have any ideas how to implement  
it?


--- Stack wrote: ---

I think you need to make a custom splitter for your mapreduce job, one
that makes splits that align with the ranges you'd have your job run
over.   A permutation on HBASE-2302 might work for you.

St.Ack

On Wed, Mar 17, 2010 at 1:32 PM, Andrey Kolyadenko  
cryp...@mailx.ru wrote:

Hi all,

maybe somebody could give me advice in the following situation:

Currently HBase Scan interface provides ability to set up only  
first and
last rows for MR scanning. Is it any way to get multiple ranges  
into the map

input?

For example let's assume I have following table:
key value
1   v1
2   v2
3   v3
4   v4
5   v5

What I need is to get for example [1,2) and [4,5) ranges as input  
for my Map

task. Actually I need this for the performance optimization.

Any advice?

Thanks.



_
Sign up for your free SaturnFans email account at http://webmail.saturnfans.com/


Re: Data Loss During Bulk Load

2010-03-24 Thread Stack

Is this MapReduce?  If so, do you have HDFS-127 applied to your cluster?



On Mar 24, 2010, at 11:22 AM, Nathan Harkenrider nathan.harkenri...@gmail.com 
 wrote:



Thanks Ryan.

We have this config setting in place and are currently running an  
insert of

40 million rows into an empty pair of tables. The job has inserted 25
million rows so far, and we are not seeing any failed compact/split  
errors
in the log. I'll report back after the import is complete and we've  
verified

integrity of the data.

Regards,

Nathan

On Wed, Mar 24, 2010 at 11:12 AM, Ryan Rawson ryano...@gmail.com  
wrote:



You'll want this one:

property
namedfs.datanode.socket.write.timeout/name
value0/value
/property

A classic standby from just over a year ago.  It should be in the
recommended config - might not be anymore, but I am finding it
necessary now.

On Wed, Mar 24, 2010 at 11:06 AM, Rod Cope rod.c...@openlogic.com  
wrote:

This describes my situation, too.  I never could get rid of the
SocketTimeoutException's, even after dozens of hours of research and
applying every tuning and configuration suggestion I could find.

Rod


On 3/24/10 Wednesday, March 24, 201011:45 AM, Tuan Nguyen
tua...@gmail.com wrote:


Hi Nathan,

We recently run a performance test again hbase 0.20.3 and hadoop  
0.20.2.

We

have a quite similar problem to your.  At the first scan test ,  we

notice
that we loose some data on certain column in certain row and out  
log

have

the errors such Error Recovery for block, Coul Not get the block,
IOException, SocketTimeoutException: 48 millis timeout... And  
the

test
completely fail at the middle. After various tuning the GC,  
caching,
xcievier... We can finish the test without any data loss. Our log  
have

only

SocketTimeoutException: 48 millis timeout error left.

Tuan Nguyen!



--

Rod Cope | CTO and Founder
rod.c...@openlogic.com
Follow me on Twitter @RodCope

720 240 4501|  phone
720 240 4557|  fax
1 888 OpenLogic|  toll free
www.openlogic.com
Follow OpenLogic on Twitter @openlogic










Re: Cannot open filename Exceptions

2010-03-23 Thread Stack
So, for sure ugly stuff is going on.  I filed
https://issues.apache.org/jira/browse/HBASE-2365.  It looks like we're
doubly assigning a region.

Can you confirm that 209 lags behind the master (207) by about 25
seconds?  Are you running NTP on these machines so they sync their
clocks?

With DEBUG enabled have you been able to reproduce?

That said there might be enough in these logs to go on if you can
confirm the above.

Thanks for your patience Zheng.

St.Ack



On Thu, Mar 18, 2010 at 11:43 PM, Zheng Lv lvzheng19800...@gmail.com wrote:
 Hello Stack,
  I must say thank you, for your patience too.
  I'm sorry for that you had tried for many times but the logs you got were
 not that usful. Now I have turn the logging to debug level, so if we get
 these exceptions again, I will send you debug logs. Anyway, I still upload
 the logs you want to rapidshare, although they are not in debug level. The
 urls:

 http://rapidshare.com/files/365292889/hadoop-root-namenode-cactus207.log.2010-03-15.html

 http://rapidshare.com/files/365293127/hbase-root-master-cactus207.log.2010-03-15.html

 http://rapidshare.com/files/365293238/hbase-root-regionserver-cactus208.log.2010-03-15.html

 http://rapidshare.com/files/365293391/hbase-root-regionserver-cactus209.log.2010-03-15.html

 http://rapidshare.com/files/365293488/hbase-root-regionserver-cactus210.log.2010-03-15.html

  For sure you've upped xceivers on your hdfs cluster and you've upped
the file descriptors as per the 'Getting Started'? (Sorry, have to
ask).
  Before I got your mail, we didn't set the properties you mentioned,
 because we didn't got the too many open files or something which are
 mentioned in getting start docs. But now I have upped these properties.
 We'll see what will happen.

  If you need more infomations, just tell me.

  Thanks again,
  LvZheng.


 2010/3/19 Stack st...@duboce.net

 Yeah, I had to retry a couple of times (Too busy; try back later --
 or sign up premium service!).

 It would have been nice to have wider log snippets.  I'd like to have
 seen if the issue was double assignment.  The master log snippet only
 shows the split.  Regionserver 209's log is the one where the
 interesting stuff is going on around this time, 2010-03-15
 16:06:51,150, but its not in the provided set.  Neither are you
 running at DEBUG level so it'd be harder to see what is up even if you
 provided it.

 Looking in 208, I see a few exceptions beyond the one you paste below.
  For sure you've upped xceivers on your hdfs cluster and you've upped
 the file descriptors as per the 'Getting Started'? (Sorry, have to
 ask).

 Can I have more of the logs?  Can I have all of the namenode log, all
 of the master log and 209's log?  This rapidshare thing is fine with
 me.  I don't mind retrying.

 Sorry it took me a while to get to this.
 St.Ack











 On Wed, Mar 17, 2010 at 8:32 PM, Zheng Lv lvzheng19800...@gmail.com
 wrote:
  Hello Stack,
     Sorry. It's taken me a while.  Let try and get to this this evening
     Is it downloading the log files what take you a while? I m sorry, I
 used
  to upload files to skydrive, but now we cant access the website. Is there
  any netdisk or something you can download fast? I can upload to it.
     LvZheng
  2010/3/18 Stack saint@gmail.com
 
  Sorry. It's taken me a while.  Let try and get to this this evening
 
  Thank you for your patience
 
 
 
 
  On Mar 17, 2010, at 2:29 AM, Zheng Lv lvzheng19800...@gmail.com
 wrote:
 
  Hello Stack,
   Did you receive my mail?It looks like you didnt.
    LvZheng
 
  2010/3/16 Zheng Lv lvzheng19800...@gmail.com
 
  Hello Stack,
   I have uploaded some parts of the logs on master, regionserver208 and
  regionserver210 to:
   http://rapidshare.com/files/363988384/master_207_log.txt.html
   http://rapidshare.com/files/363988673/regionserver_208_log.txt.html
   http://rapidshare.com/files/363988819/regionserver_210_log.txt.html
   I noticed that there are some LeaseExpiredException and 2010-03-15
  16:06:32,864 ERROR
  org.apache.hadoop.hbase.regionserver.CompactSplitThread:
  Compaction/Split failed for region ... before 17 oclock. Did these
 lead
  to
  the error? Why did these happened? How to avoid these?
   Thanks.
    LvZheng
  2010/3/16 Stack st...@duboce.net
 
  Maybe just the master log would be sufficient from around this time to
  figure the story.
  St.Ack
 
  On Mon, Mar 15, 2010 at 10:04 PM, Stack st...@duboce.net wrote:
 
  Hey Zheng:
 
  On Mon, Mar 15, 2010 at 8:16 PM, Zheng Lv 
 lvzheng19800...@gmail.com
 
  wrote:
 
  Hello Stack,
  After we got these exceptions, we restart the cluster and restarted
 
  the
 
  job that failed, and the job succeeded.
  Now when we access
 
  /hbase/summary/1491233486/metrics/5046821377427277894,
 
  we got  Cannot access
  /hbase/summary/1491233486/metrics/5046821377427277894: No such file
 or
  directory. .
 
 
  So, that would seem to indicate that the reference was in memory
  only.. that file was not in filesystem.  You could

Re: The occasion to add region server

2010-03-23 Thread Stack
You saw no difference.  Before starting the job, did all regionservers have an
equal number of regions?  Were the regionservers loaded while the job ran?  Try
with 5 regionservers.  If it is the same, for sure something is off.




On Mar 23, 2010, at 6:02 PM, y_823...@tsmc.com wrote:


Hi,
Our cluster with 20 machines(4 core, 12G Ram, 1U server)
HBase Version 0.20.2, r834515

ZK   :3
DataNode :   20
Total Region : 2088
Region Server:   10  --
Client Connection:   20
My job took time : 1264 sec

ZK   :3
DataNode :   20
Total Region : 2088
Region Server:   15  --
Client Connection:   20
My job took time : 1257 sec

ZK   :3
DataNode :   20
Total Region : 2088
Region Server:   20  --
Client Connection:   20
My job took time : 1267 sec

According to the above result, it didn't get the performance enhance  
after

adding region servers.
why?
I wonder when is the best occasion to add extra region server.
In my machine's spec ,maybe a region server will always get a good
performance with regions under 300.
Any ideas, thanks.




Fleming Chiu(邱宏明)
707-6128
y_823...@tsmc.com
週一無肉日吃素救地球(Meat Free Monday Taiwan)








Re: Cannot open filename Exceptions

2010-03-23 Thread Stack
On Tue, Mar 23, 2010 at 8:42 PM, Zheng Lv lvzheng19800...@gmail.com wrote:
 Hello Stack,
  So, for sure ugly stuff is going on.  I filed
  https://issues.apache.org/jira/browse/HBASE-2365.  It looks like we're
  doubly assigning a region.
  Can you tell me how this happened in detail? Thanks a lot.


Yes.

Splits are run by the regionserver.  It figures a region needs to be
split and goes ahead closing the parent and creating the daughter
regions.  It then adds edits to the meta table offlining the parent
and inserting the two new daughter regions.  Next it sends a message
to the master telling it that a region has been split.   The message
contains names of the daughter regions.  On receipt of the message,
the master adds the new daughter regions to the unassigned regions
list so they'll be passed out the next time a regionserver checks in.

Concurrently, the master is running a scan of the meta table every
minute making sure all is in order.  One thing it does is, if it finds
unassigned regions, add them to the unassigned regions list (this
process is what gets all regions assigned after a startup).

In your case, what's happening is that there is a long period between
the add of the new split regions to the meta table and the report of
split to the master.  During this time, the master meta scan ran,
found one of the daughters and went and assigned it.  Then the split
message came in and the daughter was assigned again!

There was supposed to be protection against this happening IIRC.
Looking at the responsible code, we are trying to defend against this
happening in ServerManager:

  /*
   * Assign new daughter-of-a-split UNLESS its already been assigned.
   * It could have been assigned already in rare case where there was a large
   * gap between insertion of the daughter region into .META. by the
   * splitting regionserver and receipt of the split message in master (See
   * HBASE-1784).
   * @param hri Region to assign.
   */
  private void assignSplitDaughter(final HRegionInfo hri) {
    MetaRegion mr = this.master.regionManager.getFirstMetaRegionForRegion(hri);
    Get g = new Get(hri.getRegionName());
    g.addFamily(HConstants.CATALOG_FAMILY);
    try {
      HRegionInterface server =
        master.connection.getHRegionConnection(mr.getServer());
      Result r = server.get(mr.getRegionName(), g);
      // If size > 3 -- presume regioninfo, startcode and server -- then presume
      // that this daughter already assigned and return.
      if (r.size() >= 3) return;
    } catch (IOException e) {
      LOG.warn("Failed get on " + HConstants.CATALOG_FAMILY_STR +
        "; possible double-assignment?", e);
    }
    this.master.regionManager.setUnassigned(hri, false);
  }

So, the above is not working in your case for some reason.   I'll take
a look but I'm not sure I can figure it without DEBUG (thanks for letting
me know about the out-of-sync clocks... Now I can have more faith in
what the logs are telling me).


  With DEBUG enabled have you been able to reproduce?
  These days the exception did not appera again, if it would, I'll show you
 the logs.


For sure, if you come across it again, I'm interested.

Thanks Zheng,
St.Ack


Re: Data Loss During Bulk Load

2010-03-22 Thread Stack
For sure each record in the input data is being uploaded with a unique
key?  For example, if records share a rowid and column and you are asking
the regionserver to supply the timestamp, the two cells will both end up
with the same row/family/qualifier/timestamp key.  When you do your count,
we'll only see the last instance added.
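
For illustration, a minimal sketch of the collision being described (0.20
client API, names invented):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SameCellTwice {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "mytable");
    byte[] row = Bytes.toBytes("row-1");
    byte[] fam = Bytes.toBytes("info");
    byte[] qual = Bytes.toBytes("qual");

    // Two "records" that share a row key: rowcounter sees a single row no
    // matter what, and since the regionserver supplies the timestamp, puts
    // landing in the same millisecond collapse into one cell version too.
    Put a = new Put(row);
    a.add(fam, qual, Bytes.toBytes("first"));
    Put b = new Put(row);
    b.add(fam, qual, Bytes.toBytes("second"));
    table.put(a);
    table.put(b);

    // If each input record is meant to be its own row, the row key must be
    // unique per record; supplying explicit timestamps only preserves the
    // values as versions of the one cell.
    Put c = new Put(row);
    c.add(fam, qual, 1L, Bytes.toBytes("first"));
    Put d = new Put(row);
    d.add(fam, qual, 2L, Bytes.toBytes("second"));
    table.put(c);
    table.put(d);
  }
}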

St.Ack

On Mon, Mar 22, 2010 at 8:15 AM, Nathan Harkenrider
nathan.harkenri...@gmail.com wrote:
 Thanks Ryan.

 We currently have the xceiver count set to 16k (not sure if this is too
 high) and the fh max is 32k, and are still seeing the data loss issue.

 I'll dig through the datanode logs for errors and report back.

 Regards,

 Nathan

 On Sun, Mar 21, 2010 at 7:11 PM, Ryan Rawson ryano...@gmail.com wrote:

 Maybe you are having HDFS capacity issues?  Check your datanode logs
 for any exceptions.  While you are at it, double check the xceiver
 count is set high (2048 is a good value) and the ulimit -n (fh max) is
 also reasonably high - 32k should do it.

 I recently ran an import of 36 hours and perfectly imported 24 billion
 rows into 2 tables and the row counts between the tables lined up
 exactly.

 PS: one other thing, in your close() method of your map reduce, you
 call HTable#flushCommits() right? right?

 On Sun, Mar 21, 2010 at 3:50 PM, Nathan Harkenrider
 nathan.harkenri...@gmail.com wrote:
  Hi All,
 
  I'm currently running into data loss issues when bulk loading data into
  HBase. I'm loading data via a Map/Reduce job that is parsing XML and
  inserting rows into 2 HBase tables. The job is currently configured to
 run
  30 mappers concurrently (3 per node) and is inserting at a rate of
  approximately 6000 rows/sec. The Map/Reduce job appears to run correctly,
  however, when I run the HBase rowcounter job on the tables afterwards the
  row count is less than expected. The data loss is small percentage wise
  (~200,000 rows out of 80,000,000) but concerning nevertheless.
 
  I managed to locate the following errors in the regionserver logs related
 to
  failed compactions and/or splits.
  http://pastebin.com/5WjDpS9F
 
  I'm running HBase 0.20.3 and Cloudera CDH2, on CentOS 5.4. The cluster is
  comprised of 11 machines, 1 master and 10 region servers. Each machine is
 8
  cores, 8GB ram. A
 
  Any advice is appreciated. Thanks,
 
  Nathan Harkenrider
  nathan.harkenri...@gmail.com
 




Re: Data Loss During Bulk Load

2010-03-22 Thread Stack
On Mon, Mar 22, 2010 at 1:37 PM, Stack st...@duboce.net wrote:
 ...For example, if same rowid and column and you are asking the
 regionserver to supply the timestamp, if you add two cells with same
 row+column coordinates, they'll both end up with the same
 row/family/qualifier/timestamp key.

Sorry, I left the 'in the same millisecond' clause out of the above sentence.

St.Ack


Re: Data Loss During Bulk Load

2010-03-22 Thread Stack
On Sun, Mar 21, 2010 at 3:50 PM, Nathan Harkenrider
nathan.harkenri...@gmail.com wrote:
 I managed to locate the following errors in the regionserver logs related to
 failed compactions and/or splits.
 http://pastebin.com/5WjDpS9F

Is there anything else earlier in the logs about why the failure happened?
You might try running one MR task per node rather than 3.  You only
have 8G of RAM, so three concurrent children are taking resources from
the running datanodes and regionservers.
St.Ack


Re: Cannot open filename Exceptions

2010-03-18 Thread Stack
Yeah, I had to retry a couple of times (Too busy; try back later --
or sign up premium service!).

It would have been nice to have wider log snippets.  I'd like to have
seen if the issue was double assignment.  The master log snippet only
shows the split.  Regionserver 209's log is the one where the
interesting stuff is going on around this time, 2010-03-15
16:06:51,150, but its not in the provided set.  Neither are you
running at DEBUG level so it'd be harder to see what is up even if you
provided it.

Looking in 208, I see a few exceptions beyond the one you paste below.
 For sure you've upped xceivers on your hdfs cluster and you've upped
the file descriptors as per the 'Getting Started'? (Sorry, have to
ask).

Can I have more of the logs?  Can I have all of the namenode log, all
of the master log and 209's log?  This rapidshare thing is fine with
me.  I don't mind retrying.

Sorry it took me a while to get to this.
St.Ack











On Wed, Mar 17, 2010 at 8:32 PM, Zheng Lv lvzheng19800...@gmail.com wrote:
 Hello Stack,
    Sorry. It's taken me a while.  Let try and get to this this evening
    Is it downloading the log files what take you a while? I m sorry, I used
 to upload files to skydrive, but now we cant access the website. Is there
 any netdisk or something you can download fast? I can upload to it.
    LvZheng
 2010/3/18 Stack saint@gmail.com

 Sorry. It's taken me a while.  Let try and get to this this evening

 Thank you for your patience




 On Mar 17, 2010, at 2:29 AM, Zheng Lv lvzheng19800...@gmail.com wrote:

 Hello Stack,
  Did you receive my mail?It looks like you didnt.
   LvZheng

 2010/3/16 Zheng Lv lvzheng19800...@gmail.com

 Hello Stack,
  I have uploaded some parts of the logs on master, regionserver208 and
 regionserver210 to:
  http://rapidshare.com/files/363988384/master_207_log.txt.html
  http://rapidshare.com/files/363988673/regionserver_208_log.txt.html
  http://rapidshare.com/files/363988819/regionserver_210_log.txt.html
  I noticed that there are some LeaseExpiredException and 2010-03-15
 16:06:32,864 ERROR
 org.apache.hadoop.hbase.regionserver.CompactSplitThread:
 Compaction/Split failed for region ... before 17 oclock. Did these lead
 to
 the error? Why did these happened? How to avoid these?
  Thanks.
   LvZheng
 2010/3/16 Stack st...@duboce.net

 Maybe just the master log would be sufficient from around this time to
 figure the story.
 St.Ack

 On Mon, Mar 15, 2010 at 10:04 PM, Stack st...@duboce.net wrote:

 Hey Zheng:

 On Mon, Mar 15, 2010 at 8:16 PM, Zheng Lv lvzheng19800...@gmail.com

 wrote:

 Hello Stack,
 After we got these exceptions, we restart the cluster and restarted

 the

 job that failed, and the job succeeded.
 Now when we access

 /hbase/summary/1491233486/metrics/5046821377427277894,

 we got  Cannot access
 /hbase/summary/1491233486/metrics/5046821377427277894: No such file or
 directory. .


 So, that would seem to indicate that the reference was in memory
 only.. that file was not in filesystem.  You could have tried closing
 that region.   It would have been interesting also to find history on
 that region, to try and figure how it came to hold in memory a
 reference to a file since removed.

 The messages about this file in namenode logs are in here:
 http://rapidshare.com/files/363938595/log.txt.html


 This is interesting.  Do you have regionserver logs from 209, 208, and
 210 for corresponding times?

 Thanks,
 St.Ack

 The job failed startted about at 17 o'clock.
 By the way, the hadoop version we are using is 0.20.1, the hbase

 version

 we are using is 0.20.3.

 Regards,
 LvZheng
 2010/3/16 Stack st...@duboce.net

 Can you get that file from hdfs?

 ./bin/hadoop fs -get

 /hbase/summary/1491233486/metrics/5046821377427277894

 Does it look wholesome?  Is it empty?

 What if you trace the life of that file in regionserver logs or
 probably better, over in namenode log?  If you move this file aside,
 the region deploys?

 St.Ack

 On Mon, Mar 15, 2010 at 3:40 AM, Zheng Lv lvzheng19800...@gmail.com
 
 wrote:

 Hello Everyone,
  Recently we often got these in our client logs:
  org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying

 to

  contact region server 172.16.1.208:60020 for region



 summary,SITE_32\x01pt\x012010031400\x01\x25E7\x258C\x25AE\x25E5\x258E\x25BF\x25E5\x2586\x2580\x25E9\x25B9\x25B0\x25E6\x2591\x25A9\x25E6\x2593\x25A6\x25E6\x259D\x2590\x25E6\x2596\x2599\x25E5\x258E\x2582\x2B\x25E6\x25B1\x25BD\x25E8\x25BD\x25A6\x25E9\x2585\x258D\x25E4\x25BB\x25B6\x25EF\x25BC\x258C\x25E5\x2598\x2580\x25E9\x2593\x2583\x25E9\x2593\x2583--\x25E7\x259C\x259F\x25E5\x25AE\x259E\x25E5\x25AE\x2589\x25E5\x2585\x25A8\x25E7\x259A\x2584\x25E7\x2594\x25B5\x25E8\x25AF\x259D\x25E3\x2580\x2581\x25E7\x25BD\x2591\x25E7\x25BB\x259C\x25E4\x25BA\x2592\x25E5\x258A\x25A8\x25E4\x25BA\x25A4\x25E5\x258F\x258B\x25E7\x25A4\x25BE\x25E5\x258C\x25BA\x25EF\x25BC\x2581,1268640385017,

  row



 'SITE_32\x01pt\x012010031500\x01

Re: Multi ranges Scan

2010-03-17 Thread Stack
I think you need to make a custom splitter for your mapreduce job, one
that makes splits that align with the ranges you'd have your job run
over.   A permutation on HBASE-2302 might work for you.
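
For illustration, a rough sketch of such a splitter against the 0.20
mapreduce API.  The class name and the hard-coded ranges are placeholders
(a real job would read them from the configuration), and a range that
spans more than one region would still need to be chopped at region
boundaries, which is left out here.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableSplit;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

public class RangeAlignedTableInputFormat extends TableInputFormat {
  private static final String[][] RANGES = { { "1", "2" }, { "4", "5" } };

  @Override
  public List<InputSplit> getSplits(JobContext context) throws IOException {
    List<InputSplit> regionSplits = super.getSplits(context);
    List<InputSplit> aligned = new ArrayList<InputSplit>();
    for (String[] range : RANGES) {
      byte[] start = Bytes.toBytes(range[0]);
      byte[] end = Bytes.toBytes(range[1]);
      // Hand the whole range to whichever region hosts its start key so the
      // map task still runs close to the data.
      for (InputSplit s : regionSplits) {
        TableSplit ts = (TableSplit) s;
        boolean afterStart = ts.getStartRow().length == 0
            || Bytes.compareTo(start, ts.getStartRow()) >= 0;
        boolean beforeEnd = ts.getEndRow().length == 0
            || Bytes.compareTo(start, ts.getEndRow()) < 0;
        if (afterStart && beforeEnd) {
          aligned.add(new TableSplit(ts.getTableName(), start, end,
              ts.getRegionLocation()));
          break;
        }
      }
    }
    return aligned;
  }
}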

St.Ack

On Wed, Mar 17, 2010 at 1:32 PM, Andrey Kolyadenko cryp...@mailx.ru wrote:
 Hi all,

 maybe somebody could give me advice in the following situation:

 Currently HBase Scan interface provides ability to set up only first and
 last rows for MR scanning. Is it any way to get multiple ranges into the map
 input?

 For example let's assume I have following table:
 key value
 1   v1
 2   v2
 3   v3
 4   v4
 5   v5

 What I need is to get for example [1,2) and [4,5) ranges as input for my Map
 task. Actually I need this for the performance optimization.

 Any advice?

 Thanks.

 ---
 Миллионы анкет ждут Вас на на http://mylove.in.ua
 Немедленная регистрация здесь http://mylove.in.ua/my/reg.phtml

 Биржа ссылок, тысячи отзывов о нас в Рунете
 http://www.sape.ru/r.7fddbf83ee.php




Re: Cannot open filename Exceptions

2010-03-17 Thread Stack

Sorry. It's taken me a while.  Let try and get to this this evening

Thank you for your patience



On Mar 17, 2010, at 2:29 AM, Zheng Lv lvzheng19800...@gmail.com wrote:


Hello Stack,
 Did you receive my mail?It looks like you didnt.
   LvZheng

2010/3/16 Zheng Lv lvzheng19800...@gmail.com


Hello Stack,
 I have uploaded some parts of the logs on master, regionserver208  
and

regionserver210 to:
 http://rapidshare.com/files/363988384/master_207_log.txt.html
 http://rapidshare.com/files/363988673/regionserver_208_log.txt.html
 http://rapidshare.com/files/363988819/regionserver_210_log.txt.html
 I noticed that there are some LeaseExpiredException and 2010-03-15
16:06:32,864 ERROR  
org.apache.hadoop.hbase.regionserver.CompactSplitThread:
Compaction/Split failed for region ... before 17 oclock. Did these  
lead to

the error? Why did these happened? How to avoid these?
 Thanks.
   LvZheng
2010/3/16 Stack st...@duboce.net

Maybe just the master log would be sufficient from around this  
time to

figure the story.
St.Ack

On Mon, Mar 15, 2010 at 10:04 PM, Stack st...@duboce.net wrote:

Hey Zheng:

On Mon, Mar 15, 2010 at 8:16 PM, Zheng Lv lvzheng19800...@gmail.com 


wrote:

Hello Stack,
After we got these exceptions, we restart the cluster and  
restarted

the

job that failed, and the job succeeded.
Now when we access

/hbase/summary/1491233486/metrics/5046821377427277894,

we got  Cannot access
/hbase/summary/1491233486/metrics/5046821377427277894: No such  
file or

directory. .



So, that would seem to indicate that the reference was in memory
only.. that file was not in filesystem.  You could have tried  
closing
that region.   It would have been interesting also to find  
history on

that region, to try and figure how it came to hold in memory a
reference to a file since removed.


The messages about this file in namenode logs are in here:
http://rapidshare.com/files/363938595/log.txt.html


This is interesting.  Do you have regionserver logs from 209,  
208, and

210 for corresponding times?

Thanks,
St.Ack


The job failed startted about at 17 o'clock.
By the way, the hadoop version we are using is 0.20.1, the hbase

version

we are using is 0.20.3.

Regards,
LvZheng
2010/3/16 Stack st...@duboce.net


Can you get that file from hdfs?


./bin/hadoop fs -get

/hbase/summary/1491233486/metrics/5046821377427277894

Does it look wholesome?  Is it empty?

What if you trace the life of that file in regionserver logs or
probably better, over in namenode log?  If you move this file  
aside,

the region deploys?

St.Ack

On Mon, Mar 15, 2010 at 3:40 AM, Zheng Lv lvzheng19800...@gmail.com 


wrote:

Hello Everyone,
  Recently we often got these in our client logs:
  org.apache.hadoop.hbase.client.RetriesExhaustedException:  
Trying

to

contact region server 172.16.1.208:60020 for region



summary,SITE_32\x01pt\x012010031400\x01\x25E7\x258C\x25AE\x25E5\x258E\x25BF\x25E5\x2586\x2580\x25E9\x25B9\x25B0\x25E6\x2591\x25A9\x25E6\x2593\x25A6\x25E6\x259D\x2590\x25E6\x2596\x2599\x25E5\x258E\x2582\x2B\x25E6\x25B1\x25BD\x25E8\x25BD\x25A6\x25E9\x2585\x258D\x25E4\x25BB\x25B6\x25EF\x25BC\x258C\x25E5\x2598\x2580\x25E9\x2593\x2583\x25E9\x2593\x2583--\x25E7\x259C\x259F\x25E5\x25AE\x259E\x25E5\x25AE\x2589\x25E5\x2585\x25A8\x25E7\x259A\x2584\x25E7\x2594\x25B5\x25E8\x25AF\x259D\x25E3\x2580\x2581\x25E7\x25BD\x2591\x25E7\x25BB\x259C\x25E4\x25BA\x2592\x25E5\x258A\x25A8\x25E4\x25BA\x25A4\x25E5\x258F\x258B\x25E7\x25A4\x25BE\x25E5\x258C\x25BA\x25EF\x25BC\x2581,1268640385017,

row

'SITE_32\x01pt\x012010031500\x01\x2521\x25EF\x25BC\x2581\x25E9\x2594\x2580\x25E5\x2594\x25AE\x252F\x25E6\x2594\x25B6\x25E8\x25B4\x25AD\x25EF\x25BC\x2581VM700T\x2BVM700T\x2B\x25E5\x259B\x25BE\x25E5\x2583\x258F\x25E4\x25BF\x25A1\x25E5\x258F\x25B7\x25E4\x25BA\x25A7\x25E7\x2594\x259F\x25E5\x2599\x25A8\x2B\x25E7\x2594\x25B5\x25E5\x25AD\x2590\x25E6\x25B5\x258B\x25E9\x2587\x258F\x25E4\x25BB\x25AA\x25E5\x2599\x25A8\x25EF\x25BC\x258C\x25E5\x2598\x2580\x25E9\x2593\x2583\x25E9\x2593\x2583--\x25E7\x259C\x259F\x25E5\x25AE\x259E\x25E5\x25AE\x2589\x25E5\x2585\x25A8\x25E7\x259A\x2584\x25E7\x2594\x25B5\x25E8\x25AF\x259D\x25E3\x2580\x2581\x25E7\x25BD\x2591\x25E7\x25BB\x259C\x25E4\x25BA\x2592\x25E5\x258A\x25A8\x25E4\x25BA\x25A4\x25E5\x258F\x258B\x25E7\x25A4\x25BE\x25E5\x258C\x25BA\x25EF\x25BC\x2581',

but failed after 10 attempts.
Exceptions:
java.io.IOException: java.io.IOException: Cannot open filename
/hbase/summary/1491233486/metrics/5046821377427277894
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1474)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1800)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1616)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1743)
at java.io.DataInputStream.read(DataInputStream.java:132

Re: Cannot open filename Exceptions

2010-03-16 Thread Stack
Hey Zheng:

On Mon, Mar 15, 2010 at 8:16 PM, Zheng Lv lvzheng19800...@gmail.com wrote:
 Hello Stack,
  After we got these exceptions, we restart the cluster and restarted the
 job that failed, and the job succeeded.
  Now when we access /hbase/summary/1491233486/metrics/5046821377427277894,
 we get "Cannot access
 /hbase/summary/1491233486/metrics/5046821377427277894: No such file or
 directory.".


So, that would seem to indicate that the reference was in memory
only; that file was not in the filesystem.  You could have tried closing
that region.  It would also have been interesting to find the history on
that region, to figure out how it came to hold in memory a
reference to a file since removed.
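
Roughly, both of those checks might look like the below; the namenode log
path is a guess, REGION_NAME stands in for the full region name as listed
in .META., and close_region syntax can differ a bit across HBase versions:

    # follow the file's life through the namenode log (log location is a guess)
    grep 5046821377427277894 /var/log/hadoop/*namenode*.log

    # ask the master to close (and so redeploy) the region holding the stale
    # reference; pass the full region name from .META. (placeholder here)
    echo "close_region 'REGION_NAME'" | ./bin/hbase shell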

  The messages about this file in namenode logs are in here:
 http://rapidshare.com/files/363938595/log.txt.html

This is interesting.  Do you have regionserver logs from 209, 208, and
210 for corresponding times?
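
Something like the following could pull the matching window out of each of
those logs; the 172.16.1.209/210 addresses, the log path, and the
'2010-03-15 17:' timestamp prefix are assumptions, not details from this
thread:

    # grab the hour around the failure (~17:00) from each regionserver log
    for h in 208 209 210; do
      ssh 172.16.1.$h "grep '2010-03-15 17:' /path/to/hbase-regionserver.log" > rs-$h-17h.log
    done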

Thanks,
St.Ack

  The job failed started at about 17 o'clock.
  By the way, the hadoop version we are using is 0.20.1, the hbase version
 we are using is 0.20.3.

  Regards,
  LvZheng
 2010/3/16 Stack st...@duboce.net

 Can you get that file from hdfs?

  ./bin/hadoop fs -get
  /hbase/summary/1491233486/metrics/5046821377427277894

 Does it look wholesome?  Is it empty?

 What if you trace the life of that file in the regionserver logs or,
 probably better, in the namenode log?  If you move this file aside,
 does the region deploy?

 St.Ack
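
 Concretely, those checks plus the move-aside might look like this sketch
 (the /tmp target and the .aside suffix are just conventions, not anything
 from the thread):

    # does the store file still exist, and what length does HDFS report for it?
    ./bin/hadoop fs -ls /hbase/summary/1491233486/metrics/

    # copy it out and inspect it locally
    ./bin/hadoop fs -get /hbase/summary/1491233486/metrics/5046821377427277894 /tmp/

    # move it aside (rather than deleting it) before reopening the region
    ./bin/hadoop fs -mv /hbase/summary/1491233486/metrics/5046821377427277894 \
        /hbase/summary/1491233486/metrics/5046821377427277894.aside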

 On Mon, Mar 15, 2010 at 3:40 AM, Zheng Lv lvzheng19800...@gmail.com
 wrote:
  Hello Everyone,
     Recently we often got these in our client logs:
     org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
  contact region server 172.16.1.208:60020 for region
 
 summary,SITE_32\x01pt\x012010031400\x01\x25E7\x258C\x25AE\x25E5\x258E\x25BF\x25E5\x2586\x2580\x25E9\x25B9\x25B0\x25E6\x2591\x25A9\x25E6\x2593\x25A6\x25E6\x259D\x2590\x25E6\x2596\x2599\x25E5\x258E\x2582\x2B\x25E6\x25B1\x25BD\x25E8\x25BD\x25A6\x25E9\x2585\x258D\x25E4\x25BB\x25B6\x25EF\x25BC\x258C\x25E5\x2598\x2580\x25E9\x2593\x2583\x25E9\x2593\x2583--\x25E7\x259C\x259F\x25E5\x25AE\x259E\x25E5\x25AE\x2589\x25E5\x2585\x25A8\x25E7\x259A\x2584\x25E7\x2594\x25B5\x25E8\x25AF\x259D\x25E3\x2580\x2581\x25E7\x25BD\x2591\x25E7\x25BB\x259C\x25E4\x25BA\x2592\x25E5\x258A\x25A8\x25E4\x25BA\x25A4\x25E5\x258F\x258B\x25E7\x25A4\x25BE\x25E5\x258C\x25BA\x25EF\x25BC\x2581,1268640385017,
  row
 
 'SITE_32\x01pt\x012010031500\x01\x2521\x25EF\x25BC\x2581\x25E9\x2594\x2580\x25E5\x2594\x25AE\x252F\x25E6\x2594\x25B6\x25E8\x25B4\x25AD\x25EF\x25BC\x2581VM700T\x2BVM700T\x2B\x25E5\x259B\x25BE\x25E5\x2583\x258F\x25E4\x25BF\x25A1\x25E5\x258F\x25B7\x25E4\x25BA\x25A7\x25E7\x2594\x259F\x25E5\x2599\x25A8\x2B\x25E7\x2594\x25B5\x25E5\x25AD\x2590\x25E6\x25B5\x258B\x25E9\x2587\x258F\x25E4\x25BB\x25AA\x25E5\x2599\x25A8\x25EF\x25BC\x258C\x25E5\x2598\x2580\x25E9\x2593\x2583\x25E9\x2593\x2583--\x25E7\x259C\x259F\x25E5\x25AE\x259E\x25E5\x25AE\x2589\x25E5\x2585\x25A8\x25E7\x259A\x2584\x25E7\x2594\x25B5\x25E8\x25AF\x259D\x25E3\x2580\x2581\x25E7\x25BD\x2591\x25E7\x25BB\x259C\x25E4\x25BA\x2592\x25E5\x258A\x25A8\x25E4\x25BA\x25A4\x25E5\x258F\x258B\x25E7\x25A4\x25BE\x25E5\x258C\x25BA\x25EF\x25BC\x2581',
  but failed after 10 attempts.
  Exceptions:
  java.io.IOException: java.io.IOException: Cannot open filename
  /hbase/summary/1491233486/metrics/5046821377427277894
  at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1474)
  at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1800)
  at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1616)
  at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1743)
  at java.io.DataInputStream.read(DataInputStream.java:132)
  at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:99)
  at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
  at org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1020)
  at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:971)
  at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.loadBlock(HFile.java:1304)
  at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.seekTo(HFile.java:1186)
  at org.apache.hadoop.hbase.io.HalfHFileReader$1.seekTo(HalfHFileReader.java:207)
  at org.apache.hadoop.hbase.regionserver.StoreFileGetScan.getStoreFile(StoreFileGetScan.java:80)
  at org.apache.hadoop.hbase.regionserver.StoreFileGetScan.get(StoreFileGetScan.java:65)
  at org.apache.hadoop.hbase.regionserver.Store.get(Store.java:1461)
  at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2396)
  at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2385)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1731)
  at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source
