Re: Getting ScannerTimeoutException even after several calls in the specified time limit

2012-09-10 Thread Dhirendra Singh
I tried with a smaller caching value, i.e. 10, and it failed again; no, it's not really
a big cell. This small cluster (4 nodes) is only used for HBase, and I am
currently using hbase-0.92.1-cdh4.0.1. Could you let me know how
I could debug this issue?


Caused by: org.apache.hadoop.hbase.client.ScannerTimeoutException:
99560ms passed since the last invocation, timeout is currently set to
6
    at org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1302)
    at org.apache.hadoop.hbase.client.HTable$ClientScanner$1.hasNext(HTable.java:1399)
    ... 5 more
Caused by: org.apache.hadoop.hbase.UnknownScannerException:
org.apache.hadoop.hbase.UnknownScannerException: Name: -8889369042827960647
    at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2114)
    at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1336)



On Mon, Sep 10, 2012 at 10:53 PM, Stack  wrote:

> On Mon, Sep 10, 2012 at 10:13 AM, Dhirendra Singh 
> wrote:
> > I am facing this exception while iterating over a big table,  by default
> i
> > have specified caching as 100,
> >
> > i am getting the below exception, even though i checked there are several
> > calls made to the scanner before it threw this exception, but somehow its
> > saying 86095ms were passed since last invocation.
> >
> > i also observed that if it set scan.setCaching(false),  it succeeds,
>  could
> > some one please explain or point me to some document as if what's
> happening
> > here and what's the best practices to avoid it.
> >
> >
>
> Try again with caching < 100.  See if it works.  A big cell?  A GC pause?
> You should be able to tell roughly which server is being traversed
> when you get the timeout.  Anything else going on on that server at
> the time?  What version of HBase?
> St.Ack
>



-- 
Warm Regards,
Dhirendra Pratap
+91. 9717394713
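
For reference, a minimal sketch of the two knobs being discussed in this thread: a
smaller scan caching value and a longer scanner lease. The table name "mytable" is a
placeholder, and "hbase.regionserver.lease.period" is the 0.92-era property name, so
check it against your version's hbase-default.xml (it also has to be raised on the
region servers to take effect):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class SlowScanSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // 0.92-era scanner lease setting; must also be raised in the region servers' config
        conf.setLong("hbase.regionserver.lease.period", 120000L);

        HTable table = new HTable(conf, "mytable");   // placeholder table name
        Scan scan = new Scan();
        scan.setCaching(10);        // fewer rows per next() RPC, so less work between calls
        scan.setCacheBlocks(false); // don't churn the block cache on a long scan

        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                // keep per-row processing short so successive next() calls
                // stay inside the scanner lease period
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}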


java.io.IOException: Pass a Delete or a Put

2012-09-10 Thread Jothikumar Ekanath
Hi,
   Getting this error while using hbase as a sink.


Error
java.io.IOException: Pass a Delete or a Put
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:125)
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:84)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:587)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:156)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)



 Below is my code
Using the following version

Hbase = 0.94
Hadoop - 1.0.3

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.*;

import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class DailyAggMapReduce {

    public static void main(String args[]) throws Exception {
        Configuration config = HBaseConfiguration.create();
        Job job = new Job(config, "DailyAverageMR");
        job.setJarByClass(DailyAggMapReduce.class);
        Scan scan = new Scan();
        // 1 is the default in Scan, which will be bad for MapReduce jobs
        scan.setCaching(500);
        // don't set to true for MR jobs
        scan.setCacheBlocks(false);

        TableMapReduceUtil.initTableMapperJob(
                "HTASDB",             // input table
                scan,                 // Scan instance to control CF and attribute selection
                DailySumMapper.class, // mapper class
                Text.class,           // mapper output key
                Text.class,           // mapper output value
                job);

        TableMapReduceUtil.initTableReducerJob(
                "DA",                  // output table
                DailySumReducer.class, // reducer class
                job);

        //job.setOutputValueClass(Put.class);
        job.setNumReduceTasks(1);   // at least one, adjust as required

        boolean b = job.waitForCompletion(true);
        if (!b) {
            throw new IOException("error with job!");
        }
    }

    public static class DailySumMapper extends TableMapper<Text, Text> {

        public void map(ImmutableBytesWritable row, Result value,
                        Mapper.Context context) throws IOException, InterruptedException {
            List<String> key = getRowKey(row.get());
            Text rowKey = new Text(key.get(0));
            int time = Integer.parseInt(key.get(1));
            // limiting the time for one day (Aug 04 2012) -- Testing, Not a good way
            if (time <= 1344146400) {
                List<KeyValue> data = value.list();
                long inbound = 0l;
                long outbound = 0l;
                for (KeyValue kv : data) {
                    List<Long> values = getValues(kv.getValue());
                    if (values.get(0) != -1) {
                        inbound = inbound + values.get(0);
                    }
                    if (values.get(1) != -1) {
                        outbound = outbound + values.get(1);
                    }
                }
                context.write(rowKey, new Text(String.valueOf(inbound) + "-" + String.valueOf(outbound)));
            }
        }

        private static List<Long> getValues(byte[] data) {
            List<Long> values = new ArrayList<Long>();
            ByteBuffer buffer = ByteBuffer.wrap(data);
            values.add(buffer.getLong());
            values.add(buffer.getLong());
            return values;
        }

        private static List<String> getRowKey(byte[] key) {
            List<String> keys = new ArrayList<String>();
            ByteBuffer buffer = ByteBuffer.wrap(key);
            StringBuilder sb = new StringBuilder();
            sb.append(buffer.getInt());
            sb.append("-");
            if (key.length == 13) {
                sb.ap
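
Judging by the stack trace (the default Reducer.reduce() at Reducer.java:156), it looks
like the mapper's Text values are being passed straight through to TableOutputFormat,
which only accepts a Put or a Delete. DailySumReducer itself is not shown in the
message, but it would need to emit Puts. A rough sketch, reusing the imports from the
listing above; the output family/qualifier names "cf", "inbound" and "outbound" are
assumptions, not taken from the original code:

    public static class DailySumReducer extends TableReducer<Text, Text, ImmutableBytesWritable> {

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            long inbound = 0L;
            long outbound = 0L;
            for (Text v : values) {
                // the mapper emits "inbound-outbound"
                String[] parts = v.toString().split("-");
                inbound += Long.parseLong(parts[0]);
                outbound += Long.parseLong(parts[1]);
            }
            // TableOutputFormat.write() only accepts a Put or a Delete, which is
            // exactly what the "Pass a Delete or a Put" IOException is complaining about
            Put put = new Put(Bytes.toBytes(key.toString()));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("inbound"), Bytes.toBytes(inbound));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("outbound"), Bytes.toBytes(outbound));
            context.write(null, put);
        }
    }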

Re: Reply: for CDH4.0, where can i find the hbase-default.xml file if using RPM install

2012-09-10 Thread lars hofhansl
You want to look at the hbase-xxx.jar inside the .tar.gz archive.


Tested with 0.94.1:

$ tar -O -xf hbase-0.94.1.tar.gz "*hbase-0.94.1.jar" | jar -t | grep hbase-default.xml
hbase-default.xml


It's there. :)

-- Lars


- Original Message -
From: Srinivas Mupparapu 
To: user@hbase.apache.org
Cc: 
Sent: Monday, September 10, 2012 10:26 AM
Subject: Re: Reply: for CDH4.0, where can i find the hbase-default.xml file if 
using RPM install

I just installed HBase from .tar.gz file and I couldn't find that file
either.

Thanks,
Srinivas M
On Sep 10, 2012 11:03 AM, "huaxiang"  wrote:

> Hi,
>    I don't find the hbase-default.xml file using following command, any
> other way?
>    To be clear, this hadoop was installed with CDH RPM package.
>
> Huaxiang
>
> [root@hadoop1 ~]# clear
> [root@hadoop1 ~]# rpm -qlp *rpm_file_name.rpm*
> [root@hadoop1 ~]# ^C
> [root@hadoop1 ~]# find / -name "*hbase-default.xml*"
> /usr/share/doc/hbase-0.92.1+67/hbase-default.xml
> [root@hadoop1 ~]#
>
> -Original Message-
> From: Monish r [mailto:monishs...@gmail.com]
> Sent: September 10, 2012 15:00
> To: user@hbase.apache.org
> Subject: Re: for CDH4.0, where can i find the hbase-default.xml file if using
> RPM install
>
> Hi,
> Try
>
> rpm -qlp *rpm_file_name.rpm*
>
> This will list all files in the rpm , from this u can know where
> hbase-default.xml is.
>
>
> On Sat, Sep 8, 2012 at 3:16 PM, John Hancock 
> wrote:
>
> > Huaxiang,
> >
> > This may not be the quickest way to find it, but if it's anywhere in
> > your system, this command will find it:
> >
> > find / -name "*hbase-default.xml*"
> >
> > or
> >
> > cd / find / -name "*hbase-default.xml*" > temp.txt
> >
> > will save the output of the find command to a text file leaving out
> > any error messages that might be distracting.
> >
> >
> > -John
> >
> >
> >
> > On Sat, Sep 8, 2012 at 12:47 AM, huaxiang
> >  > >wrote:
> >
> > > Hi,
> > >
> > > I install CDH4.0 with RPM package, but I cannot find the
> > hbase-default.xml
> > > file?
> > >
> > > Where can I find it?
> > >
> > >
> > >
> > > Best R.
> > >
> > >
> > >
> > > Huaxiang
> > >
> > >
> >
>
>



Re: Put and Increment atomically

2012-09-10 Thread Jean-Daniel Cryans
Hi Pablo,

It's currently not possible (like you saw).

What's your use case? Maybe there's a different/better way to achieve
what you want to do?

J-D

On Mon, Sep 3, 2012 at 1:22 PM, Pablo Musa  wrote:
> Hey guys,
> I want to insert new columns into a row:fam and increment 2 of them 
> atomically.
> In other words, I am using columns as entries and I want to count them and 
> their
> sizes.
>
> For example, I want to insert these new columns:
> row1:f1:c1 = h_v_j
> row1:f1:c2 = g_r_w
> row1:f1:c3 = z_l_p
> row1:f1:c4 = n_m_j
>
> *The values are fixed size information which I used '_' as separator above.
>
> But I also want to keep track of how many columns I have and to sum one piece
> of the information.
> row1:f1:count = Increment(4)
> row1:f1:size = Increment(j+w+p+j)
>
> I can do them separately, send the puts and then send the increments, but
> I would like to know if it is possible to do it atomically.
>
> I did some search and found the link below, but it only talks about 
> put/delete.
> Why not adding increment?
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/RowMutations.html
>
> I don't know if it is clear enough.
>
> Thanks,
> Pablo
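
For reference, a sketch of the two-step (non-atomic) approach described above: send the
Put for the new entry, then send the Increment for the counters. Table, family and
qualifier names mirror the example in the mail but are only illustrative, and there is a
window between the two RPCs where the counters and the columns can disagree, which is
exactly the limitation being discussed:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutThenIncrement {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");   // placeholder table name
        byte[] row = Bytes.toBytes("row1");
        byte[] fam = Bytes.toBytes("f1");

        // 1) insert the new column (entry)
        Put put = new Put(row);
        put.add(fam, Bytes.toBytes("c5"), Bytes.toBytes("h_v_j"));  // placeholder qualifier/value
        table.put(put);

        // 2) bump the per-row counters -- a second, separate RPC,
        //    so the two steps are NOT atomic
        Increment inc = new Increment(row);
        inc.addColumn(fam, Bytes.toBytes("count"), 1L);
        inc.addColumn(fam, Bytes.toBytes("size"), 42L);  // size extracted client-side (placeholder)
        table.increment(inc);

        table.close();
    }
}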


Re: Tracking down coprocessor pauses

2012-09-10 Thread Tom Brown
Michael,

We are using HBase to track the usage of our service. Specifically,
each client sends an update when they start a task, at regular
intervals during the task, and an update when they finish a task (and
then presumably they start another, continuing the cycle). Each user
has various attributes (which version of our software they're using,
their location, which task they're working on, etc), and we want to be
able to see stats in aggregate, and be able to drill-down into various
areas (similar to OLAP; Incidentally, we chose HBase because none of
the OLAP systems seemed to accept real-time updates).

The key design is a compound of:  [Attribute1 Attribute2... AttributeN].

Each row has roughly 10 cells, all of which represent counters; Some
require simple incrementing, but others require fancier bitwise
operations to properly increment (using HyperLogLog to estimate a
unique count).

The rows are stored with a 15-second granularity (everything from
0:00-0:15 is stored in one row, everything from 0:15-0:30 is in the
next, etc). The data is formatted such that you can get the
aggregation for a larger time period by combining all of the rows that
comprise that time frame. For the counter cells, this uses straight
addition. For the unique counters, bitwise operations are required.

The most frequently requested data has only one or two relevant
attributes. For example, we commonly want to see the stats of our
system broken out just by task. Of course, that makes writes a little
more difficult. When we have 1000's of users working on the same kind
of task, we'll receive a lot of concurrent updates to the row with
[attribute=TheTask]. HBase supports atomic increments, but not atomic
bitwise operations, so we were required to implement a locking
solution on our own.

There seemed to be a lot of problems with row-level locks, so we
decided to do the locking in the one place we could guarantee it: a
coprocessor. Within the coprocessor is logic to coalesce multiple
updates to the same row into a single HBase update. When performing
aggregations, a requested time period might summarize thousands of
rows into a single summary row. We thought that sending the entire set
over the network was overkill, especially since the aggregation
operations are fairly simple (addition and some bitwise calculations),
so the co-processor also contains code to perform aggregations.

I'm interested in improving the design, so any suggestions will be appreciated.

Thanks in advance,

--Tom

On Mon, Sep 10, 2012 at 12:45 PM, Michael Segel
 wrote:
>
> On Sep 10, 2012, at 12:32 PM, Tom Brown  wrote:
>
>> We have our system setup such that all interaction is done through
>> co-processors. We update the database via a co-processor (it has the
>> appropriate logic for dealing with concurrent access to rows), and we
>> also query/aggregate via co-processor (since we don't want to send all
>> the data over the network).
>
> Could you expand on this? On the surface, this doesn't sound like a very good 
> idea.
>


Re: HBase UI missing region list for active/functioning table

2012-09-10 Thread Stack
On Mon, Sep 10, 2012 at 12:05 PM, Norbert Burger
 wrote:
>
Mind putting up full listing in pastebin?

Let me have a look.

We could try a master restart too... so it refreshes its in-memory
state.  That might do it.

St.Ack


Re: Doubt in performance tuning

2012-09-10 Thread Michael Segel
Well, 

Let's actually skip a few rounds of questions... and start from the beginning. 

What does your physical cluster look like? 

On Sep 10, 2012, at 12:40 PM, Ramasubramanian 
 wrote:

> Hi,
> Will be helpful if u say specific things to  look into. Pls help
> 
> Regards,
> Rams
> 
> On 10-Sep-2012, at 10:40 PM, Stack  wrote:
> 
>> On Mon, Sep 10, 2012 at 9:58 AM, Ramasubramanian
>>  wrote:
>>> Hi,
>>> 
>>> Currently it takes 11 odd minutes to load 1.2 million record into hbase 
>>> from hdfs. Can u pls share some tips to do the same in few seconds?
>>> 
>>> We tried doing this in both pig script and in pentaho. Both are taking 11 
>>> odd minutes.
>>> 
>> 
>> You have had a look at http://hbase.apache.org/book.html#performance?
>> St.Ack
> 



Re: HBase UI missing region list for active/functioning table

2012-09-10 Thread Norbert Burger
On Mon, Sep 10, 2012 at 2:17 PM, Stack  wrote:
> Thanks.  I was asking about the info:regioninfo column that prints out
> the HRegionInfo for each region.  I was wondering if it included a
> status=offline attribute.
>
> You could try one region only and see if that makes a difference.

Hmmm... no status=offline anywhere in my dump of .META.  Will dig into
the code and try disable/enable when I get a chance.

Attached below is a copy of .META. for one of the problematic regions.
 As far as I can tell, it has all the required cols, and I don't see a
difference between this and a region which gets "displayed" correctly:

 sessions,,1342211893146.881ca12b4a6c9a7670bb7ef69b3e5db4.
   column=info:regioninfo, timestamp=1342211893177, value=REGION => {NAME =>
   'sessions,,1342211893146.881ca12b4a6c9a7670bb7ef69b3e5db4.', STARTKEY => '',
   ENDKEY => '01', ENCODED => 881ca12b4a6c9a7670bb7ef69b3e5db4, TABLE => {{NAME =>
   'sessions', FAMILIES => [{NAME => 'event', BLOOMFILTER => 'ROW',
   REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'LZO', TTL =>
   '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
 sessions,,1342211893146.881ca12b4a6c9a7670bb7ef69b3e5db4.
   column=info:server, timestamp=1346979452245, value=aspen8hdp2.turner.com:60020
 sessions,,1342211893146.881ca12b4a6c9a7670bb7ef69b3e5db4.
   column=info:serverstartcode, timestamp=1346979452245, value=134697743

Thanks again,
Norbert


Re: Tracking down coprocessor pauses

2012-09-10 Thread Michael Segel

On Sep 10, 2012, at 12:32 PM, Tom Brown  wrote:

> We have our system setup such that all interaction is done through
> co-processors. We update the database via a co-processor (it has the
> appropriate logic for dealing with concurrent access to rows), and we
> also query/aggregate via co-processor (since we don't want to send all
> the data over the network).

Could you expand on this? On the surface, this doesn't sound like a very good 
idea. 



Re: Tracking down coprocessor pauses

2012-09-10 Thread Andrew Purtell
On Mon, Sep 10, 2012 at 10:32 AM, Tom Brown  wrote:
> I want to know more details about the specifics of those requests; Is
> there an API I can use that will allow my coprocessor requests to be
> tracked more functionally? Is there a way to hook into the UI so I can
> provide my own list of running processes? Or would I have to write
> that all myself?
>
> I am using HBase 0.92.1, but will be upgrading to 0.94.1 soon.

I haven't actually done this, so YMMV, but you should be able to get a
reference to the TaskMonitor singleton
(http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/monitoring/TaskMonitor.html)
via the static method TaskMonitor.get() and then create and update the
state of MonitoredTasks
(http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/monitoring/MonitoredTask.html)
for your coprocessor's internal functions.
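
A rough, untested sketch of that approach, using the method names from the javadocs
linked above; the class and status strings are placeholders:

import org.apache.hadoop.hbase.monitoring.MonitoredTask;
import org.apache.hadoop.hbase.monitoring.TaskMonitor;

public class MonitoredAggregation {
    // Hypothetical wrapper: publish a long-running coprocessor call to the
    // region server's task list so it shows up alongside the other handler tasks.
    public void runAggregation() {
        MonitoredTask status =
            TaskMonitor.get().createStatus("MyEndpoint: aggregating rows");
        try {
            // ... the real per-row aggregation work would go here ...
            status.setStatus("MyEndpoint: halfway through region"); // progress updates
            status.markComplete("MyEndpoint: done");
        } catch (RuntimeException e) {
            status.abort("MyEndpoint failed: " + e.getMessage());
            throw e;
        }
    }
}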

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)


Re: BigDecimalColumnInterpreter

2012-09-10 Thread anil gupta
Hi Julian,

I am using only CDH4 libraries. I use the jars present under the Hadoop and
HBase install directories. In my last email I gave you some more pointers; try to
follow them and see what happens.
If it still doesn't work for you, I will try to write a utility
to test BigDecimalColumnInterpreter on your setup as well.
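
In the meantime, here is a rough sketch of the client call with the qualifier added
explicitly, as suggested earlier in this thread. The table, family and qualifier names
are placeholders, and BigDecimalColumnInterpreter is the custom interpreter being
discussed (its jar also has to be deployed on the region servers):

import java.math.BigDecimal;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.util.Bytes;

public class BigDecimalSumTest {
    public static void main(String[] args) throws Throwable {
        Configuration conf = HBaseConfiguration.create();
        Scan scan = new Scan(Bytes.toBytes("startRow"), Bytes.toBytes("stopRow"));
        // add the column qualifier explicitly, not just the family
        scan.addColumn(Bytes.toBytes("family"), Bytes.toBytes("qualifier"));

        AggregationClient ag = new AggregationClient(conf);
        // BigDecimalColumnInterpreter is the user-written interpreter from this thread
        BigDecimal sum = ag.sum(Bytes.toBytes("tableName"),
                new BigDecimalColumnInterpreter(), scan);
        System.out.println("sum = " + sum);
    }
}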

Thanks,
Anil

On Mon, Sep 10, 2012 at 9:36 AM, Julian Wissmann
wrote:

> Hi,
>
> I haven't really gotten to working on this, since last wednesday.
> Checked readFields() and write() today, but don't really see, why I would
> need to reimplement those. Admittedly I'm not that into the whole Hbase
> codebase, yet, so there is a good chance I'm missing something, here.
>
> Also, Anil, what hbase library are you coding this against?
> It does seem like madness, that even though, we're both using this
> identically it does not work for me.
>
> Cheers,
>
> Julian
>
> 2012/9/6 anil gupta 
>
> > Yes, we do. :)
> > Let me know the outcome. If you look at the BD ColumnInterpreter,
> getValue
> > method is converting the byte array into BigDecimal. So you should not
> have
> > any problem. The BD ColumnInterpreter is pretty similar to
> > LongColumnInterpreter.
> >
> > Here is the code snippet for getValue() method which will convert Byte[]
> to
> > BigDecimal:
> >
> > @Override
> > public BigDecimal getValue(byte[] paramArrayOfByte1, byte[]
> > paramArrayOfByte2,
> > KeyValue kv) throws IOException {
> >  if ((kv == null || kv.getValue() == null))
> >return null;
> >  return Bytes.toBigDecimal(kv.getValue());
> > }
> >
> > Thanks,
> > Anil
> >
> >
> > On Thu, Sep 6, 2012 at 11:43 AM, Julian Wissmann
> > wrote:
> >
> > > 0.92.1 from cdh4. I assume we use the same thing.
> > >
> > > 2012/9/6 anil gupta 
> > >
> > > > I am using HBase0.92.1. Which version you are using?
> > > >
> > > >
> > > > On Thu, Sep 6, 2012 at 10:19 AM, anil gupta 
> > > wrote:
> > > >
> > > > > Hi Julian,
> > > > >
> > > > > You need to add the column qualifier explicitly in the scanner. You
> > > have
> > > > > only added the column family in the scanner.
> > > > > I am also assuming that you are writing a ByteArray of BigDecimal
> > > object
> > > > > as value of these cells in HBase. Is that right?
> > > > >
> > > > > Thanks,
> > > > > Anil
> > > > >
> > > > >
> > > > > On Thu, Sep 6, 2012 at 2:28 AM, Julian Wissmann <
> > > > julian.wissm...@sdace.de>wrote:
> > > > >
> > > > >> Hi, anil,
> > > > >>
> > > > >> I presume you mean something like this:
> > > > >> Scan scan = new Scan(_start, _end);
> > > > >> scan.addFamily(family.getBytes());
> > > > >> final ColumnInterpreter ci = new
> > > > >> mypackage.BigDecimalColumnInterpreter();
> > > > >> AggregationClient ag = new org.apache.hadoop.hbase.
> > > > >> client.coprocessor.AggregationClient(config);
> > > > >> BigDecimal sum = ag.sum(Bytes.toBytes(tableName), new
> > > > >> BigDecimalColumnInterpreter(), scan);
> > > > >>
> > > > >>
> > > > >> When I call this,with the Endpoint in place and loaded as a jar, I
> > get
> > > > the
> > > > >> above error.
> > > > >> When I call it without the endpoint loaded as coprocessor,
> though, I
> > > get
> > > > >> this:
> > > > >>
> > > > >> java.util.concurrent.ExecutionException:
> > > > >> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > after
> > > > >> attempts=10, exceptions:
> > > > >> Thu Sep 06 11:07:39 CEST 2012,
> > > > >> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@7bd6747b,
> > > > >> java.io.IOException:
> > > > >> IPC server unable to read call parameters: Error in readFields
> > > > >> Thu Sep 06 11:07:40 CEST 2012,
> > > > >> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@7bd6747b,
> > > > >> java.io.IOException:
> > > > >> IPC server unable to read call parameters: Error in readFields
> > > > >> Thu Sep 06 11:07:41 CEST 2012,
> > > > >> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@7bd6747b,
> > > > >> java.io.IOException:
> > > > >> IPC server unable to read call parameters: Error in readFields
> > > > >> Thu Sep 06 11:07:42 CEST 2012,
> > > > >> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@7bd6747b,
> > > > >> java.io.IOException:
> > > > >> IPC server unable to read call parameters: Error in readFields
> > > > >> Thu Sep 06 11:07:44 CEST 2012,
> > > > >> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@7bd6747b,
> > > > >> java.io.IOException:
> > > > >> IPC server unable to read call parameters: Error in readFields
> > > > >> Thu Sep 06 11:07:46 CEST 2012,
> > > > >> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@7bd6747b,
> > > > >> java.io.IOException:
> > > > >> IPC server unable to read call parameters: Error in readFields
> > > > >> Thu Sep 06 11:07:50 CEST 2012,
> > > > >> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@7bd6747b,
> > > > >> java.io.IOException:
> > > > >> IPC server unable to read call parameters: Error in readFields

Re: HBase UI missing region list for active/functioning table

2012-09-10 Thread Stack
On Mon, Sep 10, 2012 at 10:50 AM, Norbert Burger
 wrote:
> On Mon, Sep 10, 2012 at 1:37 PM, Stack  wrote:
>> What version of hbase?
>
> We're on cdh3u3, 0.90.4 + patches.
>
>> Can you disable and reenable the table?
>
> I will try disabling/re-enabling at the next opportunity.  Perhaps
> that'll resolve that the issue, but this is a PROD cluster, so
> unfortunately can't try right away.
>
>> When you scan the table in shell, do you see 'status=offline'.

Thanks.  I was asking about the info:regioninfo column that prints out
the HRegionInfo for each region.  I was wondering if it included a
status=offline attribute.

You could try one region only and see if that makes a difference.

My guess is that this is a vestige of the rename script.  You disabled the table
before using it (as the script asks at its head).

St.Ack


Re: More rows or less rows and more columns

2012-09-10 Thread Harsh J
Ah, sorry for assuming that then. I don't know of a way to sort
qualifiers. I haven't seen anyone do that or require it for
unstructured data (i.e. a query like "fetch me the latest qualifier
added to this row"). I suppose you can compare the last two versions
to see what was changed, but I still don't see why you need this?

For timeseries, I'd recommend looking at what OpenTSDB already provides though.
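
For example, a small sketch of the "compare the last two versions" idea; table, family
and qualifier names are placeholders, and the column family needs VERSIONS > 1:

import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class LastTwoVersions {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(HBaseConfiguration.create(), "t");  // placeholder table name
        Get get = new Get(Bytes.toBytes("row1"));
        get.setMaxVersions(2);                                // fetch only the latest two versions
        get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"));
        Result result = table.get(get);
        // getColumn() returns KeyValues newest-first, so index 0 is the latest write
        List<KeyValue> versions = result.getColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"));
        for (KeyValue kv : versions) {
            System.out.println(kv.getTimestamp() + " -> " + Bytes.toString(kv.getValue()));
        }
        table.close();
    }
}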

On Mon, Sep 10, 2012 at 11:32 PM, Mohit Anchlia  wrote:
> On Mon, Sep 10, 2012 at 10:59 AM, Harsh J  wrote:
>
>> Versions is what you're talking about, and by default all queries
>> return the latest version of updated values.
>>
>
> No actually I was asking if I have columns with qualifier:
>
> d,b,c,e can I store them sorted such that it is e,d,c,b? This ways I can
> just get the most recent qualifier or for timeseries most recent qualifier.
>
>>
>> On Mon, Sep 10, 2012 at 11:04 PM, Mohit Anchlia 
>> wrote:
>> > On Mon, Sep 10, 2012 at 10:30 AM, Harsh J  wrote:
>> >
>> >> Hey Mohit,
>> >>
>> >> See http://hbase.apache.org/book.html#schema.smackdown.rowscols
>> >
>> >
>> > Thanks! Is there a way in HBase to get the most recent inserted column?
>> Or
>> > a way to sort columns such that I can manage how many columns I want to
>> > read? In timeseries we might be interested in only most recent data
>> point.
>> >
>> >>
>> >>
>> >> On Mon, Sep 10, 2012 at 10:56 PM, Mohit Anchlia > >
>> >> wrote:
>> >> > Is there any recommendation on how many columns one should have per
>> row.
>> >> My
>> >> > columns are < 200 bytes. This will help me to decide if I should
>> shard my
>> >> > rows with id + .
>> >>
>> >>
>> >>
>> >> --
>> >> Harsh J
>> >>
>>
>>
>>
>> --
>> Harsh J
>>



-- 
Harsh J


Re: More rows or less rows and more columns

2012-09-10 Thread Mohit Anchlia
On Mon, Sep 10, 2012 at 10:59 AM, Harsh J  wrote:

> Versions is what you're talking about, and by default all queries
> return the latest version of updated values.
>

No, actually I was asking: if I have columns with qualifiers

d, b, c, e, can I store them sorted such that the order is e, d, c, b? This way I can
just get the most recent qualifier (for timeseries, the most recent data point).

>
> On Mon, Sep 10, 2012 at 11:04 PM, Mohit Anchlia 
> wrote:
> > On Mon, Sep 10, 2012 at 10:30 AM, Harsh J  wrote:
> >
> >> Hey Mohit,
> >>
> >> See http://hbase.apache.org/book.html#schema.smackdown.rowscols
> >
> >
> > Thanks! Is there a way in HBase to get the most recent inserted column?
> Or
> > a way to sort columns such that I can manage how many columns I want to
> > read? In timeseries we might be interested in only most recent data
> point.
> >
> >>
> >>
> >> On Mon, Sep 10, 2012 at 10:56 PM, Mohit Anchlia  >
> >> wrote:
> >> > Is there any recommendation on how many columns one should have per
> row.
> >> My
> >> > columns are < 200 bytes. This will help me to decide if I should
> shard my
> >> > rows with id + .
> >>
> >>
> >>
> >> --
> >> Harsh J
> >>
>
>
>
> --
> Harsh J
>


Re: More rows or less rows and more columns

2012-09-10 Thread Harsh J
Versions is what you're talking about, and by default all queries
return the latest version of updated values.

On Mon, Sep 10, 2012 at 11:04 PM, Mohit Anchlia  wrote:
> On Mon, Sep 10, 2012 at 10:30 AM, Harsh J  wrote:
>
>> Hey Mohit,
>>
>> See http://hbase.apache.org/book.html#schema.smackdown.rowscols
>
>
> Thanks! Is there a way in HBase to get the most recent inserted column? Or
> a way to sort columns such that I can manage how many columns I want to
> read? In timeseries we might be interested in only most recent data point.
>
>>
>>
>> On Mon, Sep 10, 2012 at 10:56 PM, Mohit Anchlia 
>> wrote:
>> > Is there any recommendation on how many columns one should have per row.
>> My
>> > columns are < 200 bytes. This will help me to decide if I should shard my
>> > rows with id + .
>>
>>
>>
>> --
>> Harsh J
>>



-- 
Harsh J


Re: HBase UI missing region list for active/functioning table

2012-09-10 Thread Norbert Burger
On Mon, Sep 10, 2012 at 1:37 PM, Stack  wrote:
> What version of hbase?

We're on cdh3u3, 0.90.4 + patches.

> Can you disable and reenable the table?

I will try disabling/re-enabling at the next opportunity.  Perhaps
that'll resolve that the issue, but this is a PROD cluster, so
unfortunately can't try right away.

> When you scan the table in shell, do you see 'status=offline'.

a) the master UI (table.jsp) shows the table, and reports enabled=true
b) is_enabled from the shell also reports true

Norbert


Re: Hbase filter-SubstringComparator vs full text search indexing

2012-09-10 Thread Otis Gospodnetic
Hello,

If you need to scan lots of log messages and process them use HBase
(or Hive or Pig or simply HDFS+MR)
If you need to query your data set by anything in the text of the log
message, use ElasticSearch or Solr 4.0 or Sensei or just Lucene.

Otis
-- 
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Mon, Sep 10, 2012 at 10:24 AM, Shengjie Min  wrote:
> In my case, I have all the log events stored in HDFS/hbase in this format:
>
> timestamp | priority | category | message body
>
> Given I have only 4 fields here, that limits my queries to only against
> these four. I am thinking about more advanced search like full text search
> the message body. well, mainly substring query against message body.
>
>1.
>
>Has anybody tried to use Hbase SubstringComparator? How does it perform,
>with a reasonably huge amount of data, can it still provide us the real time
>response capability?
>2.
>
>In my case, does it make more sense to use a proper full text search
>engine(lucene/solr/elasticsearch) to index the message body, does that
>sound like a better idea?
>
> would be great someone experienced can share some stories here.
>
> -Shengjie Min


Re: Doubt in performance tuning

2012-09-10 Thread Ramasubramanian
Hi,
It would be helpful if you could point out specific things to look into. Please help.

Regards,
Rams

On 10-Sep-2012, at 10:40 PM, Stack  wrote:

> On Mon, Sep 10, 2012 at 9:58 AM, Ramasubramanian
>  wrote:
>> Hi,
>> 
>> Currently it takes 11 odd minutes to load 1.2 million record into hbase from 
>> hdfs. Can u pls share some tips to do the same in few seconds?
>> 
>> We tried doing this in both pig script and in pentaho. Both are taking 11 
>> odd minutes.
>> 
> 
> You have had a look at http://hbase.apache.org/book.html#performance?
> St.Ack


Re: HBase UI missing region list for active/functioning table

2012-09-10 Thread Stack
On Mon, Sep 10, 2012 at 10:33 AM, Norbert Burger
 wrote:
> On Mon, Sep 10, 2012 at 1:24 PM, Srinivas Mupparapu
>  wrote:
>> It scans .META. table just like any other table. I just tested it and it
>> produced the expected output.
>
> I'm pretty sure Srinivas scanned .META. in his own environment, not mine.  ;-)
>
>>  On Sep 10, 2012 12:19 PM, "Stack"  wrote:
>>> What happens if you scan .META. in shell?
>>>
>>> hbase> scan ".META."
>>>
>>> Does it all show?
>
> Thanks, Stack.  Strangely, all regions do show up in .META.  The table
> in question has 256 regions, and all are listed as rowkeys in .META.
> Perhaps there's a column missing from region definitions which is
> preventing the UI from rendering the regions?  I'll dig through the
> code, but are there specific columns known to be expected the UI?
>
> Fwiw, this particular table was the result of a rename_table.rb
> attempt that didn't go as smoothly as I would've liked.  I had to dig
> through .META. and resolve inconsistencies.
>

What version of hbase?  Can you disable and reenable the table?  When
you scan the table in shell, do you see 'status=offline'.
St.Ack


Re: More rows or less rows and more columns

2012-09-10 Thread Mohit Anchlia
On Mon, Sep 10, 2012 at 10:30 AM, Harsh J  wrote:

> Hey Mohit,
>
> See http://hbase.apache.org/book.html#schema.smackdown.rowscols


Thanks! Is there a way in HBase to get the most recent inserted column? Or
a way to sort columns such that I can manage how many columns I want to
read? In timeseries we might be interested in only most recent data point.

>
>
> On Mon, Sep 10, 2012 at 10:56 PM, Mohit Anchlia 
> wrote:
> > Is there any recommendation on how many columns one should have per row.
> My
> > columns are < 200 bytes. This will help me to decide if I should shard my
> > rows with id + .
>
>
>
> --
> Harsh J
>


Re: HBase UI missing region list for active/functioning table

2012-09-10 Thread Norbert Burger
On Mon, Sep 10, 2012 at 1:24 PM, Srinivas Mupparapu
 wrote:
> It scans .META. table just like any other table. I just tested it and it
> produced the expected output.

I'm pretty sure Srinivas scanned .META. in his own environment, not mine.  ;-)

>  On Sep 10, 2012 12:19 PM, "Stack"  wrote:
>> What happens if you scan .META. in shell?
>>
>> hbase> scan ".META."
>>
>> Does it all show?

Thanks, Stack.  Strangely, all regions do show up in .META.  The table
in question has 256 regions, and all are listed as rowkeys in .META.
Perhaps there's a column missing from region definitions which is
preventing the UI from rendering the regions?  I'll dig through the
code, but are there specific columns known to be expected the UI?

Fwiw, this particular table was the result of a rename_table.rb
attempt that didn't go as smoothly as I would've liked.  I had to dig
through .META. and resolve inconsistencies.

Norbert


Tracking down coprocessor pauses

2012-09-10 Thread Tom Brown
Hi,

We have our system setup such that all interaction is done through
co-processors. We update the database via a co-processor (it has the
appropriate logic for dealing with concurrent access to rows), and we
also query/aggregate via co-processor (since we don't want to send all
the data over the network).

This generally works very well. However, some times one of the region
servers will "pause". This doesn't appear to be a GC pause since it
still serves up the UI, and adds occasional messages to the log
regarding the LRU. The only thing I've found is that when I check the
server that's causing the problem (easy to tell, since all the
"working" servers have a low load, and the problem server has a higher
load), I can see that there are a number of execCoprocessor requests
that have been executing for much longer than they should.

I want to know more details about the specifics of those requests; Is
there an API I can use that will allow my coprocessor requests to be
tracked more functionally? Is there a way to hook into the UI so I can
provide my own list of running processes? Or would I have to write
that all myself?

I am using HBase 0.92.1, but will be upgrading to 0.94.1 soon.

Thanks in advance!

--Tom


Re: More rows or less rows and more columns

2012-09-10 Thread Harsh J
Hey Mohit,

See http://hbase.apache.org/book.html#schema.smackdown.rowscols

On Mon, Sep 10, 2012 at 10:56 PM, Mohit Anchlia  wrote:
> Is there any recommendation on how many columns one should have per row. My
> columns are < 200 bytes. This will help me to decide if I should shard my
> rows with id + .



-- 
Harsh J


Re: Reply: for CDH4.0, where can i find the hbase-default.xml file if using RPM install

2012-09-10 Thread Harsh J
Srinivas,

In the source tarball, the file is at
$HBASE_HOME/src/main/resources/hbase-default.xml

On Mon, Sep 10, 2012 at 10:56 PM, Srinivas Mupparapu
 wrote:
> I just installed HBase from .tar.gz file and I couldn't find that file
> either.
>
> Thanks,
> Srinivas M
> On Sep 10, 2012 11:03 AM, "huaxiang"  wrote:
>
>> Hi,
>>I don't find the hbase-default.xml file using following command, any
>> other way?
>>To be clear, this hadoop was installed with CDH RPM package.
>>
>> Huaxiang
>>
>> [root@hadoop1 ~]# clear
>> [root@hadoop1 ~]# rpm -qlp *rpm_file_name.rpm*
>> [root@hadoop1 ~]# ^C
>> [root@hadoop1 ~]# find / -name "*hbase-default.xml*"
>> /usr/share/doc/hbase-0.92.1+67/hbase-default.xml
>> [root@hadoop1 ~]#
>>
>> -Original Message-
>> From: Monish r [mailto:monishs...@gmail.com]
>> Sent: September 10, 2012 15:00
>> To: user@hbase.apache.org
>> Subject: Re: for CDH4.0, where can i find the hbase-default.xml file if using
>> RPM install
>>
>> Hi,
>> Try
>>
>> rpm -qlp *rpm_file_name.rpm*
>>
>> This will list all files in the rpm , from this u can know where
>> hbase-default.xml is.
>>
>>
>> On Sat, Sep 8, 2012 at 3:16 PM, John Hancock 
>> wrote:
>>
>> > Huaxiang,
>> >
>> > This may not be the quickest way to find it, but if it's anywhere in
>> > your system, this command will find it:
>> >
>> > find / -name "*hbase-default.xml*"
>> >
>> > or
>> >
>> > cd / find / -name "*hbase-default.xml*" > temp.txt
>> >
>> > will save the output of the find command to a text file leaving out
>> > any error messages that might be distracting.
>> >
>> >
>> > -John
>> >
>> >
>> >
>> > On Sat, Sep 8, 2012 at 12:47 AM, huaxiang
>> > > > >wrote:
>> >
>> > > Hi,
>> > >
>> > > I install CDH4.0 with RPM package, but I cannot find the
>> > hbase-default.xml
>> > > file?
>> > >
>> > > Where can I find it?
>> > >
>> > >
>> > >
>> > > Best R.
>> > >
>> > >
>> > >
>> > > Huaxiang
>> > >
>> > >
>> >
>>
>>



-- 
Harsh J


Re: HBase UI missing region list for active/functioning table

2012-09-10 Thread Stack
On Mon, Sep 10, 2012 at 10:24 AM, Srinivas Mupparapu
 wrote:
> It scans .META. table just like any other table. I just tested it and it
> produced the expected output.
>

When you refresh the master UI, it makes a few lines in the master
log.  Are these the lines you posted?  Mind checking again?  What does
the Master UI page look like?  Complete?  Or is it cut off where its
should be listing regions (maybe look at html src?).

If shell can scan .META., odd that UI can't.  Lets try and figure the
difference.

St.Ack


> Thanks,
> Srinivas M
>  On Sep 10, 2012 12:19 PM, "Stack"  wrote:
>
>> On Mon, Sep 10, 2012 at 8:33 AM, Norbert Burger
>>  wrote:
>> > Hi all -- we're currently on cdh3u3 (0.90.4 + patches).  I have one
>> > table in our cluster which seems to functioning fine (gets/puts/scans
>> > are all working), but for which no regions are listed on the UI.  The
>> > table/regions exist in .META.  Other tables in the same cluster show
>> > their regions list fine.  Seems like this might be a problem with
>> > .META. or ZK, but would appreciate any pointers.
>> >
>> > 1) hbase hbck reports 2 "multiply assigned to region servers"
>> > inconsistencies, but on a table different than the one I'm having
>> > problems with.
>> > 2) The hbase master log shows this fragment when navigating to
>> > table.jsp for the affected table:
>> >
>> > 2012-09-10 11:29:07,682 DEBUG org.apache.zookeeper.ClientCnxn: Reading
>> > reply sessionid:0x1370e3604c49580, packet:: clientPath:null
>> > serverPath:null finished:false header:: 10,4  replyHeader::
>> > 10,167713215,-101  request:: '/hbase/table/sessions,F  response::
>> > 2012-09-10 11:29:07,682 DEBUG
>> > org.apache.hadoop.hbase.zookeeper.ZKUtil:
>> > hconnection-0x1370e3604c49580 Unable to get data of znode
>> > /hbase/table/sessions because node does not exist (not an error)
>> > 2012-09-10 11:29:07,682 DEBUG
>> > org.apache.hadoop.hbase.client.MetaScanner: Scanning .META. starting
>> > at row=sessions,,00 for max=2147483647 rows
>> >
>> > But since I see this "Unable to get data of znode" for all tables, my
>> > assumption is that it's a red herring.  Any thoughts as how to debug
>> > further, or why only this table would not show a region list?
>> >
>>
>> What happens if you scan .META. in shell?
>>
>> hbase> scan ".META."
>>
>> Does it all show?
>>
>> (You might want to echo into a file so you can poke around after scan is
>> done).
>>
>> St.Ack
>>


Re: Reply: for CDH4.0, where can i find the hbase-default.xml file if using RPM install

2012-09-10 Thread Harsh J
HBase in packaged form bundles the default XML only inside the HBase
jar(s). You need to download a source package tarball to get the
default XML otherwise.

> /usr/share/doc/hbase-0.92.1+67/hbase-default.xml

The above looks right, you can use that as a reference. Looks to be
installed via a docs package from the same distribution.

Otherwise, get it via a simple unzip from the jar:

$ unzip $HBASE_HOME/hbase-0.92.1-cdh4.0.1-security.jar hbase-default.xml
$ cat hbase-default.xml

On Mon, Sep 10, 2012 at 9:32 PM, huaxiang  wrote:
> Hi,
>I don't find the hbase-default.xml file using following command, any
> other way?
>To be clear, this hadoop was installed with CDH RPM package.
>
> Huaxiang
>
> [root@hadoop1 ~]# clear
> [root@hadoop1 ~]# rpm -qlp *rpm_file_name.rpm*
> [root@hadoop1 ~]# ^C
> [root@hadoop1 ~]# find / -name "*hbase-default.xml*"
> /usr/share/doc/hbase-0.92.1+67/hbase-default.xml
> [root@hadoop1 ~]#
>
> -Original Message-
> From: Monish r [mailto:monishs...@gmail.com]
> Sent: September 10, 2012 15:00
> To: user@hbase.apache.org
> Subject: Re: for CDH4.0, where can i find the hbase-default.xml file if using
> RPM install
>
> Hi,
> Try
>
> rpm -qlp *rpm_file_name.rpm*
>
> This will list all files in the rpm , from this u can know where
> hbase-default.xml is.
>
>
> On Sat, Sep 8, 2012 at 3:16 PM, John Hancock  wrote:
>
>> Huaxiang,
>>
>> This may not be the quickest way to find it, but if it's anywhere in
>> your system, this command will find it:
>>
>> find / -name "*hbase-default.xml*"
>>
>> or
>>
>> cd / find / -name "*hbase-default.xml*" > temp.txt
>>
>> will save the output of the find command to a text file leaving out
>> any error messages that might be distracting.
>>
>>
>> -John
>>
>>
>>
>> On Sat, Sep 8, 2012 at 12:47 AM, huaxiang
>> > >wrote:
>>
>> > Hi,
>> >
>> > I install CDH4.0 with RPM package, but I cannot find the
>> hbase-default.xml
>> > file?
>> >
>> > Where can I find it?
>> >
>> >
>> >
>> > Best R.
>> >
>> >
>> >
>> > Huaxiang
>> >
>> >
>>
>



-- 
Harsh J


Re: Reply: for CDH4.0, where can i find the hbase-default.xml file if using RPM install

2012-09-10 Thread Srinivas Mupparapu
I just installed HBase from .tar.gz file and I couldn't find that file
either.

Thanks,
Srinivas M
On Sep 10, 2012 11:03 AM, "huaxiang"  wrote:

> Hi,
>I don't find the hbase-default.xml file using following command, any
> other way?
>To be clear, this hadoop was installed with CDH RPM package.
>
> Huaxiang
>
> [root@hadoop1 ~]# clear
> [root@hadoop1 ~]# rpm -qlp *rpm_file_name.rpm*
> [root@hadoop1 ~]# ^C
> [root@hadoop1 ~]# find / -name "*hbase-default.xml*"
> /usr/share/doc/hbase-0.92.1+67/hbase-default.xml
> [root@hadoop1 ~]#
>
> -Original Message-
> From: Monish r [mailto:monishs...@gmail.com]
> Sent: September 10, 2012 15:00
> To: user@hbase.apache.org
> Subject: Re: for CDH4.0, where can i find the hbase-default.xml file if using
> RPM install
>
> Hi,
> Try
>
> rpm -qlp *rpm_file_name.rpm*
>
> This will list all files in the rpm , from this u can know where
> hbase-default.xml is.
>
>
> On Sat, Sep 8, 2012 at 3:16 PM, John Hancock 
> wrote:
>
> > Huaxiang,
> >
> > This may not be the quickest way to find it, but if it's anywhere in
> > your system, this command will find it:
> >
> > find / -name "*hbase-default.xml*"
> >
> > or
> >
> > cd / find / -name "*hbase-default.xml*" > temp.txt
> >
> > will save the output of the find command to a text file leaving out
> > any error messages that might be distracting.
> >
> >
> > -John
> >
> >
> >
> > On Sat, Sep 8, 2012 at 12:47 AM, huaxiang
> >  > >wrote:
> >
> > > Hi,
> > >
> > > I install CDH4.0 with RPM package, but I cannot find the
> > hbase-default.xml
> > > file?
> > >
> > > Where can I find it?
> > >
> > >
> > >
> > > Best R.
> > >
> > >
> > >
> > > Huaxiang
> > >
> > >
> >
>
>


Re: HBase UI missing region list for active/functioning table

2012-09-10 Thread Srinivas Mupparapu
It scans .META. table just like any other table. I just tested it and it
produced the expected output.

Thanks,
Srinivas M
 On Sep 10, 2012 12:19 PM, "Stack"  wrote:

> On Mon, Sep 10, 2012 at 8:33 AM, Norbert Burger
>  wrote:
> > Hi all -- we're currently on cdh3u3 (0.90.4 + patches).  I have one
> > table in our cluster which seems to functioning fine (gets/puts/scans
> > are all working), but for which no regions are listed on the UI.  The
> > table/regions exist in .META.  Other tables in the same cluster show
> > their regions list fine.  Seems like this might be a problem with
> > .META. or ZK, but would appreciate any pointers.
> >
> > 1) hbase hbck reports 2 "multiply assigned to region servers"
> > inconsistencies, but on a table different than the one I'm having
> > problems with.
> > 2) The hbase master log shows this fragment when navigating to
> > table.jsp for the affected table:
> >
> > 2012-09-10 11:29:07,682 DEBUG org.apache.zookeeper.ClientCnxn: Reading
> > reply sessionid:0x1370e3604c49580, packet:: clientPath:null
> > serverPath:null finished:false header:: 10,4  replyHeader::
> > 10,167713215,-101  request:: '/hbase/table/sessions,F  response::
> > 2012-09-10 11:29:07,682 DEBUG
> > org.apache.hadoop.hbase.zookeeper.ZKUtil:
> > hconnection-0x1370e3604c49580 Unable to get data of znode
> > /hbase/table/sessions because node does not exist (not an error)
> > 2012-09-10 11:29:07,682 DEBUG
> > org.apache.hadoop.hbase.client.MetaScanner: Scanning .META. starting
> > at row=sessions,,00 for max=2147483647 rows
> >
> > But since I see this "Unable to get data of znode" for all tables, my
> > assumption is that it's a red herring.  Any thoughts as how to debug
> > further, or why only this table would not show a region list?
> >
>
> What happens if you scan .META. in shell?
>
> hbase> scan ".META."
>
> Does it all show?
>
> (You might want to echo into a file so you can poke around after scan is
> done).
>
> St.Ack
>


Re: Getting ScannerTimeoutException even after several calls in the specified time limit

2012-09-10 Thread Stack
On Mon, Sep 10, 2012 at 10:13 AM, Dhirendra Singh  wrote:
> I am facing this exception while iterating over a big table,  by default i
> have specified caching as 100,
>
> i am getting the below exception, even though i checked there are several
> calls made to the scanner before it threw this exception, but somehow its
> saying 86095ms were passed since last invocation.
>
> i also observed that if it set scan.setCaching(false),  it succeeds,  could
> some one please explain or point me to some document as if what's happening
> here and what's the best practices to avoid it.
>
>

Try again with caching < 100.  See if it works.  A big cell?  A GC pause?
You should be able to tell roughly which server is being traversed
when you get the timeout.  Anything else going on on that server at
the time?  What version of HBase?
St.Ack


Re: bulk loading regions number

2012-09-10 Thread Harsh J
The decision can be made depending on the number of total regions you
want deployed across your 10 machines, and the size you expect the
total to be before you have to expand the size of the cluster.
Additionally, add in a parallelism factor of, say, 5-10 (or more if you
want) regions of the same table per RS, so that cluster expansion is
easy later.

The penalty of large HFile sizes (I am considering > 4 GB large
enough) may be that major compactions will begin taking time on
full/full-ish regions (writes a single file worth that much). I don't
think there's too much impact to parallelism (# of regions
independently serve-able) or to random reads with the new HFileV2
format with such big files.

If it suits your data ingest, go for bigger files.
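
As a concrete illustration of raising the split point: the per-table threshold can be
set when creating (or altering) the table, and the cluster-wide equivalent is
hbase.hregion.max.filesize in hbase-site.xml. A sketch with placeholder table/family
names and the ~4 GB figure mentioned above:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateBigRegionTable {
    public static void main(String[] args) throws Exception {
        HTableDescriptor desc = new HTableDescriptor("mytable");  // placeholder table name
        desc.addFamily(new HColumnDescriptor("cf"));               // placeholder family
        // Raise the per-table split threshold so regions grow larger before splitting;
        // pick a size that gives the number of regions per server you are aiming for.
        desc.setMaxFileSize(4L * 1024 * 1024 * 1024);               // ~4 GB
        new HBaseAdmin(HBaseConfiguration.create()).createTable(desc);
    }
}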

On Mon, Sep 10, 2012 at 2:15 PM, Oleg Ruchovets  wrote:
> Great
>   That is actually what I am thinking about too.
> What is the best practice to choose HFile size?
> What is the penalty to define it very big?
>
> Thanks
> Oleg.
>
> On Mon, Sep 10, 2012 at 4:24 AM, Harsh J  wrote:
>
>> Hi Oleg,
>>
>> If the root issue is a growing number of regions, why not control that
>> instead of a way to control the Reducer count? You could, for example,
>> raise the split-point sizes for HFiles, to not have it split too much,
>> and hence have larger but fewer regions?
>>
>> Given that you have 10 machines, I'd go this way rather than ending up
>> with a lot of regions causing issues with load.
>>
>> On Mon, Sep 10, 2012 at 1:49 PM, Oleg Ruchovets 
>> wrote:
>> > Hi ,
>> >   I am using bulk loading to write my data to hbase.
>> >
> > It works fine, but the number of regions is growing very rapidly.
> > Entering ONE WEEK of data I got 200 regions (I am going to save years of
> > data).
> > As a result, the job which writes data to HBase has a number of reducers
> > equal to the number of regions.
> > So entering only one WEEK of data I have 200 reducers.
> >
> > Questions:
> >How do I resolve the problem of a constantly growing number of reducers
> > when using bulk loading and TotalOrderPartition?
> >  I have a 10-machine cluster and I think I should have ~30 reducers.
> >
> > Thanks in advance.
>> > Oleg.
>>
>>
>>
>> --
>> Harsh J
>>



-- 
Harsh J


Re: HBase UI missing region list for active/functioning table

2012-09-10 Thread Stack
On Mon, Sep 10, 2012 at 8:33 AM, Norbert Burger
 wrote:
> Hi all -- we're currently on cdh3u3 (0.90.4 + patches).  I have one
> table in our cluster which seems to functioning fine (gets/puts/scans
> are all working), but for which no regions are listed on the UI.  The
> table/regions exist in .META.  Other tables in the same cluster show
> their regions list fine.  Seems like this might be a problem with
> .META. or ZK, but would appreciate any pointers.
>
> 1) hbase hbck reports 2 "multiply assigned to region servers"
> inconsistencies, but on a table different than the one I'm having
> problems with.
> 2) The hbase master log shows this fragment when navigating to
> table.jsp for the affected table:
>
> 2012-09-10 11:29:07,682 DEBUG org.apache.zookeeper.ClientCnxn: Reading
> reply sessionid:0x1370e3604c49580, packet:: clientPath:null
> serverPath:null finished:false header:: 10,4  replyHeader::
> 10,167713215,-101  request:: '/hbase/table/sessions,F  response::
> 2012-09-10 11:29:07,682 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZKUtil:
> hconnection-0x1370e3604c49580 Unable to get data of znode
> /hbase/table/sessions because node does not exist (not an error)
> 2012-09-10 11:29:07,682 DEBUG
> org.apache.hadoop.hbase.client.MetaScanner: Scanning .META. starting
> at row=sessions,,00 for max=2147483647 rows
>
> But since I see this "Unable to get data of znode" for all tables, my
> assumption is that it's a red herring.  Any thoughts as how to debug
> further, or why only this table would not show a region list?
>

What happens if you scan .META. in shell?

hbase> scan ".META."

Does it all show?

(You might want to echo into a file so you can poke around after scan is done).

St.Ack


Re: Reply: for CDH4.0, where can i find the hbase-default.xml file if using RPM install

2012-09-10 Thread Stack
On Mon, Sep 10, 2012 at 9:02 AM, huaxiang  wrote:
> Hi,
>I don't find the hbase-default.xml file using following command, any
> other way?
>To be clear, this hadoop was installed with CDH RPM package.
>

Is it not bundled inside the hbase-*.jar?
St.Ack


Re: Doubt in performance tuning

2012-09-10 Thread Stack
On Mon, Sep 10, 2012 at 9:58 AM, Ramasubramanian
 wrote:
> Hi,
>
> Currently it takes 11 odd minutes to load 1.2 million record into hbase from 
> hdfs. Can u pls share some tips to do the same in few seconds?
>
> We tried doing this in both pig script and in pentaho. Both are taking 11 odd 
> minutes.
>

You have had a look at http://hbase.apache.org/book.html#performance?
St.Ack


Doubt in performance tuning

2012-09-10 Thread Ramasubramanian
Hi,

Currently it takes 11-odd minutes to load 1.2 million records into HBase from
HDFS. Can you please share some tips to do the same in a few seconds?

We tried doing this with both a Pig script and Pentaho. Both are taking 11-odd
minutes.

Regards,
Rams

Re: Hbase filter-SubstringComparator vs full text search indexing

2012-09-10 Thread Jacques
Two cents below...

On Mon, Sep 10, 2012 at 7:24 AM, Shengjie Min  wrote:

> In my case, I have all the log events stored in HDFS/hbase in this format:
>
> timestamp | priority | category | message body
>
> Given I have only 4 fields here, that limits my queries to only against
> these four. I am thinking about more advanced search like full text search
> the message body. well, mainly substring query against message body.
>
>1.
>
>Has anybody tried to use Hbase SubstringComparator? How does it perform,
>    with a reasonably huge amount of data, can it still provide us the real
> time
>response capability?
>

Probably not, if "huge" is sufficiently large.  Since HBase only stores data
indexed by the primary row key, a search on any other criterion requires a full
scan of all the data.
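
For what it's worth, the filter approach being asked about looks roughly like the
sketch below (family and qualifier names are placeholders). The substring match is
evaluated on the region servers, but every row is still examined, which is why it
won't stay "real time" as the data grows:

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.util.Bytes;

public class SubstringScan {
    public static Scan buildScan() {
        Scan scan = new Scan();
        SingleColumnValueFilter filter = new SingleColumnValueFilter(
                Bytes.toBytes("cf"),                     // placeholder column family
                Bytes.toBytes("message"),                // placeholder qualifier for the message body
                CompareFilter.CompareOp.EQUAL,
                new SubstringComparator("OutOfMemory")); // match rows whose body contains this text
        filter.setFilterIfMissing(true);                 // skip rows without the column at all
        scan.setFilter(filter);
        return scan;                                     // still a full table scan server-side
    }
}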


>2.
>
>    In my case, does it make more sense to use a proper full text search
>engine(lucene/solr/elasticsearch) to index the message body, does that
>sound like a better idea?
>

Often yes.  For big data especially, this is where ElasticSearch excels.



>
> would be great someone experienced can share some stories here.
>
> -Shengjie Min
>


Re: BigDecimalColumnInterpreter

2012-09-10 Thread Julian Wissmann
Hi,

I haven't really gotten around to working on this since last Wednesday.
I checked readFields() and write() today, but don't really see why I would
need to reimplement those. Admittedly I'm not that deep into the whole HBase
codebase yet, so there is a good chance I'm missing something here.

Also, Anil, what HBase library are you coding this against?
It does seem like madness that, even though we're both using this
identically, it does not work for me.

Cheers,

Julian

2012/9/6 anil gupta 

> Yes, we do. :)
> Let me know the outcome. If you look at the BD ColumnInterpreter, getValue
> method is converting the byte array into BigDecimal. So you should not have
> any problem. The BD ColumnInterpreter is pretty similar to
> LongColumnInterpreter.
>
> Here is the code snippet for getValue() method which will convert Byte[] to
> BigDecimal:
>
> @Override
> public BigDecimal getValue(byte[] paramArrayOfByte1, byte[]
> paramArrayOfByte2,
> KeyValue kv) throws IOException {
>  if ((kv == null || kv.getValue() == null))
>return null;
>  return Bytes.toBigDecimal(kv.getValue());
> }
>
> Thanks,
> Anil
>
>
> On Thu, Sep 6, 2012 at 11:43 AM, Julian Wissmann
> wrote:
>
> > 0.92.1 from cdh4. I assume we use the same thing.
> >
> > 2012/9/6 anil gupta 
> >
> > > I am using HBase0.92.1. Which version you are using?
> > >
> > >
> > > On Thu, Sep 6, 2012 at 10:19 AM, anil gupta 
> > wrote:
> > >
> > > > Hi Julian,
> > > >
> > > > You need to add the column qualifier explicitly in the scanner. You
> > have
> > > > only added the column family in the scanner.
> > > > I am also assuming that you are writing a ByteArray of BigDecimal
> > object
> > > > as value of these cells in HBase. Is that right?
> > > >
> > > > Thanks,
> > > > Anil
> > > >
> > > >
> > > > On Thu, Sep 6, 2012 at 2:28 AM, Julian Wissmann <
> > > julian.wissm...@sdace.de>wrote:
> > > >
> > > >> Hi, anil,
> > > >>
> > > >> I presume you mean something like this:
> > > >> Scan scan = new Scan(_start, _end);
> > > >> scan.addFamily(family.getBytes());
> > > >> final ColumnInterpreter ci = new
> > > >> mypackage.BigDecimalColumnInterpreter();
> > > >> AggregationClient ag = new org.apache.hadoop.hbase.
> > > >> client.coprocessor.AggregationClient(config);
> > > >> BigDecimal sum = ag.sum(Bytes.toBytes(tableName), new
> > > >> BigDecimalColumnInterpreter(), scan);
> > > >>
> > > >>
> > > >> When I call this,with the Endpoint in place and loaded as a jar, I
> get
> > > the
> > > >> above error.
> > > >> When I call it without the endpoint loaded as coprocessor, though, I
> > get
> > > >> this:
> > > >>
> > > >> java.util.concurrent.ExecutionException:
> > > >> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> after
> > > >> attempts=10, exceptions:
> > > >> Thu Sep 06 11:07:39 CEST 2012,
> > > >> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@7bd6747b,
> > > >> java.io.IOException:
> > > >> IPC server unable to read call parameters: Error in readFields
> > > >> Thu Sep 06 11:07:40 CEST 2012,
> > > >> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@7bd6747b,
> > > >> java.io.IOException:
> > > >> IPC server unable to read call parameters: Error in readFields
> > > >> Thu Sep 06 11:07:41 CEST 2012,
> > > >> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@7bd6747b,
> > > >> java.io.IOException:
> > > >> IPC server unable to read call parameters: Error in readFields
> > > >> Thu Sep 06 11:07:42 CEST 2012,
> > > >> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@7bd6747b,
> > > >> java.io.IOException:
> > > >> IPC server unable to read call parameters: Error in readFields
> > > >> Thu Sep 06 11:07:44 CEST 2012,
> > > >> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@7bd6747b,
> > > >> java.io.IOException:
> > > >> IPC server unable to read call parameters: Error in readFields
> > > >> Thu Sep 06 11:07:46 CEST 2012,
> > > >> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@7bd6747b,
> > > >> java.io.IOException:
> > > >> IPC server unable to read call parameters: Error in readFields
> > > >> Thu Sep 06 11:07:50 CEST 2012,
> > > >> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@7bd6747b,
> > > >> java.io.IOException:
> > > >> IPC server unable to read call parameters: Error in readFields
> > > >> Thu Sep 06 11:07:54 CEST 2012,
> > > >> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@7bd6747b,
> > > >> java.io.IOException:
> > > >> IPC server unable to read call parameters: Error in readFields
> > > >> Thu Sep 06 11:08:02 CEST 2012,
> > > >> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@7bd6747b,
> > > >> java.io.IOException:
> > > >> IPC server unable to read call parameters: Error in readFields
> > > >> Thu Sep 06 11:08:18 CEST 2012,
> > > >> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@7bd6747b,
> > > >> java.io.IOException:
> > > >> IPC server unable to read call parameters: Error in readFields
> > > >>
> > > >> at
>

Reply: for CDH4.0, where can i find the hbase-default.xml file if using RPM install

2012-09-10 Thread huaxiang
Hi, 
   I can't find the hbase-default.xml file using the following command; is there
any other way?
   To be clear, this Hadoop was installed with the CDH RPM package.

Huaxiang 

[root@hadoop1 ~]# clear
[root@hadoop1 ~]# rpm -qlp *rpm_file_name.rpm*
[root@hadoop1 ~]# ^C
[root@hadoop1 ~]# find / -name "*hbase-default.xml*"
/usr/share/doc/hbase-0.92.1+67/hbase-default.xml
[root@hadoop1 ~]#

-----Original Message-----
From: Monish r [mailto:monishs...@gmail.com] 
Sent: 2012-09-10 15:00
To: user@hbase.apache.org
Subject: Re: for CDH4.0, where can i find the hbase-default.xml file if using
RPM install

Hi,
Try

rpm -qlp *rpm_file_name.rpm*

This will list all files in the rpm , from this u can know where
hbase-default.xml is.


On Sat, Sep 8, 2012 at 3:16 PM, John Hancock  wrote:

> Huaxiang,
>
> This may not be the quickest way to find it, but if it's anywhere in 
> your system, this command will find it:
>
> find / -name "*hbase-default.xml*"
>
> or
>
> cd /; find / -name "*hbase-default.xml*" > temp.txt
>
> will save the output of the find command to a text file leaving out 
> any error messages that might be distracting.
>
>
> -John
>
>
>
> On Sat, Sep 8, 2012 at 12:47 AM, huaxiang 
>  >wrote:
>
> > Hi,
> >
> > I install CDH4.0 with RPM package, but I cannot find the
> hbase-default.xml
> > file?
> >
> > Where can I find it?
> >
> >
> >
> > Best R.
> >
> >
> >
> > Huaxiang
> >
> >
>



HBase UI missing region list for active/functioning table

2012-09-10 Thread Norbert Burger
Hi all -- we're currently on cdh3u3 (0.90.4 + patches).  I have one
table in our cluster which seems to functioning fine (gets/puts/scans
are all working), but for which no regions are listed on the UI.  The
table/regions exist in .META.  Other tables in the same cluster show
their regions list fine.  Seems like this might be a problem with
.META. or ZK, but would appreciate any pointers.

1) hbase hbck reports 2 "multiply assigned to region servers"
inconsistencies, but on a table different than the one I'm having
problems with.
2) The hbase master log shows this fragment when navigating to
table.jsp for the affected table:

2012-09-10 11:29:07,682 DEBUG org.apache.zookeeper.ClientCnxn: Reading
reply sessionid:0x1370e3604c49580, packet:: clientPath:null
serverPath:null finished:false header:: 10,4  replyHeader::
10,167713215,-101  request:: '/hbase/table/sessions,F  response::
2012-09-10 11:29:07,682 DEBUG
org.apache.hadoop.hbase.zookeeper.ZKUtil:
hconnection-0x1370e3604c49580 Unable to get data of znode
/hbase/table/sessions because node does not exist (not an error)
2012-09-10 11:29:07,682 DEBUG
org.apache.hadoop.hbase.client.MetaScanner: Scanning .META. starting
at row=sessions,,00 for max=2147483647 rows

But since I see this "Unable to get data of znode" for all tables, my
assumption is that it's a red herring.  Any thoughts as how to debug
further, or why only this table would not show a region list?

Norbert
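
One way to cross-check what the UI should be showing is to read the region list
for the table straight out of .META. with the client API. A rough sketch against
the 0.90-era API, using the "sessions" table name from the log fragment above:

import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HServerAddress;
import org.apache.hadoop.hbase.client.HTable;

public class ListRegions {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "sessions");  // table name from the log above
        // Reads the region list straight from .META., bypassing the web UI,
        // so the output can be compared with what table.jsp shows.
        Map<HRegionInfo, HServerAddress> regions = table.getRegionsInfo();
        for (Map.Entry<HRegionInfo, HServerAddress> e : regions.entrySet()) {
            System.out.println(e.getKey().getRegionNameAsString() + " -> " + e.getValue());
        }
        table.close();
    }
}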


Re: HBase aggregate query

2012-09-10 Thread Doug Meil

Hi there, if there are common questions I'd suggest creating summary
tables of the pre-aggregated results.

http://hbase.apache.org/book.html#mapreduce.example

7.2.4. HBase MapReduce Summary to HBase Example




On 9/10/12 10:03 AM, "iwannaplay games"  wrote:

>Hi ,
>
>I want to run query like
>
>select month(eventdate),scene,count(1),sum(timespent) from eventlog
>group by month(eventdate),scene
>
>
>in hbase.Through hive its taking a lot of time for 40 million
>records.Do we have any syntax in hbase to find its result?In sql
>server it takes around 9 minutes,How long it might take in hbase??
>
>Regards
>Prabhjot
>
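
A hedged sketch of the summary-table pattern Doug points to, modeled on the
book's MapReduce summary example. The table names ("eventlog",
"eventlog_summary"), the column family "d", the qualifiers, and keying the
summary rows by "month|scene" are all assumptions about the poster's schema:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class EventLogSummary {

    static class SumMapper extends TableMapper<Text, LongWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx)
                throws IOException, InterruptedException {
            // Assumes every row has "month", "scene" and "timespent" (stored as
            // text) in family "d"; adapt to the real schema.
            String month = Bytes.toString(value.getValue(Bytes.toBytes("d"), Bytes.toBytes("month")));
            String scene = Bytes.toString(value.getValue(Bytes.toBytes("d"), Bytes.toBytes("scene")));
            long spent = Long.parseLong(
                    Bytes.toString(value.getValue(Bytes.toBytes("d"), Bytes.toBytes("timespent"))));
            ctx.write(new Text(month + "|" + scene), new LongWritable(spent));
        }
    }

    static class SumReducer extends TableReducer<Text, LongWritable, ImmutableBytesWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
                throws IOException, InterruptedException {
            long count = 0, total = 0;
            for (LongWritable v : values) {
                count++;
                total += v.get();
            }
            // One summary row per month|scene, holding the count and the sum.
            Put put = new Put(Bytes.toBytes(key.toString()));
            put.add(Bytes.toBytes("d"), Bytes.toBytes("count"), Bytes.toBytes(count));
            put.add(Bytes.toBytes("d"), Bytes.toBytes("sum"), Bytes.toBytes(total));
            ctx.write(null, put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "eventlog-summary");
        job.setJarByClass(EventLogSummary.class);
        Scan scan = new Scan();
        scan.setCaching(500);
        scan.setCacheBlocks(false);
        TableMapReduceUtil.initTableMapperJob("eventlog", scan, SumMapper.class,
                Text.class, LongWritable.class, job);
        TableMapReduceUtil.initTableReducerJob("eventlog_summary", SumReducer.class, job);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Once a job like this has run, the month/scene aggregates become simple point
Gets against the summary table instead of a 40-million-row scan.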




Hbase filter-SubstringComparator vs full text search indexing

2012-09-10 Thread Shengjie Min
In my case, I have all the log events stored in HDFS/hbase in this format:

timestamp | priority | category | message body

Given I have only 4 fields here, that limits my queries to only these four. I am
thinking about more advanced search, like full-text search over the message body;
mainly substring queries against the message body.

   1. Has anybody tried to use the HBase SubstringComparator? How does it perform
      with a reasonably huge amount of data; can it still give us real-time
      response?

   2. In my case, does it make more sense to use a proper full-text search
      engine (Lucene/Solr/Elasticsearch) to index the message body? Does that
      sound like a better idea?

It would be great if someone experienced could share some stories here.

-Shengjie Min
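
For reference, a minimal sketch of a SubstringComparator scan; the table name,
column family/qualifier and date-prefixed row keys are assumptions about the
schema. The comparator is applied server-side, but the region servers still read
every cell in the scanned range, so response time grows with the amount of data
scanned; for genuinely interactive substring search over large volumes, an
external full-text index (Lucene/Solr/Elasticsearch) is usually the better fit.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.util.Bytes;

public class MessageGrep {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "eventlog");  // hypothetical table name

        // Match rows whose "msg" cell contains the given substring; rows
        // without a "msg" cell are dropped as well.
        SingleColumnValueFilter f = new SingleColumnValueFilter(
                Bytes.toBytes("log"), Bytes.toBytes("msg"),
                CompareOp.EQUAL, new SubstringComparator("OutOfMemoryError"));
        f.setFilterIfMissing(true);

        // Restrict the key range first (assumed date-prefixed keys), then filter.
        Scan scan = new Scan(Bytes.toBytes("20120910"), Bytes.toBytes("20120911"));
        scan.setFilter(f);
        scan.setCaching(100);

        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                System.out.println(Bytes.toString(r.getRow()));
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}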


Re: HBase aggregate query

2012-09-10 Thread iwannaplay games
It's taking very long.

On Mon, Sep 10, 2012 at 7:34 PM, Ted Yu  wrote:

> Hi,
> Are you able to get the number you want through hive log ?
>
> Thanks
>
> On Mon, Sep 10, 2012 at 7:03 AM, iwannaplay games <
> funnlearnfork...@gmail.com> wrote:
>
> > Hi ,
> >
> > I want to run query like
> >
> > select month(eventdate),scene,count(1),sum(timespent) from eventlog
> > group by month(eventdate),scene
> >
> >
> > in hbase.Through hive its taking a lot of time for 40 million
> > records.Do we have any syntax in hbase to find its result?In sql
> > server it takes around 9 minutes,How long it might take in hbase??
> >
> > Regards
> > Prabhjot
> >
>


Re: HBase aggregate query

2012-09-10 Thread Srinivas Mupparapu
HBase only provides CRUD operations by means of the Put/Get/Delete API, and
there is no built-in SQL interface.

Thanks,
Srinivas M
On Sep 10, 2012 9:03 AM, "iwannaplay games" 
wrote:

> Hi ,
>
> I want to run query like
>
> select month(eventdate),scene,count(1),sum(timespent) from eventlog
> group by month(eventdate),scene
>
>
> in hbase.Through hive its taking a lot of time for 40 million
> records.Do we have any syntax in hbase to find its result?In sql
> server it takes around 9 minutes,How long it might take in hbase??
>
> Regards
> Prabhjot
>


Re: HBase aggregate query

2012-09-10 Thread Ted Yu
Hi,
Are you able to get the number you want through hive log ?

Thanks

On Mon, Sep 10, 2012 at 7:03 AM, iwannaplay games <
funnlearnfork...@gmail.com> wrote:

> Hi ,
>
> I want to run query like
>
> select month(eventdate),scene,count(1),sum(timespent) from eventlog
> group by month(eventdate),scene
>
>
> in hbase.Through hive its taking a lot of time for 40 million
> records.Do we have any syntax in hbase to find its result?In sql
> server it takes around 9 minutes,How long it might take in hbase??
>
> Regards
> Prabhjot
>


HBase aggregate query

2012-09-10 Thread iwannaplay games
Hi ,

I want to run query like

select month(eventdate),scene,count(1),sum(timespent) from eventlog
group by month(eventdate),scene


in HBase. Through Hive it is taking a lot of time for 40 million records. Do we
have any syntax in HBase to compute this result? In SQL Server it takes around
9 minutes; how long might it take in HBase?

Regards
Prabhjot


Re: bulk loading regions number

2012-09-10 Thread Marcos Ortiz
Well, the default value for a region is 256 MB, so if you want to store a lot
of data, you may want to consider increasing that value.
By pre-splitting the table, you can control how this process happens.

On 09/10/2012 04:45 AM, Oleg Ruchovets wrote:

Great
   That is actually what I am thinking about too.
What is the best practice to choose HFile size?
What is the penalty to define it very big?

Thanks
Oleg.

On Mon, Sep 10, 2012 at 4:24 AM, Harsh J  wrote:


Hi Oleg,

If the root issue is a growing number of regions, why not control that
instead of a way to control the Reducer count? You could, for example,
raise the split-point sizes for HFiles, to not have it split too much,
and hence have larger but fewer regions?

Given that you have 10 machines, I'd go this way rather than ending up
with a lot of regions causing issues with load.

On Mon, Sep 10, 2012 at 1:49 PM, Oleg Ruchovets 
wrote:

Hi ,
   I am using bulk loading to write my data to hbase.

I works fine , but number of regions growing very rapidly.
Entering ONE WEEK of data I got  200 regions (I am going to save years of
data).
As a result job which writes data to HBase has REDUCERS number equals
REGIONS number.
So entering only one WEEK of data I have 200 reducers.

Questions:
How to resolve the problem of constantly growing reducers number using
bulk loading and TotalOrderPartition.
  I have 10 machine cluster and I think I should have ~ 30 reducers.

Thank in advance.
Oleg.



--
Harsh J





--

Marcos Luis Ortíz Valmaseda
*Data Engineer && Sr. System Administrator at UCI*
about.me/marcosortiz 
My Blog 
Tumblr's blog 
@marcosluis2186 





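
A rough sketch combining the two suggestions above (larger regions plus
pre-splitting at table-creation time). The table name, family, max file size and
the two-digit row-key prefix are assumptions; adjust the split points and the
split threshold to your own key design and data volume:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class CreatePresplitTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HTableDescriptor htd = new HTableDescriptor("events");  // hypothetical name
        htd.addFamily(new HColumnDescriptor("d"));
        // Per-table override of hbase.hregion.max.filesize: split only when a
        // region reaches ~4 GB, so fewer, larger regions appear over time.
        htd.setMaxFileSize(4L * 1024 * 1024 * 1024);

        // Pre-split into 30 regions so the bulk-load job (one reducer per
        // region with TotalOrderPartitioner) stays around 30 reducers.
        byte[][] splits = new byte[29][];
        for (int i = 1; i <= 29; i++) {
            // Assumes row keys start with a two-digit bucket prefix "00".."29".
            splits[i - 1] = Bytes.toBytes(String.format("%02d", i));
        }
        admin.createTable(htd, splits);
        admin.close();
    }
}

With roughly 30 pre-created regions and a higher split threshold, the bulk-load
job keeps a steady reducer count instead of gaining one per freshly split region.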

Re: Tomb Stone Marker

2012-09-10 Thread Doug Meil

Hi there...

In this chapter...

http://hbase.apache.org/book.html#datamodel

.. it explains that the "updates" are just a view.  There is a merge
happening across CFs and versions (and delete-markers)..

In this...

http://hbase.apache.org/book.html#regions.arch
9.7.5.5. Compaction

... it explains how and when the delete markers are removed in the
compaction process.





On 9/10/12 2:50 AM, "Monish r"  wrote:

>Hi,
>Thanks for the link . If the meta data information for a delete is part of
>key value , then when does this update happen
>
>When the region is re written by minor compaction. ?
>or Is the region  re written for a set of batched deletes ?
>
>
>
>On Sun, Sep 9, 2012 at 6:42 PM, Doug Meil
>wrote:
>
>>
>> Hi there,
>>
>> See 9.7.5.4. KeyValue...
>>
>> http://hbase.apache.org/book.html#regions.arch
>>
>> ... the tombstone is one of the key types.
>>
>>
>>
>> On 9/9/12 5:21 AM, "Monish r"  wrote:
>>
>> >Hi,
>> >I need some clarifications regarding the Tomb Stone Marker .
>> >I was wondering where exactly are the tomb stone markers stored when a
>>row
>> >is deleted .
>> >
>> >Are they kept in some memory area and  updated in the HFile  during
>>minor
>> >compaction ?
>> >If they are updated in the HFile , then what part of a  HFile contains
>> >this
>> >information.
>> >
>> >Regards,
>> >R.Monish
>>
>>
>>
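
To actually see the delete markers Doug describes sitting in the store, a small
sketch assuming an HBase version with raw-scan support (0.94 or later) and a
hypothetical table "t1":

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ShowTombstones {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "t1");  // hypothetical table

        // A raw scan returns KeyValues as they sit in the store, including
        // delete markers that a normal scan would silently apply and hide.
        Scan scan = new Scan(Bytes.toBytes("row1"), Bytes.toBytes("row2"));
        scan.setRaw(true);
        scan.setMaxVersions();

        ResultScanner rs = table.getScanner(scan);
        for (Result r : rs) {
            for (KeyValue kv : r.raw()) {
                System.out.println(kv + (kv.isDelete() ? "  <-- tombstone" : ""));
            }
        }
        rs.close();
        table.close();
    }
}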




Re: bulk loading regions number

2012-09-10 Thread Oleg Ruchovets
Great
  That is actually what I am thinking about too.
What is the best practice for choosing the HFile size?
What is the penalty for making it very large?

Thanks
Oleg.

On Mon, Sep 10, 2012 at 4:24 AM, Harsh J  wrote:

> Hi Oleg,
>
> If the root issue is a growing number of regions, why not control that
> instead of a way to control the Reducer count? You could, for example,
> raise the split-point sizes for HFiles, to not have it split too much,
> and hence have larger but fewer regions?
>
> Given that you have 10 machines, I'd go this way rather than ending up
> with a lot of regions causing issues with load.
>
> On Mon, Sep 10, 2012 at 1:49 PM, Oleg Ruchovets 
> wrote:
> > Hi ,
> >   I am using bulk loading to write my data to hbase.
> >
> > I works fine , but number of regions growing very rapidly.
> > Entering ONE WEEK of data I got  200 regions (I am going to save years of
> > data).
> > As a result job which writes data to HBase has REDUCERS number equals
> > REGIONS number.
> > So entering only one WEEK of data I have 200 reducers.
> >
> > Questions:
> >How to resolve the problem of constantly growing reducers number using
> > bulk loading and TotalOrderPartition.
> >  I have 10 machine cluster and I think I should have ~ 30 reducers.
> >
> > Thank in advance.
> > Oleg.
>
>
>
> --
> Harsh J
>


Re: About Reloading Coprocessors

2012-09-10 Thread Sever Fundatureanu
If it is associated with a certain table, you only have to disable the
table, reload the coprocessor, and enable the table.

Regards,
Sever

On Wed, Sep 5, 2012 at 5:18 AM, Aaron Wong  wrote:
> Hello all,
>
> I have an endpoint coprocessor running in HBase that I would like to
> modify.  I previously loaded this coprocessor via the shell, without having
> to restart HBase.  However, after some experimentation I have not found any
> way to reload a new version of the coprocessor without restarting HBase.
> Is there presently any mechanism for doing so?
>
> Regards,
> Aaron



-- 
Sever Fundatureanu

Vrije Universiteit Amsterdam
E-mail: fundatureanu.se...@gmail.com
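
A hedged sketch of that disable/modify/enable cycle for a table-level endpoint.
The table name, jar path, class name and the "coprocessor$1" attribute key are
all assumptions (check describe 'mytable' in the shell for the real attribute
key); pointing the attribute at a newly named jar is commonly recommended, since
region servers cache classes per jar path.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class ReloadEndpoint {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        byte[] table = Bytes.toBytes("mytable");  // hypothetical table name

        admin.disableTable(table);

        HTableDescriptor htd = admin.getTableDescriptor(table);
        // Overwrite the existing coprocessor table attribute. "coprocessor$1"
        // is assumed here; the value format is jar-path|class|priority|args.
        htd.setValue("coprocessor$1",
                "hdfs:///coprocessors/my-endpoint-2.jar|com.example.MyEndpoint|1001|");
        admin.modifyTable(table, htd);

        admin.enableTable(table);
        admin.close();
    }
}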


Re: bulk loading regions number

2012-09-10 Thread Harsh J
Hi Oleg,

If the root issue is a growing number of regions, why not control that
instead of a way to control the Reducer count? You could, for example,
raise the split-point sizes for HFiles, to not have it split too much,
and hence have larger but fewer regions?

Given that you have 10 machines, I'd go this way rather than ending up
with a lot of regions causing issues with load.

On Mon, Sep 10, 2012 at 1:49 PM, Oleg Ruchovets  wrote:
> Hi ,
>   I am using bulk loading to write my data to hbase.
>
> I works fine , but number of regions growing very rapidly.
> Entering ONE WEEK of data I got  200 regions (I am going to save years of
> data).
> As a result job which writes data to HBase has REDUCERS number equals
> REGIONS number.
> So entering only one WEEK of data I have 200 reducers.
>
> Questions:
>How to resolve the problem of constantly growing reducers number using
> bulk loading and TotalOrderPartition.
>  I have 10 machine cluster and I think I should have ~ 30 reducers.
>
> Thank in advance.
> Oleg.



-- 
Harsh J


bulk loading regions number

2012-09-10 Thread Oleg Ruchovets
Hi ,
  I am using bulk loading to write my data to hbase.

It works fine, but the number of regions is growing very rapidly.
Entering ONE WEEK of data I got  200 regions (I am going to save years of
data).
As a result job which writes data to HBase has REDUCERS number equals
REGIONS number.
So entering only one WEEK of data I have 200 reducers.

Questions:
   How to resolve the problem of constantly growing reducers number using
bulk loading and TotalOrderPartition.
 I have 10 machine cluster and I think I should have ~ 30 reducers.

Thanks in advance.
Oleg.


Re: Local debugging (possibly with Maven and HBaseTestingUtility?)

2012-09-10 Thread Ulrich Staudinger
Hi there,

my AQ Master Server might be of interest to you. I have an embedded
HBase server in it, and it's very straightforward to use:
http://activequant.org/uberjar.html

What I essentially do is described here:

http://developers.activequant.org:3000/projects/aq2o/repository/entry/trunk/apps/src/main/java/com/activequant/server/LocalHBaseCluster.java

It is really blunt, but it works flawlessly. Performance is of course not
super-great and it doesn't replicate, etc. I use it frequently to get the
ActiveQuant framework up and running on a new machine.


Maybe it helps,
Ulrich





On Mon, Sep 10, 2012 at 9:12 AM, Jeroen Hoek  wrote:

> 2012/9/7 n keywal :
> > You can use HBase in standalone mode? Cf.
> > http://hbase.apache.org/book.html#standalone_dist?
> > I guess you already tried and it didn't work?
>
> With stand-alone mode I assume you mean installing HBase locally and
> work with that?
>
> The problem with installing HBase directly on the developer laptop's
> OS is that this is limits you to the version installed at any one
> time. When writing software that uses the HBase client API it is
> sometimes necessary to switch between versions. For example, one day I
> might be working on a feature for our next release, based on
> Cloudera's CDH4 version of HBase, the next day I might have to switch
> back to CDH3, because that runs on production and a sudden hotfix is
> needed, and at the end of the week I might want to try out some of the
> new features in HBase 0.94.1.
>
> This is one of the reasons why the HBaseTestingUtility approach via
> Maven looks nice, but it lacks persistence.
>
> Another problem is that while I can install HBase easily on my
> GNU/Linux laptop, my colleagues run mostly Mac OS X and Windows. A
> solution that depends on the Java toolchain (Maven and friends) rather
> than the OS seems preferable.
>
> The main reason why we are still supporting AppEngine as well as HBase
> is the ease with which you can run a local Jetty application server
> backed by a file-backed AppEngine instance with one click from
> Eclipse.
>
> Jeroen Hoek
>



-- 
Ulrich Staudinger

http://www.activequant.com
Connect online: https://www.xing.com/profile/Ulrich_Staudinger


Re: Local debugging (possibly with Maven and HBaseTestingUtility?)

2012-09-10 Thread n keywal
With stand-alone mode I assume you mean installing HBase locally and

> work with that?
>

Yes. You can also launch any version "a la standalone", including a
development version. The launch scripts check this, and use Maven to get
the classpath needed for the dev version.

The problem with installing HBase directly on the developer laptop's
> OS is that this is limits you to the version installed at any one
> time. When writing software that uses the HBase client API it is
> sometimes necessary to switch between versions. For example, one day I
> might be working on a feature for our next release, based on
> Cloudera's CDH4 version of HBase, the next day I might have to switch
> back to CDH3, because that runs on production and a sudden hotfix is
> needed, and at the end of the week I might want to try out some of the
> new features in HBase 0.94.1.
>

I don't know if CDH versions are exclusive. This would be a question for
the cdh lists. But for the apache releases at least, nothing prevents you
from having multiple ones installed on the same computer. If you need it,
you should even be able to run multiple versions simultaneously (I've never
done that, but I don't see why it would be an issue, it's just a matter of
ports & directory configuration).

Nicolas


Re: Local debugging (possibly with Maven and HBaseTestingUtility?)

2012-09-10 Thread Jeroen Hoek
2012/9/7 n keywal :
> You can use HBase in standalone mode? Cf.
> http://hbase.apache.org/book.html#standalone_dist?
> I guess you already tried and it didn't work?

With stand-alone mode I assume you mean installing HBase locally and
work with that?

The problem with installing HBase directly on the developer laptop's
OS is that it limits you to the version installed at any one
time. When writing software that uses the HBase client API it is
sometimes necessary to switch between versions. For example, one day I
might be working on a feature for our next release, based on
Cloudera's CDH4 version of HBase, the next day I might have to switch
back to CDH3, because that runs on production and a sudden hotfix is
needed, and at the end of the week I might want to try out some of the
new features in HBase 0.94.1.

This is one of the reasons why the HBaseTestingUtility approach via
Maven looks nice, but it lacks persistence.

Another problem is that while I can install HBase easily on my
GNU/Linux laptop, my colleagues run mostly Mac OS X and Windows. A
solution that depends on the Java toolchain (Maven and friends) rather
than the OS seems preferable.

The main reason why we are still supporting AppEngine as well as HBase
is the ease with which you can run a local Jetty application server
backed by a file-backed AppEngine instance with one click from
Eclipse.

Jeroen Hoek
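
For anyone weighing the HBaseTestingUtility option, a minimal sketch of what it
looks like. As noted above, the mini-cluster writes to a temporary directory, so
nothing persists across runs:

import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class LocalMiniCluster {
    public static void main(String[] args) throws Exception {
        HBaseTestingUtility util = new HBaseTestingUtility();
        // Spins up an in-process HDFS, ZooKeeper and HBase; data lives under a
        // temporary directory and is thrown away on shutdown (no persistence).
        util.startMiniCluster();

        HTable table = util.createTable(Bytes.toBytes("demo"), Bytes.toBytes("cf"));
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        table.put(put);
        table.close();

        // For ad-hoc debugging, keep the process alive here so the application
        // under development can connect before shutting the cluster down.
        util.shutdownMiniCluster();
    }
}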


Re: Regionservers are dead...How to make them live again

2012-09-10 Thread Monish r
Hi,
Try checking the log files of both HDFS (if it is used) and HBase to find
out why the region server is going down.
If possible, post the logs and I can have a look at them.




On Mon, Sep 10, 2012 at 10:46 AM, iwannaplay games <
funnlearnfork...@gmail.com> wrote:

> Its weird.
> I restarted everything on friday...their status didnt change.Today it
> happened by itself...
> Is restarting the solution?
>
> On Mon, Sep 10, 2012 at 10:41 AM, Otis Gospodnetic <
> otis.gospodne...@gmail.com> wrote:
>
> > Restart them?
> >
> > Otis
> > --
> > Search Analytics - http://sematext.com/search-analytics/index.html
> > Performance Monitoring - http://sematext.com/spm/index.html
> >
> >
> > On Mon, Sep 10, 2012 at 12:50 AM, iwannaplay games
> >  wrote:
> > > Hi all,
> > >
> > > Due to large data load my hbase regionservers are dead now..Does
> anybody
> > > have any idea how to make them live again
> > >
> > > Please help me..
> > >
> > > Thanks in advance
> > >
> > > Regards
> > > Prabhjot
> >
>


Re: for CDH4.0, where can i find the hbase-default.xml file if using RPM install

2012-09-10 Thread Monish r
Hi,
Try

rpm -qlp *rpm_file_name.rpm*

This will list all the files in the RPM; from this you can find out where
hbase-default.xml is.


On Sat, Sep 8, 2012 at 3:16 PM, John Hancock  wrote:

> Huaxiang,
>
> This may not be the quickest way to find it, but if it's anywhere in your
> system, this command will find it:
>
> find / -name "*hbase-default.xml*"
>
> or
>
> cd /; find / -name "*hbase-default.xml*" > temp.txt
>
> will save the output of the find command to a text file leaving out any
> error messages that might be distracting.
>
>
> -John
>
>
>
> On Sat, Sep 8, 2012 at 12:47 AM, huaxiang  >wrote:
>
> > Hi,
> >
> > I install CDH4.0 with RPM package, but I cannot find the
> hbase-default.xml
> > file?
> >
> > Where can I find it?
> >
> >
> >
> > Best R.
> >
> >
> >
> > Huaxiang
> >
> >
>