Re: Using HBase on other file systems

2010-05-12 Thread Jeff Hammerbacher
Some projects sacrifice stability and manageability for performance (see,
e.g., http://gluster.org/pipermail/gluster-users/2009-October/003193.html).

On Wed, May 12, 2010 at 11:15 AM, Edward Capriolo wrote:

> On Wed, May 12, 2010 at 1:30 PM, Andrew Purtell 
> wrote:
>
> > Before recommending Gluster I suggest you set up a test cluster and then
> > randomly kill bricks.
> >
> > Also as pointed out in another mail, you'll want to colocate TaskTrackers
> > on Gluster bricks to get I/O locality, yet there is no way for Gluster to
> > export stripe locations back to Hadoop.
> >
> > It seems a poor choice.
> >
> >   - Andy
> >
> > > From: Edward Capriolo
> > > Subject: Re: Using HBase on other file systems
> > > To: "hbase-user@hadoop.apache.org" 
> > > Date: Wednesday, May 12, 2010, 6:38 AM
> > > On Tuesday, May 11, 2010, Jeff
> > > Hammerbacher 
> > > wrote:
> > > > Hey Edward,
> > > >
> > > > I do think that if you compare GoogleFS to HDFS, GFS
> > > looks more full
> > > >> featured.
> > > >>
> > > >
> > > > What features are you missing? Multi-writer append was
> > > explicitly called out
> > > > by Sean Quinlan as a bad idea, and rolled back. From
> > > internal conversations
> > > > with Google engineers, erasure coding of blocks
> > > suffered a similar fate.
> > > > Native client access would certainly be nice, but FUSE
> > > gets you most of the
> > > > way there. Scalability/availability of the NN, RPC
> > > QoS, alternative block
> > > > placement strategies are second-order features which
> > > didn't exist in GFS
> > > > until later in its lifecycle of development as well.
> > > HDFS is following a
> > > > similar path and has JIRA tickets with active
> > > discussions. I'd love to hear
> > > > your feature requests, and I'll be sure to translate
> > > them into JIRA tickets.
> > > >
> > > > I do believe my logic is reasonable. HBase has a lot
> > > of code designed around
> > > >> HDFS.  We know these tickets that get cited all
> > > the time, for better random
> > > >> reads, or for sync() support. HBase gets the
> > > benefits of HDFS and has to
> > > >> deal with its drawbacks. Other key value stores
> > > handle storage directly.
> > > >>
> > > >
> > > > Sync() works and will be in the next release, and its
> > > absence was simply a
> > > > result of the youth of the system. Now that that
> > > limitation has been
> > > > removed, please point to another place in the code
> > > where using HDFS rather
> > > > than the local file system is forcing HBase to make
> > > compromises. Your
> > > > initial attempts on this front (caching, HFile,
> > > compactions) were, I hope,
> > > > debunked by my previous email. It's also worth noting
> > > that Cassandra does
> > > > all three, despite managing its own storage.
> > > >
> > > > I'm trying to learn from this exchange and always
> > > enjoy understanding new
> > > > systems. Here's what I have so far from your
> > > arguments:
> > > > 1) HBase inherits both the advantages and
> > > disadvantages of HDFS. I clearly
> > > > agree on the general point; I'm pressing you to name
> > > some specific
> > > > disadvantages, in hopes of helping prioritize our
> > > development of HDFS. So
> > > > far, you've named things which are either a) not
> > > actually disadvantages b)
> > > > no longer true. If you can come up with the
> > > disadvantages, we'll certainly
> > > > take them into account. I've certainly got a number of
> > > them on our roadmap.
> > > > 2) If you don't want to use HDFS, you won't want to
> > > use HBase. Also
> > > > certainly true, but I'm not sure there's not much to
> > > learn from this
> > > > assertion. I'd once again ask: why would you not want
> > > to use HDFS, and what
> > > > is your choice in its stead?
> > > >
> > > > Thanks,
> > > > Jeff
> > > >
> > >
> > > Jeff,
> > >
> > > Let me first mention that you have mentioned some thing as
> > > fixed, that
> > > are only fixed in trunk. I consider trunk futureware and I
> > > do not like
> > > to have tempral conversations. Even when trunk becomes
> > > current there
> > > is no guarentee that the entire problem is solved. After
> > > all appends
> > > were fixed in .19 or not , or again?
> > >
> > > I rescanned the gfs white paper to support my argument that
> > > hdfs is
> > > stripped down. Found
> > > Writes at offset ARE supported
> > > Checkpoints
> > > Application level checkpoints
> > > Snapshot
> > > Shadow read only master
> > >
> > > hdfs chose features it wanted and ignored others that is
> > > why I called
> > > it a pure map reduce implementation.
> > >
> > > My main point, is that hbase by nature needs high speed
> > > random read
> > > and random write. Hdfs by nature is bad at these things. If
> > > you can
> > > not keep a high cache hit rate via large block cache via
> > > ram hbase is
> > > going to slam hdfs doing large block reads for small parts
> > > of files.
> > >
> > > So you ask. Me what I would use instead. I do not think
> > > there is a
> > > 

Re: Enabling Indexing in HBase

2010-05-12 Thread Seraph Imalia

Hi, I'm working with Michelan...

We are actually using the HBaseConfiguration object - which is why we  
were confused when the client was trying to connect to zookeeper on  
localhost.  Even stranger was that all other functions work fine -  
getting a table, putting and getting data.  It is only the code that  
adds indexing to the table that throws the error.


In the debug info we can see that...

1) when adding indexing to an existing table, it connects to  
zookeeper, it disables the table, enables the table, creates the index  
table and then tries to connect to zookeeper on localhost.


2) when adding a new table with indexing, it connects to zookeeper on  
the server, creates the table and the index table and fails after that  
trying to connect to zookeeper again, but again on localhost.


I have checked that we are using the same HBaseAdmin object the whole  
time (which was constructed with the HBaseConfiguration object) and it  
was only when we added the conf folder to the classpath as well that  
it completed with no errors.


I don't have the code or the debug info with me right now, but  
Michelan can send that to you in the morning - please let us know if  
you need it?


Regards,
Seraph


On 12 May 2010, at 6:28 PM, Jean-Daniel Cryans wrote:


Yes, you can also create a HBaseConfiguration object and configure it
with those exact configs (that you then provide to HTable).

J-D

On Wed, May 12, 2010 at 1:22 AM, Michelan Arendse > wrote:
Thank you. I have added the configuration folder to my client class  
path and it worked.


Now I am faced with another issue, since this application will be  
used in ColdFusion is there a way of making this work without  
having the configuration as part of the class path?


-Original Message-
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of  
Jean-Daniel Cryans

Sent: 11 May 2010 06:26 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Enabling Indexing in HBase

Per 
http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/package-summary.html#overview
your client has to know where your zookeeper setup is. Since you want
to use HBase in a distributed fashion, that means you went through
http://hadoop.apache.org/hbase/docs/r0.20.4/api/overview-summary.html#fully-distrib
and this is where the required configs are.

It could be made more obvious tho.

J-D

On Tue, May 11, 2010 at 4:44 AM, Michelan Arendse > wrote:
Thanks. I have added that to the class path, but I still get an  
error.

This is the error that I get:

10/05/11 13:41:27 INFO zookeeper.ZooKeeper: Initiating client  
connection, connectString=localhost:2181 sessionTimeout=6  
watcher=org.apache.hadoop.hbase.client.HConnectionManager 
$clientzkwatc...@12d15a9
10/05/11 13:41:27 INFO zookeeper.ClientCnxn: Attempting connection  
to server localhost/127.0.0.1:2181
10/05/11 13:41:28 WARN zookeeper.ClientCnxn: Exception closing  
session 0x0 to sun.nio.ch.selectionkeyi...@b0ce8f
java.net.ConnectException: Connection refused: no further  
information

   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
   at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
   at org.apache.zookeeper.ClientCnxn$SendThread.run 
(ClientCnxn.java:933)
10/05/11 13:41:28 WARN zookeeper.ClientCnxn: Ignoring exception  
during shutdown input


I'm working off a server and not in standalone mode; where would I
change a setting that tells the "connectString" to point to the
server instead of "localhost"?
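
(For illustration, the kind of client-side hbase-site.xml entry that points that
connect string at the ZooKeeper host instead of localhost; the host name below is
a placeholder, not something from this thread:)

    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>your-hbase-server-host</value>
    </property>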


-Original Message-
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of  
Jean-Daniel Cryans

Sent: 10 May 2010 07:05 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Enabling Indexing in HBase

Did you include the jar (contrib/indexed/hbase-0.20.3-indexed.jar)  
in

your class path?

J-D

On Mon, May 10, 2010 at 6:43 AM, Michelan Arendse > wrote:

Hi.

I added the following properties to hbase-site.xml:

    <property>
        <name>hbase.regionserver.class</name>
        <value>org.apache.hadoop.hbase.ipc.IndexedRegionInterface</value>
    </property>

    <property>
        <name>hbase.regionserver.impl</name>
        <value>org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegionServer</value>
    </property>

I'm using hbase 0.20.3 and when I start hbase now it comes with  
the following:

ERROR org.apache.hadoop.hbase.master.HMaster: Can not start master
java.lang.UnsupportedOperationException: Unable to find region  
server interface org.apache.hadoop.hbase.ipc.IndexedRegionInterface
Caused by: java.lang.ClassNotFoundException:  
org.apache.hadoop.hbase.ipc.IndexedRegionInterface


Can you please help with this problem that I am having.

Thank you,

Michelan Arendse
Junior Developer | AD:DYNAMO // happy business ;-)
Office 0861 Dynamo (0861 396266)  | Fax +27 (0) 21 465 2587

Advertise Online Instantly - www.addynamo.com 












Re: HBase client hangs after upgrade to 0.20.4 when used from reducer

2010-05-12 Thread Todd Lipcon
Hi Friso,

Also, if you can capture a jstack of the regionservers at thie time
that would be great.

-Todd

On Wed, May 12, 2010 at 9:26 AM, Jean-Daniel Cryans  wrote:
> Friso,
>
> Unfortunately it's hard to determine the cause with the provided
> information, the client call you pasted is pretty much normal i.e. the
> client is waiting to receive a result from a region server.
>
> The fact that you can't shut down the master when this happens is very
> concerning. Do you still have those logs around? Same for the region
> servers? Can you post this in pastebin or on a web server?
>
> Also, feel free to come chat with us on IRC, it's always easier to
> debug when live. #hbase on freenode
>
> J-D
>
> On Wed, May 12, 2010 at 8:31 AM, Friso van Vollenhoven
>  wrote:
>> Hi all,
>>
>> I am using Hadoop (0.20.2) and HBase to periodically import data (every 15 
>> minutes). There are a number of import processes, but generally they all 
>> create a sequence file on HDFS, which is then run through a MapReduce job. 
>> The MapReduce uses the identity mapper (the input file is a Hadoop sequence 
>> file) and a specialized reducer that does the following:
>> - Combine the values for a key into one value
>> - Do a Get from HBase to retrieve existing values for the same key
>> - Combine the existing value from HBase and the new one into one value again
>> - Put the final value into HBase under the same key (thus 'overwrite' the 
>> existing row; I keep only one version)
>>
>> After I upgraded HBase to the 0.20.4 release, the reducers sometimes start 
>> hanging on a Get. When the jobs start, some reducers run to completion fine, 
>> but after a while the last reducers will start to hang. Eventually the 
>> reducers are killed of by Hadoop (after 600 secs).
>>
>> I did a thread dump for one of the hanging reducers. It looks like this:
>> "main" prio=10 tid=0x48083800 nid=0x4c93 in Object.wait() 
>> [0x420ca000]
>>   java.lang.Thread.State: WAITING (on object monitor)
>>        at java.lang.Object.wait(Native Method)
>>        - waiting on <0x2eb50d70> (a 
>> org.apache.hadoop.hbase.ipc.HBaseClient$Call)
>>        at java.lang.Object.wait(Object.java:485)
>>        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:721)
>>        - locked <0x2eb50d70> (a 
>> org.apache.hadoop.hbase.ipc.HBaseClient$Call)
>>        at 
>> org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
>>        at $Proxy2.get(Unknown Source)
>>        at org.apache.hadoop.hbase.client.HTable$4.call(HTable.java:450)
>>        at org.apache.hadoop.hbase.client.HTable$4.call(HTable.java:448)
>>        at 
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1050)
>>        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:447)
>>        at 
>> net.ripe.inrdb.hbase.accessor.real.HBaseTableAccessor.get(HBaseTableAccessor.java:36)
>>        at 
>> net.ripe.inrdb.hbase.store.HBaseStoreUpdater.getExistingRecords(HBaseStoreUpdater.java:101)
>>        at 
>> net.ripe.inrdb.hbase.store.HBaseStoreUpdater.mergeTimelinesWithExistingRecords(HBaseStoreUpdater.java:60)
>>        at 
>> net.ripe.inrdb.hbase.store.HBaseStoreUpdater.doInsert(HBaseStoreUpdater.java:40)
>>        at 
>> net.ripe.inrdb.core.store.SinglePartitionStore$Updater.insert(SinglePartitionStore.java:92)
>>        at 
>> net.ripe.inrdb.core.store.CompositeStore$CompositeStoreUpdater.insert(CompositeStore.java:142)
>>        at 
>> net.ripe.inrdb.importer.StoreInsertReducer.reduce(StoreInsertReducer.java:70)
>>        at 
>> net.ripe.inrdb.importer.StoreInsertReducer.reduce(StoreInsertReducer.java:17)
>>        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>        at 
>> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> So the client hangs in a wait() call, waiting on a HBaseClient$Call object. 
>> I looked at the code. The wait is in a while() loop and has no time out, so 
>> it figures that it never gets out of there if no notify() gets called on the 
>> object. I am not sure for exactly what condition it is waiting, however.
>>
>> Meanwhile, after this has happened, I cannot shutdown the master server 
>> normally. I have to kill -9 it, to make it shut down. Normally and before 
>> this problem occurs, the master server shuts down just fine. (Sorry, didn't 
>> do a thread dump of the master and now I downgraded to 0.20.3 again.)
>>
>> I cannot reproduce this error on my local setup (developer machine). It only 
>> occurs on our (currently modest) cluster of one machine running 
>> master+NN+Zookeeper and four datanodes which are all task trackers and 
>> region servers as well. The inputs to the periodic MapReduce jobs are very 
>> small (ranging from some Kb to several Mb) and thus co

data redundancy in hbase tables for read performance

2010-05-12 Thread N Kapshoo
For the model I am designing, read speed is the highest priority. With that in
mind, I have a Customers table with information about Claims.

Here is the design today:

Table: Customers
RowId: CustomerId
Family: Claims
Column: ClaimId
Value: JSON(ClaimId, Status, Description, From)

I am storing the ClaimsInfo as a JSON object. This JSON object will be
displayed in a tabular format after querying.

Now I get an additional requirement to sort claims by status.

I resolve this by adding a new family called 'ClaimStatus'. (Denormalization +
Redundancy)

Table: Customers
RowId: CustomerId
Family: ClaimStatus
Column: ClaimId
Value: *String*
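
(A hedged sketch of writing one claim under that layout, assuming the HBase 0.20.x
client API; the table, family, and qualifier names follow the description above,
while the row key, claim id, JSON, and status values are made-up placeholders:)

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ClaimWriteExample {
  public static void main(String[] args) throws Exception {
    HTable customers = new HTable(new HBaseConfiguration(), "Customers");
    String customerId = "CUST-0001";
    String claimId = "CLAIM-42";

    Put put = new Put(Bytes.toBytes(customerId));
    // Full claim record as JSON, keyed by claim id.
    put.add(Bytes.toBytes("Claims"), Bytes.toBytes(claimId),
        Bytes.toBytes("{\"ClaimId\":\"CLAIM-42\",\"Status\":\"OPEN\"}"));
    // Redundant copy of just the status, also keyed by claim id.
    put.add(Bytes.toBytes("ClaimStatus"), Bytes.toBytes(claimId), Bytes.toBytes("OPEN"));
    // One Put carries cells for both families of the same row.
    customers.put(put);
  }
}

Note that both families here go out in a single Put on the same row, so the
redundant copy does not by itself force a separate write per family.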


My concern is, do I continue down this path when more query requirements are
added to the system? For example, when they want to retrieve by 'From', then
I add another family called 'From'?

Should I be creating a new table in that case to support the new family?
Admittedly, the data in these columns is not huge, but I am worried about
doing multiple 'Puts' when the value changes.

Am I on the right track by adding redundancy to keep up with read
performance?

Thanks.


Re: Using HBase on other file systems

2010-05-12 Thread Edward Capriolo
On Wed, May 12, 2010 at 1:30 PM, Andrew Purtell  wrote:

> Before recommending Gluster I suggest you set up a test cluster and then
> randomly kill bricks.
>
> Also as pointed out in another mail, you'll want to colocate TaskTrackers
> on Gluster bricks to get I/O locality, yet there is no way for Gluster to
> export stripe locations back to Hadoop.
>
> It seems a poor choice.
>
>   - Andy
>
> > From: Edward Capriolo
> > Subject: Re: Using HBase on other file systems
> > To: "hbase-user@hadoop.apache.org" 
> > Date: Wednesday, May 12, 2010, 6:38 AM
> > On Tuesday, May 11, 2010, Jeff
> > Hammerbacher 
> > wrote:
> > > Hey Edward,
> > >
> > > I do think that if you compare GoogleFS to HDFS, GFS
> > looks more full
> > >> featured.
> > >>
> > >
> > > What features are you missing? Multi-writer append was
> > explicitly called out
> > > by Sean Quinlan as a bad idea, and rolled back. From
> > internal conversations
> > > with Google engineers, erasure coding of blocks
> > suffered a similar fate.
> > > Native client access would certainly be nice, but FUSE
> > gets you most of the
> > > way there. Scalability/availability of the NN, RPC
> > QoS, alternative block
> > > placement strategies are second-order features which
> > didn't exist in GFS
> > > until later in its lifecycle of development as well.
> > HDFS is following a
> > > similar path and has JIRA tickets with active
> > discussions. I'd love to hear
> > > your feature requests, and I'll be sure to translate
> > them into JIRA tickets.
> > >
> > > I do believe my logic is reasonable. HBase has a lot
> > of code designed around
> > >> HDFS.  We know these tickets that get cited all
> > the time, for better random
> > >> reads, or for sync() support. HBase gets the
> > benefits of HDFS and has to
> > >> deal with its drawbacks. Other key value stores
> > handle storage directly.
> > >>
> > >
> > > Sync() works and will be in the next release, and its
> > absence was simply a
> > > result of the youth of the system. Now that that
> > limitation has been
> > > removed, please point to another place in the code
> > where using HDFS rather
> > > than the local file system is forcing HBase to make
> > compromises. Your
> > > initial attempts on this front (caching, HFile,
> > compactions) were, I hope,
> > > debunked by my previous email. It's also worth noting
> > that Cassandra does
> > > all three, despite managing its own storage.
> > >
> > > I'm trying to learn from this exchange and always
> > enjoy understanding new
> > > systems. Here's what I have so far from your
> > arguments:
> > > 1) HBase inherits both the advantages and
> > disadvantages of HDFS. I clearly
> > > agree on the general point; I'm pressing you to name
> > some specific
> > > disadvantages, in hopes of helping prioritize our
> > development of HDFS. So
> > > far, you've named things which are either a) not
> > actually disadvantages b)
> > > no longer true. If you can come up with the
> > disadvantages, we'll certainly
> > > take them into account. I've certainly got a number of
> > them on our roadmap.
> > > 2) If you don't want to use HDFS, you won't want to
> > use HBase. Also
> > > certainly true, but I'm not sure there's not much to
> > learn from this
> > > assertion. I'd once again ask: why would you not want
> > to use HDFS, and what
> > > is your choice in its stead?
> > >
> > > Thanks,
> > > Jeff
> > >
> >
> > Jeff,
> >
> > Let me first mention that you have mentioned some thing as
> > fixed, that
> > are only fixed in trunk. I consider trunk futureware and I
> > do not like
> > to have tempral conversations. Even when trunk becomes
> > current there
> > is no guarentee that the entire problem is solved. After
> > all appends
> > were fixed in .19 or not , or again?
> >
> > I rescanned the gfs white paper to support my argument that
> > hdfs is
> > stripped down. Found
> > Writes at offset ARE supported
> > Checkpoints
> > Application level checkpoints
> > Snapshot
> > Shadow read only master
> >
> > hdfs chose features it wanted and ignored others that is
> > why I called
> > it a pure map reduce implementation.
> >
> > My main point, is that hbase by nature needs high speed
> > random read
> > and random write. Hdfs by nature is bad at these things. If
> > you can
> > not keep a high cache hit rate via large block cache via
> > ram hbase is
> > going to slam hdfs doing large block reads for small parts
> > of files.
> >
> > So you ask. Me what I would use instead. I do not think
> > there is a
> > viable alternative in the 100 tb and up range but I do
> > think for
> > people in the 20 tb range somethink like gluster that is
> > very
> > performance focused might deliver amazing results in some
> > applications.
> >
>
>
>
>
>
I did not recommend anything

"people in the 20 tb range somethink like gluster that is very
performance focused might deliver amazing results in some
applications."

I used words like "something. like. might."

It may just be an interesting avenu

Re: Problem with performance with many columns in column familie

2010-05-12 Thread Sebastian Bauer

patch has a stupid bug with a double lock...

Index: core/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
===
--- core/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java  
(revision 942215)
+++ core/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java  
(working copy)

@@ -1449,6 +1449,14 @@

   // Run a GET scan and put results into the specified list
   scanner.get(result);
+
+  this.memstore.readLockLock();
+  if (!result.isEmpty()) {
+  KeyValue kv = result.get(0);
+  this.memstore.add(kv);
+}
+  this.memstore.readLockUnlock();
+
 } finally {
   this.lock.readLock().unlock();
 }

On 12.05.2010 17:16, Sebastian Bauer wrote:
I figured out what is taking so long. The test data was 1 row with 10 
columns and 1 with 100.


When I tried to increment a column in this huge row, the data didn't land in 
the MemStore, and the times were (tested in Python after warmup):


before patch:
#get one column from big row
1 0:00:00.919464
#get one column from small row
2 0:00:00.009650
#atomicIncrement one column from big row
3 0:00:00.081196
#atomicIncrement one column from small row
4 0:00:00.006530

after patch:
#get one column from big row
1 0:00:00.009909
#get one column from small row
2 0:00:00.003489
#atomicIncrement one column from big row
3 0:00:00.004890
#atomicIncrement one column from small row
4 0:00:00.004820


patch:

Index: core/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
===
--- 
core/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java  
(revision 942215)
+++ 
core/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java  
(working copy)

@@ -1449,6 +1449,14 @@

   // Run a GET scan and put results into the specified list
   scanner.get(result);
+
+  this.memstore.readLockLock();
+  if (!result.isEmpty()) {
+  KeyValue kv = result.get(0);
+  this.memstore.add(kv);
+}
+  this.memstore.readLockLock();
+
 } finally {
   this.lock.readLock().unlock();
 }

What do you think about this change?
All suggestions welcome, because I don't even know Java ;)



Sebastian B.

On 11.05.2010 18:58, Ted Yu wrote:

jstack is a handy tool:
http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jstack.html

On Tue, May 11, 2010 at 9:50 AM, Sebastian Bauer  
wrote:



Ram is not a problem, second region server using about 550mB and first
about 300mB problem is with CPU, when i making queries to both column
famielies second region server is using ablut 40% - 80% first about 
10%,
after turning off queries to AdvToUsers(this big) CPU on both 
servers are

2-7%.

Sorry but i dont know how to make thread-dumping and i dont know java.

On 11.05.2010 18:40, Stack wrote:


You could try thread-dumping the regionserver to try and figure where
its hung up.  Counters are usually fast so maybe its something to do
w/ 8k of them in the one row.  What kinda numbers are you seeing?  How
much RAM you throwing at the problem?

Yours,
St.Ack



On Tue, May 11, 2010 at 8:51 AM, Sebastian Bauer
  wrote:




Hi,

maybe i'll get help here :)

I have 2 tables, UserToAdv and AdvToUsers.

UserToAdv is simple:
{ "row_id" =>   [ {"adv:":   },
{"adv:":   },
.about 100 columns
]
only one kind of operation is perform - increasing counter:
client.atomicIncrement("UsersToAdv", ID, column, 1)


AdvToUsers have one column familie: "user:" inside this i have 
about 8000

columns with format: "user:"
what i'm doing on DB is increasing counter inside "user:":

client.atomicIncrement("AdvToUsers", ID, column, 1)

i have 2 regions:


first one:
UsersToAdv,6FEC716B3960D1E8208DE6B06993A68D,1273580007602
stores=1, storefiles=1, storefileSizeMB=8, 
memstoreSizeMB=9,

storefileIndexSizeMB=0
UsersToAdv,0FDD84B9124B98B05A5E40F47C12DC45,1273580531847
stores=1, storefiles=1, storefileSizeMB=4, 
memstoreSizeMB=4,

storefileIndexSizeMB=0
AdvToUsers,5735,1273580575873
stores=1, storefiles=1, storefileSizeMB=15, 
memstoreSizeMB=10,

storefileIndexSizeMB=0
UsersToAdv,67CB411B48A7B83F0B863AC615285060,1273580533380
stores=1, storefiles=1, storefileSizeMB=4, 
memstoreSizeMB=4,

storefileIndexSizeMB=0
UsersToAdv,4012667F3E78C6431E3DD84641002FCE,1273580532995
stores=1, storefiles=1, storefileSizeMB=4, 
memstoreSizeMB=4,

storefileIndexSizeMB=0
UsersToAdv,5FE4A7506737CE0F38E254E62E23FE45,1273580533380
stores=1, storefiles=1, storefileSizeMB=4, 
memstoreSizeMB=4,

storefileIndexSizeMB=0
UsersToAdv,47E95EE30A11EBE45F055AC57EB2676E,1273580532995
stores=1, storefiles=1, storefileSizeMB=4, 
memstoreSizeMB=4,

storefileIndexSizeMB=0
UsersToAdv,37F9573415D9069B7E5810012AAD9CB7,127358053225

Stargate WAR target

2010-05-12 Thread Andrew Purtell
Anybody use it?

   - Andy



  



Re: Using HBase on other file systems

2010-05-12 Thread Andrew Purtell
Before recommending Gluster I suggest you set up a test cluster and then 
randomly kill bricks. 

Also as pointed out in another mail, you'll want to colocate TaskTrackers on 
Gluster bricks to get I/O locality, yet there is no way for Gluster to export 
stripe locations back to Hadoop. 

It seems a poor choice. 

   - Andy

> From: Edward Capriolo
> Subject: Re: Using HBase on other file systems
> To: "hbase-user@hadoop.apache.org" 
> Date: Wednesday, May 12, 2010, 6:38 AM
> On Tuesday, May 11, 2010, Jeff
> Hammerbacher 
> wrote:
> > Hey Edward,
> >
> > I do think that if you compare GoogleFS to HDFS, GFS
> looks more full
> >> featured.
> >>
> >
> > What features are you missing? Multi-writer append was
> explicitly called out
> > by Sean Quinlan as a bad idea, and rolled back. From
> internal conversations
> > with Google engineers, erasure coding of blocks
> suffered a similar fate.
> > Native client access would certainly be nice, but FUSE
> gets you most of the
> > way there. Scalability/availability of the NN, RPC
> QoS, alternative block
> > placement strategies are second-order features which
> didn't exist in GFS
> > until later in its lifecycle of development as well.
> HDFS is following a
> > similar path and has JIRA tickets with active
> discussions. I'd love to hear
> > your feature requests, and I'll be sure to translate
> them into JIRA tickets.
> >
> > I do believe my logic is reasonable. HBase has a lot
> of code designed around
> >> HDFS.  We know these tickets that get cited all
> the time, for better random
> >> reads, or for sync() support. HBase gets the
> benefits of HDFS and has to
> >> deal with its drawbacks. Other key value stores
> handle storage directly.
> >>
> >
> > Sync() works and will be in the next release, and its
> absence was simply a
> > result of the youth of the system. Now that that
> limitation has been
> > removed, please point to another place in the code
> where using HDFS rather
> > than the local file system is forcing HBase to make
> compromises. Your
> > initial attempts on this front (caching, HFile,
> compactions) were, I hope,
> > debunked by my previous email. It's also worth noting
> that Cassandra does
> > all three, despite managing its own storage.
> >
> > I'm trying to learn from this exchange and always
> enjoy understanding new
> > systems. Here's what I have so far from your
> arguments:
> > 1) HBase inherits both the advantages and
> disadvantages of HDFS. I clearly
> > agree on the general point; I'm pressing you to name
> some specific
> > disadvantages, in hopes of helping prioritize our
> development of HDFS. So
> > far, you've named things which are either a) not
> actually disadvantages b)
> > no longer true. If you can come up with the
> disadvantages, we'll certainly
> > take them into account. I've certainly got a number of
> them on our roadmap.
> > 2) If you don't want to use HDFS, you won't want to
> use HBase. Also
> > certainly true, but I'm not sure there's not much to
> learn from this
> > assertion. I'd once again ask: why would you not want
> to use HDFS, and what
> > is your choice in its stead?
> >
> > Thanks,
> > Jeff
> >
> 
> Jeff,
> 
> Let me first mention that you have mentioned some thing as
> fixed, that
> are only fixed in trunk. I consider trunk futureware and I
> do not like
> to have tempral conversations. Even when trunk becomes
> current there
> is no guarentee that the entire problem is solved. After
> all appends
> were fixed in .19 or not , or again?
> 
> I rescanned the gfs white paper to support my argument that
> hdfs is
> stripped down. Found
> Writes at offset ARE supported
> Checkpoints
> Application level checkpoints
> Snapshot
> Shadow read only master
> 
> hdfs chose features it wanted and ignored others that is
> why I called
> it a pure map reduce implementation.
> 
> My main point, is that hbase by nature needs high speed
> random read
> and random write. Hdfs by nature is bad at these things. If
> you can
> not keep a high cache hit rate via large block cache via
> ram hbase is
> going to slam hdfs doing large block reads for small parts
> of files.
> 
> So you ask. Me what I would use instead. I do not think
> there is a
> viable alternative in the 100 tb and up range but I do
> think for
> people in the 20 tb range somethink like gluster that is
> very
> performance focused might deliver amazing results in some
> applications.
> 






Re: Enabling Indexing in HBase

2010-05-12 Thread Jean-Daniel Cryans
Yes, you can also create a HBaseConfiguration object and configure it
with those exact configs (that you then provide to HTable).
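
(A minimal sketch of that approach, assuming the HBase 0.20.x client API; the
ZooKeeper host names and table name below are placeholders, not something from
this thread:)

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class ProgrammaticConfigExample {
  public static void main(String[] args) throws Exception {
    // Configure the client in code instead of relying on hbase-site.xml on the classpath.
    HBaseConfiguration conf = new HBaseConfiguration();
    conf.set("hbase.zookeeper.quorum", "zk-host1,zk-host2,zk-host3");
    conf.set("hbase.zookeeper.property.clientPort", "2181");
    // Hand the configured object to HTable (and any admin classes) directly.
    HTable table = new HTable(conf, "MyTable");
  }
}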

J-D

On Wed, May 12, 2010 at 1:22 AM, Michelan Arendse  wrote:
> Thank you. I have added the configuration folder to my client class path and 
> it worked.
>
> Now I am faced with another issue, since this application will be used in 
> ColdFusion is there a way of making this work without having the 
> configuration as part of the class path?
>
> -Original Message-
> From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel 
> Cryans
> Sent: 11 May 2010 06:26 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Enabling Indexing in HBase
>
> Per 
> http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/package-summary.html#overview
> your client has to know where your zookeeper setup is. Since you want
> to use HBase in a distributed fashion, that means you went through
> http://hadoop.apache.org/hbase/docs/r0.20.4/api/overview-summary.html#fully-distrib
> and this is where the required configs are.
>
> It could be made more obvious tho.
>
> J-D
>
> On Tue, May 11, 2010 at 4:44 AM, Michelan Arendse  
> wrote:
>> Thanks. I have added that to the class path, but I still get an error.
>> This is the error that I get:
>>
>> 10/05/11 13:41:27 INFO zookeeper.ZooKeeper: Initiating client connection, 
>> connectString=localhost:2181 sessionTimeout=6 
>> watcher=org.apache.hadoop.hbase.client.hconnectionmanager$clientzkwatc...@12d15a9
>> 10/05/11 13:41:27 INFO zookeeper.ClientCnxn: Attempting connection to server 
>> localhost/127.0.0.1:2181
>> 10/05/11 13:41:28 WARN zookeeper.ClientCnxn: Exception closing session 0x0 
>> to sun.nio.ch.selectionkeyi...@b0ce8f
>> java.net.ConnectException: Connection refused: no further information
>>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>        at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
>>        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933)
>> 10/05/11 13:41:28 WARN zookeeper.ClientCnxn: Ignoring exception during 
>> shutdown input
>>
>> I'm working of a server and not standalone mode, where would I change a 
>> setting that tells the "connectString" to point to the server instead of 
>> "localhost".
>>
>> -Original Message-
>> From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of 
>> Jean-Daniel Cryans
>> Sent: 10 May 2010 07:05 PM
>> To: hbase-user@hadoop.apache.org
>> Subject: Re: Enabling Indexing in HBase
>>
>> Did you include the jar (contrib/indexed/hbase-0.20.3-indexed.jar) in
>> your class path?
>>
>> J-D
>>
>> On Mon, May 10, 2010 at 6:43 AM, Michelan Arendse  
>> wrote:
>>> Hi.
>>>
>>> I added the following properties  to hbase-site.xml
>>> <property>
>>>        <name>hbase.regionserver.class</name>
>>>        <value>org.apache.hadoop.hbase.ipc.IndexedRegionInterface</value>
>>> </property>
>>>
>>> <property>
>>>        <name>hbase.regionserver.impl</name>
>>>        <value>org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegionServer</value>
>>> </property>
>>>
>>> I'm using hbase 0.20.3 and when I start hbase now it comes with the 
>>> following:
>>> ERROR org.apache.hadoop.hbase.master.HMaster: Can not start master
>>> java.lang.UnsupportedOperationException: Unable to find region server 
>>> interface org.apache.hadoop.hbase.ipc.IndexedRegionInterface
>>> Caused by: java.lang.ClassNotFoundException: 
>>> org.apache.hadoop.hbase.ipc.IndexedRegionInterface
>>>
>>> Can you please help with this problem that I am having.
>>>
>>> Thank you,
>>>
>>> Michelan Arendse
>>> Junior Developer | AD:DYNAMO // happy business ;-)
>>> Office 0861 Dynamo (0861 396266)  | Fax +27 (0) 21 465 2587
>>>
>>> Advertise Online Instantly - www.addynamo.com 
>>> 
>>>
>>>
>>
>


Re: HBase client hangs after upgrade to 0.20.4 when used from reducer

2010-05-12 Thread Jean-Daniel Cryans
Friso,

Unfortunately it's hard to determine the cause with the provided
information, the client call you pasted is pretty much normal i.e. the
client is waiting to receive a result from a region server.

The fact that you can't shut down the master when this happens is very
concerning. Do you still have those logs around? Same for the region
servers? Can you post this in pastebin or on a web server?

Also, feel free to come chat with us on IRC, it's always easier to
debug when live. #hbase on freenode

J-D

On Wed, May 12, 2010 at 8:31 AM, Friso van Vollenhoven
 wrote:
> Hi all,
>
> I am using Hadoop (0.20.2) and HBase to periodically import data (every 15 
> minutes). There are a number of import processes, but generally they all 
> create a sequence file on HDFS, which is then run through a MapReduce job. 
> The MapReduce uses the identity mapper (the input file is a Hadoop sequence 
> file) and a specialized reducer that does the following:
> - Combine the values for a key into one value
> - Do a Get from HBase to retrieve existing values for the same key
> - Combine the existing value from HBase and the new one into one value again
> - Put the final value into HBase under the same key (thus 'overwrite' the 
> existing row; I keep only one version)
>
> After I upgraded HBase to the 0.20.4 release, the reducers sometimes start 
> hanging on a Get. When the jobs start, some reducers run to completion fine, 
> but after a while the last reducers will start to hang. Eventually the 
> reducers are killed of by Hadoop (after 600 secs).
>
> I did a thread dump for one of the hanging reducers. It looks like this:
> "main" prio=10 tid=0x48083800 nid=0x4c93 in Object.wait() 
> [0x420ca000]
>   java.lang.Thread.State: WAITING (on object monitor)
>        at java.lang.Object.wait(Native Method)
>        - waiting on <0x2eb50d70> (a 
> org.apache.hadoop.hbase.ipc.HBaseClient$Call)
>        at java.lang.Object.wait(Object.java:485)
>        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:721)
>        - locked <0x2eb50d70> (a 
> org.apache.hadoop.hbase.ipc.HBaseClient$Call)
>        at 
> org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
>        at $Proxy2.get(Unknown Source)
>        at org.apache.hadoop.hbase.client.HTable$4.call(HTable.java:450)
>        at org.apache.hadoop.hbase.client.HTable$4.call(HTable.java:448)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1050)
>        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:447)
>        at 
> net.ripe.inrdb.hbase.accessor.real.HBaseTableAccessor.get(HBaseTableAccessor.java:36)
>        at 
> net.ripe.inrdb.hbase.store.HBaseStoreUpdater.getExistingRecords(HBaseStoreUpdater.java:101)
>        at 
> net.ripe.inrdb.hbase.store.HBaseStoreUpdater.mergeTimelinesWithExistingRecords(HBaseStoreUpdater.java:60)
>        at 
> net.ripe.inrdb.hbase.store.HBaseStoreUpdater.doInsert(HBaseStoreUpdater.java:40)
>        at 
> net.ripe.inrdb.core.store.SinglePartitionStore$Updater.insert(SinglePartitionStore.java:92)
>        at 
> net.ripe.inrdb.core.store.CompositeStore$CompositeStoreUpdater.insert(CompositeStore.java:142)
>        at 
> net.ripe.inrdb.importer.StoreInsertReducer.reduce(StoreInsertReducer.java:70)
>        at 
> net.ripe.inrdb.importer.StoreInsertReducer.reduce(StoreInsertReducer.java:17)
>        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>        at 
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> So the client hangs in a wait() call, waiting on a HBaseClient$Call object. I 
> looked at the code. The wait is in a while() loop and has no time out, so it 
> figures that it never gets out of there if no notify() gets called on the 
> object. I am not sure for exactly what condition it is waiting, however.
>
> Meanwhile, after this has happened, I cannot shutdown the master server 
> normally. I have to kill -9 it, to make it shut down. Normally and before 
> this problem occurs, the master server shuts down just fine. (Sorry, didn't 
> do a thread dump of the master and now I downgraded to 0.20.3 again.)
>
> I cannot reproduce this error on my local setup (developer machine). It only 
> occurs on our (currently modest) cluster of one machine running 
> master+NN+Zookeeper and four datanodes which are all task trackers and region 
> servers as well. The inputs to the periodic MapReduce jobs are very small 
> (ranging from some Kb to several Mb) and thus contain not so many records. I 
> know this is not very efficient to do in MapReduce and will be faster when 
> inserted in process by the importer process because of startup overhead, but 
> we are setting up this architecture of importers and insertion for 
> anticipated

Re: Enabling IHbase

2010-05-12 Thread Stack
You saw this package doc over in the ihbase's new home on github?
http://github.com/ykulbak/ihbase/blob/master/src/main/java/org/apache/hadoop/hbase/client/idx/package.html
 It'll read better if you build the javadoc.  There is also this:
http://github.com/ykulbak/ihbase/blob/master/README

St.Ack

On Wed, May 12, 2010 at 8:27 AM, Renato Marroquín Mogrovejo
 wrote:
> Hi Alex,
>
> Thanks for your help, but I meant something more like a how-to set it up
> thing, or like a tutorial of it (=
> I also read these ones if anyone else is interested.
>
> http://blog.sematext.com/2010/03/31/hbase-digest-march-2010/
> http://search-hadoop.com/m/5MBst1uL87b1
>
> Renato M.
>
>
>
> 2010/5/12 alex kamil 
>
>> regarding usage this may be helpful
>> https://issues.apache.org/jira/browse/HBASE-2167
>>
>>
>> On Wed, May 12, 2010 at 10:48 AM, alex kamil  wrote:
>>
>>> Renato,
>>>
>>> just noticed you are looking for *Indexed *Hbase
>>>
>>> i found this
>>> http://blog.reactive.org/2010/03/indexed-hbase-it-might-not-be-what-you.html
>>>
>>> Alex
>>>
>>>
>>> On Wed, May 12, 2010 at 10:42 AM, alex kamil wrote:
>>>

 http://www.google.com/search?hl=en&source=hp&q=hbase+tutorial&aq=f&aqi=g-p1g-sx3g1g-sx4g-msx1&aql=&oq=&gs_rfai=


 On Wed, May 12, 2010 at 10:25 AM, Renato Marroquín Mogrovejo <
 renatoj.marroq...@gmail.com> wrote:

> Hi eveyone,
>
> I just read about IHbase and seems like something I could give it a try,
> but
> I haven't been able to find information (besides descriptions and
> advantages) regarding to how to install it or use it.
> Thanks in advance.
>
> Renato M.
>


>>>
>>
>


HBase client hangs after upgrade to 0.20.4 when used from reducer

2010-05-12 Thread Friso van Vollenhoven
Hi all,

I am using Hadoop (0.20.2) and HBase to periodically import data (every 15 
minutes). There are a number of import processes, but generally they all create 
a sequence file on HDFS, which is then run through a MapReduce job. The 
MapReduce uses the identity mapper (the input file is a Hadoop sequence file) 
and a specialized reducer that does the following:
- Combine the values for a key into one value
- Do a Get from HBase to retrieve existing values for the same key
- Combine the existing value from HBase and the new one into one value again
- Put the final value into HBase under the same key (thus 'overwrite' the 
existing row; I keep only one version)
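
(A hedged sketch of that read-merge-write reducer shape, assuming the Hadoop 0.20
mapreduce API and HBase 0.20.x client; the class, table, family, and combine/merge
logic are placeholders, not the poster's actual code:)

import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class MergeIntoHBaseReducer extends
    Reducer<ImmutableBytesWritable, BytesWritable, ImmutableBytesWritable, BytesWritable> {

  private static final byte[] FAMILY = Bytes.toBytes("data");
  private static final byte[] QUALIFIER = Bytes.toBytes("value");
  private HTable table;

  @Override
  protected void setup(Context context) throws IOException {
    table = new HTable(new HBaseConfiguration(), "records");
  }

  @Override
  protected void reduce(ImmutableBytesWritable key, Iterable<BytesWritable> values,
      Context context) throws IOException, InterruptedException {
    // 1. Combine the new values for this key into one value.
    byte[] combined = combine(values);
    // 2. Get the existing value for the same row key.
    Result existing = table.get(new Get(key.get()));
    byte[] old = existing.getValue(FAMILY, QUALIFIER);
    // 3. Merge the existing value and the new one into a single value.
    byte[] merged = merge(old, combined);
    // 4. Overwrite the row; only one version is kept. Output goes straight to HBase.
    Put put = new Put(key.get());
    put.add(FAMILY, QUALIFIER, merged);
    table.put(put);
  }

  // Placeholder: the real combine folds all values for the key into one.
  private byte[] combine(Iterable<BytesWritable> values) {
    byte[] last = new byte[0];
    for (BytesWritable v : values) {
      last = Arrays.copyOf(v.getBytes(), v.getLength());
    }
    return last;
  }

  // Placeholder: the real merge combines existing and new data; here we just prefer fresh.
  private byte[] merge(byte[] existing, byte[] fresh) {
    return fresh != null ? fresh : existing;
  }
}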

After I upgraded HBase to the 0.20.4 release, the reducers sometimes start 
hanging on a Get. When the jobs start, some reducers run to completion fine, 
but after a while the last reducers will start to hang. Eventually the reducers 
are killed off by Hadoop (after 600 secs).

I did a thread dump for one of the hanging reducers. It looks like this:
"main" prio=10 tid=0x48083800 nid=0x4c93 in Object.wait() 
[0x420ca000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x2eb50d70> (a 
org.apache.hadoop.hbase.ipc.HBaseClient$Call)
at java.lang.Object.wait(Object.java:485)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:721)
- locked <0x2eb50d70> (a 
org.apache.hadoop.hbase.ipc.HBaseClient$Call)
at 
org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
at $Proxy2.get(Unknown Source)
at org.apache.hadoop.hbase.client.HTable$4.call(HTable.java:450)
at org.apache.hadoop.hbase.client.HTable$4.call(HTable.java:448)
at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1050)
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:447)
at 
net.ripe.inrdb.hbase.accessor.real.HBaseTableAccessor.get(HBaseTableAccessor.java:36)
at 
net.ripe.inrdb.hbase.store.HBaseStoreUpdater.getExistingRecords(HBaseStoreUpdater.java:101)
at 
net.ripe.inrdb.hbase.store.HBaseStoreUpdater.mergeTimelinesWithExistingRecords(HBaseStoreUpdater.java:60)
at 
net.ripe.inrdb.hbase.store.HBaseStoreUpdater.doInsert(HBaseStoreUpdater.java:40)
at 
net.ripe.inrdb.core.store.SinglePartitionStore$Updater.insert(SinglePartitionStore.java:92)
at 
net.ripe.inrdb.core.store.CompositeStore$CompositeStoreUpdater.insert(CompositeStore.java:142)
at 
net.ripe.inrdb.importer.StoreInsertReducer.reduce(StoreInsertReducer.java:70)
at 
net.ripe.inrdb.importer.StoreInsertReducer.reduce(StoreInsertReducer.java:17)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at 
org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

So the client hangs in a wait() call, waiting on an HBaseClient$Call object. I 
looked at the code. The wait is in a while() loop and has no timeout, so it 
follows that it never gets out of there if no notify() gets called on the 
object. I am not sure exactly what condition it is waiting for, however.
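
(A minimal illustration of that pattern — not the actual HBaseClient source, just
the shape of a timeout-less wait/notify loop described above:)

class Call {
  boolean done;   // set by the thread that receives the response
  Object value;
}

class CallWaiter {
  Object waitForResult(Call call) throws InterruptedException {
    synchronized (call) {
      while (!call.done) {
        call.wait();   // no timeout: relies entirely on a later notify() from the receiver
      }
      return call.value;
    }
  }
}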

Meanwhile, after this has happened, I cannot shutdown the master server 
normally. I have to kill -9 it, to make it shut down. Normally and before this 
problem occurs, the master server shuts down just fine. (Sorry, didn't do a 
thread dump of the master and now I downgraded to 0.20.3 again.)

I cannot reproduce this error on my local setup (developer machine). It only 
occurs on our (currently modest) cluster of one machine running 
master+NN+Zookeeper and four datanodes which are all task trackers and region 
servers as well. The inputs to the periodic MapReduce jobs are very small 
(ranging from some Kb to several Mb) and thus contain not so many records. I 
know this is not very efficient to do in MapReduce and will be faster when 
inserted in process by the importer process because of startup overhead, but we 
are setting up this architecture of importers and insertion for anticipated 
larger loads (up to 80 million records per day).

Does anyone have a clue about what happens? Or where to look for further 
investigation?

Thanks a lot!


Cheers,
Friso



Re: Enabling IHbase

2010-05-12 Thread Renato Marroquín Mogrovejo
Hi Alex,

Thanks for your help, but I meant something more like a how-to set it up
thing, or like a tutorial of it (=
I also read these ones if anyone else is interested.

http://blog.sematext.com/2010/03/31/hbase-digest-march-2010/
http://search-hadoop.com/m/5MBst1uL87b1

Renato M.



2010/5/12 alex kamil 

> regarding usage this may be helpful
> https://issues.apache.org/jira/browse/HBASE-2167
>
>
> On Wed, May 12, 2010 at 10:48 AM, alex kamil  wrote:
>
>> Renato,
>>
>> just noticed you are looking for *Indexed *Hbase
>>
>> i found this
>> http://blog.reactive.org/2010/03/indexed-hbase-it-might-not-be-what-you.html
>>
>> Alex
>>
>>
>> On Wed, May 12, 2010 at 10:42 AM, alex kamil wrote:
>>
>>>
>>> http://www.google.com/search?hl=en&source=hp&q=hbase+tutorial&aq=f&aqi=g-p1g-sx3g1g-sx4g-msx1&aql=&oq=&gs_rfai=
>>>
>>>
>>> On Wed, May 12, 2010 at 10:25 AM, Renato Marroquín Mogrovejo <
>>> renatoj.marroq...@gmail.com> wrote:
>>>
 Hi eveyone,

 I just read about IHbase and seems like something I could give it a try,
 but
 I haven't been able to find information (besides descriptions and
 advantages) regarding to how to install it or use it.
 Thanks in advance.

 Renato M.

>>>
>>>
>>
>


Re: Problem with performance with many columns in column familie

2010-05-12 Thread Sebastian Bauer
I figured out what is taking so long. The test data was 1 row with 10 
columns and 1 with 100.


When I tried to increment a column in this huge row, the data didn't land in 
the MemStore, and the times were (tested in Python after warmup):


before patch:
#get one column from big row
1 0:00:00.919464
#get one column from small row
2 0:00:00.009650
#atomicIncrement one column from big row
3 0:00:00.081196
#atomicIncrement one column from small row
4 0:00:00.006530

after patch:
#get one column from big row
1 0:00:00.009909
#get one column from small row
2 0:00:00.003489
#atomicIncrement one column from big row
3 0:00:00.004890
#atomicIncrement one column from small row
4 0:00:00.004820


patch:

Index: core/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
===
--- core/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java  
(revision 942215)
+++ core/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java  
(working copy)

@@ -1449,6 +1449,14 @@

   // Run a GET scan and put results into the specified list
   scanner.get(result);
+
+  this.memstore.readLockLock();
+  if (!result.isEmpty()) {
+  KeyValue kv = result.get(0);
+  this.memstore.add(kv);
+}
+  this.memstore.readLockLock();
+
 } finally {
   this.lock.readLock().unlock();
 }

What do you think about this change?
All suggestions welcome, because I don't even know Java ;)



Sebastian B.

On 11.05.2010 18:58, Ted Yu wrote:

jstack is a handy tool:
http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jstack.html

On Tue, May 11, 2010 at 9:50 AM, Sebastian Bauer  wrote:

   

Ram is not a problem, second region server using about 550mB and first
about 300mB problem is with CPU, when i making queries to both column
famielies second region server is using ablut 40% - 80% first about 10%,
after turning off queries to AdvToUsers(this big) CPU on both servers are
2-7%.

Sorry but i dont know how to make thread-dumping and i dont know java.

On 11.05.2010 18:40, Stack wrote:

 

You could try thread-dumping the regionserver to try and figure where
its hung up.  Counters are usually fast so maybe its something to do
w/ 8k of them in the one row.  What kinda numbers are you seeing?  How
much RAM you throwing at the problem?

Yours,
St.Ack



On Tue, May 11, 2010 at 8:51 AM, Sebastian Bauer
  wrote:



   

Hi,

maybe i'll get help here :)

I have 2 tables, UserToAdv and AdvToUsers.

UserToAdv is simple:
{ "row_id" =>   [ {"adv:":   },
{"adv:":   },
.about 100 columns
]
only one kind of operation is perform - increasing counter:
client.atomicIncrement("UsersToAdv", ID, column, 1)


AdvToUsers have one column familie: "user:" inside this i have about 8000
columns with format: "user:"
what i'm doing on DB is increasing counter inside "user:":

client.atomicIncrement("AdvToUsers", ID, column, 1)

i have 2 regions:


first one:
UsersToAdv,6FEC716B3960D1E8208DE6B06993A68D,1273580007602
stores=1, storefiles=1, storefileSizeMB=8, memstoreSizeMB=9,
storefileIndexSizeMB=0
UsersToAdv,0FDD84B9124B98B05A5E40F47C12DC45,1273580531847
stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
storefileIndexSizeMB=0
AdvToUsers,5735,1273580575873
stores=1, storefiles=1, storefileSizeMB=15, memstoreSizeMB=10,
storefileIndexSizeMB=0
UsersToAdv,67CB411B48A7B83F0B863AC615285060,1273580533380
stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
storefileIndexSizeMB=0
UsersToAdv,4012667F3E78C6431E3DD84641002FCE,1273580532995
stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
storefileIndexSizeMB=0
UsersToAdv,5FE4A7506737CE0F38E254E62E23FE45,1273580533380
stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
storefileIndexSizeMB=0
UsersToAdv,47E95EE30A11EBE45F055AC57EB2676E,1273580532995
stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
storefileIndexSizeMB=0
UsersToAdv,37F9573415D9069B7E5810012AAD9CB7,1273580532258
stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
storefileIndexSizeMB=0
UsersToAdv,1FFFDF082566D93153B34BFE0C44A9BF,1273580532173
stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
storefileIndexSizeMB=0
UsersToAdv,17C93FB0047BC4D660C6570B734CBE17,1273580531847
stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
storefileIndexSizeMB=0
UsersToAdv,27DFD8F02CD98FF57E8334837C73C57A,1273580532173
stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
storefileIndexSizeMB=0

second one:
UsersToAdv,57C568066D35D09B4AF6CD7D68681144,1273580533427
stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4,
storefileIndexSizeMB=0
UsersToAdv,4FA6A1A2681E2D2

Re: Enabling IHbase

2010-05-12 Thread alex kamil
regarding usage this may be helpful
https://issues.apache.org/jira/browse/HBASE-2167

On Wed, May 12, 2010 at 10:48 AM, alex kamil  wrote:

> Renato,
>
> just noticed you are looking for *Indexed *Hbase
>
> i found this
> http://blog.reactive.org/2010/03/indexed-hbase-it-might-not-be-what-you.html
>
> Alex
>
>
> On Wed, May 12, 2010 at 10:42 AM, alex kamil  wrote:
>
>>
>> http://www.google.com/search?hl=en&source=hp&q=hbase+tutorial&aq=f&aqi=g-p1g-sx3g1g-sx4g-msx1&aql=&oq=&gs_rfai=
>>
>>
>> On Wed, May 12, 2010 at 10:25 AM, Renato Marroquín Mogrovejo <
>> renatoj.marroq...@gmail.com> wrote:
>>
>>> Hi eveyone,
>>>
>>> I just read about IHbase and seems like something I could give it a try,
>>> but
>>> I haven't been able to find information (besides descriptions and
>>> advantages) regarding to how to install it or use it.
>>> Thanks in advance.
>>>
>>> Renato M.
>>>
>>
>>
>


Re: Enabling IHbase

2010-05-12 Thread alex kamil
Renato,

just noticed you are looking for *Indexed *Hbase

i found this
http://blog.reactive.org/2010/03/indexed-hbase-it-might-not-be-what-you.html

Alex

On Wed, May 12, 2010 at 10:42 AM, alex kamil  wrote:

>
> http://www.google.com/search?hl=en&source=hp&q=hbase+tutorial&aq=f&aqi=g-p1g-sx3g1g-sx4g-msx1&aql=&oq=&gs_rfai=
>
>
> On Wed, May 12, 2010 at 10:25 AM, Renato Marroquín Mogrovejo <
> renatoj.marroq...@gmail.com> wrote:
>
>> Hi eveyone,
>>
>> I just read about IHbase and seems like something I could give it a try,
>> but
>> I haven't been able to find information (besides descriptions and
>> advantages) regarding to how to install it or use it.
>> Thanks in advance.
>>
>> Renato M.
>>
>
>


Re: Enabling IHbase

2010-05-12 Thread alex kamil
http://www.google.com/search?hl=en&source=hp&q=hbase+tutorial&aq=f&aqi=g-p1g-sx3g1g-sx4g-msx1&aql=&oq=&gs_rfai=

On Wed, May 12, 2010 at 10:25 AM, Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hi eveyone,
>
> I just read about IHbase and seems like something I could give it a try,
> but
> I haven't been able to find information (besides descriptions and
> advantages) regarding to how to install it or use it.
> Thanks in advance.
>
> Renato M.
>


Enabling IHbase

2010-05-12 Thread Renato Marroquín Mogrovejo
Hi everyone,

I just read about IHbase and it seems like something I could give a try, but
I haven't been able to find information (besides descriptions and
advantages) regarding how to install it or use it.
Thanks in advance.

Renato M.


Re: Using HBase on other file systems

2010-05-12 Thread Edward Capriolo
On Tuesday, May 11, 2010, Jeff Hammerbacher  wrote:
> Hey Edward,
>
> I do think that if you compare GoogleFS to HDFS, GFS looks more full
>> featured.
>>
>
> What features are you missing? Multi-writer append was explicitly called out
> by Sean Quinlan as a bad idea, and rolled back. From internal conversations
> with Google engineers, erasure coding of blocks suffered a similar fate.
> Native client access would certainly be nice, but FUSE gets you most of the
> way there. Scalability/availability of the NN, RPC QoS, alternative block
> placement strategies are second-order features which didn't exist in GFS
> until later in its lifecycle of development as well. HDFS is following a
> similar path and has JIRA tickets with active discussions. I'd love to hear
> your feature requests, and I'll be sure to translate them into JIRA tickets.
>
> I do believe my logic is reasonable. HBase has a lot of code designed around
>> HDFS.  We know these tickets that get cited all the time, for better random
>> reads, or for sync() support. HBase gets the benefits of HDFS and has to
>> deal with its drawbacks. Other key value stores handle storage directly.
>>
>
> Sync() works and will be in the next release, and its absence was simply a
> result of the youth of the system. Now that that limitation has been
> removed, please point to another place in the code where using HDFS rather
> than the local file system is forcing HBase to make compromises. Your
> initial attempts on this front (caching, HFile, compactions) were, I hope,
> debunked by my previous email. It's also worth noting that Cassandra does
> all three, despite managing its own storage.
>
> I'm trying to learn from this exchange and always enjoy understanding new
> systems. Here's what I have so far from your arguments:
> 1) HBase inherits both the advantages and disadvantages of HDFS. I clearly
> agree on the general point; I'm pressing you to name some specific
> disadvantages, in hopes of helping prioritize our development of HDFS. So
> far, you've named things which are either a) not actually disadvantages b)
> no longer true. If you can come up with the disadvantages, we'll certainly
> take them into account. I've certainly got a number of them on our roadmap.
> 2) If you don't want to use HDFS, you won't want to use HBase. Also
> certainly true, but I'm not sure there's not much to learn from this
> assertion. I'd once again ask: why would you not want to use HDFS, and what
> is your choice in its stead?
>
> Thanks,
> Jeff
>

Jeff,

Let me first mention that you have mentioned some things as fixed that
are only fixed in trunk. I consider trunk futureware and I do not like
to have temporal conversations. Even when trunk becomes current there
is no guarantee that the entire problem is solved. After all, appends
were fixed in .19, or not, or again?

I rescanned the GFS white paper to support my argument that HDFS is
stripped down. I found:
- Writes at offset ARE supported
- Checkpoints
- Application-level checkpoints
- Snapshot
- Shadow read-only master

HDFS chose the features it wanted and ignored others; that is why I called
it a pure MapReduce implementation.

My main point is that HBase by nature needs high-speed random reads
and random writes. HDFS by nature is bad at these things. If you cannot
keep a high cache hit rate via a large in-RAM block cache, HBase is
going to slam HDFS doing large block reads for small parts of files.

So you ask me what I would use instead. I do not think there is a
viable alternative in the 100 TB and up range, but I do think that for
people in the 20 TB range something like Gluster, which is very
performance focused, might deliver amazing results in some
applications.


Re: Regarding IntSet implementation

2010-05-12 Thread Ram Kulbak
Hi Lekhnath,

The IntSets are package protected so that their callers will always use the
IntSet interface, thus preventing manipulation of the IntSet after it was
built and hiding implementation details. It seems to me that having an index
which can spill to disk may be a handy feature, perhaps you can create a
patch with your suggested changes/additions?
The latest version of IHBASE can be obtained from
http://github.com/ykulbak/ihbase

Cheers,
Yoram


On Mon, May 10, 2010 at 9:17 PM, Lekhnath  wrote:

> Hi folks,
> I have to use numerous search criteria and each having lots of distinct
>  values. So, the secondary indexing like IHBase will require lots of memory.
> I think I require a custom index implementation in which I decided to
> persist some of the IHBase like implementation. For that case I need to
> reuse the IHBase' IntSet implementations. They are package protected so that
> I could not extend the implementation and  am forced to rewrite the code.
> Is there any good reason why the implementations are package protected.
>
> Thanks,
> Lekhnath
>
>
>
> This email is intended for the recipient only. If you are not the intended
> recipient please disregard, and do not use the information for any purpose.
>


RE: Enabling Indexing in HBase

2010-05-12 Thread Michelan Arendse
Thank you. I have added the configuration folder to my client class path and it 
worked.

Now I am faced with another issue: since this application will be used in 
ColdFusion, is there a way of making this work without having the configuration 
as part of the class path?

-Original Message-
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel 
Cryans
Sent: 11 May 2010 06:26 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Enabling Indexing in HBase

Per 
http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/package-summary.html#overview
your client has to know where your zookeeper setup is. Since you want
to use HBase in a distributed fashion, that means you went through
http://hadoop.apache.org/hbase/docs/r0.20.4/api/overview-summary.html#fully-distrib
and this is where the required configs are.

It could be made more obvious tho.

J-D

On Tue, May 11, 2010 at 4:44 AM, Michelan Arendse  wrote:
> Thanks. I have added that to the class path, but I still get an error.
> This is the error that I get:
>
> 10/05/11 13:41:27 INFO zookeeper.ZooKeeper: Initiating client connection, 
> connectString=localhost:2181 sessionTimeout=6 
> watcher=org.apache.hadoop.hbase.client.hconnectionmanager$clientzkwatc...@12d15a9
> 10/05/11 13:41:27 INFO zookeeper.ClientCnxn: Attempting connection to server 
> localhost/127.0.0.1:2181
> 10/05/11 13:41:28 WARN zookeeper.ClientCnxn: Exception closing session 0x0 to 
> sun.nio.ch.selectionkeyi...@b0ce8f
> java.net.ConnectException: Connection refused: no further information
>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>        at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
>        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933)
> 10/05/11 13:41:28 WARN zookeeper.ClientCnxn: Ignoring exception during 
> shutdown input
>
> I'm working of a server and not standalone mode, where would I change a 
> setting that tells the "connectString" to point to the server instead of 
> "localhost".
>
> -Original Message-
> From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel 
> Cryans
> Sent: 10 May 2010 07:05 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Enabling Indexing in HBase
>
> Did you include the jar (contrib/indexed/hbase-0.20.3-indexed.jar) in
> your class path?
>
> J-D
>
> On Mon, May 10, 2010 at 6:43 AM, Michelan Arendse  
> wrote:
>> Hi.
>>
>> I added the following properties  to hbase-site.xml
>> <property>
>>        <name>hbase.regionserver.class</name>
>>        <value>org.apache.hadoop.hbase.ipc.IndexedRegionInterface</value>
>> </property>
>>
>> <property>
>>        <name>hbase.regionserver.impl</name>
>>        <value>org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegionServer</value>
>> </property>
>>
>> I'm using hbase 0.20.3 and when I start hbase now it comes with the 
>> following:
>> ERROR org.apache.hadoop.hbase.master.HMaster: Can not start master
>> java.lang.UnsupportedOperationException: Unable to find region server 
>> interface org.apache.hadoop.hbase.ipc.IndexedRegionInterface
>> Caused by: java.lang.ClassNotFoundException: 
>> org.apache.hadoop.hbase.ipc.IndexedRegionInterface
>>
>> Can you please help with this problem that I am having.
>>
>> Thank you,
>>
>> Michelan Arendse
>> Junior Developer | AD:DYNAMO // happy business ;-)
>> Office 0861 Dynamo (0861 396266)  | Fax +27 (0) 21 465 2587
>>
>> Advertise Online Instantly - www.addynamo.com 
>> 
>>
>>
>