Re: RowKey hashing in HBase 1.0

2015-05-05 Thread Koert Kuipers
we do this for almost all our tables
On May 5, 2015 11:05 AM, "jeremy p"  wrote:

> Thank you for your response!
>
> So I guess 'salt' is a bit of a misnomer.  What I used to do is this:
>
> 1) Say that my key value is something like '1234foobar'
> 2) I obtain the hash of '1234foobar'.  Let's say that's '54824923'
> 3) I mod the hash by my number of regions.  Let's say I have 2000 regions.
>  54824923 % 2000 = 923
> 4) I prepend that value to my original key value, so my new key is
> '923_1234foobar'
>
> Is this the same thing you were talking about?
>
> A couple questions :
>
> * Why would my regions only be 1/2 full?
> * Why would I only use this for sequential keys?  I would think this would
> give better performance in any situation where I don't need range scans.
> For example, let's say my key value is a person's last name.  That will
> naturally cluster around certain letters, giving me an uneven distribution.
>
> --Jeremy
>
>
>
> On Sun, May 3, 2015 at 11:46 AM, Michael Segel 
> wrote:
>
> > Yes, don’t use a salt. Salt implies that your seed is orthogonal (read
> > random) to the base table row key.
> > You’re better off using a truncated hash (md5 is fastest) so that at
> > least you can use a single get().
> >
> > Common?
> >
> > Only if your row key is mostly sequential.
> >
> > Note that even with bucketing, you will still end up with regions only
> > 1/2 full, with the only exception being the last region.
> >
> > > On May 1, 2015, at 11:09 AM, jeremy p 
> > wrote:
> > >
> > > Hello all,
> > >
> > > I've been out of the HBase world for a while, and I'm just now jumping
> > back
> > > in.
> > >
> > > As of HBase 0.94, it was still common to take a hash of your RowKey
> > > and use that to "salt" the beginning of your RowKey to obtain an even
> > > distribution among your region servers.  Is this still a common
> > > practice, or is there a better way to do this in HBase 1.0?
> > >
> > > --Jeremy
> >
> > The opinions expressed here are mine, while they may reflect a cognitive
> > thought, that is purely accidental.
> > Use at your own risk.
> > Michael Segel
> > michael_segel (AT) hotmail.com
> >
> >
> >
> >
> >
> >
>
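
The scheme Jeremy describes (and Michael's truncated-hash variant) can be sketched in a few lines. This is an illustrative Python sketch, not HBase client code; the 2000-bucket count and the md5 choice are just the examples from the thread.

```python
import hashlib

NUM_BUCKETS = 2000  # the region count used in the example above

def bucketed_key(key: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Prepend a deterministic hash-derived bucket to the key.

    Because the prefix is computed from the key itself (a truncated
    hash, not a random salt), any reader can recompute it and still
    fetch the row with a single get().
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % num_buckets
    return "%d_%s" % (bucket, key)

def original_key(bucketed: str) -> str:
    """Strip the bucket prefix to recover the original key."""
    return bucketed.split("_", 1)[1]
```

Note that, as the thread discusses, a prefix like this trades away range scans over the original key ordering: a scan would have to fan out into one scan per bucket.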


Re: 0.94 going forward

2014-12-15 Thread Koert Kuipers
given that CDH4 is hbase 0.94 i dont believe nobody is using it. for our
clients the majority is on 0.94 (versus 0.96 and up).

so i am going with 1), its very stable!

On Mon, Dec 15, 2014 at 1:53 PM, lars hofhansl  wrote:
>
> Over the past few months the rate of the change into 0.94 has slowed
> significantly.
> 0.94.25 was released on Nov 15th, and since then we had only 4 changes.
>
> This could mean two things: (1) 0.94 is very stable now or (2) nobody is
> using it (at least nobody is contributing to it anymore).
>
>
> If anybody out there is still using 0.94 and is not planning to upgrade to
> 0.98 or later soon (which will require downtime), please speak up.
> Otherwise it might be time to think about EOL'ing 0.94.
>
> It's not actually much work to do these releases, especially when they are
> so small, but I'd like to continue only if they are actually used.
> In any case, I am going to spin 0.94.26 with the current 4 fixes today or
> tomorrow.
>
> -- Lars
>


Re: Lighter Map/Reduce on HBase

2014-04-14 Thread Koert Kuipers
we do these jobs in cascading/scalding
On Apr 9, 2014 5:56 AM, "Henning Blohm"  wrote:

> We operate a solution that stores large amounts of data in HBASE that needs
> to be available for online access.
>
> For efficient scanning, there are three pieces of data encoded in row keys
> (in particular a time dimension) and for other reasons some columns hold
> JSON encoded data.
>
> Currently, analytics data is created in two ways:
>
> a) a non-trivial M/R job that computes pre-aggregated data sets and
> offloads them into an analytical data base for interactive reporting
> b) other M/R jobs that create specialized reports (heuristics) that cannot
> be computed from pre-aggregated data
>
> In particular for b) but possibly also for variations of a) I would like to
> find more "user friendly" ways than Java implemented M/R jobs - at least
> for some cases.
>
> So this is not about interactive querying of data directly from HBase
> tables. It is rather about pre-processing HBase stored (large) data sets
> into either input to interactive query engines (some other DB, Phoenix,...)
> or into some other specialized format.
>
> I spent some time with HIVE but found that the HBase integration simply
> doesn't cut it (parsing a row key, mapping JSON column content). I know
> there is some more out there, but before spending an eternity trying out
> various methods, I am shamelessly trying to benefit from your expertise by
> asking for some good pointers.
>
> Thanks,
> Henning
>
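
For what it's worth, a composite row key with a time dimension of the kind Henning mentions is usually built so that it can be parsed back apart for scanning. A hypothetical sketch follows; the 16-byte entity field and reversed-timestamp layout are invented for illustration, not Henning's actual schema:

```python
import struct

LONG_MAX = 2**63 - 1  # Java Long.MAX_VALUE

def encode_key(entity: str, ts_millis: int) -> bytes:
    """Fixed-width entity id followed by a reversed timestamp, so a
    prefix scan on the entity returns the newest rows first."""
    return (entity.encode("utf-8").ljust(16, b"\x00")
            + struct.pack(">q", LONG_MAX - ts_millis))

def decode_key(key: bytes):
    """Split a composite key back into (entity, timestamp_millis)."""
    entity = key[:16].rstrip(b"\x00").decode("utf-8")
    (rev,) = struct.unpack(">q", key[16:24])
    return entity, LONG_MAX - rev
```

Having the encode/decode pair in one small library is what makes it practical for downstream jobs (M/R, Hive SerDes, Scalding sources) to share the key format instead of each re-parsing it ad hoc.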


replication

2014-03-08 Thread Koert Kuipers
do i understand it correctly that it is safe to have 2 hbase clusters
replicate to each other (so in both directions)?

and as long as an update (put/delete) arrives at only one cluster this
setup will function correctly?

not entirely sure about situation where updates get routed to both
clusters, but i see no immediate harm if the operations are idempotent...

thanks! koert
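
Koert's idempotency intuition can be made concrete with a toy model of a timestamped cell store; this is only a sketch of last-write-wins semantics, not actual HBase code. Applying the same timestamped put twice, or delivering updates in either order, converges to the same state; the risky case is exactly the one flagged above, where both clusters accept different values at the same timestamp.

```python
def apply_put(store, row, column, value, ts):
    """Toy last-write-wins cell store keyed on (row, column): a put
    only wins if its timestamp is >= the stored one, so replays and
    reorderings of distinct-timestamp puts converge to one state."""
    cur = store.get((row, column))
    if cur is None or ts >= cur[1]:
        store[(row, column)] = (value, ts)
    return store
```

(Equal-timestamp writes of different values are order-dependent in this model, which mirrors why routing the same logical update to both clusters is the case to avoid.)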


is a hbase client HA?

2014-02-25 Thread Koert Kuipers
we had a master go down on a hbase 0.96 cluster with HA. the second master
took over and the hbase cluster continued to function. great! however
hbase-rest got stuck in a loop spitting out error messages. see below.
is something like hbase-rest, which uses the hbase client api, supposed to
survive a master failure?
thanks! koert

2014-02-25 01:12:41,373 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x440dd5dbbe0028, likely server has closed socket, closing socket connection and attempting reconnect
2014-02-25 01:12:41,931 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server master01/10.111.111..41:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
2014-02-25 01:12:55,278 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 13804ms for sessionid 0x440dd5dbbe0028, closing socket connection and attempting reconnect
2014-02-25 01:12:55,776 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server master02/10.111.111..42:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
2014-02-25 01:12:55,776 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to master02/10.111.111..42:2181, initiating session
2014-02-25 01:12:55,778 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server master02/10.111.111..42:2181, sessionid = 0x440dd5dbbe0028, negotiated timeout = 4
2014-02-25 01:17:54,246 ERROR org.mortbay.log: /
java.lang.reflect.UndeclaredThrowableException
        at com.sun.proxy.$Proxy7.getHTableDescriptors(Unknown Source)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTables(HConnectionManager.java:1858)
        at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:278)
        at org.apache.hadoop.hbase.rest.RootResource.getTableList(RootResource.java:63)
        at org.apache.hadoop.hbase.rest.RootResource.get(RootResource.java:79)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
        at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
        at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
        at org.apache.hadoop.hbase.rest.filter.GzipFilter.doFilter(GzipFilter.java:73)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpCo

Re: Scalability of stargate

2014-02-05 Thread Koert Kuipers
stargate can be run distributed behind a load balancer to scale out. however
the scanners are implemented statefully i think, so i would suggest staying
away from those if you load balance.


On Wed, Feb 5, 2014 at 10:33 AM, jeevi tesh  wrote:

>   Hi,
> Planning to use hbase stargate in project.
> Just had a concern in areas of scalability of stargate. Can all the
> operations of hbase be performed using stargate?
>
> Sent from my Windows Phone
>


Re: HBase 6x bigger than raw data

2014-01-27 Thread Koert Kuipers
if compression is already enabled on a column family, do i understand it
correctly that the main benefit of DATA_BLOCK_ENCODING is in memory?


On Mon, Jan 27, 2014 at 6:02 PM, Nick Xie  wrote:

> Thanks all for the information. Appreciated!! I'll take a look and try.
>
> Thanks,
>
> Nick
>
>
>
>
> On Mon, Jan 27, 2014 at 2:43 PM, Vladimir Rodionov
> wrote:
>
> > Overhead of storing small values is quite high in HBase unless you use
> > DATA_BLOCK_ENCODING
> > (not available in 0.92). I recommend moving to 0.94.latest.
> >
> > See: https://issues.apache.org/jira/browse/HBASE-4218
> >
> > Best regards,
> > Vladimir Rodionov
> > Principal Platform Engineer
> > Carrier IQ, www.carrieriq.com
> > e-mail: vrodio...@carrieriq.com
> >
> > 
> > From: Nick Xie [nick.xie.had...@gmail.com]
> > Sent: Monday, January 27, 2014 2:40 PM
> > To: user@hbase.apache.org
> > Subject: Re: HBase 6x bigger than raw data
> >
> > Tom,
> >
> > Yes, you are right. According to this analysis (
> >
> >
> http://prafull-blog.blogspot.in/2012/06/how-to-calculate-record-size-of-hbase.html
> > )
> > if it is right, then the overhead is quite big if the cell value
> > occupies
> > a small portion.
> >
> > In the analysis in that link, the overhead is actually 10x (the real
> > values only take 12B and it costs 123B in HBase to store them...). Is
> > that real?
> >
> > In this case, should we do some combination to reduce the overhead?
> >
> > Thanks,
> >
> > Nick
> >
> >
> >
> >
> > On Mon, Jan 27, 2014 at 2:33 PM, Tom Brown  wrote:
> >
> > > I believe each cell stores its own copy of the entire row key, column
> > > qualifier, and timestamp. Could that account for the increase in size?
> > >
> > > --Tom
> > >
> > >
> > > On Mon, Jan 27, 2014 at 3:12 PM, Nick Xie 
> > > wrote:
> > >
> > > > I'm importing a set of data into HBase. The CSV file contains 82
> > entries
> > > > per line. Starting with 8 byte ID, followed by 16 byte date and the
> > rest
> > > > are 80 numbers with 4 bytes each.
> > > >
> > > > The current HBase schema is: ID as row key, date as a 'date' family
> > with
> > > > 'value' qualifier, the rest is in another family called 'readings'
> with
> > > > 'P0', 'P1', 'P2', ... through 'P79' as qualifiers.
> > > >
> > > > I'm testing this on a single-node cluster with HBase running in
> > > > pseudo-distributed mode (no replication, no compression for HBase).
> > > > After importing a CSV file with 150MB of size in HDFS (no
> > > > replication), I checked the table size, and it shows ~900MB, which
> > > > is 6x larger than it is in HDFS.
> > > >
> > > > Why is there such a large overhead? Am I doing anything wrong here?
> > > >
> > > > Thanks,
> > > >
> > > > Nick
> > > >
> > >
> >
> > Confidentiality Notice:  The information contained in this message,
> > including any attachments hereto, may be confidential and is intended to
> be
> > read only by the individual or entity to whom this message is addressed.
> If
> > the reader of this message is not the intended recipient or an agent or
> > designee of the intended recipient, please note that any review, use,
> > disclosure or distribution of this message or its attachments, in any
> form,
> > is strictly prohibited.  If you have received this message in error,
> please
> > immediately notify the sender and/or notificati...@carrieriq.com and
> > delete or destroy any copy of this message and its attachments.
> >
>
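
The blow-up Nick sees is consistent with HBase's per-cell (KeyValue) layout, which repeats the row key, family, and qualifier for every cell. A rough back-of-the-envelope sketch, ignoring HFile block and index overhead, so the real on-disk figure will differ somewhat:

```python
def kv_size(row_len, family, qualifier, value_len):
    """Approximate size of one un-encoded KeyValue: key length (4) +
    value length (4) + row length (2) + row + family length (1) +
    family + qualifier + timestamp (8) + key type (1) + value."""
    return (4 + 4 + 2 + row_len + 1 + len(family)
            + len(qualifier) + 8 + 1 + value_len)

# One 4-byte reading stored as readings:P0 under an 8-byte row key:
cell = kv_size(8, "readings", "P0", 4)   # 42 bytes for 4 bytes of data

# A whole row from the thread: 1 date cell + 80 reading cells
row = kv_size(8, "date", "value", 16) + sum(
    kv_size(8, "readings", "P%d" % i, 4) for i in range(80))
```

That works out to roughly 3.5KB per row for ~344 bytes of raw values, the same order of blow-up as the ~10x in the linked analysis and the 6x Nick measured. DATA_BLOCK_ENCODING and compression reduce it precisely by de-duplicating those repeated key parts.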


Re: HTable writeAsyncBuffer

2013-12-09 Thread Koert Kuipers
are there downsides if we were to add all these operations to the
writeAsyncBuffer? are there any usages of HTable.put that rely on it being
sent off immediately instead of being put in a buffer?


On Mon, Dec 9, 2013 at 5:38 PM, Stack  wrote:

> On Sat, Dec 7, 2013 at 8:52 AM, Koert Kuipers  wrote:
>
> > hey st.ack
> >
> > well i am considering creating lots of deletes from a map-reduce job
> > instead of puts, and was looking at the code to see how efficient that
> > would be...
> >
> >
>
> You  have been writing code for a while (smile)?
>
>
>
> > but now i am more generally wondering if there is any downside to making
> > all these operations go into the buffer instead of treating puts special.
> >
> >
> I'm not sure I understand the question.  If you are asking if doing mass
> individual deletes of cells is to be avoided, the answer is yes.  But maybe
> I have you wrong?
>
> St.Ack
>


Re: hbase rest query params for maxVersions and maxValues

2013-12-09 Thread Koert Kuipers
https://issues.apache.org/jira/browse/HBASE-10112


On Sun, Dec 8, 2013 at 1:07 AM, Ted Yu  wrote:

> Koert:
> Thanks for reporting this issue.
>
> Mind filing a JIRA ?
>
> Cheers
>
>
> On Sun, Dec 8, 2013 at 2:01 AM, Koert Kuipers  wrote:
>
> > i am trying to use maxValues with a "globbed" row resource in stargate.
> > from looking at the source code one has to do something like
> >
> > table/row/column(s)/timestamp(s)/?n=1
> >
> > (except the ?n=1 piece must be urlencoded)
> >
> > however i cannot get the n=1 piece to work. i get this stacktrace:
> >
> > Problem accessing
> > /some_table_name/93%2B002%2B*/cf:tx_CUST_NAME/1,13862892906600/%3Fn%3D1.
> > Reason:
> > String index out of range: 50
> > Caused by: java.lang.StringIndexOutOfBoundsException: String index out
> > of range: 50
> > at java.lang.AbstractStringBuilder.charAt(AbstractStringBuilder.java:174)
> > at java.lang.StringBuilder.charAt(StringBuilder.java:55)
> > at org.apache.hadoop.hbase.rest.RowSpec.parseQueryParams(RowSpec.java:260)
> > at org.apache.hadoop.hbase.rest.RowSpec.<init>(RowSpec.java:59)
> > at org.apache.hadoop.hbase.rest.RowResource.<init>(RowResource.java:74)
> > at org.apache.hadoop.hbase.rest.TableResource.getRowResource(TableResource.java:90)
> >
> > the offending line is (260 in RowSpec):
> > c = query.charAt(i);
> >
> > i think this should be
> > c = query.charAt(j);
> >
> > same for line 248 (which handles the maxVersions)
> >
> > i have not been able to test this.
> >
>


hbase rest query params for maxVersions and maxValues

2013-12-07 Thread Koert Kuipers
i am trying to use maxValues with a "globbed" row resource in stargate.
from looking at the source code one has to do something like

table/row/column(s)/timestamp(s)/?n=1

(except the ?n=1 piece must be urlencoded)

however i cannot get the n=1 piece to work. i get this stacktrace:

Problem accessing
/some_table_name/93%2B002%2B*/cf:tx_CUST_NAME/1,13862892906600/%3Fn%3D1.
Reason:
String index out of range: 50
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 50
        at java.lang.AbstractStringBuilder.charAt(AbstractStringBuilder.java:174)
        at java.lang.StringBuilder.charAt(StringBuilder.java:55)
        at org.apache.hadoop.hbase.rest.RowSpec.parseQueryParams(RowSpec.java:260)
        at org.apache.hadoop.hbase.rest.RowSpec.<init>(RowSpec.java:59)
        at org.apache.hadoop.hbase.rest.RowResource.<init>(RowResource.java:74)
        at org.apache.hadoop.hbase.rest.TableResource.getRowResource(TableResource.java:90)

the offending line is (260 in RowSpec):
c = query.charAt(i);

i think this should be
c = query.charAt(j);

same for line 248 (which handles the maxVersions)

i have not been able to test this.
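
The off-by-variable can be illustrated with a simplified model of the query-parameter scan; this is not the actual RowSpec code, just a sketch of the pattern, where the value must be read with the inner cursor `j` rather than the outer index `i`:

```python
def parse_max_values(query: str) -> int:
    """Simplified sketch of pulling 'n=<digits>' out of a query
    string. Reading with a separate inner cursor keeps the index in
    step with the characters being consumed; the reported bug was
    reading with the outer index instead, which runs past the end."""
    i = query.find("n=")
    if i < 0:
        return 1  # default: one value
    j = i + 2
    digits = []
    while j < len(query) and query[j].isdigit():
        digits.append(query[j])   # read with j, not i
        j += 1
    return int("".join(digits)) if digits else 1
```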


Re: HTable writeAsyncBuffer

2013-12-07 Thread Koert Kuipers
hey st.ack

well i am considering creating lots of deletes from a map-reduce job
instead of puts, and was looking at the code to see how efficient that
would be...

but now i am more generally wondering if there is any downside to making
all these operations go into the buffer instead of treating puts special.


On Sat, Dec 7, 2013 at 8:40 AM, Stack  wrote:

> On Fri, Dec 6, 2013 at 3:06 PM, Koert Kuipers  wrote
>
>
> > i noticed that puts are put into a buffer (writeAsyncBuffer) that gets
> > flushed if it gets to a certain size.
> > writeAsyncBuffer can take objects of type Row, which includes besides the
> > Put also Deletes, Appends, and RowMutations.
> >
> > but when i look at the code for the delete method it does not use
> > writeAsyncBuffer. same for append and mutateRow methods. why do Puts get
> > buffered but other mutations do not? or did i misunderstand?
> >
>
>
> This is how it 'evolved'.  What are you thinking Koert?  We should probably
> be clearer in javadoc about the sequence in which these ops can go over to
> the server.
>
> Serverside, it doesn't care what is in the batch.  It will just work its
> way through the 'Rows' as they come in.
>
> St.Ack
>


HTable writeAsyncBuffer

2013-12-06 Thread Koert Kuipers
hello all,
i was just taking a look at HTable source code to get a bit more
understanding about hbase from a client perspective.
i noticed that puts are put into a buffer (writeAsyncBuffer) that gets
flushed if it gets to a certain size.
writeAsyncBuffer can take objects of type Row, which includes besides the
Put also Deletes, Appends, and RowMutations.

but when i look at the code for the delete method it does not use
writeAsyncBuffer. same for append and mutateRow methods. why do Puts get
buffered but other mutations do not? or did i misunderstand?

thanks! koert
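
To make the question concrete, here is a toy model of a size-bounded client write buffer extended to hold any mutation type, the way the question suggests. It only mimics the shape of HTable's writeAsyncBuffer; it is not the HBase client API:

```python
class BufferedTable:
    """Toy client-side write buffer: mutations (puts *and* deletes)
    accumulate and are sent as one batch once the buffer reaches
    flush_size, mimicking the shape of HTable's writeAsyncBuffer."""

    def __init__(self, sender, flush_size=1000):
        self.sender = sender        # callable taking a list of mutations
        self.flush_size = flush_size
        self.buffer = []

    def mutate(self, op, row, value=None):
        self.buffer.append((op, row, value))
        if len(self.buffer) >= self.flush_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sender(list(self.buffer))
            self.buffer.clear()
```

The downside falls out of the model: a buffered delete is not visible server-side until a flush, so any caller that relied on delete() going out immediately would see a behavior change.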


Re: one column family but lots of tables

2013-08-24 Thread Koert Kuipers
thanks i think it makes sense. i will go through hbase architecture again
to make sure i fully understand the mapping to "regions" and "stores"


On Fri, Aug 23, 2013 at 10:33 AM, Michael Segel
wrote:

> I think the issue which a lot of people miss is why do you want to use a
> column family in the first place.
>
> Column families are part of the same table structure, and each family is
> kept separate.
>
> So in your design, do you have tables which are related, but are not
> always used at the same time?
>
> The example that I use when I teach about HBase or do a
> lecture/presentation is an Order Entry system.
> Here you have an order being entered, then you have one or many pick
> slips being generated, the same for shipping, and then there's the invoicing.
> All separate processes which relate back to the same order.
>
> So here it makes sense to use column families.
>
> Other areas could be metadata surrounding a transaction. Again... a few
> column families are tied together.
>
> Does this make sense?
>
>
> On Aug 23, 2013, at 12:35 AM, lars hofhansl  wrote:
>
> > You can think of it this way: Every region and column family is a
> "store" in HBase. Each store has a memstore and its own set of HFiles in
> HDFS.
> > The more stores you have, the more there is to manage.
> >
> > So you want to limit the number of stores. Also note that the word
> > "Table" is somewhat of a misnomer in HBase; it would better have been
> > called "Keyspace".
> > The extra caution for the number of column families per table stems from
> > the fact that HBase flushes by region (which means all stores of that
> > region are flushed). This in turn means that unless all column families
> > hold roughly the same amount of data, you end up with very lopsided
> > distributions of HFile sizes.
> >
> > -- Lars
> >
> >
> >
> > 
> > From: Koert Kuipers 
> > To: user@hbase.apache.org; vrodio...@carrieriq.com
> > Sent: Thursday, August 22, 2013 12:30 PM
> > Subject: Re: one column family but lots of tables
> >
> >
> > if that is the case, how come people keep warning about limiting the
> > number of column families to only a handful (with more hbase performance
> > will degrade supposedly), yet there seems to be no similar warning for
> > the number of tables? see for example here:
> > http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/27616
> >
> > if a table means at least one column family then the number of tables
> > should also be kept to a minimum, no?
> >
> >
> >
> >
> > On Thu, Aug 22, 2013 at 1:58 PM, Vladimir Rodionov
> > wrote:
> >
> >> Nope. Column family is per table (its sub-directory inside your table
> >> directory in HDFS).
> >> If you have N tables you will always have, at least, N distinct CFs
> >> (even if they have the same name).
> >>
> >> Best regards,
> >> Vladimir Rodionov
> >> Principal Platform Engineer
> >> Carrier IQ, www.carrieriq.com
> >> e-mail: vrodio...@carrieriq.com
> >>
> >> 
> >> From: Koert Kuipers [ko...@tresata.com]
> >> Sent: Thursday, August 22, 2013 8:06 AM
> >> To: user@hbase.apache.org
> >> Subject: one column family but lots of tables
> >>
> >> i read in multiple places that i should try to limit the number of
> column
> >> families in hbase.
> >>
> >> do i understand it correctly that when i create lots of tables, but they
> >> all use the same column family (by name), that i am just using one
> column
> >> family and i am OK with respect to limiting number of column families ?
> >>
> >> thanks! koert
> >>
>
>
>
>
>
>
>
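
Lars's "store" accounting can be made concrete with a little arithmetic; this is an illustrative sketch with made-up counts, not figures from the thread:

```python
def store_count(tables, families_per_table, regions_per_table):
    """Every (region, column family) pair is a store, each with its
    own memstore and set of HFiles, so tables multiply the count a
    region server must manage just as column families do."""
    return tables * families_per_table * regions_per_table

few_tables  = store_count(tables=2,  families_per_table=3, regions_per_table=100)
many_tables = store_count(tables=50, families_per_table=1, regions_per_table=10)
```

Which is the thread's conclusion in numbers: many single-family tables and few multi-family tables can land you at a comparable store count, so the number of tables needs limiting for the same reason column families do.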


Re: one column family but lots of tables

2013-08-22 Thread Koert Kuipers
thanks thats helpful


On Thu, Aug 22, 2013 at 5:16 PM, Vladimir Rodionov
wrote:

> Yes, the number of tables must be reasonable as well. Region Servers
> operate on 'regions'.
> Each table can have multiple CFs, and each CF can have multiple regions.
> The more regions you have per Region Server, the more data you need to
> keep in memory and the more time it takes to recover from a Region Server
> failure; therefore there is a practical limit on the number of regions
> per Region Server (hundreds, not thousands). It is recommended to keep
> the number of regions per server well below 1000 (say, 100-500 max).
>
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodio...@carrieriq.com
>
> 
> From: Koert Kuipers [ko...@tresata.com]
> Sent: Thursday, August 22, 2013 12:30 PM
> To: user@hbase.apache.org; Vladimir Rodionov
> Subject: Re: one column family but lots of tables
>
> if that is the case, how come people keep warning about limiting the
> number of column families to only a handful (with more hbase performance
> will degrade supposedly), yet there seems to be no similar warnings for
> number of tables? see for example here:
> http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/27616
>
> if a table means at least one column family then the number of tables
> should also be kept to a minimum, no?
>
>
>
>
> On Thu, Aug 22, 2013 at 1:58 PM, Vladimir Rodionov <
> vrodio...@carrieriq.com> wrote:
> Nope. Column family is per table (its sub-directory inside your table
> directory in HDFS).
> If you have N tables you will always have, at least, N distinct CFs (even
> if they have the same name).
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodio...@carrieriq.com
>
> 
> From: Koert Kuipers [ko...@tresata.com]
> Sent: Thursday, August 22, 2013 8:06 AM
> To: user@hbase.apache.org
> Subject: one column family but lots of tables
>
> i read in multiple places that i should try to limit the number of column
> families in hbase.
>
> do i understand it correctly that when i create lots of tables, but they
> all use the same column family (by name), that i am just using one column
> family and i am OK with respect to limiting number of column families ?
>
> thanks! koert
>
>
>


Re: one column family but lots of tables

2013-08-22 Thread Koert Kuipers
if that is the case, how come people keep warning about limiting the number
of column families to only a handful (with more hbase performance will
degrade supposedly), yet there seems to be no similar warnings for number
of tables? see for example here:
http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/27616

if a table means at least one column family then the number of tables
should also be kept to a minimum, no?




On Thu, Aug 22, 2013 at 1:58 PM, Vladimir Rodionov
wrote:

> Nope. Column family is per table (its sub-directory inside your table
> directory in HDFS).
> If you have N tables you will always have, at least, N distinct CFs (even
> if they have the same name).
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodio...@carrieriq.com
>
> ________
> From: Koert Kuipers [ko...@tresata.com]
> Sent: Thursday, August 22, 2013 8:06 AM
> To: user@hbase.apache.org
> Subject: one column family but lots of tables
>
> i read in multiple places that i should try to limit the number of column
> families in hbase.
>
> do i understand it correctly that when i create lots of tables, but they
> all use the same column family (by name), that i am just using one column
> family and i am OK with respect to limiting number of column families ?
>
> thanks! koert
>
>


Re: one column family but lots of tables

2013-08-22 Thread Koert Kuipers
i suspect i might be misunderstanding some things. i thought column
families were related to packing things together physically. i do not yet
have any particular needs in that respect. and hearing about how hbase can
only have a few column families i figured i would just stick to 1 for now.

i do have many different datasets. to avoid namespace clashes i figured i
would put them in different tables. but now i suspect that approach was
wrong and i should use column qualifiers to keep them apart (within one
table and one column family).



On Thu, Aug 22, 2013 at 12:06 PM, Ted Yu  wrote:

> Roughly how many column families in total do you have ?
>
> Having many tables would make certain transactions impossible, whereas
> putting related column families in the same table would allow them.
>
> Cheers
>
>
> On Thu, Aug 22, 2013 at 8:06 AM, Koert Kuipers  wrote:
>
> > i read in multiple places that i should try to limit the number of column
> > families in hbase.
> >
> > do i understand it correctly that when i create lots of tables, but they
> > all use the same column family (by name), that i am just using one column
> > family and i am OK with respect to limiting number of column families ?
> >
> > thanks! koert
> >
>


one column family but lots of tables

2013-08-22 Thread Koert Kuipers
i read in multiple places that i should try to limit the number of column
families in hbase.

do i understand it correctly that when i create lots of tables, but they
all use the same column family (by name), that i am just using one column
family and i am OK with respect to limiting number of column families ?

thanks! koert