Hi Asaf,
This CDC pattern will be used for directing changes to another system.
Assume I have a table hbase_alarms in HBase with columns Severity,
Source, and Time, and that I track changes with this CDC tool. Some
external system is putting alarms, with their severity and source, into
the hbase_alarms table.
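(To make the setup concrete, a minimal sketch of the writer side; the row
key scheme, the family name "cf", and the values are assumptions, only the
table and column names come from above:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class AlarmWriter {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "hbase_alarms");
        // One row per alarm; key shape is a made-up example.
        Put put = new Put(Bytes.toBytes("alarm-" + System.currentTimeMillis()));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("Severity"), Bytes.toBytes("CRITICAL"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("Source"), Bytes.toBytes("router-17"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("Time"), Bytes.toBytes(System.currentTimeMillis()));
        table.put(put);
        table.close();
      }
    }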
Hi,
Just wanted to make sure I read correctly on the internet: 0.96 will
support HBase RPC compression, so replication between master and slave
will enjoy it as well (important since bandwidth between geographically
distant data centers is scarce and more expensive).
0.96 will support HBase RPC compression
Yes
Replication between master and slave
will enjoy it as well (important since bandwidth between geographically
distant data centers is scarce and more expensive)
But I cannot see it being utilized in replication. Maybe we can make
improvements in
What's your blockCacheHitCachingRatio? It tells you the ratio of scans
requested from cache (the default) to the scans actually served from
the block cache. You can get it from the RS web UI. What you are seeing
could map to almost anything; for example: is scanner caching (client side)
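(For context, client-side scanner caching is the Scan#setCaching knob; a
minimal sketch, assuming an already-open HTable named table:)

    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    Scan scan = new Scan();
    scan.setCaching(500);  // rows shipped per RPC; the 0.94 default is 1
    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      // process r (error handling omitted in this sketch)
    }
    scanner.close();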
If RPC has compression abilities, how come replication, which also works
over RPC, does not get it automatically?
On Tue, Jun 4, 2013 at 12:34 PM, Anoop John anoop.hb...@gmail.com wrote:
0.96 will support HBase RPC compression
Yes
Replication between master and slave
will enjoy it as well
Hello,
I am using both the thrift and thrift2 interfaces (thrift for DDL,
thrift2 for the rest). My requests work with thrift, but with thrift2 I
get an error 400.
Here is my code (CoffeeScript):
colValue = new types2.TColumnValue family: 'cf', qualifier: 'col', value: 'yoo'
put = new
Can you check the region server log around that time?
Thanks
On Jun 4, 2013, at 8:37 AM, Simon Majou si...@majou.org wrote:
Hello,
I am using both the thrift and thrift2 interfaces (thrift for DDL,
thrift2 for the rest). My requests work with thrift, but with thrift2 I
get an error 400.
Here is my code
No logs there either (in fact, no logs are written to any log file when
I execute the request).
Simon
On Tue, Jun 4, 2013 at 5:42 PM, Ted Yu yuzhih...@gmail.com wrote:
Can you check the region server log around that time?
Thanks
On Jun 4, 2013, at 8:37 AM, Simon Majou si...@majou.org wrote:
Hi,
In an HBase table there are 200 columns, and the read patterns of
different systems involve 70 columns...
In this case we cannot have 70 columns in the rowkey; that would not be
a good design...
Can you please suggest how to handle this problem?
Also, can we do indexing in HBase apart
Replication doesn't need to know about compression at the RPC level, so
it won't refer to it, and as far as I can tell you need to set
compression only on the master cluster and the slave will figure it
out.
Looking at the code though, I'm not sure it works the same way it used
to work before
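(If it helps, my understanding, worth verifying against the 0.96 docs, is
that the client-side knob from HBASE-5355 is the hbase.client.rpc.compressor
property, e.g.:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    Configuration conf = HBaseConfiguration.create();
    // Assumed 0.96 RPC compression setting; check your release notes
    // (HBASE-5355) before relying on the exact property name.
    conf.set("hbase.client.rpc.compressor",
        "org.apache.hadoop.io.compress.GzipCodec");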
Just a quick thought: why don't you create different tables and duplicate
data, i.e. go for denormalization and data redundancy? Are all the read
access patterns that require the 70 columns incorporated into one
application/client, or will it be a bunch of different
clients/applications? If that
Thanks Enis, I'll see if I can backport this patch - it is exactly what I was
going to try. This should solve my scan performance problems if I can get it to
work.
On May 29, 2013, at 1:29 PM, Enis Söztutar e...@hortonworks.com wrote:
Hi,
Regarding running raw scans on top of HFiles, you
Hi,
The read pattern differs for each application...
Is the below approach fine?
Create one HBase table with a unique rowkey and put all 200 columns into
it...
Create multiple small HBase tables, each holding the columns of one read
access pattern, with the rowkey mapped to the master table...
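(One way to read that approach, as a hedged sketch; the table names, row
key, family, and columns below are all made up:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    Configuration conf = HBaseConfiguration.create();
    HTable masterTable = new HTable(conf, "master_table");    // made-up name
    HTable patternTable = new HTable(conf, "pattern_reads");  // made-up name
    byte[] rowKey = Bytes.toBytes("entity#42");               // made-up key
    byte[] cf = Bytes.toBytes("cf");

    Put full = new Put(rowKey);   // the master row carries all 200 columns
    full.add(cf, Bytes.toBytes("col1"), Bytes.toBytes("v1"));
    masterTable.put(full);

    Put thin = new Put(rowKey);   // same key, only this pattern's ~70 columns
    thin.add(cf, Bytes.toBytes("col1"), Bytes.toBytes("v1"));
    patternTable.put(thin);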
Quick and dirty...
Create an inverted table for each index.
Then you can take the intersection of the result set(s) to get your list of
rows for further filtering.
There is obviously more to this, but it's the core idea...
Sent from a remote device. Please excuse any typos...
Mike Segel
Hi Michel,
If you don't mind, can you please explain in more detail...
Also, can you please let me know whether we have secondary indexes in HBase?
regards,
Rams
On Tue, Jun 4, 2013 at 1:13 PM, Michel Segel michael_se...@hotmail.comwrote:
Quick and dirty...
Create an inverted table for each
Rams - you might enjoy this blog post from HBase committer Jesse Yates (from
last summer):
http://jyates.github.io/2012/07/09/consistent-enough-secondary-indexes.html
Secondary Indexing doesn't exist in HBase core today, but there are various
proposals and early implementations of it in
Ok...
A little bit more detail...
First, it's possible to store your data in multiple tables, each with a
different key.
Not a good idea, for some very obvious reasons.
You could, however, create a secondary table which is an inverted table,
where the rowkey of the index is the value in the
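(To make the inverted-table idea concrete, a sketch; the table names, the
family, and the "city" column are all made up:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    Configuration conf = HBaseConfiguration.create();
    HTable dataTable = new HTable(conf, "data");    // made-up name
    HTable indexTable = new HTable(conf, "data_idx"); // made-up name
    byte[] cf = Bytes.toBytes("cf");
    byte[] rowKey = Bytes.toBytes("row-0001");

    // Data row: column "city" holds the value we want to index.
    Put data = new Put(rowKey);
    data.add(cf, Bytes.toBytes("city"), Bytes.toBytes("paris"));
    dataTable.put(data);

    // Index row: keyed by the value; one qualifier per matching data row,
    // so a single Get on "city=paris" returns every matching row key.
    Put index = new Put(Bytes.toBytes("city=paris"));
    index.add(cf, rowKey, HConstants.EMPTY_BYTE_ARRAY);
    indexTable.put(index);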
Hi,
We are relatively new to HBase, and we are hitting a roadblock on our scan
performance. I searched through the email archives and applied a bunch of the
recommendations there, but they did not improve much. So, I am hoping I am
missing something which you could guide me towards. Thanks in
Finally fixed this; my code was at fault.
Protobufs require a builder object, which was a (non-static) protected
object in an abstract class that all parsers extend. The mapper calls a
parser factory depending on the input record. Because we designed the
parser instances as singletons, the builder
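(For anyone hitting the same thing, the shape of the bug as I read it;
"Record" stands in for whichever generated protobuf class is parsed:)

    import com.google.protobuf.InvalidProtocolBufferException;

    abstract class BrokenParser {
      // Bug: a singleton parser shares this one mutable builder, so
      // concurrent (or even successive) records clobber each other's state.
      protected final Record.Builder builder = Record.newBuilder();
    }

    abstract class FixedParser {
      Record parse(byte[] bytes) throws InvalidProtocolBufferException {
        // Fix: a fresh builder per record keeps singleton parsers stateless.
        return Record.newBuilder().mergeFrom(bytes).build();
      }
    }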
hi, folks,
hbase 0.94.3
By reading several documents I got the impression that *replication*
works at the table-*column-family* level. However, when I set up a table
with two column families and replicate them to two different slaves, the
whole table gets replicated. Is this a bug?
On Tue, Jun 4, 2013 at 9:58 PM, Rob Verkuylen r...@verkuylen.net wrote:
Finally fixed this; my code was at fault.
Protobufs require a builder object, which was a (non-static) protected
object in an abstract class that all parsers extend. The mapper calls a
parser factory depending on the input
On Tue, Jun 4, 2013 at 6:48 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote:
Replication doesn't need to know about compression at the RPC level, so
it won't refer to it, and as far as I can tell you need to set
compression only on the master cluster and the slave will figure it
out.
Looking
Haven't had a chance to write a JIRA yet, but I thought I'd pop in here
with an update in the meantime.
I tried a number of different approaches to eliminate latency and
bubbles in the scan pipeline, and eventually arrived at adding a
streaming scan API to the region server, along with
Hi,
I have a few small questions regarding HBase. I've searched the forum but
couldn't find clear answers, hence asking them here:
1. Does minor compaction remove HFiles in which all entries are past
their TTL, or does only major compaction do that? I found this jira:
Does Minor compaction remove HFiles in which all entries are out of
TTL or does only Major compaction do that
Yes, it applies to minor compactions.
Is there a way of configuring major compaction to compact only files
older than a certain time, or to compact all the files except the latest
bq. I found this jira: https://issues.apache.org/jira/browse/HBASE-5199
but I don't know if the compaction being talked about there is minor or
major.
The optimization above applies to minor compaction selection.
Cheers
On Tue, Jun 4, 2013 at 7:15 PM, Pankaj Gupta pan...@brightroll.com wrote:
bq. But I am not very sure if we can control the files getting selected
for compaction in the older versions.
The same mechanism is available in 0.94.
Take a look at
src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java,
where you will find the following methods (and more):
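(The compaction-selection hooks there include preCompactSelection and
postCompactSelection; a minimal 0.94-style override, with the actual age
test left as a stub, might look roughly like this:)

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.List;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.Store;
    import org.apache.hadoop.hbase.regionserver.StoreFile;

    public class AgeAwareObserver extends BaseRegionObserver {
      @Override
      public void preCompactSelection(ObserverContext<RegionCoprocessorEnvironment> c,
          Store store, List<StoreFile> candidates) throws IOException {
        // Veto files you don't want compacted; isTooYoung() is a
        // placeholder for whatever age check fits the use case.
        for (Iterator<StoreFile> it = candidates.iterator(); it.hasNext(); ) {
          if (isTooYoung(it.next())) {
            it.remove();
          }
        }
      }

      private boolean isTooYoung(StoreFile sf) {
        return false;  // stub: decide based on the file's age/metadata
      }
    }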
On Tue, Jun 4, 2013 at 11:48 AM, Rahul Ravindran rahu...@yahoo.com wrote:
Hi,
We are relatively new to HBase, and we are hitting a roadblock on our scan
performance. I searched through the email archives and applied a bunch of
the recommendations there, but they did not improve much. So, I
Yes, replication can be specified at the CF level. You have used
HCD#setScope(), right?
S = '3', BLOCKSIZE = '65536'}, {*NAME = 'cf2', REPLICATION_SCOPE =
'2'*,
You set the scope to 2?? You have to set one CF to be replicated to one
cluster and another to another cluster. I don't think it is
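(For reference, per-family scope is set on the HColumnDescriptor; in 0.94
the defined values are 0, local only, and 1, replicate. A sketch with
made-up table and family names:)

    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;

    HTableDescriptor htd = new HTableDescriptor("my_table");
    HColumnDescriptor cf1 = new HColumnDescriptor("cf1");
    cf1.setScope(1);  // REPLICATION_SCOPE 1 = replicate this family
    HColumnDescriptor cf2 = new HColumnDescriptor("cf2");
    cf2.setScope(0);  // 0 (the default) = keep this family local
    htd.addFamily(cf1);
    htd.addFamily(cf2);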
Our row keys do not contain time. By time-based scans I mean an MR job
over the HBase table where the scan object has no startRow or endRow but
has a startTime and endTime.
Our row key format is MD5 of UUID+UUID, so we expect good distribution.
We have pre-split initially to prevent any initial
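(The scan shape in question, for reference; the timestamps here are just
example epoch millis supplied by the job:)

    import org.apache.hadoop.hbase.client.Scan;

    long startTime = 1370300400000L, endTime = 1370386800000L;
    Scan scan = new Scan();                 // no startRow/endRow set
    scan.setTimeRange(startTime, endTime);  // half-open [startTime, endTime)
    // Each store file records its min/max cell timestamps, so files whose
    // range misses the scan's time range can be skipped entirely.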
Thanks for the replies. I'll take a look at
src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java.
@ramkrishna: I do want to have bloom filters and block indexes all the
time. For good read performance they're critical in my workflow. The
worry is that when HBase is restarted it
4. This one is related to what I read in the bloom filter section of the
HBase Definitive Guide:
Given a random row key you are looking for, it is very likely that this
key will fall in between two block start keys. The only way for HBase to
figure out if the key actually exists is by loading
If you read the HFile v2 document on the HBase site, you will understand
completely how the search for a record works, and why there is a linear
search within the block but a binary search to get to the right block.
Also bear in mind that the number of keys in a block is not big, since a
block in an HFile by default
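(The shape of that two-phase lookup, as a rough sketch; the real HFile
reader is considerably more involved:)

    import org.apache.hadoop.hbase.util.Bytes;

    class BlockIndexSketch {
      // Binary search over the block index (the first key of each block)
      // picks the single block that could hold the key; the caller then
      // scans that small block's entries linearly.
      static int findBlock(byte[][] blockFirstKeys, byte[] key) {
        int lo = 0, hi = blockFirstKeys.length - 1, found = 0;
        while (lo <= hi) {
          int mid = (lo + hi) >>> 1;
          if (Bytes.compareTo(blockFirstKeys[mid], key) <= 0) {
            found = mid;   // this block's first key is <= key
            lo = mid + 1;
          } else {
            hi = mid - 1;
          }
        }
        return found;
      }
    }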
As for the question of whether you can warm up the bloom filters and
block cache: I don't think that is possible now.
Regards
Ram
On Wed, Jun 5, 2013 at 10:57 AM, Asaf Mesika asaf.mes...@gmail.com wrote:
If you read the HFile v2 document on the HBase site, you will understand
completely how
When you set a time range on a Scan, some files can be skipped based on
the max/min timestamp values in each file. That said, once you major
compact and then scan based on a time range, I don't think you will get
much advantage.
-Anoop-
On Wed, Jun 5, 2013 at 10:11 AM, Rahul Ravindran rahu...@yahoo.com wrote:
On Tuesday, June 4, 2013, Rahul Ravindran wrote:
Hi,
We are relatively new to HBase, and we are hitting a roadblock on our scan
performance. I searched through the email archives and applied a bunch of
the recommendations there, but they did not improve much. So, I am hoping I
am missing
When you do the first read of this region, wouldn't this load all bloom
filters?
On Wed, Jun 5, 2013 at 8:43 AM, ramkrishna vasudevan
ramkrishna.s.vasude...@gmail.com wrote:
As for the question of whether you can warm up the bloom filters and
block cache: I don't think that is possible
Thanks for that confirmation. This is what we hypothesized as well.
So, if we are dependent on time-range scans, we need to completely avoid
major compaction and depend only on minor compactions? Is there any
downside? We do have a TTL set on all the rows in the table.
~Rahul.