Hi all,
I am encountering a performance issue when building an application to scan data through
Stargate (the HBase REST service).
Stargate requires two steps to fetch data from an HBase table: the first step is to
PUT a scanner resource, and the second is to get the next batch through another HTTP
request. My observation is that it
Are there any tutorials available on the net for connecting the Spark Java API
with HBase?
Thanks and Regards
Deepa
From: "Dai, Kevin"
To: "user@hbase.apache.org"
Date: 08/07/2014 11:11 AM
Subject: Guava version incompatible
Hi, all
I am now using Spark to manipulate HBase, but I can't use HBaseTestingUtility
to do unit tests, because Spark needs Guava 15.0 and above while HBase needs
Guava 14.0.1. These two versions are incompatible. Is there any way to solve
this conflict with Maven?
Thanks,
Kevin.
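For what it's worth, one common workaround for this kind of Guava clash is to shade and relocate Guava with the maven-shade-plugin in a separate module, so HBase and Spark each see the version they expect. The fragment below is only a sketch (the `shaded.` prefix and the choice of which dependency to shade are assumptions, not a verified fix for this exact setup):

```xml
<!-- Sketch only: relocate Guava inside a shaded artifact built in a
     separate module, so the relocated copy no longer conflicts with the
     version the other framework pulls in. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```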
Hi TaeYun,
thanks for the explanation.
On Thu, Aug 7, 2014 at 12:50 PM, innowireless TaeYun Kim <
taeyun@innowireless.co.kr> wrote:
> Hi Qiang,
> thank you for your help.
>
> 1. Regarding HBASE-5416, I think its purpose is simple.
>
> "Avoid loading column families that are irrelevant to filtering
Thank you Anoop.
Though it's a bit strange to include the CF in the index, since all the block index
is contained in an HFile for a specific CF, I'm sure there is a good
reason (maybe for the performance of the comparison).
Anyway, it should be almost no issue since the length of the CF should
bq. no built-in filter intelligently determines which column family is
essential, except for SingleColumnValueFilter
Mostly right - don't forget about SingleColumnValueExcludeFilter which
extends SingleColumnValueFilter.
Cheers
On Wed, Aug 6, 2014 at 9:34 PM, innowireless TaeYun Kim <
taeyun...
Hi Qiang,
thank you for your help.
1. Regarding HBASE-5416, I think its purpose is simple.
"Avoid loading column families that are irrelevant to filtering while scanning."
So, it can be applied to my 'dummy CF' case.
That is, a dummy CF can act like a 'relevant' CF for filtering, provided that
H
It will be the key of the KeyValue. The key includes
rk + cf + qualifier + ts + type.
So all these are parts of the key. Your answer #1 is correct (but with the addition
of type also). Hope this makes it clear for you.
-Anoop-
On Tue, Aug 5, 2014 at 9:43 AM, innowireless TaeYun Kim <
taeyun@innowireless.c
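The key layout Anoop describes can be sketched in plain Java. This is illustrative only: the real HBase KeyValue serialization also carries length prefixes for the row and family, which are omitted here for clarity.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of the KeyValue key layout:
// rowkey + column family + qualifier + timestamp (8 bytes) + type (1 byte).
public class KeyLayoutSketch {
    static byte[] makeKey(byte[] row, byte[] family, byte[] qualifier,
                          long timestamp, byte type) {
        ByteBuffer buf = ByteBuffer.allocate(
                row.length + family.length + qualifier.length + 8 + 1);
        buf.put(row).put(family).put(qualifier).putLong(timestamp).put(type);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] row = "row1".getBytes(StandardCharsets.UTF_8);
        byte[] cf = "f".getBytes(StandardCharsets.UTF_8);
        byte[] qual = "q1".getBytes(StandardCharsets.UTF_8);
        // type code 4 is Put in HBase's KeyValue.Type
        byte[] key = makeKey(row, cf, qual, 1407300000000L, (byte) 4);
        // 4 (row) + 1 (cf) + 2 (qualifier) + 8 (ts) + 1 (type) = 16 bytes
        System.out.println(key.length);
    }
}
```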
Thank you Ted.
But the RowFilter class has no method that can be used to set which column family
is essential. (Actually, no built-in filter class provides such a method.)
So, if I (ever) want to apply the 'dummy' column family technique(?), it seems
that I must do as follows:
- Write my own filter
On 06.08.2014 at 19:07, Andrew Purtell wrote:
> We have no known vulnerabilities that equate to a SQL injection attack
> vulnerability. However, as Esteban says you'd want to treat HBase like any
> other datastore underpinning a production service and out of an abundance
> of caution deploy it i
bq. While scanning, an entire row will be read even for a rowkey filtering
If you specify essential column family in your filter, the above would not
be true - only the essential column family would be loaded into memory
first. Once the filter passes, the other family would be loaded.
Cheers
On
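The two-phase load described above (essential family first, the rest only after the filter passes) can be modeled with a small plain-Java sketch. The family names "meta" and "data" are made up for illustration; this shows the behavior, not the HBase implementation:

```java
import java.util.*;

// Model of the essential-column-family scan (HBASE-5416): read only the
// essential family first, apply the filter, and load the remaining
// families only for rows that pass.
public class EssentialFamilyScanSketch {
    public static void main(String[] args) {
        // rowkey -> (family -> value); "meta" is the small essential family
        // the filter looks at, "data" is the large non-essential one.
        Map<String, Map<String, String>> table = new LinkedHashMap<>();
        table.put("r1", Map.of("meta", "keep", "data", "bigblob1"));
        table.put("r2", Map.of("meta", "skip", "data", "bigblob2"));
        table.put("r3", Map.of("meta", "keep", "data", "bigblob3"));

        List<String> results = new ArrayList<>();
        int dataLoads = 0;
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            // Phase 1: read only the essential family and apply the filter.
            String meta = row.getValue().get("meta");
            if (!"keep".equals(meta)) continue; // rejected: "data" never loaded
            // Phase 2: filter passed, now load the non-essential family.
            String data = row.getValue().get("data");
            dataLoads++;
            results.add(row.getKey() + "=" + data);
        }
        System.out.println(results);   // [r1=bigblob1, r3=bigblob3]
        System.out.println(dataLoads); // 2 - r2's "data" family was skipped
    }
}
```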
Hi,
the description of HBASE-5416 states why it was introduced. If you only
have 1 CF, a dummy CF does not help; it is helpful for the multi-CF case,
e.g. "putting
them in one column family. And "Non frequently" ones in another. "
bq. "Field name will be included in rowkey."
Please read chapter 9 "A
You are just starting up a service and want the load split between multiple
region servers from the start, instead of waiting for the manual splitting.
Say you had 5 region servers; one way to create your table via the HBase shell is
like this:
create 'tablename', 'f', {NUMREGIONS => 5, SPLITALGO =>
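The idea behind pre-splitting with a split algorithm can be sketched in plain Java. This is not HBase code; it just computes evenly spaced boundaries over a 32-bit hex key space, in the spirit of the HexStringSplit algorithm:

```java
// Sketch of computing uniform split boundaries for pre-splitting a table:
// N regions need N-1 boundary keys, evenly spaced over the key space.
public class PreSplitSketch {
    static String[] splitKeys(int numRegions) {
        long range = 1L << 32; // model the key space as [0, 2^32)
        String[] keys = new String[numRegions - 1];
        for (int i = 1; i < numRegions; i++) {
            keys[i - 1] = String.format("%08x", (range * i) / numRegions);
        }
        return keys;
    }

    public static void main(String[] args) {
        // 5 regions -> 4 boundary keys
        for (String k : splitKeys(5)) System.out.println(k);
    }
}
```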
We have no known vulnerabilities that equate to a SQL injection attack
vulnerability. However, as Esteban says you'd want to treat HBase like any
other datastore underpinning a production service and out of an abundance
of caution deploy it into a secure enclave behind an internal service API,
so r
bq. HBase will first check if the data exist in memstore, if not, it will
check the disk
For read path, don't forget block cache / bucket cache.
Cheers
On Wed, Aug 6, 2014 at 7:54 AM, yonghu wrote:
> I did not quite understand your problem. You store your data in HBase, and
> I guess later yo
I did not quite understand your problem. You store your data in HBase, and
I guess later you will also read data from it. Generally, HBase will first
check if the data exists in the memstore; if not, it will check the disk. If you
set the memstore to 0, every read will be directly forwarded to dis
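The read path yonghu describes (memstore first, then the caches Ted mentions, then disk) can be modeled with a small plain-Java sketch; the maps stand in for the real structures, and this is only an illustration of the lookup order:

```java
import java.util.HashMap;
import java.util.Map;

// Model of the HBase read path: check the memstore first (recent writes),
// then the block cache (hot blocks), and only then go to disk (HFiles).
public class ReadPathSketch {
    static final Map<String, String> memstore = new HashMap<>();
    static final Map<String, String> blockCache = new HashMap<>();
    static final Map<String, String> disk = new HashMap<>();

    static String get(String rowkey) {
        if (memstore.containsKey(rowkey)) return memstore.get(rowkey);
        if (blockCache.containsKey(rowkey)) return blockCache.get(rowkey);
        String v = disk.get(rowkey);          // expensive: HFile read
        if (v != null) blockCache.put(rowkey, v); // populate cache on read
        return v;
    }

    public static void main(String[] args) {
        disk.put("r1", "v1");
        memstore.put("r2", "v2-unflushed");
        System.out.println(get("r2")); // v2-unflushed, served from memstore
        System.out.println(get("r1")); // v1, read from disk and now cached
        System.out.println(blockCache.containsKey("r1")); // true
    }
}
```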
Hi All,
I am trying to run JUnit tests for the SortingCoprocessor (HBASE-7474) on HBase 0.98.
I am getting this error:
14/08/06 07:06:09 ERROR namenode.FSNamesystem: FSNamesystem initialization
failed. org.apache.hadoop.metrics2.MetricsException: Metrics source
RetryCache/NameNodeRetryCache already exists!
Hi Ted,
Now I finished reading the filtering section and the source code of
TestJoinedScanners(0.94).
Facts learned:
- While scanning, an entire row will be read even for rowkey filtering.
(Since a rowkey is not a physically separate entity and is stored in the KeyValue
object, that's natural. Am I
Thanks John,
This is a very good answer, now I understand why you use manual split, thanks.
And I had a typo in my previous post:
the C is very close to A, not to B-A/2. So every split in the middle of the key range
will result in a big region and a small region, which is very bad.
So HBase only does auto split
To be honest, we were doing manual splits for the main reason that we
wanted to make sure it was done on our schedule.
But it also occurred to me that automatic splits, at least by default,
split the region in half. Normally the idea is that both new halves
continue to grow, but with a sequent
Thanks Arun, and John,
Both of your scenarios make a lot of sense to me. But for the "sequence-based
key" case, I am still confused. It is like an append-only operation, so new
data is always written into the same region, but that region will eventually
reach hbase.hregion.max.filesize and
I had a customer with a sequence-based key (yes, he knew all the downsides
of that). Being able to split manually meant he could split a region that
got too big at the end instead of right down the middle. With a sequentially
increasing key, splitting the region in half left one region half the
desired