Trying to confirm my understanding of HBase and MapReduce behavior.

2020-06-05 Thread Brian Hsu
I'm trying to do some processing on an HBase dataset at our company, but I'm 
pretty new to HBase and the Hadoop ecosystem.

I would like to get some feedback from this community, to see if my 
understanding of HBase and of running MapReduce over it is correct.
 
Some background:

1. We have an HBase table that is about 1 TB in size and has more than 100 
million records.
2. The table is served by 3 region servers, each holding about 80 regions, for 
roughly 240 regions in total.
3. As far as I know, the records are fairly uniformly distributed across the 
regions.

What I'm trying to achieve is to filter rows based on some column values and 
export the matching rows to the HDFS filesystem (or something similar).

For example, we have a column named "type" that may contain the value 1, 2, or 
3. I would like to end up with 3 distinct HDFS files (or directories, since 
data on HDFS is partitioned) holding the records of type 1, 2, and 3 
respectively.
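
To make this concrete, here is roughly what I have in mind for the mapper: a 
minimal sketch that uses MultipleOutputs to route each row into a per-type 
output. The column family "d", the qualifier "type", the output layout, and 
the serialization are made-up placeholders, not our real schema.

import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class TypeSplitMapper extends TableMapper<NullWritable, Text> {

  private MultipleOutputs<NullWritable, Text> out;
  // Placeholder column family/qualifier holding the "type" value.
  private static final byte[] CF = Bytes.toBytes("d");
  private static final byte[] QUAL = Bytes.toBytes("type");

  @Override
  protected void setup(Context context) {
    out = new MultipleOutputs<>(context);
  }

  @Override
  protected void map(ImmutableBytesWritable rowKey, Result result, Context context)
      throws IOException, InterruptedException {
    byte[] type = result.getValue(CF, QUAL);
    if (type == null) {
      return; // skip rows that don't have the column
    }
    String typeStr = Bytes.toString(type);
    // Write each row into a per-type output, e.g. type-1/part-m-00000.
    out.write(NullWritable.get(), new Text(serialize(result)),
        "type-" + typeStr + "/part");
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    out.close();
  }

  private String serialize(Result result) {
    // Placeholder: turn the Result into whatever export format we need.
    return Bytes.toStringBinary(result.getRow());
  }
}
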

From what I can tell, MapReduce seems like a good approach to attack these 
kinds of problems.

I've done some research and experimentation and can get the result I want, but 
I'm not sure I understand the behavior of HBase's TableMapper and Scan, which 
is crucial for our code's performance since our dataset is really large.

To simplify the discussion, let me take the official RowCounter 
implementation [1] as an example and confirm that my understanding of it is 
correct.

[1] 
https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java

So my questions about HBase with MapReduce are:

1. In its simplest form (without any optional arguments), RowCounter performs a 
full table scan: HBase iterates over all records in the table and emits each 
row to the map method of RowCounterMapper. Is this correct?

2. The TableMapper divides the work according to how many regions the table 
has. For example, if our HBase table has only 1 region, the job will have only 
1 map task, which is effectively a single thread and does not make use of any 
of the parallelism of our Hadoop cluster?

3. If the above is correct, is it possible to configure the job to spawn 
multiple tasks per region? For example, when we run RowCounter on a table that 
has only 1 region, could it still run 10 or 20 tasks and count the rows in 
parallel?
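
For context, this is roughly how I wire the job up (a sketch only; the table 
name and output path are placeholders, and it reuses the hypothetical mapper 
above). As far as I can tell, TableInputFormat creates one input split per 
region by default, which is exactly what questions 2 and 3 are about; I have 
read that newer versions of TableInputFormatBase can split large regions into 
several input splits (an "autobalance" option), but I have not verified that 
against the version we run.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ExportByTypeJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "export-by-type");
    job.setJarByClass(ExportByTypeJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // rows fetched per RPC; worth tuning for large scans
    scan.setCacheBlocks(false);  // don't churn the block cache from an MR scan

    // As far as I understand, TableInputFormat creates one input split (and so
    // one map task) per region by default, so a single-region table would run
    // as a single mapper. "my_table" is a placeholder name.
    TableMapReduceUtil.initTableMapperJob(
        "my_table", scan, TypeSplitMapper.class,
        NullWritable.class, Text.class, job);

    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(Text.class);
    job.setNumReduceTasks(0);  // map-only export
    // Only MultipleOutputs writes records, so avoid empty default part files.
    LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
    FileOutputFormat.setOutputPath(job, new Path(args[0]));  // placeholder output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
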
Since TableMapper also depends on the Scan operation, I would also like to 
confirm my understanding of Scan behavior and performance.

1. If I use setStartRow / setStopRow to limit the scope of my dataset, then 
because the rowkey is indexed this should not hurt performance, since it does 
not trigger a full table scan (see the sketch after the next question).

2. In our case, we may need to filter the data based on its modification time, 
so we might use scan.setTimeRange() to limit the scope of the dataset. My 
question: since HBase does not index the timestamp, will this scan become a 
full table scan, with no advantage over simply filtering in our MapReduce job 
itself?
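
Concretely, these are the two kinds of scans I am comparing (a sketch; the 
rowkeys and timestamps are placeholders, and in HBase 2.x the rowkey-range 
methods are withStartRow / withStopRow, with setStartRow / setStopRow 
deprecated):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanSketches {

  // Rowkey-range scan: only the regions overlapping [start, stop) are read,
  // so this is not a full table scan.
  static Scan byRowRange() {
    return new Scan()
        .withStartRow(Bytes.toBytes("user#2020-01-01"))  // inclusive, placeholder key
        .withStopRow(Bytes.toBytes("user#2020-02-01"));  // exclusive, placeholder key
  }

  // Time-range scan: every region is still visited, but store files whose
  // recorded timestamp range falls outside [minStamp, maxStamp) can be
  // skipped, and non-matching cells are dropped server side instead of being
  // shipped to the mapper. It is not an index lookup, though.
  static Scan byModifiedTime(long minStamp, long maxStamp) throws IOException {
    Scan scan = new Scan();
    scan.setTimeRange(minStamp, maxStamp);
    return scan;
  }
}
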

Finally, we have had some internal discussion about how to do this export, and 
we have the following two approaches in mind, but are not sure which one is 
better.

1. Use the MapReduce approach described above. However, we are not sure whether 
the parallelism is bounded by how many regions the table has, i.e. that the 
concurrency never exceeds the region count, so we could not improve performance 
without increasing the number of regions.

2. Maintain a rowkey list in a separate place (perhaps on HDFS), use Spark to 
read that file, and then fetch each record with a simple Get operation, so that 
all the concurrency happens on the Spark / Hadoop side (sketched below).
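
For option 2, this is the kind of thing I have in mind (a rough sketch; the 
HDFS paths, table name, partition count, and serialization are placeholders, 
and it assumes hbase-site.xml is on the executors' classpath). One HBase 
connection is opened per Spark partition and the Gets are batched:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class SparkGetExport {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("hbase-get-export").getOrCreate();
    JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

    // Placeholder HDFS file with one rowkey per line; the partition count
    // (200 here) drives the level of concurrency on the Spark side.
    JavaRDD<String> rowKeys = jsc.textFile("hdfs:///exports/rowkeys.txt", 200);

    JavaRDD<String> rows = rowKeys.mapPartitions(keys -> {
      // One HBase connection and one batched multi-get per partition.
      try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
           Table table = conn.getTable(TableName.valueOf("my_table"))) {  // placeholder table
        List<Get> gets = new ArrayList<>();
        keys.forEachRemaining(k -> gets.add(new Get(Bytes.toBytes(k))));
        Result[] results = table.get(gets);
        List<String> out = new ArrayList<>(results.length);
        for (Result r : results) {
          if (!r.isEmpty()) {
            out.add(Bytes.toStringBinary(r.getRow()));  // placeholder serialization
          }
        }
        return out.iterator();
      }
    });

    rows.saveAsTextFile("hdfs:///exports/out");  // placeholder output path
    spark.stop();
  }
}
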

Suggestions from this community about which solution is better would be really 
helpful. Thanks.





Re: Shared Cluster between HBase and MapReduce

2012-06-06 Thread Atif Khan

Thanks for the confirmation.  There is also a good/detailed discussion thread
on this issue found at 
http://apache-hbase.679495.n3.nabble.com/Shared-HDFS-for-HBase-and-MapReduce-td4018856.html


Michael Segel-3 wrote:
> 
> It depends... There are some reasons to do this however in general, you
> don't need to do this... 
> 
> The course is wrong to suggest this as a best practice.
> 
> Sent from my iPhone
> 
> On Jun 5, 2012, at 5:00 PM, "Atif Khan" 
> wrote:
> 
>> 
>> During a recent Cloudera course we were told that it is "Best practice"
>> to
>> isolate a MapReduce/HDFS cluster from an HBase/HDFS cluster as the two
>> when
>> sharing the same HDFS cluster could lead to performance problems.  I am
>> not
>> sure if this is entirely true given the fact that the main concept behind
>> Hadoop is to export computation to the data and not import data to the
>> computation.  If I were to segregate HBase and MapReduce clusters, then
>> when
>> using MapReduce on HBase data would I not have to transfer large amounts
>> of
>> data from HBase/HDFS cluster to MapReduce/HDFS cluster?
>> 
>> Cloudera on their best practice page
>> (http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/) has the
>> following:
>> "Be careful when running mixed workloads on an HBase cluster. When you
>> have
>> SLAs on HBase access independent of any MapReduce jobs (for example, a
>> transformation in Pig and serving data from HBase) run them on separate
>> clusters. HBase is CPU and Memory intensive with sporadic large
>> sequential
>> I/O access while MapReduce jobs are primarily I/O bound with fixed memory
>> and sporadic CPU. Combined these can lead to unpredictable latencies for
>> HBase and CPU contention between the two. A shared cluster also requires
>> fewer task slots per node to accommodate for HBase CPU requirements
>> (generally half the slots on each node that you would allocate without
>> HBase). Also keep an eye on memory swap. If HBase starts to swap there is
>> a
>> good chance it will miss a heartbeat and get dropped from the cluster. On
>> a
>> busy cluster this may overload another region, causing it to swap and a
>> cascade of failures."
>> 
>> All my initial investigation/reading lead me believe that I should a
>> create
>> a common HDFS cluster and then I can run MapReduce and HBase against the
>> common HDFS cluster.   But from the above Cloudera best practice it seems
>> like I should create two HDFS clusters, one for MapReduce and one for
>> HBase
>> and then move data around when required.  Something does not make sense
>> with
>> this best practice recommendation.
>> 
>> Any thoughts and/or feedback will be much appreciated.
>> 
>> -- 
>> View this message in context:
>> http://old.nabble.com/Shared-Cluster-between-HBase-and-MapReduce-tp33967219p33967219.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Shared-Cluster-between-HBase-and-MapReduce-tp33967219p33973918.html
Sent from the HBase User mailing list archive at Nabble.com.



Re: Shared Cluster between HBase and MapReduce

2012-06-06 Thread Michael Segel
It depends... There are some reasons to do this however in general, you don't 
need to do this... 

The course is wrong to suggest this as a best practice.

Sent from my iPhone

On Jun 5, 2012, at 5:00 PM, "Atif Khan"  wrote:

> 
> During a recent Cloudera course we were told that it is "Best practice" to
> isolate a MapReduce/HDFS cluster from an HBase/HDFS cluster as the two when
> sharing the same HDFS cluster could lead to performance problems.  I am not
> sure if this is entirely true given the fact that the main concept behind
> Hadoop is to export computation to the data and not import data to the
> computation.  If I were to segregate HBase and MapReduce clusters, then when
> using MapReduce on HBase data would I not have to transfer large amounts of
> data from HBase/HDFS cluster to MapReduce/HDFS cluster?
> 
> Cloudera on their best practice page
> (http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/) has the
> following:
> "Be careful when running mixed workloads on an HBase cluster. When you have
> SLAs on HBase access independent of any MapReduce jobs (for example, a
> transformation in Pig and serving data from HBase) run them on separate
> clusters. HBase is CPU and Memory intensive with sporadic large sequential
> I/O access while MapReduce jobs are primarily I/O bound with fixed memory
> and sporadic CPU. Combined these can lead to unpredictable latencies for
> HBase and CPU contention between the two. A shared cluster also requires
> fewer task slots per node to accommodate for HBase CPU requirements
> (generally half the slots on each node that you would allocate without
> HBase). Also keep an eye on memory swap. If HBase starts to swap there is a
> good chance it will miss a heartbeat and get dropped from the cluster. On a
> busy cluster this may overload another region, causing it to swap and a
> cascade of failures."
> 
> All my initial investigation/reading lead me believe that I should a create
> a common HDFS cluster and then I can run MapReduce and HBase against the
> common HDFS cluster.   But from the above Cloudera best practice it seems
> like I should create two HDFS clusters, one for MapReduce and one for HBase
> and then move data around when required.  Something does not make sense with
> this best practice recommendation.
> 
> Any thoughts and/or feedback will be much appreciated.
> 
> -- 
> View this message in context: 
> http://old.nabble.com/Shared-Cluster-between-HBase-and-MapReduce-tp33967219p33967219.html
> Sent from the HBase User mailing list archive at Nabble.com.
> 


Re: Shared Cluster between HBase and MapReduce

2012-06-06 Thread Andrew Purtell
On Wed, Jun 6, 2012 at 2:20 AM, Amandeep Khurana  wrote:
>> These are general recommendations and definitely change based on the
>> access patterns and the way you will be using HBase and MapReduce. In
>> general, if you are building a latency sensitive application on top of
>> HBase, running a MapReduce job at the same time will impact performance due
>> to I/O contention. If your main access patterns is going to be running
>> MapReduce over HBase tables, you should absolutely consider collocating the
>> two frameworks.

Perhaps this exact point can be made (more clearly?) in the referenced
Cloudera documentation?

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)


Re: Shared Cluster between HBase and MapReduce

2012-06-06 Thread Tim Robertson
Like Amandeep says, it really depends on the access patterns and jobs
running on the cluster.

We are using a single cluster for HBase and MR, with each node running DN,
TT and RS.
We have tried mixed clusters with only some running RS but you start to
suffer from data locality issues during scans.  Our primary access patterns
are a checkOrPut on HBase and a full scan over HBase.

To give you an idea of the impact of data locality for scan performance in
HBase, see the blog [1] I wrote on how we monitored scan performance.
 Scanning our HBase tables when you don't have data locality is an order of
magnitude (10x) slower, and we clearly hit network limits (the traffic
between scan clients running in mappers and RegionServers on other machines).  We
have not got our work to production yet, so it is possible we will see
issues when (e.g.) regions start to split, but we'll blog about it if it
comes up.  Whatever you do, we have found Ganglia absolutely critical to
understand what is happening on the cluster, and we use Puppet [2] so we
can quickly test different setups.

Cheers,
Tim

[1]
http://gbif.blogspot.dk/2012/05/optimizing-hbase-mapreduce-scans-for.html
[2] E.g. https://github.com/lfrancke/puppet-cdh but there are others needed
too



On Wed, Jun 6, 2012 at 2:20 AM, Amandeep Khurana  wrote:

> Atif,
>
> These are general recommendations and definitely change based on the
> access patterns and the way you will be using HBase and MapReduce. In
> general, if you are building a latency sensitive application on top of
> HBase, running a MapReduce job at the same time will impact performance due
> to I/O contention. If your main access patterns is going to be running
> MapReduce over HBase tables, you should absolutely consider collocating the
> two frameworks. Now, these recommendations might change based on the
> resources you have on your nodes (CPU, disk, memory).
>
> Having a single HDFS cluster and using some hosts for HBase and others for
> MapReduce only gets you a common storage fabric. It doesn't solve the
> problem of reading data into MapReduce tasks from remote hosts (region
> servers in this case) and is pretty much the same as having two separate
> clusters. In case of two separate clusters, you'll be running your
> MapReduce jobs to talk to a remote HBase instance. You don't have to export
> data out of that cluster manually onto the MapReduce cluster to run jobs on
> it.
>
> Hope that makes it clearer.
>
> -Amandeep
>
>
> On Tuesday, June 5, 2012 at 5:00 PM, Atif Khan wrote:
>
> >
> > During a recent Cloudera course we were told that it is "Best practice"
> to
> > isolate a MapReduce/HDFS cluster from an HBase/HDFS cluster as the two
> when
> > sharing the same HDFS cluster could lead to performance problems. I am
> not
> > sure if this is entirely true given the fact that the main concept behind
> > Hadoop is to export computation to the data and not import data to the
> > computation. If I were to segregate HBase and MapReduce clusters, then
> when
> > using MapReduce on HBase data would I not have to transfer large amounts
> of
> > data from HBase/HDFS cluster to MapReduce/HDFS cluster?
> >
> > Cloudera on their best practice page
> > (http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/) has the
> > following:
> > "Be careful when running mixed workloads on an HBase cluster. When you
> have
> > SLAs on HBase access independent of any MapReduce jobs (for example, a
> > transformation in Pig and serving data from HBase) run them on separate
> > clusters. HBase is CPU and Memory intensive with sporadic large
> sequential
> > I/O access while MapReduce jobs are primarily I/O bound with fixed memory
> > and sporadic CPU. Combined these can lead to unpredictable latencies for
> > HBase and CPU contention between the two. A shared cluster also requires
> > fewer task slots per node to accommodate for HBase CPU requirements
> > (generally half the slots on each node that you would allocate without
> > HBase). Also keep an eye on memory swap. If HBase starts to swap there
> is a
> > good chance it will miss a heartbeat and get dropped from the cluster.
> On a
> > busy cluster this may overload another region, causing it to swap and a
> > cascade of failures."
> >
> > All my initial investigation/reading lead me believe that I should a
> create
> > a common HDFS cluster and then I can run MapReduce and HBase against the
> > common HDFS cluster. But from the above Cloudera best practice it seems
> > like I should create two HDFS clusters, one for MapReduce and one for
> HBase
> > and then move data around when required. Something does not make sense
> with
> > this best practice recommendation.
> >
> > Any thoughts and/or feedback will be much appreciated.
> >
> > --
> > View this message in context:
> http://old.nabble.com/Shared-Cluster-between-HBase-and-MapReduce-tp33967219p33967219.html
> > Sent from the HBase User mailing list archive at Nabble.com (
> http://Nabble.com).
> >
> >
>
>
>


Re: Shared Cluster between HBase and MapReduce

2012-06-05 Thread Amandeep Khurana
Atif,

These are general recommendations and definitely change based on the access 
patterns and the way you will be using HBase and MapReduce. In general, if you 
are building a latency sensitive application on top of HBase, running a 
MapReduce job at the same time will impact performance due to I/O contention. 
If your main access pattern is going to be running MapReduce over HBase 
tables, you should absolutely consider collocating the two frameworks. Now, 
these recommendations might change based on the resources you have on your 
nodes (CPU, disk, memory).

Having a single HDFS cluster and using some hosts for HBase and others for 
MapReduce only gets you a common storage fabric. It doesn't solve the problem 
of reading data into MapReduce tasks from remote hosts (region servers in this 
case) and is pretty much the same as having two separate clusters. In case of 
two separate clusters, you'll be running your MapReduce jobs to talk to a 
remote HBase instance. You don't have to export data out of that cluster 
manually onto the MapReduce cluster to run jobs on it.

Hope that makes it clearer.

-Amandeep 


On Tuesday, June 5, 2012 at 5:00 PM, Atif Khan wrote:

> 
> During a recent Cloudera course we were told that it is "Best practice" to
> isolate a MapReduce/HDFS cluster from an HBase/HDFS cluster as the two when
> sharing the same HDFS cluster could lead to performance problems. I am not
> sure if this is entirely true given the fact that the main concept behind
> Hadoop is to export computation to the data and not import data to the
> computation. If I were to segregate HBase and MapReduce clusters, then when
> using MapReduce on HBase data would I not have to transfer large amounts of
> data from HBase/HDFS cluster to MapReduce/HDFS cluster?
> 
> Cloudera on their best practice page
> (http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/) has the
> following:
> "Be careful when running mixed workloads on an HBase cluster. When you have
> SLAs on HBase access independent of any MapReduce jobs (for example, a
> transformation in Pig and serving data from HBase) run them on separate
> clusters. HBase is CPU and Memory intensive with sporadic large sequential
> I/O access while MapReduce jobs are primarily I/O bound with fixed memory
> and sporadic CPU. Combined these can lead to unpredictable latencies for
> HBase and CPU contention between the two. A shared cluster also requires
> fewer task slots per node to accommodate for HBase CPU requirements
> (generally half the slots on each node that you would allocate without
> HBase). Also keep an eye on memory swap. If HBase starts to swap there is a
> good chance it will miss a heartbeat and get dropped from the cluster. On a
> busy cluster this may overload another region, causing it to swap and a
> cascade of failures."
> 
> All my initial investigation/reading lead me believe that I should a create
> a common HDFS cluster and then I can run MapReduce and HBase against the
> common HDFS cluster. But from the above Cloudera best practice it seems
> like I should create two HDFS clusters, one for MapReduce and one for HBase
> and then move data around when required. Something does not make sense with
> this best practice recommendation.
> 
> Any thoughts and/or feedback will be much appreciated.
> 
> -- 
> View this message in context: 
> http://old.nabble.com/Shared-Cluster-between-HBase-and-MapReduce-tp33967219p33967219.html
> Sent from the HBase User mailing list archive at Nabble.com 
> (http://Nabble.com).
> 
> 




Re: Shared Cluster between HBase and MapReduce

2012-06-05 Thread Paul Mackles
For our setup we went with 2 clusters. We call one our "hbase cluster" and
the other our "analytics cluster". For M/R jobs where hbase is the source
and/or sink we usually run the jobs on the "hbase cluster" and so far it's
been fine (and you definitely want the data locality for these jobs). We
also export data from our "hbase cluster" to HDFS on the analytics cluster
for M/R jobs where we need to join with data that lives outside of hbase.
In my experience, you can run M/R jobs on the same cluster as hbase but
you need to limit the number of tasks that you run on that cluster to make
sure hbase gets its share of resources. For example, our nodes have 8
cores and we reserve 3 of them for hbase. On the analytics cluster we use
all of the cores for M/R tasks. Given the ad-hoc nature of our analytics
workload (lots of hive/pig queries), I sleep a lot better at night knowing
that no matter how bad a query someone comes up with, it won't take down
hbase since we keep it on a separate cluster.

On 6/5/12 8:00 PM, "Atif Khan"  wrote:

>
>During a recent Cloudera course we were told that it is "Best practice" to
>isolate a MapReduce/HDFS cluster from an HBase/HDFS cluster as the two
>when
>sharing the same HDFS cluster could lead to performance problems.  I am
>not
>sure if this is entirely true given the fact that the main concept behind
>Hadoop is to export computation to the data and not import data to the
>computation.  If I were to segregate HBase and MapReduce clusters, then
>when
>using MapReduce on HBase data would I not have to transfer large amounts
>of
>data from HBase/HDFS cluster to MapReduce/HDFS cluster?
>
>Cloudera on their best practice page
>(http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/) has the
>following:
>"Be careful when running mixed workloads on an HBase cluster. When you
>have
>SLAs on HBase access independent of any MapReduce jobs (for example, a
>transformation in Pig and serving data from HBase) run them on separate
>clusters. HBase is CPU and Memory intensive with sporadic large sequential
>I/O access while MapReduce jobs are primarily I/O bound with fixed memory
>and sporadic CPU. Combined these can lead to unpredictable latencies for
>HBase and CPU contention between the two. A shared cluster also requires
>fewer task slots per node to accommodate for HBase CPU requirements
>(generally half the slots on each node that you would allocate without
>HBase). Also keep an eye on memory swap. If HBase starts to swap there is
>a
>good chance it will miss a heartbeat and get dropped from the cluster. On
>a
>busy cluster this may overload another region, causing it to swap and a
>cascade of failures."
>
>All my initial investigation/reading lead me believe that I should a
>create
>a common HDFS cluster and then I can run MapReduce and HBase against the
>common HDFS cluster.   But from the above Cloudera best practice it seems
>like I should create two HDFS clusters, one for MapReduce and one for
>HBase
>and then move data around when required.  Something does not make sense
>with
>this best practice recommendation.
>
>Any thoughts and/or feedback will be much appreciated.
>
>-- 
>View this message in context:
>http://old.nabble.com/Shared-Cluster-between-HBase-and-MapReduce-tp3396721
>9p33967219.html
>Sent from the HBase User mailing list archive at Nabble.com.
>



Shared Cluster between HBase and MapReduce

2012-06-05 Thread Atif Khan

During a recent Cloudera course we were told that it is "Best practice" to
isolate a MapReduce/HDFS cluster from an HBase/HDFS cluster, since sharing the
same HDFS cluster could lead to performance problems.  I am not
sure if this is entirely true given the fact that the main concept behind
Hadoop is to export computation to the data and not import data to the
computation.  If I were to segregate HBase and MapReduce clusters, then when
using MapReduce on HBase data would I not have to transfer large amounts of
data from HBase/HDFS cluster to MapReduce/HDFS cluster?

Cloudera on their best practice page
(http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/) has the
following:
"Be careful when running mixed workloads on an HBase cluster. When you have
SLAs on HBase access independent of any MapReduce jobs (for example, a
transformation in Pig and serving data from HBase) run them on separate
clusters. HBase is CPU and Memory intensive with sporadic large sequential
I/O access while MapReduce jobs are primarily I/O bound with fixed memory
and sporadic CPU. Combined these can lead to unpredictable latencies for
HBase and CPU contention between the two. A shared cluster also requires
fewer task slots per node to accommodate for HBase CPU requirements
(generally half the slots on each node that you would allocate without
HBase). Also keep an eye on memory swap. If HBase starts to swap there is a
good chance it will miss a heartbeat and get dropped from the cluster. On a
busy cluster this may overload another region, causing it to swap and a
cascade of failures."

All my initial investigation/reading led me to believe that I should create
a common HDFS cluster and then run MapReduce and HBase against the
common HDFS cluster.   But from the above Cloudera best practice it seems
like I should create two HDFS clusters, one for MapReduce and one for HBase
and then move data around when required.  Something does not make sense with
this best practice recommendation.

Any thoughts and/or feedback will be much appreciated.

-- 
View this message in context: 
http://old.nabble.com/Shared-Cluster-between-HBase-and-MapReduce-tp33967219p33967219.html
Sent from the HBase User mailing list archive at Nabble.com.



Re: HBase and MapReduce

2012-05-23 Thread Dave Revell
>
> 1. HBase guarantees data locality of store files and Regionserver only if
> it stays up for long. If there are too many region movements or the server
> has been recycled recently, there is a high probability that store file
> blocks are not local to the region server.  But the getSplits command
> always return the RegionServer of the StoreFile. So in this scenario,
> MapReduce loses its data locality?
>

It's impossible to get data locality in this case since mapreduce reads
from the regionserver, and the data is not local to the regionserver. The
data moves from datanode->regionserver->mapreduce. If the blocks are not
local to the regionserver, you cannot avoid using the network from
datanode->regionserver even if the regionserver->mapreduce step is local.


> 2. As the getSplits return only the RegionServer, the MR job is not aware
> of the multiple replicates of the StoreFile block. It only accesses one
> block (which is local if the point above is not applicable). This can
> constrain the MR processing as you cannot distribute the data processing
> in the best possible manner. Is this correct?
>

I think there's a misunderstanding. The mapreduce job does not read from
HDFS when using TableInputFormat. The mapreduce tasks use the HBase client
API to talk to a regionserver, and the *regionserver* reads from HDFS.

Also yes, the locality of data blocks to regionservers can be suboptimal,
and the locality of mapreduce tasks to regionservers can also be suboptimal.

> 3. A guess - since the MR processing goes through the RegionServer, it may
> impact the RegionServer performance for other random operations?
>

Yes, absolutely. Some people use separate HBase clusters for mapreduce
versus real-time traffic for this reason. You can also try to limit the
rate of data consumption by your mapreduce job by reducing the number of
map tasks, or sleeping for short periods in your mapper, or any other hack
that will slow your job down.
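
For instance, something along these lines (just a sketch; the pause interval
and row count are arbitrary placeholders you'd tune):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;

public class ThrottledMapper extends TableMapper<NullWritable, Text> {

  // Placeholder knobs: pause for PAUSE_MS every ROWS_PER_PAUSE rows to cap
  // the scan rate of each map task.
  private static final int ROWS_PER_PAUSE = 10_000;
  private static final long PAUSE_MS = 200;
  private long rowsSeen = 0;

  @Override
  protected void map(ImmutableBytesWritable key, Result value, Context context)
      throws IOException, InterruptedException {
    // ... normal per-row processing goes here ...

    if (++rowsSeen % ROWS_PER_PAUSE == 0) {
      // Report progress first so the task is not killed as unresponsive,
      // then sleep to ease the load on the regionserver.
      context.progress();
      Thread.sleep(PAUSE_MS);
    }
  }
}
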

Good luck!
-Dave


HBase and MapReduce

2012-05-23 Thread Hemant Bhanawat
I have a couple of questions related to MapReduce over HBase

 

1. HBase guarantees data locality between store files and the RegionServer
only if the RegionServer stays up for a long time. If there are too many region
movements or the server has been recycled recently, there is a high probability
that store file blocks are not local to the region server.  But the getSplits
call always returns the RegionServer of the StoreFile. So in this scenario,
MapReduce loses its data locality? 

 

2. As getSplits returns only the RegionServer, the MR job is not aware
of the multiple replicas of the StoreFile block. It only accesses one
block (which is local if the point above does not apply). This can
constrain the MR processing, as you cannot distribute the data processing
in the best possible manner. Is this correct? 

 

3. A guess: since the MR processing goes through the RegionServer, it may
impact RegionServer performance for other random operations? 

 

Thanks in advance,

Hemant