[
https://issues.apache.org/jira/browse/HBASE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Purtell updated HBASE-2000:
----------------------------------
Assignee: Andrew Purtell
Fix Version/s: 0.21.0
Description:
From Google's Jeff Dean, in a keynote to LADIS 2009
(http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009, slides 66 - 67):
BigTable Coprocessors (New Since OSDI'06)
* Arbitrary code that runs next to each tablet in a table
** As tablets split and move, coprocessor code automatically splits/moves
too
* High-level call interface for clients
** Unlike RPC, calls addressed to rows or ranges of rows
* coprocessor client library resolves to actual locations
** Calls across multiple rows automatically split into multiple
parallelized RPCs
* Very flexible model for building distributed services
** Automatic scaling, load balancing, request routing for apps
Example Coprocessor Uses
* Scalable metadata management for Colossus (next gen GFS-like file system)
* Distributed language model serving for machine translation system
* Distributed query processing for full-text indexing support
* Regular expression search support for code repository
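To make the call-routing bullets above concrete, here is a minimal Java sketch of a client call addressed to a row range rather than to a server. It is not Bigtable or HBase code: the class {{RowRangeCallSketch}}, its in-memory tablet directory, and the thread pool standing in for parallel RPCs are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names come from Bigtable or HBase.
final class RowRangeCallSketch {
    // Tablet directory: tablet start row -> rows held by that tablet.
    // A tablet covers [its start row, the next tablet's start row).
    private final NavigableMap<String, NavigableMap<String, Long>> tablets = new TreeMap<>();

    void addTablet(String startRow) {
        tablets.put(startRow, new TreeMap<>());
    }

    void put(String row, long value) {
        // floorEntry picks the tablet whose start row is the greatest one <= row.
        tablets.floorEntry(row).getValue().put(row, value);
    }

    // Sums the values of rows in [startRow, endRow). Assumes startRow < endRow
    // and that a tablet starting at "" exists, so every row has a home.
    long sumRange(String startRow, String endRow) {
        // Resolve the row range to the tablets that overlap it.
        String firstTablet = tablets.floorKey(startRow);
        List<NavigableMap<String, Long>> targets =
            new ArrayList<>(tablets.subMap(firstTablet, true, endRow, false).values());
        ExecutorService pool = Executors.newFixedThreadPool(targets.size());
        try {
            // One task per tablet stands in for one parallel RPC.
            List<Future<Long>> calls = new ArrayList<>();
            for (NavigableMap<String, Long> tablet : targets) {
                calls.add(pool.submit(() -> {
                    long sum = 0;
                    for (long v : tablet.subMap(startRow, true, endRow, false).values()) {
                        sum += v;
                    }
                    return sum;
                }));
            }
            // Combine the per-tablet partial results on the client side.
            long total = 0;
            for (Future<Long> call : calls) {
                total += call.get();
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException("tablet call failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        RowRangeCallSketch client = new RowRangeCallSketch();
        client.addTablet("");   // first tablet: rows before "m"
        client.addTablet("m");  // second tablet: rows from "m" onward
        client.put("apple", 1);
        client.put("kiwi", 2);
        client.put("mango", 3);
        client.put("pear", 4);
        client.put("zebra", 5);
        // The range [b, q) spans both tablets: kiwi + mango + pear.
        System.out.println(client.sumRange("b", "q")); // prints 9
    }
}
```

The caller names only rows; which tablets are contacted, and how many parallel calls that takes, is decided inside {{sumRange}}.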
For HBase, adding a coprocessor framework will allow for pluggable, incremental
addition of functionality. There will be no more need to subclass the
regionserver interface and implementation classes and set
{{hbase.regionserver.class}} and {{hbase.regionserver.impl}} in hbase-site.xml.
That mechanism allows for extension, but only one extension at a time, to the
exclusion of all others.
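As a rough sketch of what "pluggable" could mean here, the Java below loads two independent extensions into one host and chains them, which a single replacement regionserver class cannot do. The names {{Coprocessor}}, {{RegionHost}}, and {{preGet}} are hypothetical and chosen for this example; they are not an existing HBase API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these types are not part of HBase.
final class PluggableExtensionsSketch {
    // Assumed extension point: a hook invoked before a row is read.
    interface Coprocessor {
        String preGet(String row);
    }

    // Assumed host: keeps every loaded extension and runs them in load order.
    static final class RegionHost {
        private final List<Coprocessor> loaded = new ArrayList<>();

        void load(Coprocessor coprocessor) {
            loaded.add(coprocessor);
        }

        String get(String row) {
            for (Coprocessor coprocessor : loaded) {
                row = coprocessor.preGet(row);
            }
            // Stand-in for the real read path.
            return "value-of:" + row;
        }
    }

    public static void main(String[] args) {
        RegionHost host = new RegionHost();
        host.load(row -> row.trim());        // first extension
        host.load(row -> row.toLowerCase()); // second extension, added independently
        System.out.println(host.get("  Row1 ")); // prints value-of:row1
    }
}
```

Each extension is added without knowing about the others, so functionality can grow incrementally instead of being fixed by one subclass.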
Also, HBASE-2001 currently has an in-process MapReduce framework for the
regionservers. Coprocessors can optionally implement a 'MapReduce' interface,
which clients will be able to invoke concurrently on all regions of the table.
Note that this is not MapReduce on the table; it is MapReduce on each region,
concurrently. One can implement MapReduce in a manner very similar to Hadoop's
MR framework, or use shared variables to avoid the overhead of generating (and
processing) many intermediate results. An initial application of this could be
support for rapid calculation of aggregates over data stored in HBase.
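The two styles mentioned above can be compared in a small Java sketch that sums values across regions. This is only an illustration under stated assumptions: regions are modeled as in-memory lists, a parallel stream stands in for concurrent invocation on each region, and {{PerRegionAggregateSketch}} and its method names are hypothetical, not the HBASE-2001 interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration only; not the HBASE-2001 'MapReduce' interface.
final class PerRegionAggregateSketch {
    // Hadoop-like style: each region emits one intermediate per cell, reduces
    // them to a partial sum, and the partial sums are then combined.
    static long sumWithIntermediates(List<List<Long>> regions) {
        return regions.parallelStream()
            .map(region -> {
                List<Long> intermediates = new ArrayList<>(region); // map output
                long partial = 0;
                for (long value : intermediates) {                  // per-region reduce
                    partial += value;
                }
                return partial;
            })
            .reduce(0L, Long::sum);                                 // combine regions
    }

    // Shared-variable style: every region adds straight into one thread-safe
    // accumulator, so no intermediate collection is built.
    static long sumWithSharedVariable(List<List<Long>> regions) {
        LongAdder total = new LongAdder();
        regions.parallelStream().forEach(region -> {
            for (long value : region) {
                total.add(value);
            }
        });
        return total.sum();
    }

    public static void main(String[] args) {
        List<List<Long>> regions = List.of(List.of(1L, 2L), List.of(3L), List.of(4L, 5L));
        System.out.println(sumWithIntermediates(regions));  // prints 15
        System.out.println(sumWithSharedVariable(regions)); // prints 15
    }
}
```

Both methods return the same aggregate; the second trades the generality of explicit intermediates for less allocation, which is the overhead the description refers to.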
was:
From Google's Jeff Dean, in a keynote to LADIS 2009
(http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009, slides 66 - 67):
BigTable Coprocessors (New Since OSDI'06)
* Arbitrary code that runs next to each tablet in a table
** As tablets split and move, coprocessor code automatically splits/moves
too
* High-level call interface for clients
** Unlike RPC, calls addressed to rows or ranges of rows
* coprocessor client library resolves to actual locations
** Calls across multiple rows automatically split into multiple
parallelized RPCs
* Very flexible model for building distributed services
** Automatic scaling, load balancing, request routing for apps
Example Coprocessor Uses
* Scalable metadata management for Colossus (next gen GFS-like file system)
* Distributed language model serving for machine translation system
* Distributed query processing for full-text indexing support
* Regular expression search support for code repository
> Coprocessors
> ------------
>
> Key: HBASE-2000
> URL: https://issues.apache.org/jira/browse/HBASE-2000
> Project: Hadoop HBase
> Issue Type: New Feature
> Reporter: Andrew Purtell
> Assignee: Andrew Purtell
> Fix For: 0.21.0