Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-06-13 Thread stack

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review210
---

Ship it!


+1

I did a quick skim.  All looks good.  This is a weird one in that to prewarm we 
are going to do a getClosestOrBefore (which we'll be doing anyway) and then 
we'll open scanner, and scan next 10 rows... then close scanner; i.e. extra 
rpcs inline w/ first lookup against tale.  This latter will actually slow down 
first lookup some but we can make it faster by making open scanner return 
results so we don't have to then go call next (already an issue to do this) and 
also, making it so we scan forward always by changing keys in .META. such that 
region names are keyed by endkey rather than startkey... also another issue to 
do this.

- stack


On 2010-06-13 00:49:50, Mingjie Lai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> ---
> 
> (Updated 2010-06-13 00:49:50)
> 
> 
> Review request for hbase, Todd Lipcon and stack.
> 
> 
> Summary
> ---
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and 
> region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the 
> meta for the table, serialize the meta to a file. MR job can ship a copy of 
> the meta for the table in the DistributedCache
> - Map deserializeRegionInfo(): MR job can 
> deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map regionMap): MR job can 
> prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some 
> number of rows in META. This option could be turned on and off for one 
> particular table. 
> 
> 
> This addresses bug HBASE-2468.
> http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
> 03cbf8d 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 8e47f2d 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java f11ada7 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 
> 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> ---
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>



Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-06-13 Thread Mingjie Lai

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/
---

(Updated 2010-06-13 00:49:50.977413)


Review request for hbase, Todd Lipcon and stack.


Changes
---

1) resolved conflicts from the latest code commits
2) added get/setRegionCachePrefetch() in HTable, removed 
isRegionCachePrefetchEnabled(), enableRegionCachePrefetch(), etc. 
3) in MetaScanner, added Math.min(rowLimit, configuration.getInt(...))


Summary
---

HBASE-2468: Improvements to prewarm META cache on clients.

Changes:
1. Add new HTable methods which support region info de/serialation, and region 
cache prewarm: 
- void serializeRegionInfo(): clients could perform a large scan for all the 
meta for the table, serialize the meta to a file. MR job can ship a copy of the 
meta for the table in the DistributedCache
- Map deserializeRegionInfo(): MR job can 
deserialize the region info from the DistributedCache 
- prewarmRegionCache(Map regionMap): MR job can 
prewarm local region cache by the deserialized region info.

2. For each client, each region cache read-miss could trigger read-ahead some 
number of rows in META. This option could be turned on and off for one 
particular table. 


This addresses bug HBASE-2468.
http://issues.apache.org/jira/browse/HBASE-2468


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 03cbf8d 
  src/main/java/org/apache/hadoop/hbase/client/HTable.java 8e47f2d 
  src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java f11ada7 
  src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 

Diff: http://review.hbase.org/r/98/diff


Testing
---

Unit tests passed locally for me. 


Thanks,

Mingjie



Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-06-11 Thread Jonathan Gray


> On 2010-06-10 08:53:41, Jonathan Gray wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HTable.java, line 1079
> > 
> >
> > I guess these are static because of how HTables all share a single HCM 
> > per conf.  The setting of prefetching is set at the HCM level not the 
> > HTable level, however clients are usually not exposed to HCM and only deal 
> > with HTable.
> > 
> > We should probably make it clear in the javadoc for these methods that 
> > they apply to all HTable instances, though that may be clear from being 
> > static.
> > 
> > Maybe since these are more advanced calls, they shouldn't be in HTable? 
> >  If we provide proper documentation, it should be easy enough for a user to 
> > grab the HCM and apply the config at that level?
> 
> Mingjie Lai wrote:
> > I guess these are static because of how HTables all share a single HCM 
> per conf...
> Yes. 
> 
> > Maybe since these are more advanced calls, they shouldn't be in HTable?
> Two alternatives:
> 1) HCM: as jgray said, ``however clients are usually not exposed to HCM 
> and only deal with HTable.''
> 2) HBaseAdmin: it is a more reasonable design choice since these 
> operation are at HCM level. 
> 3) or, make it a configuration. It would be one global configuration 
> applied to all tables, and cannot be changed dynamically. 
> 
> I like 2) better, but not really sure whether we want to expose it there 
> or not. 
> 
> What do you think?
> 
> Jonathan Gray wrote:
> Adding it to HBaseAdmin could make sense.  This one is a bit of an odd 
> one because it's a client-side configuration parameter done at the 
> per-client-jvm level.  Typically we have per-query or per-htable-instance 
> configs.  HBaseAdmin is generally made up of remote administration commands 
> not local client config.
> 
> If we provide sufficient javadoc (including as a class comment on HTable) 
> it doesn't matter so much where we put it.  Since it's distinct from what's 
> currently in HTable and HBaseAdmin, maybe it does make sense as a static in 
> HCM?
> 
> Todd Lipcon wrote:
> I think keeping HConnectionManager an internal interface is a good idea, 
> so kind of -0 there. -1 on HBaseAdmin, since we should keep that for 
> administrative functions that really change something on the cluster. So I'd 
> prefer HTable, but wouldn't cry over HCM.

Since we can knock it down to just two methods, get/set, let's just put it in 
HTable.

But it will be static, right, so people understand it's not per-instance.  
Let's also make sure there is javadoc that also explains that it is for all 
HTable instances for the tables you configure.


- Jonathan


---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review176
---


On 2010-06-09 15:50:59, Mingjie Lai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> ---
> 
> (Updated 2010-06-09 15:50:59)
> 
> 
> Review request for hbase, Todd Lipcon and stack.
> 
> 
> Summary
> ---
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and 
> region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the 
> meta for the table, serialize the meta to a file. MR job can ship a copy of 
> the meta for the table in the DistributedCache
> - Map deserializeRegionInfo(): MR job can 
> deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map regionMap): MR job can 
> prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some 
> number of rows in META. This option could be turned on and off for one 
> particular table. 
> 
> 
> This addresses bug HBASE-2468.
> http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
> ed18092 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 
> 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> ---
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>



Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-06-11 Thread Todd Lipcon


> On 2010-06-10 08:53:41, Jonathan Gray wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HTable.java, line 1079
> > 
> >
> > I guess these are static because of how HTables all share a single HCM 
> > per conf.  The setting of prefetching is set at the HCM level not the 
> > HTable level, however clients are usually not exposed to HCM and only deal 
> > with HTable.
> > 
> > We should probably make it clear in the javadoc for these methods that 
> > they apply to all HTable instances, though that may be clear from being 
> > static.
> > 
> > Maybe since these are more advanced calls, they shouldn't be in HTable? 
> >  If we provide proper documentation, it should be easy enough for a user to 
> > grab the HCM and apply the config at that level?
> 
> Mingjie Lai wrote:
> > I guess these are static because of how HTables all share a single HCM 
> per conf...
> Yes. 
> 
> > Maybe since these are more advanced calls, they shouldn't be in HTable?
> Two alternatives:
> 1) HCM: as jgray said, ``however clients are usually not exposed to HCM 
> and only deal with HTable.''
> 2) HBaseAdmin: it is a more reasonable design choice since these 
> operation are at HCM level. 
> 3) or, make it a configuration. It would be one global configuration 
> applied to all tables, and cannot be changed dynamically. 
> 
> I like 2) better, but not really sure whether we want to expose it there 
> or not. 
> 
> What do you think?
> 
> Jonathan Gray wrote:
> Adding it to HBaseAdmin could make sense.  This one is a bit of an odd 
> one because it's a client-side configuration parameter done at the 
> per-client-jvm level.  Typically we have per-query or per-htable-instance 
> configs.  HBaseAdmin is generally made up of remote administration commands 
> not local client config.
> 
> If we provide sufficient javadoc (including as a class comment on HTable) 
> it doesn't matter so much where we put it.  Since it's distinct from what's 
> currently in HTable and HBaseAdmin, maybe it does make sense as a static in 
> HCM?

I think keeping HConnectionManager an internal interface is a good idea, so 
kind of -0 there. -1 on HBaseAdmin, since we should keep that for 
administrative functions that really change something on the cluster. So I'd 
prefer HTable, but wouldn't cry over HCM.


- Todd


---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review176
---


On 2010-06-09 15:50:59, Mingjie Lai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> ---
> 
> (Updated 2010-06-09 15:50:59)
> 
> 
> Review request for hbase, Todd Lipcon and stack.
> 
> 
> Summary
> ---
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and 
> region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the 
> meta for the table, serialize the meta to a file. MR job can ship a copy of 
> the meta for the table in the DistributedCache
> - Map deserializeRegionInfo(): MR job can 
> deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map regionMap): MR job can 
> prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some 
> number of rows in META. This option could be turned on and off for one 
> particular table. 
> 
> 
> This addresses bug HBASE-2468.
> http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
> ed18092 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 
> 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> ---
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>



Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-06-11 Thread Jonathan Gray


> On 2010-06-10 08:53:41, Jonathan Gray wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HTable.java, line 1079
> > 
> >
> > I guess these are static because of how HTables all share a single HCM 
> > per conf.  The setting of prefetching is set at the HCM level not the 
> > HTable level, however clients are usually not exposed to HCM and only deal 
> > with HTable.
> > 
> > We should probably make it clear in the javadoc for these methods that 
> > they apply to all HTable instances, though that may be clear from being 
> > static.
> > 
> > Maybe since these are more advanced calls, they shouldn't be in HTable? 
> >  If we provide proper documentation, it should be easy enough for a user to 
> > grab the HCM and apply the config at that level?
> 
> Mingjie Lai wrote:
> > I guess these are static because of how HTables all share a single HCM 
> per conf...
> Yes. 
> 
> > Maybe since these are more advanced calls, they shouldn't be in HTable?
> Two alternatives:
> 1) HCM: as jgray said, ``however clients are usually not exposed to HCM 
> and only deal with HTable.''
> 2) HBaseAdmin: it is a more reasonable design choice since these 
> operation are at HCM level. 
> 3) or, make it a configuration. It would be one global configuration 
> applied to all tables, and cannot be changed dynamically. 
> 
> I like 2) better, but not really sure whether we want to expose it there 
> or not. 
> 
> What do you think?

Adding it to HBaseAdmin could make sense.  This one is a bit of an odd one 
because it's a client-side configuration parameter done at the per-client-jvm 
level.  Typically we have per-query or per-htable-instance configs.  HBaseAdmin 
is generally made up of remote administration commands not local client config.

If we provide sufficient javadoc (including as a class comment on HTable) it 
doesn't matter so much where we put it.  Since it's distinct from what's 
currently in HTable and HBaseAdmin, maybe it does make sense as a static in HCM?


- Jonathan


---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review176
---


On 2010-06-09 15:50:59, Mingjie Lai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> ---
> 
> (Updated 2010-06-09 15:50:59)
> 
> 
> Review request for hbase, Todd Lipcon and stack.
> 
> 
> Summary
> ---
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and 
> region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the 
> meta for the table, serialize the meta to a file. MR job can ship a copy of 
> the meta for the table in the DistributedCache
> - Map deserializeRegionInfo(): MR job can 
> deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map regionMap): MR job can 
> prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some 
> number of rows in META. This option could be turned on and off for one 
> particular table. 
> 
> 
> This addresses bug HBASE-2468.
> http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
> ed18092 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 
> 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> ---
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>



Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-06-11 Thread Mingjie Lai


> On 2010-06-10 08:53:41, Jonathan Gray wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HConnection.java, line 235
> > 
> >
> > +1 on collapsing with a boolean.  setRegionCachePrefetch(table, 
> > enable)?  Seems self descriptive and with a couple lines of javadoc should 
> > be clear.

yes sir. will modify to use set...() and get...(). 


> On 2010-06-10 08:53:41, Jonathan Gray wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HTable.java, line 1079
> > 
> >
> > I guess these are static because of how HTables all share a single HCM 
> > per conf.  The setting of prefetching is set at the HCM level not the 
> > HTable level, however clients are usually not exposed to HCM and only deal 
> > with HTable.
> > 
> > We should probably make it clear in the javadoc for these methods that 
> > they apply to all HTable instances, though that may be clear from being 
> > static.
> > 
> > Maybe since these are more advanced calls, they shouldn't be in HTable? 
> >  If we provide proper documentation, it should be easy enough for a user to 
> > grab the HCM and apply the config at that level?

> I guess these are static because of how HTables all share a single HCM per 
> conf...
Yes. 

> Maybe since these are more advanced calls, they shouldn't be in HTable?
Two alternatives:
1) HCM: as jgray said, ``however clients are usually not exposed to HCM and 
only deal with HTable.''
2) HBaseAdmin: it is a more reasonable design choice since these operation are 
at HCM level. 
3) or, make it a configuration. It would be one global configuration applied to 
all tables, and cannot be changed dynamically. 

I like 2) better, but not really sure whether we want to expose it there or 
not. 

What do you think? 


- Mingjie


---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review176
---


On 2010-06-09 15:50:59, Mingjie Lai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> ---
> 
> (Updated 2010-06-09 15:50:59)
> 
> 
> Review request for hbase, Todd Lipcon and stack.
> 
> 
> Summary
> ---
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and 
> region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the 
> meta for the table, serialize the meta to a file. MR job can ship a copy of 
> the meta for the table in the DistributedCache
> - Map deserializeRegionInfo(): MR job can 
> deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map regionMap): MR job can 
> prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some 
> number of rows in META. This option could be turned on and off for one 
> particular table. 
> 
> 
> This addresses bug HBASE-2468.
> http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
> ed18092 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 
> 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> ---
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>



Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-06-10 Thread Todd Lipcon


> On 2010-06-10 08:53:41, Jonathan Gray wrote:
> > src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java, line 109
> > 
> >
> > Hmm, wouldn't that be Math.min then?  If rowLimit = 5 and 
> > scanner.caching = 100, we want to only do 5?

woops, yes :)


- Todd


---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review176
---


On 2010-06-09 15:50:59, Mingjie Lai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> ---
> 
> (Updated 2010-06-09 15:50:59)
> 
> 
> Review request for hbase, Todd Lipcon and stack.
> 
> 
> Summary
> ---
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and 
> region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the 
> meta for the table, serialize the meta to a file. MR job can ship a copy of 
> the meta for the table in the DistributedCache
> - Map deserializeRegionInfo(): MR job can 
> deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map regionMap): MR job can 
> prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some 
> number of rows in META. This option could be turned on and off for one 
> particular table. 
> 
> 
> This addresses bug HBASE-2468.
> http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
> ed18092 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 
> 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> ---
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>



Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-06-10 Thread Jonathan Gray

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review176
---


This looks great!  I think we should think more about where to expose this, and 
also think about using more of a get/set API to reduce the method calls and 
make it look more like the other client APIs.


src/main/java/org/apache/hadoop/hbase/client/HConnection.java


+1 on collapsing with a boolean.  setRegionCachePrefetch(table, enable)?  
Seems self descriptive and with a couple lines of javadoc should be clear.



src/main/java/org/apache/hadoop/hbase/client/HTable.java


I guess these are static because of how HTables all share a single HCM per 
conf.  The setting of prefetching is set at the HCM level not the HTable level, 
however clients are usually not exposed to HCM and only deal with HTable.

We should probably make it clear in the javadoc for these methods that they 
apply to all HTable instances, though that may be clear from being static.

Maybe since these are more advanced calls, they shouldn't be in HTable?  If 
we provide proper documentation, it should be easy enough for a user to grab 
the HCM and apply the config at that level?



src/main/java/org/apache/hadoop/hbase/client/HTable.java


Should we also make these methods (if we even leave it in HTable) more like 
setRegionCachePrefetch / getRegionCachePrefetch?  HTable is the core client 
class so the less noise we add the better.  We have other methods in client 
APIs of the get/set form like Scan.setCacheBlocks and Put.setWriteToWAL



src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java


Hmm, wouldn't that be Math.min then?  If rowLimit = 5 and scanner.caching = 
100, we want to only do 5?


- Jonathan


On 2010-06-09 15:50:59, Mingjie Lai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> ---
> 
> (Updated 2010-06-09 15:50:59)
> 
> 
> Review request for hbase, Todd Lipcon and stack.
> 
> 
> Summary
> ---
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and 
> region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the 
> meta for the table, serialize the meta to a file. MR job can ship a copy of 
> the meta for the table in the DistributedCache
> - Map deserializeRegionInfo(): MR job can 
> deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map regionMap): MR job can 
> prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some 
> number of rows in META. This option could be turned on and off for one 
> particular table. 
> 
> 
> This addresses bug HBASE-2468.
> http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
> ed18092 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 
> 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> ---
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>



Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-06-09 Thread Todd Lipcon

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review165
---


Looking good! Just a few notes.


src/main/java/org/apache/hadoop/hbase/client/HConnection.java


I thought we were collapsing these two calls into 
setRegionCachePrefetchEnabled(tableName, enabled)?



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


I don't entirely understand why we key these hashes by integer, but it 
seems like you're following the status quo, so doesn't need to be addressed in 
this patch.



src/main/java/org/apache/hadoop/hbase/client/HTable.java


I still don't quite understand the logic about why these should be static. 
Previously you pointed to the enable/disable calls, but those are more like 
admin calls, not calls that affect client behavior. Anyone else have an opinion?



src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java


I think this should be Math.max(rowLimit, configuration.getInt(...)) - if 
we only want to scan 5 rows, we don't want the scanner to prefetch 100 for us.


- Todd


On 2010-06-09 15:50:59, Mingjie Lai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> ---
> 
> (Updated 2010-06-09 15:50:59)
> 
> 
> Review request for hbase, Todd Lipcon and stack.
> 
> 
> Summary
> ---
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and 
> region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the 
> meta for the table, serialize the meta to a file. MR job can ship a copy of 
> the meta for the table in the DistributedCache
> - Map deserializeRegionInfo(): MR job can 
> deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map regionMap): MR job can 
> prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some 
> number of rows in META. This option could be turned on and off for one 
> particular table. 
> 
> 
> This addresses bug HBASE-2468.
> http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
> ed18092 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 
> 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> ---
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>



Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-06-09 Thread Mingjie Lai

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/
---

(Updated 2010-06-09 15:50:59.084657)


Review request for hbase, Todd Lipcon and stack.


Changes
---

@St^ack: please see my comments for your feedback regarding getRowOrBefore() 
issue. 

Invite Todd as a reviewer. 


Summary
---

HBASE-2468: Improvements to prewarm META cache on clients.

Changes:
1. Add new HTable methods which support region info de/serialation, and region 
cache prewarm: 
- void serializeRegionInfo(): clients could perform a large scan for all the 
meta for the table, serialize the meta to a file. MR job can ship a copy of the 
meta for the table in the DistributedCache
- Map deserializeRegionInfo(): MR job can 
deserialize the region info from the DistributedCache 
- prewarmRegionCache(Map regionMap): MR job can 
prewarm local region cache by the deserialized region info.

2. For each client, each region cache read-miss could trigger read-ahead some 
number of rows in META. This option could be turned on and off for one 
particular table. 


This addresses bug HBASE-2468.
http://issues.apache.org/jira/browse/HBASE-2468


Diffs
-

  src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java ed18092 
  src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
  src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
  src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 

Diff: http://review.hbase.org/r/98/diff


Testing
---

Unit tests passed locally for me. 


Thanks,

Mingjie



Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-06-09 Thread Mingjie Lai


> On 2010-06-07 14:23:42, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java, line 96
> > 
> >
> > getRowOrBefore is an expensive call.  Are we sure we are not calling 
> > this too often?

I agree it is an expensive call. 

However I don't think it would bring any performance penalty for existing and 
potential use cases:
Use case 1 -- existing MetaScanner users: since this method is newly added, 
existing users won't be affected; 
Use case 2 -- hbase clients when locating a region :  
1) if prefetch is on, it calls this MetaScanner with [table + row combination], 
which calls getRowOrBefore() to get current region info, then number of 
following regions from meta. After that, the client can get the region info 
directly from cache. 
2) if prefetch is disabled (current behavior), it eventually calls similar 
method getClosestRowBefore() to get desired region. 

So no matter prefetch is on or not, getRowOrBefore(or getClosestRowBefore) 
eventually is called. The only difference is whether to scan following regions 
from meta or not. 

For future MetaScanner users which scan from one region with desired use table 
row, it has to take the effort since it is the expected behavior. 


- Mingjie


---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review144
---


On 2010-06-02 01:13:42, Mingjie Lai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> ---
> 
> (Updated 2010-06-02 01:13:42)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> ---
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and 
> region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the 
> meta for the table, serialize the meta to a file. MR job can ship a copy of 
> the meta for the table in the DistributedCache
> - Map deserializeRegionInfo(): MR job can 
> deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map regionMap): MR job can 
> prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some 
> number of rows in META. This option could be turned on and off for one 
> particular table. 
> 
> 
> This addresses bug HBASE-2468.
> http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
> ed18092 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 
> 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> ---
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>



Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-06-08 Thread Paul Cowan

On 09/06/10 15:56, tsuna wrote:

Oh, I see.  But that's because the regionCachePrefetchEnabled method
was unnecessarily changing the Map.  If the "read-only" methods like
this one weren't modifying any data structure (and they don't need
to), this problem wouldn't exist.


Yep, exactly. None of the other 3 ways need additional synchronization, 
it's just the one specific way it was implemented (hence my 'won't work, 
as it's written' originally).



Also even though ConcurrentHashSet doesn't exist, it can be
implemented by storing some random, unique Object instance as the
value.  I think that's how HashSet is implemented:


Yeah, it's very easily implemented. In fact, there's a utility method to 
do this easily (which I only just learned about a little while ago, 
after sending the original email -- there's always something new to learn!):


  Collections.newSetFromMap(new ConcurrentHashMap());

which would work correctly in this instance. As I said though, if it's a 
write-rarely, read-frequently, low-number-of-entries scenario then 
CopyOnWriteArraySet will likely work just as well, if not better.


Cheers,

Paul



Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-06-08 Thread tsuna
On Tue, Jun 8, 2010 at 5:59 PM, Paul Cowan  wrote:
> No problem. With that change, there would be a potential race condition:
>
> Thread 1                                Thread 2
> --                              -
> enter regionCachePrefetchEnabled
> call ConcurrentMap.get() - get null
>                                        enter disableRegionCachePrefetch
>                                        ConcurrentMap.put(false);
>                                        exit disableRegionCachePrefetch
> ConcurrentMap.put(true)
> exit regionCachePrefetchEnabled
>
>
> which means that after one (nominally 'read-only') call to
> regionCachePrefetchEnabled and one call to disableRegionCachePrefetch have
> completed, the value in the cache for the table is true, when it really
> should be false.

Oh, I see.  But that's because the regionCachePrefetchEnabled method
was unnecessarily changing the Map.  If the "read-only" methods like
this one weren't modifying any data structure (and they don't need
to), this problem wouldn't exist.

Also even though ConcurrentHashSet doesn't exist, it can be
implemented by storing some random, unique Object instance as the
value.  I think that's how HashSet is implemented:

public class HashSet [...] {
[...]
private transient HashMap map;

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();

[...]
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}



-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-06-08 Thread Paul Cowan

Hi tsuna,


Changing this class to use ConcurrentHashMap, and completely removing
synchronization, won't work, as it's written. The lazy-loading in
regionCachePreFetchEnabled() (get - if null, put) won't work correctly with
a ConcurrentHashMap without external synchronization.


I'm not sure to understand why, can you explain?


No problem. With that change, there would be a potential race condition:

Thread 1Thread 2
--  -
enter regionCachePrefetchEnabled
call ConcurrentMap.get() - get null
enter disableRegionCachePrefetch
ConcurrentMap.put(false);
exit disableRegionCachePrefetch
ConcurrentMap.put(true)
exit regionCachePrefetchEnabled


which means that after one (nominally 'read-only') call to 
regionCachePrefetchEnabled and one call to disableRegionCachePrefetch 
have completed, the value in the cache for the table is true, when it 
really should be false.


The get-nullcheck-put triad really needs to be atomic, which requires 
either external synchronization or use of some of the other features of 
ConcurrentHashMap like below.



Slide 31 of the aforementioned presentation reads "Calls putIfAbsent
every time it needs to read a value" "Unfortunately, this usage is
very common" so I think this is not a good idea.

I'm still fairly new to Java so I tend to trust what people like
Joshua Bloch write, but I can be convinced otherwise :)


That's purely for performance reasons -- you're right, I should have 
suggested

   get, if (result == null) {putIfAbsent, get}
rather than
   putIfAbsent, get

Both are correct but as Josh talks about on that slide (and the 
subsequent one) it's needless contention. I would be almost certain that 
it's still faster than a synchronized block, but not as much as other 
options. Should have mentioned that.


Of the 4 options I mentioned (putIfAbsent(), get() and default return 
value if missing, CopyOnWriteArraySet, ConcurrentHashSet), the 
putIfAbsent() method is likely the slowest (given the usage pattern 
anyway), and (more importantly, in my opinion) the least clear anyway. I 
think the ConcurrentSet solution is much cleaner.


Hope that makes thing clear.

Cheers,

Paul


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-06-08 Thread tsuna
On Fri, May 28, 2010 at 8:21 PM, Paul Cowan  wrote:
> Changing this class to use ConcurrentHashMap, and completely removing
> synchronization, won't work, as it's written. The lazy-loading in
> regionCachePreFetchEnabled() (get - if null, put) won't work correctly with
> a ConcurrentHashMap without external synchronization.

I'm not sure to understand why, can you explain?

> 1) Replace with a ConcurrentHashMap, and change regionCachePreFetchEnabled()
> to do something like:
>       prefetchRegionCacheEnabled.putIfAbsent(tableName, true);
>       return prefetchRegionCacheEnabled.get(tableName);

Slide 31 of the aforementioned presentation reads "Calls putIfAbsent
every time it needs to read a value" "Unfortunately, this usage is
very common" so I think this is not a good idea.

I'm still fairly new to Java so I tend to trust what people like
Joshua Bloch write, but I can be convinced otherwise :)

-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-06-07 Thread stack

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review144
---

Ship it!


I think this good to go.  Seem my comments below.  See what you think.  My one 
concern is the number of calls to getRowOrBefore... hopefully this patch cuts 
down overall on our need to use this function.  I'd like to hear your opinion 
on that.


src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


This is code duplicated from elsewhere.  Can I help make it so we don't 
have to do this duplication?  Or, for now, since this your fist patch, we can 
put it off IF you file a JIRA to fix the duplication (smile).



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


So we start scanning at 'row'?  Is this the 'row' the user asked for? No, 
it needs to be the row in the .META. table, right?  We need to find the row in 
.META. that contains the asked for row first?  NM, I see below how the row here 
is made.. .this looks right.



src/main/java/org/apache/hadoop/hbase/client/HTable.java


This is a nice little facility.



src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java


OK.  This looks right.



src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java


getRowOrBefore is an expensive call.  Are we sure we are not calling this 
too often?


- stack


On 2010-06-02 01:13:42, Mingjie Lai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> ---
> 
> (Updated 2010-06-02 01:13:42)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> ---
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and 
> region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the 
> meta for the table, serialize the meta to a file. MR job can ship a copy of 
> the meta for the table in the DistributedCache
> - Map deserializeRegionInfo(): MR job can 
> deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map regionMap): MR job can 
> prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some 
> number of rows in META. This option could be turned on and off for one 
> particular table. 
> 
> 
> This addresses bug HBASE-2468.
> http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
> ed18092 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 
> 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> ---
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>



Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-06-02 Thread Mingjie Lai

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/
---

(Updated 2010-06-02 01:13:42.272750)


Review request for hbase.


Changes
---

1. Fix some issues pointed out by Todd.
2. Re-do MetaScanner so it will start scanning Meta from the region where the 
given user table row resides, instead of scanning from the next region row in 
Meta.


Summary
---

HBASE-2468: Improvements to prewarm META cache on clients.

Changes:
1. Add new HTable methods which support region info de/serialation, and region 
cache prewarm: 
- void serializeRegionInfo(): clients could perform a large scan for all the 
meta for the table, serialize the meta to a file. MR job can ship a copy of the 
meta for the table in the DistributedCache
- Map deserializeRegionInfo(): MR job can 
deserialize the region info from the DistributedCache 
- prewarmRegionCache(Map regionMap): MR job can 
prewarm local region cache by the deserialized region info.

2. For each client, each region cache read-miss could trigger read-ahead some 
number of rows in META. This option could be turned on and off for one 
particular table. 


This addresses bug HBASE-2468.
http://issues.apache.org/jira/browse/HBASE-2468


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java ed18092 
  src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
  src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
  src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 

Diff: http://review.hbase.org/r/98/diff


Testing
---

Unit tests passed locally for me. 


Thanks,

Mingjie



Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-06-02 Thread Mingjie Lai


> On 2010-05-31 21:52:12, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HTable.java, line 1075
> > 
> >
> > I feel like it would be cleaner to have the following methods be 
> > non-static, so they don't need any arguments. that would reduce the number 
> > of functions by factor of two

That was what I wanted to do in the very beginning. I don't like so many 
functions either.  

But there is one existing method: 
  public static boolean isTableEnabled(String tableName) ...
  public static boolean isTableEnabled(Configuration conf, String tableName) ...

It's one of the reason that I just want to keep the original coding style to be 
consistent. 

In addition, I agree we can make isRegionCachePrefetchEnabled() etc. to be 
non-static. However, enableRegionCachePrefetch(), and 
disableRegionCachePrefetch() must to be static since we want to enable/disable 
a table without instantiate HTable. In another word, we cannot really 
dis/enable cache prefetch for each HTable instance who have the same table 
name. While we can only enable/disable based on table name. It is pretty 
similar as table enable/disable. 


> On 2010-05-31 21:52:12, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, line 
> > 1608
> > 
> >
> > kill this function

I don't like it either. I will kill all ``is...Disabled()'' methods. 


> On 2010-05-31 21:52:12, Todd Lipcon wrote:
> > src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java, line 
> > 3639
> > 
> >
> > I really like these unit tests, but I think you should use a row key 
> > for the Get that isn't also a region start key, as it may expose different 
> > behavior. eg I think you will end up with 11 cached regions instead of 10

As mentioned above, I reimplemented MetaScanner so it will start scanning Meta 
from the region row where the given table row resides, instead of scanning from 
the next region row in Meta. HTable.getRowOrBefore() is called in MetaScanner 
to achieve it, (however I'm not very sure whether it is the most efficient way 
to do it or not). 

So for this unit test, no matter the passed row is a region start key or not, 
we will always get 10 here. 


> On 2010-05-31 21:52:12, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, lines 
> > 776-781
> > 
> >
> > this block should go after the cache check below, right?

I reimplemented MetaScanner so it will start scanning Meta from the region 
where the given user table row resides, instead of scanning from the next 
region row in Meta.

In this case, prefetchRegionCache() also fetches the queried table+row to the 
cache. Here I put it before the cache check block, so it can load the result 
from cache directly. Otherwise it may do an extra meta scan if cache prefetch 
is enabled. 

However if multiple threads accessing this block concurrently, this way will 
cause cache-prefetch executed twice. But I think this case is pretty rare, so 
the penalty should be relatively smaller. 

What do you think? 


- Mingjie


---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review107
---


On 2010-05-31 19:49:20, Mingjie Lai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> ---
> 
> (Updated 2010-05-31 19:49:20)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> ---
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and 
> region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the 
> meta for the table, serialize the meta to a file. MR job can ship a copy of 
> the meta for the table in the DistributedCache
> - Map deserializeRegionInfo(): MR job can 
> deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map regionMap): MR job can 
> prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some 
> number of rows in META. This option could be turned on and off for one 
> particular table. 
> 
> 
> This addresses bug HBASE-2468.
> http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
> 09de2ac 
>   src/main/

Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-05-31 Thread Todd Lipcon

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review107
---



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


typo: locationS



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


I think any of getCachedRegionCount, countCachedRegions, or 
getNumCachedRegions would be a clearer name.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


style-wise, why "final" here?



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


remove these @param javadocs - they just take up space if the param names 
are self-explanatory (which these are)

same thing above



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


this needs a Comparator argument, otherwise it does object identity rather 
than bytewise comparison of the table names



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


javadoc out of date - it prefetches the region for the given row, and 
prefetchLimit regions ahead



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


should return false to stop scanning at this point, right?



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


// cache this meta entry

(it's not caching all)



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


this block should go after the cache check below, right?



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


this function needs to synchronized on cachedRegionLocations which is an 
unsynchronized HashMap



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


return location != null;



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


kill this function



src/main/java/org/apache/hadoop/hbase/client/HTable.java


you should prefix the file with writeInt(allRegions.size()) so you don't 
have to use EOFException catching below



src/main/java/org/apache/hadoop/hbase/client/HTable.java


yuck, see above



src/main/java/org/apache/hadoop/hbase/client/HTable.java


I feel like it would be cleaner to have the following methods be 
non-static, so they don't need any arguments. that would reduce the number of 
functions by factor of two



src/main/java/org/apache/hadoop/hbase/client/HTable.java


incorrect javadoc, also a few places below



src/main/java/org/apache/hadoop/hbase/client/HTable.java


why do we need a separate function for enabled and disabled? aren't they 
always inverse of each other?



src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java


should make clear this is the row name in the user table, not the meta 
table.

should also be clarified that it will start scanning with the region 
*after* row, because it doesn't use getClosestRowBefore



src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java


useless @throws



src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java


you should use util.getTestDir here



src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java


import java.io.File



src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java


I really like these unit tests, but I think you should use a row key for 
the Get that isn't also a region start key, as it may expose different 
behavior. eg I think you will end up with 11 cached regions instead of 10


- Todd


On 2010-05-31 19:49:20, Mingjie Lai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> ---
> 
> (Updated 2010-05-31 19:49:20)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> ---
> 
> HBASE-2468: Improvements to prewarm META cache on 

Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-05-31 Thread Mingjie Lai

Paul.

> [Commenting on-list, not sure if I should just sign up + put these
> comments direct on reviews?]

Yes. You're supposed to sign in and write comment directly to 
http://review.hbase.org


> So assuming that
> disabling/enabling prefetch for a region is rare (both in terms of
> frequency, and in terms of a small number of regions being disabled),
> the best performance will probably come from something like
> private final Set prefetchDisabledRegions
> = new CopyOnWriteArraySet()

Excellent suggestion. It's the exact use case. I will follow your 
suggestion and use CopyOnWriteArraySet instead of HashMap.


Thanks,
Mingjie

On 05/28/2010 08:21 PM, Paul Cowan wrote:

[Commenting on-list, not sure if I should just sign up + put these
comments direct on reviews?]

On 29/05/2010 7:59 AM, Ryan Rawson wrote:
 > Belay that order, I need to inspect this code review... unfortunately
a concurrent collection is not the answer to all multi threading woes.

Changing this class to use ConcurrentHashMap, and completely removing
synchronization, won't work, as it's written. The lazy-loading in
regionCachePreFetchEnabled() (get - if null, put) won't work correctly
with a ConcurrentHashMap without external synchronization.

As it is, the class is thread-safe in the way it accesses the HashMap.
To remove the synchronization (if desired), you could:

1) Replace with a ConcurrentHashMap, and change
regionCachePreFetchEnabled() to do something like:
prefetchRegionCacheEnabled.putIfAbsent(tableName, true);
return prefetchRegionCacheEnabled.get(tableName);
Alternatively, remove the 'true' -- that's the default, right? So there
isn't actually any reason to put true values in the map if there's no
value otherwise. Then just use:
Boolean result = this.prefetchRegionCacheEnabled.get(tableName);
return result == null ? Boolean.TRUE : result;
(and it's simpler, right?)


2) Again, given that 'true' is the default value, there's probably
actually no reason to use a Map at all. All you want to know is which
ones have specifically had fetching disabled, right? So assuming that
disabling/enabling prefetch for a region is rare (both in terms of
frequency, and in terms of a small number of regions being disabled),
the best performance will probably come from something like
private final Set prefetchDisabledRegions
= new CopyOnWriteArraySet()
then disable is just:
prefetchDisabledRegions.add(name);
enable is:
prefetchDisabledRegions.remove(name);
and isEnabled is just
prefetchDisabledRegions.contains(name);
which is probably going to be quicker if disabling is done only for a
few regions (so linear walking of the underlying array is quicker than
hashing + probing) -- and remember, scans of CopyOnWriteArraySet are
completely lock-free. Also, it's cleaner IMHO...


3) Use the solution from #2 with a good ConcurrentHashSet (there isn't
one in the JDK, unfortunately, but maybe you have one available), which
is more useful if large numbers of regions will be disabled or if
regions will be toggled on/off frequently.

Cheers,

Paul

PS: Minor suggestion: this class is inconsistent with variable/method
names as to whether 'prefetch' is one camel-cased word (prefetch) or two
(preFetch). Just thought I'd mention it!



Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-05-31 Thread Mingjie Lai

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/
---

(Updated 2010-05-31 19:49:20.012863)


Review request for hbase.


Changes
---

Improved to address comments from tsuna, Paul Crown, and jdcryans. 


Summary
---

HBASE-2468: Improvements to prewarm META cache on clients.

Changes:
1. Add new HTable methods which support region info de/serialation, and region 
cache prewarm: 
- void serializeRegionInfo(): clients could perform a large scan for all the 
meta for the table, serialize the meta to a file. MR job can ship a copy of the 
meta for the table in the DistributedCache
- Map deserializeRegionInfo(): MR job can 
deserialize the region info from the DistributedCache 
- prewarmRegionCache(Map regionMap): MR job can 
prewarm local region cache by the deserialized region info.

2. For each client, each region cache read-miss could trigger read-ahead some 
number of rows in META. This option could be turned on and off for one 
particular table. 


This addresses bug HBASE-2468.
http://issues.apache.org/jira/browse/HBASE-2468


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 09de2ac 
  src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
  src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 3de661e 
  src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 

Diff: http://review.hbase.org/r/98/diff


Testing
---

Unit tests passed locally for me. 


Thanks,

Mingjie



Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-05-31 Thread Mingjie Lai


> On 2010-05-28 15:05:08, Ryan Rawson wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, line 
> > 1604
> > 
> >
> > What does this method actually do? If its a predicate start it with 
> > 'is'.  Also precaching should be off by default...
> >  Is it?

Will add ``is'' prefix to this method. 

Pre-caching (in case of a cache miss) is set to ``on'' by default right now, 
according to stack:

``I'm for default behavior being grabbing more than just the one row – ten or 
something. Regards full-table scan as default, I think it should be an option 
(Its a nice option to have). ''

https://issues.apache.org/jira/browse/HBASE-2468?focusedCommentId=12870321&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12870321


- Mingjie


---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review95
---


On 2010-05-28 13:20:33, Mingjie Lai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> ---
> 
> (Updated 2010-05-28 13:20:33)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> ---
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and 
> region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the 
> meta for the table, serialize the meta to a file. MR job can ship a copy of 
> the meta for the table in the DistributedCache
> - Map deserializeRegionInfo(): MR job can 
> deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map regionMap): MR job can 
> prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some 
> number of rows in META. This option could be turned on and off for one 
> particular table. 
> 
> 
> This addresses bug HBASE-2468.
> http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
> 09de2ac 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 3de661e 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 
> 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> ---
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>



Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-05-28 Thread Paul Cowan
[Commenting on-list, not sure if I should just sign up + put these 
comments direct on reviews?]


On 29/05/2010 7:59 AM, Ryan Rawson wrote:
> Belay that order, I need to inspect this code review... unfortunately 
a concurrent collection is not the answer to all multi threading woes.


Changing this class to use ConcurrentHashMap, and completely removing 
synchronization, won't work, as it's written. The lazy-loading in 
regionCachePreFetchEnabled() (get - if null, put) won't work correctly 
with a ConcurrentHashMap without external synchronization.


As it is, the class is thread-safe in the way it accesses the HashMap. 
To remove the synchronization (if desired), you could:


1) Replace with a ConcurrentHashMap, and change 
regionCachePreFetchEnabled() to do something like:

   prefetchRegionCacheEnabled.putIfAbsent(tableName, true);
   return prefetchRegionCacheEnabled.get(tableName);
Alternatively, remove the 'true' -- that's the default, right? So there 
isn't actually any reason to put true values in the map if there's no 
value otherwise. Then just use:

   Boolean result = this.prefetchRegionCacheEnabled.get(tableName);
   return result == null ? Boolean.TRUE : result;
(and it's simpler, right?)


2) Again, given that 'true' is the default value, there's probably 
actually no reason to use a Map at all. All you want to know is which 
ones have specifically had fetching disabled, right? So assuming that 
disabling/enabling prefetch for a region is rare (both in terms of 
frequency, and in terms of a small number of regions being disabled), 
the best performance will probably come from something like

  private final Set prefetchDisabledRegions
= new CopyOnWriteArraySet()
then disable is just:
  prefetchDisabledRegions.add(name);
enable is:
  prefetchDisabledRegions.remove(name);
and isEnabled is just
  prefetchDisabledRegions.contains(name);
which is probably going to be quicker if disabling is done only for a 
few regions (so linear walking of the underlying array is quicker than 
hashing + probing) -- and remember, scans of CopyOnWriteArraySet are 
completely lock-free. Also, it's cleaner IMHO...



3) Use the solution from #2 with a good ConcurrentHashSet (there isn't 
one in the JDK, unfortunately, but maybe you have one available), which 
is more useful if large numbers of regions will be disabled or if 
regions will be toggled on/off frequently.


Cheers,

Paul

PS: Minor suggestion: this class is inconsistent with variable/method 
names as to whether 'prefetch' is one camel-cased word (prefetch) or two 
(preFetch). Just thought I'd mention it!


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-05-28 Thread Ryan Rawson

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review95
---



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


What does this method actually do? If its a predicate start it with 'is'.  
Also precaching should be off by default...
 Is it?


- Ryan


On 2010-05-28 13:20:33, Mingjie Lai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> ---
> 
> (Updated 2010-05-28 13:20:33)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> ---
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and 
> region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the 
> meta for the table, serialize the meta to a file. MR job can ship a copy of 
> the meta for the table in the DistributedCache
> - Map deserializeRegionInfo(): MR job can 
> deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map regionMap): MR job can 
> prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some 
> number of rows in META. This option could be turned on and off for one 
> particular table. 
> 
> 
> This addresses bug HBASE-2468.
> http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
> 09de2ac 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 3de661e 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 
> 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> ---
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>



Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-05-28 Thread Ryan Rawson


> On 2010-05-28 14:13:14, Benoit Sigoure wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, line 
> > 306
> > 
> >
> > Please use a concurrent collection and remove the synchronized blocks.  
> > For guidance, see around slide 30 of this presentation:
> > http://www.cs.umd.edu/class/fall2009/cmsc132H/slides/still-effective.pdf

Belay that order, I need to inspect this code review... unfortunately a 
concurrent collection is not the answer to all multi threading woes.


> On 2010-05-28 14:13:14, Benoit Sigoure wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, line 
> > 1594
> > 
> >
> > Use `Boolean.TRUE' instead of `new Boolean(true)'.

Please use the autoboxing features of java6 (our target platform). Ie: just 
'true' and 'false' for this and all others.


- Ryan


---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review93
---


On 2010-05-28 13:20:33, Mingjie Lai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> ---
> 
> (Updated 2010-05-28 13:20:33)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> ---
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and 
> region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the 
> meta for the table, serialize the meta to a file. MR job can ship a copy of 
> the meta for the table in the DistributedCache
> - Map deserializeRegionInfo(): MR job can 
> deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map regionMap): MR job can 
> prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some 
> number of rows in META. This option could be turned on and off for one 
> particular table. 
> 
> 
> This addresses bug HBASE-2468.
> http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
> 09de2ac 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 3de661e 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 
> 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> ---
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>



Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-05-28 Thread Benoit Sigoure

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review93
---



src/main/java/org/apache/hadoop/hbase/client/HConnection.java


Properly document this argument.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


Please use a concurrent collection and remove the synchronized blocks.  For 
guidance, see around slide 30 of this presentation:
http://www.cs.umd.edu/class/fall2009/cmsc132H/slides/still-effective.pdf



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


Coding style: put the catch on the previous line.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


We don't typically call methods using `this.methodname(args)' – remove the 
`this.'



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


Add a space before the `:'.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


Use `Boolean.TRUE' instead of `new Boolean(true)'.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


Use `Boolean.FALSE' instead of `new Boolean(false)'.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


Use `Boolean.TRUE' instead of `new Boolean(true)'.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


Remove the outer parentheses.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


Add a space before the `:'.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


If this fits on the previous line while staying under 80 columns, please 
wrap it around.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


Remove the space after `cacheLocation'



src/main/java/org/apache/hadoop/hbase/client/HTable.java


Use {...@link #readFields readFields} instead of readFields



src/main/java/org/apache/hadoop/hbase/client/HTable.java


You can remove this 



src/main/java/org/apache/hadoop/hbase/client/HTable.java


Use  not 



src/main/java/org/apache/hadoop/hbase/client/HTable.java


Add a space before the `:'.



src/main/java/org/apache/hadoop/hbase/client/HTable.java


Use {...@link #getRegionsInfo getRegionsInfo}



src/main/java/org/apache/hadoop/hbase/client/HTable.java


Use {...@link ...} here and below.



src/main/java/org/apache/hadoop/hbase/client/HTable.java


Wrap this around with the previous line.



src/main/java/org/apache/hadoop/hbase/client/HTable.java


I believe you can remove this block.


- Benoit


On 2010-05-28 13:20:33, Mingjie Lai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> ---
> 
> (Updated 2010-05-28 13:20:33)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> ---
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and 
> region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the 
> meta for the table, serialize the meta to a file. MR job can ship a copy of 
> the meta for the table in the DistributedCache
> - Map deserializeRegionInfo(): MR job can 
> deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map regionMap): MR job can 
> prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some 
> number of rows in META. This option could be turned on and off for one 
> particular table. 
> 
> 
> This addresses bug HBASE-2468.
> http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
> 09de2ac 
>   src/main/java/org/apache/ha

Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-05-28 Thread Mingjie Lai

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/
---

(Updated 2010-05-28 13:20:33.843332)


Review request for hbase.


Changes
---

Improved a little, to address the comments from Benoit Sigoure, at 
http://review.hbase.org/r/78/#review91.


Summary
---

HBASE-2468: Improvements to prewarm META cache on clients.

Changes:
1. Add new HTable methods which support region info de/serialation, and region 
cache prewarm: 
- void serializeRegionInfo(): clients could perform a large scan for all the 
meta for the table, serialize the meta to a file. MR job can ship a copy of the 
meta for the table in the DistributedCache
- Map deserializeRegionInfo(): MR job can 
deserialize the region info from the DistributedCache 
- prewarmRegionCache(Map regionMap): MR job can 
prewarm local region cache by the deserialized region info.

2. For each client, each region cache read-miss could trigger read-ahead some 
number of rows in META. This option could be turned on and off for one 
particular table. 


This addresses bug HBASE-2468.
http://issues.apache.org/jira/browse/HBASE-2468


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 09de2ac 
  src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
  src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 3de661e 
  src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 

Diff: http://review.hbase.org/r/98/diff


Testing
---

Unit tests passed locally for me. 


Thanks,

Mingjie



Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

2010-05-28 Thread Mingjie Lai

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/
---

Review request for hbase.


Summary
---

HBASE-2468: Improvements to prewarm META cache on clients.

Changes:
1. Add new HTable methods which support region info de/serialation, and region 
cache prewarm: 
- void serializeRegionInfo(): clients could perform a large scan for all the 
meta for the table, serialize the meta to a file. MR job can ship a copy of the 
meta for the table in the DistributedCache
- Map deserializeRegionInfo(): MR job can 
deserialize the region info from the DistributedCache 
- prewarmRegionCache(Map regionMap): MR job can 
prewarm local region cache by the deserialized region info.

2. For each client, each region cache read-miss could trigger read-ahead some 
number of rows in META. This option could be turned on and off for one 
particular table. 


This addresses bug HBASE-2468.
http://issues.apache.org/jira/browse/HBASE-2468


Diffs
-

  src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 09de2ac 
  src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
  src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 3de661e 
  src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 

Diff: http://review.hbase.org/r/98/diff


Testing
---

Unit tests passed locally for me. 


Thanks,

Mingjie