Re: Settings

2009-08-27 Thread Andrew Purtell
> dfs.datanode.socket.write.timeout => 0

This isn't needed any more given the locally patched Hadoop jar we distribute 
containing the fix for HDFS-127





From: stack 
To: hbase-user@hadoop.apache.org
Sent: Thursday, August 27, 2009 6:29:38 AM
Subject: Re: Settings

On Wed, Aug 26, 2009 at 7:40 AM, Lars George  wrote:

> Hi,
>
> It seems over the years I have tried various settings in both Hadoop and HBase,
> and when redoing a cluster it is always a question whether we should keep that
> setting or not - since the issue it "suppressed" was fixed already. Maybe we
> should have a wiki page with the current settings and more advanced ones and
> when and how to use them. I often find that the descriptions in the
> various default files are as ambiguous as the setting keys themselves.



I'd rather fix the description so it's clear than add extra info out
in a wiki; wiki pages tend to rot.



- fs.default.name => hdfs://:9000/
>
> This is usually in core-site.xml in Hadoop. Does the client or server need
> this key at all? Did I copy it into the hbase site file by mistake?
>


There probably was a reason long ago but, yeah, you shouldn't need this (as
Schubert says).



> - hbase.cluster.distributed => true
>
> For true replication and stand alone ZK installations.
>
> - dfs.datanode.socket.write.timeout => 0
>
> This is used in the DataNode but, here, more importantly in the DFSClient. Its
> default is apparently fixed at 8 minutes; no default file (I would have
> assumed hdfs-default.xml) lists it.
>
> We set it to 0 to avoid the socket timing out on low use etc. because the
> DFSClient reconnect is not handled gracefully. I trust setting it to 0 is
> what we recommend for HBase and is still valid?
>


For background on this, see
http://wiki.apache.org/hadoop/Hbase/Troubleshooting#6.  It shouldn't be
needed anymore, especially with hadoop-4681 in place but IIRC, apurtell had
trouble bringing up a cluster one time when it shouldn't have been needed
but the only way to get it up was to set this to zero.   We should test.
BTW, this is a client-side config.  You have it below in hadoop.  Shouldn't
be needed there, not by hbase at least (maybe you have other clients of hdfs
that had this issue?).
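
(To make the client-side point concrete -- a minimal sketch, not actual HBase
code; in practice the override would simply sit in the hbase-site.xml that the
client loads:)

    import org.apache.hadoop.conf.Configuration;

    public class WriteTimeoutOverride {
      public static void main(String[] args) {
        // Pick up whatever *-site.xml files are on the classpath, then override
        // the DFSClient write timeout on this client-side Configuration.
        Configuration conf = new Configuration();
        conf.setInt("dfs.datanode.socket.write.timeout", 0);  // 0 = never time out
        System.out.println(conf.getInt("dfs.datanode.socket.write.timeout", -1));
      }
    }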



>
> - hbase.regionserver.lease.period => 60
>
> Default was changed from 60 to 120 seconds. Over time I had issues and have
> set it to 10mins. Good or bad?



There is an issue to check that this is even used any more. Lease is in zk
now.  I don't think this has an effect any more.


>
> - hbase.hregion.memstore.block.multiplier => 4
>
> This is up from the default 2. Good or bad?
>


Means that we'll fill more RAM before we bring down the writes gate, up to
2x the flush size (so if 64MB is the default flush size, we'll keep taking on
writes till we get to 2x64MB).  2x is good for the 64M default I'd say --
especially during a virulent upload with lots of Stores.
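
(Back-of-envelope sketch of that gate, assuming the 64MB flush default
mentioned above -- key names in the comments come from this thread, not
checked against the code:)

    public class MemstoreBlockMath {
      public static void main(String[] args) {
        long flushSize = 64L * 1024 * 1024;       // assumed default memstore flush size (64MB)
        int multiplier = 4;                       // hbase.hregion.memstore.block.multiplier as set by Lars
        long blockAt = flushSize * multiplier;    // writes gated at 256MB of memstore per region
        System.out.println(blockAt + " bytes (vs. 128MB with the default multiplier of 2)");
      }
    }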



>
> - hbase.hregion.max.filesize => 536870912
>
> Again twice as much as the default. Opinions?


Means you should have fewer regions overall for perhaps some small compromise
in performance (TBD).  I think that in 0.21 we'll likely up the region
default size to this or larger.  Need to test.  Leave it I'd say if
performance is OK for you and if you have lots of regions.


> - hbase.regions.nobalancing.count => 20
>
> This seems to be missing from the hbase-default.xml but is set to 4 in the
> code if not specified. The above I got from Ryan to improve startup of
> HBase. It means that while a RS is still opening up to 20 regions it can
> start rebalancing regions. Handled by the ServerManager during message
> processing. Opinions?
>


If it works for you, keep it.  This whole startup and region reassignment is
going to be redone in 0.21.  These configurations will likely change at that
time.



>
> - hbase.regions.percheckin => 20
>
> This is the count of regions assigned in one go. Handled in RegionManager
> and the default is 10. Here we tell it to assign regions in larger batches
> to speed up the cluster start. Opinions?


See previous note.



>
>
> - hbase.regionserver.handler.count => 30
>
> Up from 10 as I often had the problem that the UI was not responsive while
> an import MR job would run. All handlers were busy doing the inserts. JD
> mentioned it may be set to a higher default value?
>

No harm here.  Do the math.  Is it likely that you'll have 30 clients
concurrently trying to get stuff out of a regionserver?  If so, keep it I'd
say.



>
>
> Hadoop:
> --
>
> - dfs.block.size => 134217728
>
> Up from the default 64MB. I have done this in the past as my data size per
> "cell" is larger than the usual few bytes. I can have a few KB up to just
> above 1 MB per value. Still making sense?



No opinion.  Whatever works for you.


>
>
> - dfs.namenode.handler.count => 20
>
> This was upped from the default 10 quite some time ago (more than a year
> ago). So is this still required?
>

Probably.  Check it duri

Re: Settings

2009-08-27 Thread Lars George
Thanks guys, am going through your feedback now. Jim, I think Stack is 
right as far as documentation is concerned. We should have a proper 
description in the default file. But on the other hand, what is also 
difficult for the new guys - especially non ex-GOOG's (they seem to have 
been brainwashed with this topic and have apparently no problem 
understanding all settings ;) ) - is to learn what options apart from 
the required ones warrant a closer look if the initial attempt does not 
yield the expected result.


Maybe we should add an extra detail to the documentation like 
"Required", "Common", "Rare" etc. to indicate the importance of an 
option. Another idea is to group them that way in the hbase-default.xml 
and put in a header for each section. But that of course would somewhat 
rip apart settings that belong together semantically.


Or of course have a Cloudera like config generator that guides one 
through this by importance.


Lars

Andrew Purtell wrote:

dfs.datanode.socket.write.timeout => 0



This isn't needed any more given the locally patched Hadoop jar we distribute 
containing the fix for HDFS-127





From: stack 
To: hbase-user@hadoop.apache.org
Sent: Thursday, August 27, 2009 6:29:38 AM
Subject: Re: Settings

On Wed, Aug 26, 2009 at 7:40 AM, Lars George  wrote:

  

Hi,

It seems over the years I have tried various settings in both Hadoop and HBase,
and when redoing a cluster it is always a question whether we should keep that
setting or not - since the issue it "suppressed" was fixed already. Maybe we
should have a wiki page with the current settings and more advanced ones and
when and how to use them. I often find that the descriptions in the
various default files are as ambiguous as the setting keys themselves.





I'd rather fix the description so its clear rather than add extra info out
in a wiki; wiki pages tend to rot.



- fs.default.name => hdfs://:9000/
  

This is usually in core-site.xml in Hadoop. Is the client or server needing
this key at all? Did I copy it in the hbase site file by mistake?





There probably was a reason long ago but, yeah, you shouldn't need this (as
Schubert says).



  

- hbase.cluster.distributed => true

For true replication and stand alone ZK installations.

- dfs.datanode.socket.write.timeout => 0

This is used in DataNode but here more importantly in DFSClient. Its
default is fixed to apparently 8 minutes, no default file (I would have
assumed hdfs-default.xml) has it listed.

We set it to 0 to avoid the socket timing out on low use etc. because the
DFSClient reconnect is not handled gracefully. I trust setting it to 0 is
what we recommend for HBase and is still valid?





For background on this, see
http://wiki.apache.org/hadoop/Hbase/Troubleshooting#6.  It shouldn't be
needed anymore, especially with hadoop-4681 in place but IIRC, apurtell had
trouble bringing up a cluster one time when it shouldn't have been needed
but the only way to get it up was to set this to zero.   We should test.
BTW, this is a client-side config.  You have it below in hadoop.  Shouldn't
be needed there, not by hbase at least (maybe you have other clients of hdfs
that had this issue?).



  

- hbase.regionserver.lease.period => 60

Default was changed from 60 to 120 seconds. Over time I had issues and have
set it to 10mins. Good or bad?





There is an issue to check that this is even used any more. Lease is in zk
now.  I don't think this has an effect any more.


  

- hbase.hregion.memstore.block.multiplier => 4

This is up from the default 2. Good or bad?





Means that we'll fill more RAM before we bring down the writes gate, up to
2x the flush size (So if 64MB is default time to flush, we'll keep taking on
writes till we get to 2x64MB).  2x is good for the 64M default I'd say --
especially during virulent upload with lots of Stores.



  

- hbase.hregion.max.filesize => 536870912

Again twice as much as the default. Opinions?




Means you should have less regions overall for perhaps some small compromise
in performance (TBD).  I think that in 0.21 we'll likely up the region
default size to this or larger.  Need to test.  Leave it I'd say if
performance is OK for you and if you have lots of regions.


  

- hbase.regions.nobalancing.count => 20

This seems to be missing from the hbase-default.xml but is set to 4 in the
code if not specified. The above I got from Ryan to improve startup of
HBase. It means that while a RS is still opening up to 20 regions it can
start rebalance regions. Handled by the ServerManager during message
processing. Opinions?





If it works for you, keep it.  This whole startup and region reassignment is
going to be redone in 0.21.  These configurations will likely change at that
time.



  

- hbase.regions.percheckin => 20

This is the count of regions assigned in one go. Handled in RegionManager
and the default is 10. Here we tell it to assign regions in large

Re: Settings

2009-08-27 Thread Lars George

Hi Schubert,

See my comments inline below.


 HBase:
-
- fs.default.name => hdfs://:9000/

This is usually in core-site.xml in Hadoop. Is the client or server needing
this key at all? Did I copy it in the hbase site file by mistake

[schubert] I think it's better not to copy it into the HBase conf file. I
suggest you modify your hbase-env.sh to add the hadoop conf path to your
HBASE_CLASSPATH, e.g. export
HBASE_CLASSPATH=${HBASE_HOME}/../hadoop-0.20.0/conf.
Apart from that, we should also configure GC options here.
  


I agree, but I think this is a remnant from earlier setups and can be removed. I do 
not think I had a need to add the Hadoop conf to HBase. But it may be 
necessary, for example, for the replication factor. I am thinking of lowering 
this to 2 on smaller clusters, and for the DFSClient to pick up that default 
we need this value available.


Another option would be to symlink the hadoop site files into 
hbase/conf, which explicitly wires in only the site files. With the above 
you are also adding a second log4j.properties and metrics file etc. 
Not sure if that could have a side effect?



- hbase.cluster.distributed => true

For true replication and stand alone ZK installations.


[schubert] You should also export HBASE_MANAGES_ZK=false in hbase-env.sh to keep
this consistent.
  


Agreed, I have that set, but did not mention it.


- dfs.datanode.socket.write.timeout => 0


[schubert] This parameter should be for hadoop, HDFS. It should be in
hadoop-0.20.0/conf/hdfs-site.xml. But I think it is not useful now.
  


See Stack's and Andrew's replies.


- hbase.regionserver.lease.period => 60

Default was changed from 60 to 120 seconds. Over time I had issues and have
set it to 10mins. Good or bad

[schubert] I think if you select the right JVM GC options, the default 6 is
ok.
  


OK.


- hbase.hregion.memstore.block.multiplier => 4

This is up from the default 2. Good or bad?


[schubert] I do not think it is necessary; can you describe your reason?
  


I got that recommended, but wanted to make sure I understand its 
implications. Stack has described it nicely.



- hbase.hregion.max.filesize => 536870912

Again twice as much as the default. Opinions?


[schubert] If you want a bigger region size, I think it's fine. We
have even tried 1GB in some tests.
  


OK.


- hbase.regions.nobalancing.count => 20

This seems to be missing from the hbase-default.xml but is set to 4 in the
code if not specified. The above I got from Ryan to improve startup of
HBase. It means that while a RS is still opening up to 20 regions it can
start rebalance regions. Handled by the ServerManager during message
processing. Opinions

[schubert] I think it makes sense.
  


OK.


- hbase.regions.percheckin => 20

This is the count of regions assigned in one go. Handled in RegionManager
and the default is 10. Here we tell it to assign regions in larger batches
to speed up the cluster start. Opinions?


[schubert] I have no idea about it. I think the region assignment will
add some CPU and memory overhead on the regionserver if there are too many
HLogs to be processed.
  


OK.


- hbase.regionserver.handler.count => 30

Up from 10 as I had often the problem that the UI was not responsive while
a import MR job would run. All handlers were busy doing the inserts. JD
mentioned it may be set to a higher default value

[schubert] It makes sense. In my small 5-node cluster, I set it to 20.
  


OK.


Hadoop:
--

- dfs.block.size => 134217728

Up from the default 64MB. I have done this in the past as my data size per
"cell" is larger than the usual few bytes. I can have a few KB up to just
above 1 MB per value. Still making sense?


[schubert] I think your reasoning makes sense.
  


OK.


- dfs.namenode.handler.count => 20

This was upped from the default 10 quite some time ago (more than a year
ago). So is this still required?



[schubert] I also set it to 20.
  


OK.


- dfs.datanode.socket.write.timeout => 0

This is the matching entry to the one above, I suppose, this time for the
DataNode. Still required?

[schubert]  I think it is not necessary now.
  


OK, yes, as Andrew notes.


- dfs.datanode.max.xcievers => 4096

Default is 256 and often way too low. What is a good value you would use?
What is the drawback of setting it high?



[schubert] It should make sense. I use 3072 in my small cluster.
  


OK.

Thanks Schubert!

Lars


Re: Settings

2009-08-27 Thread Lars George

Hi Stack,

Comments inline below.


Hi,

It seems over the years I have tried various settings in both Hadoop and HBase,
and when redoing a cluster it is always a question whether we should keep that
setting or not - since the issue it "suppressed" was fixed already. Maybe we
should have a wiki page with the current settings and more advanced ones and
when and how to use them. I often find that the descriptions in the
various default files are as ambiguous as the setting keys themselves.


I'd rather fix the description so its clear rather than add extra info out
in a wiki; wiki pages tend to rot.
  


I agree, but as mentioned in my earlier post the question is also how to 
communicate which settings are required, which are commonly changed (and to 
what and why), which are more rarely changed, and finally which are hardly 
ever touched.



- fs.default.name => hdfs://:9000/
  

This is usually in core-site.xml in Hadoop. Is the client or server needing
this key at all? Did I copy it in the hbase site file by mistake

There probably was a reason long ago but, yeah, you shouldn't need this (as
Schubert says).
  


OK.


- hbase.cluster.distributed => true

For true replication and stand alone ZK installations.

- dfs.datanode.socket.write.timeout => 0


For background on this, see
http://wiki.apache.org/hadoop/Hbase/Troubleshooting#6.  It shouldn't be
needed anymore, especially with hadoop-4681 in place but IIRC, apurtell had
trouble bringing up a cluster one time when it shouldn't have been needed
but the only way to get it up was to set this to zero.   We should test.
BTW, this is a client-side config.  You have it below in hadoop.  Shouldn't
be needed there, not by hbase at least (maybe you have other clients of hdfs
that had this issue?).
  


OK, as Andrew also points out. And not even sure anymore why I have it 
on the DN side as well. Maybe back then it was like "hey, let's try this 
as well"?



- hbase.regionserver.lease.period => 60

Default was changed from 60 to 120 seconds. Over time I had issues and have
set it to 10mins. Good or bad?


There is an issue to check that this is even used any more. Lease is in zk
now.  I don't think this has an effect any more.
  


Interesting. Should it be removed from hbase-default.xml then? This raises 
another question I wanted to ask. Wouldn't it make sense to print out 
unknown settings as a WARN at startup? That way people do not trip over 
things like "memcache" being renamed to "memstore" and not notice that the 
old key has no effect after an update. I suggest a warning only (or make 
this an Apache-like switch to check the config?) so that custom settings are 
only warned about; the admin who added them would know about those and 
approve of them.



- hbase.hregion.memstore.block.multiplier => 4

This is up from the default 2. Good or bad

Means that we'll fill more RAM before we bring down the writes gate, up to
2x the flush size (So if 64MB is default time to flush, we'll keep taking on
writes till we get to 2x64MB).  2x is good for the 64M default I'd say --
especially during virulent upload with lots of Stores.
  


OK.


- hbase.hregion.max.filesize => 536870912

Again twice as much as the default. Opinions?


Means you should have less regions overall for perhaps some small compromise
in performance (TBD).  I think that in 0.21 we'll likely up the region
default size to this or larger.  Need to test.  Leave it I'd say if
performance is OK for you and if you have lots of regions.
  


OK.


- hbase.regions.nobalancing.count => 20

This seems to be missing from the hbase-default.xml but is set to 4 in the
code if not specified. The above I got from Ryan to improve startup of
HBase. It means that while a RS is still opening up to 20 regions it can
start rebalance regions. Handled by the ServerManager during message
processing. Opinions?


If it works for you, keep it.  This whole startup and region reassignment is
going to be redone in 0.21.  These configurations will likely change at that
time.
  


Makes sense, yes I read the roadmap. Looking forward to it!


- hbase.regions.percheckin => 20

This is the count of regions assigned in one go. Handled in RegionManager
and the default is 10. Here we tell it to assign regions in larger batches
to speed up the cluster start. Opinions?


See previous note.
  


OK.


- hbase.regionserver.handler.count => 30

Up from 10 as I had often the problem that the UI was not responsive while
a import MR job would run. All handlers were busy doing the inserts. JD
mentioned it may be set to a higher default value?


No harm here.  Do the math.  Is it likely that you'll have 30 clients
concurrently trying to get stuff out of a regionserver?  If so, keep it I'd
say.
  


OK.


Hadoop:
--

- dfs.block.size => 134217728

Up from the default 64MB. I have done this in the past as my data size per
"cell" is larger than the usual few bytes. I can have a few KB up to just
above 1 MB per value. Still making sense?

Re: Settings

2009-08-27 Thread stack
On Thu, Aug 27, 2009 at 3:23 AM, Lars George  wrote:

> Hi Stack,



>
>  I'd rather fix the description so its clear rather than add extra info out
>> in a wiki; wiki pages tend to rot.
>>
>>
>
> I agree, but as mentioned in my earlier post the question is also how to
> communicate what are required, what commonly changed (and to what and why),
> which are more rarely changed and finally which are hardly touched ever.
>


I think the required should be in the 'Getting Started' section.

What to do about the other degrees is a little harder.

We could break up hbase-default by degree putting the rarely changed into an
hbase-arcane.xml?

We could drop options that are never changed or that look useless (if you
really need to change them, you can find them in the src): e.g.
hbase.master.meta.thread.rescanfrequency, hbase.regionserver.info.port.auto,
hbase.regionserver.msginterval, etc.

We could make hbase-client-default.xml and hbase-server-default.xml or
partition in some other way that made sense.

(I believe you can xinclude files, or do something equivalent, with hadoop
Configuration)



>
> OK, as Andrew also points out. And not even sure anymore why I have it on
> the DN side as well. Maybe back then it was like "hey, let's try this as
> well"?
>


Yes.  We have done lots of this in the past and will probably continue to do
so going forward ("Try X!").


>
>
>  - hbase.regionserver.lease.period => 60
>>>
>>> Default was changed from 60 to 120 seconds. Over time I had issues and
>>> have
>>> set it to 10mins. Good or bad?
>>>
>>>
>> There is an issue to check that this is even used any more. Lease is in zk
>> now.  I don't think this has an effect any more.
>>
>>
>
> Interesting. Should be removed from hbase-default.xml then?


There is an issue to do so IIRC.



> This raises another question I wanted to ask. Wouldn't make sense to print
> out unknown settings as a WARN at startup?


That'd be nice but we probably ain't that disciplined and perhaps you want
to pollute your config. with "unknowns"?  (e.g. you are a subclass of hbase
as are THBase and ITHBase?).  I suppose we could read in the
hbase-default.xml and any config. not present there would be flagged?  What
about hadoop configs?  Read in the hadoop hdfs|common|etc|-xml and flag any
not present there?

St.Ack


Replication of -ROOT- and .META. directories

2009-08-27 Thread Alexandra Alecu

Hi guys,

Would you recommend using 'hadoop fs -setrep' on the -ROOT- and .META.
directories under /hbase to increase the replication?
Intuitively this would give a better chance in case of hardware/region
failure, or could even give faster access to the data regions? However, I am
not sure if my intuition is right and whether this would benefit HBase or
not.
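
(For concreteness, a rough programmatic equivalent of running 'hadoop fs
-setrep' recursively over those directories -- a sketch only; the factor of 5
is just an example value:)

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SetRep {
      // Recursively raise the replication of every file under the given directory.
      static void setRep(FileSystem fs, Path dir, short rep) throws IOException {
        FileStatus[] entries = fs.listStatus(dir);
        if (entries == null) return;
        for (FileStatus st : entries) {
          if (st.isDir()) {
            setRep(fs, st.getPath(), rep);
          } else {
            fs.setReplication(st.getPath(), rep);
          }
        }
      }

      public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        setRep(fs, new Path("/hbase/-ROOT-"), (short) 5);
        setRep(fs, new Path("/hbase/.META."), (short) 5);
      }
    }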

Thanks,
Alexandra.

-- 
View this message in context: 
http://www.nabble.com/Replication-of--ROOT--and-.META.-directories-tp25172985p25172985.html
Sent from the HBase User mailing list archive at Nabble.com.



Re: hbase/jython outdated

2009-08-27 Thread stack
On Wed, Aug 26, 2009 at 3:29 AM, Andrei Savu  wrote:

> I have fixed the code samples and opened a feature request on JIRA for
> the jython command.
>
> https://issues.apache.org/jira/browse/HBASE-1796
>

Thanks.  Patch looks good.  Will commit soon.   Did you update the jython
wiki page?  It seems to be using old API still.

>
> Is there any python library for REST interface? How stable is the REST
> interface?
>

Not that I know of (a ruby one, yes IIRC).  Write against stargate if you
are going to do one since o.a.h.h.rest is deprecated in 0.20.0.

St.Ack


Re: hbase/jython outdated

2009-08-27 Thread Andrei Savu
See comments below.

On Thu, Aug 27, 2009 at 7:58 PM, stack wrote:
> On Wed, Aug 26, 2009 at 3:29 AM, Andrei Savu  wrote:
>
>> I have fixed the code samples and opened a feature request on JIRA for
>> the jython command.
>>
>> https://issues.apache.org/jira/browse/HBASE-1796
>>
>
> Thanks.  Patch looks good.  Will commit soon.   Did you update the jython
> wiki page?  It seems to be using old API still.

I have updated the Jython wiki page to use the latest API. After the
commit I will also update the instructions for running the sample code.

>
>>
>> Is there any python library for REST interface? How stable is the REST
>> interface?
>>
>
> Not that I know of (a ruby one, yes IIRC).  Write against stargate if you
> are going to do one since o.a.h.h.rest is deprecated in 0.20.0.
>

I am going to give it a try and post the results back here.

What about thrift? Is it going to be deprecated?

> St.Ack
>

-- 
Savu Andrei

Website: http://www.andreisavu.ro/


Re: hbase/jython outdated

2009-08-27 Thread stack
On Wed, Aug 26, 2009 at 3:29 AM, Andrei Savu  wrote:

>
> Until recently I have used the python thrift interface but it has some
> serious issues with unicode.


Why does it matter?  HBase is all about byte arrays?  Before you give
content to hbase, do your decode/encode?  See o.a.h.h.util.Bytes for utility
to help here.
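
(An illustrative round trip with that utility, on the Java side -- the string
value here is just an example:)

    import org.apache.hadoop.hbase.util.Bytes;

    public class BytesRoundTrip {
      public static void main(String[] args) {
        // Encode a unicode String to the UTF-8 bytes HBase actually stores,
        // then decode it back on the way out.
        byte[] raw = Bytes.toBytes("caf\u00e9");
        String back = Bytes.toString(raw);
        System.out.println(back);   // prints the original string again
      }
    }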

The good thing about python over thrift is that you won't be alone.   Other
fellas are coming in that way over the python causeway.

St.Ack


Re: hbase/jython outdated

2009-08-27 Thread stack
On Thu, Aug 27, 2009 at 11:18 AM, Andrei Savu  wrote:

>
> What about thrift? It's going to be deprecated?
>

Thrift is not going anywhere.  It will get a freshener during hbase 0.21
development to align it better with the new hbase client API.  At that time
it will likely be moved out to src/contrib but the thrift interface is here
to stay, at least for the near future.

St.Ack


Re: Settings

2009-08-27 Thread Lars George

Hi Stack,


I think the required should be in the 'Getting Started' section.

What to do about the other degrees is a little harder.

We could break up hbase-default by degree putting the rarely changed into an
hbase-arcane.xml?

We could drop options that are never changed or that look useless (if you
really need to change them, you can find them in the src): e.g.
hbase.master.meta.thread.rescanfrequency, hbase.regionserver.info.port.auto,
hbase.regionserver.msginterval, etc.

We could make hbase-client-default.xml and hbase-server-default.xml or
partition in some other way that made sense.

(I believe you can xinclude files, or do something equivalent, with hadoop
Configuration)
  


I would not drop any, but have them all listed, with additional 
notes as to how common they are.



This raises another question I wanted to ask. Wouldn't make sense to print
out unknown settings as a WARN at startup?




That'd be nice but we probably ain't that disciplined and perhaps you want
to pollute your config. with "unknowns"?  (e.g. you are a subclass of hbase
as are THBase and ITHBase?).  I suppose we could read in the
hbase-default.xml and any config. not present there would be flagged?  What
about hadoop configs?  Read in the hadoop hdfs|common|etc|-xml and flag any
not present there?
  


Yes, simply load the matching "-default.xml" and diff them. That should leave 
only user-defined settings, which most people do not use. I do in one 
cluster, where I define a hostname and use it as a variable in all 
places that need to specify an explicit host. Those would be washed out 
but that is fine.
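
(A rough sketch of that diff, assuming the defaults and the site file can each
be loaded as a separate Configuration -- resource names and the WARN wording
here are illustrative, not existing HBase code:)

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    import org.apache.hadoop.conf.Configuration;

    public class UnknownSettingCheck {
      public static void main(String[] args) {
        Configuration defaults = new Configuration(false);
        defaults.addResource("hbase-default.xml");

        Configuration site = new Configuration(false);
        site.addResource("hbase-site.xml");

        // Any key in the site file that the shipped defaults do not know about
        // is a candidate for a startup warning.
        Set<String> known = new HashSet<String>();
        for (Map.Entry<String, String> e : defaults) {
          known.add(e.getKey());
        }
        for (Map.Entry<String, String> e : site) {
          if (!known.contains(e.getKey())) {
            System.out.println("WARN: unknown setting: " + e.getKey());
          }
        }
      }
    }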


If others think this is useful I would have a look at it. But only if 
more people are interested.


Lars


Re: Replication of -ROOT- and .META. directories

2009-08-27 Thread Andrew Purtell
It wouldn't hurt, as you mention, for increased tolerance of data node loss. 
But the clients cache ROOT and META, so the read performance gain will be negligible.

   - Andy





From: Alexandra Alecu 
To: hbase-user@hadoop.apache.org
Sent: Thursday, August 27, 2009 6:05:06 PM
Subject: Replication of -ROOT- and .META. directories


Hi guys,

Would you recommend using 'hadoop fs -setrep' on the -ROOT- and .META.
directories under /hbase to increase the replication?
Intuitively this would give a better chance in case of hardware/region
failure or even could give faster access to the data regions? However I am
not sure if I am right in my intuition and if this would benefit HBase or
not.

Thanks,
Alexandra.

-- 
View this message in context: 
http://www.nabble.com/Replication-of--ROOT--and-.META.-directories-tp25172985p25172985.html
Sent from the HBase User mailing list archive at Nabble.com.


  

XML Input?

2009-08-27 Thread llpind

Can someone please point me to an XML input format example?  I'm using .20
code.  Thanks
-- 
View this message in context: 
http://www.nabble.com/XML-Input--tp25179786p25179786.html
Sent from the HBase User mailing list archive at Nabble.com.



RE: REST or Stargate?

2009-08-27 Thread Greg Cottman

Actually... It doesn't look like either of them will work for me at the moment. 
 I downloaded 0.20.0 RC2 and gave it a whirl.

Stargate is good except that the scanner still seems a little under-cooked.  I 
saw a note saying that scanner filters are in the 0.21 roadmap.  Presumably 
those filters will include selecting column family, specifying start and/or 
stop rows, and supporting the limit parameter.  Until that time I assume that 
unless you know the row keys, every access requires a full table scan.

HBase rest package is good except for scanners as well.  There are some subtle 
differences in other bits, like base 64 encoding the column family names in the 
table description, but the API seems largely unchanged.  My problem is an 
internal server error when I try to get next on a scanner.

I successfully create the scanner and get the location from the HTTP 200 
response.

Then I've tried both PUT and POST with a path of 
"/api//scanner/".

This gives me an HTTP 500 error with the following data:


   500
   org.apache.hadoop.hbase.rest.exception.HBaseRestException: Object 
does not conform to the ISerializable interface.  Unable to generate xml 
output.
   true


I can't see anything obvious in the master or rest logs, and I'm not sure where 
else that exception would be dumped.

Any thoughts?

Cheers,
Greg.

From: Greg Cottman
Sent: Tuesday, 18 August 2009 6:30 PM
To: hbase-user@hadoop.apache.org
Subject: REST or Stargate?


I'm still a little confused by 0.20.0 releasing both a rewrite of the REST 
interface and giving Stargate contrib status as part of the release.  This 
seems like having your REST *and* eating it too!  Presumably if one of these 
implementations gains enough momentum it would lead to the deprecation of the 
other.

It seems like the Stargate API is more comprehensive, but having said that, I 
would usually go for core functionality over a contribution if they both met my 
requirements.  I don't want to invest code and time in one to see it fall by 
the wayside over the next few releases.

Is anyone backing one of these to become the preferred method of serving REST 
from HBase in the near future?

Cheers,
Greg.