Re: Overwrite a row
The schema is known beforehand, so this is exactly what I need. Great!

One more question: what guarantees does the batch operation have? Are the operations contained within each batch atomic, i.e. will all mutations be given the same timestamp? If something fails, do all operations fail, or can it fail partially?

Thanks for your help, much appreciated.

Cheers,
-Kristoffer

On Sat, Apr 20, 2013 at 4:47 AM, Ted Yu yuzhih...@gmail.com wrote:

I don't know the details of Kristoffer's schema. If all the column qualifiers are known a priori, mutateRow() should serve his needs. HBase allows an arbitrary number of columns in a column family. If the schema is dynamic, mutateRow() wouldn't suffice. If the column qualifiers are known but the row is very wide (and only a few columns are updated per call), performance would degrade. Just some factors to consider.

Cheers

On Fri, Apr 19, 2013 at 1:41 PM, Mohamed Ibrahim mibra...@mibrahim.net wrote:

Actually I do see it in the 0.94 JavaDocs ( http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/client/HTable.html#mutateRow(org.apache.hadoop.hbase.client.RowMutations) ), so maybe it was added in 0.94.6 even though the JIRA says fixed in 0.95. I haven't used it, but it seems that's what you're looking for. Sorry for the confusion.

Mohamed

On Fri, Apr 19, 2013 at 4:35 PM, Mohamed Ibrahim mibra...@mibrahim.net wrote:

It seems that 0.95 is not released yet, so mutateRow won't be a solution for now. I saw it in the downloads and thought it was released.

On Fri, Apr 19, 2013 at 4:18 PM, Mohamed Ibrahim mibra...@mibrahim.net wrote:

Just noticed you want to delete as well. I think that's supported since 0.95 in mutateRow ( http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#mutateRow(org.apache.hadoop.hbase.client.RowMutations) ). You can do multiple puts and deletes and they will be performed atomically, so you can remove qualifiers and put new ones.

Mohamed

On Fri, Apr 19, 2013 at 3:44 PM, Kristoffer Sjögren sto...@gmail.com wrote:

What would you suggest? I want the operation to be atomic.

On Fri, Apr 19, 2013 at 8:32 PM, Ted Yu yuzhih...@gmail.com wrote:

What is the maximum number of versions you allow for the underlying table?

Thanks

On Fri, Apr 19, 2013 at 10:53 AM, Kristoffer Sjögren sto...@gmail.com wrote:

Hi,

Is it possible to completely overwrite/replace a row in a single _atomic_ action? Already existing columns and qualifiers should be removed if they do not exist in the data inserted into the row. The only way to do this is to first delete the row and then insert new data in its place, correct? Or is there an operation to do this?

Cheers,
-Kristoffer
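To make the approach discussed above concrete, here is a rough, untested sketch of the mutateRow() idea against the 0.94.6-era client API. The table, family, and qualifier names ("f", "q1", "v1") are invented for illustration:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RowMutations;
import org.apache.hadoop.hbase.util.Bytes;

public class OverwriteRow {
    // Atomically replace one row: delete everything, then put the new cells.
    // mutateRow() applies all mutations under the row lock, so readers never
    // see a half-deleted, half-written row. Single-row only.
    static void overwrite(HTable table, byte[] row) throws IOException {
        RowMutations rm = new RowMutations(row);
        rm.add(new Delete(row));  // no family/qualifier: whole-row delete
        Put p = new Put(row);
        p.add(Bytes.toBytes("f"), Bytes.toBytes("q1"), Bytes.toBytes("v1"));
        rm.add(p);
        table.mutateRow(rm);      // applied atomically on the region server
    }
}
```

One caveat worth testing: if the Delete tombstone and the new Puts end up with the same timestamp, the tombstone may mask the new cells until a major compaction, so deleting at an explicit earlier timestamp can be safer.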
Re: RefGuide schema design examples
+1

R. A.

On 20 Apr 2013 12:07, Viral Bajaria viral.baja...@gmail.com wrote:

+1!

On Fri, Apr 19, 2013 at 4:09 PM, Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com wrote:

Wow, great work, Doug.

2013/4/19 Doug Meil doug.m...@explorysmedical.com

Hi folks, I reorganized the Schema Design case studies 2 weeks ago and consolidated them here, plus added several cases common on the dist-list:

http://hbase.apache.org/book.html#schema.casestudies

Comments/suggestions welcome. Thanks!

Doug Meil
Chief Software Architect, Explorys
doug.m...@explorysmedical.com

--
Marcos Ortiz Valmaseda, Data-Driven Product Manager at PDVSA
Blog: http://dataddict.wordpress.com/
LinkedIn: http://www.linkedin.com/in/marcosluis2186
Twitter: @marcosluis2186
Re: Slow region server recoveries
Hi,

I looked at it again with a fresh eye. As Varun was saying, the root cause is the wrong order of the block locations. The root cause of the root cause is actually simple: HBase started the recovery while the node was not yet stale from an HDFS point of view.

Varun mentioned this timing:
Lost beat: 27:30
Became stale: 27:50 - this is a guess, reverse engineered (stale timeout 20 seconds)
Became dead: 37:51

But the recovery started at 27:13 (15 seconds before we have this log line):

2013-04-19 00:27:28,432 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /10.156.194.94:50010 for file /hbase/feeds/1479495ad2a02dceb41f093ebc29fe4f/home/02f639bb43944d4ba9abcf58287831c0 for block BP-696828882-10.168.7.226-1364886167971:blk_-5977178030490858298_99853:java.net.SocketTimeoutException: 15000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.156.194.94:50010]

So when we took the blocks from the NN, the datanode was not stale, so you get the wrong (random) order.

ZooKeeper can expire a session before the timeout. I don't know why it does this in this case, but I don't consider it a ZK bug: if ZK knows that a node is dead, it's its role to expire the session. There is something more fishy: we started the recovery while the datanode was still responding to heartbeats. I don't know why. Maybe the OS was able to kill -15 the RS before vanishing away. Anyway, we then get an exception when we try to connect, because the RS no longer has a TCP connection to this datanode. And this is retried many times. You would not have this with trunk, because HBASE-6435 reorders the blocks inside the client, using information not available to the NN, excluding the datanode of the region server under recovery.

Some conclusions:
- We should likely backport HBASE-6435 to 0.94.
- I will revive HDFS-3706 and HDFS-3705 (the non-hacky way to get HBASE-6435).
- There is some stuff that could be better in HDFS. I will see.
- I'm worried by the SocketTimeoutException. We should get NoRouteToHost at some point, and we don't. That's also why it takes ages. I think it's an AWS thing, but it brings two issues: it's slow, and, in HBase, you don't know whether the operation could have been executed or not, so it adds complexity to some scenarios. If someone with enough network and AWS knowledge could clarify this point, it would be great.

Cheers,

Nicolas

On Fri, Apr 19, 2013 at 10:10 PM, Varun Sharma va...@pinterest.com wrote:

This is 0.94.3 hbase...

On Fri, Apr 19, 2013 at 1:09 PM, Varun Sharma va...@pinterest.com wrote:

Hi Ted,

I had a long offline discussion with Nicolas on this. Looks like the last block, which was still being written to, took an enormous time to recover. Here's what happened:

a) Master splits tasks and region servers process them
b) Region server tries to recover the lease for each WAL log - most cases are noops since they are already rolled over/finalized
c) The last file's lease recovery takes some time, since the crashing server was writing to it and had a lease on it - but basically we have the lease 1 minute after the server was lost
d) Now we start the recovery for this, but we end up hitting the stale datanode, which is puzzling.

It seems that we did not hit the stale datanode when we were trying to recover the finalized WAL blocks with trivial lease recovery. However, for the final block, we hit the stale datanode. Any clue why this might be happening?

Varun

On Fri, Apr 19, 2013 at 10:40 AM, Ted Yu yuzhih...@gmail.com wrote:

Can you show a snippet from the DN log which mentioned UNDER_RECOVERY?

Here is the criteria for stale node checking to kick in (from https://issues.apache.org/jira/secure/attachment/12544897/HDFS-3703-trunk-read-only.patch ):

+ * Check if the datanode is in stale state. Here if
+ * the namenode has not received heartbeat msg from a
+ * datanode for more than staleInterval (default value is
+ * {@link DFSConfigKeys#DFS_NAMENODE_STALE_DATANODE_INTERVAL_MILLI_DEFAULT}),
+ * the datanode will be treated as stale node.

On Fri, Apr 19, 2013 at 10:28 AM, Varun Sharma va...@pinterest.com wrote:

Is there a place to upload these logs?

On Fri, Apr 19, 2013 at 10:25 AM, Varun Sharma va...@pinterest.com wrote:

Hi Nicolas,

Attached are the namenode and dn logs (of one of the healthy replicas of the WAL block), and the rs logs which got stuck doing the log split. Action begins at 2013-04-19 00:27*. Also, the rogue block is 5723958680970112840_174056. It's very interesting to trace this guy through the HDFS logs (dn and nn).

Btw, do you know what the UNDER_RECOVERY stage is for in HDFS? Also, does the stale node stuff kick in for that state?

Thanks
Varun

On Fri, Apr 19, 2013 at 4:00 AM, Nicolas Liochon
Re: Overwrite a row
Operations within each batch are atomic. They would either all succeed or all fail. Time stamps would all refer to the latest cell (KeyVal).

Cheers

On Apr 20, 2013, at 12:17 AM, Kristoffer Sjögren sto...@gmail.com wrote:

The schema is known beforehand so this is exactly what I need. Great! One more question. What guarantees does the batch operation have? Are the operations contained within each batch atomic? I.e. all mutations will be given the same timestamp? If something fails, all operations fail or can it fail partially?
Re: talk list table
Hope I'm not too late here... Regarding hotspotting with sequential keys, I'd suggest you read this Sematext blog post: http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/ They present a nice idea there for this kind of issue. Good luck!

On Mon, Apr 15, 2013 at 11:18 PM, Ted Yu yuzhih...@gmail.com wrote:

bq. write performance would be lower

The above means poorer performance.

bq. I could batch them up application side

Please do that.

bq. I guess there is no way to turn that off?

That's right.

On Mon, Apr 15, 2013 at 11:15 AM, Kireet kir...@feedly.com wrote:

Thanks for the reply. "write performance would be lower" - does this mean better? Also, I think I used the wrong terminology regarding batching. I meant to ask whether it uses the client-side write buffer. I would think not, since the append() method returns a Result. I could batch them up application side, I suppose. Append also seems to return the updated value. This seems like a lot of unnecessary I/O in my case, since I am not immediately interested in the updated value. I guess there is no way to turn that off?

On 4/15/13 1:28 PM, Ted Yu wrote:

I assume you would select HBase 0.94.6.1 (the latest release) for this project.

For #1, write performance would be lower if you choose to use Append (vs. using Put).

bq. Can appends be batched by the client or do they execute immediately?

This depends on your use case. Take a look at the following method in HTable, where you can send a list of actions (Appends):

public void batch(final List<? extends Row> actions, final Object[] results)

For #2,

bq. The other would be to prefix the timestamp row key with a random leading byte.

This technique has been used elsewhere and is better than the first one.

Cheers

On Mon, Apr 15, 2013 at 6:09 AM, Kireet Reddy kireet-Teh5dPVPL8nQT0dZR+a...@public.gmane.org wrote:

We are planning to create a scheduled task list table in our HBase cluster. Essentially we will define a table keyed by timestamp, and then the row contents will be all the tasks that need to be processed within that second (or whatever time period). I am trying to do the "reasonably wide rows" design mentioned in the HBaseCon OpenTSDB talk. A couple of questions:

1. Should we use append or put to create tasks? Since these rows will not live forever, storage space is not a concern; read/write performance is more important. As concurrency increases, I would guess the row lock may become an issue with append? Can appends be batched by the client or do they execute immediately?

2. I am a little worried about hotspots. This basic design may cause issues in terms of the table's performance. Many tasks will execute and reschedule themselves using the same interval, t + 1 hour for example, so many of the writes may all go to the same block. Also, we have a lot of other data, so I am worried it may impact performance of unrelated data if the region server gets too busy servicing the task list table. I can think of 2 strategies to avoid this. One would be to create N different tables and read/write tasks to them randomly. This may spread load across servers, but there is no guarantee HBase will place the tables on different region servers, correct? The other would be to prefix the timestamp row key with a random leading byte. Then, when reading from the task list table, consumers could scan from any/all possible values of the random byte + current timestamp to obtain tasks. Both strategies seem like they could spread out load, but at the cost of more work/complexity to read tasks from the table. Do either of those approaches make sense?

On the read side, it seems like a similar problem exists in that all consumers will be reading rows based on the current timestamp. Is this good because the block will very likely be cached, or bad because the region server may become overloaded? I have a feeling the answer is going to be "it depends". :) I did see the previous posts on queues and the tips there - use zookeeper for coordination, schedule major compactions, etc. Sorry if these questions are basic, I am pretty new to hbase. Thanks!
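The "random leading byte" strategy described above can be sketched in plain Java. This is only an illustration; the names (NUM_BUCKETS, saltedKey) are made up, and in practice the leading byte should be derived deterministically from the key (rather than truly random) so a given task row can be located again:

```java
public class SaltedKeys {
    // Number of buckets = number of distinct leading bytes. More buckets
    // spread writes over more regions, but readers must issue one scan per
    // bucket, so keep it small (hypothetical choice).
    static final int NUM_BUCKETS = 8;

    // Build a 9-byte row key: 1 salt byte + 8-byte big-endian timestamp.
    // timestamp % NUM_BUCKETS makes the salt deterministic, so the same
    // second always maps to the same bucket.
    static byte[] saltedKey(long timestamp) {
        byte[] key = new byte[9];
        key[0] = (byte) (timestamp % NUM_BUCKETS);
        for (int i = 0; i < 8; i++) {
            key[1 + i] = (byte) (timestamp >>> (8 * (7 - i)));
        }
        return key;
    }

    public static void main(String[] args) {
        // Consecutive timestamps land in different buckets...
        System.out.println(saltedKey(1000)[0]); // bucket 0 (1000 % 8 == 0)
        System.out.println(saltedKey(1001)[0]); // bucket 1
        // ...so a consumer runs one small scan per salt value 0..7
        // and merges the results to collect all tasks for a second.
    }
}
```

This is exactly the extra read-side work/complexity trade-off mentioned in the question: NUM_BUCKETS scans per time period instead of one.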
hbase + mapreduce
Hello,

I'm working on a project, and I'm using HBase to store the data. I have this method that works, but without the performance I'm looking for, so I want to do the same thing using MapReduce.

public ArrayList<MyObject> findZ(String z) throws IOException {
    ArrayList<MyObject> rows = new ArrayList<MyObject>();
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "test");
    Scan s = new Scan();
    s.addColumn(Bytes.toBytes("x"), Bytes.toBytes("y"));
    ResultScanner scanner = table.getScanner(s);
    try {
        for (Result rr : scanner) {
            if (Bytes.toString(rr.getValue(Bytes.toBytes("x"), Bytes.toBytes("y"))).equals(z)) {
                rows.add(getInformation(Bytes.toString(rr.getRow())));
            }
        }
    } finally {
        scanner.close();
    }
    return rows;
}

The getInformation method takes all the columns and converts the row into a MyObject. I just want an example or a link to a tutorial that does something like this: I want to get a result type as the answer, not a number counting words, like many examples I have found. My native language is Spanish, so sorry if something is not well written. Thanks

http://www.uci.cu
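Not a full tutorial, but here is a rough, untested sketch of what this could look like as a map-only job over the table, with the comparison pushed server-side via SingleColumnValueFilter. The class/table/column names match the snippet above; the MyObject assembly and the output format are left out:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class FindZJob {
    static class FindZMapper extends TableMapper<Text, Text> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx)
                throws IOException, InterruptedException {
            // Rows reaching the mapper already passed the server-side filter,
            // so just emit the row key (getInformation(...) would go here).
            ctx.write(new Text(Bytes.toString(row.get())), new Text(""));
        }
    }

    static Job createJob(String z) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("x"), Bytes.toBytes("y"));
        // Push the equality check to the region servers instead of
        // comparing client-side as in the snippet above.
        scan.setFilter(new SingleColumnValueFilter(
                Bytes.toBytes("x"), Bytes.toBytes("y"),
                CompareFilter.CompareOp.EQUAL, Bytes.toBytes(z)));
        scan.setCaching(500);        // fewer RPCs for full-table scans
        scan.setCacheBlocks(false);  // don't pollute the block cache
        Job job = new Job(conf, "findZ");
        job.setJarByClass(FindZJob.class);
        TableMapReduceUtil.initTableMapperJob("test", scan,
                FindZMapper.class, Text.class, Text.class, job);
        job.setNumReduceTasks(0);    // map-only; results land in HDFS files
        return job;
    }
}
```

Even without MapReduce, just calling scan.setFilter(...) on the existing scanner would already avoid shipping non-matching rows to the client, which may be most of the performance win.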
Re: Overwrite a row
Just to be absolutely clear, is this also true for a batch that spans multiple rows?

On Sat, Apr 20, 2013 at 2:42 PM, Ted Yu yuzhih...@gmail.com wrote:

Operations within each batch are atomic. They would either all succeed or all fail. Time stamps would all refer to the latest cell (KeyVal).
Re: Slow region server recoveries
Hi Nicolas,

Regarding the following, I think this is not a recovery - the file below is an HFile and is being accessed on a get request. On this cluster, I don't have block locality. I see these exceptions for a while and then they are gone, which means the stale node thing kicks in.

2013-04-19 00:27:28,432 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /10.156.194.94:50010 for file /hbase/feeds/1479495ad2a02dceb41f093ebc29fe4f/home/02f639bb43944d4ba9abcf58287831c0 for block

This is the real bummer. The stale datanode is 1st even 90 seconds afterwards:

2013-04-19 00:28:35,777 WARN org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of hdfs://ec2-107-20-237-30.compute-1.amazonaws.com/hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141 failed, returning error
java.io.IOException: Cannot obtain block length for LocatedBlock{BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056; getBlockSize()=0; corrupt=false; offset=0; locs=[10.156.194.94:50010, 10.156.192.106:50010, 10.156.195.38:50010]}
    at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:238)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:182)
    at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:124)
    at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:117)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1080)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:245)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:78)
    at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1787)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.openFile(SequenceFileLogReader.java:62)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1707)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:175)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:717)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:821)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:734)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:381)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:348)
    at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:111)
    at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:264)
    at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:195)
    at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:163)
    at java.lang.Thread.run(Thread.java:662)
default region splitting on which value?
Hi,

I am just reading about region splitting. By default, as I understand it, HBase handles splitting the regions. I just don't know how to imagine on which key it splits a region.

1) For example, when I write MD5 hashes as row keys, they are most probably evenly distributed from 00... to FF..., right? When HBase starts with one region, all the writes go into that region, and when the HFile gets too big, does it just take, for example, the median value of the stored keys and split the region on that?

2) I want to bulk load tons of data with the HBase Java client API put operations, and I want it to perform well. My keys are numeric sequential values (which I know from this post I cannot load into HBase sequentially, because the HBase tables are going to be sad: http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/ ). So I thought I would pre-split the table into regions and load the data in randomized order. This way I will get good distribution among region servers in terms of network I/O from the beginning. Is that a good idea?

3) If my row keys are not evenly distributed in the keyspace, but show some peaks or bursts, e.g. 000-999 but most of the keys gather around the 020 and 060 values, is it a good idea to have the pre-split regions at those peaks?

Thanks in advance,
Pal
Re: Slow region server recoveries
The important thing to note is that the block for this rogue WAL is in UNDER_RECOVERY state. I have repeatedly asked HDFS dev whether the stale node thing kicks in correctly for UNDER_RECOVERY blocks, but haven't gotten an answer.
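For reference, the stale-node detection discussed throughout this thread is governed by the HDFS-3703 settings. A sketch of the relevant hdfs-site.xml fragment (the 20-second interval matches the timing guessed earlier in the thread; the stock HDFS default is 30 seconds, and these keys require a NameNode carrying the HDFS-3703 change):

```xml
<property>
  <name>dfs.namenode.avoid.read.stale.datanode</name>
  <value>true</value>
  <!-- prefer non-stale replicas when ordering block locations -->
</property>
<property>
  <name>dfs.namenode.stale.datanode.interval</name>
  <value>20000</value>
  <!-- no heartbeat for 20s => mark the datanode stale -->
</property>
```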
Re: default region splitting on which value?
How many column families do you have?

For #3, pre-splitting the table at the row keys corresponding to the peaks makes sense.
Re: default region splitting on which value?
Hi Ted, Only one column family. My data is very simple key-value, but I want to make sequential scans, so hashing the key is not an option.

On Sat, Apr 20, 2013 at 10:07 PM, Ted Yu yuzhih...@gmail.com wrote: How many column families do you have? For #3, pre-splitting the table at the row keys corresponding to the peaks makes sense. [...]
Re: default region splitting on which value?
The answer to your first question is yes - the midkey of the key range would be chosen as the split key.

For #2, can you tell us how you plan to randomize the loading? Bulk load normally means preparing HFiles which are loaded directly into your table. Cheers

On Apr 20, 2013, at 1:11 PM, Pal Konyves paul.kony...@gmail.com wrote: Hi Ted, Only one column family. My data is very simple key-value, but I want to make sequential scans, so hashing the key is not an option. [...]
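[Editor's note: a simplified illustration of the midkey answer above. Real HBase picks the block boundary nearest the middle of the largest store file, but the effect is close to taking the median stored key, as sketched here.]

```java
public class MidKey {
    // Simplified model of an automatic region split: the middle key of the
    // sorted key range becomes the split point, yielding two roughly equal
    // halves regardless of where the keys cluster.
    static String chooseSplitKey(String[] sortedKeys) {
        return sortedKeys[sortedKeys.length / 2];
    }

    public static void main(String[] args) {
        String[] keys = {"00a", "17f", "3b2", "8c4", "e91"};
        System.out.println(chooseSplitKey(keys)); // "3b2"
    }
}
```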
Re: default region splitting on which value?
I am writing a paper for school about HBase, so the data I chose is not a real usable example. I am familiar with GTFS, a de facto standard for storing information about public transportation schedules: when a vehicle arrives at a stop and where it goes next. I chose to generate the rows on the fly, where each row represents the sequence of 'bus' stops that make up a route from the first stop to the last stop, e.g.:

  [first_stop_id,last_stop_id],string_sequence_of_stops

where the part within the [...] is the rowkey. So, long story short, I generate the data. I want to use the HBase Java client API to store the rows with Put. I plan to randomize the load by picking random first_stop_id-s and using more threads. The rowkeys will still contain sequential runs, because the way I generate the rows outputs about 100-1000 rows starting with the same first_stop_id in the rowkey. The total amount of rows will be in the billions and would take up about 1 TB.

On Sat, Apr 20, 2013 at 10:54 PM, Ted Yu yuzhih...@gmail.com wrote: The answer to your first question is yes - the midkey of the key range would be chosen as the split key. For #2, can you tell us how you plan to randomize the loading? Bulk load normally means preparing HFiles which are loaded directly into your table. [...]
Re: talk list table
+ http://blog.sematext.com/2012/12/24/hbasewd-and-hbasehut-handy-hbase-libraries-available-in-public-maven-repo/ if you use Maven and want to use HBaseWD.

Otis -- HBASE Performance Monitoring - http://sematext.com/spm/index.html

On Sat, Apr 20, 2013 at 11:24 AM, Amit Sela am...@infolinks.com wrote: Hope I'm not too late here... Regarding hotspotting with sequential keys, I'd suggest you read this Sematext blog post - http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/ - they present a nice idea there for this kind of issue. Good luck!

On Mon, Apr 15, 2013 at 11:18 PM, Ted Yu yuzhih...@gmail.com wrote:

bq. write performance would be lower
The above means poorer performance.

bq. I could batch them up application side
Please do that.

bq. I guess there is no way to turn that off?
That's right.

On Mon, Apr 15, 2013 at 11:15 AM, Kireet kir...@feedly.com wrote: Thanks for the reply. 'write performance would be lower' - does this mean better or worse? Also, I think I used the wrong terminology regarding batching: I meant to ask whether Append uses the client-side write buffer. I would think not, since the append() method returns a Result, but I could batch them up application-side, I suppose. Append also seems to return the updated value, which looks like a lot of unnecessary I/O in my case, since I am not immediately interested in the updated value. I guess there is no way to turn that off?

On 4/15/13 1:28 PM, Ted Yu wrote: I assume you would select HBase 0.94.6.1 (the latest release) for this project. For #1, write performance would be lower if you choose to use Append (vs. using Put).

bq. Can appends be batched by the client or do they execute immediately?
This depends on your use case. Take a look at the following method in HTable, where you can send a list of actions (Appends):

  public void batch(final List<? extends Row> actions, final Object[] results)

For #2,
bq. The other would be to prefix the timestamp row key with a random leading byte.
This technique has been used elsewhere and is better than the first one. Cheers

On Mon, Apr 15, 2013 at 6:09 AM, Kireet Reddy kireet-Teh5dPVPL8nQT0dZR+a...@public.gmane.org wrote:

We are planning to create a scheduled task list table in our HBase cluster. Essentially we will define a table keyed by timestamp, and the row contents will be all the tasks that need to be processed within that second (or whatever the time period is). I am trying to follow the 'reasonably wide rows' design mentioned in the HBaseCon OpenTSDB talk. A couple of questions:

1. Should we use Append or Put to create tasks? Since these rows will not live forever, storage space is not a concern; read/write performance is more important. As concurrency increases, I would guess the row lock may become an issue with Append? Can appends be batched by the client, or do they execute immediately?

2. I am a little worried about hotspots. This basic design may cause issues for the table's performance. Many tasks will execute and reschedule themselves using the same interval, t + 1 hour for example, so many of the writes may all go to the same block. Also, we have a lot of other data, so I am worried it may impact the performance of unrelated data if the region server gets too busy servicing the task list table. I can think of two strategies to avoid this. One would be to create N different tables and read/write tasks to them randomly. This may spread load across servers, but there is no guarantee HBase will place the tables on different region servers, correct? The other would be to prefix the timestamp row key with a random leading byte. Then, when reading from the task list table, consumers could scan from all possible values of the random byte + current timestamp to obtain tasks. Both strategies seem like they could spread out the load, but at the cost of more work/complexity to read tasks from the table. Do either of these approaches make sense?

On the read side, it seems like a similar problem exists, in that all consumers will be reading rows based on the current timestamp. Is this good because the block will very likely be cached, or bad because the region server may become overloaded? I have a feeling the answer is going to be 'it depends'. :) I did see the previous posts on queues and the tips there - use ZooKeeper for coordination, schedule major compactions, etc. Sorry if these questions are basic, I am pretty new to HBase. Thanks!
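[Editor's note: the random-leading-byte strategy discussed above can be sketched as follows. This is a hedged illustration, not HBase API code: the bucket count (16) and the helper names are hypothetical. Writers prepend one salt byte to the timestamp key; readers must issue one scan per possible salt value for the current timestamp, which is exactly the extra read cost Kireet anticipates.]

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SaltedKeys {
    static final int BUCKETS = 16; // hypothetical number of salt buckets

    // Writer side: prefix the timestamp key with one random salt byte so
    // tasks scheduled for the same second land in different regions.
    static byte[] saltedKey(long timestampSeconds, Random rnd) {
        byte salt = (byte) rnd.nextInt(BUCKETS);
        byte[] ts = String.format("%010d", timestampSeconds).getBytes(StandardCharsets.UTF_8);
        byte[] key = new byte[1 + ts.length];
        key[0] = salt;
        System.arraycopy(ts, 0, key, 1, ts.length);
        return key;
    }

    // Reader side: to fetch all tasks for one timestamp, a consumer scans
    // every salt bucket -- one scan start key per possible salt value.
    static List<byte[]> scanPrefixes(long timestampSeconds) {
        List<byte[]> prefixes = new ArrayList<>();
        byte[] ts = String.format("%010d", timestampSeconds).getBytes(StandardCharsets.UTF_8);
        for (int salt = 0; salt < BUCKETS; salt++) {
            byte[] p = new byte[1 + ts.length];
            p[0] = (byte) salt;
            System.arraycopy(ts, 0, p, 1, ts.length);
            prefixes.add(p);
        }
        return prefixes;
    }

    public static void main(String[] args) {
        byte[] k = saltedKey(1366400000L, new Random(42));
        System.out.println(k.length);                         // 11: 1 salt byte + 10 digit bytes
        System.out.println(scanPrefixes(1366400000L).size()); // 16
    }
}
```

The trade-off is visible in the code: writes spread over BUCKETS regions, but every read fans out into BUCKETS scans.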
Re: Overwrite a row
Here is code from the 0.94 code base:

  public void mutateRow(final RowMutations rm) throws IOException {
    new ServerCallable<Void>(connection, tableName, rm.getRow(), operationTimeout) {
      public Void call() throws IOException {
        server.mutateRow(location.getRegionInfo().getRegionName(), rm);
        return null;
      }
      // ...
    }
  }

where RowMutations has the following check:

  private void internalAdd(Mutation m) throws IOException {
    int res = Bytes.compareTo(this.row, m.getRow());
    if (res != 0) {
      throw new IOException("The row in the recently added Put/Delete "
          + Bytes.toStringBinary(m.getRow()) + " doesn't match the original one "
          + Bytes.toStringBinary(this.row));
    }
    // ...
  }

This means you need to issue multiple mutateRow() calls for different rows. I think you should consider the potential impact on performance of this limitation. For advanced usage, take a look at MultiRowMutationEndpoint:

  * This class demonstrates how to implement atomic multi row transactions using
  * {@link HRegion#mutateRowsWithLocks(java.util.Collection, java.util.Collection)}
  * and Coprocessor endpoints.

Cheers

On Sat, Apr 20, 2013 at 10:11 AM, Kristoffer Sjögren sto...@gmail.com wrote: Just to be absolutely clear, is this also true for a batch that spans multiple rows?

On Sat, Apr 20, 2013 at 2:42 PM, Ted Yu yuzhih...@gmail.com wrote: Operations within each batch are atomic: they would either all succeed or all fail, and the timestamps would all refer to the latest cell (KeyValue). Cheers

On Apr 20, 2013, at 12:17 AM, Kristoffer Sjögren sto...@gmail.com wrote: The schema is known beforehand, so this is exactly what I need. Great! One more question: what guarantees does the batch operation have? [...]
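[Editor's note: the internalAdd() check quoted above can be mirrored client-side before building a RowMutations object, failing fast instead of waiting for the server call. This is a sketch; only the same-row comparison semantics are taken from the quoted 0.94 source, the validate helper itself is hypothetical.]

```java
import java.util.Arrays;
import java.util.List;

public class SameRowCheck {
    // Mirrors RowMutations.internalAdd(): every Put/Delete added to a
    // RowMutations object must target the same row, otherwise it is
    // rejected. Checking up front avoids a doomed round trip.
    static void validateSameRow(byte[] row, List<byte[]> mutationRows) {
        for (byte[] r : mutationRows) {
            if (!Arrays.equals(row, r)) {
                throw new IllegalArgumentException(
                    "The row in the recently added Put/Delete " + new String(r)
                    + " doesn't match the original one " + new String(row));
            }
        }
    }

    public static void main(String[] args) {
        byte[] row = "user-42".getBytes();
        // Same row for every mutation: passes silently.
        validateSameRow(row, List.of("user-42".getBytes(), "user-42".getBytes()));
        // A mutation for a different row: rejected, as mutateRow() would do.
        try {
            validateSameRow(row, List.of("user-43".getBytes()));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected cross-row mutation");
        }
    }
}
```

This is why an atomic "overwrite the whole row" (delete markers plus new puts in one RowMutations) works, while atomic mutations spanning several rows need the coprocessor route mentioned above.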
Re: default region splitting on which value?
Thanks for sharing the information below. How do you plan to store the time (when the bus gets to each stop) in the row? Or maybe that is not of importance to you?

On Sat, Apr 20, 2013 at 2:24 PM, Pal Konyves paul.kony...@gmail.com wrote: I am making a paper for school about HBase, so the data I chose is not a real usable example. I am familiar with GTFS, a de facto standard for storing information about public transportation schedules. [...]